How to Load CSV Data in Pandas Using read_csv()

Pandas read_csv() is a built-in function that loads a comma-separated values (CSV) file into a DataFrame. It can also optionally iterate over the file or read it in chunks. After importing pandas as pd, you can open a CSV file by calling pd.read_csv() and passing the file path as its first argument.
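The chunked reading mentioned above can be sketched as follows. This is a minimal illustration, not the only way to do it: the CSV text is held in memory with io.StringIO so the snippet runs on its own, and the chunk size of 2 and the variable names are arbitrary choices made for the example.

```python
import io

import pandas as pd

# Small in-memory sample; a real program would pass a file path instead.
csv_text = """Service,ShowName,Seasons
Netflix,Stranger Things,3
Disney+,The Mandalorian,1
Hulu,Simpsons,31
Prime Video,Fleabag,2
AppleTV+,The Morning Show,1
"""

# With chunksize set, read_csv() returns an iterator of DataFrames
# instead of one DataFrame holding the whole file.
chunk_sizes = []
total_seasons = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_sizes.append(len(chunk))
    total_seasons += int(chunk["Seasons"].sum())

print(chunk_sizes)    # number of rows in each chunk
print(total_seasons)  # sum of the Seasons column across all chunks
```

Processing one chunk at a time keeps memory use bounded, which matters mainly for files too large to load in one go.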

Steps to Load CSV Data in Pandas

You can create a Pandas DataFrame with the pd.read_csv() function by following the steps below.

Step 1: Prepare the CSV file

Let’s create a file called data.csv and add the following data to it.

Service,ShowName,Seasons
Netflix,Stranger Things,3
Disney+,The Mandalorian,1
Hulu,Simpsons,31
Prime Video,Fleabag,2
AppleTV+,The Morning Show,1

The first line of the file holds the column names, and each following line holds one row of data.
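If you would rather create data.csv from Python than type it by hand, the following sketch uses the standard library csv module. It assumes you want the file written to the current working directory; the variable names are only illustrative.

```python
import csv
from pathlib import Path

# Header row followed by one list per data row.
rows = [
    ["Service", "ShowName", "Seasons"],
    ["Netflix", "Stranger Things", 3],
    ["Disney+", "The Mandalorian", 1],
    ["Hulu", "Simpsons", 31],
    ["Prime Video", "Fleabag", 2],
    ["AppleTV+", "The Morning Show", 1],
]

csv_path = Path("data.csv")  # written to the current working directory

# newline="" lets the csv module control line endings itself.
with csv_path.open("w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

print(csv_path.read_text(encoding="utf-8"))
```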

Step 2: Create a program file and import pandas

If you have not installed Pandas yet, install the library first (for example, with pip install pandas). Then create a file called app.py and add the following line at the top.

import pandas as pd

Now we can call the Pandas read_csv() function and pass the local CSV file to it.

Step 3: Use the read_csv() function to load the CSV file

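The read_csv() function in Pandas takes many arguments. Only the first one is required: the local path or the URL of the file to read. The signature shown here comes from an older Pandas release, and some of its parameters (for example squeeze, prefix, error_bad_lines, and warn_bad_lines) have since been removed, so check the documentation for your installed version. The syntax of the function is the following.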

pd.read_csv(filepath_or_buffer, sep=',',
delimiter=None, header='infer', names=None, index_col=None,
usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True,
dtype=None, engine=None, converters=None, true_values=None,
false_values=None, skipinitialspace=False,
skiprows=None, nrows=None, na_values=None,
keep_default_na=True, na_filter=True,
verbose=False, skip_blank_lines=True,
parse_dates=False, infer_datetime_format=False,
keep_date_col=False, date_parser=None, dayfirst=False,
iterator=False, chunksize=None, compression='infer',
thousands=None, decimal=b'.', lineterminator=None,
quotechar='"', quoting=0, escapechar=None,
comment=None, encoding=None, dialect=None,
tupleize_cols=None, error_bad_lines=True,
warn_bad_lines=True, skipfooter=0,
doublequote=True, delim_whitespace=False, low_memory=True,
memory_map=False, float_precision=None)
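Most calls use only a handful of these parameters. The sketch below combines four that are still current (sep, header, names, and index_col). The semicolon-separated, header-less input is a hypothetical sample made up for this illustration, not part of data.csv.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated text with no header line.
raw = "Netflix;Stranger Things;3\nHulu;Simpsons;31\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                                   # field delimiter
    header=None,                               # the input has no header row
    names=["Service", "ShowName", "Seasons"],  # column names to assign
    index_col="Service",                       # use this column as the row index
)
print(df)
```

Because Service becomes the index, a single value can be looked up by label, for example df.loc["Hulu", "Seasons"].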

Now let’s call read_csv() in our program to load the CSV file and create a DataFrame.

# app.py

import pandas as pd

df = pd.read_csv('data.csv')
print(df)

The data.csv file and app.py are in the same directory, so the file name alone is enough. The function returns a DataFrame containing the CSV data.

Run the file and see the output.

       Service          ShowName  Seasons
0      Netflix   Stranger Things        3
1      Disney+   The Mandalorian        1
2         Hulu          Simpsons       31
3  Prime Video           Fleabag        2
4     AppleTV+  The Morning Show        1

Select a Subset of Columns in DataFrame

What if you want only a subset of the columns from the CSV file, for example only the ShowName and Seasons columns?

See the following code.

import pandas as pd

data = pd.read_csv('data.csv')
df = pd.DataFrame(data, columns=['ShowName', 'Seasons'])
print(df)

Output

           ShowName  Seasons
0   Stranger Things        3
1   The Mandalorian        1
2          Simpsons       31
3           Fleabag        2
4  The Morning Show        1

Make sure that the column names specified in the code exactly match the column names in the CSV file. Any name that does not match produces a column filled with NaN values.
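An alternative worth knowing is the usecols parameter of read_csv(), which selects the columns while the file is being parsed. The sketch below keeps the sample data in memory so that it runs on its own. Unlike the approach above, a misspelled name here raises a ValueError instead of silently producing NaN values. The bad_name_rejected flag is only an illustrative helper.

```python
import io

import pandas as pd

csv_text = """Service,ShowName,Seasons
Netflix,Stranger Things,3
Disney+,The Mandalorian,1
Hulu,Simpsons,31
Prime Video,Fleabag,2
AppleTV+,The Morning Show,1
"""

# Only the listed columns are parsed and returned.
df = pd.read_csv(io.StringIO(csv_text), usecols=["ShowName", "Seasons"])
print(df)

# A name that is not in the file ("Showname" with a lowercase n) is rejected.
bad_name_rejected = False
try:
    pd.read_csv(io.StringIO(csv_text), usecols=["Showname"])
except ValueError as exc:
    bad_name_rejected = True
    print("usecols error:", exc)
```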

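Load a CSV while specifying “.” as a missing value

The na_values parameter lists additional strings that should be read as missing values (NaN). The following code treats “.” as missing and then uses pd.isnull() to show which cells are missing. Because data.csv contains no “.” entries, every cell in the output is False.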

import pandas as pd

df = pd.read_csv('data.csv', na_values=['.'])
frame = pd.isnull(df)
print(frame)

Output

   Service  ShowName  Seasons
0    False     False    False
1    False     False    False
2    False     False    False
3    False     False    False
4    False     False    False
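To see na_values actually take effect, the input needs at least one “.” cell. The following sketch uses a hypothetical in-memory variant of the data in which two values have been replaced by “.”. This sample is invented for illustration and is not the data.csv file from Step 1.

```python
import io

import pandas as pd

# Hypothetical variant of the data with "." standing in for unknown values.
csv_text = """Service,ShowName,Seasons
Netflix,Stranger Things,3
Disney+,.,1
Hulu,Simpsons,.
"""

df = pd.read_csv(io.StringIO(csv_text), na_values=["."])
mask = df.isnull()  # True wherever a cell was read as missing

print(df)
print(mask)
```

The two “.” cells are read as NaN, so the mask is True at exactly those positions and False everywhere else.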

Load a CSV in Pandas while skipping the top 2 rows

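In this example, we pass skiprows=2, which skips the first two lines of the file while creating the DataFrame. Those two lines are the header line and the first data row, so the next line (the Disney+ row) is used as the column names, as the output shows.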

import pandas as pd

df = pd.read_csv('data.csv', skiprows=2)
print(df)

Output

       Disney+   The Mandalorian   1
0         Hulu          Simpsons  31
1  Prime Video           Fleabag   2
2     AppleTV+  The Morning Show   1
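If the goal is to drop the first two data rows while keeping the original header, one option is to pass skiprows a list of zero-based line numbers instead of a count. The sketch below assumes that is the behavior you want and keeps the sample data in memory so that it runs on its own.

```python
import io

import pandas as pd

csv_text = """Service,ShowName,Seasons
Netflix,Stranger Things,3
Disney+,The Mandalorian,1
Hulu,Simpsons,31
Prime Video,Fleabag,2
AppleTV+,The Morning Show,1
"""

# Line 0 is the header; lines 1 and 2 are the first two data rows.
# Skipping only lines 1 and 2 leaves the header in place.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
print(df)
```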

So, this is how you can load CSV in Pandas with different use cases.
