The pandas read_csv() function loads a comma-separated values (CSV) file into a DataFrame. It can also read the file iteratively or in chunks. To open a CSV file, import pandas as pd in your program and call pd.read_csv() with the path to the file.
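As a rough sketch of the chunked reading mentioned above, the chunksize parameter makes read_csv() return an iterator of DataFrames rather than a single DataFrame. The in-memory sample and its column names (id, value) are made up for illustration and stand in for a large file on disk.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

# With chunksize set, read_csv yields DataFrames of at most that many rows.
chunk_sizes = []
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_sizes.append(len(chunk))
    total += int(chunk["value"].sum())

print(chunk_sizes)  # [2, 2, 1]
print(total)        # 150
```

Processing one chunk at a time keeps memory use bounded, which is the usual reason to reach for this option.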
Steps to Load CSV Data in Pandas
A DataFrame can be created from a CSV file with the pd.read_csv() function. Follow the steps below.
Step 1: Prepare the CSV file.
Let’s create a file called data.csv and add the following data to it.
Service,ShowName,Seasons
Netflix,Stranger Things,3
Disney+,The Mandalorian,1
Hulu,Simpsons,31
Prime Video,Fleabag,2
AppleTV+,The Morning Show,1
The first line of the file holds the column names, and every line after it holds one row of data.
Step 2: Create a program file and import pandas
If you have not installed pandas yet, install it first (for example with pip install pandas). Then create a file called app.py and add the following line at the top.
import pandas as pd
Now, we can use the Pandas read_csv() function and pass the local CSV file to that function.
Step 3: Use read_csv() function to load CSV file
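The read_csv() function in pandas takes many arguments. The only required one is filepath_or_buffer, which can be a local file path, a URL, or a file-like object. The syntax shown below comes from an older pandas release; several of the listed parameters (squeeze, prefix, mangle_dupe_cols, error_bad_lines, warn_bad_lines, tupleize_cols) have since been removed, so check the documentation for your installed version.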
pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
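To illustrate a couple of these parameters, here is a minimal sketch that passes a file-like object instead of a path and sets sep and index_col. The semicolon-separated sample is hypothetical and is not the data.csv used in this article.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated sample (not the article's data.csv).
raw = "Service;ShowName;Seasons\nNetflix;Stranger Things;3\nHulu;Simpsons;31\n"

# Any object with a read() method can be passed in place of a path.
# sep sets the field delimiter; index_col picks the column used as the row index.
df = pd.read_csv(io.StringIO(raw), sep=";", index_col="Service")

print(df.loc["Hulu", "Seasons"])  # 31
```

Most of the remaining parameters can be left at their defaults for a well-formed CSV file.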
Now let’s call read_csv() in our program to load the CSV file and create a DataFrame.
# app.py
import pandas as pd

df = pd.read_csv('data.csv')
print(df)
Because data.csv and app.py are in the same directory, the file name alone is enough as the path. The function returns a DataFrame containing the CSV data.
Run the file and see the output.
       Service          ShowName  Seasons
0      Netflix   Stranger Things        3
1      Disney+   The Mandalorian        1
2         Hulu          Simpsons       31
3  Prime Video           Fleabag        2
4     AppleTV+  The Morning Show        1
Select a Subset of Columns in DataFrame
What if you want only a subset of the columns from the CSV file?
For example, suppose you want only the ShowName and Seasons columns.
See the following code.
import pandas as pd

data = pd.read_csv('data.csv')
df = pd.DataFrame(data, columns=['ShowName', 'Seasons'])
print(df)
Output
           ShowName  Seasons
0   Stranger Things        3
1   The Mandalorian        1
2          Simpsons       31
3           Fleabag        2
4  The Morning Show        1
Make sure the column names in the code exactly match the column names in the CSV file. Otherwise, the mismatched columns are filled with NaN values.
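An alternative worth knowing is the usecols parameter of read_csv(), which selects the columns while the file is being parsed. The sketch below uses an in-memory copy of the sample data so that it runs on its own. Unlike the approach above, a misspelled name in usecols raises a ValueError instead of producing a NaN column; the misspelling Showname is deliberate.

```python
import io
import pandas as pd

# In-memory copy of the sample data, so the snippet does not need data.csv.
csv_text = """Service,ShowName,Seasons
Netflix,Stranger Things,3
Disney+,The Mandalorian,1
Hulu,Simpsons,31
Prime Video,Fleabag,2
AppleTV+,The Morning Show,1
"""

# Only the listed columns are parsed and kept.
df = pd.read_csv(io.StringIO(csv_text), usecols=["ShowName", "Seasons"])
print(df.columns.tolist())  # ['ShowName', 'Seasons']

# A name that does not exist in the file is rejected outright.
raised = False
try:
    pd.read_csv(io.StringIO(csv_text), usecols=["Showname"])
except ValueError as exc:
    raised = True
    print("usecols rejected the name:", exc)
```

Failing fast on a typo is often preferable to silently getting a column of NaN values.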
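Load a CSV while treating “.” as a missing value
The na_values argument lists additional strings that read_csv() should treat as missing (NaN). The following code then uses pd.isnull() to show which cells are missing.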
import pandas as pd

df = pd.read_csv('data.csv', na_values=['.'])
frame = pd.isnull(df)
print(frame)
Output
   Service  ShowName  Seasons
0    False     False    False
1    False     False    False
2    False     False    False
3    False     False    False
4    False     False    False
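Since data.csv contains no “.” entries, every cell in the output above is False. To see na_values take effect, here is a small sketch with a hypothetical variant of the data in which two cells have been replaced by “.”.

```python
import io
import pandas as pd

# Hypothetical variant of the sample data with "." marking two missing cells.
csv_text = """Service,ShowName,Seasons
Netflix,Stranger Things,3
Disney+,The Mandalorian,.
Hulu,.,31
"""

df = pd.read_csv(io.StringIO(csv_text), na_values=["."])

# True marks the cells that were "." in the input.
print(df.isnull())

# Count of missing cells per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

Note that the Seasons column is read as floating point here, because NaN cannot be stored in a plain integer column.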
Load a CSV in Pandas while skipping the top 2 rows
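In this example, skiprows=2 skips the first two lines of the file while creating the DataFrame. Note that the header line counts as one of them, so the third line of the file (the Disney+ row) is used as the column names.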
import pandas as pd

df = pd.read_csv('data.csv', skiprows=2)
print(df)
Output
       Disney+   The Mandalorian   1
0         Hulu          Simpsons  31
1  Prime Video           Fleabag   2
2     AppleTV+  The Morning Show   1
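Because an integer skiprows counts the header line among the skipped lines, the output above has a data row promoted to column names. If the intent is to keep the header and drop only the first two data rows, one option is to pass a list of zero-based line indices instead. The sketch below uses an in-memory copy of the sample data.

```python
import io
import pandas as pd

# In-memory copy of the sample data, so the snippet does not need data.csv.
csv_text = """Service,ShowName,Seasons
Netflix,Stranger Things,3
Disney+,The Mandalorian,1
Hulu,Simpsons,31
Prime Video,Fleabag,2
AppleTV+,The Morning Show,1
"""

# Line 0 is the header; skipping lines 1 and 2 drops the first two data rows only.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

print(df.columns.tolist())     # ['Service', 'ShowName', 'Seasons']
print(df["Service"].tolist())  # ['Hulu', 'Prime Video', 'AppleTV+']
```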
So, this is how you can load CSV in Pandas with different use cases.