Python Pandas read_csv() | How to Import CSV Data in Pandas
Pandas read_csv() is a function that imports data from a CSV file into a DataFrame so that you can analyze it in Python. Python is a great choice for data analysis, primarily because of its ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.
You can find more about DataFrame here: Pandas DataFrame Example
For this example, I am using Jupyter Notebook. If you are new to Jupyter Notebook and do not know how to install it on your local machine, then I recommend you check out my article Getting Started With Jupyter Notebook. It will guide you through installing Jupyter Notebook and getting it up and running.
Pandas DataFrame read_csv Example
Before we can import data into the Jupyter Notebook, we need some data. For that, I am using the following link to access the Olympics data.
Now, save that file in CSV format inside the local project folder. I have saved it with the filename data.csv.
Okay, now open the Jupyter Notebook and start working on the project.
Steps to import CSV data in Pandas
Step 1: Import the Pandas module.
Write the following line of code in the first notebook cell and run the cell.
import pandas as pd
The pandas library is now imported into our project under the alias pd.
The next step is to use the read_csv function to read the CSV file and display its content.
Step 2: Use the read_csv function to display the content.
The Pandas read_csv function has the following syntax.
pandas.read_csv('filename or filepath', ['dozens of optional parameters'])
The read_csv function has only one required parameter, which is the filename or file path. All of the other parameters are optional, and we will see some of them in this example.
Let’s write the following code in the next cell of the Jupyter Notebook.
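To give a feel for the optional parameters, here is a minimal sketch that uses two of them, sep and nrows. The semicolon-separated sample text is a hypothetical stand-in for a real file, and it is read from memory with io.StringIO so that the snippet runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data (not the Olympics file): semicolon-separated text.
raw = (
    "City;Edition;Medal\n"
    "Athens;1896;Gold\n"
    "Paris;1900;Silver\n"
    "London;1908;Bronze\n"
)

# sep sets the field delimiter; nrows limits how many data rows are read.
preview = pd.read_csv(io.StringIO(raw), sep=";", nrows=2)
print(preview)
```

Only the first argument is required here; leaving out sep and nrows would fall back to a comma delimiter and read every row.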
data = pd.read_csv('data.csv', skiprows=4)
Here, the first argument is the name of our file, which contains the Olympics data.
The second argument is skiprows. Passing skiprows=4 tells pandas to skip the first four lines of the file and start reading from the fifth line, which is then used as the header row.
To see the content of the file, type data in the third cell of the notebook.
Then press Ctrl + Enter, and you will see output like the image below.
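The effect of skiprows can be checked on a small in-memory example. The sketch below assumes a file layout with four preamble lines followed by a header row; the sample text is hypothetical and only imitates what the Olympics file might look like.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: four preamble lines, then the header row.
raw = (
    "Olympic medal records\n"
    "Example preamble line 2\n"
    "Example preamble line 3\n"
    "Example preamble line 4\n"
    "City,Edition,Sport,NOC,Gender,Medal\n"
    "Athens,1896,Aquatics,HUN,Men,Gold\n"
    "Athens,1896,Aquatics,AUT,Men,Silver\n"
    "Paris,1900,Athletics,USA,Men,Gold\n"
)

# Skip the four preamble lines; the fifth line becomes the header.
skipped = pd.read_csv(io.StringIO(raw), skiprows=4)
print(skipped.columns.tolist())
print(len(skipped))
```

Without skiprows, pandas would try to use the first preamble line as the header, which is why the argument matters for files that start with descriptive text.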
Step 3: Use head() and tail() in Python Pandas
In the step above, we imported a large number of rows. To preview the data without displaying all of it, you can use the head() function, which returns the first five rows by default, and the tail() function, which returns the last five.
Let’s see these functions in action.
Write data.head() in the next cell of the notebook.
Now, run the cell and see the output below.
You can see that it has returned the first five rows of that CSV file.
Now, let’s print the last five rows by writing data.tail() in the next cell.
See the output below.
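As a small self-contained sketch of head() and tail(), the example below builds a hypothetical ten-row DataFrame in place of the imported Olympics data. Both functions also accept an optional row count.

```python
import pandas as pd

# Hypothetical ten-row frame standing in for the imported Olympics data.
data = pd.DataFrame(
    {
        "Edition": list(range(1896, 1936, 4)),
        "Medal": ["Gold", "Silver"] * 5,
    }
)

first_five = data.head()    # first five rows by default
last_three = data.tail(3)   # pass a number to change how many rows are returned

print(first_five)
print(last_three)
```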
Step 4: Load a CSV with no headers
We can also load a CSV file that has no header row. Let’s see that in action.
Go back to the cell from Step 2 and change the code as follows.
data = pd.read_csv('data.csv', skiprows=4, header=None)
data
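Here, we have added one parameter, header=None. It tells pandas that the file has no header row, so the first line it reads is treated as data and the columns are labeled with integers (0, 1, 2, and so on) instead of names. Now, run the code again, and you will see output like the image below.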
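The difference is easiest to see by reading the same text twice, once with header=None and once with the default behavior. The two-line sample below is hypothetical and deliberately has no header row.

```python
import io

import pandas as pd

# Hypothetical headerless sample: every line is a data row.
raw = "Athens,1896,Gold\nParis,1900,Silver\n"

# header=None: no line is used as a header; columns are labeled 0, 1, 2.
no_header = pd.read_csv(io.StringIO(raw), header=None)

# Default: the first line is consumed as the header, so one data row is lost.
default = pd.read_csv(io.StringIO(raw))

print(no_header.columns.tolist(), len(no_header))
print(default.columns.tolist(), len(default))
```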
Step 5: Load a CSV and specify the column names
We can also supply our own column names when loading a CSV file by passing the names parameter. See the code below.
data = pd.read_csv('data.csv', names=['City', 'Edition', 'Sport', 'NOC', 'Gender', 'Medal'])
data
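The above code assigns the specified names to the columns; it does not select a subset of columns. Because names is passed without a header argument, pandas treats every line of the file as data, including any header row the file already contains. To load only certain columns, use the usecols parameter instead.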
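The sketch below contrasts names with usecols on a hypothetical sample that does contain a header row. The lowercase column labels are illustrative choices, not part of the original data.

```python
import io

import pandas as pd

# Hypothetical sample with a header row.
raw = (
    "City,Edition,Sport,NOC,Gender,Medal\n"
    "Athens,1896,Aquatics,HUN,Men,Gold\n"
    "Paris,1900,Athletics,USA,Men,Silver\n"
)
new_names = ["city", "year", "sport", "noc", "gender", "medal"]

# names alone: the file's own header line is kept as the first data row.
as_data = pd.read_csv(io.StringIO(raw), names=new_names)

# names with header=0: the file's header line is replaced by the new names.
renamed = pd.read_csv(io.StringIO(raw), header=0, names=new_names)

# usecols: load only the listed columns.
subset = pd.read_csv(io.StringIO(raw), usecols=["City", "Medal"])

print(len(as_data), len(renamed))
print(subset.columns.tolist())
```

If the file already has a header row that you want to replace, combining names with header=0 avoids the stray first row.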
If you want to learn more about the pandas read_csv() function, then check out the official documentation.
That concludes this example of how to import CSV data in Pandas.