AppDividend
Latest Code Tutorials

Pandas Filter: DataFrame.filter() Function in Python

0

Pandas Dataframe.filter() is an inbuilt function that is used to subset columns or rows of DataFrame according to labels in the particular index. The DataFrame filter() returns subset the DataFrame rows or columns according to the detailed index labels. One thing to note that this routine does not filter a DataFrame on its contents. The filter() function is applied to the labels of the index.

How To Filter Pandas Dataframe

Python Pandas allows us to slice and dice the data in multiple ways. Often, you may want to subset a pandas dataframe based on one or more values of a specific column. Necessarily, we would like to select rows based on one value or multiple values present in a column.

To filter data in Pandas, we have the following options.

  1. Use Pandas filter() method.
  2. Use Pandas query() function
  3. Use Pandas DataFrame indices.

Syntax

DataFrame.filter(self: ~FrameOrSeries, items=None, like: Union[str, NoneType] = None, regex: Union[str, NoneType] = None, axis=None)

Parameters

items: list-like

Keep labels from the axis which are in items.

like: str

Keep labels from the axis for which “like in label == True”.

regex: str (regular expression)

Keep labels from axis for which re.search(regex, label) == True.

axis: {0 or ‘index’, 1 or ‘columns’, None}, default value is None.

The axis to filter on, expressed either as the index (int) or axis name (str).

By default, this is an info axis, ‘index’ for Series, ‘columns’ for DataFrame.

Example

Let’s use an external CSV file for this example. The file I am using is called People.csv file, and we will import the data using pandas read_csv() function.

Then we will create a DataFrame from the CSV data.

For this example, we only select the first 10 rows, so I have used DataFrame.head() function to limit the rows to 10.

Then we will use the filter() function to select the data based on labels.

See the below code.

# app.py

import pandas as pd

dt = pd.read_csv('people.csv')
df = pd.DataFrame(data=dt)
df10 = df.head(10)
print(df10)

Output

python3 app.py
   Name Sex  Age  Height  Weight
0  Alex   M   41      74     170
1  Bert   M   42      68     166
2  Carl   M   32      70     155
3  Dave   M   39      72     167
4  Elly   F   30      66     124
5  Fran   F   33      66     115
6  Gwen   F   26      64     121
7  Hank   M   30      71     158
8  Ivan   M   53      72     175
9  Jake   M   32      69     143

You can see that we have a total of 5 columns and 10 rows.

Now, we will select only Name, Height, and Weight using the Pandas filter() method.

# app.py

import pandas as pd

dt = pd.read_csv('people.csv')
df = pd.DataFrame(data=dt)
df10 = df.head(10)
print(df10)
dFilter = df10.filter(['Name', 'Height', 'Weight'])
print(dFilter)

Output

python3 app.py
   Name  Height  Weight
0  Alex      74     170
1  Bert      68     166
2  Carl      70     155
3  Dave      72     167
4  Elly      66     124
5  Fran      66     115
6  Gwen      64     121
7  Hank      71     158
8  Ivan      72     175
9  Jake      69     143

Pandas filter with Python regex

Let’s pass a regular expression parameter to the filter() function.

Python RegEx or Regular Expression is the sequence of characters that forms the search pattern.

Python RegEx can be used to check if the string contains the specified search pattern.

Let’s select columns by its name that contain ‘A’.

# app.py

import pandas as pd

dt = pd.read_csv('people.csv')
df = pd.DataFrame(data=dt)
df10 = df.head(10)
dFilter = df10.filter(regex ='[A]') 
print(dFilter)

Output

python3 app.py
   Age
0   41
1   42
2   32
3   39
4   30
5   33
6   26
7   30
8   53
9   32

The regular expression ‘[A]‘ looks for all column names, which has an ‘A’.

We have only one column that contains A; that is why it returns the Age column.

Pandas select rows in DataFrame using filter()

Let’s select data based on the index of the DataFrame.

# app.py

import pandas as pd

dt = pd.read_csv('people.csv')
df = pd.DataFrame(data=dt)
df10 = df.head(10)
dFilter = df10.filter(like='6', axis=0) 
print(dFilter)

In the above code, we are selecting that row whose index is 6.

In Pandas DataFrame, the index starts from 0. So 6 must be 7th index in the DataFrame.

python3 app.py
   Name Sex  Age  Height  Weight
6  Gwen   F   26      64     121

Pandas filter using df.query()

The filter() is not the only function we can use to filter the rows and columns.

Pandas DataFrame.query() is an inbuilt function that is useful to filter the rows.

See the following code.

# app.py

import pandas as pd

dt = pd.read_csv('people.csv')
df = pd.DataFrame(data=dt)
df10 = df.head(10)
dFilter = df10.query('Age>40')
print(dFilter)

Output

python3 app.py
   Name Sex  Age  Height  Weight
0  Alex   M   41      74     170
1  Bert   M   42      68     166
8  Ivan   M   53      72     175

In the above example, we are filtering the rows which have Age < 40. So we will get all the people who have Age > 40.

It is like SQL SELECT Query with WHERE Clause.

Pandas DataFrame filter multiple columns

We can filter multiple columns in Pandas DataFrame using & operator, don’t forget to wrap the sub-statements with().

See the following code.

# app.py

import pandas as pd

dt = pd.read_csv('people.csv')
df = pd.DataFrame(data=dt)
df10 = df.head(10)
dFilter = df10[(df10.Age >= 40) & (df10.Sex == 'M')]
print(dFilter)

Output

python3 app.py
   Name Sex  Age  Height  Weight
0  Alex   M   41      74     170
1  Bert   M   42      68     166
8  Ivan   M   53      72     175

In the above code, we are filtering the data based on two conditions.

  1. Age > 40
  2. Sex == M

If one then is False, then it filters out that data.

The returning data will satisfy our conditions.

Conclusion

We can filter Pandas DataFrame using df.filter(), df.query(), and df[] indices method. 

We can also filter multiple columns by using & operator.

See also

Pandas rename DataFrame column

Pandas assign()

Pandas iloc[]

Pandas Pivot Table

Pandas transpose()

Leave A Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.