The pandas DataFrame.filter() method subsets the rows or columns of a DataFrame according to their labels.
Syntax
DataFrame.filter(items=None, like=None, regex=None, axis=None)
Parameters
Name | Description |
items | A list-like of labels to keep. Labels that are not present on the chosen axis are ignored. |
like | A string; every label that contains it as a substring is kept. |
regex | A regular expression; every label for which re.search(regex, label) finds a match is kept. |
axis | The axis to filter on: 0 or ‘index’ for row labels, 1 or ‘columns’ for column labels. The default, None, means columns for a DataFrame. |
Return value
This method returns a DataFrame that is a subset of the original DataFrame based on the provided filtering criteria.
Important points
- Filtering axis: You can specify whether to filter by rows (axis=0) or columns (axis=1).
- One criterion per call: items, like, and regex are mutually exclusive; passing more than one of them raises a TypeError.
- Regex support: Supports regular expressions for more complex filtering conditions.
- Labels only: filter() never inspects the values stored in the DataFrame. To select rows by their contents, use boolean indexing or DataFrame.query() instead.
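The behavior of the axis argument and of the three selection parameters can be checked with a short sketch. The DataFrame and its labels here are made up for illustration; the sketch assumes a reasonably recent pandas version.

```python
import pandas as pd

# Illustrative data; the labels are arbitrary.
df = pd.DataFrame({'A1': [1, 2], 'B1': [3, 4], 'C2': [5, 6]},
                  index=['r1', 'r2'])

# With no axis given, filter() works on the column labels of a DataFrame,
# so these two calls should select the same columns.
default_axis = df.filter(like='1')
explicit_axis = df.filter(like='1', axis=1)
same_result = default_axis.equals(explicit_axis)
print(same_result)

# items, like, and regex cannot be combined in a single call.
try:
    df.filter(items=['A1'], like='1')
    combined_error = None
except TypeError as exc:
    combined_error = exc
print(type(combined_error).__name__)
```

If two criteria are needed, one option is to chain two filter() calls, each with a single parameter.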
Example 1: Filtering columns by name
import pandas as pd
df = pd.DataFrame({'A1': range(1, 6), 'B1': range(11, 16), 'C2': range(21, 26)})
filtered_df = df.filter(items=['A1', 'C2'])
print(filtered_df)
Output
   A1  C2
0   1  21
1   2  22
2   3  23
3   4  24
4   5  25
In this code, we created a new DataFrame filtered_df by selecting only the columns ‘A1’ and ‘C2’ from the original DataFrame df.
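One difference from plain bracket indexing is worth a quick sketch: items skips labels that do not exist, while df[[...]] raises an error for them. The label 'Z9' below is a hypothetical name that is deliberately absent from df.

```python
import pandas as pd

df = pd.DataFrame({'A1': range(1, 6), 'B1': range(11, 16), 'C2': range(21, 26)})

# 'Z9' is a hypothetical label that does not exist in df.
wanted = ['A1', 'Z9', 'C2']

# filter() keeps the labels it finds and silently skips the rest.
lenient = df.filter(items=wanted)
print(list(lenient.columns))

# Bracket indexing with the same list raises KeyError instead.
try:
    df[wanted]
    strict_error = None
except KeyError as exc:
    strict_error = exc
print(type(strict_error).__name__)
```

This makes filter(items=...) convenient when some columns are optional, but it also means a misspelled label goes unnoticed.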
Example 2: Filtering columns using “like”
import pandas as pd
df = pd.DataFrame({'A1': range(1, 6), 'B1': range(11, 16), 'C2': range(21, 26)})
filtered_df = df.filter(like='1')
print(filtered_df)
Output
   A1  B1
0   1  11
1   2  12
2   3  13
3   4  14
4   5  15
In this code, we filtered columns from the DataFrame df to create filtered_df, which includes only those columns whose names contain ‘1’.
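As a rough mental model, like behaves as a case-sensitive substring test on each label. The sketch below writes that test out by hand and compares it with filter(); the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A1': range(1, 6), 'B1': range(11, 16), 'C2': range(21, 26)})

# The substring test written out by hand.
manual_cols = [col for col in df.columns if '1' in col]
manual = df[manual_cols]

# The same selection with filter().
via_like = df.filter(like='1')
print(via_like.equals(manual))

# The test is case-sensitive: lowercase 'a' appears in no column name,
# so the result keeps all rows but has no columns.
no_match = df.filter(like='a')
print(no_match.shape)
```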
Example 3: Filtering columns with regular expressions
import pandas as pd
df = pd.DataFrame({'A1': range(1, 6), 'B1': range(11, 16), 'C2': range(21, 26)})
filtered_df = df.filter(regex='[A-C]1')
print(filtered_df)
Output
   A1  B1
0   1  11
1   2  12
2   3  13
3   4  14
4   5  15
We created filtered_df by keeping the columns of df whose names match the regular expression [A-C]1, i.e., names that contain ‘A’, ‘B’, or ‘C’ immediately followed by ‘1’. In this DataFrame, that selects ‘A1’ and ‘B1’.
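Because pandas applies the pattern with re.search, a match anywhere inside a label is enough. The sketch below adds a hypothetical column 'XA10' to show the difference between an unanchored pattern and one anchored with ^ and $.

```python
import pandas as pd

# 'XA10' is a hypothetical extra column added to show partial matches.
df = pd.DataFrame({'A1': [1], 'B1': [2], 'C2': [3], 'XA10': [4]})

# Unanchored: 'XA10' is kept too, because it contains 'A1'.
unanchored = df.filter(regex='[A-C]1')
print(list(unanchored.columns))

# Anchored: only labels that consist of exactly one letter A-C followed by 1.
anchored = df.filter(regex='^[A-C]1$')
print(list(anchored.columns))
```

Anchoring the pattern is a sensible default whenever the whole label, not just part of it, should match.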
Example 4: Filtering rows by index labels
import pandas as pd
df = pd.DataFrame({'A1': [1, 2, 3], 'B1': [4, 5, 6]},
                  index=['row1', 'row2', 'row3'])
filtered_df = df.filter(items=['row1', 'row3'], axis=0)
print(filtered_df)
Output
      A1  B1
row1   1   4
row3   3   6
We created filtered_df by selecting rows ‘row1’ and ‘row3’ from the DataFrame df.
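The like and regex parameters work on row labels in the same way once axis=0 (or axis='index') is passed. A minimal sketch, reusing the DataFrame from this example:

```python
import pandas as pd

df = pd.DataFrame({'A1': [1, 2, 3], 'B1': [4, 5, 6]},
                  index=['row1', 'row2', 'row3'])

# Keep rows whose index label contains '3'.
rows_like = df.filter(like='3', axis=0)
print(list(rows_like.index))

# Keep rows whose index label is exactly 'row1' or 'row2'.
rows_regex = df.filter(regex='^row[12]$', axis='index')
print(list(rows_regex.index))
```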
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He has deep knowledge of Data Science and Machine Learning and is versed in Python, JavaScript, PHP, R, and Golang. He is skilled in frameworks like Angular and React and platforms such as Node.js, and his expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.