Pandas where: Understanding pd.DataFrame.where()
Pandas have a few compelling data structures:
- A table with multiple columns is the DataFrame.
- A column of a DataFrame, or a list-like object, is called a Series.
A DataFrame is a table much like in SQL or Excel. It’s similar in structure, making it possible to use similar operations such as aggregation, filtering, and pivoting.
Pandas where
Pandas where() is a library function used to check the DataFrame for one or more conditions and returns the result. The pd.DataFrame.where() function accepts multiple arguments and returns the results based on the conditions.
By default, the rows not satisfying the condition are filled with NaN values. The difference between the numpy where and pandas where is that the DataFrame supplies the default values that the where() method is called.
Syntax
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors=’raise’, try_cast=False, raise_on_error=None)
Parameters
cond: bool Series/DataFrame, array-like, or callable
If the where cond is True, keep the original value. If the where cond False, replace with the corresponding value from others. If cond is callable, it is computed on the Series/DataFrame and returns the boolean Series/DataFrame or array.
other: scalar, Series/DataFrame, or callable entries, where cond is False, are replaced with the corresponding value from other parameters. If the other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame.
inplace: bool, default False
Whether to operate in place on the data.
axis: int, default None
Alignment axis, if needed.
level: int, default None
Alignment level, if needed.
errors: str, {‘raise’, ‘ignore’}, default ‘raise.’
Note that currently, the errors parameter won’t affect the results and will always coerce to a suitable dtype.
- ‘raise’: allow exceptions to be raised.
- ‘ignore’: Suppress exceptions on error return the original object.
try_cast: bool, default False.
Try to cast a result back to the input type (if possible).
Return Value
It returns the Same type as the caller.
Implementing Pandas DataFrame where()
See the following code.
# app.py import pandas as pd data = {'Show': ['Stranger Things', 'The X-Files', 'Mandalorian', 'The Boys'], 'Streaming': ['Netflix', 'Fx', 'Disney Plus', 'Amazon Prime'], 'Season': [3, 12, 1, 2], 'Main Actor': ['Millie', 'Gillian', 'Pedro', 'Karl Urban']} df = pd.DataFrame.from_dict(datae # filtering data filter = df["Show"] == "Mandalorian" df.where(filter, inplace=True) print(df)
Output
Show Streaming Season Main Actor 0 NaN NaN NaN NaN 1 NaN NaN NaN NaN 2 Mandalorian Disney Plus 1.0 Pedro 3 NaN NaN NaN NaN
The main task of the where() method in the above example is to check the DataFrame for one condition, the “show = Mandalorian,” and it returned the output accordingly. By default, if the rows are not satisfying a condition, it is filled with NaN value.
In our result DataFrame, only the row with Mandalorian value got returned, and other values are NaN.
Pandas Series where
Let’s create a series using the Python range() function and use the where conditions to fetch the required values.
# app.py import pandas as pd s = pd.Series(range(5)) s.where(s > 2, inplace=True) print(s)
Output
0 NaN 1 NaN 2 NaN 3 3.0 4 4.0 dtype: float64
As shown in the output, every row which doesn’t satisfy value > 2 is replaced with NaN.
Pandas where: Applying multiple conditions
To apply multiple conditions in pandas where() method, use & operator between the conditions.
If one condition fails, it won’t return the actual value with DataFrame. See the following code.
# app.py import pandas as pd data = {'Show': ['Stranger Things', 'The X-Files', 'Mandalorian', 'The Boys'], 'Streaming': ['Netflix', 'Fx', 'Disney Plus', 'Amazon Prime'], 'Season': [3, 12, 1, 2], 'Main Actor': ['Millie', 'Gillian', 'Padro', 'Karl Urban']} df = pd.DataFrame.from_dict(data) # filtering data filter = df["Show"] == "Stranger Things" filter2 = df["Season"] == 3 df.where(filter & filter2, inplace=True) print(df)
Output
Show Streaming Season Main Actor 0 Stranger Things Netflix 3.0 Millie 1 NaN NaN NaN NaN 2 NaN NaN NaN NaN 3 NaN NaN NaN NaN
If one condition does not satisfy, it will return NaN values.
# app.py import pandas as pd data = {'Show': ['Stranger Things', 'The X-Files', 'Mandalorian', 'The Boys'], 'Streaming': ['Netflix', 'Fx', 'Disney Plus', 'Amazon Prime'], 'Season': [3, 12, 1, 2], 'Main Actor': ['Millie', 'Gillian', 'Padro', 'Karl Urban']} df = pd.DataFrame.from_dict(data) # filtering data filter = df["Show"] == "Stranger Things" filter2 = df["Season"] == 12 df.where(filter & filter2, inplace=True) print(df)
In this example, the first condition satisfies Show = Stranger Things, but in filter2, it does not satisfy the Stranger Things show has 12 seasons.
So, it won’t return the Stranger Things row and The X-Files row because then the first condition will not satisfy. So, it will return the NaN values.
Output
Show Streaming Season Main Actor 0 NaN NaN NaN NaN 1 NaN NaN NaN NaN 2 NaN NaN NaN NaN 3 NaN NaN NaN NaN
You can see that due to one condition failing, it will return a DataFrame with NaN values.
That is it for Pandas DataFrame, where() function example.