AppDividend
Latest Code Tutorials

Pandas where: Understanding pd.DataFrame.where()

Pandas have a few compelling data structures:

  1. A table with multiple columns is the DataFrame.
  2. A column of a DataFrame, or a list-like object, is called a Series.

DataFrame is a table much like in SQL or Excel. It’s similar in structure, making it possible to use similar operations such as aggregation, filtering, and pivoting.

Pandas where

Pandas where() is a library function used to check the DataFrame for one or more conditions and returns the result. The pd.DataFrame.where() function accepts multiple arguments and returns the results based on the conditions.

By default, the rows not satisfying the condition are filled with NaN values. The difference between the numpy where and pandas where is that the DataFrame supplies the default values that the where() method is called.

Syntax

DataFrame.where(cond, other=nan, inplace=False, 
                axis=None, level=None, errors=’raise’, 
                try_cast=False, raise_on_error=None)

Parameters

cond: bool Series/DataFrame, array-like, or callable

If the where cond is True, keep the original value. If the where cond False, replace with the corresponding value from others. If cond is callable, it is computed on the Series/DataFrame and returns the boolean Series/DataFrame or array.

other: scalar, Series/DataFrame, or callable entries, where cond is False, are replaced with the corresponding value from other parameters. If the other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. 

inplace: bool, default False

Whether to operate in place on the data.

axis: int, default None

Alignment axis, if needed.

level: int, default None

Alignment level, if needed.

errors: str, {‘raise’, ‘ignore’}, default ‘raise.’

Note that currently, the errors parameter won’t affect the results and will always coerce to a suitable dtype.

  1. raise’: allow exceptions to be raised.
  2. ‘ignore’: Suppress exceptions on error return the original object.

try_cast: bool, default False.

Try to cast a result back to the input type (if possible).

Return Value

It returns the Same type as the caller.

Implementing Pandas DataFrame where()

See the following code.

# app.py

import pandas as pd

data = {'Show': ['Stranger Things', 'The X-Files', 'Mandalorian', 'The Boys'],
        'Streaming': ['Netflix', 'Fx', 'Disney Plus', 'Amazon Prime'],
        'Season': [3, 12, 1, 2],
        'Main Actor': ['Millie', 'Gillian', 'Pedro', 'Karl Urban']}
df = pd.DataFrame.from_dict(datae

# filtering data
filter = df["Show"] == "Mandalorian"

df.where(filter, inplace=True)
print(df)

Output

          Show    Streaming  Season Main Actor
0          NaN          NaN     NaN        NaN
1          NaN          NaN     NaN        NaN
2  Mandalorian  Disney Plus     1.0      Pedro
3          NaN          NaN     NaN        NaN

The main task of the where() method in the above example is to check the DataFrame for one condition, the “show = Mandalorian,” and it returned the output accordingly. By default, if the rows are not satisfying a condition, it is filled with NaN value.

In our result DataFrame, only the row with Mandalorian value got returned, and other values are NaN.

Pandas Series where

Let’s create a series using the Python range() function and use the where conditions to fetch the required values.

# app.py

import pandas as pd

s = pd.Series(range(5))
s.where(s > 2, inplace=True)
print(s)

Output

0    NaN
1    NaN
2    NaN
3    3.0
4    4.0
dtype: float64

As shown in the output, every row which doesn’t satisfy value  > 2  is replaced with NaN.

Pandas where: Applying multiple conditions

To apply multiple conditions in pandas where() method, use & operator between the conditions.

If one condition fails, it won’t return the actual value with DataFrame. See the following code.

# app.py

import pandas as pd

data = {'Show': ['Stranger Things', 'The X-Files', 'Mandalorian', 'The Boys'],
        'Streaming': ['Netflix', 'Fx', 'Disney Plus', 'Amazon Prime'],
        'Season': [3, 12, 1, 2],
        'Main Actor': ['Millie', 'Gillian', 'Padro', 'Karl Urban']}
df = pd.DataFrame.from_dict(data)

# filtering data
filter = df["Show"] == "Stranger Things"
filter2 = df["Season"] == 3

df.where(filter & filter2, inplace=True)
print(df)

Output

              Show Streaming  Season Main Actor
0  Stranger Things   Netflix     3.0     Millie
1              NaN       NaN     NaN        NaN
2              NaN       NaN     NaN        NaN
3              NaN       NaN     NaN        NaN

If one condition does not satisfy, it will return NaN values.

# app.py

import pandas as pd

data = {'Show': ['Stranger Things', 'The X-Files', 'Mandalorian', 'The Boys'],
        'Streaming': ['Netflix', 'Fx', 'Disney Plus', 'Amazon Prime'],
        'Season': [3, 12, 1, 2],
        'Main Actor': ['Millie', 'Gillian', 'Padro', 'Karl Urban']}
df = pd.DataFrame.from_dict(data)

# filtering data
filter = df["Show"] == "Stranger Things"
filter2 = df["Season"] == 12

df.where(filter & filter2, inplace=True)
print(df)

In this example, the first condition satisfies Show = Stranger Things, but in filter2, it does not satisfy the Stranger Things show has 12 seasons.

So, it won’t return the Stranger Things row and The X-Files row because then the first condition will not satisfy. So, it will return the NaN values.

Output

   Show   Streaming  Season   Main Actor
0  NaN       NaN     NaN        NaN
1  NaN       NaN     NaN        NaN
2  NaN       NaN     NaN        NaN
3  NaN       NaN     NaN        NaN

You can see that due to one condition failing, it will return a DataFrame with NaN values.

That is it for Pandas DataFrame, where() function example.

See Also

Pandas mean

Pandas Drop column

Pandas DataFrame drop_duplicates()

Leave A Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.