AppDividend
Latest Code Tutorials

Pandas where: How to Use Pandas DataFrame where()

0

Pandas have a few compelling data structures:

  1. A table with multiple columns is the DataFrame.
  2. A column of a DataFrame, or a list-like object, is called a Series.

DataFrame is a table much like in SQL or Excel. It’s similar in structure, too, making it possible to use similar operations such as aggregation, filtering, and pivoting.

Pandas where

Pandas where() function is used to check the DataFrame for one or more conditions and return the result accordingly. By default, the rows not satisfying the condition are filled with NaN value. The difference between the numpy where and DataFrame where is that the DataFrame supplies the default values that the where() method is being called.

Syntax

DataFrame.where(cond, other=nan, inplace=False, 
                axis=None, level=None, errors=’raise’, 
                try_cast=False, raise_on_error=None)

Parameters

cond: bool Series/DataFrame, array-like, or callable

If the where cond is True, keep the original value. if the where cond False, replace with the corresponding value from others. If cond is callable, it is computed on the Series/DataFrame and returns the boolean Series/DataFrame or array.

other: scalar, Series/DataFrame, or callable

Entries, where cond is False, are replaced with the corresponding value from other parameters. If the other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. 

inplace: bool, default False

Whether to operate in place on the data.

axis: int, default None

Alignment axis, if needed.

level: int, default None

Alignment level, if needed.

errors: str, {‘raise’, ‘ignore’}, default ‘raise.’

Note that currently, the errors parameter won’t affect the results and will always coerce to a suitable dtype.

  1. raise’: Allow exceptions to be raised.
  2. ‘ignore’: Suppress exceptions on error return the original object.

try_cast: bool, default False.

Try to cast a result back to the input type (if possible).

Return Value

It returns the Same type as the caller.

Pandas DataFrame where() example

See the following code.

# app.py

import pandas as pd

data = {'Show': ['Stranger Things', 'The X-Files', 'Mandalorian', 'The Boys'],
        'Streaming': ['Netflix', 'Fx', 'Disney Plus', 'Amazon Prime'],
        'Season': [3, 12, 1, 2],
        'Main Actor': ['Millie', 'Gillian', 'Pedro', 'Karl Urban']}
df = pd.DataFrame.from_dict(datae

# filtering data
filter = df["Show"] == "Mandalorian"

df.where(filter, inplace=True)
print(df)

Output

          Show    Streaming  Season Main Actor
0          NaN          NaN     NaN        NaN
1          NaN          NaN     NaN        NaN
2  Mandalorian  Disney Plus     1.0      Pedro
3          NaN          NaN     NaN        NaN

The main task of the where() method in the above example is to check the DataFrame for one condition, which is the “show = Mandalorian,” and it returned the output accordingly. By default, if the rows are not satisfying a condition, it is filled with NaN value.

You can see that in our result DataFrame, only the row which has Mandalorian value got returned, and other values are NaN.

Pandas Series where

Let’s create a series using Python range() function and use the where conditions to fetch the required values.

# app.py

import pandas as pd

s = pd.Series(range(5))
s.where(s > 2, inplace=True)
print(s)

Output

0    NaN
1    NaN
2    NaN
3    3.0
4    4.0
dtype: float64

As shown in the output, every row which doesn’t satisfy value  > 2  is replaced with NaN.

Pandas where: Applying multiple conditions

To apply multiple conditions in pandas where() method, use & operator between the conditions.

If one condition fails, then it won’t return the actual value with DataFrame. See the following code.

# app.py

import pandas as pd

data = {'Show': ['Stranger Things', 'The X-Files', 'Mandalorian', 'The Boys'],
        'Streaming': ['Netflix', 'Fx', 'Disney Plus', 'Amazon Prime'],
        'Season': [3, 12, 1, 2],
        'Main Actor': ['Millie', 'Gillian', 'Padro', 'Karl Urban']}
df = pd.DataFrame.from_dict(data)

# filtering data
filter = df["Show"] == "Stranger Things"
filter2 = df["Season"] == 3

df.where(filter & filter2, inplace=True)
print(df)

Output

              Show Streaming  Season Main Actor
0  Stranger Things   Netflix     3.0     Millie
1              NaN       NaN     NaN        NaN
2              NaN       NaN     NaN        NaN
3              NaN       NaN     NaN        NaN

Now, if one condition does not satisfy, then it will return NaN values.

# app.py

import pandas as pd

data = {'Show': ['Stranger Things', 'The X-Files', 'Mandalorian', 'The Boys'],
        'Streaming': ['Netflix', 'Fx', 'Disney Plus', 'Amazon Prime'],
        'Season': [3, 12, 1, 2],
        'Main Actor': ['Millie', 'Gillian', 'Padro', 'Karl Urban']}
df = pd.DataFrame.from_dict(data)

# filtering data
filter = df["Show"] == "Stranger Things"
filter2 = df["Season"] == 12

df.where(filter & filter2, inplace=True)
print(df)

In this example, the first condition satisfies Show = Stranger Things, but in the filter2, it does not satisfy the Stranger Things show has 12 seasons.

So, it won’t return Stranger Things row as well as The X-Files row because then the first condition will not satisfy. So, it will return the NaN values.

Output

   Show   Streaming  Season   Main Actor
0  NaN       NaN     NaN        NaN
1  NaN       NaN     NaN        NaN
2  NaN       NaN     NaN        NaN
3  NaN       NaN     NaN        NaN

You can see that due to one condition fail, and it will return a DataFrame with NaN values.

That is it for Pandas DataFrame, where() function example.

See Also

Pandas mean

Pandas Drop column

Pandas DataFrame drop_duplicates()

Leave A Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.