Pandas DataFrame apply() Method

The apply() method in Pandas applies a function along an axis of a DataFrame. With axis=0 (the default), the function receives each column as a Series; with axis=1, it receives each row.

Because the function receives whole columns or rows, apply() works on DataFrames with mixed data types, as long as the function itself can handle the values it receives.

Syntax

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)

Parameters

Name         Description
func         The function to apply to each column or row.
axis         Axis along which the function is applied (0 or 'index' for columns, 1 or 'columns' for rows). The default is 0.
raw          Whether each row or column is passed as a Series (False, the default) or as a NumPy ndarray (True).
result_type  How list-like results are returned: 'expand', 'reduce', or 'broadcast'. It only takes effect when axis=1.
args         Positional arguments to pass to func, after the row or column itself.
**kwds       Additional keyword arguments to pass to func.
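The args and **kwds parameters can be easier to follow with a concrete call. The following is a minimal sketch; the helper scale() and its parameters factor and offset are hypothetical names made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

# Hypothetical helper: multiply a column by a factor, then add an offset.
def scale(column, factor, offset=0):
    return column * factor + offset

# factor is supplied positionally through args;
# offset is forwarded to scale() as a keyword argument.
result = df.apply(scale, args=(10,), offset=1)

print(result)
```

Each column is passed to scale() as its first argument, so args only needs to hold the arguments that come after it.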

Return value

It returns a Series or DataFrame, depending on the function applied and the result_type.
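As a rough illustration of how the return type follows from the function, the sketch below applies one function that reduces each column to a single value and one that returns a column of the same length. The variable names are chosen for this example only.

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

# The function returns one scalar per column, so the result is a Series.
reduced = df.apply(lambda col: col.max())

# The function returns a Series of the same length per column,
# so the result is a DataFrame with the same shape as df.
transformed = df.apply(lambda col: col - col.min())

print(type(reduced).__name__)      # Series
print(type(transformed).__name__)  # DataFrame
```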

Example 1: Applying a function to each column


import pandas as pd

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

result = df.apply(lambda x: x.max() - x.min())

print(result)

Output

A    3
B    5
dtype: int64

The apply() method applies the lambda function lambda x: x.max() - x.min(), which calculates the range (the difference between the maximum and minimum values) of each column.

  • For column 'A', the range is 4 - 1 = 3.
  • For column 'B', the range is 9 - 4 = 5.

Example 2: Applying a function to each row


import pandas as pd

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

result = df.apply(lambda x: x['A'] + x['B'], axis=1)

print(result)

Output

0    13
1     5
dtype: int64

Passing axis=1 applies the function to each row, so each result is the sum of that row's 'A' and 'B' values.

Example 3: Using another library function

Let's use the NumPy library's sum() function.

import pandas as pd
import numpy as np

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

result = df.apply(np.sum)

print(result)

Output

A     5
B    13
dtype: int64

We passed np.sum to apply() to calculate the sum of each column.

Example 4: Applying a user-defined function

import pandas as pd

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

def my_func(column):
    # Range of the column: maximum value minus minimum value.
    return column.max() - column.min()

result = df.apply(my_func)

print(result)

Output

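A    3
B    5
dtype: int64

The user-defined my_func() computes the same range as the lambda function in Example 1, so the output is identical.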

Example 5: Specifying result_type


import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2]})

result = df.apply(lambda x: [1, 2], axis=1, result_type='expand')

print(result)

Output

   0  1
0  1  2
1  1  2

In this code snippet, the lambda function returns the list [1, 2] for every row. With result_type='expand', each list is expanded into columns, so result is a new DataFrame with two columns, 0 and 1.
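To see what result_type changes, it can help to compare the same call under different settings. This is a small sketch, assuming an all-integer DataFrame so the comparison stays simple; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

# Default (result_type=None): each list is kept whole,
# so the result is a Series whose values are lists.
as_lists = df.apply(lambda x: [1, 2], axis=1)

# 'broadcast': the list is spread over the original shape,
# keeping the original index and column labels.
broadcast = df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')

print(as_lists)
print(broadcast)
```

Without result_type, the lists stay packed inside a single Series; 'expand' turns them into new numbered columns, while 'broadcast' keeps the columns 'A' and 'B'.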

Important points

  1. Element-wise function application: For element-wise operations, map() on a Series or DataFrame is often more suitable. DataFrame.map() replaces applymap(), which is deprecated as of pandas 2.1.
  2. Efficiency considerations: For certain operations, vectorized operations using DataFrame methods are more efficient than using apply().
  3. Complex operations: This method is helpful for more complex operations that cannot be vectorized.
  4. Raw parameter: Setting raw=True can sometimes improve performance by passing ndarrays instead of Series to the function.
  5. Handling NaN values: The function you pass receives NaN values as they are, so it can handle them explicitly during data cleaning, for example by filling or skipping them.
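Points 2, 4, and 5 above can be sketched in a few lines. The data and variable names below are made up for illustration, and no timing is shown, since any performance gain depends on the data and the function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [4.0, np.nan, 1.0], 'B': [9.0, 4.0, np.nan]})

# Point 2: a row-wise sum with apply() and its vectorized equivalent.
via_apply = df.apply(lambda row: row['A'] + row['B'], axis=1)
vectorized = df['A'] + df['B']

# Point 4: raw=True passes each column to the function as an ndarray.
# np.nanmax ignores NaN values when taking the maximum.
col_max = df.apply(np.nanmax, raw=True)

# Point 5: a NaN-aware function that fills each column's
# missing values with that column's mean.
filled = df.apply(lambda col: col.fillna(col.mean()))

print(via_apply.equals(vectorized))  # True
print(col_max)
print(filled)
```

The vectorized form gives the same result as the apply() call without invoking a Python function for each row.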

That’s all!
