The Pandas DataFrame.apply() method applies a function along an axis of the DataFrame: with axis=0 (the default) the function receives each column, and with axis=1 it receives each row.
By default, each column or row is passed to the function as a Series, so the function can work with whatever data types the DataFrame holds.
Syntax
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
Parameters
Name | Description |
func | The function to apply to each column/row. |
axis | Axis along which the function is applied (0 for columns, 1 for rows). |
raw | If False (the default), each row or column is passed to func as a Series; if True, it is passed as a NumPy ndarray. |
result_type | Controls the output shape when axis=1: 'expand', 'reduce', 'broadcast', or None (the default). |
args | Positional arguments to pass to func. |
**kwds | Additional keyword arguments to pass to func. |
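As a rough illustration of how args and **kwds reach func, here is a minimal sketch. The helper scale_and_shift and its parameters factor and offset are hypothetical names made up for this example; they are not part of Pandas.

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

# Hypothetical helper: multiplies a column by `factor`, then adds `offset`.
def scale_and_shift(column, factor, offset=0):
    return column * factor + offset

# `args` supplies extra positional arguments after the column itself;
# any extra keyword arguments (here `offset`) are forwarded to the function too.
result = df.apply(scale_and_shift, args=(2,), offset=1)
print(result)
```

Each column is passed as the first argument, so the call above behaves like scale_and_shift(column, 2, offset=1) for columns A and B in turn.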
Return value
It returns a Series or DataFrame, depending on the function applied and the result_type.
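The sketch below shows, under the default result_type, how the shape of what func returns decides the return type. The variable names reduced and transformed are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

# A reducing function returns one scalar per column, so the result is a Series
# indexed by the column labels.
reduced = df.apply(lambda col: col.max())

# A function that returns a Series of the same length as its input
# produces a DataFrame with the original shape.
transformed = df.apply(lambda col: col - col.min())

print(type(reduced).__name__)      # Series
print(type(transformed).__name__)  # DataFrame
```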
Example 1: Applying a function to each column
import pandas as pd
df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])
result = df.apply(lambda x: x.max() - x.min())
print(result)
Output
A 3
B 5
dtype: int64
The apply() method applies the lambda function lambda x: x.max() - x.min() to each column, which computes the range (the difference between the maximum and minimum values) of that column.
- For column 'A', the range is 4 - 1 = 3.
- For column 'B', the range is 9 - 4 = 5.
Example 2: Applying a function to each row
import pandas as pd
df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])
result = df.apply(lambda x: x['A'] + x['B'], axis=1)
print(result)
Output
0 13
1 5
dtype: int64
Passing axis=1 applies the function to each row, so the result is the sum of columns A and B for every row.
Example 3: Using another library function
Let's use the NumPy library's sum() function.
import pandas as pd
import numpy as np
df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])
result = df.apply(np.sum)
print(result)
Output
A 5
B 13
dtype: int64
We passed np.sum to apply() to calculate the sum of each column.
Example 4: Applying a user-defined function
import pandas as pd
df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])
# Return the range (max - min) of a single column
def my_func(column):
    return column.max() - column.min()
result = df.apply(my_func)
print(result)
Output
A 3
B 5
dtype: int64
Example 5: Specifying result_type
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2]})
result = df.apply(lambda x: [1, 2], axis=1, result_type='expand')
print(result)
Output
   0  1
0  1  2
1  1  2
Here the lambda function returns the list [1, 2] for every row. With result_type='expand', each list element becomes its own column, so result is a new DataFrame with columns 0 and 1 holding the values 1 and 2 in each row.
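To see what result_type changes, the following sketch runs the same list-returning lambda three ways on a small numeric DataFrame. The variable names as_lists, expanded, and broadcast are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [1, 4]], columns=['A', 'B'])

# Default (result_type=None): each returned list is kept as one object,
# so the result is a Series of lists.
as_lists = df.apply(lambda row: [1, 2], axis=1)

# 'expand': each list element becomes its own column, labeled 0 and 1.
expanded = df.apply(lambda row: [1, 2], axis=1, result_type='expand')

# 'broadcast': the values are fitted back into the original shape,
# keeping the original index and column labels.
broadcast = df.apply(lambda row: [1, 2], axis=1, result_type='broadcast')

print(as_lists)
print(expanded)
print(broadcast)
```

The third option is useful when the function's output should line up with the original columns rather than create new ones.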
Important points
- Element-wise function application: For element-wise operations, map() on a Series and applymap() on a DataFrame are often more suitable. Since pandas 2.1, applymap() is deprecated in favor of DataFrame.map().
- Efficiency considerations: Built-in vectorized operations (for example, df['A'] + df['B'] or df.sum()) are usually faster than apply(), which calls a Python function once per row or column.
- Complex operations: This method is helpful for more complex operations that cannot be vectorized.
- Raw parameter: Setting raw=True can sometimes improve performance by passing ndarrays instead of Series to the function.
- Handling NaN values: apply() does not treat NaN values specially; they are passed to the function as-is, so you can apply functions that detect, skip, or fill them during data cleaning.
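The points above can be sketched in one short example. The data and the variable names are made up for illustration; this is a sketch under those assumptions, not a benchmark.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in column A
df = pd.DataFrame({'A': [4.0, 1.0, np.nan], 'B': [9.0, 4.0, 2.0]})

# Efficiency: the row-wise apply() and the vectorized expression give the
# same result, but the vectorized form avoids a Python call per row.
row_sum_apply = df.apply(lambda row: row['A'] + row['B'], axis=1)
row_sum_vectorized = df['A'] + df['B']

# raw=True: each column arrives as a NumPy ndarray instead of a Series,
# so NumPy functions such as np.nanmax and np.nanmin can be used directly.
col_range = df.apply(lambda arr: np.nanmax(arr) - np.nanmin(arr), raw=True)

# NaN handling: fill each column's missing values with that column's mean.
filled = df.apply(lambda col: col.fillna(col.mean()))

print(row_sum_apply.equals(row_sum_vectorized))
print(col_range)
print(filled)
```

Note that the NaN in column A propagates into the row sums in both versions, while np.nanmax and np.nanmin skip it when computing the range.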
That’s all!
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.