Pandas DataFrame apply: The Complete Guide


The function you pass to apply() receives Series objects. With axis=0 (the default), each Series is a column whose index is the DataFrame's index. With axis=1, each Series is a row whose index is the DataFrame's columns.
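A quick way to see which index the passed Series carries is to have the function report its own labels. This is a small sketch; the DataFrame, its r1/r2 index, and the labels helper are made up for illustration.

```python
import pandas as pd

# Hypothetical DataFrame with a labeled row index
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['r1', 'r2'])


def labels(s):
    # Join the index labels of the Series that apply() passes in
    return ','.join(map(str, s.index))


per_column = df.apply(labels)        # axis=0: each Series is a column
per_row = df.apply(labels, axis=1)   # axis=1: each Series is a row

print(per_column.to_dict())  # {'a': 'r1,r2', 'b': 'r1,r2'}
print(per_row.to_dict())     # {'r1': 'a,b', 'r2': 'a,b'}
```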

Pandas DataFrame apply

Pandas DataFrame apply() is a method that lets you pass a function and apply it along an axis of the DataFrame: to each column (the default) or to each row.

To apply a function to every row of a Pandas DataFrame, call df.apply() with axis=1.
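As a minimal sketch (the orders DataFrame and its price/qty columns are hypothetical), each row arrives as a Series indexed by the column names, so the function can look up fields by label:

```python
import pandas as pd

# Hypothetical order data
orders = pd.DataFrame({'price': [2.5, 4.0, 10.0], 'qty': [4, 3, 1]})

# axis=1: the lambda receives one row at a time, indexed by column names
orders['total'] = orders.apply(lambda row: row['price'] * row['qty'], axis=1)
print(orders)
```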

Syntax

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)

Parameters

The apply() method has the following parameters: 

  • func: The function to apply to each column or row.
  • axis: 0 (or 'index') or 1 (or 'columns'); the default is 0. With 0, the function is applied to each column; with 1, it is applied to each row.
  • raw: A boolean, False by default. It determines whether each row or column is passed to the function as a Series (False) or as a NumPy ndarray (True).
  • result_type: One of 'expand', 'reduce', 'broadcast', or None (the default). These options only take effect when axis=1, that is, when the function is applied to each row.
    • 'expand': list-like results are turned into columns.
    • 'reduce': a Series is returned if possible, rather than expanding list-like results.
    • 'broadcast': results are broadcast to the original shape of the DataFrame; the original index and columns are retained.
    • None: the behavior depends on the return value of the applied function. List-like results are returned as a Series of those lists, but if the function returns a Series, it is expanded into columns.
  • args: A tuple of positional arguments to pass to the function in addition to the Series or array.
  • **kwds: Additional keyword arguments to pass to the function.
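The args, **kwds, and raw parameters are easiest to see side by side. In this sketch, the scale and spread functions and the small DataFrame are hypothetical; scale takes an extra positional argument (supplied through args) and a keyword argument (forwarded from apply()), while spread records the type it receives when raw=True.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})


def scale(col, factor, offset=0):
    # factor comes from args=(...), offset from an extra keyword to apply()
    return col * factor + offset


scaled = df.apply(scale, args=(2,), offset=1)

seen_types = []


def spread(values):
    # With raw=True each column arrives as a NumPy ndarray, not a Series
    seen_types.append(type(values).__name__)
    return values.max() - values.min()


spreads = df.apply(spread, raw=True)

print(scaled)
print(spreads.to_dict())  # {'a': 2, 'b': 20}
print(set(seen_types))    # {'ndarray'}
```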

Return Value

The DataFrame apply() method returns a Series or a DataFrame containing the result of applying the function along the given axis of the DataFrame.
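Whether you get a Series or a DataFrame back depends on what the function returns and on result_type. A minimal sketch with a hypothetical two-column DataFrame, all applied row-wise (axis=1), since that is where result_type takes effect:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# A scalar per row -> Series
row_sums = df.apply(lambda row: row.sum(), axis=1)

# A list per row with result_type=None -> Series whose values are lists
as_lists = df.apply(lambda row: [row['a'], row['b'] * 2], axis=1)

# Same function with result_type='expand' -> DataFrame, one column per list element
expanded = df.apply(lambda row: [row['a'], row['b'] * 2], axis=1,
                    result_type='expand')

# result_type='broadcast' -> original shape, original index and column labels
broadcast = df.apply(lambda row: [1, 2], axis=1, result_type='broadcast')

# Returning a Series per row -> its index becomes the result's columns
as_frame = df.apply(lambda row: pd.Series({'total': row.sum()}), axis=1)

print(row_sums, as_lists, expanded, broadcast, as_frame, sep='\n\n')
```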

Example program on DataFrame.apply()

Write a program to show how DataFrame.apply() works.

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 4], [9, 16], [25, 36]], columns=['1st', '2nd'])
print(df, '\n')
df2 = df.apply(np.sqrt)
print(df2)

Output

   1st  2nd
0    1    4
1    9   16
2   25   36

   1st  2nd
0  1.0  2.0
1  3.0  4.0
2  5.0  6.0

In the above code, we created a DataFrame named df that holds the perfect squares 1, 4, 9, 16, 25, and 36.

We then passed the NumPy universal function np.sqrt() to apply(), which replaces each value with its square root (a user-defined function can be passed to apply() in the same way). Finally, we printed the resulting DataFrame, df2.
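As a sketch of the user-defined alternative, a hypothetical square_root function can stand in for np.sqrt. Each call receives one column as a Series, and the exponent is applied element-wise:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 4], [9, 16], [25, 36]], columns=['1st', '2nd'])


def square_root(col):
    # col is one column of df (a Series); ** 0.5 works element-wise
    return col ** 0.5


via_udf = df.apply(square_root)
via_numpy = df.apply(np.sqrt)

print(np.allclose(via_udf.to_numpy(), via_numpy.to_numpy()))  # True
```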

Example 2

In this example, we add a new column called add that holds the sum of each row's values.

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 4], [9, 16], [25, 36]], columns=['1st', '2nd'])
print(df, '\n')
df['add'] = df.apply(np.sum, axis=1)

print('\nAfter Applying Function: ')

# printing the new dataframe
print(df)

Output

   1st  2nd
0    1    4
1    9   16
2   25   36


After Applying Function:
   1st  2nd  add
0    1    4    5
1    9   16   25
2   25   36   61

The output shows that the new add column holds the sum of each row's values.
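One thing to watch: because apply() works on whatever columns the DataFrame has at call time, running the same row-wise sum again after the add column exists also includes that column. A minimal sketch of the effect, and of restricting the call to the original columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 4], [9, 16], [25, 36]], columns=['1st', '2nd'])
df['add'] = df.apply(np.sum, axis=1)

# Re-running the same call now also sums the new 'add' column
rerun_all = df.apply(np.sum, axis=1)

# Selecting the original columns first reproduces the first result
rerun_selected = df[['1st', '2nd']].apply(np.sum, axis=1)

print(rerun_all.tolist())       # [10, 50, 122]
print(rerun_selected.tolist())  # [5, 25, 61]
```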

Apply lambda function to each row or each column in Dataframe

A Python lambda, or anonymous function, is a function defined without a name. Standard functions are defined with the def keyword, while anonymous functions are defined with the lambda keyword.

Let's say we have a lambda function that accepts a Series as an argument and returns a new Series in which each value is multiplied by 11.

For example,

lambda a : a * 11
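To make the def/lambda distinction concrete, here is a small sketch (times_eleven and the two-column DataFrame are hypothetical) showing that a named function and the equivalent inline lambda give the same result when passed to apply():

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})


def times_eleven(a):
    # Named function: receives each column as a Series
    return a * 11


named = df.apply(times_eleven)
anonymous = df.apply(lambda a: a * 11)  # same logic, defined inline

print(named.equals(anonymous))  # True
```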

Okay, now let’s see how to apply the lambda function to each row or column of our DataFrame.

To apply the lambda a: a * 11 function to each column in the DataFrame, pass it as the only argument to DataFrame.apply() on the DataFrame object created in the code below.

See the following code.

import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before Lambda Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying lambda function
modDfObj = dfObj.apply(lambda a: a * 11)
print('After Lambda Function applied')
print(modDfObj)

Output

Before Lambda Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Lambda Function applied
     x     y     z
0  121   231   209
1  242   462   418
2  363   693   627
3  484   924   836
4  605  1155  1045

Apply a lambda function to each row

To apply the lambda function to each row of the DataFrame, pass it as the first argument to DataFrame.apply(), using the same DataFrame object as before.

We also have to pass axis=1, which tells apply() to call the function on each row.

import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before Lambda Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying lambda function
modDfObj = dfObj.apply(lambda a: a * 11, axis=1)
print('After Lambda Function applied')
print(modDfObj)

Output

Before Lambda Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Lambda Function applied
     x     y     z
0  121   231   209
1  242   462   418
2  363   693   627
3  484   924   836
4  605  1155  1045

So, DataFrame.apply() calls the lambda function for each row and passes that row's contents to it as a Series. Because multiplying by 11 is element-wise, the result is the same as in the column-wise example.

Finally, apply() returns a new DataFrame built from the rows returned by the lambda function; the original DataFrame is not modified.
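The axis starts to matter once the function aggregates its input instead of working element-wise. A minimal sketch using the same matrix and a lambda that returns the maximum:

```python
import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]
dfObj = pd.DataFrame(matrix, columns=list('xyz'))

col_max = dfObj.apply(lambda a: a.max())          # one value per column
row_max = dfObj.apply(lambda a: a.max(), axis=1)  # one value per row

print(col_max.to_dict())  # {'x': 55, 'y': 105, 'z': 95}
print(row_max.tolist())   # [21, 42, 63, 84, 105]
```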

Apply a User Defined function

Instead of a lambda function, we can pass a user-defined function to the apply() method; the output then depends on the logic of that function.

import pandas as pd


def sicmundus(x):
    return x + 33


matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before User defined Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying user defined function
modDfObj = dfObj.apply(sicmundus)
print('After User defined Function applied')
print(modDfObj)

Output

Before User defined Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After User defined Function applied
    x    y    z
0  44   54   52
1  55   75   71
2  66   96   90
3  77  117  109
4  88  138  128

In this example, we add 33 to all the DataFrame values using a User-defined function.
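User-defined functions are also handy row-wise, where they can combine fields and accept extra keyword arguments. In this sketch, size_label and its threshold rule are hypothetical; the threshold keyword is forwarded by apply() to the function.

```python
import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]
dfObj = pd.DataFrame(matrix, columns=list('xyz'))


def size_label(row, threshold=30):
    # Hypothetical rule: compare the row's 'x' value with a threshold
    return 'big' if row['x'] > threshold else 'small'


labels_default = dfObj.apply(size_label, axis=1)
labels_40 = dfObj.apply(size_label, axis=1, threshold=40)

print(labels_default.tolist())  # ['small', 'small', 'big', 'big', 'big']
print(labels_40.tolist())       # ['small', 'small', 'small', 'big', 'big']
```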

Conclusion

In this article, we discussed how to apply a lambda function, a user-defined function, or a NumPy function to each row or column of a DataFrame.

That is it for the Pandas DataFrame apply() function.

