AppDividend
Latest Code Tutorials

Python Pandas: How To Apply Formula To Entire Column and Row

DataFrame.apply() is a method of the pandas DataFrame class that applies a function along an axis of the DataFrame, that is, to each column or to each row. A pandas DataFrame is a two-dimensional data structure in which the data is arranged in a table of rows and columns. In this tutorial, we will see how to apply a formula to an entire column and row in Pandas, with examples.

How To Apply Formula To Entire Column and Row

The DataFrame.apply() function applies a function along an axis of a DataFrame. The objects passed to that function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1).
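To make the axis behaviour concrete, here is a minimal sketch that reports what each call receives. The helper name describe and the two-row dataframe are assumptions made only for this illustration.

```python
import pandas as pd

df = pd.DataFrame([(11, 21, 19), (22, 42, 38)], columns=list("xyz"))

def describe(s):
    # s is a Series; s.name is the label of the column or row being processed
    return f"{s.name}: index={list(s.index)}"

by_column = df.apply(describe)        # axis=0: one call per column
by_row = df.apply(describe, axis=1)   # axis=1: one call per row

print(by_column["x"])   # x: index=[0, 1]
print(by_row.loc[0])    # 0: index=['x', 'y', 'z']
```

With axis=0 each Series is indexed by the row labels, and with axis=1 each Series is indexed by the column labels.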

By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
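As a rough illustration of how result_type changes the outcome, the sketch below returns a list from each row. The helper min_max is a hypothetical name introduced for this example.

```python
import pandas as pd

df = pd.DataFrame([(11, 21, 19), (22, 42, 38)], columns=list("xyz"))

def min_max(row):
    # Return a plain list with two values for each row
    return [row.min(), row.max()]

# Default: the lists are kept as-is, so the result is a Series of lists
as_lists = df.apply(min_max, axis=1)

# result_type="expand": each list is spread across columns of a DataFrame
expanded = df.apply(min_max, axis=1, result_type="expand")

print(type(as_lists).__name__)   # Series
print(type(expanded).__name__)   # DataFrame
```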

Syntax

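DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)

The broadcast and reduce parameters that appear in older versions of this signature were deprecated in pandas 0.23 and removed in pandas 1.0; use result_type instead.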

Parameters

  • func: The function to apply to each column or row. It receives a Series and can return a Series or a single value.
  • axis: The axis along which the function is applied. The default value is 0.
    • If the value is 0, the function is applied to each column.
    • If the value is 1, the function is applied to each row.
  • args: A tuple of extra positional arguments to pass to func after the Series.
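The args parameter, together with extra keyword arguments, can be sketched as follows. The function scale and its factor and offset parameters are hypothetical names chosen for this example, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame([(11, 21, 19), (22, 42, 38)], columns=list("xyz"))

def scale(col, factor, offset=0):
    # col is the Series for one column; factor and offset come from apply()
    return col * factor + offset

# args supplies extra positional arguments; extra keywords are forwarded too
result = df.apply(scale, args=(10,), offset=1)

print(result["x"].tolist())   # [111, 221]
```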

Example of apply function to Pandas Dataframe

For this example, you need the pandas library installed in your Python environment (for example, with pip install pandas); otherwise, the import of the pandas package will fail in your program.

Let’s define a dataframe with 3 columns and 5 rows.

See the following code.

import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print(dfObj)

Output

(pythonenv) ➜  pyt python3 app.py
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
(pythonenv) ➜  pyt

So, we have defined a 5 × 3 dataframe (5 rows and 3 columns).

Apply lambda function to each row or each column in Dataframe

A Python lambda, or anonymous function, is a function that is defined without a name. While standard functions are defined using the def keyword, anonymous functions are defined using the lambda keyword.

Let’s say we have a lambda function that accepts a series as its argument and returns a new series in which each value of the given series is multiplied by 11. For example:

lambda a : a * 11
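Before using the lambda with a dataframe, it may help to see it act on a single series. This small sketch also shows an equivalent function written with def; the names times_eleven and times_eleven_def are made up for the illustration.

```python
import pandas as pd

s = pd.Series([11, 22, 33])

# The lambda from the text, bound to a name so it can be called directly
times_eleven = lambda a: a * 11

# An equivalent function written with def
def times_eleven_def(a):
    return a * 11

print(times_eleven(s).tolist())       # [121, 242, 363]
print(times_eleven_def(s).tolist())   # [121, 242, 363]
```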

Okay, now let’s see how to apply the above lambda function to each row or column of our dataframe.

Python Pandas: Apply a lambda function to each column

To apply the lambda a: a * 11 function to each column in the dataframe, pass the lambda function as the only argument to Dataframe.apply() on the dataframe object created above.

See the following code.

import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before Lambda Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying lambda function
modDfObj = dfObj.apply(lambda a: a * 11)
print('After Lambda Function applied')
print(modDfObj)

Output

(pythonenv) ➜  pyt python3 app.py
Before Lambda Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Lambda Function applied
     x     y     z
0  121   231   209
1  242   462   418
2  363   693   627
3  484   924   836
4  605  1155  1045

Since the dataframe has 3 columns (x, y, z), our lambda function is called three times, and each call receives one column as its argument.

Our lambda function returns a new series in which each element of the given column is multiplied by 11. This returned series replaces the column in the copy of the dataframe.

So, Dataframe.apply() calls the passed lambda function for each column and passes the column contents as a series to this lambda function.

Finally, it returns a modified copy of the dataframe constructed with columns returned by lambda functions, instead of altering the original dataframe.
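The following sketch checks that the original dataframe is left untouched. It also compares the result with the plain arithmetic expression df * 11, which for a simple formula like this one should give the same values without going through apply().

```python
import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]
df = pd.DataFrame(matrix, columns=list("xyz"))

result = df.apply(lambda a: a * 11)

# The original dataframe still holds the old values
print(df.loc[0, "x"])          # 11
print(result.loc[0, "x"])      # 121

# Multiplying the whole dataframe directly gives the same values
print(result.equals(df * 11))  # True
```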

Python Pandas: Apply a lambda function to each row

To apply a lambda function to each row in the dataframe, pass the lambda function as the first argument and axis=1 as the second argument to Dataframe.apply() on the dataframe object created above.

Let’s change our lambda function to lambda a: a + 2 and see the output.

import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before Lambda Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying lambda function
modDfObj = dfObj.apply(lambda a: a + 2, axis=1)
print('After Lambda Function applied')
print(modDfObj)

Output

(pythonenv) ➜  pyt python3 app.py
Before Lambda Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Lambda Function applied
    x    y   z
0  13   23  21
1  24   44  40
2  35   65  59
3  46   86  78
4  57  107  97

In the above example, Dataframe.apply() calls the passed lambda function for each row and passes the contents of each row as a series to this lambda function.

Finally, it returns a modified copy of the dataframe constructed with rows returned by lambda functions, instead of altering the original dataframe.
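Row-wise application is also a common way to compute a formula that combines several columns. The sketch below stores the result in a new column; the column name total and the formula x + y - z are assumptions chosen only for illustration.

```python
import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]
df = pd.DataFrame(matrix, columns=list("xyz"))

# Each call receives one row as a Series indexed by the column names
df["total"] = df.apply(lambda row: row["x"] + row["y"] - row["z"], axis=1)

print(df["total"].tolist())   # [13, 26, 39, 52, 65]
```

For a formula this simple, the column expression df["x"] + df["y"] - df["z"] produces the same values and is usually faster on large dataframes.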

Python Pandas: Apply a NumPy function to each row or column

In real-world Python applications, we often apply existing NumPy functions to the columns and rows of a dataframe.

Let’s apply the numpy.square() function to each column of the dataframe.

See the following code.

import pandas as pd
import numpy as np

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before Numpy square() Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying numpy square function
modDfObj = dfObj.apply(np.square)
print('After Numpy square() Function applied')
print(modDfObj)

Output

(pythonenv) ➜  pyt python3 app.py
Before Numpy square() Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Numpy square() Function applied
      x      y     z
0   121    441   361
1   484   1764  1444
2  1089   3969  3249
3  1936   7056  5776
4  3025  11025  9025

We can also apply a NumPy function to each row instead of each column by passing axis=1 as an extra argument. The following code applies numpy.sqrt() to each row.

import pandas as pd
import numpy as np

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before Numpy sqrt() Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying numpy sqrt function
modDfObj = dfObj.apply(np.sqrt, axis=1)
print('After Numpy sqrt() Function applied')
print(modDfObj)

Output

(pythonenv) ➜  pyt python3 app.py
Before Numpy sqrt() Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Numpy sqrt() Function applied
          x          y         z
0  3.316625   4.582576  4.358899
1  4.690416   6.480741  6.164414
2  5.744563   7.937254  7.549834
3  6.633250   9.165151  8.717798
4  7.416198  10.246951  9.746794
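Because functions such as numpy.sqrt() work element by element, they can also be called on the whole dataframe directly. This sketch compares both routes on a smaller dataframe; it is an illustration, not a claim about which one the article prefers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(11, 21, 19), (22, 42, 38)], columns=list("xyz"))

via_apply = df.apply(np.sqrt, axis=1)   # one call per row
direct = np.sqrt(df)                    # one call on the whole dataframe

# Both routes return a DataFrame with the same labels and values
print(type(direct).__name__)                                   # DataFrame
print(np.allclose(via_apply.to_numpy(), direct.to_numpy()))    # True
```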

Python Pandas: Apply a reducing function to each row or column

So far, we have applied functions that accept each column or row as a series and return a series of the same size.

But we can also apply a function that accepts a series and returns a single value instead of a series.

For example, let’s apply numpy.sum() to each column in the dataframe to find the sum of the values in each column.

See the following code.

import pandas as pd
import numpy as np

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before Numpy sum() Reduce Function applied')
print(dfObj)
print('------------------')

# reduce each column to a single value: its sum
modDfObj = dfObj.apply(np.sum)
print('After Numpy sum() Reduce Function applied')
print(modDfObj)

Output

(pythonenv) ➜  pyt python3 app.py
Before Numpy sum() Reduce Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Numpy sum() Reduce Function applied
x    165
y    315
z    285
dtype: int64

Now let’s apply numpy.sum() to each row in the dataframe to find the sum of the values in each row.

import pandas as pd
import numpy as np

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before Numpy sum() Reduce Function applied')
print(dfObj)
print('------------------')

# reduce each row to a single value: its sum
modDfObj = dfObj.apply(np.sum, axis=1)
print('After Numpy sum() Reduce Function applied')
print(modDfObj)

Output

(pythonenv) ➜  pyt python3 app.py
Before Numpy sum() Reduce Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Numpy sum() Reduce Function applied
0     51
1    102
2    153
3    204
4    255
dtype: int64
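Any function that turns a series into a single value can be used in the same way. As a sketch, the hypothetical helper value_range below returns the difference between the largest and smallest value.

```python
import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]
df = pd.DataFrame(matrix, columns=list("xyz"))

def value_range(s):
    # Reduce a Series to one number: largest value minus smallest value
    return s.max() - s.min()

per_column = df.apply(value_range)        # Series indexed by x, y, z
per_row = df.apply(value_range, axis=1)   # Series indexed by 0..4

print(per_column.tolist())   # [44, 84, 76]
print(per_row.tolist())      # [10, 20, 30, 40, 50]
```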

Python Pandas: Apply a User Defined function

Let’s define a UDF (user-defined function).

def subtractData(x):
    return x - 2

Whatever value you pass in, whether a single number or a whole series, the function subtracts 2 from it and returns the result.

Now, let’s use this function in our DataFrame example.

import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]

# user-defined function
def subtractData(x):
    return x - 2


# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print('Before subtractData() Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying subtractData() function
modDfObj = dfObj.apply(subtractData)
print('After subtractData() Function applied')
print(modDfObj)

Output

(pythonenv) ➜  pyt python3 app.py
Before subtractData() Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After subtractData() Function applied
    x    y   z
0   9   19  17
1  20   40  36
2  31   61  55
3  42   82  74
4  53  103  93

So, it subtracts 2 from every element of the dataframe and returns a modified copy.
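To change only one column, a similar function can be applied to that column alone. Note that this sketch uses Series.apply(), a different method from DataFrame.apply(): it calls the function once per element. The name subtract_data mirrors the function above and is otherwise an assumption of this example.

```python
import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57), (44, 84, 76),
          (55, 105, 95)]
df = pd.DataFrame(matrix, columns=list("xyz"))

def subtract_data(value):
    # Called once for every element of the selected column
    return value - 2

# Overwrite column x with the transformed values; y and z stay as they were
df["x"] = df["x"].apply(subtract_data)

print(df["x"].tolist())   # [9, 20, 31, 42, 53]
print(df["y"].tolist())   # [21, 42, 63, 84, 105]
```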

That concludes this example of how to apply a formula to an entire column, a row, or a whole dataframe in Pandas.

See also

How to remove rows in Pandas DataFrame

How to add rows in Pandas DataFrame

Pandas set_index()

Pandas value_counts()

Pandas boolean_indexing()
