Python Pandas: How To Apply Formula To Entire Column and Row
DataFrame.apply() is a method of the pandas DataFrame class that applies a function along an axis of the DataFrame, that is, to each column or to each row. A pandas DataFrame is a two-dimensional data structure in which the data is arranged in a table of rows and columns. In this tutorial, we will see how to apply a formula to an entire column or row in pandas, with examples.
How To Apply Formula To Entire Column and Row
The DataFrame.apply() function applies a function along an axis of a DataFrame. The objects passed to that function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1).
By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
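As a rough illustration of how result_type changes the return type, here is a minimal sketch. The small dataframe and the variable names as_lists and as_frame are made up for this example; only apply(), axis, and result_type come from pandas itself.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({"x": [11, 22], "y": [21, 42], "z": [19, 38]})

# The applied function returns a two-element list for every row.
# With the default result_type=None, the lists are kept as values of a Series.
as_lists = df.apply(lambda row: [row["x"], row["y"] * 2], axis=1)

# With result_type="expand", each list element becomes its own column,
# so the result is a DataFrame with columns 0 and 1.
as_frame = df.apply(lambda row: [row["x"], row["y"] * 2], axis=1, result_type="expand")

print(type(as_lists).__name__)
print(type(as_frame).__name__)
print(as_frame)
```

The same function therefore gives either a Series of lists or a DataFrame, depending only on the result_type argument.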
Syntax
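DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
Older pandas releases also accepted the broadcast and reduce arguments. Both were deprecated in favor of result_type and removed in pandas 1.0.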
Parameters
- func: The function to apply to each row or column. It receives a series and can return either a series or a single value.
- axis: The axis along which the function is applied in the dataframe. The default value is 0.
  - If the value is 0, the function is applied to each column.
  - If the value is 1, the function is applied to each row.
- args: A tuple of extra positional arguments to pass to the function in addition to the series.
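To make the args parameter concrete, here is a minimal sketch. The function scale_and_shift and its parameters factor and offset are hypothetical names chosen for this example; extra keyword arguments given to apply() are forwarded to the function as well.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})


def scale_and_shift(series, factor, offset=0):
    """Multiply every value in the series by factor, then add offset."""
    return series * factor + offset


# args supplies the extra positional argument (factor=2);
# offset=1 is forwarded to the function as a keyword argument.
result = df.apply(scale_and_shift, args=(2,), offset=1)
print(result)
```

Each column is passed as the first argument, so the function only needs to describe the extra parameters it expects.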
Example of apply function to Pandas Dataframe
For this example, you need the pandas library installed in your Python environment; otherwise, the pandas package cannot be imported in your program.
Let’s define a dataframe with 3 columns and 5 rows.
See the following code.
import pandas as pd

matrix = [(11, 21, 19),
          (22, 42, 38),
          (33, 63, 57),
          (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))
print(dfObj)
Output
(pythonenv) ➜ pyt python3 app.py
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
So, we have defined a 5 × 3 matrix (5 rows and 3 columns).
Apply lambda function to each row or each column in Dataframe
A Python lambda, or anonymous function, is a function that is defined without a name. While standard functions are defined using the def keyword, anonymous functions are defined using the lambda keyword.
Let’s say we have a lambda function that accepts a series as its argument and returns a new series object in which each value of the given series is multiplied by 11. For example,
lambda a : a * 11
Okay, now let’s see how to apply the above lambda function to each row or column of our dataframe.
Python Pandas: Apply a lambda function to each column
To apply the lambda a: a * 11 function to each column in the dataframe, pass the lambda function as the only argument to Dataframe.apply() on the dataframe object created above.
See the following code.
import pandas as pd

matrix = [(11, 21, 19),
          (22, 42, 38),
          (33, 63, 57),
          (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))

print('Before Lambda Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying lambda function
modDfObj = dfObj.apply(lambda a: a * 11)

print('After Lambda Function applied')
print(modDfObj)
Output
(pythonenv) ➜ pyt python3 app.py
Before Lambda Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Lambda Function applied
     x     y     z
0  121   231   209
1  242   462   418
2  363   693   627
3  484   924   836
4  605  1155  1045
As there are 3 columns in the dataframe (x, y, z), our lambda function is called once for each of them, and on each call the column is passed as the argument to the lambda function.
Our lambda function returns a new series in which the value of each element in the given column is multiplied by 11. This returned series replaces the column in the copy of the dataframe.
So, Dataframe.apply() calls the passed lambda function for each column and passes the column contents as a series to this lambda function.
Finally, it returns a modified copy of the dataframe constructed from the columns returned by the lambda function, instead of altering the original dataframe.
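The behaviour described above can be checked with a small sketch. The helper times_eleven and the calls list are hypothetical names introduced only to record which columns the function receives; the dataframe is a shortened version of the one used in this tutorial.

```python
import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57)]
dfObj = pd.DataFrame(matrix, columns=list("xyz"))

calls = []  # names of the columns the function was called with


def times_eleven(column):
    # Each column arrives as a Series; its name is the column label
    calls.append(column.name)
    return column * 11


modDfObj = dfObj.apply(times_eleven)

print(calls)     # column labels seen by the function
print(dfObj)     # the original dataframe is unchanged
print(modDfObj)  # the returned copy holds the multiplied values
```

Some older pandas versions may call the function one extra time on the first column, so the sketch records the names rather than counting calls. For a simple multiplication like this, the vectorized expression dfObj * 11 produces the same values.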
Python Pandas: Apply a lambda function to each row
To apply the lambda function to each row in the dataframe, pass the lambda function as the first argument and axis=1 as the second argument to Dataframe.apply() on the dataframe object created above.
Let’s change our lambda function to lambda a: a + 2 and see the output.
import pandas as pd

matrix = [(11, 21, 19),
          (22, 42, 38),
          (33, 63, 57),
          (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))

print('Before Lambda Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying lambda function
modDfObj = dfObj.apply(lambda a: a + 2, axis=1)

print('After Lambda Function applied')
print(modDfObj)
Output
(pythonenv) ➜ pyt python3 app.py
Before Lambda Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Lambda Function applied
    x    y   z
0  13   23  21
1  24   44  40
2  35   65  59
3  46   86  78
4  57  107  97
In the above example, Dataframe.apply() calls the passed lambda function for each row and passes the contents of each row as a series to this lambda function.
Finally, it returns a modified copy of the dataframe constructed from the rows returned by the lambda function, instead of altering the original dataframe.
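Because each row is passed as a series indexed by the column labels, a row-wise function can combine several columns into one formula. The following is a minimal sketch; the formula x + y - z and the new column name combined are made up for illustration.

```python
import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57)]
dfObj = pd.DataFrame(matrix, columns=list("xyz"))

# With axis=1, row["x"], row["y"] and row["z"] are the values of one row.
# The function returns a single value per row, so apply() returns a Series,
# which is stored here as a new column.
dfObj["combined"] = dfObj.apply(lambda row: row["x"] + row["y"] - row["z"], axis=1)

print(dfObj)
```

For plain arithmetic like this, the vectorized form dfObj["x"] + dfObj["y"] - dfObj["z"] gives the same values and is usually faster; apply() with axis=1 is more useful when the per-row logic cannot be expressed that way.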
Python pandas: Apply a NumPy function to each row or column
In real-world Python applications, we often apply existing NumPy functions to the columns and rows of a dataframe.
Let’s apply the numpy.square() function to each column of the dataframe.
See the following code.
import pandas as pd
import numpy as np

matrix = [(11, 21, 19),
          (22, 42, 38),
          (33, 63, 57),
          (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))

print('Before Numpy square() Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying numpy square function
modDfObj = dfObj.apply(np.square)

print('After Numpy square() Function applied')
print(modDfObj)
Output
(pythonenv) ➜ pyt python3 app.py
Before Numpy square() Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Numpy square() Function applied
      x      y     z
0   121    441   361
1   484   1764  1444
2  1089   3969  3249
3  1936   7056  5776
4  3025  11025  9025
We can also apply a NumPy function to each row instead of each column by passing axis=1 as an extra argument. The following code applies numpy.sqrt() to each row.
import pandas as pd
import numpy as np

matrix = [(11, 21, 19),
          (22, 42, 38),
          (33, 63, 57),
          (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))

print('Before Numpy sqrt() Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying numpy sqrt function
modDfObj = dfObj.apply(np.sqrt, axis=1)

print('After Numpy sqrt() Function applied')
print(modDfObj)
Output
(pythonenv) ➜ pyt python3 app.py
Before Numpy sqrt() Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Numpy sqrt() Function applied
          x          y         z
0  3.316625   4.582576  4.358899
1  4.690416   6.480741  6.164414
2  5.744563   7.937254  7.549834
3  6.633250   9.165151  8.717798
4  7.416198  10.246951  9.746794
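Since numpy.square() and numpy.sqrt() work element by element, the axis does not change the values they produce. As a side note, such element-wise NumPy functions can also be called on the dataframe directly. The sketch below compares both forms on a shortened version of the tutorial dataframe; the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57)]
dfObj = pd.DataFrame(matrix, columns=list("xyz"))

# Apply the NumPy function column by column through apply() ...
squared_apply = dfObj.apply(np.square)

# ... or pass the whole dataframe to the NumPy function
squared_direct = np.square(dfObj)

# Compare the two results value by value
print(np.array_equal(squared_apply.to_numpy(), squared_direct.to_numpy()))
```

Both calls return a new dataframe with the same labels, so which one to use here is mostly a matter of style.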
Python pandas: Apply a reduce function to each row or column
Up to now, we have applied functions that accept each column or row as a series and return a series of the same size.
But we can also apply a function that accepts a series and returns a single value instead of a series.
For example, let’s apply numpy.sum() to each column in the dataframe to find the sum of the values in each column.
See the following code.
import pandas as pd
import numpy as np

matrix = [(11, 21, 19),
          (22, 42, 38),
          (33, 63, 57),
          (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))

print('Before Numpy sum() Reduce Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying sum reduce function
modDfObj = dfObj.apply(np.sum)

print('After Numpy sum() Reduce Function applied')
print(modDfObj)
Output
(pythonenv) ➜ pyt python3 app.py
Before Numpy sum() Reduce Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Numpy sum() Reduce Function applied
x    165
y    315
z    285
dtype: int64
Now let’s apply numpy.sum() to each row in the dataframe to find the sum of the values in each row.
import pandas as pd
import numpy as np

matrix = [(11, 21, 19),
          (22, 42, 38),
          (33, 63, 57),
          (44, 84, 76),
          (55, 105, 95)]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))

print('Before Numpy sum() Reduce Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying sum reduce function
modDfObj = dfObj.apply(np.sum, axis=1)

print('After Numpy sum() Reduce Function applied')
print(modDfObj)
Output
(pythonenv) ➜ pyt python3 app.py
Before Numpy sum() Reduce Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After Numpy sum() Reduce Function applied
0     51
1    102
2    153
3    204
4    255
dtype: int64
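A reduce function does not have to come from NumPy. As a minimal sketch, the hypothetical helper value_range below returns the difference between the largest and smallest value of whatever series it receives, so apply() returns one value per column or per row.

```python
import pandas as pd

matrix = [(11, 21, 19),
          (22, 42, 38),
          (33, 63, 57),
          (44, 84, 76),
          (55, 105, 95)]
dfObj = pd.DataFrame(matrix, columns=list("xyz"))


def value_range(series):
    """Reduce a series to a single number: max minus min."""
    return series.max() - series.min()


# One value per column (default axis=0)
col_range = dfObj.apply(value_range)

# One value per row (axis=1)
row_range = dfObj.apply(value_range, axis=1)

print(col_range)
print(row_range)
```

In both cases the result is a series: indexed by the column labels for axis=0 and by the row labels for axis=1.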
Python Pandas: Apply a User Defined function
Let’s define a UDF (user-defined function).
def subtractData(x):
    return x - 2
Whatever argument you pass, the function subtracts 2 from it and returns the result.
Now, let’s use this function in our DataFrame example.
import pandas as pd
import numpy as np

matrix = [(11, 21, 19),
          (22, 42, 38),
          (33, 63, 57),
          (44, 84, 76),
          (55, 105, 95)]


# user-defined function
def subtractData(x):
    return x - 2


# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'))

print('Before subtractData() Function applied')
print(dfObj)
print('------------------')

# modify the dataframe by applying subtractData() function
modDfObj = dfObj.apply(subtractData)

print('After subtractData() Function applied')
print(modDfObj)
Output
(pythonenv) ➜ pyt python3 app.py
Before subtractData() Function applied
    x    y   z
0  11   21  19
1  22   42  38
2  33   63  57
3  44   84  76
4  55  105  95
------------------
After subtractData() Function applied
    x    y   z
0   9   19  17
1  20   40  36
2  31   61  55
3  42   82  74
4  53  103  93
So, it subtracts 2 from every item of the matrix and returns the modified DataFrame.
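If the formula should affect only one column, the same function can be applied to that column alone. The sketch below assumes we only want to change column x and uses Series.apply(), which calls the function once for each value in the column.

```python
import pandas as pd


# Same user-defined function as above
def subtractData(x):
    return x - 2


matrix = [(11, 21, 19), (22, 42, 38), (33, 63, 57)]
dfObj = pd.DataFrame(matrix, columns=list("xyz"))

# Apply the function to column x only and write the result back
dfObj["x"] = dfObj["x"].apply(subtractData)

print(dfObj)
```

Unlike DataFrame.apply(), this assignment changes dfObj itself, while columns y and z keep their original values.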
That concludes this tutorial on how to apply a formula to an entire column, row, or whole dataframe in pandas.