Pandas DataFrame transform: The Complete Guide

The pandas DataFrame transform() method calls a function on the DataFrame itself and produces a DataFrame of transformed values with the same axis length as the original. This makes transform() useful when you want to manipulate the values in rows or columns without changing the shape of the data.

Pandas DataFrame transform()

transform() is a built-in DataFrame method. Transformation is also commonly used in conjunction with groupby(), one of the most useful operations in pandas, through the GroupBy.transform() method.

Most pandas users have likely used aggregate, filter, or apply with groupby to summarize data. The transform() method, however, is a little harder to understand, especially for those coming from Excel.
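To make this concrete for Excel users, here is a minimal sketch using hypothetical order data (the column names order and amount are made up for illustration). groupby(...).transform("sum") puts each group's total back on every row of that group, much like a SUMIF helper column in a spreadsheet, so it can be combined row by row with the original values.

```python
import pandas as pd

# Hypothetical order lines: several rows can belong to the same order
sales = pd.DataFrame({"order": [1, 1, 2, 2, 2],
                      "amount": [10.0, 30.0, 5.0, 5.0, 10.0]})

# transform("sum") returns one value per original row (the order total),
# aligned with the original index
sales["order_total"] = sales.groupby("order")["amount"].transform("sum")

# Because the shapes match, row-wise arithmetic works directly
sales["share"] = sales["amount"] / sales["order_total"]
print(sales)
```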

To import Excel files into Python, use the pandas read_excel() function, which reads the data of an Excel sheet into a DataFrame, a two-dimensional tabular data structure.

Pandas Transform vs. Pandas Aggregate

While aggregation must return a reduced version of the data, transformation returns a transformed version of the full data that can be recombined with the original.

For such a transformation, the output has the same shape as the input. A common example is centering the data by subtracting the group-wise mean.
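The following sketch, on a small hypothetical dataset, contrasts the two output shapes and then centers each value by subtracting its group mean. The column names group and value are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data with two groups, "x" and "y"
df = pd.DataFrame({"group": ["x", "x", "y", "y", "y"],
                   "value": [1, 3, 2, 4, 6]})

# Aggregation: the result is reduced to one row per group
agg = df.groupby("group")["value"].mean()
print(agg)

# Transformation: the result has one value per original row
group_mean = df.groupby("group")["value"].transform("mean")
print(group_mean)

# Same shape as the input, so it can be subtracted row by row to center the data
df["centered"] = df["value"] - group_mean
print(df)
```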

Difference Between apply() and transform()

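The apply() function is the more general of the two. Its function may return a result of any shape, including a reduction such as one sum per column, and when used with groupby() it receives each group as a complete DataFrame, so it can work with several columns at once.

The transform() function manipulates a single row or column at a time, depending on the axis value, and must return a result with the same length as its input; a reducing function such as sum() makes transform() raise a ValueError. So, use either apply() or transform() depending on the requirement.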
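A minimal sketch of the difference on a small hypothetical DataFrame: the same column-sum lambda is accepted by apply() but rejected by transform(), while a length-preserving function works with transform(). The exact error message text may vary between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 5], "B": [8, 9, 10]})

# apply() may reduce: sum() collapses each column to a single value
col_sums = df.apply(lambda col: col.sum())
print(col_sums)

# transform() must preserve the length of its input, so the same reduction fails
try:
    df.transform(lambda col: col.sum())
    transform_rejected = False
except ValueError as err:
    transform_rejected = True
    print("transform() raised ValueError:", err)

# A length-preserving function is accepted by transform()
centered = df.transform(lambda col: col - col.mean())
print(centered)
```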

Let’s see the syntax of the df.transform() method.

Syntax

DataFrame.transform(func, axis=0, *args, **kwargs)

Parameters

It has four parameters, which are briefly defined below. 

  1. func: A function, a string (function name), a list of functions or function names, or a dictionary mapping column labels to functions. It is used to transform the data. A function must work either when passed a DataFrame or when passed to DataFrame.apply().
  2. axis: Either 0 or 1 (default 0). If 0 (also called ‘index’), then the function is applied to each column. If 1 (also called ‘columns’), then the function is applied to each row.
  3. *args: Positional arguments to pass to func.
  4. **kwargs: Keyword arguments to pass to func.
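A minimal sketch of the four parameters, using a small hypothetical DataFrame. The helper add() is made up for illustration; the string "cumsum" works because it names a DataFrame method that keeps the shape of the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 9], "B": [16, 25, 36]})

# func as a string: the name of a DataFrame method that keeps the shape
by_name = df.transform("cumsum")

# func as a list: one output column per (column, function) pair
by_list = df.transform([np.sqrt, np.square])

# func as a dict: a different function for each column
by_dict = df.transform({"A": np.sqrt, "B": lambda x: x * 2})

# axis=1: the function receives each row instead of each column
by_row = df.transform(lambda row: row - row.min(), axis=1)

# Hypothetical helper; the extra keyword argument is forwarded to it via **kwargs
def add(x, amount):
    return x + amount

with_kwargs = df.transform(add, amount=10)

for result in (by_name, by_list, by_dict, by_row, with_kwargs):
    print(result, end="\n\n")
```

With a list, the result gets two levels of column labels (the original column and the function name), so it has four columns here.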

Return Value

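The transform() function returns a DataFrame that must have the same length as the input DataFrame. If func produces a result of a different length, transform() raises a ValueError.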

Example program on pandas.DataFrame.transform()

Write a program to show the working of pandas.DataFrame.transform().

import pandas as pd
df = pd.DataFrame({"A": [3, 4, 5, 6, 7],
                   "B": [8, 9, 10, 11, 12],
                   "C": [13, 64, 74, 23, 76],
                   "D": [53, 35, 64, 76, 85]})

print(df)
resultdf = df.transform(func=lambda x: x + 2)
print("\nDataFrame after being transformed:\n")
print("\n", resultdf)

Output

   A   B   C   D
0  3   8  13  53
1  4   9  64  35
2  5  10  74  64
3  6  11  23  76
4  7  12  76  85

DataFrame after being transformed:


    A   B   C   D
0  5  10  15  55
1  6  11  66  37
2  7  12  76  66
3  8  13  25  78
4  9  14  78  87

In the above code, we created a DataFrame, transformed it by adding 2 to each element, and printed the transformed DataFrame.
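Although this example is described as adding 2 to each element, with a callable and the default axis=0 transform() does not call the lambda once per element: pandas passes each column to it as a Series, and x + 2 is then a vectorized operation. The sketch below uses a hypothetical helper, add_two(), that records the type of its input, and also checks that the result equals df + 2 and that the original DataFrame is not modified.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 5, 6, 7],
                   "B": [8, 9, 10, 11, 12]})

received_types = []

def add_two(x):
    # Hypothetical helper: remember what transform() passed in
    received_types.append(type(x).__name__)
    return x + 2

resultdf = df.transform(add_two)

print(received_types)           # type names of the objects add_two received
print(resultdf.equals(df + 2))  # compare with plain vectorized addition
print(df["A"].tolist())         # the original column values
```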

Write a program to multiply each element of the DataFrame by 5 and then print the resulting DataFrame.

See the following code.

import pandas as pd

df = pd.DataFrame({"A": [3, 4, 5, 6, 7],
                   "B": [8, 9, 10, 11, 12],
                   "C": [13, 64, 74, 23, 76],
                   "D": [53, 35, 64, 76, 85]})

print(df)
resultdf = df.transform(func=lambda x: x*5)
print("\nDataFrame after being transformed:\n")
print("\n", resultdf)

Output

   A   B   C   D
0  3   8  13  53
1  4   9  64  35
2  5  10  74  64
3  6  11  23  76
4  7  12  76  85

DataFrame after being transformed:


     A   B    C    D
0  15  40   65  265
1  20  45  320  175
2  25  50  370  320
3  30  55  115  380
4  35  60  380  425

In the above example, we created a DataFrame, transformed it by multiplying each element by 5, and printed the transformed DataFrame.

Pandas DataFrame and NumPy

Let’s create a DataFrame from a NumPy array and use the transform() function.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])

print(df)
resultdf = df.transform(func=lambda x: x*5)
print("\nDataFrame after being transformed:\n")
print("\n", resultdf)

Output

   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

DataFrame after being transformed:


     a   b   c
0   5  10  15
1  20  25  30
2  35  40  45
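transform() also accepts NumPy functions directly. As a small sketch on the same 3x3 array, passing the ufunc np.square squares every value, and the underlying array of the result can be compared with applying np.square to the original NumPy array.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(arr, columns=['a', 'b', 'c'])

# A NumPy ufunc can be passed as func; it is applied element-wise
squared = df.transform(np.square)
print(squared)

# The values match NumPy's own element-wise result on the original array
print(np.array_equal(squared.to_numpy(), np.square(arr)))
```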

Conclusion

The DataFrame.transform() function returns a DataFrame of the values produced by applying the function specified in its func parameter. This output DataFrame has the same length as the input DataFrame.

