Pandas pivot table is used to reshape it in a way that makes it easier to understand or analyze. Often you will use a pivot to demonstrate the relationship between two columns that can be difficult to reason about before the pivot. Pandas pivot table creates a spreadsheet-style pivot table as the DataFrame. Levels in a pivot table will be stored in the MultiIndex objects (hierarchical indexes) on the index and columns of a result DataFrame.
You might be familiar with a concept of the pivot tables from Excel, where they had trademarked Name PivotTable.
That PivotTable tool enabled users to automatically sort, count, total, or average the data stored in one table.
Pandas Pivot Table
Pivoting your data enables you to reshape it in such a way that it makes much easier to understand or analyze. Often, pivot tables are associated with Microsoft Excel. The reshaping power of pivot makes it much easier to understand relationships in your datasets.
Syntax
pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False)
Parameters
data
It is the DataFrame.
values
column to aggregate, optional
index
It is a column, Grouper, array, or list of the previous
If the array is passed, it must be the same length as the data. The list contains any of the other types. The keys to the group by on the pivot table index. If the array is passed, it is being used in the same manner as column values.
columns
It is a column, Grouper, array, or list of the previous
If the array is passed, it must be the same length as data. The list contains any of the other data types (except list). The keys to the group by on the pivot table column. If the array is passed, it is being used in the same manner as column values.
aggfunc
It is a function, list of functions, dictionary, default numpy.mean().
If the list of functions passed, the resulting pivot table would have hierarchical columns whose top level are the method names (inferred from the function objects themselves) If the dict is given, a key is a column to aggregate and value is function or list of functions.
fill_value
It is a scalar, default None
Value to replace missing values with.
margins
It is a bool, default False.
It adds all row / columns (e.g. for subtotal / grand totals).
dropna
It is a bool, default True.
Do not include the columns whose entries are all NaN.
margins_name
It is a string, default ‘All’
It is the Name of the row/column that will contain the totals when the margin is True.
observed
It is a bool, default False
This argument only applies if any of the groupers are Categoricals.
If True, then only show observed values for categorical groupers.
If False then shows all values for categorical groupers. It changed in version 0.25.0.
Example of Pandas pivot table
Let’s create a DataFrame.
import pandas as pd series = [('Stranger Things', 3, 'Millie', 16), ('Game of Thrones', 8, 'Emilia', 35), ('Westworld', 3, 'Evan', 35), ('La Casa De Papel', 4, 'Sergio', 44), ('Friends', 10, 'Jennifer', 51), ('The Big Bang Theory', 12, 'Jim Parsons', 40), ('Breaking Bad', 5, 'Brayan', 55), ('Better Call Saul', 6, 'Bob', 52)] # Create a DataFrame object dfObj = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor', 'Age']) print(dfObj)
Output
pyt python3 app.py Name Seasons Actor Age 0 Stranger Things 3 Millie 16 1 Game of Thrones 8 Emilia 35 2 Westworld 3 Evan 35 3 La Casa De Papel 4 Sergio 44 4 Friends 10 Jennifer 51 5 The Big Bang Theory 12 Jim Parsons 40 6 Breaking Bad 5 Brayan 55 7 Better Call Saul 6 Bob 52
Now, let’s create a Pivot table from the above dataframe.
pivotTable = pd.pivot_table(dfObj, index=['Name', 'Actor'])
I use pivot to examine the Name of the show and its respective actor.
See the full code below.
import pandas as pd series = [('Stranger Things', 3, 'Millie', 16), ('Game of Thrones', 8, 'Emilia', 35), ('Westworld', 3, 'Evan', 35), ('La Casa De Papel', 4, 'Sergio', 44), ('Friends', 10, 'Jennifer', 51), ('The Big Bang Theory', 12, 'Jim Parsons', 40), ('Breaking Bad', 5, 'Brayan', 55), ('Better Call Saul', 6, 'Bob', 52)] # Create a DataFrame object dfObj = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor', 'Age']) print('Before Pivot') print(dfObj) pivotTable = pd.pivot_table(dfObj, index=['Name', 'Actor']) print('After Pivot') print(pivotTable)
Output
python3 app.py Before Pivot Name Seasons Actor Age 0 Stranger Things 3 Millie 16 1 Game of Thrones 8 Emilia 35 2 Westworld 3 Evan 35 3 La Casa De Papel 4 Sergio 44 4 Friends 10 Jennifer 51 5 The Big Bang Theory 12 Jim Parsons 40 6 Breaking Bad 5 Brayan 55 7 Better Call Saul 6 Bob 52 After Pivot Age Seasons Name Actor Better Call Saul Bob 52 6 Breaking Bad Brayan 55 5 Friends Jennifer 51 10 Game of Thrones Emilia 35 8 La Casa De Papel Sergio 44 4 Stranger Things Millie 16 3 The Big Bang Theory Jim Parsons 40 12 Westworld Evan 35 3
Example 2: Import DataSet using read_csv() method
In the above code example, we have created a Data using tuples.
Let’s take a real-world example. In the real world, all the external data might be in CSV files.
So, let’s direct use the pandas.read_csv() function to read the csv file and create a DataFrame from that csv data.
You can find that CSV dataset here.
I have downloaded and put it inside the project folder.
Now, write the following code.
# app.py import pandas as pd import numpy as np # reading the data data = pd.read_csv('100 Sales Records.csv', index_col=0) # diplay first 10 rows finalSet = data.head(10) print(finalSet)
Output
python3 app.py Country Item Type Sales Channel Order Priority Order Date Order ID ... Units Sold Unit Price Unit Cost Total Revenue Total Cost Total Profit Region ... Australia and Oceania Tuvalu Baby Food Offline H 5/28/2010 669165933 ... 9925 255.28 159.42 2533654.00 1582243.50 951410.50 Central America and the Caribbean Grenada Cereal Online C 8/22/2012 963881480 ... 2804 205.70 117.11 576782.80 328376.44 248406.36 Europe Russia Office Supplies Offline L 5/2/2014 341417157 ... 1779 651.21 524.96 1158502.59 933903.84 224598.75 Sub-Saharan Africa Sao Tome and Principe Fruits Online C 6/20/2014 514321792 ... 8102 9.33 6.92 75591.66 56065.84 19525.82 Sub-Saharan Africa Rwanda Office Supplies Offline L 2/1/2013 115456712 ... 5062 651.21 524.96 3296425.02 2657347.52 639077.50 Australia and Oceania Solomon Islands Baby Food Online C 2/4/2015 547995746 ... 2974 255.28 159.42 759202.72 474115.08 285087.64 Sub-Saharan Africa Angola Household Offline M 4/23/2011 135425221 ... 4187 668.27 502.54 2798046.49 2104134.98 693911.51 Sub-Saharan Africa Burkina Faso Vegetables Online H 7/17/2012 871543967 ... 8082 154.06 90.93 1245112.92 734896.26 510216.66 Sub-Saharan Africa Republic of the Congo Personal Care Offline M 7/14/2015 770463311 ... 6070 81.73 56.67 496101.10 343986.90 152114.20 Sub-Saharan Africa Senegal Cereal Online H 4/18/2014 616607081 ... 6593 205.70 117.11 1356180.10 772106.23 584073.87
We have taken just the first 10 rows from the 100 rows. The pandas.pd.head(n) function is used to select the first n number of rows.
Now, Let’s say that our goal is to determine the Total Units sold per Region.
Let’s create a pivot table for that.
Write the following code to find the total units sold per Region using a pivot table.
# app.py import pandas as pd import numpy as np # reading the data data = pd.read_csv('100 Sales Records.csv', index_col=0) # diplay first 10 rows finalSet = data.head(10) pivotTable = pd.pivot_table(finalSet, index= 'Region', values= "Units Sold", aggfunc='sum') print(pivotTable)
In the above example, we have passed data, index, values, and aggregate function.
We need to find the total number of units sold in each Region, that is why we have used sum as aggregate function.
Output
python3 app.py Units Sold Region Australia and Oceania 12899 Central America and the Caribbean 2804 Europe 1779 Sub-Saharan Africa 38096
Hurray!! We have got the Pivot table based on Region and how many units they have sold in particular Region.
Remember, this above output is based on the first 10 rows and not complete 100 rows.
Scenario 2: Total sales by Item Type
See the following code.
# app.py import pandas as pd import numpy as np # reading the data data = pd.read_csv('100 Sales Records.csv', index_col=0) # diplay first 10 rows finalSet = data.head(10) pivotTable = pd.pivot_table(finalSet, index= 'Item Type', values= "Units Sold", aggfunc='sum') print(pivotTable)
Output
pyt python3 app.py Units Sold Item Type Baby Food 12899 Cereal 9397 Fruits 8102 Household 4187 Office Supplies 6841 Personal Care 6070 Vegetables 8082
Creating a multi-index pivot table in Pandas
You may have used groupby() to achieve some of the pivot table functionality. However, the pivot_table() inbuilt function offers straightforward parameter names and default values that can help simplify complex procedures like multi-indexing.
To group the data by more than one column, all we have to do is pass in a list of column names. Let’s categorize the data by Order Priority and Item Type. The values will be Total Revenue.
# app.py import pandas as pd import numpy as np # reading the data data = pd.read_csv('100 Sales Records.csv', index_col=0) # diplay first 10 rows finalSet = data.head(10) pivotTable = pd.pivot_table(finalSet, index= ['Item Type', 'Order Priority'], values= 'Total Revenue', aggfunc='sum') print(pivotTable)
Output
python3 app.py Total Revenue Item Type Order Priority Baby Food C 759202.72 H 2533654.00 Cereal C 576782.80 H 1356180.10 Fruits C 75591.66 Household M 2798046.49 Office Supplies L 4454927.61 Personal Care M 496101.10 Vegetables H 1245112.92
These examples also reveal where the pivot table got its Name from: it allows you to rotate or pivot the summary table, and this rotation gives us a different perspective of the data.
A perspective that can very well help you quickly gain valuable insights.
Conclusion
Pivot tables are traditionally associated with Excel. However, you can easily create the pivot table in Python using pandas.
You just saw how to create pivot tables across multiple scenarios. But the concepts reviewed here can be applied across a large number of different scenarios. It depends on how you want to analyze the large datasets.
You can find additional information about pivot tables by visiting the pandas documentation.