AppDividend
Latest Code Tutorials

Python Pandas: Pivot Table Example | pandas.pivot_table

0

Pandas pivot table is used to reshape it in a way that makes it easier to understand or analyze. Often you will use a pivot to demonstrate the relationship between two columns that can be difficult to reason about before the pivot. Pandas pivot table creates a spreadsheet-style pivot table as the DataFrame. Levels in a pivot table will be stored in the MultiIndex objects (hierarchical indexes) on the index and columns of a result DataFrame.

You might be familiar with a concept of the pivot tables from Excel, where they had trademarked Name PivotTable.

That PivotTable tool enabled users to automatically sort, count, total, or average the data stored in one table. 

Content Overview

Pandas Pivot Table

Pivoting your data enables you to reshape it in such a way that it makes much easier to understand or analyze. Often, pivot tables are associated with Microsoft Excel. The reshaping power of pivot makes it much easier to understand relationships in your datasets.

Syntax

pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', 
                   fill_value=None, margins=False, dropna=True, 
                   margins_name='All', observed=False)

Parameters

data

It is the DataFrame.

values

column to aggregate, optional

index

It is a column, Grouper, array, or list of the previous

If the array is passed, it must be the same length as the data. The list contains any of the other types. The keys to the group by on the pivot table index. If the array is passed, it is being used in the same manner as column values.

columns

It is a column, Grouper, array, or list of the previous

If the array is passed, it must be the same length as data. The list contains any of the other data types (except list). The keys to the group by on the pivot table column. If the array is passed, it is being used in the same manner as column values.

aggfunc

It is a function, list of functions, dictionary, default numpy.mean().

If the list of functions passed, the resulting pivot table would have hierarchical columns whose top level are the method names (inferred from the function objects themselves) If the dict is given, a key is a column to aggregate and value is function or list of functions.

fill_value

It is a scalar, default None

Value to replace missing values with.

margins

It is a bool, default False.

It adds all row / columns (e.g. for subtotal / grand totals).

dropna

It is a bool, default True.

Do not include the columns whose entries are all NaN.

margins_name

It is a string, default ‘All’

It is the Name of the row/column that will contain the totals when the margin is True.

observed

It is a bool, default False

This argument only applies if any of the groupers are Categoricals.

If True, then only show observed values for categorical groupers.

If False then shows all values for categorical groupers. It changed in version 0.25.0.

Example of Pandas pivot table

Let’s create a DataFrame.

import pandas as pd

series = [('Stranger Things', 3, 'Millie', 16), ('Game of Thrones', 8, 'Emilia', 35),
          ('Westworld', 3, 'Evan', 35), ('La Casa De Papel', 4, 'Sergio', 44),
          ('Friends', 10, 'Jennifer', 51),
          ('The Big Bang Theory', 12, 'Jim Parsons', 40),
          ('Breaking Bad', 5, 'Brayan', 55), ('Better Call Saul', 6, 'Bob', 52)]

# Create a DataFrame object
dfObj = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor', 'Age'])
print(dfObj)

Output

pyt python3 app.py
                  Name  Seasons        Actor  Age
0      Stranger Things        3       Millie   16
1      Game of Thrones        8       Emilia   35
2            Westworld        3         Evan   35
3     La Casa De Papel        4       Sergio   44
4              Friends       10     Jennifer   51
5  The Big Bang Theory       12  Jim Parsons   40
6         Breaking Bad        5       Brayan   55
7     Better Call Saul        6          Bob   52

Now, let’s create a Pivot table from the above dataframe.

pivotTable = pd.pivot_table(dfObj, index=['Name', 'Actor'])

I use pivot to examine the Name of the show and its respective actor.

See the full code below.

import pandas as pd

series = [('Stranger Things', 3, 'Millie', 16), ('Game of Thrones', 8, 'Emilia', 35),
          ('Westworld', 3, 'Evan', 35), ('La Casa De Papel', 4, 'Sergio', 44),
          ('Friends', 10, 'Jennifer', 51),
          ('The Big Bang Theory', 12, 'Jim Parsons', 40),
          ('Breaking Bad', 5, 'Brayan', 55), ('Better Call Saul', 6, 'Bob', 52)]

# Create a DataFrame object
dfObj = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor', 'Age'])
print('Before Pivot')
print(dfObj)

pivotTable = pd.pivot_table(dfObj, index=['Name', 'Actor'])
print('After Pivot')
print(pivotTable)

Output

python3 app.py
Before Pivot
                  Name  Seasons        Actor  Age
0      Stranger Things        3       Millie   16
1      Game of Thrones        8       Emilia   35
2            Westworld        3         Evan   35
3     La Casa De Papel        4       Sergio   44
4              Friends       10     Jennifer   51
5  The Big Bang Theory       12  Jim Parsons   40
6         Breaking Bad        5       Brayan   55
7     Better Call Saul        6          Bob   52
After Pivot
                                 Age  Seasons
Name                Actor
Better Call Saul    Bob           52        6
Breaking Bad        Brayan        55        5
Friends             Jennifer      51       10
Game of Thrones     Emilia        35        8
La Casa De Papel    Sergio        44        4
Stranger Things     Millie        16        3
The Big Bang Theory Jim Parsons   40       12
Westworld           Evan          35        3

Example 2: Import DataSet using read_csv() method

In the above code example, we have created a Data using tuples.

Let’s take a real-world example. In the real world, all the external data might be in CSV files.

So, let’s direct use the pandas.read_csv() function to read the csv file and create a DataFrame from that csv data.

You can find that CSV dataset here.

I have downloaded and put it inside the project folder.

Now, write the following code.

# app.py

import pandas as pd
import numpy as np

# reading the data
data = pd.read_csv('100 Sales Records.csv', index_col=0)

# diplay first 10 rows
finalSet = data.head(10)
print(finalSet)

Output

python3 app.py
                                                 Country        Item Type Sales Channel Order Priority Order Date   Order ID  ... Units Sold  Unit Price  Unit Cost  Total Revenue  Total Cost  Total Profit
Region                                                                                                                        ...
Australia and Oceania                             Tuvalu        Baby Food       Offline              H  5/28/2010  669165933  ...       9925      255.28     159.42     2533654.00  1582243.50     951410.50
Central America and the Caribbean                Grenada           Cereal        Online              C  8/22/2012  963881480  ...       2804      205.70     117.11      576782.80   328376.44     248406.36
Europe                                            Russia  Office Supplies       Offline              L   5/2/2014  341417157  ...       1779      651.21     524.96     1158502.59   933903.84     224598.75
Sub-Saharan Africa                 Sao Tome and Principe           Fruits        Online              C  6/20/2014  514321792  ...       8102        9.33       6.92       75591.66    56065.84      19525.82
Sub-Saharan Africa                                Rwanda  Office Supplies       Offline              L   2/1/2013  115456712  ...       5062      651.21     524.96     3296425.02  2657347.52     639077.50
Australia and Oceania                    Solomon Islands        Baby Food        Online              C   2/4/2015  547995746  ...       2974      255.28     159.42      759202.72   474115.08     285087.64
Sub-Saharan Africa                                Angola        Household       Offline              M  4/23/2011  135425221  ...       4187      668.27     502.54     2798046.49  2104134.98     693911.51
Sub-Saharan Africa                          Burkina Faso       Vegetables        Online              H  7/17/2012  871543967  ...       8082      154.06      90.93     1245112.92   734896.26     510216.66
Sub-Saharan Africa                 Republic of the Congo    Personal Care       Offline              M  7/14/2015  770463311  ...       6070       81.73      56.67      496101.10   343986.90     152114.20
Sub-Saharan Africa                               Senegal           Cereal        Online              H  4/18/2014  616607081  ...       6593      205.70     117.11     1356180.10   772106.23     584073.87

We have taken just the first 10 rows from the 100 rows. The pandas.pd.head(n) function is used to select the first n number of rows.

Now, Let’s say that our goal is to determine the Total Units sold per Region.

Let’s create a pivot table for that.

Write the following code to find the total units sold per Region using a pivot table.

# app.py

import pandas as pd
import numpy as np

# reading the data
data = pd.read_csv('100 Sales Records.csv', index_col=0)

# diplay first 10 rows
finalSet = data.head(10)
pivotTable = pd.pivot_table(finalSet, index= 'Region', values= "Units Sold", aggfunc='sum')
print(pivotTable)

In the above example, we have passed data, index, values, and aggregate function.

We need to find the total number of units sold in each Region, that is why we have used sum as aggregate function.

Output

 python3 app.py
                                   Units Sold
Region
Australia and Oceania                   12899
Central America and the Caribbean        2804
Europe                                   1779
Sub-Saharan Africa                      38096

Hurray!! We have got the Pivot table based on Region and how many units they have sold in particular Region.

Remember, this above output is based on the first 10 rows and not complete 100 rows.

Scenario 2: Total sales by Item Type

See the following code.

# app.py

import pandas as pd
import numpy as np

# reading the data
data = pd.read_csv('100 Sales Records.csv', index_col=0)

# diplay first 10 rows
finalSet = data.head(10)
pivotTable = pd.pivot_table(finalSet, index= 'Item Type', values= "Units Sold", aggfunc='sum')
print(pivotTable)

Output

pyt python3 app.py
                 Units Sold
Item Type
Baby Food             12899
Cereal                 9397
Fruits                 8102
Household              4187
Office Supplies        6841
Personal Care          6070
Vegetables             8082

Creating a multi-index pivot table in Pandas

You may have used groupby() to achieve some of the pivot table functionality. However, the pivot_table() inbuilt function offers straightforward parameter names and default values that can help simplify complex procedures like multi-indexing.

To group the data by more than one column, all we have to do is pass in a list of column names. Let’s categorize the data by Order Priority and Item Type. The values will be Total Revenue.

# app.py

import pandas as pd
import numpy as np

# reading the data
data = pd.read_csv('100 Sales Records.csv', index_col=0)

# diplay first 10 rows
finalSet = data.head(10)
pivotTable = pd.pivot_table(finalSet, index= ['Item Type', 'Order Priority'], values= 'Total Revenue', aggfunc='sum')
print(pivotTable)

Output

python3 app.py
                                Total Revenue
Item Type       Order Priority
Baby Food       C                   759202.72
                H                  2533654.00
Cereal          C                   576782.80
                H                  1356180.10
Fruits          C                    75591.66
Household       M                  2798046.49
Office Supplies L                  4454927.61
Personal Care   M                   496101.10
Vegetables      H                  1245112.92

These examples also reveal where the pivot table got its Name from: it allows you to rotate or pivot the summary table, and this rotation gives us a different perspective of the data.

A perspective that can very well help you quickly gain valuable insights.

Conclusion

Pivot tables are traditionally associated with Excel. However, you can easily create the pivot table in Python using pandas.

You just saw how to create pivot tables across multiple scenarios. But the concepts reviewed here can be applied across a large number of different scenarios. It depends on how you want to analyze the large datasets.

You can find additional information about pivot tables by visiting the pandas documentation.

Leave A Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.