The pandas crosstab() function computes a simple cross-tabulation (also called a contingency table) of two or more factors.
It is useful for summarizing data as a table of frequencies: by default, each cell counts how many rows share a given combination of row and column values.
Passing a values array together with an aggfunc turns those counts into summary statistics, such as the sum or mean of another column for each combination.
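To make "table of frequencies" concrete, the sketch below rebuilds the same counts by hand with groupby(), size() and unstack(). It is only meant to show what the cells contain. It is not necessarily how crosstab is implemented internally. The small DataFrame is the same sample data used in the examples further down.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})

# Frequency table produced by crosstab
table = pd.crosstab(df['A'], df['B'])

# The same counts built by hand: count rows per (A, B) pair,
# then pivot B into columns, filling pairs that never occur with 0
manual = df.groupby(['A', 'B']).size().unstack(fill_value=0)

# Both tables have sorted labels, so their values line up cell by cell
print((table.to_numpy() == manual.to_numpy()).all())
```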
Syntax
pandas.crosstab(index, columns, values=None, rownames=None,
                colnames=None, aggfunc=None, margins=False, margins_name='All',
                dropna=True, normalize=False)
Parameters
| Name | Description |
|------|-------------|
| index | Array-like, Series, or list of arrays/Series. Values to group by in the rows. |
| columns | Array-like, Series, or list of arrays/Series. Values to group by in the columns. |
| values | Array-like, optional. Values to aggregate according to the factors; requires aggfunc. |
| rownames | Sequence, default None. Names for the row levels; if passed, must match the number of row arrays passed. |
| colnames | Sequence, default None. Names for the column levels; if passed, must match the number of column arrays passed. |
| aggfunc | Function, optional. Aggregates values for each cell of the cross-tabulation; requires values. |
| margins | Boolean, default False. Adds row/column margins (subtotals). |
| margins_name | String, default 'All'. Name of the row/column that holds the totals when margins is True. |
| dropna | Boolean, default True. Does not include columns whose entries are all NaN. |
| normalize | Boolean, {'all', 'index', 'columns'}, or {0, 1}, default False. Divides by the sum of values: 'all' or True normalizes over all values, 'index' (0) over each row, and 'columns' (1) over each column. If margins is True, the margin values are normalized as well. |
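None of the examples below use rownames, colnames or margins_name, so here is a minimal sketch of them. It also shows that index and columns can be plain Python lists. Lists carry no names, so rownames and colnames provide the axis labels. The sizes/colors data and the 'size', 'color' and 'Total' labels are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data as plain lists (no Series names)
sizes = ['S', 'M', 'S', 'L', 'M']
colors = ['red', 'red', 'blue', 'blue', 'red']

# rownames/colnames label the axes; margins_name renames the totals row/column
table = pd.crosstab(sizes, colors,
                    rownames=['size'], colnames=['color'],
                    margins=True, margins_name='Total')
print(table)
```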
Return value
It returns a DataFrame representing the cross-tabulation.
Example 1: Basic cross-tabulation
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})
print(pd.crosstab(df['A'], df['B']))
Output
In this code example, we computed the frequency of each combination of the values in columns ‘A’ and ‘B’. The result is a new DataFrame where:
- Each unique value from column ‘A’ becomes a row.
- Each unique value from column ‘B’ becomes a column.
- Each cell in the resulting DataFrame contains the count of occurrences of the corresponding combination of values from ‘A’ and ‘B’.
Because the labels are sorted, the output has rows ‘bar’ and ‘foo’ and columns ‘one’, ‘three’, and ‘two’. A combination that never occurs, such as ‘bar’ with ‘two’, gets a count of 0.
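The result is an ordinary DataFrame, so individual counts can be read with .loc. The short check below uses the same data as Example 1. It confirms the sorted labels and the zero for a missing combination. It also confirms that every input row is counted exactly once.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})
table = pd.crosstab(df['A'], df['B'])

# Rows and columns are the sorted unique values from the data
print(list(table.index))    # ['bar', 'foo']
print(list(table.columns))  # ['one', 'three', 'two']

# 'bar' never appears together with 'two', so its count is 0
print(table.loc['bar', 'two'])

# Every row of df lands in exactly one cell
print(table.to_numpy().sum() == len(df))
```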
Example 2: Adding margins
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})
print(pd.crosstab(df['A'], df['B'], margins=True))
Output
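In this code example, margins=True adds an ‘All’ row and an ‘All’ column that hold the subtotals: the total of each row, the total of each column, and the grand total (4, the number of rows in df).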
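The margin row and column can be selected like any other label. The sketch below uses the default margins_name 'All' and the Example 2 data. It reads off the row totals, the column totals and the grand total.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})
table = pd.crosstab(df['A'], df['B'], margins=True)

# Row totals for 'bar' and 'foo', followed by the grand total
print(table['All'])
# Column totals for 'one', 'three', 'two', followed by the grand total
print(table.loc['All'])
# Grand total: the number of rows in df
print(table.loc['All', 'All'])
```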
Example 3: Using Aggregation Function
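import pandas as pd
import numpy as np
# C holds random numbers, so the sums differ from run to run
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three'],
                   'C': np.random.randn(4)})
# Pass the string 'sum': newer pandas versions warn when the built-in sum is passed
print(pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc='sum'))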
Output
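Instead of counting rows, each cell holds the sum of the ‘C’ values for that combination of ‘A’ and ‘B’. Combinations with no rows, such as ‘bar’ with ‘two’, show NaN rather than 0.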
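Here is a deterministic variant with fixed numbers instead of random ones, so the results can be checked by hand. It uses aggfunc='mean', fills the NaN holes with fillna(), and shows that values without aggfunc is rejected with a ValueError. The extra fifth row and the C values are made-up sample data. Whether 0 is a sensible fill value depends on the analysis.

```python
import pandas as pd

# Made-up sample data; 'foo'/'one' now occurs twice (C = 1.0 and 5.0)
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'one'],
                   'C': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Mean of C for each (A, B) pair; pairs with no rows come back as NaN
means = pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc='mean')
print(means)

# Replace the NaN holes if a numeric default makes sense
print(means.fillna(0))

# values and aggfunc must be given together
message = None
try:
    pd.crosstab(df['A'], df['B'], values=df['C'])
except ValueError as err:
    message = str(err)
print(message)
```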
Example 4: Normalizing the Data
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})
# Divide each count by its row total
print(pd.crosstab(df['A'], df['B'], normalize='index'))
Output
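In this code, normalize='index' divides each count by its row total, so every row sums to 1. For example, ‘bar’ occurs once with ‘one’ and once with ‘three’, which gives 0.5 in each of those two cells.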
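The sketch below compares the three string options for normalize on the same data. 'index' makes each row sum to 1, 'columns' makes each column sum to 1, and 'all' (equivalent to True) makes the whole table sum to 1.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})

by_row = pd.crosstab(df['A'], df['B'], normalize='index')    # rows sum to 1
by_col = pd.crosstab(df['A'], df['B'], normalize='columns')  # columns sum to 1
overall = pd.crosstab(df['A'], df['B'], normalize='all')     # table sums to 1

print(by_row.sum(axis=1))
print(by_col.sum(axis=0))
# Each of the 4 rows contributes 1/4 of the total
print(overall)
```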
Example 5: Multiple indexes and columns
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three'],
                   'C': ['small', 'large', 'large', 'small'],
                   'D': [1, 2, 2, 3]})
print(pd.crosstab([df['A'], df['C']], df['B']))
Output
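In this code, we passed a list of two Series as the index. The rows are therefore grouped by every observed (‘A’, ‘C’) pair and labelled with a two-level MultiIndex, while ‘B’ still supplies the columns. A list can be passed to columns in the same way.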
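To group the columns by more than one factor as well, pass a list on both axes. The sketch below uses the otherwise unused ‘D’ column as a second column factor, so both axes become two-level MultiIndexes. Cells are then addressed with tuples. This is one way to extend Example 5, not the only layout crosstab supports.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three'],
                   'C': ['small', 'large', 'large', 'small'],
                   'D': [1, 2, 2, 3]})

# Rows keyed by (A, C) pairs, columns keyed by (B, D) pairs
table = pd.crosstab([df['A'], df['C']], [df['B'], df['D']])
print(table)

# Both axes are MultiIndexes, so a single cell is selected with two tuples
print(table.loc[('foo', 'small'), ('one', 1)])
```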