Pandas crosstab() Function

The pandas crosstab() function computes a cross-tabulation (also called a contingency table) of two or more factors.

It summarizes data by counting how often each combination of values occurs and arranging those counts in a table.

By default, crosstab() returns frequencies. When the values and aggfunc arguments are passed together, it aggregates the supplied values (for example with a sum or a mean) instead of counting.

Syntax

pandas.crosstab(index, columns, values=None, rownames=None,
                colnames=None, aggfunc=None, margins=False, margins_name='All',
                dropna=True, normalize=False)

Parameters

Name	Description
index	Array-like, Series, or list of arrays/Series. Values to group by in the rows.
columns	Array-like, Series, or list of arrays/Series. Values to group by in the columns.
values	Array-like, optional. Values to aggregate according to the factors. Requires aggfunc to be specified.
rownames	Sequence, default None. Names for the row levels; if passed, must match the number of row arrays passed.
colnames	Sequence, default None. Names for the column levels; if passed, must match the number of column arrays passed.
aggfunc	Function, optional. Used to aggregate values for each cell of the cross-tabulation. Requires values to be specified.
margins	Boolean, default False. Adds row/column margins (subtotals).
margins_name	String, default ‘All’. Name of the row/column that contains the totals when margins is True.
dropna	Boolean, default True. Does not include columns whose entries are all NaN.
normalize	Boolean, {‘all’, ‘index’, ‘columns’}, or {0, 1}, default False. Divides the values by their sum: ‘all’ or True normalizes over the whole table, ‘index’ (or 0) over each row, and ‘columns’ (or 1) over each column.
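The examples in this article pass named Series, so the axis labels come from the Series names. As a minimal sketch of the rownames, colnames, and margins_name parameters, the snippet below uses unnamed NumPy arrays instead; the animal/color data and the ‘Total’ label are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the article): unnamed arrays carry no axis labels.
animals = np.array(['cat', 'dog', 'cat', 'dog', 'cat'], dtype=object)
colors = np.array(['black', 'black', 'white', 'white', 'black'], dtype=object)

table = pd.crosstab(
    animals,
    colors,
    rownames=['animal'],    # one name per row array
    colnames=['color'],     # one name per column array
    margins=True,
    margins_name='Total',   # label used for the totals row and column
)
print(table)
```

Without rownames and colnames, pandas falls back to generic labels for unnamed arrays, so naming them keeps the table readable.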

Return value

It returns a DataFrame representing the cross-tabulation.

Example 1: Basic cross-tabulation

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})

print(pd.crosstab(df['A'], df['B']))

Output

B    one  three  two
A                   
bar    1      1    0
foo    1      0    1

In this code example, we computed the frequency of each combination of the values in columns ‘A’ and ‘B’. The result is a new DataFrame where:

Each unique value from column ‘A’ becomes a row.
Each unique value from column ‘B’ becomes a column.
Each cell in the resulting DataFrame contains the count of occurrences of the corresponding combination of values from ‘A’ and ‘B’.

Each of the four rows in this data is a distinct combination, so every cell is either 1 or 0. For example, ‘bar’ occurs once with ‘one’, once with ‘three’, and never with ‘two’.
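To make the counting concrete, here is a sketch that rebuilds the same table by hand with groupby(), size(), and unstack(). It is only meant to illustrate what the counts represent, not how crosstab() is implemented internally; the by_hand name is ours.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})

counts = pd.crosstab(df['A'], df['B'])

# Count the rows for each (A, B) pair, then pivot the B level into columns.
# Pairs that never occur are filled with 0.
by_hand = df.groupby(['A', 'B']).size().unstack(fill_value=0)

print(counts)
print(by_hand)
```

Both tables hold the same numbers; crosstab() is simply the shorter way to write it.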

Example 2: Adding margins

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})

print(pd.crosstab(df['A'], df['B'], margins=True))

Output

B    one  three  two  All
A                        
bar    1      1    0    2
foo    1      0    1    2
All    2      1    1    4

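Setting margins=True adds an ‘All’ row and an ‘All’ column. The ‘All’ column holds the total of each row, the ‘All’ row holds the total of each column, and the cell where they meet is the grand total.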
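As a quick check of how the margins relate to the data, the sketch below reads the totals back out of the table. The with_totals name is ours.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})

with_totals = pd.crosstab(df['A'], df['B'], margins=True)

# The 'All' column holds one total per row of the table.
print(with_totals['All'])

# The 'All' row holds one total per column of the table.
print(with_totals.loc['All'])

# The bottom-right cell is the grand total, i.e. the number of rows in df.
grand_total = with_totals.loc['All', 'All']
print(grand_total == len(df))
```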

Example 3: Using Aggregation Function

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three'],
                   'C': np.random.randn(4)})

# Pass the aggregation by name; recent pandas versions warn when given the built-in sum.
print(pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc='sum'))

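Output

The exact numbers differ on every run because column ‘C’ is filled with random values. Each cell holds the sum of the ‘C’ values for that combination of ‘A’ and ‘B’, and combinations that never occur in the data appear as NaN.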
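Because the example above uses random numbers, its result is hard to verify by eye. The following sketch uses fixed, made-up values for ‘C’ (and one extra row so that a cell actually aggregates two values) to show how values and aggfunc work together.

```python
import pandas as pd

# Illustrative data: 'foo'/'one' appears twice, so its cell combines 10 and 50.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'one'],
                   'C': [10.0, 20.0, 30.0, 40.0, 50.0]})

# Sum of C for every (A, B) combination.
totals = pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc='sum')

# Mean of C for every (A, B) combination.
means = pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc='mean')

print(totals)
print(means)
```

Here the ‘foo’/‘one’ cell is 60.0 in totals and 30.0 in means, while combinations that never occur (such as ‘bar’/‘two’) are NaN.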

Example 4: Normalizing the Data

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})

print(pd.crosstab(df['A'], df['B'], normalize='index'))

Output

B    one  three  two
A                   
bar  0.5    0.5  0.0
foo  0.5    0.0  0.5

With normalize='index', each count is divided by the total of its row, so every row of the result sums to 1.
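The other normalize options work the same way but divide by a different total. As a small sketch on the same data (the variable names are ours):

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three']})

# Divide each count by its row total: every row sums to 1.
by_row = pd.crosstab(df['A'], df['B'], normalize='index')

# Divide each count by its column total: every column sums to 1.
by_col = pd.crosstab(df['A'], df['B'], normalize='columns')

# Divide each count by the grand total: the whole table sums to 1.
overall = pd.crosstab(df['A'], df['B'], normalize='all')

print(by_row.sum(axis=1))
print(by_col.sum(axis=0))
print(overall.to_numpy().sum())
```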

Example 5: Multiple indexes and columns

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three'],
                   'C': ['small', 'large', 'large', 'small'],
                   'D': [1, 2, 2, 3]})

print(pd.crosstab([df['A'], df['C']], df['B']))

Output

B          one  three  two
A   C                     
bar large    1      0    0
    small    0      1    0
foo large    0      0    1
    small    1      0    0

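In this code, we passed a list of two Series as the index argument, so the rows are grouped by every combination of ‘A’ and ‘C’ and the result has a two-level row index. The columns are still grouped by ‘B’ alone; passing a list as the columns argument groups the columns by several factors in the same way.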
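As a sketch of grouping on both axes at once, the snippet below passes a list for columns as well, using the otherwise unused column ‘D’. The table name is ours.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three'],
                   'C': ['small', 'large', 'large', 'small'],
                   'D': [1, 2, 2, 3]})

# Rows are grouped by (A, C); columns are grouped by (B, D).
table = pd.crosstab([df['A'], df['C']], [df['B'], df['D']])

print(table)

# Both axes are now MultiIndex objects with two levels each.
print(list(table.index.names))
print(list(table.columns.names))

# Individual cells are addressed with one tuple per axis.
print(table.loc[('foo', 'small'), ('one', 1)])
```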

Krunal Lathiya