AppDividend

Pandas crosstab() Function Example in Python

The Pandas crosstab() function computes a cross-tabulation of two or more factors. It is part of the Pandas library. By default, it builds a frequency table that counts how often each combination of factor values occurs. If an array of values and an aggregation function are passed, it aggregates those values instead.
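As a minimal sketch of the default counting behavior, the snippet below cross-tabulates two columns. The DataFrame, its column names (city, plan), and its data are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Rome", "Rome", "Rome"],
    "plan": ["free", "paid", "free", "free", "paid"],
})

# With only index and columns given, crosstab() counts how often
# each (city, plan) pair occurs.
counts = pd.crosstab(df["city"], df["plan"])
print(counts)
```

Each cell holds the number of rows that share that city and plan, so the cells add up to the number of rows in df.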

Syntax

pandas.crosstab(index, columns, values=None, rownames=None,
                colnames=None, aggfunc=None, margins=False,
                margins_name: str = 'All', dropna: bool = True,
                normalize=False) -> 'DataFrame'

Parameters

The crosstab() method has the following parameters: 

  • index: The values to group by in the rows. It takes an array-like, a Series, or a list of arrays/Series.
  • columns: The values to group by in the columns. It takes an array-like, a Series, or a list of arrays/Series.
  • values: An optional array of values to aggregate according to the factors. It requires aggfunc to be specified.
  • rownames: An optional sequence of names for the row levels. If passed, it must match the number of row arrays passed.
  • colnames: An optional sequence of names for the column levels. If passed, it must match the number of column arrays passed.
  • aggfunc: An optional aggregation function. If it is specified, values must be specified as well.
  • margins: A Boolean, False by default. If True, it adds row and column margins (subtotals).
  • margins_name: A str, 'All' by default. It is the name of the row/column that contains the totals when margins is True.
  • dropna: A Boolean, True by default. If True, columns whose entries are all NaN are not included.
  • normalize: It can take a Boolean, one of {'all', 'index', 'columns'}, or one of {0, 1}, and is False by default. It normalizes by dividing all values by the sum of values.
    • If passed 'all' or True, it normalizes over all values.
    • If passed 'index' (or 0), it normalizes over each row.
    • If passed 'columns' (or 1), it normalizes over each column.
    • If margins is True, it also normalizes the margin values.
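The following sketch shows how values/aggfunc, margins, and normalize might be used. The DataFrame, its column names (region, product, sales), and its numbers are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration only.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["pen", "ink", "pen", "pen", "ink"],
    "sales": [10, 20, 30, 50, 40],
})

# values + aggfunc: sum the sales per (region, product) instead of counting.
totals = pd.crosstab(df["region"], df["product"],
                     values=df["sales"], aggfunc="sum")

# margins=True appends a totals row and column, labeled by margins_name.
with_totals = pd.crosstab(df["region"], df["product"],
                          margins=True, margins_name="Total")

# normalize="index" converts each row into proportions that sum to 1.
row_share = pd.crosstab(df["region"], df["product"], normalize="index")

print(totals)
print(with_totals)
print(row_share)
```

Passing the aggregation as the string "sum" is a choice made here for brevity; a callable can be passed as well.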

Return Value

The crosstab() function returns a DataFrame, which is the cross-tabulation of the data.

Example program on pandas.crosstab()

The following program demonstrates how pandas.crosstab() works.

import pandas as pd
import numpy as np

# Three factors of equal length: data1 groups the rows,
# data2 and data3 together group the columns.
data1 = np.array(["a", "a", "a", "a", "b", "b",
                  "b", "b", "c", "c", "c"], dtype=object)
data2 = np.array(["1st", "1st", "1st", "2nd", "1st", "1st",
                  "1st", "2nd", "2nd", "2nd", "2nd"], dtype=object)
data3 = np.array(["x1", "x1", "y1", "x1", "x1", "y1",
                  "y1", "x1", "y1", "y1", "y1"], dtype=object)

# rownames and colnames label the row level (p) and the column levels (q, r).
ctab = pd.crosstab(data1, [data2, data3], rownames=['p'], colnames=['q', 'r'])
print(ctab)

Output

q 1st    2nd
r  x1 y1  x1 y1
p
a   2  1   1  0
b   1  2   1  0
c   0  0   0  3

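In the above example, the cross-tabulation is based on three factors: data1 forms the rows (p), while data2 and data3 form two levels of columns (q and r). Each cell counts how many positions share that combination of values.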
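Because the result has two column levels (q and r), it can be sliced like any DataFrame with hierarchical columns. The sketch below rebuilds the same table and pulls out a few pieces. The variable names first_only and a_first_x1 are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

data1 = np.array(["a", "a", "a", "a", "b", "b",
                  "b", "b", "c", "c", "c"], dtype=object)
data2 = np.array(["1st", "1st", "1st", "2nd", "1st", "1st",
                  "1st", "2nd", "2nd", "2nd", "2nd"], dtype=object)
data3 = np.array(["x1", "x1", "y1", "x1", "x1", "y1",
                  "y1", "x1", "y1", "y1", "y1"], dtype=object)

ctab = pd.crosstab(data1, [data2, data3], rownames=["p"], colnames=["q", "r"])

# Selecting a top-level column label returns the sub-table where q == "1st".
first_only = ctab["1st"]

# A (q, r) tuple addresses a single column of the two-level column index.
a_first_x1 = ctab.loc["a", ("1st", "x1")]

print(list(ctab.columns.names))  # ['q', 'r']
print(first_only)
print(a_first_x1)                # 2
```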

That is it for the Pandas crosstab() function.
