Pandas DataFrame groupby() Method

The Pandas groupby() method groups the rows of a DataFrame or Series based on specific criteria, which makes it useful for performing aggregation and transformation operations on the grouped data. The method returns a GroupBy object, which can be used to apply aggregation functions such as sum(), mean(), count(), and many more.

Syntax

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, 
                sort=True, group_keys=True, squeeze=False, **kwargs)

Parameters

The signature above lists the following seven parameters.

  1. by: It determines the groups for groupby(). It can be a mapping, a function, a label, or a list of labels. Its default value is None.
  2. axis: It specifies whether to split along rows (0 or 'index') or columns (1 or 'columns'). By default, it is 0.
  3. level: If the axis is a hierarchical MultiIndex, the grouping is done by a particular level or multiple levels.
  4. as_index: It is a Boolean and is True by default. For aggregated output, it returns the object with the group labels as the index. It is only relevant for DataFrame input.
  5. sort: It is a Boolean and is True by default. It sorts the group keys. Turning it off can improve performance.
  6. group_keys: It is also a Boolean and is True by default. When calling apply(), it adds the group keys to the index to identify the pieces.
  7. squeeze: It is also a Boolean and is False by default. It reduces the dimensionality of the return type if possible; otherwise, it returns a consistent type. This parameter was deprecated in pandas 1.1 and removed in pandas 2.0.
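To make the effect of as_index and sort concrete, here is a minimal sketch. It assumes a trimmed, hypothetical version of the marks table with integer values so that the marks can be summed.

```python
import pandas as pd

# Hypothetical marks table; integers are an assumption so that sum() adds numbers
df = pd.DataFrame({
    "Name": ["Rohit", "Arun", "Sohit", "Arun", "Shubh"],
    "maths": [93, 63, 74, 94, 83],
})

# Default behaviour: group labels become the index and are sorted
default_result = df.groupby("Name")["maths"].sum()

# as_index=False keeps "Name" as an ordinary column instead of the index
flat_result = df.groupby("Name", as_index=False)["maths"].sum()

# sort=False keeps the groups in order of first appearance
unsorted_result = df.groupby("Name", sort=False)["maths"].sum()

print(default_result, flat_result, unsorted_result, sep="\n\n")
```

With as_index=False the result is a flat DataFrame, which can be handier when the output is going to be merged or exported.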

Return Value

The groupby() method returns a GroupBy object (a DataFrameGroupBy for DataFrame input) that contains information about the different groups.
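As a rough illustration of the return value, the sketch below inspects the type of the returned object. The data is a trimmed variant of the marks table with integer values, which is an assumption made for this example.

```python
import pandas as pd

# Trimmed variant of the marks table, with marks as integers (an assumption)
df = pd.DataFrame({
    "Name": ["Rohit", "Arun", "Sohit", "Arun", "Shubh"],
    "maths": [93, 63, 74, 94, 83],
    "science": [88, 55, 66, 94, 35],
})

grouped = df.groupby("Name")

# Grouping a DataFrame gives a DataFrameGroupBy object
frame_type = type(grouped).__name__

# Selecting a single column narrows it to a SeriesGroupBy object
series_type = type(grouped["maths"]).__name__

# ngroups reports how many distinct groups were found
group_count = grouped.ngroups

print(frame_type, series_type, group_count)
```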

Example 1

import pandas as pd

dataset = {
 'Name': ['Rohit', 'Arun', 'Sohit', 'Arun', 'Shubh'],
 'Roll no': ['01', '02', '03', '04', '05'],
 'maths': ['93', '63', '74', '94', '83'],
 'science': ['88', '55', '66', '94', '35'],
 'english': ['93', '74', '84', '92', '87']}

df = pd.DataFrame(dataset)
by_name = df.groupby(['Name'])
print(by_name)

Output

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x10e965250>

In the output, what is that DataFrameGroupBy thing? What print() shows is the object's default string representation, which doesn't give you much information about what it is or how it works.

The DataFrameGroupBy object can be challenging to wrap your head around because it’s lazy. It doesn’t do any operations to produce a helpful result until you say so.

One term frequently used alongside the .groupby() method is split-apply-combine. This refers to the chain of the following three steps:

  1. First, split a DataFrame into groups.
  2. Apply some operations to each of those smaller DataFrames.
  3. Combine the results.
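The three steps above can be sketched by hand and compared with the built-in shortcut. This is only an illustration under assumed data (integer marks in a trimmed table), not how Pandas implements the process internally.

```python
import pandas as pd

# Hypothetical marks table with integer values (an assumption for this sketch)
df = pd.DataFrame({
    "Name": ["Rohit", "Arun", "Sohit", "Arun", "Shubh"],
    "maths": [93, 63, 74, 94, 83],
})

# 1. Split: collect one sub-DataFrame per distinct name
pieces = {name: group for name, group in df.groupby("Name")}

# 2. Apply: compute the mean maths mark of each piece
applied = {name: group["maths"].mean() for name, group in pieces.items()}

# 3. Combine: assemble the per-group results into one Series
manual = pd.Series(applied, name="maths")

# The same computation expressed as a single groupby chain
builtin = df.groupby("Name")["maths"].mean()

print(manual)
print(builtin)
```

Both results hold the same values; the chained form simply lets Pandas carry out the three steps for you.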

It can be challenging to inspect df.groupby("Name") because it does virtually nothing until you do something with the resulting object. Again, the Pandas GroupBy object is lazy. It delays almost every part of the split-apply-combine process until you call a method.
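A short sketch of what "calling a method" looks like in practice: the grouped object is created once, and each method call produces a concrete result. The integer marks and the particular choice of aggregations are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical marks table with integer values (an assumption for this sketch)
df = pd.DataFrame({
    "Name": ["Rohit", "Arun", "Sohit", "Arun", "Shubh"],
    "maths": [93, 63, 74, 94, 83],
    "science": [88, 55, 66, 94, 35],
})

# Creating the GroupBy object produces no aggregated result yet
grouped = df.groupby("Name")

# Each method call below yields an actual Series or DataFrame
sizes = grouped.size()    # number of rows per group
means = grouped.mean()    # per-group mean of every remaining column
summary = grouped.agg({"maths": "max", "science": "min"})

print(sizes, means, summary, sep="\n\n")
```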

Example 2

import pandas as pd

dataset = {
  'Name': ['Rohit', 'Arun', 'Sohit', 'Arun', 'Shubh'],
  'Roll no': ['01', '02', '03', '04', '05'],
  'maths': ['93', '63', '74', '94', '83'],
  'science': ['88', '55', '66', '94', '35'],
  'english': ['93', '74', '84', '92', '87']}

df = pd.DataFrame(dataset)
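# Grouping by a plain string key yields plain group names when iterating
by_name = df.groupby('Name')

# Each iteration yields the group name and that group's sub-DataFrame
for name, group in by_name:
  print(f"First 2 entries for {name!r}")
  print("------------------------")
  print(group.head(2), end="\n\n")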

Output

First 2 entries for 'Arun'
------------------------
   Name Roll no maths science english
1  Arun      02    63      55      74
3  Arun      04    94      94      92

First 2 entries for 'Rohit'
------------------------
    Name Roll no maths science english
0  Rohit      01    93      88      93

First 2 entries for 'Shubh'
------------------------
    Name Roll no maths science english
4  Shubh      05    83      35      87

First 2 entries for 'Sohit'
------------------------
    Name Roll no maths science english
2  Sohit      03    74      66      84

If you’re working on a difficult aggregation problem, iterating over a Pandas GroupBy object can be a good way to visualize the split part of split-apply-combine.

A few other methods and properties let you look into the individual groups and their splits. For example, the .groups attribute will give you a dictionary of {group name: row labels} pairs, mapping each group to the index labels of the rows it contains.
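As a small sketch of inspecting groups without iterating, the example below reads the .groups attribute and pulls out one group with get_group(). The trimmed table is an assumption made to keep the example short.

```python
import pandas as pd

# Trimmed, hypothetical version of the marks table
df = pd.DataFrame({
    "Name": ["Rohit", "Arun", "Sohit", "Arun", "Shubh"],
    "Roll no": ["01", "02", "03", "04", "05"],
})

grouped = df.groupby("Name")

# .groups maps each group name to the index labels of its rows
groups = grouped.groups
arun_labels = list(groups["Arun"])

# get_group() returns the sub-DataFrame for a single group
arun_rows = grouped.get_group("Arun")

print(groups)
print(arun_rows)
```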

That’s it.
