Pandas DataFrame set_index() Method

The set_index() method in Pandas sets one or more existing columns of a DataFrame as its index (row labels). By default, it returns a new DataFrame with the specified columns as the index and leaves the original DataFrame unchanged.

Syntax

DataFrame.set_index(keys, drop=True, append=False, 
                    inplace=False, verify_integrity=False)

The method sets the DataFrame index (row labels) using one or more existing columns. By default, it returns a new object.

Parameters

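  1. keys: A column name, or a list of column names, to use as the index.
  2. drop: If True (the default), the columns used as the new index are removed from the DataFrame's columns.
  3. append: If True, the columns are appended to the existing index instead of replacing it. The default is False.
  4. inplace: If True, the DataFrame is modified in place and the method returns None. The default is False.
  5. verify_integrity: If True, the new index is checked for duplicates and an error is raised if any are found. The default is False.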
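
To make the optional parameters more concrete, here is a minimal sketch that exercises drop, append, and verify_integrity on a small DataFrame. The variable names (kept, appended, dup, raised) are illustrative only and do not come from the Pandas API.

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102, 103],
                   'Name': ['Krunal', 'Ankit', 'Rushabh'],
                   'Age': [25, 30, 35]})

# drop=False keeps 'ID' as a regular column in addition to using it as the index
kept = df.set_index('ID', drop=False)
print(list(kept.columns))      # ['ID', 'Name', 'Age']

# append=True adds 'ID' as a second index level next to the existing 0, 1, 2 index
appended = df.set_index('ID', append=True)
print(appended.index.nlevels)  # 2

# verify_integrity=True raises a ValueError when the new index contains duplicates
dup = pd.DataFrame({'ID': [101, 101], 'Age': [25, 30]})
try:
    dup.set_index('ID', verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)                  # True
```

Leaving verify_integrity at its default of False skips the duplicate check, so duplicate index values are accepted silently.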

Example

import pandas as pd

# Create a sample DataFrame
data = {'ID': [101, 102, 103], 
        'Name': ['Krunal', 'Ankit', 'Rushabh'], 
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Set the 'ID' column as the index
df_indexed = df.set_index('ID')

print("\nDataFrame after setting the 'ID' column as the index:")
print(df_indexed)

Output

Original DataFrame:
    ID     Name  Age
0  101   Krunal   25
1  102    Ankit   30
2  103  Rushabh   35

DataFrame after setting the 'ID' column as the index:
        Name  Age
ID
101   Krunal   25
102    Ankit   30
103  Rushabh   35

In this code example, we have a DataFrame with three columns: ID, Name, and Age.

We used the set_index() method to set the ‘ID’ column as the index.

The result is a new DataFrame with the ‘ID’ column as the index, while the original DataFrame remains unchanged.

Set multiple columns as an index

You can set multiple columns as the index by passing a list of column names to the set_index() method.

import pandas as pd

# Create a sample DataFrame
data = {'ID': [101, 102, 103], 
        'Name': ['Alice', 'Bob', 'Charlie'], 
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Set the 'ID' and 'Name' columns as the index
df_multi_indexed = df.set_index(['ID', 'Name'])

print("\nDataFrame after setting the 'ID' and 'Name' columns as the index:")
print(df_multi_indexed)

Output

Original DataFrame:
    ID     Name  Age
0  101    Alice   25
1  102      Bob   30
2  103  Charlie   35

DataFrame after setting the 'ID' and 'Name' columns as the index:
             Age
ID  Name
101 Alice     25
102 Bob       30
103 Charlie   35

In the above code example, we set both the ‘ID’ and ‘Name’ columns as the index, creating a MultiIndex DataFrame.
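
As a rough illustration of what the MultiIndex gives you, the sketch below inspects the index levels, selects one row by its (ID, Name) pair, and then undoes the operation. The use of reset_index() here is an addition for illustration and is not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102, 103],
                   'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})
df_multi_indexed = df.set_index(['ID', 'Name'])

# The index now has two named levels, one per column passed to set_index()
level_names = list(df_multi_indexed.index.names)
print(level_names)   # ['ID', 'Name']

# A full (ID, Name) tuple selects a single row; 'Age' picks the column
age_bob = df_multi_indexed.loc[(102, 'Bob'), 'Age']
print(age_bob)       # 30

# reset_index() moves the index levels back into ordinary columns
restored = df_multi_indexed.reset_index()
print(list(restored.columns))   # ['ID', 'Name', 'Age']
```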

If you want to modify the original DataFrame in place instead, set the inplace parameter to True. In that case the method returns None, so do not assign its result back to a variable:

df.set_index('ID', inplace=True)
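
The short sketch below shows what this looks like in practice. It assumes a small two-column DataFrame made up for the demonstration; the variable name result is only there to show the return value.

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102, 103], 'Age': [25, 30, 35]})

# With inplace=True the call changes df itself and returns None
result = df.set_index('ID', inplace=True)

print(result)            # None
print(df.index.name)     # ID
print(list(df.columns))  # ['Age']
```

A common mistake is writing df = df.set_index('ID', inplace=True), which replaces df with None.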

That’s it.
