The set_index() method in Pandas sets one or more columns of a DataFrame as the index. By default, it returns a new DataFrame with the specified columns set as the index and leaves the original DataFrame unmodified.
Syntax
DataFrame.set_index(keys, drop=True, append=False,
inplace=False, verify_integrity=False)
It sets the DataFrame index (row labels) using one or more existing columns or arrays. By default, it returns a new DataFrame.
Parameters
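- keys: A column label, an array of the same length as the DataFrame (such as a Series or Index), or a list combining column labels and arrays.
- drop: If True (the default), removes the columns used as the new index from the data columns.
- append: If True, appends the columns to the existing index, creating a MultiIndex, instead of replacing it. Defaults to False.
- inplace: If True, modifies the DataFrame in place and returns None. Defaults to False.
- verify_integrity: If True, checks the new index for duplicates and raises a ValueError if any are found. Defaults to False.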
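The short sketch below exercises the non-default parameter values on the same sample data used in the example. The variable names (kept, appended, labelled, dup) and the extra 'code' index are illustrative choices, not part of the original article.

```python
import pandas as pd

# Same sample data as in the example
df = pd.DataFrame({'ID': [101, 102, 103],
                   'Name': ['Krunal', 'Ankit', 'Rushabh'],
                   'Age': [25, 30, 35]})

# drop=False: 'ID' becomes the index but also stays as a regular column
kept = df.set_index('ID', drop=False)
print(kept.columns.tolist())      # ['ID', 'Name', 'Age']

# append=True: 'ID' is added as a second level next to the default RangeIndex
appended = df.set_index('ID', append=True)
print(appended.index.nlevels)     # 2

# keys can also be an array of matching length; a pd.Index is used here
# because a plain list would be read as a list of column names
labelled = df.set_index(pd.Index(['a', 'b', 'c'], name='code'))
print(labelled.index.tolist())    # ['a', 'b', 'c']

# verify_integrity=True: duplicate index values raise a ValueError
dup = pd.DataFrame({'ID': [101, 101], 'Age': [25, 30]})
try:
    dup.set_index('ID', verify_integrity=True)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```

Without verify_integrity=True, the duplicate IDs would be accepted silently, so the check is worth enabling when the index is expected to be unique.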
Example
import pandas as pd
# Create a sample DataFrame
data = {'ID': [101, 102, 103],
'Name': ['Krunal', 'Ankit', 'Rushabh'],
'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Set the 'ID' column as the index
df_indexed = df.set_index('ID')
print("\nDataFrame after setting the 'ID' column as the index:")
print(df_indexed)
Output
Original DataFrame:
    ID     Name  Age
0  101   Krunal   25
1  102    Ankit   30
2  103  Rushabh   35
DataFrame after setting the 'ID' column as the index:
        Name  Age
ID
101   Krunal   25
102    Ankit   30
103  Rushabh   35
In this code example, we have a DataFrame with three columns: ID, Name, and Age.
We used the set_index() method to set the ‘ID’ column as the index.
The result is a new DataFrame with the ‘ID’ column as the index, while the original DataFrame remains unchanged.
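As a quick check of that claim, the sketch below rebuilds the same DataFrame and inspects both objects after the call. It uses only the data from the example above; the label lookup with .loc is an extra illustration of what the new index makes possible.

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102, 103],
                   'Name': ['Krunal', 'Ankit', 'Rushabh'],
                   'Age': [25, 30, 35]})
df_indexed = df.set_index('ID')

# The original still has 'ID' as a column and its default RangeIndex
print('ID' in df.columns)            # True
print(df.index.tolist())             # [0, 1, 2]

# The new DataFrame uses the 'ID' values as row labels
print(df_indexed.index.name)         # ID
print(df_indexed.loc[102, 'Name'])   # Ankit
```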
Set multiple columns as an index
You can set multiple columns as the index by passing a list of column names to the set_index() method.
import pandas as pd
# Create a sample DataFrame
data = {'ID': [101, 102, 103],
'Name': ['Krunal', 'Ankit', 'Rushabh'],
'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Set the 'ID' and 'Name' columns as the index
df_multi_indexed = df.set_index(['ID', 'Name'])
print("\nDataFrame after setting the 'ID' and 'Name' columns as the index:")
print(df_multi_indexed)
Output
Original DataFrame:
    ID     Name  Age
0  101   Krunal   25
1  102    Ankit   30
2  103  Rushabh   35
DataFrame after setting the 'ID' and 'Name' columns as the index:
             Age
ID  Name
101 Krunal    25
102 Ankit     30
103 Rushabh   35
In the above code example, we set both the ‘ID’ and ‘Name’ columns as the index, creating a MultiIndex DataFrame.
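Once the index has two levels, a row is addressed with a tuple holding one value per level. The sketch below assumes the same data as the example; reset_index() is shown as one way to turn the index levels back into ordinary columns.

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102, 103],
                   'Name': ['Krunal', 'Ankit', 'Rushabh'],
                   'Age': [25, 30, 35]})
df_multi_indexed = df.set_index(['ID', 'Name'])

# Each column passed to set_index() becomes one named index level
print(list(df_multi_indexed.index.names))             # ['ID', 'Name']

# Select a single value by passing a tuple with one label per level
print(df_multi_indexed.loc[(102, 'Ankit'), 'Age'])    # 30

# reset_index() moves both levels back into regular columns
print(df_multi_indexed.reset_index().columns.tolist())  # ['ID', 'Name', 'Age']
```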
If you want to modify the original DataFrame in place instead, set the inplace parameter to True. In that case, set_index() returns None rather than a new DataFrame:
df.set_index('ID', inplace=True)
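The sketch below makes the in-place behavior visible by capturing the return value. The name result is only for illustration. A common pitfall is writing df = df.set_index('ID', inplace=True), which would replace df with None.

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102, 103],
                   'Name': ['Krunal', 'Ankit', 'Rushabh'],
                   'Age': [25, 30, 35]})

# With inplace=True the DataFrame itself changes and the call returns None
result = df.set_index('ID', inplace=True)
print(result)                # None
print(df.index.name)         # ID
print(df.columns.tolist())   # ['Name', 'Age']
```

Reassigning the result, as in df = df.set_index('ID'), produces the same final DataFrame without using inplace and is the more common style.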
That’s it.