AppDividend
Latest Code Tutorials

Pandas Set Index Example | Python DataFrame.set_index()


This tutorial covers the Pandas DataFrame.set_index() method. set_index() sets the index (row labels) of a DataFrame using one or more existing columns, or arrays such as a Series or an Index of the correct length. A Pandas DataFrame is a two-dimensional labeled data structure whose columns can hold different types. You can think of it as an in-memory representation of a spreadsheet in Python.

An Index object is an immutable array of labels. Indexing lets us access a row or column by its label.

Pandas Set Index Example


The syntax for the Pandas Set Index is the following.

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index (row labels) using one or more existing columns. By default, the method returns a new object and leaves the original DataFrame unchanged.

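keys: A column label, or a list of column labels and/or arrays (such as a Series or an Index) of the same length as the DataFrame.
drop: A Boolean value. If True (the default), the columns used for the index are dropped from the DataFrame's columns.
append: If True, the new index is appended to the existing index instead of replacing it.
inplace: If True, the DataFrame is modified in place and the method returns None.
verify_integrity: If True, the new index is checked for duplicates, and a ValueError is raised if any are found.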
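
The effect of each parameter is easiest to see on a tiny frame. The following is only a minimal sketch: the column names and values are made up for illustration and are not taken from the tutorial's data file.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the parameters.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "team": ["red", "blue", "red"],
    "score": [10, 20, 30],
})

# drop=True (default): 'name' moves into the index and leaves the columns.
dropped = df.set_index("name")

# drop=False: 'name' becomes the index but also stays as a column.
kept = df.set_index("name", drop=False)

# append=True: 'name' is added as a second level to the existing index.
appended = df.set_index("name", append=True)

# verify_integrity=True: duplicate labels in the new index raise ValueError.
try:
    df.set_index("team", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

# inplace=True: the frame itself is modified and the call returns None.
modified = df.copy()
result = modified.set_index("name", inplace=True)
```

Here 'team' contains the label "red" twice, which is why the verify_integrity call fails while the others succeed.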

We will use real data, available in the following Google Sheets document. The code below assumes it has been saved locally as data.csv.

https://docs.google.com/spreadsheets/d/1zeeZQzFoHE2j_ZrqDkVJK9eF7OH1yvg75c8S-aBcxaU/edit#gid=0

Now, open the Jupyter Notebook and import the Pandas Library first.

Write the following code inside the first cell in Jupyter Notebook.

import pandas as pd

Run the cell by hitting Ctrl + Enter.

Okay, now we will use the Pandas read_csv() function, which reads a CSV file into a DataFrame. Write the following code in the next cell.

data = pd.read_csv('data.csv', skiprows=4)
data

Here, read_csv() skips the first four lines of the file and loads the rest, and evaluating data on the last line of the cell displays it. Run the cell and see the output. When a DataFrame has more rows than the display limit, Pandas shows a truncated view containing only the first and last rows (30 of each in the version used for this tutorial). Our data file has more than 29,000 rows, which is why only the first and last rows appear.
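
If you do not have the data file at hand, the behaviour of skiprows can be tried on an in-memory string. This is a minimal sketch; the file content below is invented and only mimics a file that starts with four lines of notes before the header row.

```python
import io
import pandas as pd

# Hypothetical file content: four lines of notes, then header and data.
csv_text = (
    "note 1\n"
    "note 2\n"
    "note 3\n"
    "note 4\n"
    "Athlete,Medal\n"
    "A. Runner,Gold\n"
    "B. Swimmer,Silver\n"
)

# skiprows=4 discards the first four lines; line five becomes the header.
data = pd.read_csv(io.StringIO(csv_text), skiprows=4)
```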


If you get the above output, then you have successfully imported the data.

Now, let’s check the type of the index object. Type the following code in the next cell.

type(data.index)

The output shows the class of the index object. Because read_csv() was called without an index_col argument, the DataFrame has the default integer index, a RangeIndex.

Remember that an Index object is immutable: we cannot change its individual labels in place. We can, however, replace the whole index of a DataFrame with a new one, which is what set_index() does.
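
Immutability applies to the labels inside an Index object: a single entry cannot be reassigned, although a DataFrame can be given a different index. A minimal sketch on a made-up one-column frame:

```python
import pandas as pd

df = pd.DataFrame({"sale": [55, 40, 84]})

# A frame built without an explicit index gets a RangeIndex.
has_range_index = isinstance(df.index, pd.RangeIndex)

# Assigning to a single position of an Index raises TypeError.
try:
    df.index[0] = 99
    item_assignment_failed = False
except TypeError:
    item_assignment_failed = True

# Replacing the whole index with a new one is allowed.
df.index = ["a", "b", "c"]
```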

#Pandas DataFrame set_index()

Now, we will set an index for the Pandas DataFrame using the set_index() method.

There are two ways to set the DataFrame index.

  1. Pass inplace=True to modify the current DataFrame directly.
  2. Assign the DataFrame returned by set_index() to a variable and work with that variable from then on.

Let’s start with the first way and use the Athlete column as the index.

Write the following code in the next cell and see the output.

data.set_index('Athlete',inplace=True)

Run the cell and now display the DataFrame using the following code in the next cell.

data

The output shows that the DataFrame is now indexed by athlete name.

Passing inplace=True means that the Athlete index is applied to the current DataFrame itself; the method returns None instead of a new DataFrame.

#Reset Index in Pandas DataFrame

We can use the reset_index() method to restore the default integer index; the Athlete labels move back into a regular column. Let’s see the following code.

data.reset_index(inplace=True)
data

The output shows the DataFrame with its default integer index again, and Athlete as an ordinary column.

Now, see the second way to use the set_index() method.

Write the following code in the next cell.

indexedData = data.set_index('Athlete')
indexedData

Here, we did not pass the inplace parameter, so set_index() returned a new DataFrame. We saved it in another variable, indexedData, and displayed it in the Jupyter Notebook; data itself is unchanged.

So, in this tutorial, we have seen both ways to use a column as the index, and how to reset that index using the reset_index() method.
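
Both ways, plus the reset, can be tried without the data file. The rows below are made up and merely stand in for the tutorial's data; only the Athlete column name comes from the steps above.

```python
import pandas as pd

# Made-up rows standing in for the tutorial's data file.
data = pd.DataFrame({
    "Athlete": ["A. Runner", "B. Swimmer"],
    "Medal": ["Gold", "Silver"],
})

# Second way: assign the result; 'data' itself is left unchanged.
indexed_data = data.set_index("Athlete")
unchanged = data.index.name is None

# First way: modify 'data' in place; the call returns None.
returned = data.set_index("Athlete", inplace=True)

# reset_index() moves 'Athlete' back into the columns.
restored = data.reset_index()
```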

#Other Examples of Python Set Index

Python is a popular language for data analysis, largely because of its ecosystem of data-centric packages.

Pandas is one of those packages, and it makes importing and analyzing data much easier.

The set_index() method sets the index of a DataFrame from existing columns, or from arrays such as a Series or an Index.

The index can also be set when the DataFrame is created. Sometimes, though, a DataFrame is built by combining two or more DataFrames, and in that case the index can be changed afterwards with set_index().
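
As a rough sketch of those two situations, the snippet below sets the index at creation time (through the constructor's index argument and through the index_col argument of read_csv()), and then sets it after combining two frames. The values and the in-memory CSV text are illustrative assumptions only.

```python
import io
import pandas as pd

# Option 1: pass index= to the DataFrame constructor.
df = pd.DataFrame(
    {"year": [2012, 2014], "sale": [55, 40]},
    index=pd.Index([1, 4], name="month"),
)

# Option 2: let read_csv() use a column as the index via index_col.
csv_text = "month,year,sale\n7,2013,84\n10,2014,31\n"
from_csv = pd.read_csv(io.StringIO(csv_text), index_col="month")

# Later: combine the frames, then choose the index on the combined result.
combined = pd.concat(
    [df.reset_index(), from_csv.reset_index()], ignore_index=True
)
combined = combined.set_index(["year", "month"])
```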

>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]})
>>> df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

Set the index to become the ‘month’ column:

>>> df.set_index('month')
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31

Create the MultiIndex using columns ‘year’ and ‘month’:

>>> df.set_index(['year', 'month'])
            sale
year  month
2012  1     55
2014  4     40
2013  7     84
2014  10    31

Create the MultiIndex using an Index and a column:

>>> df.set_index([pd.Index([1, 2, 3, 4]), 'year'])
         month  sale
   year
1  2012  1      55
2  2014  4      40
3  2013  7      84
4  2014  10     31

Create a MultiIndex using two Series:

>>> s = pd.Series([1, 2, 3, 4])
>>> df.set_index([s, s**2])
      month  year  sale
1 1       1  2012    55
2 4       4  2014    40
3 9       7  2013    84
4 16     10  2014    31

#Python Dataframe set_index not setting

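Let’s say you have a DataFrame and are trying to set its index to the Timestamp column, whose values look like 2019-10-02 15:42:00.

Currently, the index is just the row number. Calling df.set_index('Timestamp') on its own appears to do nothing, because the method returns a new DataFrame and leaves df unchanged.

To make the change stick, write the following code.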

df.set_index('Timestamp', inplace=True, drop=True)

You need to either specify inplace=True or assign the result to a variable.
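
The difference is easy to reproduce. In this minimal sketch the value column and the second timestamp are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": ["2019-10-02 15:42:00", "2019-10-02 15:43:00"],
    "value": [1.0, 2.0],
})

# The result is discarded, so df keeps its default integer index.
df.set_index("Timestamp")
index_name_before = df.index.name

# Assigning the result back (or using inplace=True) makes the change stick.
df = df.set_index("Timestamp")
index_name_after = df.index.name
```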

#How to convert index of pandas dataframe into column

Let’s see how to convert the index of a DataFrame into a column.

One option is to copy the index into a new column, here named sales. The index itself stays in place.

For that, we need to write the following code snippet.

df['sales'] = df.index

Or, we can use reset_index(), which moves the index into a column and restores the default integer index.

df.reset_index(level=0, inplace=True)
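
The two approaches differ in what happens to the index afterwards. A minimal sketch, assuming a small frame indexed by month (the column name month_col is hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {"year": [2012, 2014], "sale": [55, 40]},
    index=pd.Index([1, 4], name="month"),
)

# Copy the index into a regular column; the index itself is unchanged.
copied = df.copy()
copied["month_col"] = copied.index

# Move the index into a column; a fresh 0..n-1 index takes its place.
moved = df.reset_index()
```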

#Pandas set index to multiple columns

In this example, two columns are used together as the index.

The drop parameter controls whether those columns are removed from the DataFrame's columns, and the append parameter controls whether they are appended to the existing index instead of replacing it.

df.set_index(["Month", "Year"], inplace=True,
             append=True, drop=False)

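With df.reset_index(level=df.index.names, inplace=True), one can convert a whole MultiIndex into columns; omitting level has the same effect, since reset_index() resets all levels by default. Note that reset_index() raises a ValueError if a column with the same name already exists, which is the case for an index created with drop=False.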
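
The interaction between drop, append, and reset_index() can be checked on a small frame. This is a minimal sketch with made-up values; the Month and Year column names follow the snippet above.

```python
import pandas as pd

df = pd.DataFrame({
    "Month": [1, 4, 7],
    "Year": [2012, 2014, 2013],
    "sale": [55, 40, 84],
})

# Two columns become a two-level MultiIndex (drop=True by default).
two_level = df.set_index(["Year", "Month"])

# append=True keeps the existing index as the outermost level.
three_level = df.set_index(["Year", "Month"], append=True)

# reset_index() with no arguments turns every level back into a column.
flat = two_level.reset_index()

# With drop=False the columns still exist, so reset_index() cannot
# re-insert them and raises ValueError.
kept = df.set_index(["Year", "Month"], drop=False)
try:
    kept.reset_index()
    reset_failed = False
except ValueError:
    reset_failed = True
```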

