AppDividend
Latest Code Tutorials

Python Pandas iloc | How To Select Data in Pandas Using iloc


pandas.DataFrame.iloc is an indexer (a property, not a method) that selects rows and columns purely by integer position. It is useful when the DataFrame's index labels are something other than the default sequence 0, 1, 2, …, n-1, or when you don't know the index labels.

Rows can be extracted by their implicit integer position, which is not displayed when the DataFrame is printed.
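As a minimal sketch of that idea, the small DataFrame below (hypothetical data, not the CSV used later in this article) has string labels as its index, yet its rows can still be reached by position:

```python
import pandas as pd

# Hypothetical frame whose index labels are strings, not 0, 1, 2, ...
df = pd.DataFrame(
    {"Seasons": [3, 8, 4]},
    index=["Stranger Things", "Game of Thrones", "La Casa De Papel"],
)

first_row = df.iloc[0]    # row at position 0, whatever its label is
last_row = df.iloc[-1]    # negative positions count from the end

print(first_row.name)         # label of the row at position 0
print(last_row["Seasons"])    # value from the last row
```

The position is independent of the label, so the same call works no matter how the index was set.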

Pandas iloc

DataFrame.iloc[] selects rows and columns of a DataFrame. It is primarily integer-position based (from 0 to length-1 of the axis), but it also accepts a boolean array.

iloc raises an IndexError if a requested position is out of bounds. Slice indexers are the exception: they allow out-of-bounds positions and simply return whatever rows fall inside the range.
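The difference between a single out-of-bounds position and an out-of-bounds slice can be sketched as follows. The three-row DataFrame is a made-up example:

```python
import pandas as pd

# Hypothetical three-row frame (valid positions are 0, 1, 2)
df = pd.DataFrame({"a": [10, 20, 30]})

# A single out-of-bounds position raises IndexError
try:
    df.iloc[5]
    raised = False
except IndexError:
    raised = True

# Slices are clipped to the valid range instead of raising
clipped = df.iloc[1:10]   # rows at positions 1 and 2
empty = df.iloc[5:10]     # no rows in range, so an empty DataFrame

print(raised, len(clipped), len(empty))
```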

Syntax

pandas.DataFrame.iloc[row, column]

Allowed inputs are:

  1. An integer, e.g., 5.
  2. A list or array of integers, e.g., [4, 3, 0].
  3. A slice object with ints, e.g., 1:7.
  4. A boolean array.
  5. A callable that takes one argument (the calling Series or DataFrame) and returns valid output for indexing. This is useful in method chains, when you don't have a reference to the calling object but want to base your selection on some logic or value.
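The five input types listed above can be tried side by side. This is only a sketch on a made-up four-row DataFrame; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only to illustrate the input types
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]})

by_int = df.iloc[2]                                       # 1. integer -> one row as a Series
by_list = df.iloc[[3, 0]]                                 # 2. list of integers -> rows in that order
by_slice = df.iloc[1:3]                                   # 3. slice -> positions 1 and 2
by_mask = df.iloc[np.array([True, False, True, False])]   # 4. boolean array
by_callable = df.iloc[lambda x: [0, len(x) - 1]]          # 5. callable -> first and last row

print(by_list)
print(by_callable)
```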

There are two “arguments” to iloc:

  1. A row selector.
  2. A column selector.

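For example, assuming data is a DataFrame of contact records whose first two columns are first_name and last_name and whose last column is id: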

# Single selections using iloc and DataFrame
# Rows:
data.iloc[0] # first row of data frame (Aleshia Tomkiewicz) - Note a Series data type output.
data.iloc[1] # second row of data frame (Evan Zigomalas)
data.iloc[-1] # last row of data frame (Mi Richan)
# Columns:
data.iloc[:,0] # first column of data frame (first_name)
data.iloc[:,1] # second column of data frame (last_name)
data.iloc[:,-1] # last column of data frame (id)
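The note about the Series output is worth a quick check. In this sketch (with a made-up two-row DataFrame), a single integer returns a Series, while wrapping the same integer in a list returns a DataFrame:

```python
import pandas as pd

# Hypothetical frame; the column names only mimic the comments above
df = pd.DataFrame({"first_name": ["Ann", "Bob"], "id": [1, 2]})

row_series = df.iloc[0]        # single integer -> Series
row_frame = df.iloc[[0]]       # one-element list -> DataFrame with one row
col_series = df.iloc[:, 0]     # single integer -> Series
col_frame = df.iloc[:, [0]]    # one-element list -> DataFrame with one column

print(type(row_series).__name__, type(row_frame).__name__)
print(type(col_series).__name__, type(col_frame).__name__)
```

Using the list form is a simple way to keep a DataFrame when later code expects one.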

Multiple rows and columns can be selected with .iloc as well:

# Multiple row and column selections using iloc and DataFrame
data.iloc[0:5] # first five rows of dataframe
data.iloc[:, 0:2] # first two columns of data frame with all rows
data.iloc[[0,3,6,24], [0,5,6]] # 1st, 4th, 7th, 25th rows + 1st, 6th, 7th columns.
data.iloc[0:5, 5:8] # first 5 rows and the columns at positions 5, 6, 7, i.e. the 6th, 7th, and 8th columns (county -> phone1).
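Since the data object in the snippets above is not defined in this article, here is a self-contained sketch of combined row and column selection. It reuses the small TV-series sample that appears later in this article:

```python
import pandas as pd

# Same sample tuples used later in this article
series = [('Stranger Things', 3, 'Millie'),
          ('Game of Thrones', 8, 'Emilia'), ('La Casa De Papel', 4, 'Sergio'),
          ('Westworld', 3, 'Evan Rachel'), ('Stranger Things', 3, 'Millie'),
          ('La Casa De Papel', 4, 'Sergio')]
df = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor'])

# Rows at positions 0-1, columns at positions 1-2 (Seasons and Actor)
block = df.iloc[0:2, 1:3]

# Rows at positions 0 and 3, columns at positions 0 and 2 (Name and Actor)
picked = df.iloc[[0, 3], [0, 2]]

print(block)
print(picked)
```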

Example of iloc[]

In this example, we will use an external CSV file. We import the CSV file and read the file using the pandas read_csv() method.

You can download the CSV file from here.

Now, we will use the first 10 records of the CSV file in this example.

Then we will select the DataFrame rows using pandas.DataFrame.iloc[] method.

# app.py

import pandas as pd

# Read the data, using the first column (Region) as the index
data = pd.read_csv('100 Sales Records.csv', index_col=0)

# Display the first 10 rows; head() already returns a DataFrame
df = data.head(10)
print(df)

Output

python3 app.py
                                                 Country        Item Type Sales Channel Order Priority Order Date   Order ID  ... Units Sold  Unit Price  Unit Cost  Total Revenue  Total Cost  Total Profit
Region                                                                                                                        ...
Australia and Oceania                             Tuvalu        Baby Food       Offline              H  5/28/2010  669165933  ...       9925      255.28     159.42     2533654.00  1582243.50     951410.50
Central America and the Caribbean                Grenada           Cereal        Online              C  8/22/2012  963881480  ...       2804      205.70     117.11      576782.80   328376.44     248406.36
Europe                                            Russia  Office Supplies       Offline              L   5/2/2014  341417157  ...       1779      651.21     524.96     1158502.59   933903.84     224598.75
Sub-Saharan Africa                 Sao Tome and Principe           Fruits        Online              C  6/20/2014  514321792  ...       8102        9.33       6.92       75591.66    56065.84      19525.82
Sub-Saharan Africa                                Rwanda  Office Supplies       Offline              L   2/1/2013  115456712  ...       5062      651.21     524.96     3296425.02  2657347.52     639077.50
Australia and Oceania                    Solomon Islands        Baby Food        Online              C   2/4/2015  547995746  ...       2974      255.28     159.42      759202.72   474115.08     285087.64
Sub-Saharan Africa                                Angola        Household       Offline              M  4/23/2011  135425221  ...       4187      668.27     502.54     2798046.49  2104134.98     693911.51
Sub-Saharan Africa                          Burkina Faso       Vegetables        Online              H  7/17/2012  871543967  ...       8082      154.06      90.93     1245112.92   734896.26     510216.66
Sub-Saharan Africa                 Republic of the Congo    Personal Care       Offline              M  7/14/2015  770463311  ...       6070       81.73      56.67      496101.10   343986.90     152114.20
Sub-Saharan Africa                               Senegal           Cereal        Online              H  4/18/2014  616607081  ...       6593      205.70     117.11     1356180.10   772106.23     584073.87

[10 rows x 13 columns]

Now, let’s select the first row of the DataFrame using iloc[0].

# app.py

import pandas as pd

# Read the data, using the first column (Region) as the index
data = pd.read_csv('100 Sales Records.csv', index_col=0)

# Keep the first 10 rows, then print the row at position 0
df = data.head(10)
print(df.iloc[0])

Output

python3 app.py
Country                Tuvalu
Item Type           Baby Food
Sales Channel         Offline
Order Priority              H
Order Date          5/28/2010
Order ID            669165933
Ship Date           6/27/2010
Units Sold               9925
Unit Price             255.28
Unit Cost              159.42
Total Revenue     2.53365e+06
Total Cost        1.58224e+06
Total Profit           951410
Name: Australia and Oceania, dtype: object

Pandas iloc: pass row index and column index

Let's pass both a row position and a column position to iloc[]. This returns the single value stored at that location in the DataFrame. See the code below.

# app.py

import pandas as pd

# Sample data: (show name, number of seasons, actor)
series = [('Stranger Things', 3, 'Millie'),
          ('Game of Thrones', 8, 'Emilia'), ('La Casa De Papel', 4, 'Sergio'),
          ('Westworld', 3, 'Evan Rachel'), ('Stranger Things', 3, 'Millie'),
          ('La Casa De Papel', 4, 'Sergio')]

# Create a DataFrame object
df = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor'])

# Value at row position 4, column position 2
print(df.iloc[4, 2])

Output

python3 app.py
Millie

In the above example, iloc[4, 2] selects the value at row position 4 and column position 2.

Remember that DataFrame row and column positions start from 0, so this is the fifth row and the third column.

In the output, we get Millie because the row at position 4 is (Stranger Things, 3, Millie) and the column at position 2 is Actor.
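When the zero-based counting is easy to lose track of, one way to double-check which cell a pair of positions refers to is to look up the corresponding labels first. A small sketch on the same sample data (row_pos and col_pos are illustrative names):

```python
import pandas as pd

series = [('Stranger Things', 3, 'Millie'),
          ('Game of Thrones', 8, 'Emilia'), ('La Casa De Papel', 4, 'Sergio'),
          ('Westworld', 3, 'Evan Rachel'), ('Stranger Things', 3, 'Millie'),
          ('La Casa De Papel', 4, 'Sergio')]
df = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor'])

row_pos, col_pos = 4, 2

# Labels that sit at those positions
row_label = df.index[row_pos]      # index label of the fifth row
col_label = df.columns[col_pos]    # name of the third column

value = df.iloc[row_pos, col_pos]
print(row_label, col_label, value)
```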

How to select multiple rows with index in Pandas

In the following code example, multiple rows are extracted by passing a list of integer positions. Fetching a contiguous range of rows with a slice is covered in the next section.

See the following code.

# app.py

import pandas as pd

# Read the data, using the first column (Region) as the index
data = pd.read_csv('100 Sales Records.csv', index_col=0)

# Keep the first 10 rows, then select the rows at positions 2, 4, 6, and 8
df = data.head(10)
print(df.iloc[[2, 4, 6, 8]])

In the above code, we have passed a list of row positions as the argument to iloc[].

Output

python3 app.py
                                  Country        Item Type Sales Channel Order Priority Order Date   Order ID  Ship Date  Units Sold  Unit Price  Unit Cost  Total Revenue  Total Cost  Total Profit
Region
Europe                             Russia  Office Supplies       Offline              L   5/2/2014  341417157   5/8/2014        1779      651.21     524.96     1158502.59   933903.84     224598.75
Sub-Saharan Africa                 Rwanda  Office Supplies       Offline              L   2/1/2013  115456712   2/6/2013        5062      651.21     524.96     3296425.02  2657347.52     639077.50
Sub-Saharan Africa                 Angola        Household       Offline              M  4/23/2011  135425221  4/27/2011        4187      668.27     502.54     2798046.49  2104134.98     693911.51
Sub-Saharan Africa  Republic of the Congo    Personal Care       Offline              M  7/14/2015  770463311  8/25/2015        6070       81.73      56.67      496101.10   343986.90     152114.20
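The list does not have to be sorted or unique. As a sketch on the small TV-series sample used elsewhere in this article, rows come back in the order requested, and a repeated position yields a repeated row:

```python
import pandas as pd

series = [('Stranger Things', 3, 'Millie'),
          ('Game of Thrones', 8, 'Emilia'), ('La Casa De Papel', 4, 'Sergio'),
          ('Westworld', 3, 'Evan Rachel'), ('Stranger Things', 3, 'Millie'),
          ('La Casa De Papel', 4, 'Sergio')]
df = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor'])

# Positions are honoured in the order given, duplicates included
result = df.iloc[[2, 0, 2]]
print(result)
```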

Pandas iloc[] with Slice object

Let's pass a Python slice as the indexer and see the output. The slice 3:7 selects positions 3 through 6; the end position is excluded.

# app.py

import pandas as pd

# Read the data, using the first column (Region) as the index
data = pd.read_csv('100 Sales Records.csv', index_col=0)

# Keep the first 10 rows, then select the rows at positions 3 to 6
df = data.head(10)
print(df.iloc[3:7])

Output

python3 app.py
                                     Country        Item Type Sales Channel Order Priority Order Date   Order ID  Ship Date  Units Sold  Unit Price  Unit Cost  Total Revenue  Total Cost  Total Profit
Region
Sub-Saharan Africa     Sao Tome and Principe           Fruits        Online              C  6/20/2014  514321792   7/5/2014        8102        9.33       6.92       75591.66    56065.84      19525.82
Sub-Saharan Africa                    Rwanda  Office Supplies       Offline              L   2/1/2013  115456712   2/6/2013        5062      651.21     524.96     3296425.02  2657347.52     639077.50
Australia and Oceania        Solomon Islands        Baby Food        Online              C   2/4/2015  547995746  2/21/2015        2974      255.28     159.42      759202.72   474115.08     285087.64
Sub-Saharan Africa                    Angola        Household       Offline              M  4/23/2011  135425221  4/27/2011        4187      668.27     502.54     2798046.49  2104134.98     693911.51
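Slices passed to iloc[] follow ordinary Python slicing rules, so steps and negative positions work too. A short sketch on a made-up ten-row DataFrame:

```python
import pandas as pd

# Hypothetical frame whose single column equals the row position
df = pd.DataFrame({"n": range(10)})

middle = df.iloc[3:7]          # positions 3, 4, 5, 6 (end excluded)
every_other = df.iloc[::2]     # positions 0, 2, 4, 6, 8
last_three = df.iloc[-3:]      # positions 7, 8, 9
reversed_rows = df.iloc[::-1]  # all rows in reverse order

print(middle["n"].tolist())
print(last_three["n"].tolist())
```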

DataFrame.iloc[] with Python lambda function

iloc[] also accepts a callable, which is handy in method chains. The x passed to the lambda function is the DataFrame being sliced, and the lambda below selects the rows whose index label is even.

In this example, we won’t use external CSV data, and we will create the DataFrame from tuples.

See the following code.

# app.py

import pandas as pd

# Sample data: (show name, number of seasons, actor)
series = [('Stranger Things', 3, 'Millie'),
          ('Game of Thrones', 8, 'Emilia'), ('La Casa De Papel', 4, 'Sergio'),
          ('Westworld', 3, 'Evan Rachel'), ('Stranger Things', 3, 'Millie'),
          ('La Casa De Papel', 4, 'Sergio')]

# Create a DataFrame object
df = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor'])

# The lambda receives df and returns a boolean mask of even index labels
print(df.iloc[lambda x: x.index % 2 == 0])

Output

python3 app.py
               Name  Seasons   Actor
0   Stranger Things        3  Millie
2  La Casa De Papel        4  Sergio
4   Stranger Things        3  Millie

You can see that it returns the rows with even index labels. The lambda builds a boolean mask that is True for even labels and False for odd ones, and iloc[] keeps only the rows where the mask is True.
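The callable form is most useful inside a method chain, where the intermediate DataFrame has no name. The sketch below also shows a subtlety: x.index % 2 == 0 tests the index labels, which stop matching the positions once rows have been filtered out. The second lambda, which returns a list of positions, is an alternative written for this illustration:

```python
import pandas as pd

series = [('Stranger Things', 3, 'Millie'),
          ('Game of Thrones', 8, 'Emilia'), ('La Casa De Papel', 4, 'Sergio'),
          ('Westworld', 3, 'Evan Rachel'), ('Stranger Things', 3, 'Millie'),
          ('La Casa De Papel', 4, 'Sergio')]
df = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor'])

# After filtering, the remaining index labels are 1, 2, and 5
long_shows = df[df['Seasons'] > 3]

# Keeps rows whose LABEL is even (only label 2 here)
even_labels = long_shows.iloc[lambda x: x.index % 2 == 0]

# Keeps every other row by POSITION (positions 0 and 2, labels 1 and 5)
every_other = long_shows.iloc[lambda x: list(range(0, len(x), 2))]

print(even_labels)
print(every_other)
```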

Boolean / Logical indexing using .iloc

Let's pass a list of boolean values to iloc[] and see the output. The list must have one entry per row; rows marked True are kept and rows marked False are dropped.

# app.py

import pandas as pd

# Sample data: (show name, number of seasons, actor)
series = [('Stranger Things', 3, 'Millie'),
          ('Game of Thrones', 8, 'Emilia'), ('La Casa De Papel', 4, 'Sergio'),
          ('Westworld', 3, 'Evan Rachel'), ('Stranger Things', 3, 'Millie'),
          ('La Casa De Papel', 4, 'Sergio')]

# Create a DataFrame object
df = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor'])

# One boolean per row: keep the first four rows, drop the last two
print(df.iloc[[True, True, True, True, False, False]])

Output

python3 app.py
               Name  Seasons        Actor
0   Stranger Things        3       Millie
1   Game of Thrones        8       Emilia
2  La Casa De Papel        4       Sergio
3         Westworld        3  Evan Rachel
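In practice the mask is usually computed from a condition rather than typed by hand. A sketch of one way to do that: the comparison produces a boolean Series, and to_numpy() converts it into the plain boolean array that iloc[] expects. The same idea applies to the column axis.

```python
import pandas as pd

series = [('Stranger Things', 3, 'Millie'),
          ('Game of Thrones', 8, 'Emilia'), ('La Casa De Papel', 4, 'Sergio'),
          ('Westworld', 3, 'Evan Rachel'), ('Stranger Things', 3, 'Millie'),
          ('La Casa De Papel', 4, 'Sergio')]
df = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor'])

# Boolean array with one entry per row: True where Seasons > 3
mask = (df['Seasons'] > 3).to_numpy()
long_running = df.iloc[mask]

# A boolean list on the column axis: keep Name and Actor, drop Seasons
name_and_actor = df.iloc[:, [True, False, True]]

print(long_running)
print(name_and_actor.columns.tolist())
```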

Conclusion

There are many ways to select and index rows and columns from Pandas DataFrames.

  1. Selecting the data by row numbers (.iloc).
  2. Selecting the data by label or by a conditional statement (.loc)

We have only covered iloc[] here; loc[] will be covered in a separate article.

The iloc syntax is data.iloc[<row selection>, <column selection>]. It looks like R's data[rows, columns] but counts positions from 0 rather than 1, so it can be a source of confusion for R users.

The iloc indexer in pandas is used to select rows and columns by integer position, in the order that they appear in the DataFrame.

You can imagine that each row has a row number from 0 to the total number of rows minus one (data.shape[0] - 1), and iloc[] selects rows based on these numbers. The same applies to columns (ranging from 0 to data.shape[1] - 1).
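The relationship between shape and the valid positions can be checked with a short sketch on the sample data used earlier:

```python
import pandas as pd

series = [('Stranger Things', 3, 'Millie'),
          ('Game of Thrones', 8, 'Emilia'), ('La Casa De Papel', 4, 'Sergio'),
          ('Westworld', 3, 'Evan Rachel'), ('Stranger Things', 3, 'Millie'),
          ('La Casa De Papel', 4, 'Sergio')]
df = pd.DataFrame(series, columns=['Name', 'Seasons', 'Actor'])

n_rows, n_cols = df.shape

# The highest valid positions are one less than the shape values
last_row = df.iloc[n_rows - 1]
last_col = df.iloc[:, n_cols - 1]

# Negative positions reach the same data from the other end
same_row = last_row.equals(df.iloc[-1])
same_col = last_col.equals(df.iloc[:, -1])

print(n_rows, n_cols, same_row, same_col)
```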

That concludes this guide to selecting data with pandas iloc.
