Python Pandas: How To Remove Rows In DataFrame

In pandas, DataFrame.drop() is a built-in method that removes rows (or columns) from a DataFrame. To remove rows, pass it a single index label, or a list of index labels to remove several rows at once. Rows can also be removed by condition: first select the index labels of the rows that match, then pass those labels to drop().

How To Remove Rows In DataFrame

The pandas DataFrame class provides a drop() method that removes rows by their index labels and returns the result as a new DataFrame.
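For reference, the drop() parameters in recent pandas versions are labels, axis, index, columns, level, inplace, and errors. The following is a minimal sketch of the call forms most relevant to removing rows. The small frame and the variable names are made up for illustration, and keyword arguments are used throughout because newer pandas releases accept everything after labels only by keyword.

```python
import pandas as pd

# Hypothetical small frame, used only to illustrate the call forms.
demo = pd.DataFrame({'Series': ['The Witcher', 'Stranger Things', 'BoJack Horseman'],
                     'Name': ['Henry Cavil', 'Millie Brown', 'Will']},
                    index=['a', 'b', 'c'])

# Three equivalent ways to drop the row labelled 'c'.
by_positional_labels = demo.drop('c')            # labels, default axis=0 (rows)
by_labels_axis = demo.drop(labels='c', axis=0)   # the same call, spelled out
by_index_kw = demo.drop(index='c')               # index= always targets rows

# columns= (or axis=1) targets columns instead of rows.
without_name = demo.drop(columns='Name')

print(by_index_kw)
print(without_name)
```

The index= form is arguably the most readable for row removal, since it does not depend on remembering what axis=0 means.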

 

See the following code example.

# app.py

import pandas as pd

shows = [('The Witcher', 'Henry Cavil', 'Geralt'),
         ('Stranger Things', 'Millie Brown', 'Eleven'),
         ('BoJack Horseman', 'Will', 'BoJack'),
         ('Adventures of Sabrina', 'Kiernan Shipka', 'Spellman'),
         ('House of Cards', 'Kevin Spacey', 'Frank Underwood')]

df = pd.DataFrame(shows,
                  columns=['Series', 'Name', 'Character Name'],
                  index=['a', 'b', 'c', 'd', 'e'])

print(df)
print('------------------------------')
print("After dropping 'C indexed' row")
print('------------------------------')
print(df.drop('c'))

In the above code, we define a DataFrame with five rows and print it. Each row has an index label, so we can remove a particular row by passing its label to drop().

Here we remove the row with the index label ‘c’ and print the result. Note that drop() returns a new DataFrame; df itself is not modified.

Output

python3 app.py
                  Series            Name   Character Name
a            The Witcher     Henry Cavil           Geralt
b        Stranger Things    Millie Brown           Eleven
c        BoJack Horseman            Will           BoJack
d  Adventures of Sabrina  Kiernan Shipka         Spellman
e         House of Cards    Kevin Spacey  Frank Underwood
------------------------------
After dropping 'C indexed' row
------------------------------
                  Series            Name   Character Name
a            The Witcher     Henry Cavil           Geralt
b        Stranger Things    Millie Brown           Eleven
d  Adventures of Sabrina  Kiernan Shipka         Spellman
e         House of Cards    Kevin Spacey  Frank Underwood
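The example above prints the value returned by drop() rather than df itself. By default drop() leaves the original DataFrame untouched, which the sketch below makes explicit. The shortened frame and the variable names are illustrative assumptions, not part of the original listing.

```python
import pandas as pd

df = pd.DataFrame({'Series': ['The Witcher', 'Stranger Things', 'BoJack Horseman']},
                  index=['a', 'b', 'c'])

# drop() returns a new DataFrame; df still holds all three rows.
result = df.drop('c')
rows_before = len(df)
rows_in_result = len(result)
print(rows_before, rows_in_result)

# To keep the change, either reassign the result ...
reassigned = df.drop('c')

# ... or modify a frame in place, in which case drop() returns None.
in_place = df.copy()
return_value = in_place.drop('c', inplace=True)
print(return_value)
```

Reassigning the result is usually the easier form to reason about, because the in-place call returns None and cannot be chained.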

Remove Multiple rows in Pandas DataFrame

If we pass a list of index labels to the drop() function, it removes all of those rows.

See the following code.

# app.py

import pandas as pd

shows = [('The Witcher', 'Henry Cavil', 'Geralt'),
         ('Stranger Things', 'Millie Brown', 'Eleven'),
         ('BoJack Horseman', 'Will', 'BoJack'),
         ('Adventures of Sabrina', 'Kiernan Shipka', 'Spellman'),
         ('House of Cards', 'Kevin Spacey', 'Frank Underwood')]

df = pd.DataFrame(shows,
                  columns=['Series', 'Name', 'Character Name'],
                  index=['a', 'b', 'c', 'd', 'e'])

print(df)
print('------------------------------')
print("After dropping rows 'c', 'd', and 'e'")
print('------------------------------')
print(df.drop(['c', 'd', 'e']))

Output

python3 app.py
                  Series            Name   Character Name
a            The Witcher     Henry Cavil           Geralt
b        Stranger Things    Millie Brown           Eleven
c        BoJack Horseman            Will           BoJack
d  Adventures of Sabrina  Kiernan Shipka         Spellman
e         House of Cards    Kevin Spacey  Frank Underwood
------------------------------
After dropping rows 'c', 'd', and 'e'
------------------------------
            Series          Name Character Name
a      The Witcher   Henry Cavil         Geralt
b  Stranger Things  Millie Brown         Eleven

From the output, you can see that we have removed the three rows whose index labels are c, d, and e.

This is one way to remove a single row or multiple rows from a pandas DataFrame.
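One thing to watch for when passing a list: if any label in the list is not present in the index, drop() raises a KeyError by default. The sketch below shows this, along with the errors='ignore' option that skips missing labels. The label 'z' is a hypothetical label chosen because it does not exist in the frame.

```python
import pandas as pd

df = pd.DataFrame({'Series': ['The Witcher', 'Stranger Things', 'BoJack Horseman']},
                  index=['a', 'b', 'c'])

# 'z' is a made-up label that is not in df.index, so this call fails.
try:
    df.drop(['c', 'z'])
    raised = False
except KeyError:
    raised = True
print(raised)

# errors='ignore' drops the labels that exist and silently skips the rest.
trimmed = df.drop(['c', 'z'], errors='ignore')
print(trimmed)
```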

Delete rows based on condition on a column

As with a WHERE clause in SQL, we can also remove rows based on a condition on the values in a column.

See the following code.

# app.py

import pandas as pd

shows = [('The Witcher', 'Henry Cavil', 'Geralt'),
         ('Stranger Things', 'Millie Brown', 'Eleven'),
         ('BoJack Horseman', 'Will', 'BoJack'),
         ('Adventures of Sabrina', 'Kiernan Shipka', 'Spellman'),
         ('House of Cards', 'Kevin Spacey', 'Frank Underwood')]

df = pd.DataFrame(shows,
                  columns=['Series', 'Name', 'Character Name'],
                  index=['a', 'b', 'c', 'd', 'e'])

print(df)
print('------------------------------')
print("After dropping 'Spellman' row")
print('------------------------------')
index = df[df['Character Name'] == 'Spellman'].index
df.drop(index, inplace=True)
print(df)

In the above code, we get the index labels of the rows that satisfy the condition Character Name == ‘Spellman’.

index = df[df['Character Name'] == 'Spellman'].index

This gives an Index object containing the index labels of the rows in which the column ‘Character Name’ has the value ‘Spellman’. Here, that is the single label d.

Output

python3 app.py
                  Series            Name   Character Name
a            The Witcher     Henry Cavil           Geralt
b        Stranger Things    Millie Brown           Eleven
c        BoJack Horseman            Will           BoJack
d  Adventures of Sabrina  Kiernan Shipka         Spellman
e         House of Cards    Kevin Spacey  Frank Underwood
------------------------------
After dropping 'Spellman' row
------------------------------
            Series          Name   Character Name
a      The Witcher   Henry Cavil           Geralt
b  Stranger Things  Millie Brown           Eleven
c  BoJack Horseman          Will           BoJack
e   House of Cards  Kevin Spacey  Frank Underwood
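As an aside, the same rows can be removed without calling drop() at all, by keeping only the rows where the condition is false. The sketch below compares the two approaches on a shortened version of the data; the variable names are illustrative.

```python
import pandas as pd

shows = [('The Witcher', 'Henry Cavil', 'Geralt'),
         ('Adventures of Sabrina', 'Kiernan Shipka', 'Spellman'),
         ('House of Cards', 'Kevin Spacey', 'Frank Underwood')]

df = pd.DataFrame(shows,
                  columns=['Series', 'Name', 'Character Name'],
                  index=['a', 'd', 'e'])

# Boolean Series: True for the rows we want to remove.
mask = df['Character Name'] == 'Spellman'

# Approach from the article: look up the matching labels, then drop them.
dropped = df.drop(df[mask].index)

# Alternative: keep every row where the mask is False.
filtered = df[~mask]

print(dropped.equals(filtered))
```

The two results match here because every index label is unique. If the index contained duplicate labels, drop() would remove every row sharing a matched label, while the mask would remove only the rows that actually satisfy the condition.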

Drop rows that match any one of multiple conditions

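Let’s delete all rows for which the column ‘Character Name’ has the value ‘BoJack’ or the column ‘Name’ has the value ‘Will’. The conditions are combined with |, so a row is removed if it satisfies either one.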

See the following code.

# app.py

import pandas as pd

shows = [('The Witcher', 'Henry Cavil', 'Geralt'),
         ('Stranger Things', 'Millie Brown', 'Eleven'),
         ('BoJack Horseman', 'Will', 'BoJack'),
         ('Adventures of Sabrina', 'Kiernan Shipka', 'Spellman'),
         ('House of Cards', 'Kevin Spacey', 'Frank Underwood')]

df = pd.DataFrame(shows,
                  columns=['Series', 'Name', 'Character Name'],
                  index=['a', 'b', 'c', 'd', 'e'])

print(df)
print('------------------------------')
print("After dropping 'BoJack' row")
print('------------------------------')
indexNames = df[(df['Character Name'] == 'BoJack')
                | (df['Name'] == 'Will')].index
df.drop(indexNames, inplace=True)
print(df)

Output

python3 app.py
                  Series            Name   Character Name
a            The Witcher     Henry Cavil           Geralt
b        Stranger Things    Millie Brown           Eleven
c        BoJack Horseman            Will           BoJack
d  Adventures of Sabrina  Kiernan Shipka         Spellman
e         House of Cards    Kevin Spacey  Frank Underwood
------------------------------
After dropping 'BoJack' row
------------------------------
                  Series            Name   Character Name
a            The Witcher     Henry Cavil           Geralt
b        Stranger Things    Millie Brown           Eleven
d  Adventures of Sabrina  Kiernan Shipka         Spellman
e         House of Cards    Kevin Spacey  Frank Underwood
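When the conditions all test the same column against different values, Series.isin() can stand in for a chain of | comparisons. The following is a minimal sketch under that assumption; the list of values is chosen only for illustration.

```python
import pandas as pd

shows = [('The Witcher', 'Henry Cavil', 'Geralt'),
         ('Stranger Things', 'Millie Brown', 'Eleven'),
         ('BoJack Horseman', 'Will', 'BoJack'),
         ('Adventures of Sabrina', 'Kiernan Shipka', 'Spellman'),
         ('House of Cards', 'Kevin Spacey', 'Frank Underwood')]

df = pd.DataFrame(shows,
                  columns=['Series', 'Name', 'Character Name'],
                  index=['a', 'b', 'c', 'd', 'e'])

# Illustrative list of character names whose rows should be removed.
unwanted = ['BoJack', 'Spellman']

# isin() is True wherever the column value appears in the list.
index_names = df[df['Character Name'].isin(unwanted)].index
result = df.drop(index_names)
print(result)
```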

Remove rows based on multiple conditions on different columns

Let’s delete all rows for which the column ‘Character Name’ has the value ‘Eleven’ and the column ‘Series’ has the value ‘Stranger Things’.

See the following code.

# app.py

import pandas as pd

shows = [('The Witcher', 'Henry Cavil', 'Geralt'),
         ('Stranger Things', 'Millie Brown', 'Eleven'),
         ('BoJack Horseman', 'Will', 'BoJack'),
         ('Adventures of Sabrina', 'Kiernan Shipka', 'Spellman'),
         ('House of Cards', 'Kevin Spacey', 'Frank Underwood')]

df = pd.DataFrame(shows,
                  columns=['Series', 'Name', 'Character Name'],
                  index=['a', 'b', 'c', 'd', 'e'])

print(df)
print('------------------------------')
print("After dropping 'Eleven' row")
print('------------------------------')
indexNames = df[(df['Character Name'] == 'Eleven') &
                (df['Series'] == 'Stranger Things')].index
df.drop(indexNames, inplace=True)
print(df)

In the above case, we combine the conditions with &, and each condition is wrapped in its own parentheses.

A row is removed only if it satisfies both conditions; otherwise, it is kept.

Output

python3 app.py
                  Series            Name   Character Name
a            The Witcher     Henry Cavil           Geralt
b        Stranger Things    Millie Brown           Eleven
c        BoJack Horseman            Will           BoJack
d  Adventures of Sabrina  Kiernan Shipka         Spellman
e         House of Cards    Kevin Spacey  Frank Underwood
------------------------------
After dropping 'Eleven' row
------------------------------
                  Series            Name   Character Name
a            The Witcher     Henry Cavil           Geralt
c        BoJack Horseman            Will           BoJack
d  Adventures of Sabrina  Kiernan Shipka         Spellman
e         House of Cards    Kevin Spacey  Frank Underwood
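To see that & really requires both conditions, the sketch below compares a pair of conditions that one row satisfies with a pair that no row satisfies in full. The second pair ('Eleven' together with 'The Witcher') is a made-up combination used only to produce an empty match.

```python
import pandas as pd

shows = [('The Witcher', 'Henry Cavil', 'Geralt'),
         ('Stranger Things', 'Millie Brown', 'Eleven'),
         ('BoJack Horseman', 'Will', 'BoJack')]

df = pd.DataFrame(shows,
                  columns=['Series', 'Name', 'Character Name'],
                  index=['a', 'b', 'c'])

# Row 'b' satisfies both conditions, so its label is selected.
both = df[(df['Character Name'] == 'Eleven') &
          (df['Series'] == 'Stranger Things')].index

# No row satisfies both of these, so the resulting Index is empty.
only_one = df[(df['Character Name'] == 'Eleven') &
              (df['Series'] == 'The Witcher')].index

print(list(both))
print(list(only_one))

# Dropping an empty Index removes nothing.
unchanged = df.drop(only_one)
print(len(unchanged))
```

The parentheses around each comparison matter: & binds more tightly than == in Python, so leaving them out changes how the expression is evaluated.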

Conclusion

The pandas DataFrame drop() function removes rows by their index labels. To remove rows by condition, we first select the index labels of the rows for which the conditions hold and then pass those labels to drop().

See also

How to add rows in Pandas dataFrame

Pandas set_index()

Pandas boolean indexing

Pandas sort_values()

Pandas value_counts()
