How to Compare Two DataFrames in Pandas

To compare two DataFrames in Pandas, use the DataFrame.compare() method. It checks each element against the element at the same row and column label in the other DataFrame and reports only the positions where the values differ.

This method requires the two DataFrames to have the exact same shape (same number of rows and columns) and identical row and column labels. If these conditions are not met, the method will raise an error.
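The labeling requirement above can be checked before calling compare(). The following is a minimal sketch; the helper name can_compare is hypothetical and not part of Pandas. It assumes that "identical labels" means the index and the columns match exactly, including their order.

```python
import pandas as pd


def can_compare(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True when both frames carry the same labels."""
    # Index.equals checks that the labels match element by element, in order
    return left.index.equals(right.index) and left.columns.equals(right.columns)


df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})
df3 = pd.DataFrame({'A': [1, 2], 'B': [4, 5]})  # one row fewer than df1

same_labels = can_compare(df1, df2)
different_labels = can_compare(df1, df3)
print(same_labels, different_labels)
```

A check like this lets a script skip or report mismatched inputs instead of stopping on the exception.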

This method is especially useful for identifying changes between two datasets that are supposed to be similar or identical.

Syntax

DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)

Parameters

Name         Description
other        The DataFrame to compare with.
align_axis   The axis on which the differences are laid out. With 1 (the default), values from the two DataFrames are placed side by side as columns. With 0, they are stacked as rows.
keep_shape   A boolean. If False (the default), rows and columns in which all values are equal are dropped from the result. If True, every row and column is kept.
keep_equal   A boolean. If False (the default), positions where the two DataFrames hold equal values are shown as NaN. If True, the equal values are shown instead.
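To see what each boolean parameter does on its own, the sketch below applies them one at a time. The data is made up for illustration: row 1 differs only in column A, so column B holds an equal value there.

```python
import pandas as pd

# Illustrative data: row 0 is identical, row 1 differs in A only, row 2 differs in both
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 9, 4], 'B': [4, 5, 7]})

# Defaults: the identical row 0 is dropped, and the equal value in B at row 1 is NaN
default = df1.compare(df2)

# keep_equal=True: the equal value in B at row 1 is shown instead of NaN
with_equal = df1.compare(df2, keep_equal=True)

# keep_shape=True: row 0 is kept, filled with NaN because nothing differs there
with_shape = df1.compare(df2, keep_shape=True)

print(default)
print(with_equal)
print(with_shape)
```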

Return value

This method returns a new DataFrame that shows the differences side by side.

With the default align_axis=1, the new DataFrame has MultiIndex columns. The first level holds the original column names, and the second level holds the labels self (the value from the calling DataFrame) and other (the corresponding value from the DataFrame passed as the argument).

The structure of the returned DataFrame depends on the keep_shape and align_axis parameters.
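The layout of the result can be inspected directly. This sketch uses the same two small DataFrames as the examples in this article and looks at the column labels, the rows that were kept, and one way to pull out only the values from the calling DataFrame.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

diff = df1.compare(df2)

# Each column label is a pair: (original column name, 'self' or 'other')
column_labels = diff.columns.tolist()

# By default only rows containing at least one difference are kept
changed_rows = diff.index.tolist()

# Take a cross-section of the second column level to get the 'self' values only
self_values = diff.xs('self', axis=1, level=1)

print(column_labels)
print(changed_rows)
print(self_values)
```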

Example 1: Comparing two DataFrames

import pandas as pd

# Creating two DataFrames for comparison
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# Comparing the DataFrames
diff = df1.compare(df2)
print(diff)

Output

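The result contains only row 2, the one row where the values differ: column A holds 3 under self and 4 under other, and column B holds 6 under self and 7 under other.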

Example 2: Keeping shape and equal values

import pandas as pd

# Creating two DataFrames for comparison
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# Comparing the DataFrames
diff = df1.compare(df2, keep_shape=True, keep_equal=True)
print(diff)

Output

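The result keeps all three rows and both columns. Rows 0 and 1, where the two DataFrames agree, show the same value under self and other instead of NaN.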

Example 3: Comparing with alignment along an index

import pandas as pd

# Creating two DataFrames for comparison
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# Compare with align_axis set to 0
result = df1.compare(df2, align_axis=0)
print(result)

Output

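With align_axis=0, the self and other values for row 2 are stacked as two rows under a MultiIndex in the index, and the columns keep their original names A and B.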

Example 4: DataFrames with Different Shapes

import pandas as pd

# Creating two DataFrames for comparison
df1 = pd.DataFrame({'A': [1, 2], 'B': [4, 5]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# Compare
result = df1.compare(df2)
print(result)

Output

ValueError: Can only compare identically-labeled (both index and columns) DataFrame objects

The ValueError is raised because df1 has two rows while df2 has three, so their index labels do not match.
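One possible way around this error is to restrict both DataFrames to the labels they share before comparing. This is a sketch of that idea, not a built-in option of compare(), and it assumes that ignoring the unmatched rows is acceptable for your use case.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [4, 5]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# Labels present in both DataFrames
common_index = df1.index.intersection(df2.index)
common_columns = df1.columns.intersection(df2.columns)

# Compare only the overlapping part
result = df1.loc[common_index, common_columns].compare(
    df2.loc[common_index, common_columns]
)

# Rows that exist only in df2 are not compared, so report them separately
only_in_df2 = df2.index.difference(df1.index)

print(result)
print(only_in_df2.tolist())
```

Here the overlapping rows are identical, so the comparison result is empty, and the extra row in df2 is reported through only_in_df2.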

Example 5: Comparing with non-overlapping columns

import pandas as pd

# Creating two DataFrames with non-overlapping columns
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [1, 2, 4], 'D': [4, 5, 7]})

# Compare
result = df1.compare(df2)
print(result)

Output

ValueError: Can only compare identically-labeled (both index and columns) DataFrame objects

The ValueError is raised because the column labels differ (A and B in df1, C and D in df2), even though both DataFrames have the same shape and the same index.
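If the differently named columns are known to hold the same kind of data, they can be renamed to match before comparing. The mapping below (C to A, D to B) is an assumption made for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [1, 2, 4], 'D': [4, 5, 7]})

# Assumed correspondence between the column names of the two DataFrames
column_mapping = {'C': 'A', 'D': 'B'}

# rename returns a new DataFrame; df2 itself is left unchanged
df2_renamed = df2.rename(columns=column_mapping)

result = df1.compare(df2_renamed)
print(result)
```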
