How to Create and Check an Empty DataFrame in Pandas

What is an empty DataFrame? It is a DataFrame that has no rows, although it can still have column names.

Creating an empty DataFrame

The simplest way to create an empty DataFrame in Pandas is to call pd.DataFrame() with no arguments. The result has no rows and no columns, so you can use it as a placeholder for future data.

import pandas as pd

df = pd.DataFrame()

print(df)

# Output:
# Empty DataFrame
# Columns: []
# Index: []

Empty DataFrame with columns

Sometimes you want the column names in place up front, so that data can be added by column name later.

Passing columns adds a small overhead for building the column index, which is negligible in practice. In return, the structure of the DataFrame is defined from the start. Note that columns created this way default to the object dtype.

import pandas as pd

df = pd.DataFrame(columns=['ID', 'Category', 'Product'])

print(df)

# Output:
# Empty DataFrame
# Columns: [ID, Category, Product]
# Index: []

With columns and data types

If you want to enforce data types before any data arrives, initialize the empty DataFrame with both column names and dtypes by passing an empty pd.Series with the desired dtype for each column.

import pandas as pd

df = pd.DataFrame({
    'ID': pd.Series(dtype='int'),
    'Category': pd.Series(dtype='str'),
    'Product': pd.Series(dtype='str')
})

print(df)

# Output:
# Empty DataFrame
# Columns: [ID, Category, Product]
# Index: []
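
To confirm that the dtypes were actually applied, you can inspect df.dtypes on the empty DataFrame. The sketch below is a minimal illustration; the Price and Added columns are hypothetical and are not part of the example above.

```python
import pandas as pd

# Hypothetical schema: one integer, one float, and one datetime column
df = pd.DataFrame({
    "ID": pd.Series(dtype="int64"),
    "Price": pd.Series(dtype="float64"),
    "Added": pd.Series(dtype="datetime64[ns]"),
})

print(df.dtypes)  # one dtype per column, even though there are no rows
print(df.empty)   # True
```

Checking dtypes right after creation catches schema mistakes before any data is loaded.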

Using Structured NumPy Arrays

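If your schema is already described by a NumPy structured dtype (for example, fixed-width integers and strings), you can build the empty DataFrame from an empty structured array. The field names become the column names.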

import numpy as np
import pandas as pd

dtype = np.dtype([('ID', 'i4'), ('Category', 'U20'), ('Product', 'U20')])
data = np.array([], dtype=dtype)
df = pd.DataFrame(data)

print(df)

# Output:
# Empty DataFrame
# Columns: [ID, Category, Product]
# Index: []
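
To see how pandas stores each field of the structured dtype, df.dtypes is a quick check. This sketch reuses the field names from the example above; the variable name schema is an assumption, and the way string fields are reported depends on the pandas version, so no dtype output is shown.

```python
import numpy as np
import pandas as pd

# Same fields as above: a 32-bit integer and two fixed-width strings
schema = np.dtype([("ID", "i4"), ("Category", "U20"), ("Product", "U20")])
df = pd.DataFrame(np.array([], dtype=schema))

print(list(df.columns))  # ['ID', 'Category', 'Product']
print(df.dtypes)         # how pandas stores each field
print(df.empty)          # True
```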

Checking if a DataFrame is Empty

The idiomatic way to check whether a DataFrame is empty is the df.empty attribute.

It runs in O(1) time because it only checks whether any axis (rows or columns) has length zero. It never inspects the values themselves.

import pandas as pd

df = pd.DataFrame()

is_empty = df.empty

if is_empty:
    print("Empty DataFrame")
else:
    print("Not empty")

# Output: Empty DataFrame

Checking row count

Another way is to check the number of rows explicitly with len(df), which returns zero when the DataFrame has no rows.

import pandas as pd

empty_df = pd.DataFrame()

if len(empty_df) == 0:
    print("Empty DataFrame")
else:
    print("Not empty")

# Output: Empty DataFrame

This check also runs in O(1) time, but it is less idiomatic than df.empty.
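
The two checks do not always agree. df.empty is True when either axis has length zero, while len(df) only counts rows. A minimal sketch of the difference, using hypothetical variable names:

```python
import pandas as pd

# Three index labels but no columns: rows exist, yet there is no data
no_columns = pd.DataFrame(index=[0, 1, 2])
print(len(no_columns))   # 3
print(no_columns.empty)  # True

# Column names but no rows: both checks agree
no_rows = pd.DataFrame(columns=["ID", "Category", "Product"])
print(len(no_rows))      # 0
print(no_rows.empty)     # True
```

If "empty" should mean "no rows" specifically, len(df) == 0 states that intent more precisely.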

Adding new Data to the DataFrame

After initializing an empty DataFrame, you can fill it with data using the pd.concat() function.

import pandas as pd

# Create an empty DataFrame with predefined columns and dtypes
df = pd.DataFrame({
    'ID': pd.Series(dtype='int'),
    'Category': pd.Series(dtype='str'),
    'Product': pd.Series(dtype='str'),
})

# Check emptiness
print(df.empty)  # Output: True

# Add data later
new_data = [{'ID': 101, 'Category': "Gadget", 'Product': 'iPhone'}, 
            {'ID': 102, 'Category': "Gadget", 'Product': 'iPad'},
            {'ID': 103, 'Category': "Gadget", 'Product': 'iPod'},
            {'ID': 104, 'Category': "Gadget", 'Product': 'iMac'},
            {'ID': 105, 'Category': "Gadget", 'Product': 'MacBook'},
            {'ID': 106, 'Category': "Gadget", 'Product': 'MacBook Pro'},
            {'ID': 107, 'Category': "Gadget", 'Product': 'MacBook Air'},
            ]
df = pd.concat([df, pd.DataFrame(new_data)], ignore_index=True)

# Re-check emptiness
print(df.empty)  
# Output: False

print(df)
# Output:
#     ID  Category      Product
# 0  101    Gadget       iPhone
# 1  102    Gadget         iPad
# 2  103    Gadget         iPod
# 3  104    Gadget         iMac
# 4  105    Gadget      MacBook
# 5  106    Gadget  MacBook Pro
# 6  107    Gadget  MacBook Air
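
When rows arrive one at a time, calling pd.concat() inside a loop copies the whole DataFrame on every iteration. A common alternative is to collect the rows in a plain list and build the DataFrame once at the end. The sketch below assumes a hypothetical incoming list of records; it is one possible pattern, not the only one.

```python
import pandas as pd

columns = ["ID", "Category", "Product"]

# Hypothetical source of records, e.g. rows read from a file or an API
incoming = [
    (101, "Gadget", "iPhone"),
    (102, "Gadget", "iPad"),
    (103, "Gadget", "iPod"),
]

# Accumulate plain dicts instead of concatenating DataFrames in the loop
rows = []
for item_id, category, product in incoming:
    rows.append({"ID": item_id, "Category": category, "Product": product})

# Build the DataFrame once; columns= fixes the column order
df = pd.DataFrame(rows, columns=columns)

print(df.empty)  # False
print(len(df))   # 3
```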

Edge cases

DataFrame with NaN Values

What if your DataFrame only contains NA values? Does it count as an empty DataFrame? The df.empty attribute returns False if the DataFrame has rows, even if all values are NaN.

import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": [np.nan, np.nan, np.nan]})

if df.empty:
    print("Empty DataFrame")
    
else:
    print("Not empty DataFrame")

# Output: Not empty DataFrame

Even though every value is NaN, the DataFrame is not considered empty because it still has rows. To treat such a DataFrame as empty, drop the NaN rows first and check df.dropna().empty.

import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": [np.nan, np.nan, np.nan]})

is_empty = df.dropna().empty

if is_empty:
    print("Empty DataFrame")
else:
    print("Not empty DataFrame")

# Output: Empty DataFrame

The .dropna() method removes every row that contains a NaN value. Here that removes all three rows, so the result is empty and the .empty attribute returns True.
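
By default, dropna() drops a row if any of its values is NaN, so a DataFrame with partially filled rows can also look "empty" after the call. The following sketch, with a hypothetical two-column DataFrame, compares the default with how="all" and with a direct check on isna().

```python
import numpy as np
import pandas as pd

# Row 0 has one real value; row 1 is entirely NaN
df = pd.DataFrame({
    "Name": ["Alice", np.nan],
    "City": [np.nan, np.nan],
})

# Default how="any": both rows contain a NaN, so both are dropped
print(df.dropna().empty)           # True

# how="all": only rows where every value is NaN are dropped
print(df.dropna(how="all").empty)  # False

# Direct check: is every single value NaN?
print(df.isna().all().all())       # False
```

Use how="all" or the isna() check when a row with at least one real value should still count as data.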

Empty vs. None

df is None and df.empty answer different questions. If df is None, no DataFrame object exists at all, and accessing df.empty raises an AttributeError. If df.empty is True, the DataFrame object exists in memory but holds no rows.
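
When a function may return either None or a DataFrame, both cases can be handled in one check. The helper name is_missing_or_empty below is hypothetical, a small sketch rather than a pandas API. The last lines also show that a DataFrame cannot be used directly in a boolean context.

```python
from typing import Optional

import pandas as pd


def is_missing_or_empty(df: Optional[pd.DataFrame]) -> bool:
    """Return True if df is None or has no data (hypothetical helper)."""
    # The None test must come first, because None has no .empty attribute
    return df is None or df.empty


print(is_missing_or_empty(None))                         # True
print(is_missing_or_empty(pd.DataFrame()))               # True
print(is_missing_or_empty(pd.DataFrame({"ID": [101]})))  # False

# bool(df) and "if df:" are ambiguous for a DataFrame and raise ValueError
try:
    bool(pd.DataFrame())
except ValueError as exc:
    print("ValueError:", exc)
```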

Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.