What does "an empty DataFrame" mean? An empty DataFrame has no rows, although it can still have column names.
Creating an empty DataFrame
The simplest way to create an empty DataFrame in Pandas is to call pd.DataFrame() with no arguments. The result has no rows and no columns, and you can use it as a placeholder for future data.

    import pandas as pd

    df = pd.DataFrame()
    print(df)
    # Output:
    # Empty DataFrame
    # Columns: []
    # Index: []
Empty DataFrame with columns
Sometimes you want to define the column names up front so that data can be added later by column name.
This approach is slightly slower than creating a DataFrame with no columns because pandas has to build the column index. In return, the column names are defined from the start, which is often worth the small overhead.

    import pandas as pd

    df = pd.DataFrame(columns=['ID', 'Category', 'Product'])
    print(df)
    # Output:
    # Empty DataFrame
    # Columns: [ID, Category, Product]
    # Index: []
With columns and data types
If you want to enforce data types before adding any data, you can initialize an empty DataFrame with both column names and data types.

    import pandas as pd

    # One empty Series per column, each carrying the desired dtype
    df = pd.DataFrame({
        'ID': pd.Series(dtype='int'),
        'Category': pd.Series(dtype='str'),
        'Product': pd.Series(dtype='str')
    })
    print(df)
    # Output:
    # Empty DataFrame
    # Columns: [ID, Category, Product]
    # Index: []
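To confirm that the declared types are actually kept, you can inspect df.dtypes. The sketch below is illustrative only: it spells the dtypes as 'int64' and 'object' (an assumption, chosen so the result does not depend on platform or pandas version) and compares the typed frame with one that only declares column names.

```python
import pandas as pd

# One empty Series per column, each with an explicit dtype.
df = pd.DataFrame({
    'ID': pd.Series(dtype='int64'),
    'Category': pd.Series(dtype='object'),
    'Product': pd.Series(dtype='object'),
})

# The dtypes are kept even though there are no rows yet.
print(df.dtypes)

# For comparison: columns declared by name only get a generic default dtype.
untyped = pd.DataFrame(columns=['ID', 'Category', 'Product'])
print(untyped.dtypes)
```

Declaring the types early means a later type mismatch shows up as a visible dtype change rather than going unnoticed.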
Using Structured NumPy Arrays
If your schema is already described as a NumPy dtype, you can build the DataFrame from an empty structured NumPy array. The field names of the dtype become the column names.

    import numpy as np
    import pandas as pd

    # Structured dtype: a 32-bit integer and two fixed-width Unicode fields
    dtype = np.dtype([('ID', 'i4'), ('Category', 'U20'), ('Product', 'U20')])
    data = np.array([], dtype=dtype)
    df = pd.DataFrame(data)
    print(df)
    # Output:
    # Empty DataFrame
    # Columns: [ID, Category, Product]
    # Index: []
Checking if a DataFrame is Empty
An efficient way to check whether a DataFrame is empty is the df.empty attribute. It runs in constant time, O(1), because it only checks whether any axis has length zero (no rows or no columns) and never scans the data.

    import pandas as pd

    df = pd.DataFrame()
    is_empty = df.empty
    if is_empty:
        print("Empty DataFrame")
    else:
        print("Not empty")
    # Output: Empty DataFrame
Checking row count
Another way is to explicitly check the number of rows of the DataFrame. If it is empty, it has zero length.
    import pandas as pd

    empty_df = pd.DataFrame()
    if len(empty_df) == 0:
        print("Empty DataFrame")
    else:
        print("Not empty")
    # Output: Empty DataFrame

len(df) also runs in O(1) time, but it is less idiomatic than df.empty and only looks at the row count.
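The two checks are not always interchangeable. As a small sketch (the variable names no_rows and no_cols are made up for illustration), the example below shows that df.empty is True when either axis has length zero, while len(df) counts rows only.

```python
import pandas as pd

# Columns but no rows: both checks agree.
no_rows = pd.DataFrame(columns=['ID', 'Category'])
print(no_rows.empty, len(no_rows) == 0)   # True True

# Row labels but no columns: the checks disagree.
no_cols = pd.DataFrame(index=[0, 1, 2])
print(no_cols.empty, len(no_cols) == 0)   # True False
```

If "empty" should mean "no rows" in your code, len(df) == 0 states that intent more precisely; if it should mean "no cells at all", df.empty is the better fit.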
Adding new Data to the DataFrame
After initializing an empty DataFrame, you can fill it with data using the pd.concat() function.

    import pandas as pd

    # Create an empty DataFrame with predefined columns and dtypes
    df = pd.DataFrame({
        'ID': pd.Series(dtype='int'),
        'Category': pd.Series(dtype='str'),
        'Product': pd.Series(dtype='str'),
    })

    # Check emptiness
    print(df.empty)  # Output: True

    # Add data later
    new_data = [
        {'ID': 101, 'Category': "Gadget", 'Product': 'iPhone'},
        {'ID': 102, 'Category': "Gadget", 'Product': 'iPad'},
        {'ID': 103, 'Category': "Gadget", 'Product': 'iPod'},
        {'ID': 104, 'Category': "Gadget", 'Product': 'iMac'},
        {'ID': 105, 'Category': "Gadget", 'Product': 'MacBook'},
        {'ID': 106, 'Category': "Gadget", 'Product': 'MacBook Pro'},
        {'ID': 107, 'Category': "Gadget", 'Product': 'MacBook Air'},
    ]
    df = pd.concat([df, pd.DataFrame(new_data)], ignore_index=True)

    # Re-check emptiness
    print(df.empty)  # Output: False

    print(df)
    # Output:
    #     ID Category      Product
    # 0  101   Gadget       iPhone
    # 1  102   Gadget         iPad
    # 2  103   Gadget         iPod
    # 3  104   Gadget         iMac
    # 4  105   Gadget      MacBook
    # 5  106   Gadget  MacBook Air
    # 6  107   Gadget  MacBook Air
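When records arrive one at a time, calling pd.concat() for every record copies the whole DataFrame each time. A common alternative, sketched here with a hypothetical incoming list standing in for whatever produces your records, is to collect the rows in a plain Python list and build the DataFrame once at the end.

```python
import pandas as pd

columns = ['ID', 'Category', 'Product']

# Hypothetical records, e.g. produced one by one inside a loop.
incoming = [
    (101, 'Gadget', 'iPhone'),
    (102, 'Gadget', 'iPad'),
    (103, 'Gadget', 'iPod'),
]

# Accumulate plain dicts; appending to a list is cheap.
rows = []
for record in incoming:
    rows.append(dict(zip(columns, record)))

# Build the DataFrame in a single step.
df = pd.DataFrame(rows, columns=columns)
print(df.shape)  # (3, 3)
```

Passing columns explicitly keeps the column names in place even if rows turns out to be empty.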
Edge cases
DataFrame with NaN Values
What if your DataFrame only contains NA values? Does it count as an empty DataFrame? The df.empty attribute returns False if the DataFrame has rows, even if all values are NaN.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"Name": [np.nan, np.nan, np.nan]})
    if df.empty:
        print("Empty DataFrame")
    else:
        print("Not empty DataFrame")
    # Output: Not empty DataFrame

Even though every value is NaN, the DataFrame is not considered empty because it still has rows. If you want to treat it as empty, drop the NaN rows first and check df.dropna().empty.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"Name": [np.nan, np.nan, np.nan]})
    is_empty = df.dropna().empty
    if is_empty:
        print("Empty DataFrame")
    else:
        print("Not empty DataFrame")
    # Output: Empty DataFrame

The .dropna() method removes every row that contains a NaN value. Since all rows here contain NaN, the result has no rows, and the .empty attribute returns True.
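Note that dropna() drops a row if any of its values is NaN, which can be stricter than intended when there are several columns. The sketch below uses made-up Name and Score columns to compare the default with how="all", which drops only rows that are entirely NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann", np.nan, np.nan],
    "Score": [np.nan, np.nan, 3.0],
})

# Default how='any': a row is dropped if it has at least one NaN.
print(df.dropna().empty)            # True: every row has some NaN

# how='all': a row is dropped only if every value in it is NaN.
print(df.dropna(how="all").empty)   # False: rows 0 and 2 keep real values

# Alternative check that does not build a filtered copy.
print(df.isna().all().all())        # False: not every cell is NaN
```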
Empty vs. None
Checking df is None is different from checking df.empty. If df is None, no DataFrame object exists at all, and accessing df.empty raises an AttributeError. If df.empty is True, a DataFrame object does exist in memory, but it has zero rows (or zero columns).
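When a variable may be either None or a DataFrame, both conditions can be handled in one place. The helper below is a minimal sketch; has_data is a hypothetical name, not a pandas function.

```python
from typing import Optional

import pandas as pd


def has_data(df: Optional[pd.DataFrame]) -> bool:
    """Return True only if df exists and is not empty."""
    # Check for None first so .empty is never accessed on None.
    return df is not None and not df.empty


print(has_data(None))                       # False
print(has_data(pd.DataFrame()))             # False
print(has_data(pd.DataFrame({"ID": [1]})))  # True
```

The order of the two conditions matters: Python evaluates them left to right and stops at the first False, so df.empty is reached only for a real DataFrame.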