The pandas DataFrame.copy() method creates a copy of a DataFrame.
When working with pandas, it helps to understand the difference between assigning by reference and copying by value: a plain assignment such as df2 = df only binds a second name to the same object.
copy() addresses this by returning a new DataFrame object.
With the default deep copy, modifications to the data or indices of the copy are not reflected in the original DataFrame, and vice versa.
There are two types of copy in Pandas DataFrame:
- Deep Copy: A true deep copy means a new DataFrame is created; its data and index are independent of the original. Changes made to the copy won’t affect the original and vice versa. This is generally safer.
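- Shallow Copy: A shallow copy creates a new DataFrame object that shares the data and the index with the original instead of copying them. In pandas versions without Copy-on-Write, changing the data in place through either object also changes the other; with Copy-on-Write enabled (the default from pandas 3.0), a write triggers a copy first, so the change is no longer propagated. Use shallow copies with caution, typically only for specific performance needs.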
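One way to see the difference between the two kinds of copy is to check whether the copies share memory with the original. The following is a minimal sketch, assuming integer columns (so that to_numpy() returns a view rather than a converted array); the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

deep = df.copy()               # deep=True is the default
shallow = df.copy(deep=False)  # shares data with df

# Compare the underlying NumPy buffers of column "A".
deep_shares = np.shares_memory(df["A"].to_numpy(), deep["A"].to_numpy())
shallow_shares = np.shares_memory(df["A"].to_numpy(), shallow["A"].to_numpy())

print("deep copy shares memory:   ", deep_shares)
print("shallow copy shares memory:", shallow_shares)
```

The deep copy should report no shared memory, while the shallow copy should report that it still points at the original buffer.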
Syntax
DataFrame.copy(deep=True)
Parameters
Name | Description |
deep | Optional boolean, default True. True makes a deep copy; False makes a shallow copy. |
Return value
It returns a new DataFrame, a copy of the original DataFrame.
Example 1: Deep Copy
import pandas as pd
# Original DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Deep Copy
df_copy = df.copy()
# Modifying the copy
df_copy['A'] = [10, 20, 30]
print("The original DataFrame remains unchanged")
print(df)
print("Copied DataFrame")
print(df_copy)
Output
In this code, the original DataFrame (df) remains unchanged when column A of the deep copy (df_copy) is replaced.
Example 2: Shallow copy
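By default, copy() makes a deep copy of the DataFrame. A deep copy creates a new object with its own copy of the data and index. Unlike copy.deepcopy in the standard library, it does not recursively copy Python objects stored inside the DataFrame; only the references to them are copied.
However, you can make a shallow copy by setting the deep parameter to False. A shallow copy does not copy the data or the index; it only references those of the original.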
import pandas as pd
# Creating a DataFrame from a dictionary of lists
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Making a shallow copy of the DataFrame
df_shallow_copy = df.copy(deep=False)
# Changing a value of the original DataFrame
# (use .loc rather than chained indexing such as df['A'][0], which is unreliable)
df.loc[0, 'A'] = 100
# Printing both DataFrames
print("Original DataFrame:\n", df)
print("\nShallow Copy DataFrame:\n", df_shallow_copy)
Output
In pandas versions without Copy-on-Write, changing a value in the original DataFrame is also reflected in the shallow copy.
This happens because both objects share the same underlying data buffers; no values were copied. With Copy-on-Write enabled (the default from pandas 3.0), the shallow copy keeps its original values, because the write triggers a copy of the affected data first.
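Because the result depends on the pandas version and on whether Copy-on-Write is active, it can be useful to check the behavior in your own environment. The sketch below does not assume either outcome; it only reports what it observes. The variable name reflected is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
shallow = df.copy(deep=False)

# Write to the original through .loc (a single, non-chained assignment).
df.loc[0, "A"] = 100

# True without Copy-on-Write, False when Copy-on-Write is active.
reflected = bool(shallow.loc[0, "A"] == 100)

print("pandas version:", pd.__version__)
print("shallow copy saw the change:", reflected)
```

If the shallow copy did not see the change, the installed pandas is applying Copy-on-Write semantics, and shallow copies behave like lazily made independent copies.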
Important points
- For large DataFrames, a shallow copy is more memory-efficient if you only change the index or columns but not the data.
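- Take extra care when the DataFrame contains Python objects such as lists, dictionaries, or other DataFrames. Even copy(deep=True) copies only the references to these objects, not the objects themselves, so mutating such an object in place is visible through both DataFrames. If the nested objects must be independent, copy them explicitly, for example with copy.deepcopy from the standard library.
- A deep copy duplicates the data in memory, which is worth considering for very large DataFrames.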
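The point about nested objects can be illustrated with a column of lists. This is a minimal sketch; the column name tags and the element-wise copy.deepcopy workaround are assumptions chosen for illustration, not the only way to get independent nested objects.

```python
import copy

import pandas as pd

df = pd.DataFrame({"tags": [["a", "b"], ["c"]]})

# A deep copy duplicates the column's array of references, not the lists.
deep = df.copy()

# Mutate a list in place through the deep copy.
deep.loc[0, "tags"].append("z")
print(df.loc[0, "tags"])  # the original sees the appended "z" as well

# Assumed workaround: deep-copy each element with the standard library.
independent = df.copy()
independent["tags"] = independent["tags"].map(copy.deepcopy)
independent.loc[0, "tags"].append("only-here")
print(df.loc[0, "tags"])           # unchanged by the line above
print(independent.loc[0, "tags"])  # contains "only-here"
```

Copying each element costs extra time and memory, so it is usually reserved for columns that genuinely hold mutable objects.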