Tuples are a natural fit for representing records, or rows, of a DataFrame: each element in the tuple corresponds to a specific field or column.
When you create a DataFrame from a list of tuples, each tuple becomes a single row, and the elements within it supply the values for that row's columns.
Here are two ways to create a DataFrame from a list of tuples:
- Using pd.DataFrame()
- Using from_records()
Method 1: Using pd.DataFrame()
The most common way to create a DataFrame in Pandas from almost any structure, including a list, is the pd.DataFrame() constructor.
If the list contains tuples or lists, each inner tuple or list becomes a row in the DataFrame.
    import pandas as pd

    list_of_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

    print("Before converting to data frame")
    print(list_of_tuples)

    # Each tuple becomes one row; `columns` names the fields.
    df = pd.DataFrame(list_of_tuples, columns=['col1', 'col2', 'col3'])

    print("After converting to data frame")
    print(df)
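Output

    Before converting to data frame
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
    After converting to data frame
       col1  col2  col3
    0     1     2     3
    1     4     5     6
    2     7     8     9

The output shows that each tuple in the list becomes an individual row of the DataFrame.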
While creating the DataFrame, we pass the columns argument, which sets the column labels of the final DataFrame. If it is omitted, pandas labels the columns with integers starting at 0.
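To illustrate the role of the columns argument, here is a minimal sketch that builds the same data without it and then assigns labels afterwards. The variable names df_default and default_labels are only illustrative.

```python
import pandas as pd

list_of_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Without `columns`, pandas falls back to integer labels.
df_default = pd.DataFrame(list_of_tuples)
default_labels = df_default.columns.tolist()
print(default_labels)  # [0, 1, 2]

# Labels can also be assigned after construction.
df_default.columns = ['col1', 'col2', 'col3']
renamed_labels = df_default.columns.tolist()
print(renamed_labels)  # ['col1', 'col2', 'col3']
```

Passing columns up front is usually tidier, but assigning the labels later gives the same result.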
Pros
- The pd.DataFrame() constructor works not only with lists of tuples but also with dictionaries, NumPy arrays, Series, and other DataFrames, so one call covers most input types.
- It needs very little code, which makes it easy to write and understand.
- It is an efficient and idiomatic way to create a DataFrame.
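As a rough illustration of that versatility, the sketch below feeds the same values to the constructor as a list of tuples, a dictionary of lists, and a 2-D NumPy array. The variable names are made up for this example.

```python
import numpy as np
import pandas as pd

columns = ['col1', 'col2', 'col3']

# Row-oriented input: one tuple per row.
from_tuples = pd.DataFrame([(1, 2, 3), (4, 5, 6), (7, 8, 9)], columns=columns)

# Column-oriented input: one list per column, keys become the labels.
from_dict = pd.DataFrame({'col1': [1, 4, 7], 'col2': [2, 5, 8], 'col3': [3, 6, 9]})

# 2-D NumPy array: one array row per DataFrame row.
from_array = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=columns)

print(from_tuples.values.tolist() == from_dict.values.tolist())   # True
print(from_tuples.values.tolist() == from_array.values.tolist())  # True
```

Note that a dictionary is read column by column, while tuples and array rows are read row by row.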
Cons
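- It can become slow and memory-hungry once the list is really large, because every tuple is a separate Python object that has to be converted.
- It offers nothing specific to record-like data, whereas from_records() adds options such as exclude and coerce_float for that case.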
Method 2: Using from_records()
The pd.DataFrame.from_records() class method is designed for record-like data: a list of tuples, a list of dictionaries, or a NumPy structured array. Each tuple in the list becomes a row in the DataFrame.
    import pandas as pd

    list_of_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

    print("Before converting to data frame")
    print(list_of_tuples)

    # from_records() treats each tuple as one record (row).
    df = pd.DataFrame.from_records(list_of_tuples, columns=['col1', 'col2', 'col3'])

    print("After converting to data frame")
    print(df)
Output

    Before converting to data frame
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
    After converting to data frame
       col1  col2  col3
    0     1     2     3
    1     4     5     6
    2     7     8     9

The result is the same as with the first approach: a DataFrame with the expected rows and columns.
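One way to confirm that the two approaches agree is to compare the results directly with DataFrame.equals(). This is a small sketch, and the variable names are only illustrative.

```python
import pandas as pd

list_of_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
columns = ['col1', 'col2', 'col3']

df_constructor = pd.DataFrame(list_of_tuples, columns=columns)
df_records = pd.DataFrame.from_records(list_of_tuples, columns=columns)

# equals() checks values, labels, and dtypes together.
same = df_constructor.equals(df_records)
print(same)  # True
```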
Pros
- The from_records() method is designed for record-like data and has options aimed at it, such as index, exclude, and coerce_float.
- It works well with NumPy structured arrays, taking the column names from the field names of the array's dtype.
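The following sketch shows from_records() on a NumPy structured array, along with its index and exclude parameters. The field names id, score, and label are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

# A structured array carries its own field names and types.
records = np.array(
    [(1, 2.5, 'a'), (2, 3.5, 'b'), (3, 4.5, 'c')],
    dtype=[('id', 'i8'), ('score', 'f8'), ('label', 'U1')],
)

# Column names are taken from the dtype's field names.
df_all = pd.DataFrame.from_records(records)
print(df_all.columns.tolist())  # ['id', 'score', 'label']

# Use one field as the index and drop another while loading.
df_trimmed = pd.DataFrame.from_records(records, index='id', exclude=['label'])
print(df_trimmed.columns.tolist())  # ['score']
```

Doing the same with the plain constructor would take an extra set_index() or drop() step after loading.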
Cons
- It is more verbose than the pd.DataFrame() constructor.
- It is less versatile, since it is meant for record-like input only.
That’s it!