NumPy arrays are optimized for homogeneous numerical data, while DataFrames are designed for columns of mixed data types. NumPy arrays are accessed by 0-based integer position, whereas DataFrames let you assign custom column names and row indices.
The main advantage of a DataFrame over an array is that it has column names and row indices, which make the data more readable and easier to manipulate. That is why converting an array to a DataFrame is often useful.
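To make the homogeneous versus mixed-type distinction concrete, here is a minimal sketch. The values and the column names (id, name, score) are made up for illustration only.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype: mixing an int, a string and a float
# makes NumPy store every element as a string.
mixed_arr = np.array([1, "a", 2.5])
print(mixed_arr.dtype)

# A DataFrame keeps a separate dtype for each column.
# The column names here are hypothetical examples.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "score": [89.5, 92.3],
})
print(df.dtypes)
```

The array ends up with one string dtype for everything, while the DataFrame reports an integer column, a text column, and a float column side by side.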
There are two main ways to convert a NumPy array to a Pandas DataFrame:
- Using pd.DataFrame()
- Using from_records()
Method 1: Using pd.DataFrame()
At its simplest, pd.DataFrame() accepts a NumPy array and returns a DataFrame. But what about column names and index labels? A NumPy array is just data and has no column names, which is why pd.DataFrame() provides parameters such as ‘data’, ‘columns’, and ‘index’.
Called with only the array, pd.DataFrame(arr) returns a DataFrame whose columns are labeled 0, 1, 2, and so on, and whose row index also starts at 0 (0, 1, 2, and so on).
2D array to DataFrame
Install the numpy and pandas libraries if they are not already installed, and import them at the start of the program like this:
import numpy as np
import pandas as pd
Let’s define a 2D array using np.array() function:
arr_2d = np.array([[11, 12, 13], [41, 51, 61]])
Now, pass the 2D array to the pd.DataFrame() function:
df = pd.DataFrame(arr_2d)
Here is the complete code:
import numpy as np
import pandas as pd
arr_2d = np.array([[11, 12, 13], [41, 51, 61]])
print(type(arr_2d))
df = pd.DataFrame(arr_2d)
print(df)
print(type(df))
Output
The output shows that the DataFrame has column labels 0, 1, and 2 and row indices 0 and 1. The data from the NumPy array is unchanged in the DataFrame.
The type() calls confirm that the object's type has changed from “numpy.ndarray” to “pandas.core.frame.DataFrame”.
Adding column names
The pd.DataFrame() function has an argument called “columns” that you can use to assign column names to the DataFrame during the conversion.
df = pd.DataFrame(arr_2d, columns=['col1', 'col2', 'col3'])
Output
The output shows that the column labels are now “col1”, “col2”, and “col3”.
Add Custom Index
We can also assign an index for each row using the “index” argument.
df = pd.DataFrame(arr_2d, columns=['col1', 'col2', 'col3'], index=['row1', 'row2'])
Output
The output shows the custom index labels row1 and row2 assigned to the rows of the DataFrame.
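Once the labels are in place, values can be looked up by name instead of by position. The following is a small sketch of that idea, reusing the same array and labels as above; df.loc and column selection with square brackets are standard pandas accessors.

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[11, 12, 13], [41, 51, 61]])
df = pd.DataFrame(
    arr_2d,
    columns=['col1', 'col2', 'col3'],
    index=['row1', 'row2'],
)

# Positional access on the raw array: second row, third column.
value_by_position = arr_2d[1, 2]

# Label-based access on the DataFrame: same cell, addressed by name.
value_by_label = df.loc['row2', 'col3']

# Selecting a whole column by its name.
first_column = df['col1'].tolist()

print(value_by_position, value_by_label, first_column)
```

Both lookups return the same cell, but the labeled version stays readable even when the column order changes.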
1D array to DataFrame
If you are working with a one-dimensional numpy array and convert it into a DataFrame, the output will be a DataFrame with a single column.
import numpy as np
import pandas as pd
arr_1d = np.array([19, 21, 18])
print(type(arr_1d))
df = pd.DataFrame(arr_1d, columns=['col1'])
print(df)
print(type(df))
Output
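As a quick check of the single-column behavior, the sketch below inspects the shape of the result. It also shows one possible way to get a single row instead, by reshaping the array first; the column names c1, c2, and c3 are made up for this example.

```python
import numpy as np
import pandas as pd

arr_1d = np.array([19, 21, 18])

# Default behavior: each element becomes one row of a single column.
df_column = pd.DataFrame(arr_1d, columns=['col1'])
print(df_column.shape)

# Reshaping to 1 x 3 first turns the same values into a single row.
# The column names below are hypothetical.
df_row = pd.DataFrame(arr_1d.reshape(1, -1), columns=['c1', 'c2', 'c3'])
print(df_row.shape)
```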
Structured NumPy Arrays
If the input NumPy array is a structured array, meaning it has named fields (a structured dtype), the columns are named automatically after those fields instead of receiving 0-based labels.
import numpy as np
import pandas as pd
struct_arr = np.array([(1, 'a'), (2, 'b')], dtype=[('id', int), ('value', 'U1')])
print(struct_arr)
print(type(struct_arr))
df = pd.DataFrame(struct_arr)
print(df)
print(type(df))
Output
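To see where the column names come from, this short sketch compares the field names stored on the array's dtype with the columns of the resulting DataFrame. It reuses the same structured array as above.

```python
import numpy as np
import pandas as pd

struct_arr = np.array(
    [(1, 'a'), (2, 'b')],
    dtype=[('id', int), ('value', 'U1')],
)

# The field names live on the array's dtype.
field_names = struct_arr.dtype.names

# pd.DataFrame() reuses them as column labels.
df = pd.DataFrame(struct_arr)
column_names = list(df.columns)

print(field_names, column_names)
```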
Method 2: Using from_records()
If your input is an array of tuples or a list of tuples, the from_records() method converts each tuple into a row of the DataFrame. It can also infer column names from a structured dtype, just as we saw in the “Structured NumPy Arrays” section of this article.
For a regular 2D NumPy array, from_records() treats each row as a record. For a structured array with named fields, it uses those names as the columns.
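The list-of-tuples case can be sketched as follows. The records and column names are invented for illustration; since a plain list of tuples carries no field names, they are supplied through the columns argument.

```python
import pandas as pd

# Each tuple is one record (one row). The values are made-up sample data.
records = [(1, 'Alice', 89.5), (2, 'Bob', 92.3)]

# Plain tuples have no names attached, so pass the column names explicitly.
df = pd.DataFrame.from_records(records, columns=['id', 'name', 'score'])

print(df)
```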
Basic 2D Array Conversion using from_records()
Let’s define a 2D array and pass it to from_records(), which returns a DataFrame. Here the function receives arr_2d and columns (the column names).
import numpy as np
import pandas as pd
arr_2d = np.array([[11, 12, 13], [41, 51, 61]])
print(arr_2d)
print(type(arr_2d))
df = pd.DataFrame.from_records(arr_2d, columns=['A', 'B', 'C'])
print(df)
print(type(df))
Output
The output shows that we can easily convert a 2D NumPy array into a DataFrame using the pd.DataFrame.from_records() method.
Structured array with named fields
The from_records() function shines bright when your input is a structured array because column names are inferred automatically.
import numpy as np
import pandas as pd
# Define a structured array with named fields
dtype = [('id', int), ('name', 'U10'), ('score', float)]
structured_arr = np.array([(1, 'Alice', 89.5), (2, 'Bob', 92.3)], dtype=dtype)
print(structured_arr)
print(type(structured_arr))
df = pd.DataFrame.from_records(structured_arr)
print(df)
print(type(df))
Output
The code above defines a structured array with named fields. When we convert this array to a DataFrame, the named fields serve as the column names, and each tuple becomes one row of the DataFrame.
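As an optional extension that the walkthrough above does not use, from_records() also takes an index argument that can name one of the fields. The sketch below assumes you want the id field to act as the row index rather than as a regular column.

```python
import numpy as np
import pandas as pd

dtype = [('id', int), ('name', 'U10'), ('score', float)]
structured_arr = np.array([(1, 'Alice', 89.5), (2, 'Bob', 92.3)], dtype=dtype)

# Use the 'id' field as the row index; the remaining fields stay as columns.
df = pd.DataFrame.from_records(structured_arr, index='id')

print(df)

# Rows can now be looked up by their id value.
bob_name = df.loc[2, 'name']
print(bob_name)
```

This keeps the identifier out of the data columns, which can be convenient when rows are looked up by that identifier later on.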
1D Structured Array
When you convert a 1D structured array, each field becomes a column of the DataFrame.
import numpy as np
import pandas as pd
# 1D structured array
dtype = [('temp', float), ('humid', float)]
arr_1d = np.array([(25.5, 80.0), (26.1, 75.5)], dtype=dtype)
print(arr_1d)
print(type(arr_1d))
df = pd.DataFrame.from_records(arr_1d)
print(df)
print(type(df))
Output
Conclusion
If you are working with simple 2D arrays and want to convert them quickly to a DataFrame, with or without column names or indices, I recommend the “pd.DataFrame()” function.
If your input is an array or list of tuples, or you are working with structured arrays, use the “pd.DataFrame.from_records()” method. It can also be helpful when working with large datasets.
You can check out Converting Pandas DataFrame to Numpy Array.