How to Convert a NumPy Array to a Pandas DataFrame

The main advantage of a DataFrame over an array is that a DataFrame has column names and row indices, which make the data more readable and easier to manipulate. That is the usual reason for converting an array to a DataFrame.

There are two main ways to convert a NumPy array to a Pandas DataFrame:

Using pd.DataFrame()
Using from_records()

Method 1: Using pd.DataFrame()

The pd.DataFrame() constructor accepts a NumPy array and returns a DataFrame. But what about column names and index labels? A NumPy array holds only data and has no column names, so pd.DataFrame() provides the “data”, “columns”, and “index” parameters for supplying them.

If you pass only the array, pd.DataFrame(arr) returns a DataFrame with columns labeled 0, 1, 2, and so on, and a row index that also starts at 0.

2D array to DataFrame

Install the numpy and pandas libraries if they are not already installed, then import them at the start of the program like this:

import numpy as np
import pandas as pd

Let’s define a 2D array using the np.array() function:

arr_2d = np.array([[11, 12, 13], [41, 51, 61]])

Now, pass the 2D array to the pd.DataFrame() function:

df = pd.DataFrame(arr_2d)

Here is the complete code:

import numpy as np
import pandas as pd

arr_2d = np.array([[11, 12, 13], [41, 51, 61]])
print(type(arr_2d))

df = pd.DataFrame(arr_2d)

print(df)
print(type(df))

Output

The output shows that the DataFrame has column labels 0, 1, and 2 and row indices 0 and 1. The data from the NumPy array is unchanged in the DataFrame.

The type() calls confirm that the object type has changed from “numpy.ndarray” to “pandas.core.frame.DataFrame”.

Adding column names

The pd.DataFrame() function has an argument called “columns” that you can use to assign column names to the DataFrame during conversion.

df = pd.DataFrame(arr_2d, columns=['col1', 'col2', 'col3'])

Output

The output shows that the columns are now labeled “col1”, “col2”, and “col3”.

Adding a custom index

We can also assign a label to each row using the “index” argument.

df = pd.DataFrame(arr_2d, columns=['col1', 'col2', 'col3'],
                          index=['row1', 'row2'])

Output

The output shows that the rows are now labeled with the custom index values row1 and row2.
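The two snippets above are fragments, so here is a minimal, self-contained sketch that combines “columns” and “index” in one call and then uses the labels for a lookup. The .loc lookup is an addition for illustration and is not part of the original example.

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[11, 12, 13], [41, 51, 61]])

# Assign column names and row labels in a single call
df = pd.DataFrame(
    arr_2d,
    columns=['col1', 'col2', 'col3'],
    index=['row1', 'row2'],
)

print(df)

# The labels can then be used for lookups (illustrative addition)
print(df.loc['row2', 'col3'])  # 61
```

Note that the number of column names must match the number of array columns, and the number of index labels must match the number of rows. Otherwise pandas raises a ValueError.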

1D array to DataFrame

If you are working with a one-dimensional numpy array and convert it into a DataFrame, the output will be a DataFrame with a single column.

import numpy as np
import pandas as pd

arr_1d = np.array([19, 21, 18])
print(type(arr_1d))

df = pd.DataFrame(arr_1d, columns=['col1'])

print(df)
print(type(df))

Output
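Running the code gives a DataFrame with three rows and a single column named col1. If you would rather have the values laid out as one row, one possible approach is to reshape the array before converting it. The sketch below assumes that is what you want, and the column names 'a', 'b', and 'c' are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

arr_1d = np.array([19, 21, 18])

# Default behavior: each element becomes a row in a single column
df_col = pd.DataFrame(arr_1d, columns=['col1'])
print(df_col.shape)  # (3, 1)

# Reshape to 1 row x 3 columns first (placeholder column names)
df_row = pd.DataFrame(arr_1d.reshape(1, -1), columns=['a', 'b', 'c'])
print(df_row.shape)  # (1, 3)
```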

Structured NumPy Arrays

If your input is a structured NumPy array, meaning its dtype has named fields, pd.DataFrame() uses the field names as column names instead of the default 0-based labels.

import numpy as np
import pandas as pd

struct_arr = np.array([(1, 'a'), (2, 'b')], dtype=[('id', int), ('value', 'U1')])
print(struct_arr)
print(type(struct_arr))

df = pd.DataFrame(struct_arr)

print(df)
print(type(df))

Output
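The resulting DataFrame has the columns id and value, taken from the field names of the structured dtype. As a quick, hedged check of that behavior, the sketch below compares the array's field names with the DataFrame's column labels. It reuses the same array as above.

```python
import numpy as np
import pandas as pd

struct_arr = np.array([(1, 'a'), (2, 'b')], dtype=[('id', int), ('value', 'U1')])

# Field names defined on the structured dtype
print(struct_arr.dtype.names)  # ('id', 'value')

df = pd.DataFrame(struct_arr)

# The same names show up as column labels
print(list(df.columns))  # ['id', 'value']

# Inspect the column dtypes pandas chose (varies by platform and version)
print(df.dtypes)
```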

Method 2: Using from_records()

If your input is a list of tuples or an array of records, the from_records() method converts each tuple into a row of the DataFrame. It also infers column names from a structured dtype, just as we saw in the “Structured NumPy Arrays” section above.

For a regular 2D NumPy array, from_records() treats each row as a record. For a structured array with named fields, it uses the field names as column names.
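To make the list-of-tuples case concrete, here is a minimal sketch. The records and column names are illustrative values, not data from a real source. It also shows that a plain 2D array passed without “columns” falls back to the default 0-based labels.

```python
import numpy as np
import pandas as pd

# A plain Python list of tuples (illustrative data); each tuple becomes a row
records = [(1, 'Alice'), (2, 'Bob')]
df = pd.DataFrame.from_records(records, columns=['id', 'name'])
print(df)

# A regular 2D array without column names gets default labels
arr_2d = np.array([[11, 12, 13], [41, 51, 61]])
df_default = pd.DataFrame.from_records(arr_2d)
print(list(df_default.columns))  # [0, 1, 2]
```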

Basic 2D Array Conversion using from_records()

Let’s define a 2D array and pass it to pd.DataFrame.from_records(), which returns a DataFrame. Here we pass the array as the data and a list of column names through the “columns” argument.

import numpy as np
import pandas as pd

arr_2d = np.array([[11, 12, 13], [41, 51, 61]])
print(arr_2d)
print(type(arr_2d))

df = pd.DataFrame.from_records(arr_2d, columns=['A', 'B', 'C'])

print(df)
print(type(df))

Output

The output shows that a 2D NumPy array can be converted into a DataFrame with the pd.DataFrame.from_records() method.

Structured array with named fields

The from_records() function shines bright when your input is a structured array because column names are inferred automatically.

import numpy as np
import pandas as pd

# Define a structured array with named fields
dtype = [('id', int), ('name', 'U10'), ('score', float)]
structured_arr = np.array([(1, 'Alice', 89.5), (2, 'Bob', 92.3)], dtype=dtype)
print(structured_arr)
print(type(structured_arr))

df = pd.DataFrame.from_records(structured_arr)

print(df)
print(type(df))

Output

The code above defines a structured array with named fields. When we convert it to a DataFrame, the field names become the column names, and each tuple becomes a row.
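Because from_records() knows the field names, it can also promote one of them to the row index through its “index” parameter. The sketch below is an optional extension of the example above, assuming you want the id field as the index rather than as a regular column.

```python
import numpy as np
import pandas as pd

dtype = [('id', int), ('name', 'U10'), ('score', float)]
structured_arr = np.array([(1, 'Alice', 89.5), (2, 'Bob', 92.3)], dtype=dtype)

# Use the 'id' field as the row index instead of a regular column
df = pd.DataFrame.from_records(structured_arr, index='id')

print(df)
print(df.index.name)     # id
print(list(df.columns))  # ['name', 'score']
```

With id as the index, a row can be selected by its id value, for example df.loc[2].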

1D Structured Array

When you convert a 1D structured array, each field becomes a column of the DataFrame.

import numpy as np
import pandas as pd

# 1D structured array
dtype = [('temp', float), ('humid', float)]
arr_1d = np.array([(25.5, 80.0), (26.1, 75.5)], dtype=dtype)

print(arr_1d)
print(type(arr_1d))

df = pd.DataFrame.from_records(arr_1d)

print(df)
print(type(df))

Output

Conclusion

If you are working with simple 2D arrays and want a quick conversion to a DataFrame, with or without column names or an index, I recommend the “pd.DataFrame()” function.

If your input is a list or array of tuples, or a structured array, use the “pd.DataFrame.from_records()” method. It is also helpful for large datasets.

You can check out Converting Pandas DataFrame to Numpy Array.


Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.