Pandas DataFrame to_numpy: How to Convert DataFrame to Numpy

Pandas DataFrame.to_numpy() is a built-in method that converts a DataFrame to a NumPy array. A DataFrame is a two-dimensional, size-mutable, tabular data structure, and to_numpy() returns its values as a NumPy ndarray.

The dtype of the returned array is the common NumPy dtype of all the column types in the DataFrame.

For example, if the column dtypes are float16 and float32, then the resulting dtype will be float32.
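The common-dtype rule can be checked with a short sketch. The frame and its column names (a, b) are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one float16 column and one float32 column
df = pd.DataFrame({
    'a': np.array([1.0, 2.0], dtype='float16'),
    'b': np.array([3.0, 4.0], dtype='float32'),
})

arr = df.to_numpy()
print(df.dtypes)  # per-column dtypes of the DataFrame
print(arr.dtype)  # single common dtype of the resulting array
```

The array has one dtype for all of its elements, so the narrower float16 column is widened to float32.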

Syntax

DataFrame.to_numpy(dtype=None, copy=False)

Parameters

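The DataFrame.to_numpy() function takes the following two parameters. Newer pandas versions (1.1.0 and later) also accept a third parameter, na_value, which sets the value used for missing entries.

  1. dtype: The data type of the returned array, passed on to NumPy. (Example: str, int, 'float32')
  2. copy: A boolean value that defaults to False. Passing True ensures that the returned array is a copy and not a view on another array. Note that copy=False does not guarantee that no copy is made.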
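As a minimal sketch of the copy parameter, the snippet below modifies an array obtained with copy=True and checks that the DataFrame is left untouched. The frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame used only to illustrate copy=True
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

# copy=True guarantees a fresh array, so edits do not reach the DataFrame
arr = df.to_numpy(copy=True)
arr[0, 0] = 99

print(arr[0, 0])       # 99
print(df.loc[0, 'x'])  # 1
```

With the default copy=False, the result may or may not share memory with the DataFrame, so it is safer not to rely on either behavior.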

Return Value

The to_numpy() method returns a numpy array.

Example

The following program shows how DataFrame.to_numpy() works.

import pandas as pd

data = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020],
                     'month': [2, 3, 4, 5, 6, 7],
                     'day': [4, 5, 6, 7, 8, 9]})
data_numpy = data.to_numpy()
print(data_numpy)

Output

[[2015    2    4]
 [2016    3    5]
 [2017    4    6]
 [2018    5    7]
 [2019    6    8]
 [2020    7    9]]

In the above example, we created a DataFrame named data that contains year, month, and day columns.

Then, we converted it using to_numpy() and got a two-dimensional array with one row per DataFrame row and one column per DataFrame column.

You can check the type of the converted object using the type() function.

import pandas as pd

data = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020],
                     'month': [2, 3, 4, 5, 6, 7],
                     'day': [4, 5, 6, 7, 8, 9]})

print('The data type of data is: ', type(data))
data_numpy = data.to_numpy()
print('The data type of data_numpy is: ', type(data_numpy))

Output

The data type of data is:  <class 'pandas.core.frame.DataFrame'>
The data type of data_numpy is:  <class 'numpy.ndarray'>

You can see that the two objects have different types: data is a DataFrame, and data_numpy is a NumPy ndarray, so the to_numpy() function successfully converted the DataFrame to a NumPy array.
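Note that type() reports the class of the object, not the type of its elements. As a small sketch, the element type and the dimensions can be read from the array's dtype, shape, and ndim attributes. A shortened version of the frame above is used here.

```python
import pandas as pd

# Shortened version of the year/month/day frame
data = pd.DataFrame({'year': [2015, 2016],
                     'month': [2, 3],
                     'day': [4, 5]})
data_numpy = data.to_numpy()

print(data_numpy.dtype)  # element type of the array (an integer dtype)
print(data_numpy.shape)  # (2, 3)
print(data_numpy.ndim)   # 2
```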

Example 2: Write a program to show the working of DataFrame.to_numpy() on heterogeneous data.

See the following code.

import pandas as pd

data = pd.DataFrame({'science marks': [84, 77, 66, 44, 37, 89],
                     'maths marks': [62.5, 73.6, 84.3, 67.5, 56.9, 87.5]})
data_innumpy = data.to_numpy()
print(data_innumpy)

Output

[[84.  62.5]
 [77.  73.6]
 [66.  84.3]
 [44.  67.5]
 [37.  56.9]
 [89.  87.5]]

In the above code, we created a DataFrame that contains marks for science and maths.

The thing to notice here is that the science marks are integers, and the maths marks are floating-point numbers.

A NumPy array has a single dtype, so during conversion the integers are upcast to float, the common type that can hold both columns. That is why 84 is printed as 84. in the output.

When dealing with a lot of data, clean it first (for example, handle missing values and mixed column types), because these affect the dtype of the resulting array.
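The effect of mixed column types on the result can be seen in the following sketch. The second frame, with a name column, is a made-up example that is not part of the original data.

```python
import pandas as pd

# Integer and float columns: the common dtype is float64
numeric = pd.DataFrame({'science marks': [84, 77],
                        'maths marks': [62.5, 73.6]})

# Hypothetical frame mixing text and numbers: the common dtype is object
mixed = pd.DataFrame({'name': ['Ann', 'Bob'],
                      'science marks': [84, 77]})

print(numeric.to_numpy().dtype)  # float64
print(mixed.to_numpy().dtype)    # object
```

An object array stores generic Python objects, so it loses most of the speed benefits of NumPy. Selecting only the numeric columns before converting avoids this.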

Import CSV Data and convert it to numpy array

To import CSV data, you can use the read_csv() method, which reads the CSV file into a DataFrame.

This example uses a file named shows_data.csv. You can name the file whatever you like for your convenience.

In this example, we will get the values of the Title column for the first five rows.

import pandas as pd

data = pd.read_csv('shows_data.csv')
data.dropna(inplace=True)
shows = pd.DataFrame(data['Title'].head())
print(shows.to_numpy())

Output

[['Breaking Bad']
 ['Stranger Things']
 ['Money Heist']
 ['Sherlock']
 ['Better Call Saul']]

You can see that we only got the title of the first five shows in the numpy array.
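The output above is two-dimensional because the column was wrapped in a DataFrame before conversion. As a sketch, the snippet below compares that with calling to_numpy() on the column itself, which is a Series. The CSV content is made up and read from memory, so it only stands in for shows_data.csv.

```python
import io
import pandas as pd

# Made-up CSV content standing in for shows_data.csv
csv_text = "Title,Netflix\nShow A,1\nShow B,1\nShow C,0\n"
data = pd.read_csv(io.StringIO(csv_text))

as_series = data['Title'].head().to_numpy()   # Series -> 1-D array
as_frame = data[['Title']].head().to_numpy()  # DataFrame -> 2-D array

print(as_series.shape)  # (3,)
print(as_frame.shape)   # (3, 1)
```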

We can also pass the dtype argument to the to_numpy() function.

import pandas as pd

data = pd.read_csv('shows_data.csv')
data.dropna(inplace=True)
shows = pd.DataFrame(data['Netflix'].head())
print(shows.to_numpy(dtype='float32'))

Output

[[1.]
 [1.]
 [1.]
 [1.]
 [1.]]

You can see that the values in the output array are now floats (1.) because we requested the float32 data type.
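The same conversion can be tried without the CSV file. The sketch below uses a small made-up Netflix column of 0/1 flags and compares the default dtype with the requested one.

```python
import pandas as pd

# Hypothetical 0/1 flags standing in for the Netflix column
flags = pd.DataFrame({'Netflix': [1, 1, 0]})

default_arr = flags.to_numpy()               # keeps the integer dtype
float_arr = flags.to_numpy(dtype='float32')  # converts to float32

print(default_arr.dtype)
print(float_arr.dtype)  # float32
```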

So, to convert a Pandas DataFrame to a NumPy array, use the to_numpy() function.

That's it for the Pandas DataFrame to_numpy() example.

See Also

Pandas DataFrame to List

Pandas DataFrame to CSV

Pandas DataFrame to_json()
