How to Convert Floats to Integers in a Pandas DataFrame

A simple and efficient way to convert floating-point numbers to integers in Pandas is the “.astype()” method. It truncates the decimal part of the float (toward zero) and doesn’t round. So, 2.99 becomes 2, not 3, and -2.99 becomes -2.
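
To make the difference between truncating and rounding concrete, here is a small sketch. The Series values are made up for illustration, and calling round() before the cast is just one possible way to get rounded integers if that is what you need.

```python
import pandas as pd

# Illustrative values, including a negative number
s = pd.Series([2.99, -2.99, 0.2])

# astype(int) drops the fractional part (truncation toward zero)
truncated = s.astype(int)

# Rounding first, then casting, gives the nearest integer instead
rounded = s.round().astype(int)

print(truncated.tolist())  # [2, -2, 0]
print(rounded.tolist())    # [3, -3, 0]
```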

Syntax

df['col'].astype(int)

Here, ‘df’ is the input DataFrame, ‘col’ is the column of that DataFrame, and ‘int’ is the integer type.
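
The same method also exists on the DataFrame itself, where it accepts a dictionary that maps column names to target types. The sketch below uses made-up column names ('a', 'b', 'c') to convert two columns in one call while leaving the third untouched.

```python
import pandas as pd

# Hypothetical DataFrame with two float columns and one text column
df = pd.DataFrame({'a': [1.5, 2.5], 'b': [3.9, 4.1], 'c': ['x', 'y']})

# Map each column to the integer type it should be cast to;
# columns not listed in the dictionary keep their current dtype
converted = df.astype({'a': 'int64', 'b': 'int32'})

print(converted['a'].tolist())  # [1, 2]
print(converted['b'].dtype)     # int32
```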

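But why do we need to convert a float to an integer? An integer column makes it explicit that the data holds whole numbers (counts, IDs, and so on), and it can save memory if you choose a narrower type: int32 uses 4 bytes per value, compared with 8 bytes for the default float64. Keep in mind that int64 also uses 8 bytes per value, so converting float64 to int64 does not reduce memory on its own.

If you are dealing with large datasets and the fractional part is not required, then you can convert the values into a suitable integer type.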
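
If memory is the motivation, it helps to measure it. The following sketch compares the bytes used by the same 1,000 values stored as float64, int64, and int32. The Series is synthetic, and the savings you see on real data depend on which integer width your values fit into.

```python
import numpy as np
import pandas as pd

# 1,000 synthetic whole-number values stored as float64
s = pd.Series(np.arange(1000, dtype="float64"))

# Bytes used by the values only (index excluded)
float_bytes = s.memory_usage(index=False)
int64_bytes = s.astype("int64").memory_usage(index=False)
int32_bytes = s.astype("int32").memory_usage(index=False)

print(float_bytes)  # 8000
print(int64_bytes)  # 8000
print(int32_bytes)  # 4000
```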

Basic Conversion

Let’s create a DataFrame and convert one of its columns from float to integer.

import pandas as pd

# Data to be used
tour_data = {'visits': [1.23, 4.56, 19.21],
             'place': ['Goa', 'Lakshadweep', 'Andaman']}

# Create a DataFrame
df = pd.DataFrame(tour_data)
print("Before conversion: ")
print(df)

# Convert the datatype of the column 'visits'
df['visits'] = df['visits'].astype(int)
print("After conversion: ")
print(df)

Output

After the conversion, the “visits” column contains the truncated integer values 1, 4, and 19.

Handling Missing Values

If a column has missing values (NaN), converting it with astype(int) raises an error, because NaN cannot be stored in a regular integer column. We have to handle these NaN values first.

import pandas as pd
import numpy as np

# Data to be used
tour_data = {'visits': [1.23, 4.56, np.nan],
             'place': ['Goa', 'Lakshadweep', 'Andaman']}

# Create a DataFrame
df = pd.DataFrame(tour_data)

# Fill NaN values with a suitable value (e.g., 0)
df['visits'] = df['visits'].fillna(0)
print(df['visits'].dtype)  # float64

# Convert the datatype of the column 'visits'
df['visits'] = df['visits'].astype(int)
print("After conversion: ")
print(df['visits'].dtype)  # int64

In this code, we first replaced the NaN value with 0 (here, you can choose whichever value you want based on your requirements) and then converted the column into an integer.
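
If replacing NaN with a placeholder would distort your data, one alternative is Pandas’ nullable integer type, written as the string "Int64" (capital I). The sketch below is a variation on the example above, not a replacement for it. It assumes you want truncation, so it calls np.trunc() first, because a direct cast to "Int64" rejects floats that still have a fractional part.

```python
import numpy as np
import pandas as pd

visits = pd.Series([1.23, 4.56, np.nan])

# A plain integer dtype cannot hold NaN, so this cast raises a ValueError
try:
    visits.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Truncate first, then cast to the nullable integer type;
# the missing entry is kept as <NA> instead of being filled
nullable_visits = np.trunc(visits).astype("Int64")

print(cast_failed)                   # True
print(nullable_visits.dtype)         # Int64
print(nullable_visits.isna().sum())  # 1
```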

Large Float Values (Potential Overflow)

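Sometimes, if your floating-point numbers are very large, they may not fit into a 32-bit integer. On some platforms (for example, Windows with NumPy versions earlier than 2.0), the standard int type maps to a 32-bit integer, so using it might lead to overflow errors or unexpected results.

You can request the np.int64 type explicitly to get a 64-bit integer regardless of the platform.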

import pandas as pd
import numpy as np

df = pd.DataFrame({'float_col': [2**31 - 0.5, 2**32 + 0.5]})

df['int_col_good'] = df['float_col'].astype(np.int64)

print(df['int_col_good'].dtype) # int64
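
Before casting to a narrower type, you can check whether the values actually fit. The helper below, fits_in(), is a hypothetical function written for this example (it is not part of Pandas or NumPy). It compares the truncated values against the limits reported by np.iinfo().

```python
import numpy as np
import pandas as pd

def fits_in(series, dtype):
    """Return True if every truncated value lies within the range of dtype."""
    info = np.iinfo(dtype)
    truncated = np.trunc(series)
    # Comparisons with NaN are False, so a column with NaN reports False
    return bool(((truncated >= info.min) & (truncated <= info.max)).all())

s = pd.Series([2**31 - 0.5, 2**32 + 0.5])

print(fits_in(s, np.int32))  # False, because 2**32 exceeds the int32 range
print(fits_in(s, np.int64))  # True
```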

Object Type Columns (Strings Representing Numbers)

Sometimes a numerical column in your dataset is read as strings. If that is the case, then you need to convert it to a numeric type first.

import pandas as pd

df = pd.DataFrame({'str_col': ['1.2', '3.7', '5.0']})
print(df['str_col'].dtype) # object

df['float_col'] = pd.to_numeric(df['str_col'])

df['int_col'] = df['float_col'].astype(int)
print(df['int_col'].dtype) # int64

Here, you can see that first, we converted a column from string to float using the pd.to_numeric() method. After converting it into float64, we converted it into int64.
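
Real-world string columns often contain entries that are not numbers at all. As a sketch of one way to deal with that, the example below passes errors='coerce' to pd.to_numeric() so that unparseable strings become NaN, and then fills them with 0 before casting. The sample values and the choice of 0 as the fill value are assumptions for illustration.

```python
import pandas as pd

# Illustrative column with one entry that is not a number
raw = pd.Series(['1.2', 'n/a', '5.0'])

# errors='coerce' turns unparseable strings into NaN instead of raising
numeric = pd.to_numeric(raw, errors='coerce')

# Handle the NaN (here: fill with 0), then cast to integer
ints = numeric.fillna(0).astype('int64')

print(ints.tolist())  # [1, 0, 5]
```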

Using apply()

Another way to convert floats into integers in Pandas is the .apply(np.int64) method. It calls the function on every element individually, so it is generally slower than astype() on large columns.

For mixed data types, use custom functions and apply() for more robust conversion.

import pandas as pd
import numpy as np

# Data to be used
tour_data = {'visits': [1.23, 4.56, 21.19],
             'place': ['Goa', 'Lakshadweep', 'Andaman']}

# Create a DataFrame
df = pd.DataFrame(tour_data)

print("Before conversion: ")
print(df['visits'].dtype) # float64

# Convert the datatype of the column 'visits' 
# Using .apply() method
df['visits'] = df['visits'].apply(np.int64)
print("After conversion: ")
print(df['visits'].dtype) # int64
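
For the mixed-type case mentioned above, a custom function might look like the sketch below. The name to_int_or_default() and the fallback value of 0 are assumptions made for this example. Adapt both to your own data.

```python
import math
import pandas as pd

def to_int_or_default(value, default=0):
    """Truncate value to an int, or return default if it is not a finite number."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # None, arbitrary text, and other non-numeric objects end up here
        return default
    if math.isnan(number) or math.isinf(number):
        return default
    return int(number)

# A column mixing floats, numeric strings, missing values, and plain text
mixed = pd.Series([1.9, '4.56', None, 'unknown', 21], dtype=object)

converted = mixed.apply(to_int_or_default)
print(converted.tolist())  # [1, 4, 0, 0, 21]
```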

That’s all!

Krunal Lathiya
