An efficient way to convert floating-point numbers to integers in Pandas is the “.astype()” method. It truncates the decimal part toward zero rather than rounding, so 2.99 becomes 2, not 3, and -2.99 becomes -2.
Syntax
df['col'].astype(int)
Here, ‘df’ is the input DataFrame, ‘col’ is the column of that DataFrame, and ‘int’ is the integer type.
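To see the truncation behaviour in practice, here is a minimal sketch that compares a plain cast with a cast after rounding. The sample values and variable names are illustrative only. Note that Series.round() rounds halves to the nearest even number, so 2.5 becomes 2.

```python
import pandas as pd

# Illustrative sample values, including a negative number and a .5 case
s = pd.Series([2.99, -2.99, 2.5])

# astype(int) drops the fractional part (truncates toward zero)
truncated = s.astype(int)

# Round first if you want the nearest integer instead of truncation
rounded = s.round().astype(int)

print(truncated.tolist())  # [2, -2, 2]
print(rounded.tolist())    # [3, -3, 2]
```

If you need the nearest integer rather than the truncated one, round before casting.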
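But why convert floats to integers? Integer columns are a natural fit for counts, IDs, and other whole-number data: they display without a trailing .0 and compare exactly. Memory is a smaller factor than it may seem, because int64 (the usual result of astype(int)) and float64 both use 8 bytes per value. The saving only appears when the values fit a narrower type such as int32 or int16. If you are dealing with large datasets and the fractional part is not required, converting to a suitably sized integer type can make them more memory efficient.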
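One way to check how much memory a conversion actually saves is to measure it with memory_usage(). The sketch below uses a made-up Series of one million values. The size and variable names are assumptions for illustration, and the saving depends on the width of the target integer type.

```python
import numpy as np
import pandas as pd

# Illustrative data: one million whole-number floats
s = pd.Series(np.arange(1_000_000, dtype="float64"))

float_bytes = s.memory_usage(index=False)
int64_bytes = s.astype("int64").memory_usage(index=False)
int32_bytes = s.astype("int32").memory_usage(index=False)

print(float_bytes)  # 8000000
print(int64_bytes)  # 8000000
print(int32_bytes)  # 4000000

# pd.to_numeric can pick the smallest signed integer type that fits the data
downcast = pd.to_numeric(s.astype("int64"), downcast="integer")
print(downcast.dtype)  # int32
```

Here the 64-bit integer column is the same size as the float column, and only the narrower 32-bit type halves it.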
Basic Conversion
Let’s create a DataFrame and convert one of its columns from float to integer.
import pandas as pd

# Data to be used
tour_data = {'visits': [1.23, 4.56, 19.21],
             'place': ['Goa', 'Lakshadweep', 'Andaman']}

# Create a DataFrame
df = pd.DataFrame(tour_data)
print("Before conversion: ")
print(df)

# Convert the datatype of the column 'visits'
df['visits'] = df['visits'].astype(int)
print("After conversion: ")
print(df)
Output
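Before conversion:
   visits        place
0    1.23          Goa
1    4.56  Lakshadweep
2   19.21      Andaman
After conversion:
   visits        place
0       1          Goa
1       4  Lakshadweep
2      19      Andaman
You can see from the above output that the “visits” column has integer values after conversion.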
Handling Missing Values
During conversion, if a column has missing values (NaN), astype(int) fails with a ValueError, because NumPy integer types have no way to represent NaN. We have to handle these NaN values first.
import pandas as pd
import numpy as np

# Data to be used
tour_data = {'visits': [1.23, 4.56, np.nan],
             'place': ['Goa', 'Lakshadweep', 'Andaman']}

# Create a DataFrame
df = pd.DataFrame(tour_data)

# Fill NaN values with a suitable value (e.g., 0)
df['visits'] = df['visits'].fillna(0)
print(df['visits'].dtype)  # float64

# Convert the datatype of the column 'visits'
df['visits'] = df['visits'].astype(int)
print("After conversion: ")
print(df['visits'].dtype)  # int64
In this code, we first replaced the NaN value with 0 (choose whatever fill value suits your requirements) and then converted the column to an integer type.
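If a fill value would distort the data, one alternative (not covered above) is pandas' nullable "Int64" dtype, which can hold missing values as <NA>. The sketch below first shows the failure on a NaN column, then truncates with np.trunc before casting, since "Int64" does not accept floats that still have a fractional part. The sample data and variable names are illustrative.

```python
import numpy as np
import pandas as pd

visits = pd.Series([1.23, 4.56, np.nan])

# A direct cast fails because NaN has no NumPy integer representation
try:
    visits.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)  # True

# Truncate explicitly, then cast to the nullable integer dtype;
# the missing entry is kept as <NA> instead of being filled
nullable = np.trunc(visits).astype("Int64")
print(nullable.dtype)            # Int64
print(nullable.isna().tolist())  # [False, False, True]
```

This keeps the distinction between "zero visits" and "unknown", at the cost of using a pandas extension dtype rather than a plain NumPy integer.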
Large Float Values (Potential Overflow)
Very large floating-point values can exceed the range of a 32-bit integer (about ±2.1 billion). On platforms where int maps to a 32-bit type (for example, Windows with NumPy versions before 2.0), astype(int) might then overflow or give unexpected results. Specifying np.int64 makes the 64-bit width explicit.
import pandas as pd
import numpy as np

df = pd.DataFrame({'float_col': [2**31 - 0.5, 2**32 + 0.5]})

# Request a 64-bit integer explicitly
df['int_col_good'] = df['float_col'].astype(np.int64)
print(df['int_col_good'].dtype)  # int64
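Before casting, it can help to check whether the values actually fit the target type. The following is a minimal sketch using np.iinfo. The helper name fits_in is hypothetical and not part of pandas or NumPy, and the check is only approximate for values right at the int64 limit, where float64 cannot represent every integer.

```python
import numpy as np
import pandas as pd

def fits_in(values: pd.Series, dtype) -> bool:
    """Hypothetical helper: True if every truncated value lies in dtype's range."""
    info = np.iinfo(dtype)
    # Compare the truncated values, since that is what the cast would store
    return bool(np.trunc(values).between(info.min, info.max).all())

floats = pd.Series([2**31 - 0.5, 2**32 + 0.5])

fits_int32 = fits_in(floats, np.int32)
fits_int64 = fits_in(floats, np.int64)

print(fits_int32)  # False: 2**32 is beyond the int32 range
print(fits_int64)  # True
```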
Object Type Columns (Strings Representing Numbers)
Sometimes a numerical column in your dataset is read as strings (object dtype). In that case, you need to convert it to a numeric type first.
import pandas as pd

df = pd.DataFrame({'str_col': ['1.2', '3.7', '5.0']})
print(df['str_col'].dtype)  # object

# Strings -> float64 -> int64
df['float_col'] = pd.to_numeric(df['str_col'])
df['int_col'] = df['float_col'].astype(int)
print(df['int_col'].dtype)  # int64
Here, you can see that we first converted the column from string to float64 using the pd.to_numeric() function, and then converted the float64 column to int64.
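Real-world string columns often contain entries that are not numbers at all. As a hedged sketch of one way to handle that, pd.to_numeric() accepts errors="coerce", which turns unparseable entries into NaN so they can be filled before the integer cast. The sample values and the fill value of 0 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

raw = pd.Series(["1.2", "3.7", "n/a"])

# Unparseable strings become NaN instead of raising an error
nums = pd.to_numeric(raw, errors="coerce")

# Fill the gaps (0 is an arbitrary choice here), then cast to integer
ints = nums.fillna(0).astype(np.int64)

print(ints.tolist())  # [1, 3, 0]
```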
Using apply()
Another way to convert floats to integers in Pandas is the .apply(np.int64) method, which calls np.int64 on each element. It is generally slower than .astype() on large columns because it works one value at a time, but the same element-wise approach lets you pass a custom function to apply() for more robust conversion of mixed data types.
import pandas as pd
import numpy as np

# Data to be used
tour_data = {'visits': [1.23, 4.56, 21.19],
             'place': ['Goa', 'Lakshadweep', 'Andaman']}

# Create a DataFrame
df = pd.DataFrame(tour_data)
print("Before conversion: ")
print(df['visits'].dtype)  # float64

# Convert the datatype of the column 'visits'
# Using .apply() method
df['visits'] = df['visits'].apply(np.int64)
print("After conversion: ")
print(df['visits'].dtype)  # int64
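For the mixed-type case mentioned above, a custom function passed to apply() might look like the sketch below. The function name to_int, its default fallback of 0, and the sample data are all hypothetical. Adjust the fallback to whatever makes sense for your data.

```python
import pandas as pd

def to_int(value, default=0):
    """Hypothetical converter: truncate anything float-like, else return default."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        # None, non-numeric strings, NaN, and infinity all end up here
        return default

# Mixed column: floats, numeric strings, a missing value, and junk text
mixed = pd.Series([1.23, "4.56", None, "abc", 21.19])

result = mixed.apply(to_int)
print(result.tolist())  # [1, 4, 0, 0, 21]
```

Because apply() runs the function once per element, this is best kept for columns that astype() or pd.to_numeric() cannot handle directly.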
That’s all!