pandas.DataFrame.astype() in Python

In this article, we are going to learn about a very useful pandas method, astype(). Its primary use is to convert the data types of DataFrame columns or of a Series. We will see this with examples; please feel free to copy and paste the code and experiment with it on your own machine.

First, let’s create a DataFrame:

The following code snippet creates a DataFrame with one integer column and one boolean column.

import pandas as pd
data = {'col_one':[1,2],'col_two':[True,False]}
mydf = pd.DataFrame(data = data)

Now, with the code below, you can inspect and change the data types of the DataFrame.

print('dataframe is')
print(mydf)
print()
print('initial dtypes')
print(mydf.dtypes)
print()
print('final dtypes')
print(mydf.astype('int32').dtypes)

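The last line of the snippet, which calls mydf.astype('int32'), shows how to convert data types with the astype function in pandas. Passing a single dtype converts every column: col_one goes from int64 to int32, and col_two goes from bool to int32. The output therefore shows the dtype change for the whole DataFrame rather than for a single column. To change a single column, pass a dictionary that maps the column name to the target dtype, as in the syntax below.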

mydf.astype({'col_one':'int32'}).dtypes

This line ensures that only col_one in the mydf DataFrame is converted. However, astype returns a new DataFrame and leaves mydf untouched, so when you check with mydf.info() the dtype changes are not reflected in mydf. To make the change permanent, assign the result back to mydf, as in the following code.

print("before reassignment")
mydf.info()
# astype returns a new DataFrame, so assign it back to keep the change
mydf = mydf.astype('int32')
print("after reassignment")
mydf.info()

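The assignment mydf = mydf.astype('int32') in the code above makes the change permanent by rebinding mydf to the converted DataFrame. Strictly speaking this is not an in-place operation: astype does not modify the original object, it always returns a new one.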
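To see the difference between calling astype and assigning its result back, a small check along these lines may help. It is only a sketch: the variable names converted, original_dtype, converted_dtype and reassigned_dtype are introduced here for illustration and are not part of the original example.

```python
import pandas as pd

# Same two-column DataFrame as in the example above
mydf = pd.DataFrame({'col_one': [1, 2], 'col_two': [True, False]})

# astype returns a new DataFrame; mydf itself is not modified
converted = mydf.astype({'col_one': 'int32'})
original_dtype = str(mydf['col_one'].dtype)
converted_dtype = str(converted['col_one'].dtype)
print('mydf col_one dtype:     ', original_dtype)
print('converted col_one dtype:', converted_dtype)

# Assigning the result back is what makes the change stick
mydf = mydf.astype({'col_one': 'int32'})
reassigned_dtype = str(mydf['col_one'].dtype)
print('mydf col_one dtype after reassignment:', reassigned_dtype)
```

Because only col_one is named in the dictionary, col_two keeps its bool dtype throughout.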

Okay, now you know how to change the data type (dtype for short) of a single column or of a whole DataFrame. Let’s look at the advantage of this conversion with the help of an example.

import pandas as pd
col_one = [1,2,3,4,5,6,7,8,9,10.0] 
col_two = [True,False,True,False,True,False,True,False,True,False] 
mydata = { 'col_one':col_one,'col_two':col_two} 
df = pd.DataFrame(data = mydata) 
print(df)

Now, when I check df.info(), the output reports a memory usage of 170.0 bytes. The dtype of col_one is float64 because the list contains the float 10.0, but when we look at the content, all the values are whole numbers. Hence we can change the data type to int32, since all the values are small (use int64 instead if you prefer).
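The conversion described here can be sketched as follows. This is a minimal reconstruction that assumes the same df as above; the names before_bytes and after_bytes are introduced only for illustration, and the total reported by df.info() may differ between pandas versions because it also counts the index.

```python
import pandas as pd

col_one = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10.0]
col_two = [True, False, True, False, True, False, True, False, True, False]
df = pd.DataFrame({'col_one': col_one, 'col_two': col_two})

# Bytes used by the column values alone (index excluded): 10 rows * 8 bytes
before_bytes = df['col_one'].memory_usage(index=False)

# Convert only col_one and assign the result back to df
df = df.astype({'col_one': 'int32'})

# 10 rows * 4 bytes after the conversion
after_bytes = df['col_one'].memory_usage(index=False)

print('col_one bytes before:', before_bytes)
print('col_one bytes after: ', after_bytes)
df.info()
```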

Converting col_one from float64 to int32 with astype and running df.info() again shows the details: the memory usage is now 130.0 bytes. For just a 10-row DataFrame, that is a saving of 40.0 bytes, or 4 bytes per row. On a dataset with a billion rows, the same conversion would reduce memory usage enormously, which helps in data analysis. I see this as the major advantage of astype dtype conversions.
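To get a feel for how the saving scales, the same comparison can be run on a larger column. The sketch below uses a hypothetical one-million-row column built with NumPy (n_rows, big, float_bytes, int_bytes and saved_bytes are illustrative names, not part of the original example) and measures only the column values, not the index.

```python
import numpy as np
import pandas as pd

n_rows = 1_000_000  # hypothetical size, chosen only for illustration

# A float64 column that holds whole numbers, like col_one above
big = pd.DataFrame({'col_one': np.arange(n_rows, dtype='float64')})

float_bytes = big['col_one'].memory_usage(index=False)                  # 8 bytes per row
int_bytes = big['col_one'].astype('int32').memory_usage(index=False)   # 4 bytes per row
saved_bytes = float_bytes - int_bytes

print('float64 column bytes:', float_bytes)
print('int32 column bytes:  ', int_bytes)
print('bytes saved:         ', saved_bytes)
```

The saving is 4 bytes per row, so it grows linearly with the number of rows.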

Please feel free to share your thoughts, suggestions, and doubts through comments.

 
