How to change column type in Pandas Python
Let's look at different ways of changing the data types of columns in pandas. Datasets often need type casting, for example when numbers have been read in as strings. A pandas DataFrame is a two-dimensional data structure in which we can store and modify data, and pandas provides several functions for converting the types of its columns.
Here we are going to explore three ways:
- Using astype()
- Using to_numeric()
- Using convert_dtypes()
Before looking at these options, let us create the dataset used as an example throughout this tutorial.
import pandas as pd

data = pd.DataFrame(
    [
        ('1', 1, 'marcus', '1'),
        ('2', 2, 'nila', '2'),
        ('3', 3, 'harry', 'three'),
        ('4', 4, 'geetha', 'four'),
    ],
    columns=['First', 'Second', 'Third', 'Fourth']
)
print(data)
print(data.dtypes)
Output:
  First  Second   Third Fourth
0     1       1  marcus      1
1     2       2    nila      2
2     3       3   harry  three
3     4       4  geetha   four
First     object
Second     int64
Third     object
Fourth    object
dtype: object
Using astype()
The astype() method, available on both a DataFrame and a single column, changes the data type to the one you specify. The data type can be a pandas dtype, a NumPy dtype, or a built-in Python type.
Let us convert the column First from strings to integers. To achieve this, we call astype() on the column and explicitly name the data type we want. The integer width that int maps to depends on the platform and NumPy version, which is why the output below shows int32; on many systems it is int64.
data['First'] = data['First'].astype(int)
print(data)
print(data.dtypes)
Output:
   First  Second   Third Fourth
0      1       1  marcus      1
1      2       2    nila      2
2      3       3   harry  three
3      4       4  geetha   four
First      int32
Second     int64
Third     object
Fourth    object
dtype: object
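Because the width of int can differ between systems, it can help to name the target type explicitly. The following is a minimal sketch, assuming a hypothetical one-column frame called sample that mirrors column First; it shows the same conversion with a NumPy type, a dtype name, a built-in Python type, and a pandas nullable dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring column First from the tutorial dataset.
sample = pd.DataFrame({'First': ['1', '2', '3', '4']})

# NumPy type: fixed 64-bit width regardless of platform.
as_numpy = sample['First'].astype(np.int64)

# The same target, given as a dtype name.
as_name = sample['First'].astype('int64')

# Built-in Python type: float maps to float64.
as_python = sample['First'].astype(float)

# pandas nullable integer dtype, applied to the already-numeric column.
as_pandas = as_numpy.astype('Int64')

for label, column in [('numpy', as_numpy), ('name', as_name),
                      ('python', as_python), ('pandas', as_pandas)]:
    print(label, column.dtype)
```

Spelling out int64 keeps the result the same across machines, at the cost of being slightly more verbose than plain int.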
- It can also cast multiple columns at once when given a dictionary that maps column names to data types.
- You can also tell astype() how to behave when a value cannot be converted to the requested type.
- To do this, pass the 'errors' argument.
- Use 'raise' (the default) to raise an exception, or 'ignore' to suppress it. With 'ignore', the original column is returned unchanged when the conversion fails.
data['Fourth'] = data['Fourth'].astype(int, errors='ignore')
print(data.dtypes)
Output:
First      int32
Second     int64
Third     object
Fourth    object
dtype: object
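The dictionary form mentioned in the list above can be sketched as follows. This is a small illustration rather than the only way to do it; the target types int64 and float64 are chosen here purely as examples, and the dataset is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the tutorial dataset so this snippet is self-contained.
data = pd.DataFrame(
    [
        ('1', 1, 'marcus', '1'),
        ('2', 2, 'nila', '2'),
        ('3', 3, 'harry', 'three'),
        ('4', 4, 'geetha', 'four'),
    ],
    columns=['First', 'Second', 'Third', 'Fourth']
)

# Map each column name to its target dtype; unlisted columns are left alone.
converted = data.astype({'First': 'int64', 'Second': 'float64'})
print(converted.dtypes)
```

astype() returns a new DataFrame, so data itself keeps its original types until you assign the result back.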
Using to_numeric()
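pd.to_numeric() is a function that converts a column holding non-numeric data, such as numbers stored as strings, into a numeric type. Let's cast column First to an integer type with it, starting again from the original dataset in which First holds strings:
data['First'] = pd.to_numeric(data['First'])
print(data.dtypes)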
Output:
First      int64
Second     int64
Third     object
Fourth    object
dtype: object
- We can also convert multiple columns at a time by passing pd.to_numeric to the apply() method.
data[["First", "Second"]] = data[["First", "Second"]].apply(pd.to_numeric)
print(data.dtypes)
Output:
First      int64
Second     int64
Third     object
Fourth    object
dtype: object
- This function also accepts an 'errors' argument, similar to astype(). In addition to 'raise' and 'ignore', it offers 'coerce', which replaces every value that cannot be converted with NaN. Note that pandas 2.2 deprecated 'ignore' for to_numeric().
- You can learn more about it in the official documentation.
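To make the 'coerce' option concrete, here is a minimal sketch applied to the values of column Fourth. The standalone Series named fourth is an assumption made so the snippet runs on its own.

```python
import pandas as pd

# Values of column Fourth from the tutorial dataset, as a standalone Series.
fourth = pd.Series(['1', '2', 'three', 'four'], name='Fourth')

# Values that cannot be parsed as numbers are replaced with NaN.
coerced = pd.to_numeric(fourth, errors='coerce')
print(coerced)

# Count how many entries could not be converted.
print(coerced.isna().sum())
```

Because NaN is a floating-point value, the result is a float column rather than an integer one, even though the valid entries are whole numbers.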
Using convert_dtypes()
This method infers the data types automatically from the values at runtime. It converts each column to the best possible dtype that supports pd.NA, based on the values the column holds.
data = data.convert_dtypes()
print(data.dtypes)
Output:
First     string
Second     Int64
Third     string
Fourth    string
dtype: object
Note: This output comes from the original dataset, in which First holds numbers stored as strings. Column First is converted to the 'string' type, although we would want an integer type, because convert_dtypes() does not parse strings into numbers. Do not rely on this method when values need to be parsed; use the methods above to change the data types explicitly.
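The difference can be seen by running convert_dtypes() before and after an explicit conversion. This is a sketch under the assumption of a trimmed, hypothetical two-column version of the dataset; it is not taken from the original tutorial.

```python
import pandas as pd

# Trimmed, hypothetical version of the tutorial dataset.
data = pd.DataFrame({
    'First': ['1', '2', '3', '4'],
    'Third': ['marcus', 'nila', 'harry', 'geetha'],
})

# Inference alone: the digit strings in First stay text.
inferred = data.convert_dtypes()
print(inferred.dtypes)

# Parse the strings explicitly first, then let pandas pick nullable dtypes.
data['First'] = pd.to_numeric(data['First'])
parsed_then_inferred = data.convert_dtypes()
print(parsed_then_inferred.dtypes)
```

Combining the two steps keeps the parsing decision explicit while still gaining the nullable dtypes that convert_dtypes() selects.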