How to change column type in Pandas Python

Let’s look at different ways of changing the data types of columns in pandas. When working with datasets, we often need to cast a column from one data type to another. A pandas DataFrame is a very useful two-dimensional data structure for storing and modifying data, and pandas provides several functions for this task.

Here we are going to explore three ways:

  • Using astype()
  • Using to_numeric()
  • Using convert_dtypes()

Before looking at the different options, let’s create a sample DataFrame to use throughout this tutorial.

import pandas as pd

# Sample DataFrame: First and Fourth hold strings, Second holds integers
data = pd.DataFrame(
    [
        ('1', 1, 'marcus', '1'),
        ('2', 2, 'nila', '2'),
        ('3', 3, 'harry', 'three'),
        ('4', 4, 'geetha', 'four'),
    ],
    columns=['First', 'Second', 'Third', 'Fourth'],
)
print(data)
print(data.dtypes)

Output:

  First  Second   Third Fourth
0     1       1  marcus      1
1     2       2    nila      2
2     3       3   harry  three
3     4       4  geetha   four
First     object
Second     int64
Third     object
Fourth    object
dtype: object

Using astype()

The astype() method, available on both DataFrame and Series, casts values to the specified data type. The target can be a Python built-in type, a NumPy dtype, or a pandas extension dtype.
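
As a quick illustration of the three kinds of target type, here is a minimal sketch. The Series s and the variable names are hypothetical, and the exact integer dtype you get for the built-in int depends on your platform.

```python
import numpy as np
import pandas as pd

# A hypothetical column of numeric strings, similar to column First
s = pd.Series(['1', '2', '3', '4'])

as_builtin = s.astype(int)        # Python built-in type; maps to NumPy's default integer
as_numpy = s.astype(np.float64)   # explicit NumPy dtype
as_pandas = s.astype('string')    # pandas extension dtype (StringDtype)

print(as_builtin.dtype, as_numpy.dtype, as_pandas.dtype)
```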

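Let us convert column First from string type to integer type. To do this, we call astype() on the column and pass the data type we want. Note that int maps to NumPy’s default integer type, which is int32 on some platforms (such as Windows with NumPy versions before 2.0) and int64 on others, so your output may show int64 instead.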

data['First'] = data['First'].astype(int)
print(data)
print(data.dtypes)

Output:

   First  Second   Third Fourth
0      1       1  marcus      1
1      2       2    nila      2
2      3       3   harry  three
3      4       4  geetha   four
First      int32
Second     int64
Third     object
Fourth    object
dtype: object

  • astype() can also cast multiple columns at once if you pass a dictionary that maps column names to data types.
  • You can also tell astype() how to behave when a value cannot be converted to the requested type.
  • To do this, pass the errors argument.
  • Use ‘raise’ (the default) to raise an exception, or ‘ignore’ to return the original data unchanged.

# 'three' and 'four' cannot be cast to int, so the column is returned unchanged
data['Fourth'] = data['Fourth'].astype(int, errors='ignore')
print(data.dtypes)

Output:

First      int32
Second     int64
Third     object
Fourth    object
dtype: object
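
The dictionary form mentioned above can be sketched as follows. This recreates the sample DataFrame and picks int64 and float64 as illustrative targets; any valid dtypes would work the same way.

```python
import pandas as pd

# Recreate the tutorial's sample DataFrame
data = pd.DataFrame(
    [
        ('1', 1, 'marcus', '1'),
        ('2', 2, 'nila', '2'),
        ('3', 3, 'harry', 'three'),
        ('4', 4, 'geetha', 'four'),
    ],
    columns=['First', 'Second', 'Third', 'Fourth'],
)

# Map column names to target types; columns not listed are left unchanged
data = data.astype({'First': 'int64', 'Second': 'float64'})
print(data.dtypes)
```

Columns missing from the dictionary (Third and Fourth here) keep their existing types, which makes this form handy when only some columns need casting.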

Using to_numeric()

pd.to_numeric() is a top-level pandas function that converts its argument, such as a Series of strings, to a numeric data type. Starting again from the original DataFrame, where First holds strings, let’s cast column First to an integer type:

data['First'] = pd.to_numeric(data['First'])
print(data.dtypes)

Output:

First      int64
Second     int64
Third     object
Fourth    object
dtype: object

  • We can also convert several columns at once by passing pd.to_numeric to the DataFrame apply() method.
data[["First", "Second"]] = data[["First", "Second"]].apply(pd.to_numeric)
print(data.dtypes)

Output:

First      int64
Second     int64
Third     object
Fourth    object
dtype: object
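
  • Like astype(), to_numeric() accepts an errors argument, and it adds a third option, ‘coerce’, which sets any value that cannot be converted to NaN.
  • Note that errors=‘ignore’ is deprecated for to_numeric() since pandas 2.2, so prefer ‘coerce’ or catch the exception yourself.
  • You can learn more about it in the official pandas documentation.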
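
To see the difference between ‘raise’ and ‘coerce’, here is a minimal sketch using the values of column Fourth; the Series name fourth is just for illustration.

```python
import pandas as pd

# Column Fourth from the tutorial mixes numeric strings and words
fourth = pd.Series(['1', '2', 'three', 'four'], name='Fourth')

# errors='raise' is the default: the first unparsable value raises ValueError
try:
    pd.to_numeric(fourth)
except ValueError as exc:
    print('raise:', exc)

# errors='coerce' replaces unparsable values with NaN instead of failing
converted = pd.to_numeric(fourth, errors='coerce')
print(converted)
print(converted.dtype)  # float64, because NaN cannot be stored in an int64 column
```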

Using convert_dtypes()

The DataFrame.convert_dtypes() method infers data types automatically, converting each column to the best possible dtype that supports pd.NA, such as string or the nullable Int64. Here it is applied to the original DataFrame:

data = data.convert_dtypes()
print(data.dtypes)

Output:

First     string
Second     Int64
Third     string
Fourth    string
dtype: object

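Note: Observe that column First, which holds numeric strings, is converted to the ‘string’ type rather than an integer type. convert_dtypes() infers types from the existing values and does not parse numbers out of strings, so it is not a reliable way to get numeric columns from text data. Use astype() or to_numeric() when you need to set a data type explicitly.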
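
Following that advice, here is a minimal sketch that converts the numeric columns explicitly before calling convert_dtypes(), assuming you want nullable pandas types in the end. Using errors=‘coerce’ on Fourth is an illustrative choice that turns the words into missing values.

```python
import pandas as pd

# Recreate the tutorial's sample DataFrame
data = pd.DataFrame(
    [
        ('1', 1, 'marcus', '1'),
        ('2', 2, 'nila', '2'),
        ('3', 3, 'harry', 'three'),
        ('4', 4, 'geetha', 'four'),
    ],
    columns=['First', 'Second', 'Third', 'Fourth'],
)

# On its own, convert_dtypes() keeps the numeric strings in First as text
before = data.convert_dtypes()
print(before.dtypes)

# Parse the numeric text explicitly first; 'coerce' turns 'three'/'four' into NaN
data['First'] = pd.to_numeric(data['First'])
data['Fourth'] = pd.to_numeric(data['Fourth'], errors='coerce')

# Now convert_dtypes() can choose nullable integer types, with <NA> for missing values
after = data.convert_dtypes()
print(after.dtypes)
```

Because Fourth holds only whole numbers besides the missing values, convert_dtypes() can store it as Int64 instead of leaving it as float64.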
