Interpolate a data frame in pandas

In this tutorial, we will learn how to fill missing (NaN) values in a pandas DataFrame by interpolation. The interpolate() method is different from the fillna() method. fillna() replaces every NaN with a substitution value supplied by the user, whereas interpolate() estimates each missing value from the surrounding data using an interpolation technique such as linear, quadratic or cubic. So, let’s begin the tutorial.
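The difference is easiest to see side by side. This is a minimal sketch on a toy Series whose values are made up for illustration. fillna(0) writes the same constant into every gap, while interpolate() estimates each gap from the values around it.

```python
import numpy as np
import pandas as pd

# Toy data with two gaps (illustrative values only)
s = pd.Series([1.0, np.nan, 3.0, np.nan, 7.0])

# fillna: every NaN receives the same constant supplied by the user
filled = s.fillna(0)

# interpolate: each NaN is estimated from the values on either side of it
interpolated = s.interpolate()

print(pd.DataFrame({"original": s, "fillna(0)": filled, "interpolate()": interpolated}))
```

Here fillna(0) turns both gaps into 0.0, while interpolate() gives 2.0 and 5.0, the midpoints of their neighbors.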

DataFrame.interpolate() method

This method has the following arguments:

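method: The interpolation technique to use. The default is ‘linear’; other options include ‘time’, ‘index’, ‘pad’, ‘quadratic’, ‘cubic’, ‘polynomial’ and ‘krogh’. Methods such as ‘quadratic’, ‘cubic’, ‘polynomial’ and ‘krogh’ require SciPy.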

axis: The axis along which to interpolate: 0 or ‘index’ (down each column, the default), or 1 or ‘columns’ (across each row).

limit: The maximum number of consecutive NaN values to fill. It must be greater than 0.

limit_direction: The direction in which consecutive NaN values are filled: ‘forward’, ‘backward’ or ‘both’.

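limit_area: Restricts which NaN values are filled. ‘inside’ fills only NaN values surrounded by valid values (interpolation); ‘outside’ fills only NaN values outside the valid values (extrapolation). The default, None, applies no restriction.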

downcast: Downcasts the data types of the result where possible. This argument is deprecated as of pandas 2.1.

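**kwargs: Additional keyword arguments passed on to the interpolating function, such as order for the ‘polynomial’ and ‘spline’ methods.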
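The limit and axis arguments are not used in the examples below, so here is a small sketch of their effect. The Series and DataFrame are illustrative data, not taken from the tutorial.

```python
import numpy as np
import pandas as pd

# A run of three consecutive NaNs between two known values (illustrative data)
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Without a limit, the whole run is filled
full = s.interpolate()

# limit=1 fills at most one NaN of the run, counting in the default forward direction
forward_one = s.interpolate(limit=1)

# limit=1 with limit_direction='both' fills one NaN from each end of the run
both_one = s.interpolate(limit=1, limit_direction="both")

# axis=1 interpolates across the columns of each row instead of down each column
df = pd.DataFrame({"x": [1.0, 10.0], "y": [np.nan, np.nan], "z": [3.0, 30.0]})
across_rows = df.interpolate(axis=1)

print(full.tolist())
print(forward_one.tolist())
print(both_one.tolist())
print(across_rows)
```

The full run becomes 2.0, 3.0, 4.0; with limit=1 only the 2.0 is written, and with limit_direction='both' the 4.0 is written as well, leaving the middle NaN in place.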

Linear Interpolation:

Let us consider the following data frame as input:

     a       b     c
0  NaN -0.5652  36.0
1  2.0     NaN  52.0
2  3.0 -1.8682   NaN
3  NaN     NaN -11.0
4  NaN  8.0000  98.0

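Linear interpolation, the default method, is demonstrated here. First, we create a DataFrame containing NaN values with the help of NumPy, and then call interpolate(). The ‘linear’ method ignores the index and treats the rows as equally spaced, so each missing value lies on the straight line between its nearest valid neighbors.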

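import pandas as pd
import numpy as np
# Three columns, each with some missing values (np.nan)
data1 = {'a': [np.nan, 2, 3, np.nan, np.nan],
         'b': [-0.5652, np.nan, -1.8682, np.nan, 8],
         'c': [36, 52, np.nan, -11, 98]}
d1 = pd.DataFrame(data1)
# Default: method='linear', filling NaNs forward down each column
print(d1.interpolate())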

OUTPUT:

     a       b     c
0  NaN -0.5652  36.0
1  2.0 -1.2167  52.0
2  3.0 -1.8682  20.5
3  3.0  3.0659 -11.0
4  3.0  8.0000  98.0

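Here, the first value of column a is still NaN because there is no valid value before it to interpolate from, and the default direction is forward. The two trailing NaN values in column a, on the other hand, are filled with 3.0, the last valid value, because there is no later value to draw a line towards.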
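To check the interior values by hand, the sketch below uses NumPy’s np.interp to draw straight lines between the known points of column b, treating the rows as positions 0, 1, 2, … as the ‘linear’ method does. The second part, on an illustrative Series with a gapped index, contrasts ‘linear’ with method='index', which uses the index labels as x-coordinates.

```python
import numpy as np
import pandas as pd

# Column b from the example above
b = pd.Series([-0.5652, np.nan, -1.8682, np.nan, 8.0])

# Positions of the known values and of the gaps, treating rows as 0, 1, 2, ...
known = b.notna().to_numpy()
positions = np.arange(len(b))

# np.interp draws straight lines between the known points and reads off the gaps
by_hand = np.interp(positions[~known], positions[known], b[known])

print(by_hand)                              # hand-computed interior values
print(b.interpolate()[~known].to_numpy())   # pandas' result for the same gaps

# method='linear' ignores the index labels; method='index' uses them as x-coordinates
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 4])
print(s.interpolate().tolist())                # gap treated as halfway: 5.0
print(s.interpolate(method="index").tolist())  # x=1 is a quarter of the way to 4: 2.5
```

Both gaps in column b sit halfway between their neighbors, which is why -1.2167 and 3.0659 are simple averages.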

Using the argument limit_direction

Here, we set limit_direction to ‘backward’ and the method to ‘linear’. NaN values are now filled in the backward direction, so a leading NaN can be filled from the valid value after it, while trailing NaN values are left untouched.

import pandas as pd
import numpy as np
data1 = {'a': [np.nan, 2, 3, np.nan, np.nan],
         'b': [-0.5652, np.nan, -1.8682, np.nan, 8],
         'c': [36, 52, np.nan, -11, 98]}
d1 = pd.DataFrame(data1)
# Fill NaNs in the backward direction instead of the default forward one
print(d1.interpolate(method='linear', limit_direction='backward'))

OUTPUT:

     a       b     c
0  2.0 -0.5652  36.0
1  2.0 -1.2167  52.0
2  3.0 -1.8682  20.5
3  NaN  3.0659 -11.0
4  NaN  8.0000  98.0

Here, the last two values of column a remain NaN because there is no valid value below them to interpolate from. The leading NaN, however, is now filled with 2.0.
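The third option, limit_direction='both', is not shown above. This sketch applies all three directions to column a of the example. Note that leading and trailing NaN values are filled with the nearest valid value (2.0 or 3.0) rather than by continuing the slope, which matches the outputs above.

```python
import numpy as np
import pandas as pd

# Column a from the example above: one leading NaN and two trailing NaNs
a = pd.Series([np.nan, 2.0, 3.0, np.nan, np.nan])

forward = a.interpolate(limit_direction="forward")    # leading NaN is kept
backward = a.interpolate(limit_direction="backward")  # trailing NaNs are kept
both = a.interpolate(limit_direction="both")          # no NaN is kept

print(pd.DataFrame({"forward": forward, "backward": backward, "both": both}))
```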

Polynomial Interpolation:

Let us consider the following data frame as the input.

       0
0    1.0
1    NaN
2    NaN
3  333.0

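For polynomial interpolation, set method to ‘polynomial’ and also pass the order of the polynomial with the order argument, which this method requires. The ‘polynomial’ method is handed off to SciPy (scipy.interpolate.interp1d), so SciPy must be installed. With order=1, the missing values lie on the straight line between the neighboring known points.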

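import pandas as pd
import numpy as np
# A single column with two consecutive missing values
data2 = [1, np.nan, np.nan, 333]
d2 = pd.DataFrame(data2)
# 'polynomial' needs SciPy and an explicit order
print(d2.interpolate(method='polynomial', order=1))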

OUTPUT:

            0
0    1.000000
1  111.666667
2  222.333333
3  333.000000
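As a SciPy-free hand check of this output, the sketch below fits a degree-1 polynomial through the two known points of d2 with np.polyfit and evaluates it at the missing rows. This only works as a check for order=1: for higher orders pandas delegates to scipy.interpolate.interp1d, which is not equivalent to np.polyfit.

```python
import numpy as np

# Known points of d2: row 0 holds 1 and row 3 holds 333
x_known = np.array([0, 3])
y_known = np.array([1.0, 333.0])

# A degree-1 polynomial through two points is the straight line between them
slope, intercept = np.polyfit(x_known, y_known, deg=1)

# Evaluate that line at the two missing rows (1 and 2)
filled = slope * np.array([1, 2]) + intercept

print(slope)   # (333 - 1) / 3
print(filled)
```

Each step adds (333 - 1) / 3 ≈ 110.666667 to the previous value, giving 111.666667 and 222.333333.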

Using the argument limit_area

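The limit_area argument controls which NaN values may be filled: ‘inside’ fills only NaN values surrounded by valid values (interpolation), and ‘outside’ fills only NaN values outside them (extrapolation). Here we pass ‘inside’. Because d2 has no leading or trailing NaN values, the output is the same as in the previous example.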

import pandas as pd
import numpy as np
data2 = [1, np.nan, np.nan, 333]
d2 = pd.DataFrame(data2)
# Only fill NaNs that lie between two valid values
print(d2.interpolate(method='polynomial', order=1, limit_area='inside'))

OUTPUT:

            0
0    1.000000
1  111.666667
2  222.333333
3  333.000000
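To see limit_area make a difference, the data needs NaN values both inside and outside the valid values. This sketch uses an illustrative Series and the default ‘linear’ method (so SciPy is not needed) to compare ‘inside’ and ‘outside’.

```python
import numpy as np
import pandas as pd

# NaNs at both ends and one NaN surrounded by valid values (illustrative data)
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# 'inside': only the NaN between 1.0 and 3.0 is filled
inside = s.interpolate(limit_area="inside")

# 'outside': only NaNs outside the valid values; the default forward
# direction fills the trailing NaN but not the leading one
outside = s.interpolate(limit_area="outside")

# 'outside' with limit_direction='both' fills both ends
outside_both = s.interpolate(limit_area="outside", limit_direction="both")

print(pd.DataFrame({"inside": inside, "outside": outside, "outside_both": outside_both}))
```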
