How to Normalize a Pandas DataFrame Column
In this tutorial, you will learn how to normalize a Pandas DataFrame column with Python code. Normalizing here means rescaling the values of the column so that they all fall in the range 0 to 1.
First, import the required modules:
import pandas as pd
from sklearn import preprocessing
If you are working in a Jupyter notebook, add one more line:
%matplotlib inline
This is a notebook magic command, not regular Python. It makes the graphs you create render inside the notebook, directly below the cell that produced them, instead of in a separate window.
Now let’s create data that you will be working on:
data = {'data_range': [100, 55, 33, 29, -57, 56, 93, -8, 79, 120]}
data_frame = pd.DataFrame(data)
data_frame
In a notebook, this displays our unnormalized data as a single-column table.
We can also plot this unnormalized data as a bar graph:
data_frame['data_range'].plot(kind='bar')
The bar graph shows values ranging from -57 to 120, so the data is clearly not normalized. Now you will use scikit-learn's preprocessing tools to rescale it:
A = data_frame.values  # returns a NumPy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(A)
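Here A is a NumPy array holding the DataFrame's values. MinMaxScaler() rescales each column to the range 0 to 1 by computing (x - min) / (max - min), where min and max are the smallest and largest values in that column. fit_transform() learns min and max from A and applies the rescaling in one step, so x_scaled is a NumPy array of floats containing our normalized data.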
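To make the min-max formula concrete, here is a minimal sketch that applies it by hand with plain pandas and checks the result against MinMaxScaler. The variable names col, manual, and scaled are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

data_frame = pd.DataFrame({'data_range': [100, 55, 33, 29, -57, 56, 93, -8, 79, 120]})

# Min-max normalization by hand: (x - min) / (max - min)
col = data_frame['data_range']
manual = (col - col.min()) / (col.max() - col.min())

# The same column rescaled by scikit-learn (expects a 2D input)
scaled = preprocessing.MinMaxScaler().fit_transform(data_frame[['data_range']])

print(manual.min(), manual.max())                    # 0.0 1.0
print(np.allclose(manual.to_numpy(), scaled[:, 0]))  # True
```

The smallest value (-57) maps to 0 and the largest (120) maps to 1, and every other value lands in between in proportion to its distance from the minimum.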
We can also view the normalized data in x_scaled by wrapping it in a new DataFrame:
normalized_dataframe = pd.DataFrame(x_scaled)
normalized_dataframe
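Note that pd.DataFrame(x_scaled) labels the column 0, because the NumPy array carries no column names. If you would rather keep the original label, one possible approach is sketched below. The column name data_range_normalized is an assumption chosen for illustration, not something the steps above require.

```python
import pandas as pd
from sklearn import preprocessing

data_frame = pd.DataFrame({'data_range': [100, 55, 33, 29, -57, 56, 93, -8, 79, 120]})
x_scaled = preprocessing.MinMaxScaler().fit_transform(data_frame.values)

# Reuse the original column names and index for the normalized DataFrame
normalized_dataframe = pd.DataFrame(
    x_scaled, columns=data_frame.columns, index=data_frame.index
)

# Alternatively, keep both versions side by side in the original DataFrame
# ('data_range_normalized' is a hypothetical column name)
data_frame['data_range_normalized'] = x_scaled[:, 0]

print(normalized_dataframe.columns.tolist())  # ['data_range']
```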
The results of the above command will be:
Now you can plot and show normalized data on a graph by using the following line of code:
normalized_dataframe.plot(kind='bar')
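To compare the two versions directly, the sketch below draws both bar graphs next to each other. It assumes matplotlib is installed (pandas uses it for plotting), and the figure size and titles are arbitrary choices. When running as a plain script rather than in a notebook, the final plt.show() call is what opens the figure.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import preprocessing

data_frame = pd.DataFrame({'data_range': [100, 55, 33, 29, -57, 56, 93, -8, 79, 120]})
x_scaled = preprocessing.MinMaxScaler().fit_transform(data_frame.values)
normalized_dataframe = pd.DataFrame(x_scaled, columns=data_frame.columns)

# Two plots in one row: original values on the left, normalized on the right
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
data_frame['data_range'].plot(kind='bar', ax=axes[0], title='Unnormalized')
normalized_dataframe['data_range'].plot(kind='bar', ax=axes[1], title='Normalized')

plt.tight_layout()
plt.show()  # needed outside a notebook; harmless inside one
```

The bars keep the same relative spacing in both graphs; only the scale of the vertical axis changes, and the negative values move up into the 0 to 1 range.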
We have now successfully normalized a Pandas DataFrame column in Python. I hope you enjoyed the task.