Normalize features in TensorFlow with Python
In machine learning, we normalize a dataset to rescale its numeric columns. The goal is to bring the values onto a common scale without losing information about their relative differences. A common approach, and the one used here, is z-score normalization: compute the mean and standard deviation of each feature, then subtract the mean and divide by the standard deviation.
Python program to normalize features in TensorFlow
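As a minimal sketch of the idea, the z-score of a value is its distance from the mean measured in standard deviations, z = (x - mean) / std. The sample values below are made up for illustration, and the standard library's statistics module stands in for TensorFlow so that the arithmetic is easy to follow.

```python
import statistics

# Hypothetical raw feature values (not taken from the dataset used below).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation

# z-score: subtract the mean, then divide by the standard deviation.
z_scores = [(v - mean) / std for v in values]

print(mean, std)
print(z_scores)
```

After the transformation the values have mean 0 and standard deviation 1, while their ordering and relative spacing are preserved.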
Basic normalization code:
To normalize a feature when using tf.estimator, pass a normalizer_fn argument to tf.feature_column.numeric_column. Because the function is part of the feature column, the same parameters are applied during training, evaluation, and serving.
normalized_feature = tf.feature_column.numeric_column( feature_name, normalizer_fn=zscore )
Here zscore is a function that expresses each value as its distance from the mean in units of standard deviation. With fixed example values for the mean and standard deviation, it looks like this:
def zscore(x):
    mean = 3.04
    std = 1.2
    return (x - mean) / std
Let’s work with an example:
- Importing libraries and data: We will use the modules shutil, numpy, pandas, and tensorflow, together with the dataset california_housing_train.csv, which is hosted on storage.googleapis.com. The rows are split at random into roughly 80% for training and 20% for evaluation.
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf
df = pd.read_csv("https://storage.googleapis.com/ml_universities/california_housing_train.csv", sep=",")
msk = np.random.rand(len(df)) < 0.8
traindf = df[msk]
evaldf = df[~msk]
traindf.head(4)  # print the first four rows of the training set
Output:
| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 |
| 1 | -114.47 | 34.40 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8200 | 80100.0 |
| 2 | -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 |
| 3 | -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 |
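The 80/20 split above draws a fresh random mask on every run, so the training rows (and everything computed from them) change each time. A small sketch of the same masking technique with a seeded generator is shown here; the seed value and the row count are arbitrary assumptions, and np.random.default_rng is used in place of np.random.rand.

```python
import numpy as np

# Assumption: a fixed seed (42 is arbitrary) so the split is repeatable.
rng = np.random.default_rng(42)

n_rows = 1000  # hypothetical row count standing in for len(df)
mask = rng.random(n_rows) < 0.8  # True for roughly 80% of the rows

row_ids = np.arange(n_rows)
train_ids = row_ids[mask]   # rows selected for training
eval_ids = row_ids[~mask]   # the remaining rows, used for evaluation

print(len(train_ids), len(eval_ids))
```

Every row lands in exactly one of the two sets, because the evaluation set is selected with the negated mask.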
- Get normalization parameters: We normalize only the numeric features. We list them in a separate variable, then compute the mean and standard deviation of each one from the training data. These two values per feature are the z-score parameters used in the next step.
def get_normalization_parameters(traindf, features):
    def _z_score_params(column):
        mean = traindf[column].mean()
        std = traindf[column].std()
        return {'mean': mean, 'std': std}

    normalization_parameters = {}
    for column in features:
        normalization_parameters[column] = _z_score_params(column)
    return normalization_parameters

NUMERIC_FEATURES = ['housing_median_age', 'total_rooms', 'total_bedrooms',
                    'population', 'households', 'median_income']
normalization_parameters = get_normalization_parameters(traindf,
                                                        NUMERIC_FEATURES)
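normalization_parameters
Output: a dictionary that maps each numeric feature name to its 'mean' and 'std'. The exact values depend on the random train/eval split.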
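The point of computing the parameters from the training data alone is that the same numbers are then reused on every other split. The sketch below illustrates this with made-up values for a single feature. It uses statistics.stdev, which divides by n - 1 just as the pandas std() default does; the normalize helper is a hypothetical name introduced only for this example.

```python
import statistics

# Hypothetical values of one feature, already split into train and eval.
train_values = [10.0, 20.0, 30.0, 40.0]
eval_values = [25.0, 50.0]

# Parameters are estimated from the training split only.
params = {
    'mean': statistics.mean(train_values),
    'std': statistics.stdev(train_values),  # sample std (n - 1), like pandas .std()
}

def normalize(value, params):
    """Apply z-score normalization with precomputed parameters."""
    return (value - params['mean']) / params['std']

# The same parameters are reused for the eval split.
train_normalized = [normalize(v, params) for v in train_values]
eval_normalized = [normalize(v, params) for v in eval_values]

print(params)
print(eval_normalized)
```

The normalized evaluation values need not have mean 0 or standard deviation 1. That is expected, since they are scaled with the training statistics rather than their own.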
- Performing Normalization: Here we create the feature columns, giving each one a normalizer function built from the mean and standard deviation calculated above. The resulting feature columns can then be passed to an estimator.
def _numeric_column_normalized(column_name, normalizer_fn):
    return tf.feature_column.numeric_column(column_name,
                                            normalizer_fn=normalizer_fn)

def create_feature_cols(features, use_normalization):
    normalized_feature_columns = []
    for column_name in features:
        if use_normalization:
            column_params = normalization_parameters[column_name]
            mean = column_params['mean']
            std = column_params['std']
            # Bind mean and std as default arguments so that each column
            # keeps its own parameters; a plain closure would see only the
            # values from the last loop iteration when it is finally called.
            def normalize_column(col, mean=mean, std=std):
                return (col - mean)/std
            normalizer_fn = normalize_column
        else:
            normalizer_fn = None
        normalized_feature_columns.append(_numeric_column_normalized(column_name,
                                                                     normalizer_fn))
        # Print the list after each column is added.
        print(normalized_feature_columns)
    return normalized_feature_columns

feature_columns = create_feature_cols(NUMERIC_FEATURES, True)
Output:
[NumericColumn(key='housing_median_age', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=<function create_feature_cols.<locals>.normalize_column at 0x000001C775ED9B70>)]
[NumericColumn(key='housing_median_age', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=<function create_feature_cols.<locals>.normalize_column at 0x000001C775ED9B70>), NumericColumn(key='total_rooms', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=<function create_feature_cols.<locals>.normalize_column at 0x000001C775F496A8>)]
..........
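Because each normalizer function is created inside a loop, it is worth checking that every column ends up with its own mean and standard deviation. A plain closure that reads these values from the enclosing loop looks them up only when it is called, after the loop has finished, so every column would be scaled with the last column's parameters. The sketch below shows one way to avoid that in plain Python, using default arguments; the feature names, parameter values, and the make_normalizers helper are hypothetical.

```python
# Hypothetical parameters for two features (values are made up).
params = {
    'a': {'mean': 10.0, 'std': 2.0},
    'b': {'mean': 100.0, 'std': 50.0},
}

def make_normalizers(params):
    normalizers = {}
    for name, p in params.items():
        # Default arguments are evaluated when the function is defined,
        # so each normalizer keeps its own mean and std.
        def normalize(x, mean=p['mean'], std=p['std']):
            return (x - mean) / std
        normalizers[name] = normalize
    return normalizers

normalizers = make_normalizers(params)
print(normalizers['a'](12.0))   # uses mean=10.0, std=2.0
print(normalizers['b'](150.0))  # uses mean=100.0, std=50.0
```

Both calls return 1.0, since each input lies exactly one standard deviation above its own feature's mean.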
The feature columns now normalize their inputs with the training-set mean and standard deviation, so we can use them to train a machine learning model and make predictions.