TensorFlow Estimators in Python Machine Learning

In this tutorial, we will learn about TensorFlow Estimators using the Python programming language. Estimators are a high-level API that simplifies the task of machine learning. After the data is set up, the model is defined using a TensorFlow Estimator. The tf.estimator module provides a wide range of estimators for our use.

The tf.estimator module provides us with a wide variety of classes to use in a model, including LinearRegressor, LinearClassifier, and others. Estimators can also be custom-made: a premade estimator is any class derived from the tf.estimator.Estimator base class. The interface of an estimator consists of a train-evaluate-predict loop similar to that of scikit-learn.
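For comparison, the scikit-learn loop that the Estimator interface resembles looks like this (a minimal sketch using LogisticRegression on a small synthetic dataset; the model and parameters here are illustrative, not part of the TensorFlow API):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A small synthetic binary-classification problem.
X, y = make_classification(n_samples=200, n_features=2,
                           n_informative=2, n_redundant=0, random_state=0)

# scikit-learn's fit/score/predict cycle, which the Estimator's
# train/evaluate/predict cycle mirrors.
model = LogisticRegression()
model.fit(X, y)               # ~ estimator.train(...)
accuracy = model.score(X, y)  # ~ estimator.evaluate(...)
y_pred = model.predict(X)     # ~ estimator.predict(...)
print(accuracy)
```

The Estimator API wraps the same three steps, but feeds data through input functions instead of in-memory arrays.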

The schematic diagram of an estimator is shown below:

[Figure: Schematic diagram of an estimator]

Estimator Classes

For this tutorial, we create a synthetic dataset with some class overlap to train the models on. Below are the various classes in the TensorFlow estimator API.

import numpy as np
from sklearn.datasets import make_classification

np.random.seed(42)
X, y = make_classification(n_samples=100000, n_features=2, n_informative=2, n_redundant=0)
n_train_samples = 1000

X_train, y_train = X[:n_train_samples], y[:n_train_samples]
X_test, y_test = X[n_train_samples:], y[n_train_samples:]
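Before feeding the estimators, it is worth sanity-checking the split (a quick check, repeating the setup above so the snippet is self-contained):

```python
import numpy as np
from sklearn.datasets import make_classification

np.random.seed(42)
X, y = make_classification(n_samples=100000, n_features=2,
                           n_informative=2, n_redundant=0)
n_train_samples = 1000
X_train, y_train = X[:n_train_samples], y[:n_train_samples]
X_test, y_test = X[n_train_samples:], y[n_train_samples:]

# Confirm shapes and class balance of the training slice.
print(X_train.shape)         # (1000, 2)
print(X_test.shape)          # (99000, 2)
print(np.bincount(y_train))  # per-class counts, roughly balanced
```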

 

  • BaselineClassifier predicts the dominant (most frequent) class for every example, ignoring the features; it establishes a simple baseline to compare other models against.
import tensorflow as tf


def input_fn(X, y): 
    dataset = tf.data.Dataset.from_tensor_slices(({'X': X[:, 0], 'Y': X[:, 1]}, y))
    dataset = dataset.shuffle(1000).batch(1000)
    return dataset


from tensorflow.estimator import BaselineClassifier
clss = BaselineClassifier(n_classes=2)


clss.train(input_fn=lambda: input_fn(X_train, y_train), max_steps=10)
y_pred = clss.predict(input_fn=lambda: input_fn(X_test, y_test))

y_pred = np.array([p['class_ids'][0] for p in y_pred])

  • LinearClassifier trains a linear model to classify instances into one of multiple classes; when the number of classes is only two, it acts as a binary classifier.
from tensorflow.estimator import LinearClassifier
feature_columns = [
    tf.feature_column.numeric_column(key='X', dtype=tf.float32),
    tf.feature_column.numeric_column(key='Y', dtype=tf.float32)
]
clss = LinearClassifier(n_classes=2, feature_columns=feature_columns)


clss.train(input_fn=lambda: input_fn(X_train, y_train), max_steps=10)
y_pred = clss.predict(input_fn=lambda: input_fn(X_test, y_test))
y_pred = np.array([p['class_ids'][0] for p in y_pred])
  • DNNClassifier is the class for a neural network classifier that implements a multilayer perceptron network.
from tensorflow.estimator import DNNClassifier
feature_columns = [
    tf.feature_column.numeric_column(key='X', dtype=tf.float32),
    tf.feature_column.numeric_column(key='Y', dtype=tf.float32)
]
clss = DNNClassifier(n_classes=2, feature_columns=feature_columns, hidden_units=[32, 32])


clss.train(input_fn=lambda: input_fn(X_train, y_train), max_steps=10000)
y_pred = clss.predict(input_fn=lambda: input_fn(X_test, y_test))
y_pred = np.array([p['class_ids'][0] for p in y_pred])
  • BoostedTreesClassifier is an implementation of a tree ensemble for structured data; these classifiers train quickly, need relatively little tuning, and do not require a large dataset to work well.
from tensorflow.estimator import BoostedTreesClassifier
feature_columns = [
    tf.feature_column.numeric_column(key='X', dtype=tf.float32),
    tf.feature_column.numeric_column(key='Y', dtype=tf.float32)
]
clss = BoostedTreesClassifier(n_classes=2, feature_columns=feature_columns, n_trees=100, n_batches_per_layer=1)


clss.train(input_fn=lambda: input_fn(X_train, y_train), max_steps=10000)
y_pred = clss.predict(input_fn=lambda: input_fn(X_test, y_test))
y_pred = np.array([p['class_ids'][0] for p in y_pred])
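Each predict call above yields class ids that can be scored against the held-out labels. A simple way to do this with NumPy alone (the arrays here are small hypothetical stand-ins for y_test and the y_pred built from clss.predict):

```python
import numpy as np

# Hypothetical labels and predictions; in the tutorial, y_pred is the
# array collected from clss.predict(...) and y_test is the held-out slice.
y_test = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# Accuracy is the fraction of matching entries.
accuracy = (y_pred == y_test).mean()
print(accuracy)  # 4 of 6 match -> 0.666...
```

Alternatively, each estimator also exposes an evaluate method that computes its metrics from an input function.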

Model Implementation Using Estimators

import pandas as pd
import tensorflow as tf
import numpy as np

train = pd.read_csv('./train-ready.csv')
test = pd.read_csv('./test-ready.csv')


def my_model(features, labels, mode, params):
    # Build the input layer from the feature columns, then stack
    # fully connected hidden layers.
    net = tf.feature_column.input_layer(features, params['feature_columns'])
    for units in params['hidden_units']:
        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)

    # Output layer: one logit per class, no activation.
    logits = tf.layers.dense(net, params['n_classes'], activation=None)


    predicted_classes = tf.argmax(logits, 1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            'class_ids': predicted_classes[:, tf.newaxis],
            'probabilities': tf.nn.softmax(logits),
            'logits': logits,
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)


    accuracy = tf.metrics.accuracy(labels=labels,
                                   predictions=predicted_classes,
                                   name='acc_op')
    metrics = {'accuracy': accuracy}
    tf.summary.scalar('accuracy', accuracy[1])

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(
            mode, loss=loss, eval_metric_ops=metrics)

    # Create training op.
    assert mode == tf.estimator.ModeKeys.TRAIN

    optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)


def train_input_fn(features, labels, batch_size):
    """Training input function."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(10).repeat().batch(batch_size)
    return dataset


from sklearn.model_selection import train_test_split

# X and y are the features and labels prepared from the loaded data.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1)


# 'feature_columns' and 'units' are assumed to be defined earlier,
# e.g. units = 32 and numeric feature columns like those built above.
classifier = tf.estimator.Estimator(
    model_fn=my_model,
    params={
        'feature_columns': feature_columns,
        'hidden_units': [units, int(units / 2)],
        'n_classes': 2,
    })

## Training and evaluating the model

batch_size = 100
train_steps = 400

for i in range(100):
    classifier.train(
        input_fn=lambda: train_input_fn(X_train, y_train, batch_size),
        steps=train_steps)

That covers the different classes in the TensorFlow estimator API, along with their use in a simple model to demonstrate Estimators in Python.
