TensorFlow Estimator in Python Machine Learning
In this tutorial, we will learn about TensorFlow Estimators using the Python programming language. Estimators are a high-level API that simplifies the task of machine learning. After the data is set up, the model is defined using a TensorFlow estimator. The tf.estimator module provides a wide range of estimators for our use.
The tf.estimator module provides us with a wide variety of classes to use in a model, including LinearRegressor, LinearClassifier, and others. Estimators can also be custom made. A premade estimator is any class derived from the tf.estimator.Estimator class. The estimator interface consists of a train-evaluate-predict loop similar to that of scikit-learn.
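As a rough sketch of that loop (not tied to any particular dataset; feature_columns, make_input_fn and the train/test arrays here stand in for the objects built later in this tutorial):

import tensorflow as tf

# Minimal sketch of the train-evaluate-predict loop shared by premade estimators.
# `feature_columns` and `make_input_fn` are placeholders for objects defined
# later in this tutorial.
estimator = tf.estimator.LinearClassifier(feature_columns=feature_columns,
                                          n_classes=2)
estimator.train(input_fn=lambda: make_input_fn(X_train, y_train), max_steps=100)
metrics = estimator.evaluate(input_fn=lambda: make_input_fn(X_test, y_test), steps=1)
predictions = estimator.predict(input_fn=lambda: make_input_fn(X_test, y_test))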
The schematic diagram of an estimator is shown below:
Estimator Classes
For this tutorial, we create a synthetic dataset with some overlap between the classes. Below are the various classes in the TensorFlow Estimator API, followed by a short sketch of how to score their predictions.
import numpy as np
from sklearn.datasets import make_classification

# Synthetic binary-classification data: 100,000 samples with 2 informative features.
np.random.seed(42)
X, y = make_classification(n_samples=100000, n_features=2,
                           n_informative=2, n_redundant=0)

# Use the first 1,000 samples for training and the rest for testing.
n_train_samples = 1000
X_train, y_train = X[:n_train_samples], y[:n_train_samples]
X_test, y_test = X[n_train_samples:], y[n_train_samples:]
- BaselineClassifier ignores the feature values and learns to predict the average of the labels, which in practice means it predicts the dominant class; it serves as a simple baseline.
import tensorflow as tf
from tensorflow.estimator import BaselineClassifier

def input_fn(X, y):
    # Wrap the two features in a dict keyed by column name and pair them with the labels.
    dataset = tf.data.Dataset.from_tensor_slices(({'X': X[:, 0], 'Y': X[:, 1]}, y))
    dataset = dataset.shuffle(1000).batch(1000)
    return dataset

clss = BaselineClassifier(n_classes=2)
clss.train(input_fn=lambda: input_fn(X_train, y_train), max_steps=10)
y_pred = clss.predict(input_fn=lambda: input_fn(X_test, y_test))
y_pred = np.array([p['class_ids'][0] for p in y_pred])
- LinearClassifier trains a linear model to classify instances into one of multiple classes; when the number of classes is two, it is a binary classifier.
from tensorflow.estimator import LinearClassifier

feature_columns = [
    tf.feature_column.numeric_column(key='X', dtype=tf.float32),
    tf.feature_column.numeric_column(key='Y', dtype=tf.float32)
]
clss = LinearClassifier(n_classes=2, feature_columns=feature_columns)
clss.train(input_fn=lambda: input_fn(X_train, y_train), max_steps=10)
y_pred = clss.predict(input_fn=lambda: input_fn(X_test, y_test))
y_pred = np.array([p['class_ids'][0] for p in y_pred])
- DNNClassifier is a deep neural network classifier that implements a multilayer perceptron network.
from tensorflow.estimator import DNNClassifier

feature_columns = [
    tf.feature_column.numeric_column(key='X', dtype=tf.float32),
    tf.feature_column.numeric_column(key='Y', dtype=tf.float32)
]
clss = DNNClassifier(n_classes=2, feature_columns=feature_columns,
                     hidden_units=[32, 32])
clss.train(input_fn=lambda: input_fn(X_train, y_train), max_steps=10000)
y_pred = clss.predict(input_fn=lambda: input_fn(X_test, y_test))
y_pred = np.array([p['class_ids'][0] for p in y_pred])
- BoostedTreesClassifier implements a tree-ensemble method suited to structured data; such classifiers are fast to train, work well without much tuning, and do not need a large dataset.
from tensorflow.estimator import BoostedTreesClassifier

feature_columns = [
    tf.feature_column.numeric_column(key='X', dtype=tf.float32),
    tf.feature_column.numeric_column(key='Y', dtype=tf.float32)
]
clss = BoostedTreesClassifier(n_classes=2, feature_columns=feature_columns,
                              n_trees=100, n_batches_per_layer=1)
clss.train(input_fn=lambda: input_fn(X_train, y_train), max_steps=10000)
y_pred = clss.predict(input_fn=lambda: input_fn(X_test, y_test))
y_pred = np.array([p['class_ids'][0] for p in y_pred])
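Each of the snippets above ends with an array of predicted class ids but never scores them. One minimal way to check how a classifier did (not part of the original snippets) is to compare y_pred with the held-out labels, or to call the estimator's evaluate method:

# Score the predictions from any of the classifiers above against the
# held-out labels (assumes `y_pred` and `y_test` from the snippets above).
accuracy = np.mean(y_pred == y_test)
print('test accuracy: {:.3f}'.format(accuracy))

# Alternatively, let the estimator compute its own metrics on one batch.
metrics = clss.evaluate(input_fn=lambda: input_fn(X_test, y_test), steps=1)
print(metrics['accuracy'])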
Model implementation using a custom Estimator
import pandas as pd
import tensorflow as tf
import numpy as np

# Data files referenced by the original tutorial (paths as given).
train = pd.read_csv('./train-ready.csv')
test = pd.read_csv('./test-ready.csv')

def my_model(features, labels, mode, params):
    # Build the input layer from the feature columns, then the hidden layers.
    n = tf.feature_column.input_layer(features, params['feature_columns'])
    for units in params['hidden_units']:
        n = tf.layers.dense(n, units=units, activation=tf.nn.relu)
    logits = tf.layers.dense(n, params['n_classes'], activation=None)
    predicted_classes = tf.argmax(logits, 1)

    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            'class_ids': predicted_classes[:, tf.newaxis],
            'probabilities': tf.nn.softmax(logits),
            'logits': logits,
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    accuracy = tf.metrics.accuracy(labels=labels,
                                   predictions=predicted_classes,
                                   name='acc_op')
    metrics = {'accuracy': accuracy}
    tf.summary.scalar('accuracy', accuracy[1])

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(
            mode, loss=loss, eval_metric_ops=metrics)

    # Create training op.
    assert mode == tf.estimator.ModeKeys.TRAIN
    optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def train_input_fn(features, labels, batch_size):
    """Training input: shuffle, repeat and batch the examples."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(10).repeat().batch(batch_size)
    return dataset

from sklearn.model_selection import train_test_split

# Wrap the synthetic features in a DataFrame so the column names match the
# feature columns ('X' and 'Y') defined earlier.
X_df = pd.DataFrame(X, columns=['X', 'Y'])
X_train, X_val, y_train, y_val = train_test_split(X_df, y, test_size=0.1)

units = 32  # width of the first hidden layer (value chosen for this example)
classifier = tf.estimator.Estimator(
    model_fn=my_model,
    params={
        'feature_columns': feature_columns,
        'hidden_units': [units, int(units / 2)],
        'n_classes': 2,
    })

## Training and evaluating model
batch_size = 100
train_steps = 400
for i in range(100):
    classifier.train(
        input_fn=lambda: train_input_fn(X_train, y_train, batch_size),
        steps=train_steps)
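The loop above only calls train. A small, assumed addition (eval_input_fn below is not part of the original code) shows how the held-out validation split could be scored with the estimator's evaluate method:

def eval_input_fn(features, labels, batch_size):
    """Evaluation input: batch the examples without shuffling or repeating."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    return dataset.batch(batch_size)

# Evaluate on the validation split produced by train_test_split above.
eval_result = classifier.evaluate(
    input_fn=lambda: eval_input_fn(X_val, y_val, batch_size))
print('Validation accuracy: {accuracy:.3f}'.format(**eval_result))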
That covers the different classes in the TensorFlow Estimator API, along with a custom Estimator built from a model function to demonstrate the use of Estimators in Python.