How to train TensorFlow models in Python?

In this tutorial, I will explain what TensorFlow is and how to build, compile, and train models with the TensorFlow Python deep learning module. So let’s continue…

Basically, tensors are multi-dimensional arrays, and these arrays act as the inputs in TensorFlow. A TensorFlow computation is a graph with nodes and edges, where the nodes carry the mathematical operations and produce endpoint outputs, and the edges carry the tensors that relate inputs to outputs.

In this article, we will train a model on the MNIST dataset using TensorFlow, which will predict handwritten digit images ranging from 0 to 9.

How to use Google Colab for running TensorFlow models?

Google Colab is similar to a Jupyter notebook. It supports free GPUs (Graphics Processing Units), and we can compile and run Python code in it without downloading any software onto our system. We just need to go to this link ->
It is a very easy and efficient way to learn TensorFlow, as we don’t have to go through the long process of downloading Anaconda and setting up the path in the system. We only have to focus on the implementation part of the technique in Google Colab.

Below are some simple steps that we have to follow to use Google Colab:

  • Sign in to your Google account.
  • Visit the above link.
  • Start Coding.

Build, Compile and Train TensorFlow models in Python

For training any TensorFlow model we have to –

  • Load the dataset.
  • Build the model (specify the hidden layers we want along with their activation functions).
  • Define the loss function.
  • Obtain training data and use an optimizer in your model.

Optimizers are used to improve the speed and performance of training a specific model.

In our Google Colab notebook, we have to install and import TensorFlow. We also have to import matplotlib.pyplot to visualize the images to be trained on, and NumPy to perform certain operations while predicting the number present in the image. The code for the above process is –

!pip install tensorflow==2.0.0-beta1
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

How to load and split the dataset?

First of all, see the code below:

handwritten_dataset = tf.keras.datasets.mnist                            #downloads the MNIST dataset and stores it in a variable

(x_train, y_train), (x_test, y_test) = handwritten_dataset.load_data()   #splits the dataset into train and test data
x_train, x_test = x_train / 255.0, x_test / 255.0                        #pixel values range from 0-255, so dividing by 255 rescales them to the 0-1 range

In the above code, handwritten_dataset contains the MNIST dataset, which is available in Keras. We split the dataset into (x_train, y_train) and (x_test, y_test).

The (x_train, y_train) pair will train the model, and the (x_test, y_test) pair will evaluate the accuracy of the model. The x_train and x_test arrays are the handwritten digit images, and y_train and y_test are the labels (the digits in integer format) associated with the images. To normalize them, the training and testing datasets are divided by 255.
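The normalization step can be seen in isolation with a small NumPy sketch; the pixel values here are made up for illustration:

```python
import numpy as np

# Fake 2x2 "image" with pixel values in the 0-255 range, for illustration
pixels = np.array([[0, 128], [191, 255]], dtype=np.float64)

normalized = pixels / 255.0   # rescale every pixel into the 0-1 range
print(normalized.min(), normalized.max())   # 0.0 1.0
```

Keeping the inputs in the 0-1 range generally helps the network train faster and more stably than raw 0-255 values.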

The MNIST dataset contains 60000 training images and 10000 testing images. To find the shape we can write –

print(x_train.shape)
print(x_test.shape)
Output of the above code will be –

(60000, 28, 28)
(10000, 28, 28)

Now to visualize the dataset we can use matplotlib.pyplot.

plt.imshow(x_train[1205], cmap='gray_r')

Output – a grayscale plot of the handwritten digit image at index 1205 of the training set.


Build the Model

Now we need to build a model to which the training data has to be fit in order to predict the test data. First of all, we will add a layer to flatten the image, i.e., if the image resolution is 28 x 28 pixels, then the flatten layer will generate 784 nodes, which will be fed as the input layer of the model.

Next, we will add a single hidden layer having 128 nodes with a ‘relu‘ activation function, and then we will add an output layer having 10 nodes with a ‘softmax‘ activation function.

ReLU (Rectified Linear Unit) – This function outputs the input directly if the input is positive, and outputs 0 if the input is negative.

Softmax function – This function returns the probabilities of every possible output. The output having the maximum probability will be taken as the prediction.

In the above problem of recognizing handwritten digits, softmax will return an array of 10 elements, which are the probabilities of all the numbers from 0 to 9.

The number which will have the highest probability will be the result of our program.
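The two activation functions described above can be sketched in plain NumPy; the logit values below are made up for illustration:

```python
import numpy as np

def relu(x):
    # Outputs the input where it is positive, 0 where it is negative
    return np.maximum(0, x)

def softmax(x):
    # Converts raw scores into probabilities that sum to 1
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([-1.0, 2.0, 0.5])
print(relu(logits))          # negative entries become 0
probs = softmax(logits)
print(probs.sum())           # the probabilities sum to 1
print(np.argmax(probs))      # index of the most probable class
```

This mirrors what the output layer does: softmax turns the 10 raw scores into probabilities, and the index with the highest probability is the predicted digit.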

Below is the image that represents the above explanation of our program:


The code for building the model is –

classification_model = keras.models.Sequential([
  keras.layers.Flatten(input_shape=(28, 28)),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dense(10, activation='softmax')
])

Compile the model

Now we have to compile the model by giving it an optimizer and a loss function for calculating and minimizing the loss.

We use an optimizer to speed up the training process. Here we will use the ‘adam‘ optimizer, which is a replacement for the classical stochastic gradient descent technique.

In classical stochastic gradient descent, the learning rate is unchanged for the whole training process. The Adam optimization algorithm, on the other hand, takes advantage of both the Adaptive Gradient (AdaGrad) technique and RMSProp for a faster training process.
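To see how Adam adapts the step size per parameter, here is a toy NumPy sketch of a single Adam update step; the variable names follow the standard Adam formulation, and the parameter and gradient values are made up for illustration:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Moving averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the early steps (t starts at 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each parameter gets its own effective step size
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
theta, m, v = adam_step(theta, grad=np.array([0.5, -0.3]), m=m, v=v, t=1)
print(theta)
```

In plain SGD the update would be `theta - lr * grad`; Adam instead divides by the running estimate of the gradient's magnitude, so the effective step size adapts per parameter over training.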

Here we will use “sparse categorical crossentropy” as our loss function, because this is a classification problem where we have to classify each image into one of ten categories (i.e., the digits 0-9). Sparse categorical crossentropy will calculate the loss for categorizing the image, and we will use “accuracy” as our metric, which represents the accuracy of our model.

The code for compiling the model is –

classification_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
Train and evaluate the Model

Now for training our model, we have to fit the training data into our model, and we also have to mention the number of epochs. An epoch is one iteration over the whole training data. If the number of epochs is 5, then the whole training data will be processed 5 times.

While training, we will see the loss and the accuracy for every epoch. The loss should decrease and the accuracy should increase with every epoch.

The code for training and evaluating the model for 5 epochs is –

classification_model.fit(x_train, y_train, epochs=5)

classification_model.evaluate(x_test,  y_test)

The output will be-

Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 5s 83us/sample - loss: 0.2947 - accuracy: 0.9149
Epoch 2/5
60000/60000 [==============================] - 5s 81us/sample - loss: 0.1444 - accuracy: 0.9565
Epoch 3/5
60000/60000 [==============================] - 4s 75us/sample - loss: 0.1086 - accuracy: 0.9668
Epoch 4/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.0891 - accuracy: 0.9726
Epoch 5/5
60000/60000 [==============================] - 5s 75us/sample - loss: 0.0746 - accuracy: 0.9769
10000/10000 - 0s - loss: 0.0715 - accuracy: 0.9789
[0.07148841358898207, 0.9789]

Now if we train our model for 10 epochs, the output will be similar, with accuracy close to 98%. If we increase the number of epochs further, our model will start over-fitting: it will start memorizing the training data instead of learning from it.
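The over-fitting symptom described above (the training loss keeps falling while the loss on held-out data starts rising) can be checked with a small helper; the loss curves below are made up for illustration:

```python
def overfitting_detected(train_losses, val_losses):
    # Heuristic: training loss still falling while validation loss rises
    return train_losses[-1] < train_losses[-2] and val_losses[-1] > val_losses[-2]

# Made-up loss curves for illustration
train    = [0.30, 0.14, 0.10, 0.08, 0.06]
good_val = [0.32, 0.18, 0.12, 0.10, 0.09]
bad_val  = [0.32, 0.18, 0.12, 0.13, 0.16]

print(overfitting_detected(train, good_val))  # False: both still improving
print(overfitting_detected(train, bad_val))   # True: validation loss rising
```

In practice, Keras can report a validation loss each epoch (e.g. by passing validation data to `fit`), and training is usually stopped once that loss stops improving.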

The above model is trained to an accuracy of ~98%.

Prediction of Model

Now we will see how our model predicts. We will predict the images present in x_test. Suppose we want to predict the first image, i.e., x_test[0]; then its real label will be y_test[0] and the predicted label will be predictions[0].

The prediction will be an array of 10 elements, which are the probabilities of the digits 0 to 9 respectively. The digit having the maximum probability is the predicted result. If y_test[0] and np.argmax(predictions[0]) are the same, then it is clear that our model predicted the first image correctly.

The code for prediction is –

predictions = classification_model.predict(x_test)
print("predicted value =", np.argmax(predictions[0]))
print("real value =", y_test[0])

The Output of the code –

predicted value = 7
real value = 7

Hence we see that our model predicted the first image in the test data correctly.

Summarizing the training process: first of all, we load the data. After that, we split the data into training data and testing data. Then we build a model in which an image of 28×28 pixels is flattened into 784 nodes in the flatten layer. These form the input to the hidden layer containing 128 nodes with ‘relu’ activation. Those 128 nodes serve as the input to the output layer containing 10 nodes, where each node represents the probability of each digit from 0-9.

Then we compiled our model using ‘adam’ optimizer and set the loss function to ‘sparse_categorical_crossentropy’. Then we trained our model for 5 epochs and evaluated the loss and accuracy for test data. At last we predicted the first image of our test data.
