Develop A Neural Network That Can Read Handwriting
In this tutorial, we shall learn to develop a neural network that can read handwriting with python.
For this tutorial, we shall use the MNIST dataset, this dataset contains handwritten digit images of 28×28 pixel size. So we shall be predicting the digits from 0 to 9, i.e. there are a total of 10 classes to make predictions.
The following version of NumPy & TensorFlow should be installed on their system for the code to work
NumPy: 1.14.3
TensorFlow: 1.4.0
Prerequisite: basics of TensorFlow with examples
A Neural Network That Can Read Handwriting
Let’s start by loading the MNIST dataset.
import tensorflow as tf import numpy as np from PIL import Image from tensorflow.examples.tutorials.mnist import input_data mnist_data = input_data.read_data_sets("MNIST_data/", one_hot=True) no_train = mnist_data.train.num_examples no_validation = mnist_data.validation.num_examples no_test = mnist_data.test.num_examples
Now we move on to the next step which is defining the neural network layers.
In this neural network, we shall use three hidden layers and then the final output layer will be defined. After this, we fix the learning rate, no of iterations for the training of the model.
no_input = 784 #(28x28 pixels) no_hidden1 = 1024 no_hidden2 = 512 no_hidden3 = 256 no_output = 10 learning_rate = 1e-5 no_iterations = 2000 batch_size = 256 dropout = 0.5
The next step is to define variables as placeholders for the data that we feed into it.
X = tf.placeholder("float", [None, n_input]) Y = tf.placeholder("float", [None, n_output]) probe = tf.placeholder(tf.float32)
Now we will be giving weights to the hidden layers and also set the bias values for each. The values of weights are set carefully so that the model learns something productive at every iteration. For the bias, we use small constant value to ensure the variables activate from the initial stage and contribute to learning.
weights = { 'weight1': tf.Variable(tf.truncated_normal([no_input, no_hidden1], stddev=0.1)), 'weight2': tf.Variable(tf.truncated_normal([no_hidden1, no_hidden2], stddev=0.1)), 'weight3': tf.Variable(tf.truncated_normal([no_hidden2, no_hidden3], stddev=0.1)), 'out': tf.Variable(tf.truncated_normal([no_hidden3, no_output], stddev=0.1)), } biases = { 'bias1': tf.Variable(tf.constant(0.1, shape=[no_hidden1])), 'bias2': tf.Variable(tf.constant(0.1, shape=[no_hidden2])), 'bias3': tf.Variable(tf.constant(0.1, shape=[no_hidden3])), 'out': tf.Variable(tf.constant(0.1, shape=[no_output])) }
Next, set up the various layers of the neural network by defining operations that will help manipulate with variables. Each hidden layer will execute matrix multiplication on the previous layer’s outputs. Multiply the current layer’s weights, and add the bias to these values.
layer_1 = tf.add(tf.matmul(X, weights['weight1']), biases['bias1']) layer_2 = tf.add(tf.matmul(layer_1, weights['weight2']), biases['bias2']) layer_3 = tf.add(tf.matmul(layer_2, weights['weight3']), biases['bias3']) layer_drop = tf.nn.dropout(layer_3, prob) final_layer = tf.matmul(layer_3, weights['out']) + biases['out']
The final step in building the graph is to define the loss function that we want to optimize. Cross-entropy, also known as log-loss, which quantifies the difference between two probability distributions.
cross_entropy = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits( labels=Y, logits=final_layer )) training = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) predict = tf.equal(tf.argmax(final_layer, 1), tf.argmax(Y, 1)) accuracy = tf.reduce_mean(tf.cast(predict, tf.float32)) init = tf.global_variables_initializer() sess = tf.Session() sess.run(init)
Driver Code:
Training the model is the next step.
for i in range(no_iterations): x, y = mnist_data.train.next_batch(batch_size) sess.run(training, feed_dict={ X: x, Y: y, prob: dropout }) if i % 100 == 0: minibatch_loss, minibatch_accuracy = sess.run( [cross_entropy, accuracy], feed_dict={X: x, Y: y, prob: 1.0} ) print( "Iteration", str(i), "\t| Loss =", str(minibatch_loss), "\t| Accuracy% =", str(minibatch_accuracy*100) )
Extracting MNIST_data/train-images-idx3-ubyte.gz Extracting MNIST_data/train-labels-idx1-ubyte.gz Extracting MNIST_data/t10k-images-idx3-ubyte.gz Extracting MNIST_data/t10k-labels-idx1-ubyte.gz Iteration 0 | Loss = 7.307614 | Accuracy% = 18.75 Iteration 100 | Loss = 0.5415499 | Accuracy% = 83.59375 Iteration 200 | Loss = 0.4191438 | Accuracy% = 90.625 Iteration 300 | Loss = 0.3677881 | Accuracy% = 90.625 Iteration 400 | Loss = 0.3412871 | Accuracy% = 90.625 Iteration 500 | Loss = 0.3393182 | Accuracy% = 90.234375 Iteration 600 | Loss = 0.30351943 | Accuracy% = 90.234375 Iteration 700 | Loss = 0.4478323 | Accuracy% = 89.84375 Iteration 800 | Loss = 0.3525465 | Accuracy% = 89.84375 Iteration 900 | Loss = 0.3940174 | Accuracy% = 90.234375 Iteration 1000 | Loss = 0.36469018 | Accuracy% = 89.84375 Iteration 1100 | Loss = 0.28805807 | Accuracy% = 92.578125 Iteration 1200 | Loss = 0.3842911 | Accuracy% = 90.234375 Iteration 1300 | Loss = 0.3182351 | Accuracy% = 91.796875 Iteration 1400 | Loss = 0.25723037 | Accuracy% = 93.75 Iteration 1500 | Loss = 0.3597792 | Accuracy% = 91.796875 Iteration 1600 | Loss = 0.20875177 | Accuracy% = 94.140625 Iteration 1700 | Loss = 0.27065527 | Accuracy% = 93.75 Iteration 1800 | Loss = 0.16261025 | Accuracy% = 94.140625 Iteration 1900 | Loss = 0.3845265 | Accuracy% = 87.109375
Training of the model ends here, it is now time to test our model with a new image. Remember only an image of pixel size 28×28 is compatible with the model.
test_accuracy = sess.run(accuracy, feed_dict={X: mnist_data.test.images, Y: mnist_data.test.labels, prob: 1.0}) print("\nAccuracy on test set:", test_accuracy) img = np.invert(Image.open("number.png").convert('L')).ravel() prediction = sess.run(tf.argmax(final_layer, 1), feed_dict={X: [img]}) print ("The number in test image is:", np.squeeze(prediction))
The test image. https://drive.google.com/file/d/1tRoLy9534pb0Eakgz93kfd99-AkoKjgR/view?usp=sharing.
Accuracy on test set: 0.916 The number in test image is: 3
The model correctly predicts the digit from the image. Congratulations, we have successfully created a neural network that can read handwriting.
Try testing the model with various images and with various learning rates, iterations, etc to get better command on it. Also, train the model with different datasets (eg. English alphabets) and then try to train and test the model on them.
Leave a Reply