Digit Recognition using CNN in Python

In this post, we will learn how to recognize handwritten digits with a Convolutional Neural Network (CNN) in Python.
Digit recognition is one of the most fundamental tasks in computer vision and machine learning. It involves identifying handwritten digits from images and has important applications in postal mail sorting, bank check processing, and form digitization.
We will build the system step by step using TensorFlow and Keras.

What is a CNN?

Convolutional Neural Networks are a class of deep learning models designed to process grid-structured data, most notably images. They automatically learn spatial hierarchies of features from input images through a stack of convolutional layers.
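To make the idea of a convolutional layer concrete, here is a minimal NumPy sketch of the sliding-window operation at its core. The function name conv2d_valid, the toy image, and the hand-picked edge kernel are all illustrative assumptions; a real layer learns its kernel values during training and adds a bias and an activation. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Elementwise product of the window and the kernel, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 "image": dark (0) on the left, bright (1) on the right
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hypothetical hand-picked kernel that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # every row is [0. 3. 3.]
```

The feature map is zero where the window sees a flat region and positive where it overlaps the edge, which is the sense in which a convolutional filter "detects" a local pattern.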

Dataset

We will use the MNIST dataset, which contains 60,000 training images and 10,000 test images of handwritten digits from 0 to 9. Each image is a 28×28 grayscale image.


Steps to Build the Model

  1. Import Libraries
  2. Load and Preprocess the Data
  3. Build the CNN Model
  4. Compile and Train the Model
  5. Evaluate the Model
  6. Make Predictions

Step 1: Import Libraries

First, we’ll import the necessary libraries.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

Step 2: Load and Preprocess the Data

Next, we’ll load the MNIST dataset and preprocess the data.

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize the images to a range of 0 to 1
x_train = x_train / 255.0
x_test = x_test / 255.0

# Reshape the images to include the channel dimension (1 for grayscale)
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

Output:

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
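The three preprocessing steps (scaling, reshaping, one-hot encoding) can be checked without downloading anything. The sketch below runs them on a small made-up batch; fake_images and fake_labels are stand-ins, not MNIST data, and np.eye is used here only to show the values a one-hot encoding produces.

```python
import numpy as np

# Stand-in for MNIST: 4 random 28x28 uint8 images with made-up labels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 3])

# Scale pixel values from [0, 255] to [0, 1] and add a trailing channel axis
x = (fake_images / 255.0).reshape(-1, 28, 28, 1)

# One-hot encode: row i of the 10x10 identity matrix encodes class i
y = np.eye(10)[fake_labels]

print(x.shape)   # (4, 28, 28, 1)
print(y.shape)   # (4, 10)
print(y[0])      # 1.0 at index 3, zeros elsewhere
```

The trailing axis of size 1 is what Conv2D expects as the channel dimension for grayscale input, and each one-hot row has the same length as the 10-unit softmax output it will be compared against.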

Step 3: Build the CNN Model

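We’ll define a simple CNN architecture for our digit recognition task: two convolution and max-pooling blocks, followed by a fully connected layer and a 10-way softmax output.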

model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
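It can help to trace how the tensor shape changes through this stack. The sketch below does the arithmetic by hand, assuming the Keras defaults that the model code relies on: 'valid' padding and stride 1 for Conv2D, and a pooling stride equal to the pool size for MaxPooling2D. The helper names conv_out and pool_out are illustrative only.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: the kernel must fit entirely inside the input
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling windows; a leftover row or column is dropped
    return size // pool

s = 28
s = conv_out(s)     # 26 after the first 3x3 convolution
s = pool_out(s)     # 13 after the first 2x2 pooling
s = conv_out(s)     # 11 after the second 3x3 convolution
s = pool_out(s)     # 5 after the second 2x2 pooling
flat = s * s * 64   # Flatten: 5 * 5 * 64 = 1600 features

# Trainable parameters per layer: weights plus one bias per output unit
params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "dense1": flat * 128 + 128,
    "dense2": 128 * 10 + 10,
}
total = sum(params.values())

print(flat)    # 1600
print(params)
print(total)   # 225034
```

Most of the parameters sit in the first Dense layer, not in the convolutions. Calling model.summary() on the model above should report the same per-layer shapes and counts.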

Step 4: Compile and Train the Model

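We’ll compile the model with the Adam optimizer and categorical cross-entropy loss, then train it for 10 epochs with a batch size of 128, holding out 20% of the training data for validation.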

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

Output:

Epoch 1/10
375/375 [==============================] - 41s 106ms/step - loss: 0.2359 - accuracy: 0.9295 - val_loss: 0.0756 - val_accuracy: 0.9775
Epoch 2/10
375/375 [==============================] - 40s 107ms/step - loss: 0.0642 - accuracy: 0.9804 - val_loss: 0.0516 - val_accuracy: 0.9858
Epoch 3/10
375/375 [==============================] - 39s 105ms/step - loss: 0.0451 - accuracy: 0.9860 - val_loss: 0.0525 - val_accuracy: 0.9837
Epoch 4/10
375/375 [==============================] - 39s 103ms/step - loss: 0.0353 - accuracy: 0.9890 - val_loss: 0.0453 - val_accuracy: 0.9866
Epoch 5/10
375/375 [==============================] - 43s 114ms/step - loss: 0.0270 - accuracy: 0.9916 - val_loss: 0.0389 - val_accuracy: 0.9892
Epoch 6/10
375/375 [==============================] - 39s 105ms/step - loss: 0.0209 - accuracy: 0.9933 - val_loss: 0.0368 - val_accuracy: 0.9899
Epoch 7/10
375/375 [==============================] - 39s 104ms/step - loss: 0.0160 - accuracy: 0.9950 - val_loss: 0.0390 - val_accuracy: 0.9892
Epoch 8/10
375/375 [==============================] - 38s 102ms/step - loss: 0.0141 - accuracy: 0.9950 - val_loss: 0.0362 - val_accuracy: 0.9898
Epoch 9/10
375/375 [==============================] - 40s 106ms/step - loss: 0.0106 - accuracy: 0.9964 - val_loss: 0.0514 - val_accuracy: 0.9869
Epoch 10/10
375/375 [==============================] - 38s 102ms/step - loss: 0.0092 - accuracy: 0.9967 - val_loss: 0.0376 - val_accuracy: 0.9911
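The 375 in each progress bar is the number of batches per epoch, and it follows from the arguments passed to fit. A short sketch of the arithmetic, with variable names chosen here for illustration:

```python
import math

n_train = 60_000          # MNIST training images
validation_split = 0.2    # fraction held out by fit()
batch_size = 128

n_val = round(n_train * validation_split)   # 12000 images used for validation
n_fit = n_train - n_val                     # 48000 images used for weight updates
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_val, n_fit, steps_per_epoch)  # 12000 48000 375
```

According to the Keras documentation for fit, the validation samples are taken from the end of the arrays before any shuffling, so the val_loss and val_accuracy columns are computed on the last 12,000 training images.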

Step 5: Evaluate the Model

After training, we’ll evaluate the model’s performance on the test dataset.

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_acc:.4f}')

Output:

313/313 [==============================] - 3s 9ms/step - loss: 0.0297 - accuracy: 0.9906
Test accuracy: 0.9906
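The two numbers reported by evaluate are the mean categorical cross-entropy and the fraction of correctly classified images. The NumPy sketch below shows how both are derived from softmax outputs and one-hot labels. The probabilities are made up for illustration, and the sketch omits the small clipping Keras applies to avoid taking the log of zero.

```python
import math
import numpy as np

# Hypothetical softmax outputs for 3 samples over 3 classes (each row sums to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])

# One-hot labels: the true classes are 0, 1 and 2
labels = np.eye(3)[[0, 1, 2]]

# Categorical cross-entropy: average of -log(probability given to the true class)
loss = -np.mean(np.sum(labels * np.log(probs), axis=1))

# Accuracy: how often the most probable class matches the true class
accuracy = np.mean(np.argmax(probs, axis=1) == np.argmax(labels, axis=1))

print(round(float(loss), 4))      # 0.5946
print(round(float(accuracy), 4))  # 0.6667

# 10,000 test images in batches of 32 (the default batch size of evaluate)
print(math.ceil(10_000 / 32))     # 313
```

The third sample is misclassified (class 1 is predicted, class 2 is true), which both lowers the accuracy and dominates the loss. The last line matches the 313 steps shown in the progress bar.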

Step 6: Make Predictions

Finally, we’ll use the trained model to make predictions on the first five test images and compare them with the true labels.

# Predict on the first 5 test images
predictions = model.predict(x_test[:5])

# Display the first 5 test images along with their predicted labels
for i in range(5):
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.title(f'Predicted: {np.argmax(predictions[i])}, True: {np.argmax(y_test[i])}')
    plt.show()

Output: five figures, one per test image, each titled with the predicted and true digit.
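Each row returned by model.predict holds 10 softmax probabilities, and the np.argmax call in the plot title picks the most probable digit. The same decoding can be done for a whole batch at once. In the sketch below, the predictions array is a made-up stand-in for real model output.

```python
import numpy as np

# Hypothetical model.predict output for 2 images (each row sums to 1)
predictions = np.array([
    [0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.90, 0.01, 0.01],
    [0.05, 0.05, 0.60, 0.05, 0.05, 0.05, 0.05, 0.05, 0.03, 0.02],
])

predicted_digits = predictions.argmax(axis=1)   # index of the largest probability per row
confidence = predictions.max(axis=1)            # the probability assigned to that digit

for digit, conf in zip(predicted_digits, confidence):
    print(f"Predicted {digit} with probability {conf:.2f}")
# Predicted 7 with probability 0.90
# Predicted 2 with probability 0.60
```

The maximum probability is only a rough indication of how sure the model is. A low value, as in the second row, can be a useful signal to flag a digit for manual review.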

Conclusion

In this post, we developed a digit recognition system based on a CNN in Python. We loaded the MNIST dataset, preprocessed it, defined a CNN model, trained it, and evaluated it on the test set, where it reached about 99% accuracy. Deep neural networks are very effective for image recognition problems, and libraries like TensorFlow and Keras make it straightforward to implement them and try out different architectural variants.

 
