Document Scanner using Python

In this tutorial, we will learn how to create a document scanner using python. This is a basic document scanner that can capture images of the documents and then can scan it or can also scan the uploaded images.

Creating a document scanner in Python

Requirements: To create a document scanner, we require python libraries like scikit-image, NumPy, OpenCV, imutils. We fulfill these requirements by installing specific libraries as follows:

To install these libraries run the following commands in anaconda prompt or command prompt-

  1. Scikit-image: pip install scikit-image
  2. NumPy- pip install numpy
  3. OpenCV- pip install opencv-python
  4. Imutils-pip install imutils

After installing the required libraries, we create a file named document_scanner.py

In document_scanner.py, write the following code:

Step 1: Import all the required libraries

from skimage. filters import threshold_local

import numpy as np

import cv2

import imutils

 

First of all our image is not uniform, hence we need to perform some functions on the image so that the useful information from the image is not lost. Therefore we use the libraries. The skimage. filters. threshold_local creates a threshold mask image of the original image. A threshold value is fix value and according to the threshold value, we obtain a mask image. This is necessary because the image may contain any noises, which we remove through this.

Step 2: We define a method order_coordinates  as follows:

def order_coordinates(pts):

            rectangle = np.zeros((4, 2), dtype = "float32")

            s = pts.sum(axis = 1)

            rectangle[0] = pts[np.argmin(s)]

            rectangle[2] = pts[np.argmax(s)]

            difference = np.diff(pts, axis = 1)

            rectangle[1] = pts[np.argmin(difference)]

            rectangle[3] = pts[np.argmax(difference)]

            return rectangle

 

The ordered rectangular co-ordinates are returned by the method defined here.

Step 3: Defining another method point_transform :

def point_transform(image, pts):

            rect = order_coordinates(pts)

            (upper_left, upper_right, bottom_right, bottom_left) = rect

            width1 = np.sqrt(((bottom_right[0] – bottom_left[0]) ** 2) + ((bottom_right[1] – bottom_left[1]) ** 2))

            width2 = np.sqrt(((upper_right[0] – upper_left[0]) ** 2) +((upper_right[1] – upper_left[1]) ** 2))

            Width = max(int(width1), int(width2)) #considers maximum width value as Width

            height1 = np.sqrt(((upper_right[0] – bottom_right[0]) ** 2) +((upper_right[1] – bottom_right[1]) ** 2))

            height2 = np.sqrt(((upper_left[0] – bottom_left[0]) ** 2) + ((upper_left[1] – bottom_left[1]) ** 2))

            Height = max(int(height1), int(height2)) #considers maximum height value as Height

            distance = np.array([[0, 0],[Width - 1, 0],[Width - 1, Height - 1],[0,Height - 1]], dtype ="float32")

            Matrix = cv2.getPerspectiveTransform(rect, distance) 

            warped_image = cv2.warpPerspective(image, Matrix, (Width, Height))

            return warped_image

 

The ordered points are obtained and then unpacked into four variables which are labeled as upper_left, upper_right, bottom_left, bottom_right respectively. Then the width of the new image is the maximum distance between upper_right & upper_left and bottom_right & bottom_left x-coordinates. Similarly, the height of the image is the maximum distance between upper_right & bottom_right and upper_left & bottom_left y-coordinates. Then the dimensions of the new image are stored in the variable distance. Performing calculation of a perspective transforms from four pairs of the corresponding points and the application a perspective transformation to the image. As a result, we get the final warped image.

Step: 4 Capturing the Image:

capture=cv2.VideoCapture(0)

while(True):

    ret,image=capture.read()

    image=cv2.imread(#image-path and name)

    ratio=image.shape[0]/image.shape[1]

    original=image.copy()

    image=imutils.resize(image,height=500)

    gray=cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)

    gray=cv2.GaussianBlur(gray,(5,5),0)

    edged=cv2.Canny(gray,75,200)

    contours = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    contours = imutils.grab_contours(contours )

    contours = sorted(contours , key = cv2.contourArea, reverse = True)[:5]

    for ci in contours :

             perimeter = cv2.arcLength(ci, True)

             approx = cv2.approxPolyDP(ci, 0.02 * perimeter, True)

             if len(approx) == 4:

                         screenCnt = approx

                         break

    warped = point_transform(original, screenCnt.reshape(4, 2) * ratio)

    warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)

    T = threshold_local(warped, 11, offset = 10, method = "gaussian")

    warped = (warped > T).astype("uint8") * 255

    cv2.imshow("Original", imutils.resize(original, height = 650))

    cv2.imshow("Scanned", imutils.resize(warped, height = 650))

    if cv2.waitKey(0):

        break

capture.release()

cv2.destroyAllWindows()


 

The image is captured, resized because the captured image may be of varying sizes, hence to maintain uniformity and then converted to grayscale so that the images are in black and white format after which the edges are detected. Contours join all the continuous points, having the same color or intensity. Each individual contour is an array of x and y coordinates of boundary points of the object which are then sorted according to the area. The contours are then approximated and checked if it has four points. If it has four points then it is considered as our screen. Then the warped image is converted to grayscale and threshold it. As a result, we get a proper paper view of the image.

Also read:

Watermark image using opencv in python

Output of document scanner build in Python

create a document scanner in Python

 

This is how we can build a document scanner in Python.

Leave a Reply

Your email address will not be published. Required fields are marked *