Read a Particular Page from a PDF File in Python

After reading this tutorial, you will be able to read a particular page from a PDF file in Python. We use the PyPDF2 module for this. PyPDF2 is not part of the Python standard library, so you have to install it first by running the following command in your Command Prompt (cmd).

C:\Users\...\Python\Scripts> pip install PyPDF2
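
Once the command finishes, one way to confirm that the package is visible to your interpreter is to look it up without importing it. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when no such top-level module exists
    return importlib.util.find_spec(module_name) is not None

print("PyPDF2 installed:", is_installed("PyPDF2"))
```

If this prints False, the package was probably installed for a different Python interpreter than the one running the script.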

The PyPDF2 package is now installed. PyPDF2 contains various classes, but we only need the PdfFileReader class to read a PDF file. (In PyPDF2 3.0 and later, this class is named PdfReader.) It can be imported as follows:

from PyPDF2 import PdfFileReader as R

How to Read a Particular Page from a PDF File in Python

Here, the PdfFileReader class is imported as R (i.e. R = PdfFileReader). We can't read data from a file without opening it first, so let's look at how to open a PDF file.

Opening a File:

f=open("Path_to_your_PDF_File","rb")

Here, f is a file object for the PDF file located at the specified path (i.e. Path_to_your_PDF_File). open() is a built-in function that opens the specified file in the specified mode (i.e. "rb"). "rb" combines read mode (r) and binary mode (b), so f gives read-only access to the raw bytes of the PDF file.
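
To see what binary mode means in practice, the sketch below reads the first few bytes of a stream and checks for the "%PDF-" marker that PDF files begin with. The helper name has_pdf_header is hypothetical, and an in-memory io.BytesIO stream stands in for a real file so that the example runs without a PDF on disk.

```python
import io

def has_pdf_header(stream):
    """Return True if the binary stream starts with the PDF marker."""
    header = stream.read(5)      # binary mode yields bytes, not str
    stream.seek(0)               # rewind so the stream can be read again later
    return header == b"%PDF-"

fake_pdf = io.BytesIO(b"%PDF-1.4\n...")            # stand-in for a real PDF file
print(has_pdf_header(fake_pdf))                    # True
print(has_pdf_header(io.BytesIO(b"plain text")))   # False
```

With a real file, you would pass the object returned by open("Path_to_your_PDF_File", "rb") instead of the in-memory stream.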

To learn more about file modes, see: Introduction to file handling of python

Next, we create an object of the PdfFileReader class (i.e. R) as follows:

pdf=R(f)

Here, pdf is the PdfFileReader object that reads the PDF file. It has a list-like attribute, pages, which holds one page object for each page.

i.e. pdf.pages=[ PO1, PO2, PO3, … , POn]

where PO1 to POn are the page objects of the n pages of the given PDF file. Indexing starts at 0: pdf.pages[0] returns the page object of page 1 (PO1), pdf.pages[1] returns the page object of page 2 (PO2), and so on.
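
Because the list is zero-based while people usually count pages from 1, a small conversion helper can avoid off-by-one mistakes. This is only a sketch: get_page is a hypothetical helper, and a plain Python list stands in for pdf.pages.

```python
def get_page(pages, page_number):
    """Return the page for a 1-based page number from a zero-based sequence."""
    if not 1 <= page_number <= len(pages):
        raise ValueError(f"page_number must be between 1 and {len(pages)}")
    return pages[page_number - 1]

pages = ["PO1", "PO2", "PO3"]   # stand-in for pdf.pages
print(get_page(pages, 1))       # PO1
print(get_page(pages, 3))       # PO3
```

The bounds check also turns a request for a page that does not exist into a clear error message.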

Each page object has various methods, but we only need the extractText() method to extract the text from that page. (In PyPDF2 3.0 and later, this method is named extract_text().) Let's look at the following code, which reads a particular page from a PDF file in Python.

Example:

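from PyPDF2 import PdfFileReader as R

# Open the PDF in binary read mode; the with-block closes the file automatically
with open("Path_to_your_PDF_File", "rb") as f:
    pdf = R(f)
    page_no = 2                  # index 2 selects the 3rd page, since pages are counted from 0
    P_O = pdf.pages[page_no]     # page object for the selected page
    print(P_O.extractText())     # print the text extracted from that page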

In the above Python script,

  • f is the file object
  • pdf is the PdfFileReader object
  • page_no is the zero-based index of the page to read, which must exist in the PDF file
  • P_O is the page object corresponding to that index
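
Newer PyPDF2 releases use the names PdfReader and extract_text() instead of PdfFileReader and extractText(). The sketch below is one possible way to write the same steps so that they run under either naming scheme. The helper names (open_reader, page_text, read_page) are hypothetical, and PyPDF2 is imported inside the function so that the definitions load even where the package is missing.

```python
def open_reader(stream):
    """Create a reader object using whichever class name this PyPDF2 offers."""
    import PyPDF2  # imported lazily; requires `pip install PyPDF2`
    reader_cls = getattr(PyPDF2, "PdfReader", None) or PyPDF2.PdfFileReader
    return reader_cls(stream)

def page_text(page):
    """Extract text via extract_text() if present, else the older extractText()."""
    extract = getattr(page, "extract_text", None) or getattr(page, "extractText")
    return extract()

def read_page(path, page_no):
    """Return the text of the page at zero-based index page_no."""
    with open(path, "rb") as f:
        pdf = open_reader(f)
        # Extract while the file is still open, because pages are read on demand
        return page_text(pdf.pages[page_no])

# Example call (assumes the file exists and has at least 3 pages):
# print(read_page("Path_to_your_PDF_File", 2))
```

Returning the text instead of printing it lets the caller decide whether to display it, search it, or save it.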

Input:

A Sample PDF File -> PDF_sample.pdf

Output:

The above code prints the text extracted from the third page of the sample PDF file.

[Image: Read a Particular Page from a PDF File in Python]

In this way, we can read a particular page from a given PDF file using Python.

For further reading, see: Watermark on PDF
