Read a Particular Page from a PDF File in Python
After reading this tutorial, you will be able to read a particular page from a PDF file in Python. We use the PyPDF2 module for this. PyPDF2 is not part of Python's standard library, so we have to install it by running the following command in the Command Prompt (cmd).
C:\Users\...\Python\Scripts> pip install PyPDF2
The PyPDF2 package will then be installed. PyPDF2 consists of various classes, but we need only the PdfFileReader class to read a PDF file. It can be imported as follows:
from PyPDF2 import PdfFileReader as R
How to Read a Particular Page from a PDF File in Python
Here, PdfFileReader Class is imported as R (i.e. R=PdfFileReader). As we know, without opening a File, we can’t read data from it. So, let’s have a look at Opening a PDF file.
Opening a File:
f=open("Path_to_your_PDF_File","rb")
Here, f is a file object for the PDF file located at the specified path (i.e. Path_to_your_PDF_File). open() is a built-in function that opens the specified file in the specified mode (i.e. “rb”). rb is the combination of read mode (r) and binary mode (b), so f gives read-only access to the raw bytes of the PDF file.
To learn more about file modes, see Introduction to file handling of python.
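To illustrate what opening a file in “rb” mode gives you, here is a minimal sketch that reads the first few bytes of a file and checks for the %PDF- signature that PDF files normally begin with. The helper name looks_like_pdf and the temporary demo file are assumptions made for this illustration and are not part of PyPDF2.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file begins with the bytes b'%PDF-'.

    Hypothetical helper for illustration only.
    """
    # "rb" = read mode + binary mode, so read() returns bytes, not str
    with open(path, "rb") as f:
        header = f.read(5)
    return header == b"%PDF-"


# Demo on a throwaway file that contains only a PDF-style header
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n")
    demo_path = tmp.name

result = looks_like_pdf(demo_path)
os.remove(demo_path)  # clean up the demo file
print(result)
```

The with statement closes the file automatically, so no explicit f.close() is needed here.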
Next, we have to create an object of the PdfFileReader class (i.e. R) as follows:
pdf=R(f)
Here, pdf is the PdfFileReader object that reads the PDF file. It has a list-like attribute (i.e. pages) which holds the Page Object for each page.
i.e. pdf.pages=[ PO1, PO2, PO3, … , POn]
where PO1 to POn are the Page Objects of the “n” pages of the given PDF file. Indexing starts at 0, so pdf.pages[0] returns the Page Object of Page 1 (i.e. PO1), pdf.pages[1] returns the Page Object of Page 2 (i.e. PO2), and so on.
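The off-by-one between page numbers and list indexes is easy to get wrong, so here is a small sketch of the conversion. It uses a plain Python list as a stand-in for pdf.pages, and the helper name page_index is an assumption made for this example, not a PyPDF2 function.

```python
def page_index(page_number, total_pages):
    """Convert a 1-based page number into a 0-based list index.

    Hypothetical helper for illustration only.
    """
    if not 1 <= page_number <= total_pages:
        raise ValueError(
            f"page {page_number} is outside the range 1..{total_pages}"
        )
    return page_number - 1


# A plain list standing in for pdf.pages = [PO1, PO2, PO3]
pages = ["PO1", "PO2", "PO3"]

third = pages[page_index(3, len(pages))]
print(third)  # PO3
```

Checking the range up front gives a clearer error message than letting the list raise an IndexError later.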
Each Page Object has various methods. But, we need only extractText() Method to extract the Text from that page. Let’s have a look at the following code to read a Particular Page from a PDF File in Python.
from PyPDF2 import PdfFileReader as R
f=open("Path_to_your_PDF_File","rb")
pdf=R(f)
page_no=2 # I have selected the 3rd Page to display its Contents
P_O=pdf.pages[page_no] # Since Pages start counting from '0'
print(P_O.extractText())
f.close()
From the above Python Script,
- f is the File Object
- pdf is the PdfFileReader Object
- page_no is the zero-based index of an existing page in the PDF file (2 means the 3rd page)
- P_O is the Corresponding Page Object for given Page Number
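The same steps can also be wrapped in a function that closes the file automatically and checks that the requested page exists. This is only a sketch under stated assumptions: the function name read_page_text and its reader_cls parameter are invented here for illustration, and the code uses the PdfFileReader and extractText() names from this tutorial. Newer PyPDF2 releases rename these to PdfReader and extract_text(), so the names may need adjusting for your installed version.

```python
def read_page_text(path, page_no, reader_cls=None):
    """Return the text of the page at zero-based index page_no.

    Hypothetical helper for illustration. reader_cls defaults to
    PyPDF2's PdfFileReader, the class used in this tutorial.
    """
    if reader_cls is None:
        # Imported lazily so the function can be defined without PyPDF2
        from PyPDF2 import PdfFileReader
        reader_cls = PdfFileReader

    # The with block closes the file even if an error occurs
    with open(path, "rb") as f:
        pdf = reader_cls(f)
        total = len(pdf.pages)
        if not 0 <= page_no < total:
            raise IndexError(
                f"page index {page_no} is out of range (0..{total - 1})"
            )
        # Extract the text while the file is still open
        return pdf.pages[page_no].extractText()


# Example call (path is a placeholder, as in the tutorial):
# print(read_page_text("Path_to_your_PDF_File", 2))
```

Extracting the text inside the with block matters because the reader needs the file to still be open when it loads the page.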
A Sample PDF File -> PDF_sample.pdf
The output of the above code will be as follows
In this way, we can read a Particular Page from the given PDF File using Python.
For further reading, please refer to Watermark on PDF.