Find the Page Number of a Text from a PDF file in Python

In this tutorial, we will see how to find the page numbers on which a given text appears in a PDF file using Python. This is useful, for example, when you are building software or working on a project that has to search through documents.

When searching a PDF for some content, we can take the text we are looking for and run a short script that automatically finds the page numbers where that text appears.

How to find the page number of a Text from a PDF file in Python

In many cases, we need to know on which pages a particular piece of text appears in a PDF file.

We will use the ‘PyPDF2’ and ‘re’ libraries for this. ‘re’ is part of the Python standard library, so only ‘PyPDF2’ has to be installed.

Install:

If ‘PyPDF2’ is not installed on your system, you can install it with the command given below.

pip install PyPDF2

 

PDF File Used:

Here we use the file ‘CodeSpeedy.pdf’, which consists of 25 pages.

Code:

First, we import the libraries ‘PyPDF2’ and ‘re’. Then we read the PDF file and store the reader object in the ‘obj’ variable, and store the number of pages in the ‘pgno’ variable. Next, we put the string or text to be searched for in ‘S’. Then, using a for loop, we check for every page whether the string is present on that page or not. Finally, the program prints the pages on which the string was found.

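import PyPDF2
import re

# Open the PDF (PdfFileReader and the camelCase methods below are the pre-3.0 PyPDF2 API)
obj = PyPDF2.PdfFileReader(r"CodeSpeedy.pdf")

# Total number of pages in the PDF
pgno = obj.getNumPages()

# Text to search for
S = "Connect"

for i in range(0, pgno):
    PgOb = obj.getPage(i)        # page object at zero-based index i
    Text = PgOb.extractText()    # text extracted from that page
    if re.search(S, Text):
        print("String Found on Page: " + str(i))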

Output:

String Found on Page: 2 
String Found on Page: 9 
String Found on Page: 10

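In the above code, ‘PdfFileReader()’ is used to read the file and ‘getNumPages()’ is used to get the number of pages in the PDF. ‘getPage(i)’ returns the page at index ‘i’, ‘extractText()’ extracts the text of that page, and ‘re.search()’ checks whether the string occurs in that text. Here we are searching for the string ‘Connect’. Note that the loop index starts at 0, so the printed numbers are zero-based: ‘Page: 2’ refers to the third page of the PDF. Also note that PyPDF2 3.0 removed these method names in favour of ‘PdfReader’, ‘len(reader.pages)’, ‘reader.pages[i]’ and ‘extract_text()’, so the code as written needs a PyPDF2 version older than 3.0.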
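If you have PyPDF2 3.0 or later installed, the same idea can be sketched with the newer names. This is only a minimal sketch, not the original code of this tutorial: the function names ‘extract_page_texts’ and ‘find_pages’ and the ‘sample_pages’ list are hypothetical and chosen just for illustration. It also differs from the loop above in two ways that are assumptions on our side: it reports 1-based page numbers, and it uses ‘re.escape()’ so that the search text is matched literally even if it contains characters such as ‘+’ or ‘.’.

```python
import re


def extract_page_texts(pdf_path):
    """Return a list with the extracted text of every page (PyPDF2 3.x names)."""
    # Imported here so that find_pages() below can be tried without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # Fall back to an empty string if a page yields no text.
    return [page.extract_text() or "" for page in reader.pages]


def find_pages(page_texts, needle, ignore_case=False):
    """Return the 1-based page numbers whose text contains `needle` literally."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape() turns the search text into a pattern that matches it literally.
    pattern = re.compile(re.escape(needle), flags)
    return [number
            for number, text in enumerate(page_texts, start=1)
            if pattern.search(text)]


# Hypothetical stand-in page texts, so the search step can be tried without a PDF.
sample_pages = ["Welcome", "Connect with us", "connect again", "C++ notes"]
print(find_pages(sample_pages, "Connect"))                    # [2]
print(find_pages(sample_pages, "Connect", ignore_case=True))  # [2, 3]
print(find_pages(sample_pages, "C++"))                        # [4]
```

For a real file, you could call ‘find_pages(extract_page_texts("CodeSpeedy.pdf"), "Connect")’. Because this sketch counts pages from 1, each number it reports is one higher than the index printed by the loop above.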

You can also read:

Count the number of pages in a PDF
Check if a string exists in a PDF
