Delete empty pages from a PDF file in Python

Hello programmers, in this tutorial, we will learn how to delete empty pages from a PDF file in Python.

To delete the pages from the PDF file, we will use Python's PyPDF2 module.

Let’s start coding

  • To work with PyPDF2, we first have to install the library on our system and then import it.
# Install the PyPDF2 library (run this in a terminal, not in Python)
pip install PyPDF2
# Import the PyPDF2 library
import PyPDF2
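The class and method names used in this tutorial (PdfFileReader, PdfFileWriter, and so on) belong to older PyPDF2 releases; PyPDF2 3.0 removed them in favour of names such as PdfReader and PdfWriter. So it can help to check which version is installed before going further. Below is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical, not part of PyPDF2.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("PyPDF2")
if version is None:
    print("PyPDF2 is not installed; run: pip install PyPDF2")
else:
    print("PyPDF2 version:", version)
```

If the reported version is 3.0 or later, the calls in this tutorial need the newer names, or an older release has to be installed instead.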
  • First, we open the PDF file from which we want to delete the blank pages in binary read mode and pass it to the PdfFileReader class.
  • The PDF file used here has 4 pages, 2 of which are empty.
  • Then we count how many pages the file has initially.
file1 = open("C:\\Users\\sumit\\..files\\11.pdf", 'rb')
ReadPDF = PyPDF2.PdfFileReader(file1)
# Number of pages initially
pages = ReadPDF.numPages
print(pages)
Output: 4
  • Now we will create a new file that stores only the pages of the original PDF file that are not blank.
  • We use the PdfFileWriter class to build the new PDF file.
  • We run a for loop that reads each page of the original file “file1” and extracts its text with the extractText method. An “if” statement then checks whether the page is blank, that is, whether any text was extracted from it.
  • If the page is not blank, we add it to the new PDF “output” using the addPage method.
  • Finally, we check how many pages the new PDF file has, write it to disk, and close the files.
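# Create a new file that does not contain any empty pages
output = PyPDF2.PdfFileWriter()
file2 = open("C:\\Users\\sumit\\..files\\3.pdf", "wb")

# Reuse the ReadPDF reader opened earlier; it does not need to be re-created per page
for i in range(pages):
    pageObj = ReadPDF.getPage(i)
    text = pageObj.extractText()

    # Keep the page only if it contains some non-whitespace text
    if text.strip():
        output.addPage(pageObj)

# Number of pages in the new file
print(output.getNumPages())

# Write the new PDF while the source file is still open, then close both files
output.write(file2)
file2.close()
file1.close()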
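The same steps can also be wrapped in a small reusable function. The following is only a sketch, assuming PyPDF2 2.0 or later, where the classes are named PdfReader and PdfWriter and the methods are extract_text and add_page. The helper names is_blank, non_blank_pages and remove_blank_pages are hypothetical and chosen just for this example.

```python
def is_blank(text):
    """Treat None, an empty string, or whitespace-only text as blank."""
    return text is None or text.strip() == ""


def non_blank_pages(pages, extract):
    """Return the pages whose extracted text is not blank.

    `extract` is any callable that maps a page to its text.
    """
    return [page for page in pages if not is_blank(extract(page))]


def remove_blank_pages(src_path, dst_path):
    """Copy src_path to dst_path without text-free pages; return the pages kept."""
    # Imported here so the two helpers above can be used without PyPDF2 installed.
    # Assumption: PyPDF2 2.0 or later (PdfReader / PdfWriter naming).
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    with open(src_path, "rb") as src:
        reader = PdfReader(src)
        kept = non_blank_pages(reader.pages, lambda page: page.extract_text())
        for page in kept:
            writer.add_page(page)
        # Write while the source file is still open, because page data is read lazily.
        with open(dst_path, "wb") as dst:
            writer.write(dst)
    return len(kept)
```

With placeholder file names, a call would look like remove_blank_pages("input.pdf", "output.pdf"). Keep in mind that this test only looks at extractable text, so a page that contains nothing but a scanned image would also be treated as blank.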

 

We have now successfully created a new PDF file that has no blank pages.

Hopefully, you have learned how to delete empty pages from a PDF file in Python.

2 responses to “Delete empty pages from a PDF file in Python”

  1. Ryen DeGan says:

    Hey, I am following this example but can you give some clarity on the files because file 1 – opens “11.pdf” and file 2 – opens “3.pdf” then you write to file 2, and when I do this my new file 2 is empty with no data. Thanks

    • Sumit Chhirush says:

      11.pdf is a file that contains some empty pages, and file 3.pdf is the new file we created, which only stores those pages which are not empty.
