Delete empty pages from a PDF file in Python
Hello programmers, in this tutorial, we will learn how to delete empty pages from a PDF file in Python.
To delete pages from the PDF file, we will use Python's PyPDF2 module.
Let’s start coding
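- To work with PyPDF2, we first have to install the library on our system and then import it. The code in this tutorial uses the older PyPDF2 names (PdfFileReader, PdfFileWriter, getPage and so on), which were removed in PyPDF2 3.0, so we install a version below 3.0.
# Installation of the PyPDF2 library (run this in a terminal)
pip install "PyPDF2<3"
# Importing the PyPDF2 library
import PyPDF2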
- First, we open the PDF file from which we want to delete blank pages in binary read mode and pass it to the PdfFileReader class.
- I know that this PDF file has 4 pages, 2 of which are empty.
- Then we count how many pages the file has initially.
file1 = open("C:\\Users\\sumit\\..files\\11.pdf", 'rb')
ReadPDF = PyPDF2.PdfFileReader(file1)
# Number of pages initially
pages = ReadPDF.numPages
print(pages)
- Now we create a new file that stores only the pages of the original PDF that are not blank.
- We use the PdfFileWriter class to build the new PDF file.
- We run a for loop that reads each page of the original file “file1”, extracts its text with the extractText function, and uses an “if” statement to check whether the page is blank. Note that a page containing only images has no extractable text, so this check treats it as blank too.
- If the page is not blank, we add it to the new PDF “output” using the addPage function.
- Finally, we check how many pages the new PDF has, write it to disk, and close the file.
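# Creating a new file which does not contain any empty pages
output = PyPDF2.PdfFileWriter()
file2 = open("C:\\Users\\sumit\\..files\\3.pdf", "wb")
for i in range(pages):
    # Reuse the reader created earlier instead of building a new one per page
    pageObj = ReadPDF.getPage(i)
    text = pageObj.extractText()
    # Keep the page only if some text was extracted from it
    if len(text) > 0:
        output.addPage(pageObj)
# Number of pages in the new PDF
print(output.getNumPages())
# Write all kept pages once, after the loop, then close both files
output.write(file2)
file2.close()
file1.close()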
We have now successfully created a new PDF file that has no blank pages.
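A page whose extracted text contains only spaces or newlines has a length greater than zero, so a plain length check keeps it. The sketch below is one possible variant, not the code used above: it treats whitespace-only text as blank and uses the newer method names (pages, extract_text, add_page) that are available in PyPDF2 2.x and later. The function name copy_non_blank_pages and the file names in the usage comment are hypothetical, chosen only for illustration.

```python
def copy_non_blank_pages(reader, writer):
    """Copy every page with visible text from reader to writer.

    reader needs a .pages sequence whose items have .extract_text();
    writer needs .add_page(page). Returns the number of pages kept.
    """
    kept = 0
    for page in reader.pages:
        # extract_text() may return an empty value; normalise it to a string
        text = page.extract_text() or ""
        # strip() removes spaces and newlines, so whitespace-only pages count as blank
        if text.strip():
            writer.add_page(page)
            kept += 1
    return kept


# Example use (assumes PyPDF2 2.x or later; the file names are placeholders):
#
#   from PyPDF2 import PdfReader, PdfWriter
#
#   writer = PdfWriter()
#   with open("input.pdf", "rb") as src:
#       kept = copy_non_blank_pages(PdfReader(src), writer)
#       # Write while the source file is still open, since pages are read lazily
#       with open("output.pdf", "wb") as dst:
#           writer.write(dst)
#   print(kept)
```

Passing the reader and writer in as arguments keeps the blank-page rule separate from the file handling, so the rule can be changed without touching the code that opens and closes files.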
Hopefully, you have learned how to delete empty pages from a PDF file in Python.
Hey, I am following this example but can you give some clarity on the files because file 1 – opens “11.pdf” and file 2 – opens “3.pdf” then you write to file 2, and when I do this my new file 2 is empty with no data. Thanks
11.pdf is a file that contains some empty pages, and file 3.pdf is the new file we created, which only stores those pages which are not empty.
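One possible reason for an empty output file, offered as a guess rather than a diagnosis: if text extraction returns nothing for every page (for example, when the pages are scanned images), every page is treated as blank and no page is written. The sketch below shows one way to check this. The helper name pages_without_text is hypothetical, and it assumes a reader object with the PyPDF2 2.x style pages and extract_text interface.

```python
def pages_without_text(reader):
    """Return the zero-based indexes of pages with no extractable text.

    reader needs a .pages sequence whose items have .extract_text().
    """
    blank = []
    for index, page in enumerate(reader.pages):
        # Normalise an empty result to a string before testing it
        text = page.extract_text() or ""
        if not text.strip():
            blank.append(index)
    return blank


# Example use (assumes PyPDF2 2.x or later; the file name is a placeholder):
#
#   from PyPDF2 import PdfReader
#
#   with open("input.pdf", "rb") as src:
#       reader = PdfReader(src)
#       blank = pages_without_text(reader)
#       print(len(blank), "of", len(reader.pages), "pages have no extractable text")
```

If the list covers every page of the input, the text-based check cannot tell real content from blank pages in that file, and the output will contain no pages.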