Delete empty pages from a PDF file in Python

In this tutorial, we will see how to delete empty pages from a PDF file using Python.

Step 1: Installing and Importing Libraries

I am using the PyMuPDF library in this tutorial. It is a Python binding for the MuPDF library, imported under the name fitz, and is typically faster than PyPDF2 at tasks such as text extraction.

pip install PyMuPDF
import fitz  # PyMuPDF library

Step 2: Removing empty pages from PDF

To remove the empty pages, I copy the non-empty pages into a second file, referred to as the output file. First, I create a function that checks whether a page is empty. It extracts the text on the page using .get_text(), strips the surrounding whitespace, and checks whether the remaining length is zero. If it is zero, the page is treated as empty and is not added to the output file. This process is repeated for every page of the PDF.

# Return True if the page has no extractable text.
def check_page(page):
    text = page.get_text()
    return len(text.strip()) == 0


inputfile_path = "/content/techub.pdf"
outputfile_path = "/content/techub_mod.pdf"

input_pdf = fitz.open(inputfile_path)
output_pdf = fitz.open()  # new, empty PDF

# Copy every non-empty page into the output PDF.
for pgno in range(input_pdf.page_count):
    page = input_pdf[pgno]
    if not check_page(page):
        output_pdf.insert_pdf(input_pdf, from_page=pgno, to_page=pgno)

output_pdf.save(outputfile_path)
input_pdf.close()
output_pdf.close()

The function check_page() is a boolean function: it returns either True or False. If the length of the stripped text is zero, the page is empty and the function returns True, so the page is not added to the output file.
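
One caveat with a text-only check is that a scanned page, or a page that holds only a picture, has no extractable text and would be dropped as well. A minimal sketch of a stricter check follows, assuming a page should count as empty only when it has no text, no embedded images, and no vector drawings. The name is_blank_page is hypothetical; .get_images() and .get_drawings() are PyMuPDF page methods, and whether drawings should count as content depends on your documents.

```python
def is_blank_page(page):
    """Return True only if the page has no text, images, or vector drawings.

    `page` is expected to be a PyMuPDF page, e.g. input_pdf[pgno].
    """
    # Any non-whitespace text means the page has content.
    if page.get_text().strip():
        return False
    # Embedded images (for example, scanned pages) also count as content.
    if page.get_images():
        return False
    # Vector graphics such as lines, boxes, or charts count as content too.
    if page.get_drawings():
        return False
    return True
```

In the loop, it can take the place of check_page(), that is, if not is_blank_page(page).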
.page_count is a property (not a function call) that gives the number of pages in the PDF.
.insert_pdf() copies pages from another PDF (here the input PDF) into the PDF it is called on (here the output PDF). The arguments from_page and to_page give the first and last page to copy; both are zero-based and inclusive, so setting both to pgno copies exactly one page.
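
Copying pages one at a time is not the only option. As a sketch of an alternative, the document can be reduced in place with PyMuPDF's Document.select(), which keeps only the listed page numbers. The helper name keep_non_empty_pages and its is_empty parameter are hypothetical, introduced only for illustration; the function assumes nothing more than page_count, indexing, and select() on the document.

```python
def keep_non_empty_pages(doc, is_empty):
    """Drop empty pages from `doc` in place and return the kept page numbers.

    `doc` is expected to be a PyMuPDF document (from fitz.open(...));
    `is_empty` is any function that takes a page and returns True or False.
    """
    keep = [pgno for pgno in range(doc.page_count) if not is_empty(doc[pgno])]
    # A PDF needs at least one page, so this sketch leaves the document
    # untouched if every page was judged empty.
    if keep:
        doc.select(keep)
    return keep


# Usage (assumes PyMuPDF is installed and check_page() from the tutorial):
#   import fitz
#   doc = fitz.open("/content/techub.pdf")
#   keep_non_empty_pages(doc, check_page)
#   doc.save("/content/techub_mod.pdf")
#   doc.close()
```

This avoids creating a second document object, at the cost of modifying the opened document before it is saved under the new name.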
