Extracting images from a PDF using Python
Hey there! In this tutorial, we will be learning to extract images contained within a PDF file using Python.
Implementation
Step 1
Open PyCharm and create a project titled PDF_Images. Save the PDF you want to process inside this project. Then open the terminal and run the following commands to install the required libraries:
pip install PyMuPDF
pip install Pillow
- PyMuPDF: Python bindings for MuPDF, a lightweight PDF viewer and toolkit. The package is installed as PyMuPDF but imported under the name fitz.
- Pillow: A maintained fork of the Python Imaging Library (PIL) that supports opening, manipulating, and saving images in various formats. It is imported under the name PIL.
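Before moving on, it can help to confirm that both libraries are visible to the interpreter PyCharm is using. The snippet below is a minimal sketch using only the standard library; the helper name missing_modules is made up for this illustration and is not part of PyMuPDF or Pillow.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # PyMuPDF is imported as "fitz" and Pillow as "PIL".
    missing = missing_modules(["fitz", "PIL"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Both libraries are available.")
```

If a name is reported as not installed, the pip command was most likely run for a different interpreter than the one selected for the project.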
Step 2
In the main.py file of this project, type the code given below. The comments in the code explain each step.
# Import the necessary libraries:
import io
import fitz  # PyMuPDF
from PIL import Image
# Open the desired PDF file:
pdf = fitz.open("demo.pdf")
# Determine the number of pages in the PDF file:
pages = len(pdf)
# Iterate over each of the PDF pages:
# Index of 1st page -> 0
for i in range(pages):
    # Access the page at index 'i':
    page = pdf[i]
    # Access all image objects present in this page
    # (this method was named getImageList() in old PyMuPDF releases):
    image_list = page.get_images(full=True)
    # Iterate through these image objects:
    for image_count, img in enumerate(image_list, start=1):
        # Access the XREF of the image:
        xref = img[0]
        # Extract the image information
        # (this method was named extractImage() in old PyMuPDF releases):
        img_info = pdf.extract_image(xref)
        # Extract the image bytes:
        image_bytes = img_info["image"]
        # Access the image extension:
        image_ext = img_info["ext"]
        # Load this image into PIL:
        image = Image.open(io.BytesIO(image_bytes))
        # Save this image; passing a file name lets Pillow open and close the file itself:
        image.save(f"page{i+1}_image{image_count}.{image_ext}")
# Close the PDF file once all pages are processed:
pdf.close()
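The bytes stored under the "image" key are already encoded in the format named by the "ext" key, so they can also be written to disk as they are, without decoding and re-encoding them through Pillow. Here is a minimal sketch of that idea; the helper save_image_bytes and its out_dir parameter are hypothetical names introduced for this example, and it reuses the same file naming as the script.

```python
from pathlib import Path


def save_image_bytes(image_bytes, image_ext, page_number, image_number, out_dir="."):
    """Write already-encoded image bytes to page<N>_image<M>.<ext> and return the path."""
    out_path = Path(out_dir) / f"page{page_number}_image{image_number}.{image_ext}"
    # Create the output folder if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # The bytes are written unchanged, so no image library is needed here.
    out_path.write_bytes(image_bytes)
    return out_path
```

Inside the inner loop, this could be called as save_image_bytes(image_bytes, image_ext, i + 1, image_count). Going through Pillow is still the better choice when you want to resize or convert the image before saving it.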
The script in main.py extracts every image contained in the PDF. If you wish to extract images from a particular range of pages only, pass that range to the outer for loop (for i in range(pages)) instead of iterating over all the pages.
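Restricting the loop to a page range could look like the sketch below. The helper page_indices is a made-up name for this example; it assumes you think of pages as numbered from 1 and want both ends of the range included, and it converts that into the zero-based indices the loop needs.

```python
def page_indices(total_pages, first_page, last_page):
    """Return zero-based indices for the 1-based, inclusive range first_page..last_page.

    The range is clamped to the pages that actually exist, so asking for
    pages beyond the end of the document simply yields fewer indices.
    """
    start = max(first_page, 1) - 1
    stop = min(last_page, total_pages)
    return range(start, stop)
```

In the script, the outer loop would then read for i in page_indices(pages, 2, 4): to process only pages 2 to 4, and the rest of the code stays the same.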
Output
Click here to view the PDF used for demonstration purposes.
The image below shows that every image extracted from this PDF is stored within the project under a name of the form page<N>_image<M>.<ext>, where N is the page number and M is the position of the image on that page.
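To check the result without opening the project folder, you can count the extracted files per page from their names. This is only a sketch built on the naming scheme used by the script; the function images_per_page and the pattern NAME_PATTERN are illustrative names, not part of any library.

```python
import os
import re
from collections import Counter

# Matches names such as "page3_image1.png" produced by the script.
NAME_PATTERN = re.compile(r"^page(\d+)_image(\d+)\.(\w+)$")


def images_per_page(file_names):
    """Map each page number to the number of extracted images found for it."""
    counts = Counter()
    for name in file_names:
        match = NAME_PATTERN.match(name)
        if match:
            counts[int(match.group(1))] += 1
    # Sort by page number so the summary reads in document order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Files that do not follow the naming scheme (main.py, demo.pdf) are ignored.
    for page_number, count in images_per_page(os.listdir(".")).items():
        print(f"Page {page_number}: {count} image(s)")
```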
