Extract Tables from PDF in Python
In this tutorial, we will learn how to extract tables from a PDF in Python. Programs often need to work with tabular data, but when that data sits inside a PDF, we have to extract it first.
We will discuss two easy ways to extract tables from a PDF in Python. The first uses tabula-py together with tabulate, and the second uses Camelot.
How to extract tables from PDF in Python
Python makes this task easy because ready-made packages do most of the work.
We will show two methods here. Both rely on third-party packages that are installed with pip.
Assume that we have the table in the PDF given below:
Sl.  Name  RollNo.  Dept
1    Ana   011      CSE
2    Ram   012      CSE
3    Joe   014      EE
4    Ken   024      ME
5    Ben   035      CE
This PDF is saved as ‘CodeSpeedy.pdf’. The table lists each student’s serial number, name, roll number, and department.
We can extract these tables in many ways in Python. We will discuss two ways.
Using tabula-py and tabulate: Extract tables from PDF
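First, we need to install tabula-py and tabulate. tabula-py reads the tables from the PDF, and tabulate prints them in a readable layout. Note that tabula-py is a wrapper around the Java library tabula-java, so Java must be installed on your system.
You can use the commands given below:
pip install tabula-py
pip install tabulate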
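Then use the code below:
from tabula import read_pdf
from tabulate import tabulate

# read_pdf returns a list with one pandas DataFrame per table found
tables = read_pdf("CodeSpeedy.pdf", pages="all")

# Print each table separately, using the column names as headers
for table in tables:
    print(tabulate(table, headers="keys", showindex=False))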
First, we import the necessary packages. Then we read the PDF and extract the tables from it.
Here, read_pdf extracts the data from the tables in the PDF and returns a list of pandas DataFrames, one per table. tabulate then formats a DataFrame as a plain-text table for printing.
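Printing is useful for a quick check, but extracted tables are often saved for later use. The sketch below writes each table that read_pdf returns to its own CSV file. The helper names csv_path_for and save_tables_as_csv and the "_table_N.csv" naming scheme are made up for this example, and the code assumes that tabula-py and Java are installed.

```python
from pathlib import Path


def csv_path_for(pdf_path, index):
    """Build an output path such as 'CodeSpeedy_table_1.csv'.

    The naming scheme is a hypothetical choice for this example.
    """
    pdf = Path(pdf_path)
    return pdf.with_name(f"{pdf.stem}_table_{index + 1}.csv")


def save_tables_as_csv(pdf_path, pages="all", pandas_options=None):
    """Extract every table with tabula-py and write each one to a CSV file."""
    # Imported here so the path helper above works without tabula-py installed.
    from tabula import read_pdf

    tables = read_pdf(pdf_path, pages=pages, pandas_options=pandas_options)
    written = []
    for index, table in enumerate(tables):
        out_path = csv_path_for(pdf_path, index)
        table.to_csv(out_path, index=False)  # each table is a pandas DataFrame
        written.append(out_path)
    return written


# Example call (needs the PDF, tabula-py, and Java):
# save_tables_as_csv("CodeSpeedy.pdf")
```

Roll numbers such as 011 may lose their leading zeros if a column is converted to numbers. Passing something like pandas_options={"dtype": str} is assumed here to keep the values as text, so check the tabula-py documentation for your version before relying on it.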
Using Camelot
We need to install camelot-py to extract tables from the PDF.
You can use the command below:
pip install camelot-py
Then use the code below:
import camelot

tables = camelot.read_pdf("CodeSpeedy.pdf")
print(tables[0].df)
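First, we import the camelot package. Then we read the PDF file and extract the tables from it.
Here, read_pdf returns a list of the tables it found, and tables[ind].df gives the table at index ind as a pandas DataFrame, so tables[0].df is the first table. By default, read_pdf only processes the first page. Pass pages="all" to scan the whole document.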
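Camelot does not infer column names: the columns of tables[0].df are numbered 0, 1, 2 and so on, and the header cells (Sl., Name, RollNo., Dept) appear as the first row. The sketch below shows one possible way to turn such a table into a list of dictionaries keyed by the header. The function names rows_to_records and camelot_records are made up for this example, and the code assumes that the first row of each table really is the header.

```python
def rows_to_records(rows):
    """Turn [[header cells], [row cells], ...] into a list of dicts.

    Assumes the first row holds the column names.
    """
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def camelot_records(pdf_path, pages="1", flavor="lattice"):
    """Read tables with Camelot and return one list of records per table."""
    # Imported here so rows_to_records works without camelot-py installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    # table.df is a pandas DataFrame; values.tolist() yields rows of cells
    return [rows_to_records(table.df.values.tolist()) for table in tables]


# Example call (needs the PDF and camelot-py):
# camelot_records("CodeSpeedy.pdf", pages="all")
```

The default "lattice" flavor looks for ruled lines between cells. If the table in your PDF is separated only by whitespace, flavor="stream" may work better.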
These are two popular ways to extract tables from a PDF in Python.
I hope it will be useful.
Thank you!