How to extract text from a PDF file in PHP
Greetings, programmers! In this tutorial we will see how to extract text from a PDF file in PHP.
Programmers often need to pull text out of various kinds of data so that they can work with the meaningful parts. In this tutorial, we will read a PDF file and write its text to a text file.
Before running the program, install the XAMPP server on your computer. Once that is done, go to the XAMPP installation folder, open the ‘htdocs’ directory, and create a new PHP file there for the code in this tutorial. On Windows the default location is ‘C:\xampp\htdocs’; if you installed XAMPP elsewhere, use the ‘htdocs’ folder inside that location.
We first need to understand the HTTP headers that have to be sent so that the PDF file is displayed in the browser while its text is written to the text file.
We send the following two headers before outputting the PDF file.
header('Content-Type: application/pdf');
The line above tells the browser that the content being sent is a PDF file, so the browser knows how to handle it.
header('Content-Disposition: inline; filename="' . $pdfFile . '"');
The line above asks the browser to display the file inside the browser window. The value ‘inline’ means the content should be shown as part of the page rather than downloaded as an attachment, and ‘filename’ suggests a name to use if the user saves the file.
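As a small illustration of the two headers, here is a minimal sketch of a helper that builds both header lines. The function name pdfHeaderLines and its parameters are hypothetical and not part of the original code; the sketch assumes you may also want the ‘attachment’ variant, which asks the browser to download the file instead of displaying it.

```php
<?php
// Hypothetical helper (not from the original tutorial): builds the two
// header lines needed for a PDF response.
function pdfHeaderLines(string $fileName, bool $inline = true): array
{
    // basename() drops any directory part; double quotes are removed so the
    // quoted filename parameter stays well-formed.
    $safeName = str_replace('"', '', basename($fileName));
    $disposition = $inline ? 'inline' : 'attachment';

    return [
        'Content-Type: application/pdf',
        'Content-Disposition: ' . $disposition . '; filename="' . $safeName . '"',
    ];
}

// Usage inside a web request: send each line with header().
// foreach (pdfHeaderLines('report.pdf') as $line) {
//     header($line);
// }
```

Returning the lines as an array instead of calling header() directly keeps the helper easy to check from the command line, where no HTTP response exists.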
The content of the pdf file is given below:
Hello from Codespeedy Technologies!!!!
PHP code to extract texts from PDF
Given below is the code for displaying the file in the browser and writing the text in another file.
<?php
$pdfFile = '<pdfFile>.pdf'; # enter the PDF file name

header('Content-Type: application/pdf'); # tell the browser the content is a PDF
header('Content-Disposition: inline; filename="' . $pdfFile . '"'); # display the PDF in the browser
@readfile($pdfFile); # send the PDF contents to the browser

$inFile = fopen($pdfFile, "rb") or die("Unable to open file!"); # open the PDF file for reading
$outFile = fopen("<outputFile>.txt", "a") or die("Unable to open file!"); # open (or create) the output file for appending

while (($txt = fgets($inFile)) !== false) # read line by line until the end of the file
{
    fwrite($outFile, $txt); # write the line to the output file
}

fclose($inFile); # close the PDF file
fclose($outFile); # close the output file
?>
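Before sending the headers, it can help to confirm that the file exists and really is a PDF, because readfile() is called with @ and would otherwise fail silently. The following is a minimal sketch under that assumption; the function name looksLikePdf is hypothetical, and the check relies only on the fact that PDF files begin with the signature %PDF-.

```php
<?php
// Hypothetical helper: returns true when $path is a readable file that
// starts with the "%PDF-" signature found at the beginning of PDF files.
function looksLikePdf(string $path): bool
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    // Read only the first five bytes; that is enough for the signature.
    $signature = fread($handle, 5);
    fclose($handle);

    return $signature === '%PDF-';
}

// Possible usage before the header() calls:
// if (!looksLikePdf($pdfFile)) {
//     die('The file is missing or is not a PDF.');
// }
```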
Output
The PDF file is opened in the browser, and an output file containing the text read from the PDF is created in the same directory as the script.
The contents of the output file are shown below.
Hello from Codespeedy Technologies!!!!
Explanation
We send the headers discussed above, keep the PDF file in the same directory as the script, and run the code. When the code runs, the PDF file is displayed in the browser. The script then opens the PDF for reading, opens the output file, and copies the PDF's contents into it line by line. Once the copying is done, we close both files. Because fgets() reads the file's bytes exactly as they are stored, the output also contains the PDF's internal markup, and the text is readable only if the PDF stores it uncompressed.
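The loop above copies the file as it is stored. To pull out only the displayed strings, one option is to search the raw content for literal strings followed by the PDF text-showing operator Tj. The following is a minimal sketch, not a full PDF parser: the function name extractTjStrings is hypothetical, it assumes PHP 7.4 or later, and it only works when the content stream is uncompressed and the text is written as simple literal strings without nested parentheses. PDFs with compressed streams or custom font encodings need a dedicated parsing library.

```php
<?php
// Hypothetical helper: collects the literal strings shown by "Tj" operators
// in raw, uncompressed PDF content.
function extractTjStrings(string $rawPdf): array
{
    // Match "(" ... ")" followed by Tj. Inside the parentheses, allow either
    // a backslash escape or any character that is not a backslash or paren.
    $pattern = '/\(((?:\\\\.|[^\\\\()])*)\)\s*Tj/s';

    if (!preg_match_all($pattern, $rawPdf, $matches)) {
        return [];
    }

    // Undo the backslash escapes for "(", ")" and "\".
    return array_map(
        static fn (string $s): string => preg_replace('/\\\\([()\\\\])/', '$1', $s),
        $matches[1]
    );
}

// Possible usage with the file from this tutorial:
// $strings = extractTjStrings(file_get_contents($pdfFile));
// file_put_contents('<outputFile>.txt', implode(PHP_EOL, $strings));
```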