Convert an HTML table to a pandas DataFrame
This tutorial shows how to convert an HTML table into a pandas DataFrame using Python. HTML uses the <table> tag to present data in tabular form, and the pandas library provides the read_html() function to load such tables into DataFrames.
read_html() function
- This function reads the tables in an HTML document and returns them as pandas DataFrames.
- It can read a local file as well as a page on the internet, given its URL.
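Besides file names and URLs, read_html() also accepts a file-like object, which is handy for quick experiments. The sketch below is only an illustration: the HTML snippet and its names are made up, and it assumes pandas and an HTML parser such as lxml are installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML snippet, used only for illustration.
html = """
<table>
  <thead>
    <tr><th>Name</th><th>Role</th></tr>
  </thead>
  <tbody>
    <tr><td>Ada</td><td>Engineer</td></tr>
    <tr><td>Linus</td><td>Maintainer</td></tr>
  </tbody>
</table>
"""

# Wrap the string in StringIO so read_html() receives a file-like object.
tables = pd.read_html(StringIO(html))

# read_html() returns a list with one DataFrame per table it finds.
print("Tables found:", len(tables))
print(tables[0])
```

Because the result is always a list, a page with a single table still needs tables[0] to get at the DataFrame.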
Reading tables from a file
Consider an HTML file called ‘table.html’ containing the following table:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Table Data</title>
</head>
<body>
    <table>
        <thead>
            <tr>
                <th>Full Name</th>
                <th>Position</th>
                <th>Salary</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>Bill Gates</td>
                <td>Founder MIcrosoft</td>
                <td>$1000</td>
            </tr>
            <tr>
                <td>Steve Jobs</td>
                <td>Founder Apple</td>
                <td>$1200</td>
            </tr>
            <tr>
                <td>Mark Zuckerberg</td>
                <td>Founder Facebook</td>
                <td>$1300</td>
            </tr>
        </tbody>
    </table>
</body>
</html>
- Pandas relies on a separate parser library to read HTML; by default it tries ‘lxml’ first. So, install ‘lxml’ by running this command:
pip install lxml
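- Now we are ready to use read_html(). It returns a list of DataFrames, one for each table found in the document, so we pick a particular table by indexing into that list.
The Python code below shows how to use the function:
import pandas as pd

# read_html() returns a list of DataFrames, one per <table> in the file
tables = pd.read_html('table.html')

print("Display table")
df = tables[0]  # the first (and here the only) table
print(df)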
Output:
Display table
         Full Name           Position Salary
0       Bill Gates  Founder MIcrosoft  $1000
1       Steve Jobs      Founder Apple  $1200
2  Mark Zuckerberg   Founder Facebook  $1300
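Note that the Salary values in this output keep their leading $ sign, so pandas stores them as text rather than numbers. The following is a minimal sketch of one way to convert them, assuming every value has the form $ followed by digits. The column name Salary_num is a hypothetical name chosen for this example, and the table is inlined through StringIO so the snippet runs without ‘table.html’.

```python
from io import StringIO

import pandas as pd

# The same table as in 'table.html', inlined so the example is self-contained.
html = """
<table>
  <thead>
    <tr><th>Full Name</th><th>Position</th><th>Salary</th></tr>
  </thead>
  <tbody>
    <tr><td>Bill Gates</td><td>Founder MIcrosoft</td><td>$1000</td></tr>
    <tr><td>Steve Jobs</td><td>Founder Apple</td><td>$1200</td></tr>
    <tr><td>Mark Zuckerberg</td><td>Founder Facebook</td><td>$1300</td></tr>
  </tbody>
</table>
"""

df = pd.read_html(StringIO(html))[0]

# Strip the literal "$" (regex=False treats it as plain text) and cast to int.
# "Salary_num" is a hypothetical column name used only in this sketch.
df["Salary_num"] = df["Salary"].str.replace("$", "", regex=False).astype(int)

print(df[["Full Name", "Salary_num"]])
print("Total:", df["Salary_num"].sum())
```

Keeping the original Salary column alongside the numeric one makes it easy to check the conversion by eye.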
Reading tables from a URL
The same function can also read tables directly from a web page. In this case, we pass the URL of the page instead of a file name.
For example:
import pandas as pd

# Fetch the page and parse every <table> it contains
tables = pd.read_html('https://www.w3schools.com/html/html_tables.asp')
print('Tables found:', len(tables))

df1 = tables[0]
print('First Table')
print(df1.head())  # show only the first five rows
Output:
Tables found: 2
First Table
                        Company          Contact  Country
0           Alfreds Futterkiste     Maria Anders  Germany
1    Centro comercial Moctezuma  Francisco Chang   Mexico
2                  Ernst Handel    Roland Mendel  Austria
3                Island Trading    Helen Bennett       UK
4  Laughing Bacchus Winecellars  Yoshi Tannamuri   Canada
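When a page contains several tables, selecting one by position can break if the page layout changes. One possible alternative is the match parameter of read_html(), which keeps only the tables whose text matches a given string or regular expression. The sketch below uses a made-up two-table page passed through StringIO instead of a live URL, so the HTML and its values are assumptions for illustration only.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables, used only for illustration.
html = """
<table>
  <thead><tr><th>Company</th><th>Country</th></tr></thead>
  <tbody>
    <tr><td>Ernst Handel</td><td>Austria</td></tr>
    <tr><td>Island Trading</td><td>UK</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Tag</th><th>Description</th></tr></thead>
  <tbody>
    <tr><td>th</td><td>Defines a header cell</td></tr>
    <tr><td>td</td><td>Defines a data cell</td></tr>
  </tbody>
</table>
"""

# Without match, both tables are returned.
all_tables = pd.read_html(StringIO(html))

# With match, only tables containing the text "Description" are returned.
tag_tables = pd.read_html(StringIO(html), match="Description")

print("All tables:", len(all_tables))
print("Matching tables:", len(tag_tables))
print(tag_tables[0])
```

The same match argument can be passed when reading from a URL; the result is still a list, so the selected table is at index 0.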