Convert an HTML table to a pandas DataFrame
Using Python, let us understand how to convert an HTML table into a pandas DataFrame. HTML provides the <table> tag for presenting data in tabular form, and the pandas library provides the read_html() function to import such tables into DataFrames.
read_html() function
- This function reads the tables in an HTML document and returns them as a list of pandas DataFrames, one per table.
- It can read a local file as well as a page on the internet, given its URL.
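To make the points above concrete, here is a minimal sketch that parses a small HTML snippet held in memory. The snippet and the variable names are made up for illustration; it assumes a parser library such as lxml is already installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical inline HTML, used here instead of a file or a URL.
html = """
<table>
  <thead>
    <tr><th>Full Name</th><th>Salary</th></tr>
  </thead>
  <tbody>
    <tr><td>Bill Gates</td><td>$1000</td></tr>
    <tr><td>Steve Jobs</td><td>$1200</td></tr>
  </tbody>
</table>
"""

# read_html() accepts file-like objects, so wrap the string in StringIO.
tables = pd.read_html(StringIO(html))

print(type(tables))      # a list, even when only one table is present
print(len(tables))       # number of tables found
print(tables[0].shape)   # (rows, columns) of the first table
```

Because the result is always a list, a single table still has to be picked out with an index such as tables[0].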
Reading tables from a file
Consider an HTML file called ‘table.html’ that contains the following table:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Table Data</title>
</head>
<body>
<table>
<thead>
<tr>
<th>Full Name</th>
<th>Position</th>
<th>Salary</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bill Gates</td>
<td>Founder MIcrosoft</td>
<td>$1000</td>
</tr>
<tr>
<td>Steve Jobs</td>
<td>Founder Apple</td>
<td>$1200</td>
</tr>
<tr>
<td>Mark Zuckerberg</td>
<td>Founder Facebook</td>
<td>$1300</td>
</tr>
</tbody>
</table>
</body>
</html>
- pandas relies on a separate parser library to read HTML. By default it tries ‘lxml’ first, so install ‘lxml’ by executing this command:
pip install lxml
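If you are unsure whether a parser is already installed, the following sketch checks using only the standard library. The helper name available_parsers is hypothetical. The package names reflect that read_html() tries lxml by default and can fall back to BeautifulSoup (bs4) together with html5lib.

```python
import importlib.util


def available_parsers(names=("lxml", "bs4", "html5lib")):
    """Map each package name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Print which of the HTML parser packages are importable in this environment.
print(available_parsers())
```

If lxml shows up as False, run the pip command above before calling read_html().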
- Now we are ready to use read_html(). It returns a list of DataFrames, so we can pick any table from the file by its index.
The Python code below shows how the function is used:
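import pandas as pd
# read_html() returns a list with one DataFrame per <table> in the file
tables = pd.read_html('table.html')
print("Display table")
# table.html contains a single table, so take the first element
df = tables[0]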
print(df)
Output:
Display table
         Full Name           Position  Salary
0       Bill Gates  Founder MIcrosoft   $1000
1       Steve Jobs      Founder Apple   $1200
2  Mark Zuckerberg   Founder Facebook   $1300
Reading tables from a URL
Just as we read tables from an HTML file, we can read tables from a web page with the same function. In this case, we pass the URL of the page instead of a file name.
For example,
import pandas as pd
# download the page and parse every <table> found on it
tables = pd.read_html('https://www.w3schools.com/html/html_tables.asp')
print('Tables found:', len(tables))
# pick the first table on the page
df1 = tables[0]
print('First Table')
print(df1.head())
Output:
Tables found: 2
First Table
                        Company          Contact  Country
0           Alfreds Futterkiste     Maria Anders  Germany
1    Centro comercial Moctezuma  Francisco Chang   Mexico
2                  Ernst Handel    Roland Mendel  Austria
3                Island Trading    Helen Bennett       UK
4  Laughing Bacchus Winecellars  Yoshi Tannamuri   Canada
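When a page holds several tables, picking one by position can be fragile. As a hedged sketch, the match parameter of read_html() keeps only the tables whose text matches a given string or regular expression. The page content below is invented and kept in memory so the example runs without a network connection; it assumes lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content with two tables, standing in for a downloaded page.
page = """
<table>
  <thead><tr><th>Company</th><th>Country</th></tr></thead>
  <tbody><tr><td>Ernst Handel</td><td>Austria</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Full Name</th><th>Salary</th></tr></thead>
  <tbody><tr><td>Steve Jobs</td><td>$1200</td></tr></tbody>
</table>
"""

# Without match, every table on the page is returned.
all_tables = pd.read_html(StringIO(page))
print('Tables found:', len(all_tables))

# With match, only tables containing the text "Salary" are returned.
salary_tables = pd.read_html(StringIO(page), match='Salary')
print('Tables mentioning Salary:', len(salary_tables))
print(salary_tables[0])
```

The same match argument can be passed when the first argument is a URL, as in the example above.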