How to parse HTML in Python

In this post, we will learn how to parse HTML (HyperText Markup Language) in Python. Parsing means reading the raw markup of a web page, which is made up of tags, attributes, and text, and turning it into a structure that a program can search and navigate.

To parse the HTML content of a web page in Python, we will use the Beautiful Soup library, together with the requests library to download the page. Before we begin the tutorial, install both prerequisites with pip.

pip install requests
pip install beautifulsoup4

Parse HTML in Python

Beautiful Soup is a Python library for scraping data from web pages. It parses HTML and XML content into a tree of objects that can be searched and navigated.
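Before fetching a live page, it can help to try Beautiful Soup on a small string. The following is a minimal sketch; the `sample_html` markup and the variable names are made up for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# A small hand-written HTML document (hypothetical sample, not from a real site)
sample_html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""

# Parse the string with Python's built-in HTML parser
soup = BeautifulSoup(sample_html, "html.parser")

page_title = soup.title.string        # text inside the <title> tag
first_paragraph = soup.p.get_text()   # text of the first <p> tag

print(page_title)
print(first_paragraph)
```

Accessing a tag name as an attribute, such as `soup.p`, returns only the first matching tag.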

First of all, import the requests module and the BeautifulSoup class from bs4, then download the page as shown below.

import requests
from bs4 import BeautifulSoup

# URL of the website
url = "http://170.187.134.184"
rawdata = requests.get(url)
html = rawdata.content
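The request above assumes the server answers quickly and successfully. A slightly more defensive variant might look like the sketch below. The helper name `fetch_html` and the 10-second default timeout are assumptions chosen for illustration; `timeout` and `raise_for_status()` are standard parts of requests.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper)."""
    # Give up if the server does not respond within `timeout` seconds
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.content
```

With this helper, `html = fetch_html(url)` would take the place of the `requests.get(url)` and `.content` lines.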

Now we will parse the HTML content with BeautifulSoup, using Python's built-in html.parser, and print the parsed document.

# Parsing html content with beautifulsoup
soup = BeautifulSoup(html, 'html.parser')
print(soup)
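Printing the soup object shows the markup without any added indentation. For a more readable view, Beautiful Soup provides the `prettify()` method. The sketch below uses a made-up one-line snippet (`compact_html`) rather than the page fetched above.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line HTML snippet used only for illustration
compact_html = "<html><body><h1>Hello</h1><p>Some text</p></body></html>"

soup = BeautifulSoup(compact_html, "html.parser")

# prettify() returns a string with each tag and text node on its own line,
# indented according to nesting depth
pretty = soup.prettify()
print(pretty)
```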

Once the content is parsed, we can use the different methods of Beautiful Soup to get the relevant data from the website. For example, soup.title returns the <title> tag and find_all('p') returns a list of all <p> tags.

print(soup.title)
paragraphs = soup.find_all('p')
print(paragraphs)
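Beyond `title` and `find_all`, a few other lookups are often useful: filtering by CSS class, reading tag attributes, and using CSS selectors. The following sketch runs on invented markup (`sample_html`), so the tag names and links in it are assumptions, not content from the site above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on a blog footer
sample_html = """
<div>
  <p class="widgettitle">Useful Links</p>
  <p>Plain paragraph</p>
  <a href="/about">About</a>
  <a href="/contact">Contact</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first match (or None); class_ filters on the CSS class
widget_title = soup.find("p", class_="widgettitle").get_text()

# Collect the href attribute of every <a> tag
links = [a.get("href") for a in soup.find_all("a")]

# select() takes a CSS selector and returns a list of matching tags
selected = soup.select("p.widgettitle")

print(widget_title)
print(links)
print(len(selected))
```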

Here is the complete code in one place.

import requests
from bs4 import BeautifulSoup

# URL of the website
url = "http://170.187.134.184"
rawdata = requests.get(url)
html = rawdata.content

# Parsing html content with beautifulsoup
soup = BeautifulSoup(html, 'html.parser')


print(soup.title)
paragraphs = soup.find_all('p')
print(paragraphs)

Output:

<title>Programming Blog and Software Development Company - CodeSpeedy</title>
[<p>A Place Where You Find Solutions In Coding And Programming For PHP, WordPress, HTML, CSS, JavaScript, Python, C++ and much more.</p>, <p>Hire us for your software development, mobile app development and web development project.</p>, <p>Below are some of our popular categories from our programming blog. Click to browse the tutorials and articles.</p>, <p>CodeSpeedy Technology Private Limited is an Information technology company that keep helping the learners and developers to learn computer programming. CodeSpeedy also provides coding solutions along with various IT services ( web development, software development etc ).</p>, <p>We also provide training and internship on various computer programming field like Java, Python, C++, PHP, AI etc.
</p>, <p>
If you are looking for a web design company or web development company then hire our team. Our team also expert in developing software, Android and iOS, and Artificial Intelligence.
</p>, <p class="widgettitle">CodeSpeedy</p>, <p class="widgettitle">Useful Links</p>, <p>Location: Berhampore, West Bengal, India</p>]
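The list printed above contains tag objects, including their markup and any stray newlines. If only the text is needed, `get_text(strip=True)` can be applied to each tag. This is a minimal sketch on a made-up snippet that imitates the shape of that output.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet imitating paragraphs that contain stray newlines
sample_html = '<p>\nFirst paragraph.\n</p><p class="widgettitle">CodeSpeedy</p>'

soup = BeautifulSoup(sample_html, "html.parser")

# get_text(strip=True) drops the tags and trims surrounding whitespace
texts = [p.get_text(strip=True) for p in soup.find_all("p")]
print(texts)
```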

If you have any queries related to this post, feel free to ask in the comment section below. If you would like a post on another Python topic, leave the topic name in a comment.
