Extract only the HTML body text using BeautifulSoup in Python

Here, we will learn how to extract only the text inside the body of an HTML page using BeautifulSoup in Python.

First, import requests and BeautifulSoup.

We import requests to download the webpage, and BeautifulSoup to parse its HTML.

from bs4 import BeautifulSoup
import requests

Then choose the URL from which you want to extract the body text.

The URL we are going to use is:

https://www.codewithharry.com/videos/cpp-tutorials-in-hindi-1/

Then, we will download the page and get all of its HTML as a string using the pattern below, and pass that string to BeautifulSoup for parsing:

requests.get("___").text

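# Download the page and keep its HTML as a string
page = requests.get("https://www.codewithharry.com/videos/cpp-tutorials-in-hindi-1/").text
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(page, "html.parser")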
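The download step can fail or hang if the site is slow or returns an error page. As a minimal sketch, the two steps can be wrapped in small helper functions that add a timeout and a status check. The names fetch_html and parse_html and the 10-second timeout are assumptions made for this illustration, not part of the original tutorial.

```python
from bs4 import BeautifulSoup
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a string (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text


def parse_html(html):
    """Parse an HTML string with Python's built-in parser (hypothetical helper)."""
    return BeautifulSoup(html, "html.parser")


# Example usage (needs network access):
# soup = parse_html(fetch_html("https://www.codewithharry.com/videos/cpp-tutorials-in-hindi-1/"))
```

Keeping the download and the parsing separate also makes it easy to try the parsing step on a local HTML string without touching the network.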

Now, use the following statement to get the body of the page.

artists=soup.body

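The “artists” object is a BeautifulSoup Tag that holds the body element of the page together with everything nested inside it. It is not a plain string, although it can be converted to one with str().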
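To see what soup.body actually returns, here is a small sketch that uses a made-up HTML string instead of the real page, so it runs without network access. The sample markup and variable names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up HTML used only for this illustration
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

body = soup.body
print(type(body).__name__)    # the class name of the returned object
print(isinstance(body, Tag))  # body is a Tag, not a str
print(str(body))              # str() gives the markup of the body element

# With "html.parser", a fragment without a <body> tag has no body at all
fragment = BeautifulSoup("<p>No body tag here</p>", "html.parser")
print(fragment.body is None)
```

Because soup.body can be None when the markup has no body tag, it may be worth checking for None before iterating over it.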

Finally, to extract the body text of the page, we will loop over the .strings attribute of this object ( ___.strings), which yields every piece of text found inside the body.

for data in artists.strings:
    print(data)
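The .strings generator also yields whitespace-only pieces such as the line breaks between tags, so the printed result can contain many blank lines. As a hedged alternative, the sketch below uses .stripped_strings and get_text() on a made-up HTML string; the sample markup and variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for this illustration
html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1>Heading</h1>
    <p>First <b>bold</b> paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
body = soup.body

# stripped_strings skips whitespace-only strings and trims the rest
lines = list(body.stripped_strings)
print(lines)

# get_text joins all strings in the body into a single string
text = body.get_text(separator=" ", strip=True)
print(text)
```

Since only the body is searched, the text of the title element in the head does not appear in either result.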

OUTPUT: