Extract only HTML body text using BeautifulSoup in Python
Here, we will learn how to extract only the body text of an HTML page using BeautifulSoup in Python.
First, import requests and BeautifulSoup.
The requests library downloads the webpage, and BeautifulSoup parses its HTML.
from bs4 import BeautifulSoup
import requests
Next, choose the URL whose body text you want to extract.
We are going to use this URL:
https://www.codewithharry.com/videos/cpp-tutorials-in-hindi-1/
Then, we download the page's HTML as a string and pass that string to BeautifulSoup, which parses it into a "soup" object. The HTML string comes from:
requests.get("___").text
page=requests.get("https://www.codewithharry.com/videos/cpp-tutorials-in-hindi-1/").text
soup=BeautifulSoup(page,"html.parser")
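The download step can fail (a timeout, or a 404 page that would otherwise be parsed as if it were the real content), so it may help to separate fetching from parsing. The sketch below is one possible arrangement, not the only one: the helper names fetch_html and make_soup and the 10-second timeout are assumptions made for illustration, and the demonstration parses a small made-up HTML string so it runs without a network connection.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


def make_soup(html):
    """Parse an HTML string with Python's built-in parser (hypothetical helper)."""
    return BeautifulSoup(html, "html.parser")


# Offline demonstration with a small inline document (made up for illustration).
sample_html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
soup = make_soup(sample_html)
print(soup.title.string)
```

For the real page, the two helpers would be combined as make_soup(fetch_html("https://www.codewithharry.com/videos/cpp-tutorials-in-hindi-1/")).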
Now, enter the following command to extract the body of the page.
artists=soup.body
The “artists” object is a BeautifulSoup Tag that represents the page's <body> element, not a plain string. If the page has no <body> tag, soup.body is None.
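To see what kind of object soup.body returns, here is a small check on an inline document. The sample HTML and the variable names body and body_markup are made up for this sketch. Calling str() on the object gives the body markup as a string.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Small inline document, made up for illustration.
sample_html = "<html><head><title>Demo</title></head><body><h1>Heading</h1><p>Text</p></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")
body = soup.body

print(type(body).__name__)      # class name of the object returned by soup.body
print(isinstance(body, Tag))    # whether it is a bs4 Tag
print(body.name)                # the tag's name

# Convert the Tag to its HTML markup when a real string is needed.
body_markup = str(body)
print(body_markup)
```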
Finally, to extract the text inside the body, we loop over its .strings attribute (___.strings), a generator that yields each text string found in the element.
for data in artists.strings:
    print(data)
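Because .strings also yields whitespace-only strings such as the line breaks between tags, the printed output can contain many blank lines. BeautifulSoup also offers .stripped_strings and get_text(), which may be more convenient. The comparison below is a sketch on a made-up inline document, and the variable names raw, clean and joined are illustrative only.

```python
from bs4 import BeautifulSoup

# Small inline document, made up for illustration.
sample_html = """<html><body>
<h1>Title</h1>
<p>First <b>bold</b> paragraph.</p>
</body></html>"""

body = BeautifulSoup(sample_html, "html.parser").body

# Every text node, including whitespace-only ones between tags.
raw = list(body.strings)

# Same text nodes with surrounding whitespace removed and empty ones skipped.
clean = list(body.stripped_strings)

# All text joined into one string, with a space between the pieces.
joined = body.get_text(separator=" ", strip=True)

print(clean)
print(joined)
```

Looping over artists.stripped_strings instead of artists.strings in the tutorial code would skip the blank entries in the same way.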
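OUTPUT:
The loop prints each text string found inside the page's body, one print call per string. The exact text depends on the page's current content.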