How to use XPath with BeautifulSoup with an Example

In this tutorial, we are going to see how to use XPath with BeautifulSoup through a simple example. An XPath expression locates an element in a document much like a path locates a file in a file system. BeautifulSoup does not support XPath by default, so we have to convert our soup object to an lxml etree object. We will look at this in detail below.
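To make the file-system comparison concrete, here is a minimal sketch that parses a small page with BeautifulSoup, converts it to an etree object, and walks down to elements with path-like expressions. The SAMPLE_HTML markup is a made-up assumption, not a real website.

```python
from bs4 import BeautifulSoup
from lxml import etree

# Hypothetical sample page, used here instead of a real website
SAMPLE_HTML = """
<html>
  <body>
    <div id="main">
      <h1>Planets</h1>
      <p>Mercury</p>
      <p>Venus</p>
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# BeautifulSoup has no xpath() method, so re-parse its markup with lxml
tree = etree.HTML(str(soup))

# An absolute path walks down from the root, like /home/user/file.txt
heading = tree.xpath("/html/body/div/h1")[0].text
# Every <p> directly under the <div>
paragraphs = [p.text for p in tree.xpath("/html/body/div/p")]

print(heading)
print(paragraphs)
```

Each step between slashes names a child element, just as each step in a file path names a folder.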

Importing the required modules:

  • requests module: This module allows us to send HTTP requests using Python.

import requests

  • lxml module: It helps us process XML and HTML documents in Python. From this module, we import etree, which parses the page into a tree that can be queried with XPath.

from lxml import etree

  • bs4 module: From this module, we import the BeautifulSoup class, which parses the content of a webpage (XML or HTML).

from bs4 import BeautifulSoup
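
Of the three modules, requests is the one that talks to the network. As a rough sketch of how request headers are attached, the snippet below builds a request without sending it, so it runs offline. The header values and the example.com URL are placeholders chosen for illustration only.

```python
import requests

# Hypothetical header values and URL, chosen only for illustration
headers = {
    "User-Agent": "my-tutorial-script/0.1",
    "Accept-Language": "en-US, en;q=0.5",
}

# Build the request without sending it, so nothing touches the network
prepared = requests.Request("GET", "https://example.com/", headers=headers).prepare()

print(prepared.method)
print(prepared.url)
print(prepared.headers["User-Agent"])
```

Calling requests.get(url, headers=headers) would send a request with the same headers and return the response.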

How to use XPath with BeautifulSoup

Before using XPath with BeautifulSoup, we should know how to get an XPath from a webpage (an HTML document).

To get an XPath:

  • Open the webpage and find the element whose XPath is needed.
  • Right-click on the element and select Inspect.
  • The HTML code of the element is now highlighted in the developer tools. Right-click on the highlighted code, select Copy, and then Copy XPath.
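
A copied XPath is usually either a short expression based on the element's id or a positional path from the root. The sketch below, run on made-up markup (SAMPLE_HTML is an assumption), shows that both forms can point at the same element.

```python
from bs4 import BeautifulSoup
from lxml import etree

# Hypothetical markup standing in for a page opened in the browser
SAMPLE_HTML = """
<html>
  <body>
    <div class="header">Site name</div>
    <div class="content">
      <h1 id="firstHeading">Earth</h1>
    </div>
  </body>
</html>
"""

tree = etree.HTML(str(BeautifulSoup(SAMPLE_HTML, "html.parser")))

# A short, id-based expression
by_id = tree.xpath('//*[@id="firstHeading"]')
# A positional path from the root: the second <div> under <body>, then its <h1>
by_position = tree.xpath("/html/body/div[2]/h1")

print(by_id[0].text)
print(by_position[0].text)
```

The id-based form tends to survive layout changes better, since a positional path breaks as soon as an element is added or removed above the target. Note also that the browser shows the page after its own processing, so a copied path may not always match the raw HTML that a script downloads.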

How to use XPath with BeautifulSoup with an Example

Using our XPath, we can now find the data it refers to in the HTML content, once that content has been converted to an etree object:

 

import requests
from lxml import etree
from bs4 import BeautifulSoup

# Function to find the element that the XPath refers to
def Xpath(url):
  # Browser-like request headers
  Dict_Headers = {
      'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                    '(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
      'Accept-Language': 'en-US, en;q=0.5'}
  # Fetch the page. The headers must be passed by keyword, because the
  # second positional argument of requests.get() is params (the query string)
  webPage = requests.get(url, headers=Dict_Headers)
  # Create a soup object from the HTML content
  Scraping = BeautifulSoup(webPage.content, "html.parser")
  # Convert the soup object to an etree object for XPath processing
  documentObjectModel = etree.HTML(str(Scraping))
  return documentObjectModel.xpath('//*[@id="firstHeading"]')[0].text

URL = "https://en.wikipedia.org/wiki/Earth"
print(Xpath(URL))

The text of the element that the XPath selects is shown below as the output for the example URL https://en.wikipedia.org/wiki/Earth

Output:

Earth
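
The example indexes the XPath result with [0] and reads .text, which raises an IndexError when nothing matches and gives None when the text sits inside a child element. A slightly more defensive variant might look like the sketch below. The helper name first_text and the sample markup are assumptions made for illustration, and the helper expects an expression that selects elements.

```python
from bs4 import BeautifulSoup
from lxml import etree

def first_text(html, xpath_expr):
    """Return the full text of the first matching element, or None if nothing matches."""
    tree = etree.HTML(str(BeautifulSoup(html, "html.parser")))
    matches = tree.xpath(xpath_expr)
    if not matches:
        return None
    # itertext() also collects text inside child elements, unlike .text
    return "".join(matches[0].itertext()).strip()

# Hypothetical markup where the heading text sits inside a nested <span>
SAMPLE_HTML = '<html><body><h1 id="firstHeading"><span>Earth</span></h1></body></html>'

print(first_text(SAMPLE_HTML, '//*[@id="firstHeading"]'))
print(first_text(SAMPLE_HTML, '//*[@id="missing"]'))
```

Returning None for a missing element lets the caller decide how to handle a page whose layout has changed, instead of stopping with an exception.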
