How to use Xpath with BeautifulSoup with an Example

In this tutorial, we are going to see how to use XPath with BeautifulSoup through a simple example. An XPath expression locates a node in a document much like a path locates a file in a file system. BeautifulSoup does not support XPath by default, so we have to convert our soup object to an lxml etree object. We will look at this in detail below.

Importing the required modules and the methods used:-

requests module:- This module allows us to send HTTP requests using Python.

import requests
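
When custom headers are sent with requests, they have to be passed with the headers keyword, because the second positional argument of requests.get() is params (the query string). A small sketch of the difference, using a placeholder URL and placeholder values that are assumptions for illustration; the request is only prepared, never sent:

```python
import requests

# Hypothetical values for illustration only; no network request is made.
url = "https://example.com/"
headers = {"Accept-Language": "en-US, en;q=0.5"}

# Passing the dict by keyword attaches it as HTTP headers.
with_headers = requests.Request("GET", url, headers=headers).prepare()

# A dict passed as params is encoded into the query string instead.
as_params = requests.Request("GET", url, params={"lang": "en"}).prepare()

print(with_headers.headers["Accept-Language"])
print(as_params.url)
```

The first print shows the header value on the prepared request, and the second shows that the params dict ended up in the URL rather than in the headers.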

lxml module:- It helps us process web pages (XML and HTML) using Python. From this module, we import etree, the submodule that provides the XPath support BeautifulSoup lacks.

from lxml import etree

bs4 module:- From this module, we import the BeautifulSoup class, which parses the data fetched from a web page (XML and HTML).

from bs4 import BeautifulSoup
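
Before fetching a real page, it can help to try the conversion on a small piece of markup. The sketch below assumes a made-up HTML snippet (not taken from any real site) and shows a soup object being turned into an etree object and queried with XPath:

```python
from bs4 import BeautifulSoup
from lxml import etree

# Hypothetical HTML snippet standing in for a downloaded page.
html = """
<html><body>
  <h1 id="title">Planets</h1>
  <ul>
    <li class="planet">Mercury</li>
    <li class="planet">Venus</li>
    <li class="planet">Earth</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # parse with BeautifulSoup
dom = etree.HTML(str(soup))                # re-parse the markup with lxml

# xpath() always returns a list; text() selects the text nodes directly.
title = dom.xpath('//*[@id="title"]')[0].text
planets = dom.xpath('//li[@class="planet"]/text()')

print(title)    # Planets
print(planets)  # ['Mercury', 'Venus', 'Earth']
```

Note that the document is parsed twice here, once by BeautifulSoup and once by lxml. That is the price of combining the two libraries.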

How to use Xpath with BeautifulSoup

Before using XPath with BeautifulSoup, we should know how to get the XPath of an element from a web page (an HTML document).

To get an XPath from a web page:-

Open the web page and select the element for which the XPath is needed.
Right-click on the element and select Inspect.
The HTML code of the element will now be shown. Right-click on the highlighted code, select Copy, and then Copy XPath.
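
Browser developer tools typically offer both a short XPath, anchored on an id where one exists, and a full XPath that starts at the root of the document. As a rough sketch, assuming a made-up page structure, both forms can select the same element:

```python
from lxml import etree

# Hypothetical page structure for illustration.
html = """
<html><body>
  <div id="content">
    <div><h1>Earth</h1></div>
  </div>
</body></html>
"""
dom = etree.HTML(html)

full_xpath = "/html/body/div/div/h1"    # absolute path from the root
short_xpath = '//*[@id="content"]//h1'  # anchored on an id attribute

full_match = dom.xpath(full_xpath)
short_match = dom.xpath(short_xpath)

print(full_match[0].text)   # Earth
print(short_match[0].text)  # Earth
```

The short form usually survives layout changes better, since it does not depend on every ancestor of the element staying in place.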

Now, using our XPath, we can find the data that the XPath refers to in the HTML content, once that content has been converted into an etree object:

import requests
from lxml import etree
from bs4 import BeautifulSoup
# Function to find the element that the XPath refers to
def Xpath(url):
  Dict_Headers = {'User-Agent':
      'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
      'Accept-Language': 'en-US, en;q=0.5'}
  # Fetch the page; headers go in by keyword (the second positional argument is params)
  webPage = requests.get(url, headers=Dict_Headers)
  # Creating a soup object from the HTML content
  Scraping = BeautifulSoup(webPage.content, "html.parser")
  # Converting the soup object to an etree object for XPath processing
  documentObjectModel = etree.HTML(str(Scraping))
  return "".join(documentObjectModel.xpath('//*[@id="firstHeading"]')[0].itertext())  # also reads text nested in child tags
URL = "https://en.wikipedia.org/wiki/Earth"
print(Xpath(URL))

The output below shows the data that the XPath selects for the example URL https://en.wikipedia.org/wiki/Earth

Output:

Earth
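
Two details are worth keeping in mind when reading a result like this. Indexing with [0] raises an IndexError when the XPath matches nothing, and .text only returns the text that comes directly before an element's first child, so it is None when the text sits inside a nested tag. A minimal sketch of a more defensive lookup, assuming made-up inline markup and a helper name, first_text, that is hypothetical and not part of lxml:

```python
from lxml import etree

# Hypothetical markup: the heading text sits inside a nested <span>.
html = '<html><body><h1 id="firstHeading"><span>Earth</span></h1></body></html>'
dom = etree.HTML(html)

def first_text(dom, xpath):
    """Return all text under the first match, or None if nothing matches."""
    matches = dom.xpath(xpath)
    if not matches:
        return None
    # itertext() walks the element and all of its descendants.
    return "".join(matches[0].itertext()).strip()

heading = dom.xpath('//*[@id="firstHeading"]')[0]
print(heading.text)                                # None
print(first_text(dom, '//*[@id="firstHeading"]'))  # Earth
print(first_text(dom, '//*[@id="missing"]'))       # None
```

Returning None for a missing element is only one possible design. Raising a descriptive error may suit a scraper better when the element is expected to exist.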

3 responses to “How to use Xpath with BeautifulSoup with an Example”

Eren says:

June 10, 2022 at 12:10 am

Hello, how can I use this myself? I don't understand the code.
My XPath is:
//*[@id="content"]/div[1]/div/div[1]/a/h1

Chaithanya Pranav Sai says:

June 28, 2022 at 4:24 pm

Rather than getting the full XPath of the required element, just try the normal XPath.
You can also do this with your full XPath: in line 16, just replace the XPath with yours to get the element it refers to.

REGINALDO CLEMENTINO DOS SANTOS JUNIOR says:

August 6, 2023 at 8:26 pm

The xpath == '//*[@id="firstHeading"]/span'

Line 16:
return (documentObjectModel.xpath('//*[@id="firstHeading"]/span')[0].text)
