How to use XPath with BeautifulSoup with an Example
In this tutorial, we are going to see how to use XPath with BeautifulSoup through a simple example. XPath addresses elements in a document with path expressions, much like paths in a file system. BeautifulSoup does not support XPath by default, so we have to convert our soup object to an etree object. We will look at this in detail below.
Importing the required modules:
- requests module: This module allows us to send HTTP requests using Python.
import requests
- lxml module: It helps us process web pages (XML and HTML) using Python. From this module, we import etree, which lets us run XPath queries on the BeautifulSoup output.
from lxml import etree
- bs4 module: From this module, we import BeautifulSoup, a library for parsing a web page (XML or HTML) and extracting data from it.
from bs4 import BeautifulSoup
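Before fetching a real page, the soup-to-etree conversion can be tried on a small HTML string. The following is a minimal sketch; SAMPLE_HTML and its contents are made up for illustration and are not part of the original tutorial.

```python
from bs4 import BeautifulSoup
from lxml import etree

# Hypothetical sample markup, used here instead of a downloaded page.
SAMPLE_HTML = """
<html>
  <body>
    <h1 id="title">Sample page</h1>
    <p class="intro">First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""

# Parse the markup with BeautifulSoup.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# BeautifulSoup has no XPath support, so serialise the soup back to a
# string and re-parse it with lxml, which does.
dom = etree.HTML(str(soup))

# xpath() returns a list of matching elements.
title = dom.xpath('//*[@id="title"]')[0].text
paragraphs = [p.text for p in dom.xpath("//p")]

print(title)       # Sample page
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```

Working on an inline string like this makes it easy to experiment with XPath expressions without sending any HTTP requests.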
How to use XPath with BeautifulSoup
Before using XPath with BeautifulSoup, we should know how to get an XPath from a web page (an HTML document).
To get an XPath:
- Open the web page and select the element whose XPath is needed.
- Right-click on the element and select Inspect.
- The HTML code of the element will now be shown. Right-click on the highlighted code, select Copy, and then Copy XPath.
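Chrome's Copy menu offers both Copy XPath (a short expression, usually based on an id) and Copy full XPath (an absolute path from the root). The sketch below, which assumes a made-up SAMPLE_HTML snippet rather than a real page, suggests how either kind of copied expression can be passed to xpath().

```python
from bs4 import BeautifulSoup
from lxml import etree

# Hypothetical sample markup standing in for a real page.
SAMPLE_HTML = """
<html>
  <body>
    <div id="content">
      <h1 id="firstHeading">Earth</h1>
    </div>
  </body>
</html>
"""

# Convert the soup to an etree object so that XPath can be used.
dom = etree.HTML(str(BeautifulSoup(SAMPLE_HTML, "html.parser")))

# Short form, as produced by "Copy XPath": locate the element by its id.
short_xpath = '//*[@id="firstHeading"]'

# Absolute form, as produced by "Copy full XPath": walk down from the root.
full_xpath = "/html/body/div/h1"

by_id = dom.xpath(short_xpath)[0]
by_position = dom.xpath(full_xpath)[0]

print(by_id.text)        # Earth
print(by_position.text)  # Earth
```

The id-based form tends to survive layout changes better, because the absolute path stops matching as soon as an enclosing element is added or removed.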
Now, using our XPath, we can find the data it refers to in the HTML content once that content has been converted to an etree object:
import requests
from lxml import etree
from bs4 import BeautifulSoup

# Function to find the text of the element that the XPath points to
def Xpath(url):
    # Request headers that mimic a desktop browser
    Dict_Headers = {
        'User-Agent': ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'),
        'Accept-Language': 'en-US, en;q=0.5',
    }

    # Fetch the page. The headers must be passed by keyword, because the
    # second positional argument of requests.get() is params, not headers
    webPage = requests.get(url, headers=Dict_Headers)

    # Creating a soup object from the HTML content
    Scraping = BeautifulSoup(webPage.content, "html.parser")

    # Converting the soup object to an etree object for XPath processing
    documentObjectModel = etree.HTML(str(Scraping))

    # xpath() returns a list of matches; take the first one and join its
    # text, including any text held inside child elements
    heading = documentObjectModel.xpath('//*[@id="firstHeading"]')[0]
    return ''.join(heading.itertext())

URL = "https://en.wikipedia.org/wiki/Earth"
print(Xpath(URL))
Below is the output for the example URL https://en.wikipedia.org/wiki/Earth, showing the data that the XPath points to.
Output:
Earth
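Since xpath() returns a list, indexing it with [0] raises an IndexError when nothing matches. A possible way to guard against that is sketched below; first_text is a hypothetical helper name, SAMPLE_HTML is made-up markup, and the helper assumes the expression selects elements rather than text nodes or attributes.

```python
from bs4 import BeautifulSoup
from lxml import etree


def first_text(html, xpath_expr):
    """Hypothetical helper: text of the first matching element, or None."""
    dom = etree.HTML(str(BeautifulSoup(html, "html.parser")))
    matches = dom.xpath(xpath_expr)
    if not matches:
        # Nothing matched, so there is no element to read text from.
        return None
    # itertext() also collects text nested inside child elements.
    return "".join(matches[0].itertext()).strip()


# Made-up markup in which the heading text sits inside a child <span>.
SAMPLE_HTML = (
    '<html><body><h1 id="firstHeading"><span>Earth</span></h1></body></html>'
)

print(first_text(SAMPLE_HTML, '//*[@id="firstHeading"]'))  # Earth
print(first_text(SAMPLE_HTML, '//*[@id="missing"]'))       # None
```

Returning None leaves it to the caller to decide what to do when a page layout changes and the expression no longer matches.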
Hello, how can I use this myself? I don't understand the code.
XPath:
//*[@id="content"]/div[1]/div/div[1]/a/h1
Rather than getting the full XPath of the required element, just try the normal XPath.
You can also use the full XPath: in the line that calls .xpath(), replace the XPath with your full XPath, and you will get the element it points to.