Scraping data from a webpage using XPath in Scrapy
It will be interesting for us to learn data scraping of a webpage using Python. In order to scrape data from a webpage using Scrapy, you have to do the following:
- Create a project
- Create a spider
- Open the spider folder
- Define the start URL
- Define the response.xpath selectors
Data Scraping of a webpage in Python using Scrapy
In this Python tutorial, we will learn how to write a script in Python using Scrapy and then extract data from the Flipkart website.
We have selected Flipkart as our example. So in this Python article, we will learn how to scrape data from Flipkart in Python using Scrapy.
So, at first, we will start by creating the project in Scrapy with a single command.
Creating a project
scrapy startproject projectname
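Running this command creates a project skeleton that looks roughly like this (the exact files can differ slightly between Scrapy versions):

projectname/
    scrapy.cfg            # deploy configuration file
    projectname/          # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # project middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # folder that will hold our spider
            __init__.py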
Creating a spider
Now, create the spider inside the project by running:
cd projectname
scrapy genspider pr flipkart.com
With that, our spider has been created successfully and we can start writing some code. Go to the spiders folder inside the project, open the generated spider file (projectname/spiders/pr.py), and you will see boilerplate code that looks like this:
Opening the Spider folder
# -*- coding: utf-8 -*-
import scrapy


class PrSpider(scrapy.Spider):
    name = 'pr'
    allowed_domains = ['flipkart.com']
    start_urls = ['http://flipkart.com/']

    def parse(self, response):
        pass
Now we need to tell the spider which page to fetch the data from. We do this by replacing the default entry in start_urls with the URL of the Flipkart listing page, like this:
Defining the start URL
# -*- coding: utf-8 -*-
import scrapy


class PrSpider(scrapy.Spider):
    name = 'pr'
    allowed_domains = ['flipkart.com']
    start_urls = ['https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy,4io&otracker=nmenu_sub_Electronics_0_Mi']

    def parse(self, response):
        pass
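For reference, listing URLs in start_urls is equivalent to overriding the start_requests method; Scrapy downloads each URL and passes the response to parse. A minimal sketch of that explicit alternative, using the same URL, would look like this:

    def start_requests(self):
        urls = [
            'https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy,4io&otracker=nmenu_sub_Electronics_0_Mi',
        ]
        for url in urls:
            # schedule a request and send the downloaded response to parse()
            yield scrapy.Request(url=url, callback=self.parse)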
Defining the response.xpath
Now we define the item data that we want the spider to extract and yield:
import scrapy


class PrSpider(scrapy.Spider):
    name = 'pr'
    allowed_domains = ['flipkart.com']
    start_urls = ['https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy,4io&otracker=nmenu_sub_Electronics_0_Mi']

    def parse(self, response):
        # extract each field as a list of strings
        Review = response.xpath('//div[@class="hGSR34"]/text()').extract()
        Price = response.xpath('//div[@class="_1vC4OE _2rQ-NK"]/text()').extract()
        Exchange_price = response.xpath('//span[@class="_2xjzPG"]/text()').extract()
        Offers = response.xpath('//li[@class="_1ZRRx1"]/text()').extract()
        Sale = response.xpath('//div[@class="_3RW86r"]/text()').extract()

        # pair up the fields of each product position-wise
        row_data = zip(Review, Price, Exchange_price, Offers, Sale)

        for item in row_data:
            # create a dictionary for storing the scraped info
            scraped_info = {
                # key: value (tuples are zero-indexed)
                'Review': item[0],
                'Price': item[1],
                'Exchange_price': item[2],
                'Offers': item[3],
                'Sale': item[4],
            }

            # yield/give the scraped info to Scrapy
            yield scraped_info
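The XPath expressions above can also be tried out interactively before running the whole spider by using the Scrapy shell. A minimal sketch (these Flipkart class names are the same ones used in parse above and may stop matching whenever the site changes its markup):

scrapy shell "https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy,4io&otracker=nmenu_sub_Electronics_0_Mi"
# inside the shell prompt, test a selector:
>>> response.xpath('//div[@class="hGSR34"]/text()').extract()
>>> response.xpath('//div[@class="_1vC4OE _2rQ-NK"]/text()').extract()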
After doing all this, go to the project folder and run the spider with:
scrapy crawl pr
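If you also want the scraped items written to a file, Scrapy's feed export can do that from the same command (the output filename below is just an example):

scrapy crawl pr -o flipkart_mobiles.csv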
Also, learn:
- Importing dataset using Pandas (Python deep learning library)
- Database CRUD Operation in Python with MySQL – Create, Retrieve, Update, Delete