Scraping webpage data using XPath in Scrapy

Scraping data from a webpage with Python is a useful skill to learn. To scrape a webpage using Scrapy, you have to do the following:

  1. Create a project
  2. Create a spider
  3. Open the spider file in the spiders folder
  4. Define the start URLs
  5. Define the XPath expressions in response.xpath

Data scraping of a webpage in Python using Scrapy

In this Python tutorial, we will write a script using Scrapy and then extract data from the Flipkart website. We have chosen Flipkart as our example, so in this article we will learn how to scrape Flipkart data in Python using Scrapy.

Creating a project

First, create the Scrapy project by simply typing:

scrapy startproject projectname

Creating Spider

Now move into the project directory and create the spider:

cd projectname
scrapy genspider pr flipkart.com

With that, our project has been successfully created, and we can start coding. Go to the spiders folder and open the generated spider file; it will look like this:

Opening the spider file

# -*- coding: utf-8 -*-
import scrapy


class PrSpider(scrapy.Spider):
    name = 'pr'
    allowed_domains = ['flipkart.com']
    start_urls = ['http://flipkart.com/']

    def parse(self, response):
        pass

Next, we enter the URL of the page we want to scrape into start_urls, so the spider knows which page to fetch:



Defining Start Url

# -*- coding: utf-8 -*-
import scrapy


class PrSpider(scrapy.Spider):
    name = 'pr'
    allowed_domains = ['flipkart.com']
    start_urls = ['https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy,4io&otracker=nmenu_sub_Electronics_0_Mi']

    def parse(self, response):
        pass
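Before writing the selectors, it can help to see what an XPath query actually does on a small, made-up snippet (the class names and values below are hypothetical, not Flipkart's). Scrapy provides this through response.xpath(); the same idea can be sketched with Python's standard library:

```python
# A minimal sketch of XPath extraction on a tiny well-formed snippet;
# in a real spider you would use response.xpath() instead.
import xml.etree.ElementTree as ET

html = """
<html>
  <body>
    <div class="price">Rs. 9,999</div>
    <div class="price">Rs. 12,999</div>
    <div class="review">4.3</div>
  </body>
</html>
"""

root = ET.fromstring(html)

# './/div[@class="price"]' matches every div whose class is "price",
# mirroring the '//div[@class="..."]/text()' selectors used in Scrapy
prices = [div.text for div in root.findall('.//div[@class="price"]')]
print(prices)  # → ['Rs. 9,999', 'Rs. 12,999']
```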

Defining the response.xpath

Next, define the item data that we want to extract:

import scrapy


class PrSpider(scrapy.Spider):
    name = 'pr'
    allowed_domains = ['flipkart.com']
    start_urls = ['https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy,4io&otracker=nmenu_sub_Electronics_0_Mi']

    def parse(self, response):
        # extract each field with an XPath selector
        Review = response.xpath('//div[@class="hGSR34"]/text()').extract()
        Price = response.xpath('//div[@class="_1vC4OE _2rQ-NK"]/text()').extract()
        Exchange_price = response.xpath('//span[@class="_2xjzPG"]/text()').extract()
        Offers = response.xpath('//li[@class="_1ZRRx1"]/text()').extract()
        Sale = response.xpath('//div[@class="_3RW86r"]/text()').extract()

        # pair the extracted values up row by row
        row_data = zip(Review, Price, Exchange_price, Offers, Sale)

        for item in row_data:
            # create a dictionary for storing the scraped info
            scraped_info = {
                # key: value (the zip tuples are 0-indexed)
                'Review': item[0],
                'Price': item[1],
                'Exchange_price': item[2],
                'Offers': item[3],
                'Sale': item[4],
            }

            # yield the scraped info to Scrapy
            yield scraped_info
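The zip-and-dictionary logic inside parse() can be checked on its own with plain Python lists (the sample values below are made up for illustration):

```python
# Sketch of the row-pairing logic from parse(), using hypothetical
# sample values in place of live response.xpath() results.
Review = ['4.4', '4.5']
Price = ['Rs. 9,999', 'Rs. 12,999']
Exchange_price = ['Up to Rs. 8,000 off', 'Up to Rs. 11,000 off']
Offers = ['No Cost EMI', 'Bank Offer']
Sale = ['Big Billion Days', 'Big Billion Days']

rows = []
for item in zip(Review, Price, Exchange_price, Offers, Sale):
    # zip yields one tuple per row, indexed from 0
    rows.append({
        'Review': item[0],
        'Price': item[1],
        'Exchange_price': item[2],
        'Offers': item[3],
        'Sale': item[4],
    })

print(rows[0]['Price'])  # → Rs. 9,999
```

Note that zip stops at the shortest list, so if one selector matches fewer elements than the others, some rows will be silently dropped.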

After doing all this, go to the project folder and run the spider with:

scrapy crawl pr
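To save the scraped items to a file instead of only printing them to the console, Scrapy's -o flag can be used (output.json is an arbitrary filename chosen here for illustration):

```shell
# run the spider and export the yielded items to a JSON file
scrapy crawl pr -o output.json

# CSV works the same way; the format is inferred from the extension
scrapy crawl pr -o output.csv
```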

