Monday, 4 July 2016

Scrapy contracts with multiple parse methods

What's the best approach to writing contracts for Scrapy spiders that have more than one method for parsing the response? I saw this answer, but it wasn't very clear to me.

My current example: I have a method called parse_product that extracts the information on a page, but I need more data for the same product from another page. So at the end of this method I yield a new request and let its callback extract these fields and return the item.

The problem is that if I write a contract for the second method, it fails because the request has no meta entry containing the item (which holds most of the fields). If I write a contract for the first method, I can't check whether it returns the fields, because it yields a new request instead of the item.

def parse_product(self, response):
    il = ItemLoader(item=ProductItem(), response=response)
    # populate the item in here

    # yield the new request sending the ItemLoader to another callback
    yield scrapy.Request(new_url, callback=self.parse_images, meta={'item': il})

def parse_images(self, response):
    """
    @url http://foo.bar
    @returns items 1 1
    @scrapes field1 field2 field3
    """
    il = response.request.meta['item']
    # extract the new fields and add them to the item in here

    yield il.load_item()

In the example, I put the contract on the second method, but it gave me a KeyError exception on response.request.meta['item']. Also, the fields field1 and field2 are populated in the first method.

Hope it's clear enough.
