The initial problem
I am writing a Spider class (using the scrapy
library) and rely on a lot of scrapy
asynchronous magic to make it work. Here it is, stripped down:
class MySpider(CrawlSpider):
rules = [Rule(LinkExtractor(allow='myregex'), callback='parse_page')]
# some other class attributes
def __init__(self, *args, **kwargs):
super(MySpider, self).__init__(*args, **kwargs)
self.response = None
self.loader = None
def parse_page_section(self):
soup = BeautifulSoup(self.response.body, 'lxml')
# Complicated scraping logic using BeautifulSoup
self.loader.add_value(mykey, myvalue)
# more methods parsing other sections of the page
# also using self.response and self.loader
def parse_page(self, response):
self.response = response
self.loader = ItemLoader(item=Item(), response=response)
self.parse_page_section()
# call other methods to collect more stuff
self.loader.load_item()
The class attribute rule
tells my spider to follow certain links and jump to a callback function once the web-pages are downloaded. My goal is to test the parsing method called parse_page_section
without running the crawler or even making real HTTP requests.
What I tried
Instinctively, I turned myself to the unittest.mock
library. I understand how you mock a function to test whether it has been called (with which arguments and if there were any side effects...), but that's not what I want. I want to instantiate a fake object MySpider
and assign just enough attributes to be able to call parse_page_section
method on it.
In the above example, I need a response
object to instantiate my ItemLoader
and specifically a self.response.body
attribute to instantiate my BeautifulSoup
. In principle, I could make fake objects like this:
from argparse import Namespace
my_spider = MySpider(CrawlSpider)
my_spider.response = NameSpace(body='<html>...</html>')
That works well to for the BeautifulSoup
class but I would need to add more attributes to create an ItemLoader
object. For more complex situations, it would become ugly and unmanageable.
My questions
How do I test a method on an object in a partially defined state? Is this the right approach altogether? I can't find similar examples on the web, so I think my approach my may be wrong at a more fundamental level. Any insight would be greatly appreciated.
Aucun commentaire:
Enregistrer un commentaire