python - Scrapy: overwrite DEPTH_LIMIT variable based on value read from custom config
I am using InitSpider and read a custom JSON configuration within the def __init__(self, *a, **kw): method.
The JSON config file contains a directive with which I can control the crawling depth. I can already read this configuration file and extract the value. The main problem is how to tell Scrapy to use this value.
Note: I don't want to use a command-line argument such as -s DEPTH_LIMIT=3; I want to parse the value from my custom configuration.
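For concreteness, the config-reading side might look like the sketch below. It is only an illustration: the helper name load_depth_limit and the JSON key "depth_limit" are assumptions, since the question does not show the actual file layout.

```python
import json


def load_depth_limit(config_path, default=0):
    """Return the crawl depth stored in a JSON config file.

    Falls back to `default` when the key is missing.
    """
    with open(config_path, encoding="utf-8") as fh:
        config = json.load(fh)
    # "depth_limit" is a hypothetical key name; use whatever the config defines.
    return int(config.get("depth_limit", default))
```

The spider's __init__ could call this helper with the path of its config file and keep the result on the instance for later use.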
DEPTH_LIMIT is used in scrapy.spidermiddlewares.depth.DepthMiddleware. If you take a quick look at the code, you'll see that the DEPTH_LIMIT value is read only when the middleware is initialized, so changing it later from the spider has no effect.
I think this might be a good solution for you:
1. In the __init__ method of your spider, set a spider attribute max_depth to your custom value.
2. Override scrapy.spidermiddlewares.depth.DepthMiddleware and have it check the max_depth attribute.
3. Disable the default DepthMiddleware and enable your own one in the settings.
See http://doc.scrapy.org/en/latest/topics/spider-middleware.html
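The settings change for disabling the default middleware and enabling a replacement could look roughly like this. The module path myproject.middlewares and the class name MyDepthMiddleware are placeholders for wherever your own subclass lives. The order value 900 is the slot the built-in DepthMiddleware occupies by default.

```python
# settings.py (fragment)
SPIDER_MIDDLEWARES = {
    # Setting a built-in middleware to None disables it.
    "scrapy.spidermiddlewares.depth.DepthMiddleware": None,
    # Hypothetical path to the custom subclass; adjust to your project layout.
    "myproject.middlewares.MyDepthMiddleware": 900,
}
```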
A quick example of the overridden middleware described in step #2:
from scrapy.spidermiddlewares.depth import DepthMiddleware

class MyDepthMiddleware(DepthMiddleware):
    def process_spider_output(self, response, result, spider):
        # Prefer the spider's own max_depth over the DEPTH_LIMIT setting.
        if hasattr(spider, 'max_depth'):
            self.maxdepth = getattr(spider, 'max_depth')
        return super(MyDepthMiddleware, self).process_spider_output(response, result, spider)