python - Scrapy: overwrite DEPTH_LIMIT variable based on value read from custom config -


I am using an InitSpider and I read a custom JSON configuration within the def __init__(self, *a, **kw): method.

The JSON config file contains a directive that controls the crawling depth. I can read the configuration file and extract the value; the main problem is how to tell Scrapy to use that value.

Note: I don't want to use a command line argument such as -s DEPTH_LIMIT=3; I want to parse my custom configuration.

DEPTH_LIMIT is used in scrapy.spidermiddlewares.depth.DepthMiddleware. If you have a quick look at that code, you'll see that the DEPTH_LIMIT value is read only when the middleware is initialized.

I think this might be a solution for you:

  1. In the __init__ method of your spider, set a spider attribute max_depth to your custom value (see the sketch below the documentation link).
  2. Override scrapy.spidermiddlewares.depth.DepthMiddleware and have it check the max_depth attribute.
  3. Disable the default DepthMiddleware and enable your own one in the settings (see the settings sketch at the end).

See http://doc.scrapy.org/en/latest/topics/spider-middleware.html
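A minimal sketch of step #1, assuming the config file is named crawler_config.json and stores the depth under a depth_limit key (both names are placeholders for whatever your configuration actually uses; the InitSpider import path may differ depending on your Scrapy version):

import json

from scrapy.spiders.init import InitSpider


class MySpider(InitSpider):
    name = 'myspider'

    def __init__(self, *a, **kw):
        super(MySpider, self).__init__(*a, **kw)
        # read the custom JSON configuration (file name and key are placeholders)
        with open('crawler_config.json') as f:
            config = json.load(f)
        # expose the depth as a spider attribute for the custom middleware to pick up
        self.max_depth = config.get('depth_limit', 0)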

A quick example of the overridden middleware described in step #2:

from scrapy.spidermiddlewares.depth import DepthMiddleware


class MyDepthMiddleware(DepthMiddleware):

    def process_spider_output(self, response, result, spider):
        # if the spider carries a max_depth attribute, use it instead of DEPTH_LIMIT
        if hasattr(spider, 'max_depth'):
            self.maxdepth = getattr(spider, 'max_depth')
        return super(MyDepthMiddleware, self).process_spider_output(response, result, spider)
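And for step #3, a sketch of the settings change, assuming the custom middleware lives in myproject/middlewares.py (adjust the module path to your project); 900 is the position the built-in DepthMiddleware occupies in SPIDER_MIDDLEWARES_BASE:

# settings.py
SPIDER_MIDDLEWARES = {
    # disable the built-in DepthMiddleware
    'scrapy.spidermiddlewares.depth.DepthMiddleware': None,
    # enable the custom one in its place (module path is an assumption)
    'myproject.middlewares.MyDepthMiddleware': 900,
}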
