python - Empty Div return with Xpath or Css Selector Using Scrapy -

- March 15, 2011

I'm using Scrapy to crawl a web page that contains a specific article.

I'm trying to get the information stored inside the div with class "return". The big problem is that the div comes back empty when I use Scrapy XPath or CSS selectors.

The div I'm trying to extract:

<div class="return">
    <p><strong>conditionnement : </strong></p>
    <p class="one-product-detail">2 colis :<br>
        l178xl106xh80&nbsp;72kg<br>l178xl112xh80&nbsp;60kg<br>
        <span itemprop="weight" alt="3fin" class="hidden" hidden="">132kg</span></p>
</div>

My spider code:

import scrapy
from alinea.items import AlineaItem


class AlineaSpider(scrapy.Spider):
    name = "alinea"
    start_urls = [
        "http://www.alinea.fr/",
    ]

    def parse(self, response):
        # ref = input("enter item reference ?\n")
        # 25321050
        # link = "http://www.alinea.fr/alinea_fredhopper/catalogsearch_result/products/search/" + str(ref)
        link = "http://www.alinea.fr/alinea_fredhopper/catalogsearch_result/products/search/" + str(25321050)
        print(link)
        return scrapy.Request(link,
                              callback=self.parse_page2)

    def parse_page2(self, response):
        self.logger.info("visited %s", response.url)

        for sel in response.xpath('//li[contains(@itemprop,"title")]/text()'):
            print("**************")
            print("description")
            print(sel.extract())
            print("**************")

        # print("------------------------------------------------------------------")
        #
        # for sel in response.xpath('//*[@class="delivery"]'):
        #
        #     print("**************")
        #     print("details")
        #     print(sel.extract())
        #     print("**************")

        print("------------------------------------------------------------------")

        for sel in response.css('[class="return"]'):

            print("**************")
            print("details")
            print(sel.extract())
            print("**************")

My terminal log:

2016-07-28 12:57:21 [alinea] INFO: visited http://www.alinea.fr/orca-canape-angle-gauche-droit-convertible-gris.html
**************
description
                     orca - canapé convertible d'angle gauche ou droit gris
**************
------------------------------------------------------------------
**************
details
<div class="return">    </div>
**************

The page you visited has no content in that div at all, so an empty div is exactly what you are supposed to get.
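Because the selector matches and the div really is empty on that page, a spider may want to detect this case before it tries to parse the details. The following is a minimal sketch, not part of the original answer: `div_is_empty` is a hypothetical helper that works on the HTML string returned by `extract()`, and it uses only the standard library so it can be tried without Scrapy.

```python
import re


def div_is_empty(div_html):
    """Return True when the extracted element contains only whitespace.

    Hypothetical helper: it expects the outer HTML of a single element,
    such as the string returned by sel.extract() in the spider.
    """
    # Remove the outer opening tag and the outer closing tag,
    # then check whether anything other than whitespace is left.
    inner = re.sub(r'^\s*<[^>]+>|</[^>]+>\s*$', '', div_html)
    return inner.strip() == ''


empty_div = '<div class="return">    </div>'
filled_div = '<div class="return"><p><strong>conditionnement : </strong></p></div>'

print(div_is_empty(empty_div))
print(div_is_empty(filled_div))
```

In `parse_page2`, a check like this could be used to skip or log pages whose "return" div carries no packaging details.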

If you switch to another page, for example http://www.alinea.fr/orca-canape-angle-droit-gris-fonce.html, you will see that the div is there and is not empty.

Output from the shell:

scrapy shell 'http://www.alinea.fr/orca-canape-angle-droit-gris-fonce.html'

In [1]: response.xpath('//div[@class="return"]').extract()
Out[1]: [u'<div class="return">\n\n            \n<p><strong>conditionnement : </strong></p>\n<p class="one-product-detail">\n\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t2 colis :<br>\n\t\t\t\t\t\t\t\t\t l178xl106xh80\xa055kg<br>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t l178xl112xh80\xa053kg<br>\t\t\t\t\t\t<span itemprop="weight" alt="3fin" hidden class="hidden">108kg</span></p>\n        \n</div>']

If you want the text, use //text() instead of /text(). /text() only gives you the text nodes directly under the div, which in this case are just whitespace.

In [2]: response.xpath('//div[@class="return"]/text()').extract()
Out[2]: [u'\n\n            \n', u'\n', u'\n        \n']

In [3]: [x.strip() for x in response.xpath('//div[@class="return"]//text()').extract()]
Out[3]:
[u'',
 u'conditionnement :',
 u'',
 u'2 colis :',
 u'l178xl106xh80\xa055kg',
 u'l178xl112xh80\xa053kg',
 u'',
 u'108kg',
 u'']
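The difference between direct and descendant text nodes can also be reproduced outside Scrapy. The sketch below is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree` rather than Scrapy selectors, and `SNIPPET` is a simplified, well-formed version of the div (with `<br/>` instead of `<br>`), not the real page. It also drops the empty strings that appear in Out[3].

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the div shown in the shell session.
SNIPPET = (
    '<div class="return">\n'
    '  <p><strong>conditionnement : </strong></p>\n'
    '  <p class="one-product-detail">2 colis :<br/>'
    'l178xl106xh80\u00a055kg<br/>l178xl112xh80\u00a053kg<br/>'
    '<span class="hidden">108kg</span></p>\n'
    '</div>'
)

div = ET.fromstring(SNIPPET)

# Analogue of div/text(): only the text nodes that are direct children of the div.
direct = [div.text or ''] + [child.tail or '' for child in div]

# Analogue of div//text(): every text node at any depth, in document order.
descendant = list(div.itertext())

# Strip each string and keep only the ones that are not blank.
cleaned = [t.strip() for t in descendant if t.strip()]

print(direct)
print(cleaned)
```

The direct text nodes contain nothing but whitespace, while the descendant text nodes hold the packaging details. In the Scrapy shell, the same filtering can be applied to the list comprehension from In [3] by adding `if x.strip()` at the end.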
