python - Empty div "return" with XPath or CSS selector using Scrapy


I'm using Scrapy to crawl a web page that contains a specific article.

I'm trying to get the information stored inside the div with class "return". The big problem is that the div comes back empty when I use Scrapy's XPath or CSS selectors.

The div I'm trying to extract:

    <div class="return">
        <p><strong>conditionnement : </strong></p>
        <p class="one-product-detail">2 colis :<br>
        l178xl106xh80&nbsp;72kg<br>l178xl112xh80&nbsp;60kg<br>
        <span itemprop="weight" alt="3fin" class="hidden" hidden="">132kg</span></p>
    </div>

My spider code:

    import scrapy
    from alinea.items import AlineaItem


    class AlineaSpider(scrapy.Spider):
        name = "alinea"
        start_urls = [
            "http://www.alinea.fr/",
        ]

        def parse(self, response):
            # ref = input("enter item reference ?\n")
            # 25321050
            # link = "http://www.alinea.fr/alinea_fredhopper/catalogsearch_result/products/search/" + str(ref)
            link = "http://www.alinea.fr/alinea_fredhopper/catalogsearch_result/products/search/" + str(25321050)
            print(link)
            return scrapy.Request(link,
                                  callback=self.parse_page2)

        def parse_page2(self, response):
            self.logger.info("visited %s", response.url)

            for sel in response.xpath('//li[contains(@itemprop,"title")]/text()'):
                print("**************")
                print("description")
                print(sel.extract())
                print("**************")

            # print("------------------------------------------------------------------")
            #
            # for sel in response.xpath('//*[@class="delivery"]'):
            #
            #     print("**************")
            #     print("details")
            #     print(sel.extract())
            #     print("**************")

            print("------------------------------------------------------------------")

            for sel in response.css('[class="return"]'):

                print("**************")
                print("details")
                print(sel.extract())
                print("**************")

My terminal log:

    2016-07-28 12:57:21 [alinea] INFO: visited http://www.alinea.fr/orca-canape-angle-gauche-droit-convertible-gris.html
    **************
    description
                         orca - canapé convertible d'angle gauche ou droit gris
    **************
    ------------------------------------------------------------------
    **************
    details
    <div class="return">    </div>
    **************

The page you visited has no content in that div at all, so an empty div is exactly what you are supposed to get.

If you change to other pages, for example http://www.alinea.fr/orca-canape-angle-droit-gris-fonce.html, you will see that the div is there and is not empty.

Output from the shell, started with scrapy shell 'http://www.alinea.fr/orca-canape-angle-droit-gris-fonce.html':

    In [1]: response.xpath('//div[@class="return"]').extract()
    Out[1]: [u'<div class="return">\n\n            \n<p><strong>conditionnement : </strong></p>\n<p class="one-product-detail">\n\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t2 colis :<br>\n\t\t\t\t\t\t\t\t\t l178xl106xh80\xa055kg<br>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t l178xl112xh80\xa053kg<br>\t\t\t\t\t\t<span itemprop="weight" alt="3fin" hidden class="hidden">108kg</span></p>\n        \n</div>']

If you want the text, use //text() instead. /text() only gives you the text nodes directly under the div, which in this case are just whitespace.

    In [2]: response.xpath('//div[@class="return"]/text()').extract()
    Out[2]: [u'\n\n            \n', u'\n', u'\n        \n']

    In [3]: [x.strip() for x in response.xpath('//div[@class="return"]//text()').extract()]
    Out[3]:
    [u'',
     u'conditionnement :',
     u'',
     u'2 colis :',
     u'l178xl106xh80\xa055kg',
     u'l178xl112xh80\xa053kg',
     u'',
     u'108kg',
     u'']
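The stripped list still contains empty strings and non-breaking spaces (\xa0). A minimal sketch of one way to tidy it up is shown here. The helper name clean_text_nodes is hypothetical, and the sample list is modelled on the shell output above rather than fetched from the site; in a spider the list would come from response.xpath('//div[@class="return"]//text()').extract().

```python
def clean_text_nodes(nodes):
    """Strip each text node, replace non-breaking spaces, drop empty strings.

    Hypothetical helper: it works on a plain list of strings, such as the
    list returned by calling .extract() on a //text() selector.
    """
    cleaned = []
    for node in nodes:
        # \xa0 is the non-breaking space produced by &nbsp; in the page source.
        text = node.replace("\xa0", " ").strip()
        if text:  # skip nodes that were whitespace only
            cleaned.append(text)
    return cleaned


# Sample input modelled on the shell output above (not fetched live).
raw_nodes = [
    "\n\n            \n",
    "conditionnement : ",
    "\n",
    "\n\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t2 colis :",
    "\n\t\t\t\t\t\t\t\t\t l178xl106xh80\xa055kg",
    "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t l178xl112xh80\xa053kg",
    "\t\t\t\t\t\t",
    "108kg",
    "\n        \n",
]

for line in clean_text_nodes(raw_nodes):
    print(line)
```

An empty result from the helper means the div held only whitespace, which is what happens on the first page in the question.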
