Python - Empty div returned with XPath or CSS selector using Scrapy
I'm using Scrapy to crawl a web page that contains a specific article.
I'm trying to get the information stored inside the div with class "return". The big problem is that the div comes back empty when I use Scrapy's XPath or CSS selectors.
The div I'm trying to extract:
<div class="return">
  <p><strong>conditionnement : </strong></p>
  <p class="one-product-detail">2 colis :<br>
    l178xl106xh80 72kg<br>l178xl112xh80 60kg<br>
    <span itemprop="weight" alt="3fin" class="hidden" hidden="">132kg</span></p>
</div>
My spider code:
import scrapy
from alinea.items import AlineaItem


class AlineaSpider(scrapy.Spider):
    name = "alinea"
    start_urls = [
        "http://www.alinea.fr/",
    ]

    def parse(self, response):
        # ref = input("enter item reference ?\n")  # 25321050
        # link = "http://www.alinea.fr/alinea_fredhopper/catalogsearch_result/products/search/" + str(ref)
        link = "http://www.alinea.fr/alinea_fredhopper/catalogsearch_result/products/search/" + str(25321050)
        print(link)
        # Follow the search URL; the response is handled by parse_page2
        return scrapy.Request(link, callback=self.parse_page2)

    def parse_page2(self, response):
        self.logger.info("Visited %s", response.url)
        for sel in response.xpath('//li[contains(@itemprop,"title")]/text()'):
            print("**************")
            print("description")
            print(sel.extract())
            print("**************")
        # print("------------------------------------------------------------------")
        #
        # for sel in response.xpath('//*[@class="delivery"]'):
        #     print("**************")
        #     print("details")
        #     print(sel.extract())
        #     print("**************")
        print("------------------------------------------------------------------")
        for sel in response.css('[class="return"]'):
            print("**************")
            print("details")
            print(sel.extract())
            print("**************")
My terminal log:
2016-07-28 12:57:21 [alinea] INFO: Visited http://www.alinea.fr/orca-canape-angle-gauche-droit-convertible-gris.html
**************
description
orca - canapé convertible d'angle gauche ou droit gris
**************
------------------------------------------------------------------
**************
details
<div class="return"> </div>
**************
The page you visited has no content in that div at all, so an empty div is exactly what you are supposed to get.
If you change to another page, for example http://www.alinea.fr/orca-canape-angle-droit-gris-fonce.html, you will see that the div is there and is not empty.
Output from the shell:
scrapy shell 'http://www.alinea.fr/orca-canape-angle-droit-gris-fonce.html'
In [1]: response.xpath('//div[@class="return"]').extract()
Out[1]: [u'<div class="return">\n\n \n<p><strong>conditionnement : </strong></p>\n<p class="one-product-detail">\n\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t2 colis :<br>\n\t\t\t\t\t\t\t\t\t l178xl106xh80\xa055kg<br>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t l178xl112xh80\xa053kg<br>\t\t\t\t\t\t<span itemprop="weight" alt="3fin" hidden class="hidden">108kg</span></p>\n \n</div>']
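If you want the text, use //text() instead of /text(). /text() only gives you the text nodes directly under the div, which in this case are just whitespace, while //text() also returns the text of every descendant element.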
In [2]: response.xpath('//div[@class="return"]/text()').extract()
Out[2]: [u'\n\n \n', u'\n', u'\n \n']
In [3]: [x.strip() for x in response.xpath('//div[@class="return"]//text()').extract()]
Out[3]: [u'', u'conditionnement :', u'', u'2 colis :', u'l178xl106xh80\xa055kg', u'l178xl112xh80\xa053kg', u'', u'108kg', u'']