Extract data from JavaScript using Python


I'm a new Python user and have inherited a Python notebook from my predecessor that I want to improve. Its purpose is to grab product details from a website.

How it works:

  • It scrapes the scripts from the website using Beautiful Soup:

    import re
    import urllib2
    import bs4

    source = urllib2.urlopen('http://www.testwebsite.html').read()
    soup = bs4.BeautifulSoup(source, 'html.parser')
    job_postings = soup.find_all("script")
    # keep only <script type="text/javascript"> tags (a missing type gives None, which never matches)
    job_postings = [jp for jp in job_postings
                    if jp.get('type') == "text/javascript"]
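The filter keeps only `<script>` tags whose `type` attribute is exactly `text/javascript`. As a point of comparison, here is a minimal Python 3 sketch of the same filtering using the standard library's `html.parser` instead of Beautiful Soup (a swapped-in alternative, chosen only so the example runs without extra packages). The class name `ScriptCollector` and the HTML string are made up for illustration.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the bodies of <script type="text/javascript"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_js = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a missing type never matches
        if tag == "script" and dict(attrs).get("type") == "text/javascript":
            self._in_js = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_js = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry
        if self._in_js:
            self.scripts[-1] += data


html = (
    '<html><head>'
    '<script type="text/javascript">var a = 1;</script>'
    '<script type="application/ld+json">{"b": 2}</script>'
    '<script>var c = 3;</script>'
    '</head></html>'
)

collector = ScriptCollector()
collector.feed(html)
print(collector.scripts)  # ['var a = 1;']
```

Scripts without a `type` attribute are dropped here, matching the behaviour of the list comprehension in the question.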

It returns the scripts in the webpage. The first part of the data:

    window.wf=window.wf||{};wf.appdata=wf.appdata||{};wf.appdata.product_data_test123=wf.appdata.product_data_test123||{};wf.appdata.product_data_test123 = {"sku":"test123","is_grid_view":false,,"default_img_display":0,"manufacturer_name":"supplier1","product_name":"product test","part_number":"1234","list_price":1000,"is_price_hidden":false,"base_price":1000,"has_opt":true,"opt_details":[{"option_ids":[],"regular_price":2681.25],"has_free_shipping":false,,"total_qty":1,"display_set_quantity":1,"is_standard_layout":true,"page_type":"productpage"};yui_config.app.product_data_test123 = {"sku":"test123",........ same info here ....};

The second part of the data:

    \n wf.extend({"yui_config":{"app":{"pagealias":"productpage"}},"wf":{"appdata":{"pagealias":"productpage",,"mkcname":"au: furnitureroom","productreviews":{"b_show_review_tags":false,"kit_subgroup_price":null,"catalog_currency":"aud","price_model":null,"colors":"",,"available_after":{"date":"2016-07-28 18:05:16.000000","timezone":"australia\\/sydney"},"inventory_info":{"sku":"test123",,"latest_inventory_update":"2016-07-29 00:45:06","option_ids":[],"available_quantity":17,"display_quantity":17,","quantity_available_string":" more 10 in stock","short_lead_time_id":2,"short_lead_time_string":"leaves warehouse in 1 3 business days"}}};
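In this second script `inventory_info` is not assigned to a variable; it is a key nested inside the object passed to `wf.extend(...)`. One option is to let the `json` module parse that whole object and then search the resulting dictionaries for the key. `json.JSONDecoder().raw_decode` parses one JSON value from the start of a string and reports where it stopped, so the trailing `);` is ignored and nested braces are handled correctly. This is a sketch under assumptions: the real script text must be valid JSON inside `wf.extend(...)` (the sample in the question contains doubled commas, presumably from trimming, which `json` would reject), and the helper names `find_key` and `parse_extend_call` are hypothetical.

```python
import json


def find_key(obj, key):
    """Depth-first search of nested dicts/lists; return the first value stored under key."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def parse_extend_call(script):
    """Parse the JSON object passed to wf.extend(...), or return None if absent."""
    start = script.find("wf.extend(")
    if start == -1:
        return None
    body = script[start + len("wf.extend("):]
    # raw_decode parses one JSON value at the start of body and
    # returns (value, end_index); the trailing ");" is left unparsed.
    data, _ = json.JSONDecoder().raw_decode(body)
    return data


# Trimmed, cleaned-up stand-in for the second script (assumed valid JSON)
script = (
    '\n wf.extend({"yui_config":{"app":{"pagealias":"productpage"}},'
    '"wf":{"appdata":{"pagealias":"productpage",'
    '"inventory_info":{"sku":"test123","available_quantity":17}}}});'
)

data = parse_extend_call(script)
inventory = find_key(data, "inventory_info")
print(inventory)  # {'sku': 'test123', 'available_quantity': 17}
```

Searching the parsed structure avoids having to know exactly how deep `inventory_info` sits, which is useful because the sample above is abbreviated.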

Then I extract the data I need:

    jsonfile = re.findall(r'wf.appdata.product_data_[a-z]{4}[0-9]{4} = (\{.*});yui_config.app.product_data_', str(job_postings))
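`re.findall` returns the captured text as plain strings, so each match still has to be parsed before its fields can be used as a dictionary. A minimal sketch, assuming the captured object is valid JSON and applying the pattern to a single script's text rather than `str(job_postings)`. Two deliberate changes from the original pattern are assumptions on my part: `\w+` is used for the product suffix (the original `[a-z]{4}[0-9]{4}` would not match a suffix like `test123`), and the non-greedy `.*?` stops at the first `};yui_config...` instead of the last. The sample string is made up and trimmed.

```python
import json
import re

# Hypothetical, trimmed version of the first script
script = (
    'wf.appdata.product_data_test123 = {"sku":"test123",'
    '"manufacturer_name":"supplier1","list_price":1000,'
    '"has_opt":true};yui_config.app.product_data_test123 = {"sku":"test123"};'
)

# Capture the object literal assigned to wf.appdata.product_data_<suffix>,
# stopping at the first "};yui_config.app.product_data_" that follows it.
pattern = re.compile(
    r'wf\.appdata\.product_data_\w+ = (\{.*?\});yui_config\.app\.product_data_',
    re.DOTALL,
)

# json.loads turns each captured string into a dict (true -> True, etc.)
products = [json.loads(m) for m in pattern.findall(script)]
print(products[0]["manufacturer_name"])  # supplier1
```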

This gives me:

    {"sku":"test123","is_grid_view":false,,"default_img_display":0,"manufacturer_name":"supplier1","product_name":"product test","part_number":"1234","list_price":1000,"is_price_hidden":false,"base_price":1000,"has_opt":true,"opt_details":[{"option_ids":[],"regular_price":2681.25],"has_free_shipping":false,,"total_qty":1,"display_set_quantity":1,"is_standard_layout":true,"page_type":"productpage"}

My problem now: I want to add the "inventory_info" data to this as well.
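Once both pieces have been extracted as JSON text, one way to end up with a single record is to parse each into a dictionary and attach the inventory data under its own key. A minimal sketch; the two JSON strings below are hypothetical, trimmed stand-ins for the extracted objects.

```python
import json

# Hypothetical extracted texts: the product_data object and the inventory_info object
product_json = '{"sku":"test123","product_name":"product test","list_price":1000}'
inventory_json = '{"sku":"test123","available_quantity":17,"short_lead_time_id":2}'

product = json.loads(product_json)
inventory = json.loads(inventory_json)

# Nest the inventory data under its own key so that fields present in
# both objects (such as "sku") do not overwrite each other.
product["inventory_info"] = inventory

print(json.dumps(product, indent=2))
```

Nesting keeps the two sources distinguishable; `product.update(inventory)` would flatten them instead, with inventory values winning on any shared key.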

I've tried:

     jsonfile =  re.findall(r'inventory_info' = (\{.*}),str(job_postings)) 

or

    jsonfile = re.compile('inventory_info' = ({.*?});', re.DOTALL)

Neither of these works.
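Both attempts fail before any matching happens: in `r'inventory_info' = (\{.*})` the string literal ends right after `inventory_info`, so the rest of the pattern sits outside the string and Python raises a `SyntaxError`. The data also never contains `inventory_info =`; the key appears quoted and followed by a colon, as `"inventory_info":{`. A corrected regex, as a sketch that assumes the `inventory_info` object contains no nested braces (true for the sample in the question) and is valid JSON once captured; the sample string is a made-up, trimmed version of the second script.

```python
import json
import re

# Hypothetical, trimmed version of the second script
script = (
    '"available_after":{"date":"2016-07-28 18:05:16.000000"},'
    '"inventory_info":{"sku":"test123","option_ids":[],'
    '"available_quantity":17,"display_quantity":17}}};'
)

# The whole pattern, including the capture group, lives inside one raw string.
# The key is quoted and followed by ":"; the non-greedy .*? stops at the
# first closing brace, which is why nested objects are not supported here.
pattern = re.compile(r'"inventory_info":(\{.*?\})', re.DOTALL)

match = pattern.search(script)
inventory = json.loads(match.group(1)) if match else None
print(inventory["available_quantity"])  # 17
```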

My knowledge of Python is limited and I'm a bit lost now. Any help is appreciated.

