Extract data from JavaScript using Python


I am a new Python user and have inherited a Python notebook from my predecessor that I want to improve. Its purpose is to grab product details from a website.

How it works:

  • It scrapes the scripts from the website using Beautiful Soup:

    import re
    import urllib2
    import bs4

    source = urllib2.urlopen('http://www.testwebsite.html').read()
    soup = bs4.BeautifulSoup(source)
    job_postings = soup.findAll("script")
    # Keep only the <script> tags whose type is "text/javascript".
    job_postings = [jp for jp in job_postings
                    if jp.get('type') is not None
                    and ''.join(jp.get('type')) == "text/javascript"]
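For anyone on Python 3 without Beautiful Soup, the same filtering step can be approximated with the standard library. This is only a sketch and not the notebook's method: `html.parser` stands in for Beautiful Soup, and `ScriptCollector` and the inline `html` string are made-up names and data.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text of every <script type="text/javascript"> tag."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_js = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "script" and dict(attrs).get("type") == "text/javascript":
            self._in_js = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_js = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._in_js:
            self.scripts[-1] += data


# Hypothetical page with one matching and one non-matching script tag.
html = (
    '<html><head>'
    '<script type="text/javascript">var a = {"sku":"test123"};</script>'
    '<script type="application/json">{"skip":true}</script>'
    '</head></html>'
)

collector = ScriptCollector()
collector.feed(html)
collector.close()
print(collector.scripts)
```

Only the first script is kept, because the second one has a different `type`.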

The scrape returns the scripts in the webpage (1st part of the data):

window.wf=window.wf||{};wf.appdata=wf.appdata||{};wf.appdata.product_data_test123=wf.appdata.product_data_test123||{};wf.appdata.product_data_test123 = {"sku":"test123","is_grid_view":false,,"default_img_display":0,"manufacturer_name":"supplier1","product_name":"product test","part_number":"1234","list_price":1000,"is_price_hidden":false,"base_price":1000,"has_opt":true,"opt_details":[{"option_ids":[],"regular_price":2681.25],"has_free_shipping":false,,"total_qty":1,"display_set_quantity":1,"is_standard_layout":true,"page_type":"productpage"};yui_config.app.product_data_test123 = {"sku":"test123",........ same info here ....};

2nd part of the data:

\n wf.extend({"yui_config":{"app":{"pagealias":"productpage"}},"wf":{"appdata":{"pagealias":"productpage",,"mkcname":"au: furnitureroom","productreviews":{"b_show_review_tags":false,"kit_subgroup_price":null,"catalog_currency":"aud","price_model":null,"colors":"",,"available_after":{"date":"2016-07-28 18:05:16.000000","timezone":"australia\\/sydney"},"inventory_info":{"sku":"test123",,"latest_inventory_update":"2016-07-29 00:45:06","option_ids":[],"available_quantity":17,"display_quantity":17,","quantity_available_string":" more 10 in stock","short_lead_time_id":2,"short_lead_time_string":"leaves warehouse in 1 3 business days"}}};

Then I extract the data I need:

    jsonfile = re.findall(r'wf.appdata.product_data_[a-z]{4}[0-9]{4} = (\{.*});yui_config.app.product_data_', str(job_postings))

This gives me:

{"sku":"test123","is_grid_view":false,,"default_img_display":0,"manufacturer_name":"supplier1","product_name":"product test","part_number":"1234","list_price":1000,"is_price_hidden":false,"base_price":1000,"has_opt":true,"opt_details":[{"option_ids":[],"regular_price":2681.25],"has_free_shipping":false,,"total_qty":1,"display_set_quantity":1,"is_standard_layout":true,"page_type":"productpage"}
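The text captured by the regex is JSON, so it can be turned into a Python dict with `json.loads` instead of being kept as a string. A minimal sketch, assuming the captured text is well-formed JSON (the sample above is abridged and would not parse as shown). `script_text`, `pattern` and the values in it are made up for illustration, and the pattern is loosened to match the sample key `test123`.

```python
import json
import re

# Hypothetical, shortened script text shaped like the 1st part of the data.
script_text = (
    'wf.appdata.product_data_test123 = {"sku":"test123","list_price":1000,'
    '"has_opt":true};yui_config.app.product_data_test123 = {"sku":"test123"};'
)

# Capture the object literal between the assignment and the next statement.
pattern = r'wf\.appdata\.product_data_\w+ = (\{.*?\});yui_config\.app\.product_data_'
match = re.search(pattern, script_text)

# json.loads converts the captured JSON text into a dict (true -> True, etc.).
product = json.loads(match.group(1)) if match else None
print(product)
```

Once the data is a dict, individual fields such as `product["list_price"]` can be read directly.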

My problem now: I want to add the "inventory_info" data to the list.

I've tried:

    jsonfile = re.findall(r'inventory_info' = (\{.*}),str(job_postings))

or

    jsonfile = re.compile('inventory_info' = ({.*?});', re.DOTALL)

Neither of them works.

My knowledge of Python is limited and I'm a bit lost now. Please help.
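One thing visible in the 2nd part of the data is that the key appears as `"inventory_info":{...}`, with a colon and quotes, not as `inventory_info = {...}`, so a pattern built around ` = ` has nothing to match. A possible approach, sketched below under the assumption that the real script text is well-formed JSON at that point, is to locate the key with a regex and let `json.JSONDecoder().raw_decode` read the object that follows, which copes with nested braces. `extract_json_value` and `script_text` are hypothetical names and data.

```python
import json
import re


def extract_json_value(text, key):
    """Return the JSON object stored as "key":{...} in text, or None."""
    # Find `"key":` immediately followed by an opening brace.
    found = re.search(r'"%s"\s*:\s*(?=\{)' % re.escape(key), text)
    if found is None:
        return None
    # raw_decode parses one complete JSON value starting at the given
    # index and ignores whatever comes after it.
    value, _end = json.JSONDecoder().raw_decode(text, found.end())
    return value


# Hypothetical, well-formed excerpt shaped like the 2nd part of the data.
script_text = (
    'wf.extend({"wf":{"appdata":{"inventory_info":{"sku":"test123",'
    '"option_ids":[],"available_quantity":17,"display_quantity":17}}}});'
)

inventory = extract_json_value(script_text, "inventory_info")
print(inventory)
```

`raw_decode` raises a `ValueError` if the text after the key is not valid JSON, so the call may need a `try`/`except` on real pages. The resulting dict can then be stored alongside the product data, for example under an `"inventory_info"` key.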

