java - Parsing a table with jsoup
I'm trying to extract the e-mail address, the phone number and the LinkedIn profile using jsoup. Each of these pieces of information is in a table. I have written code to extract them, but it doesn't work. The code should work on any LinkedIn profile. Any help or guidance would be appreciated.
    public static void main(String[] args) {
        try {
            String url = "https://fr.linkedin.com/";
            // Fetch the document over HTTP
            Document doc = Jsoup.connect(url).get();

            // Page title
            String title = doc.title();
            System.out.println("nom & prénom: " + title);

            // First method
            Elements table = doc.select("div[class=more-info defer-load]").select("table");
            Iterator<Element> iterator = table.select("ul li a").iterator();
            while (iterator.hasNext()) {
                System.out.println(iterator.next().text());
            }

            // Second method
            for (Element tablee : doc.select("div[class=more-info defer-load]").select("table")) {
                for (Element row : tablee.select("tr")) {
                    Elements tds = row.select("td");
                    if (tds.size() > 0) {
                        System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
Here is an example of the HTML I'm trying to extract from (taken from a LinkedIn profile):
    <table summary="coordonnées en ligne">
      <tr>
        <th>e-mail</th>
        <td>
          <div id="email">
            <div id="email-view">
              <ul>
                <li>
                  <a href="mailto:adam1adam@gmail.com">adam1adam@gmail.com</a>
                </li>
              </ul>
            </div>
          </div>
        </td>
      </tr>
      <tr class="no-contact-info-data">
        <th>messagerie instantanée</th>
        <td>
          <div id="im" class="editable-item">
          </div>
        </td>
      </tr>
      <tr class="address-book">
        <th>carnet d’adresses</th>
        <td>
          <span class="address-book">
            <a title="une nouvelle fenêtre s’ouvrira" class="address-book-edit" href="/editcontact?editcontact=&contactmemberid=368674763">ajouter</a> des coordonnées.
          </span>
        </td>
      </tr>
    </table>
    <table summary="coordonnées">
      <tr>
        <th>téléphone</th>
        <td>
          <div id="phone" class="editable-item">
            <div id="phone-view">
              <ul>
                <li>0021653191431 (mobile)</li>
              </ul>
            </div>
          </div>
        </td>
      </tr>
      <tr class="no-contact-info-data">
        <th>adresse</th>
        <td>
          <div id="address" class="editable-item">
            <div id="address-view">
              <ul>
              </ul>
            </div>
          </div>
        </td>
      </tr>
    </table>
To scrape the e-mail address and the phone number, use CSS selectors that target the elements by their ids.
    String email = doc.select("div#email-view > ul > li > a").attr("href");
    System.out.println(email);

    String phone = doc.select("div#phone-view > ul > li").text();
    System.out.println(phone);
See CSS selectors for more information.
Output

    mailto:adam1adam@gmail.com
    0021653191431 (mobile)
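The e-mail value printed above still carries the mailto: scheme, because attr("href") returns the raw attribute value. If only the address is wanted, one option is to call text() on the same selection, since the link text in the sample HTML is the address itself. Another is to strip the scheme from the href. The following is a minimal sketch of the second option using only the standard library. The class MailtoLinks and the method stripMailto are hypothetical names made up for this illustration and are not part of jsoup.

```java
class MailtoLinks {
    private static final String SCHEME = "mailto:";

    // Hypothetical helper (not a jsoup API): returns the address part of a
    // mailto: href, or the trimmed input unchanged if it has no such scheme.
    static String stripMailto(String href) {
        if (href == null) {
            return "";
        }
        String value = href.trim();
        // ignoreCase = true, so "MAILTO:" is accepted as well
        if (value.regionMatches(true, 0, SCHEME, 0, SCHEME.length())) {
            value = value.substring(SCHEME.length());
        }
        // Drop optional header fields such as "?subject=..."
        int query = value.indexOf('?');
        return query >= 0 ? value.substring(0, query) : value;
    }

    public static void main(String[] args) {
        // Same href value as in the sample HTML above
        System.out.println(stripMailto("mailto:adam1adam@gmail.com"));
        // prints: adam1adam@gmail.com
    }
}
```

In the answer's code this would be used as stripMailto(email) before printing. Cutting at the first '?' is a simplification that assumes the address itself contains no question mark.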