Java - Parsing a table with jsoup


I'm trying to extract the e-mail address and phone number from a LinkedIn profile using jsoup. Each of these pieces of information is in a table. I have written code to extract them, but it doesn't work; the code should work on any LinkedIn profile. Any help or guidance is appreciated.

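    import java.io.IOException;
    import java.util.Iterator;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Main {
        public static void main(String[] args) {
            try {
                String url = "https://fr.linkedin.com/";
                // Fetch the document over HTTP
                Document doc = Jsoup.connect(url).get();

                // Page title
                String title = doc.title();
                System.out.println("nom & prénom: " + title);

                // First method: print the text of every link inside the tables
                Elements table = doc.select("div[class=more-info defer-load]").select("table");
                Iterator<Element> iterator = table.select("ul li a").iterator();
                while (iterator.hasNext()) {
                    System.out.println(iterator.next().text());
                }

                // Second method: walk the tables row by row
                for (Element tablee : doc.select("div[class=more-info defer-load]").select("table")) {
                    for (Element row : tablee.select("tr")) {
                        Elements tds = row.select("td");
                        if (tds.size() > 0) {
                            System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
                        }
                    }
                }
            } catch (IOException e) {
                // Jsoup.connect(...).get() throws IOException on network or HTTP errors
                e.printStackTrace();
            }
        }
    }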

Here is an example of the HTML code I'm trying to extract from (taken from a LinkedIn profile):

    <table summary="coordonnées en ligne">
       <tr>
          <th>e-mail</th>
          <td>
             <div id="email">
                <div id="email-view">
                   <ul>
                      <li>
                         <a href="mailto:adam1adam@gmail.com">adam1adam@gmail.com</a>
                      </li>
                   </ul>
                </div>
             </div>
          </td>
       </tr>
       <tr class="no-contact-info-data">
          <th>messagerie instantanée</th>
          <td>
             <div id="im" class="editable-item">
             </div>
          </td>
       </tr>
       <tr class="address-book">
          <th>carnet d’adresses</th>
          <td>
             <span class="address-book">
             <a title="une nouvelle fenêtre s’ouvrira" class="address-book-edit" href="/editcontact?editcontact=&contactmemberid=368674763">ajouter</a> des coordonnées.
             </span>
          </td>
       </tr>
    </table>
    <table summary="coordonnées">
       <tr>
          <th>téléphone</th>
          <td>
             <div id="phone" class="editable-item">
                <div id="phone-view">
                   <ul>
                      <li>0021653191431&nbsp;(mobile)</li>
                   </ul>
                </div>
             </div>
          </td>
       </tr>
       <tr class="no-contact-info-data">
          <th>adresse</th>
          <td>
             <div id="address" class="editable-item">
                <div id="address-view">
                   <ul>
                   </ul>
                </div>
             </div>
          </td>
       </tr>
    </table>

To scrape the email address and phone number, use CSS selectors that target the elements by their ids.

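    // href attribute of the link inside div#email-view (includes the mailto: scheme)
    String email = doc.select("div#email-view > ul > li > a").attr("href");
    System.out.println(email);

    // text of the list item inside div#phone-view
    String phone = doc.select("div#phone-view > ul > li").text();
    System.out.println(phone);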

See the jsoup documentation on CSS selectors for more information.

Output

    mailto:adam1adam@gmail.com
    0021653191431 (mobile)
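The href value still carries the `mailto:` scheme, and the phone text still carries the `(mobile)` label. Here is a minimal sketch of how both strings could be cleaned up with plain Java string handling. The class and method names (`ContactCleanup`, `emailFromHref`, `phoneNumberOnly`) are hypothetical; they are not part of jsoup or of the answer above. The sketch also assumes the label always appears as a trailing parenthesised suffix.

```java
public class ContactCleanup {

    private static final String MAILTO = "mailto:";

    // Hypothetical helper: drops a leading "mailto:" (case-insensitive) and any
    // "?subject=..." style query; returns the trimmed input unchanged otherwise.
    static String emailFromHref(String href) {
        String value = href.trim();
        if (value.regionMatches(true, 0, MAILTO, 0, MAILTO.length())) {
            value = value.substring(MAILTO.length());
            int query = value.indexOf('?');
            if (query >= 0) {
                value = value.substring(0, query);
            }
        }
        return value;
    }

    // Hypothetical helper: removes a trailing parenthesised label such as "(mobile)".
    // Non-breaking spaces (from &nbsp;) are turned into plain spaces first,
    // because \s in a Java regex does not match them by default.
    static String phoneNumberOnly(String text) {
        return text.replace('\u00a0', ' ')
                   .replaceAll("\\s*\\([^)]*\\)\\s*$", "")
                   .trim();
    }

    public static void main(String[] args) {
        System.out.println(emailFromHref("mailto:adam1adam@gmail.com")); // adam1adam@gmail.com
        System.out.println(phoneNumberOnly("0021653191431 (mobile)"));   // 0021653191431
    }
}
```

In the snippet above, the `email` and `phone` variables could be passed through these helpers before printing.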
