java - Split up jSoup scraping result


I am scraping this link using the jsoup library in Java. My source works, but I want to ask how to split up the elements I get.

Here is my source:

    package javaapplication1;

    import java.io.IOException;
    import java.sql.SQLException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class coba {

        public static void main(String[] args) throws SQLException {
            masukdb db = new masukdb();
            try {
                Document doc = null;
                for (int page = 1; page < 2; page++) {
                    doc = Jsoup.connect("http://hackaday.com/page/" + page).get();
                    System.out.println("title : " + doc.select(".entry-title>a").text() + "\n");
                    System.out.println("link : " + doc.select(".entry-title>a").attr("href") + "\n");
                    System.out.println("body : " + String.join("", doc.select(".entry-content p").text()) + "\n");
                    System.out.println("date : " + doc.select(".entry-date>a").text() + "\n");
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

In the result, every page of the website becomes one line. How do I split it up? Also, how do I get the link of every article? I think my CSS selector on the link side is still wrong.

 doc.select(".entry-title>a").text() 

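This searches the entire document and returns a list of every matching link. Calling text() on that list gives you the text of all of them combined into one string, and attr("href") gives you only the value from the first match. What you want instead is to scrape each article on its own, along with the pertinent data for each.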
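To make the difference concrete, here is a minimal sketch that runs against a small inline HTML string instead of the live site. The markup and the class name `CombinedTextDemo` are assumptions made up for illustration; only the selector comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class CombinedTextDemo {
    // Hypothetical markup, loosely modelled on the selector used in the question.
    static final String HTML =
          "<article><h1 class=\"entry-title\"><a href=\"/first\">First post</a></h1></article>"
        + "<article><h1 class=\"entry-title\"><a href=\"/second\">Second post</a></h1></article>";

    // Elements.text() merges the text of every matched element into one string.
    static String combinedTitles() {
        Document doc = Jsoup.parse(HTML);
        return doc.select(".entry-title>a").text();
    }

    // Elements.attr() only reports the attribute of the first matched element.
    static String firstHref() {
        Document doc = Jsoup.parse(HTML);
        return doc.select(".entry-title>a").attr("href");
    }

    // Iterating over the matches keeps each title separate.
    static List<String> titlesOneByOne() {
        Document doc = Jsoup.parse(HTML);
        List<String> titles = new ArrayList<>();
        for (Element link : doc.select(".entry-title>a")) {
            titles.add(link.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(combinedTitles());
        System.out.println(firstHref());
        System.out.println(titlesOneByOne());
    }
}
```

The first two methods reproduce what the question observed (one merged line, one link), while the loop yields one entry per article.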

    // Imports needed in addition to the ones in the question
    import java.text.MessageFormat;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Replaces the loop inside the try block of main(); get() can throw IOException
    Document doc;
    for (int page = 1; page < 2; page++) {

        doc = Jsoup.connect("http://hackaday.com/page/" + page).get();

        // Get the list of articles on the page
        Elements articles = doc.select("main#main article");

        // Iterate over the article list
        for (Element article : articles) {

            // Find the article header, which includes the title and date
            Element header = article.select("header.entry-header").first();

            // Find and scrape the title/link from the header
            Element headerTitle = header.select("h1.entry-title > a").first();
            String title = headerTitle.text();
            String link = headerTitle.attr("href");

            // Find and scrape the date from the header
            String date = header.select("div.entry-meta > span.entry-date > a").text();

            // Find and scrape every paragraph in the article content.
            // You may want to refine this logic further, since there may be
            // paragraphs you don't want to include.
            String body = article.select("div.entry-content p").text();

            // View the results
            System.out.println(
                    MessageFormat.format(
                            "title={0} link={1} date={2} body={3}",
                            title, link, date, body));
        }
    }
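One thing to watch with the per-article loop: first() returns null when nothing matches, so an article without the expected header would cause a NullPointerException. The sketch below shows one way to skip such articles. It uses an inline, made-up HTML string rather than the real page, and the class name `NullSafeArticles` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class NullSafeArticles {
    // Hypothetical page: the second article has no header or title link.
    static final String HTML =
          "<main id=\"main\">"
        + "<article><header class=\"entry-header\"><h1 class=\"entry-title\">"
        + "<a href=\"/a\">Article A</a></h1></header></article>"
        + "<article><div class=\"entry-content\"><p>No header here.</p></div></article>"
        + "</main>";

    // Collects "title -> link" for every article that actually has a title link.
    static List<String> titleAndLinkPerArticle() {
        Document doc = Jsoup.parse(HTML);
        List<String> rows = new ArrayList<>();
        for (Element article : doc.select("main#main article")) {
            Element titleLink =
                    article.select("header.entry-header h1.entry-title > a").first();
            if (titleLink == null) {
                continue; // nothing matched in this article, so skip it
            }
            rows.add(titleLink.text() + " -> " + titleLink.attr("href"));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : titleAndLinkPerArticle()) {
            System.out.println(row);
        }
    }
}
```

Whether to skip, log, or fail on a missing element depends on how strict you want the scraper to be; skipping is just the simplest choice for a sketch.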

See CSS selectors for more examples of how to scrape this kind of data.
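As a rough illustration of a few selector features that may help here, the sketch below uses a child combinator, a `:not(...)` filter to leave out unwanted paragraphs, and the `abs:` attribute prefix to resolve a relative link against a base URI. The HTML, the `read-more` class, and the class name `SelectorExamples` are assumptions for the example and are not taken from the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SelectorExamples {
    // Hypothetical article markup; the real page may be structured differently.
    static final String HTML =
          "<main id=\"main\"><article>"
        + "<header class=\"entry-header\">"
        + "<h1 class=\"entry-title\"><a href=\"/2016/01/01/post\">Post</a></h1>"
        + "<div class=\"entry-meta\"><span class=\"entry-date\">"
        + "<a href=\"/2016/01/01/\">January 1, 2016</a></span></div>"
        + "</header>"
        + "<div class=\"entry-content\"><p>Intro.</p><p class=\"read-more\">Read more</p></div>"
        + "</article></main>";

    // Parsing with a base URI lets "abs:href" turn relative links into absolute ones.
    static Document parse() {
        return Jsoup.parse(HTML, "http://hackaday.com/");
    }

    // "parent > child" matches direct children only.
    static String absoluteLink() {
        return parse().select("h1.entry-title > a").first().attr("abs:href");
    }

    static String date() {
        return parse().select("span.entry-date > a").text();
    }

    // ":not(.read-more)" drops paragraphs carrying the (hypothetical) read-more class.
    static String bodyWithoutReadMore() {
        return parse().select("div.entry-content p:not(.read-more)").text();
    }

    public static void main(String[] args) {
        System.out.println(absoluteLink());
        System.out.println(date());
        System.out.println(bodyWithoutReadMore());
    }
}
```

When scraping with Jsoup.connect(...).get(), the document already knows its base URI, so `attr("abs:href")` should work there without passing one explicitly.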

