How do I use Jsoup to extract paragraph text from HTML?
Published: 2020-12-14 19:35:09  Category: Java  Source: compiled from the web
    import java.io.IOException;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JavaApplication14 {

        public static void main(String[] args) {
            try {
                Document doc = Jsoup.connect("tanmoy_mahathir.makes.org/thimble/146").get();
                String html = "<html><head></head>"
                        + "<body><p>Parsed HTML into a doc."
                        + "</p></body></html>";
                Elements paragraphs = doc.select("p");
                for (Element p : paragraphs)
                    System.out.println(p.text());
            } catch (IOException ex) {
                Logger.getLogger(JavaApplication14.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }

Can anyone help me with the Jsoup code needed to parse the part of the page that contains the paragraphs, so that it prints only:

    Hello,World!
    Nothing is impossible

Solution
For this small piece of HTML, all you need to do is:
    String html = "<html><head></head>"
            + "<body><p>Parsed HTML into a doc."
            + "</p></body></html>";
    Document doc = Jsoup.parse(html);        // parse the in-memory string
    Elements paragraphs = doc.select("p");   // every <p> element
    for (Element p : paragraphs)
        System.out.println(p.text());        // text content only, tags stripped

Since the page behind your link appears to contain almost the same HTML, you can also replace the definition of doc with:

    Document doc = Jsoup.connect("https://tanmoy_mahathir.makes.org/thimble/146").get();

UPDATE

Here is the complete code; it compiles and runs fine.

    import java.io.IOException;
    import java.util.logging.*;
    import org.jsoup.*;
    import org.jsoup.nodes.*;
    import org.jsoup.select.*;

    public class JavaApplication14 {

        public static void main(String[] args) {
            try {
                String url = "https://tanmoy_mahathir.makes.org/thimble/146";
                Document doc = Jsoup.connect(url).get();   // fetch and parse the page
                Elements paragraphs = doc.select("p");
                for (Element p : paragraphs)
                    System.out.println(p.text());
            } catch (IOException ex) {
                // Logger has no log(Level, Throwable) overload, so pass a null message
                Logger.getLogger(JavaApplication14.class.getName())
                        .log(Level.SEVERE, null, ex);
            }
        }
    }

(Editor: 李大同)
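One difference between the question's code and the answer's is easy to miss: the URL in the question has no scheme, while the answer's starts with https://. As far as I know, Jsoup.connect expects an absolute http or https URL and rejects a bare host name with an IllegalArgumentException, which the catch (IOException ex) block would not handle. The sketch below uses only the standard library to show one way to guard against that. The class UrlSchemeSketch and the method withScheme are hypothetical names made up for this illustration, not part of Jsoup, and defaulting to HTTPS is an assumption.

```java
import java.util.Locale;

// Hypothetical helper, not part of Jsoup: makes sure a URL string carries an
// http/https scheme before it is handed to Jsoup.connect(...).
final class UrlSchemeSketch {

    static String withScheme(String url) {
        String trimmed = url.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return trimmed; // already absolute, leave it untouched
        }
        // Assumption: the site is reachable over HTTPS, as in the answer's URL.
        return "https://" + trimmed;
    }

    public static void main(String[] args) {
        // The question's URL (no scheme) and the answer's URL (with scheme).
        System.out.println(withScheme("tanmoy_mahathir.makes.org/thimble/146"));
        System.out.println(withScheme("https://tanmoy_mahathir.makes.org/thimble/146"));
    }
}
```

Both calls in main yield the same absolute URL, which could then be passed to Jsoup.connect(...).get() as in the answer's code.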