ruby-on-rails – How do I extract sub-text with Nokogiri?

Published: 2020-12-17 03:21:13 | Category: Encyclopedia | Source: compiled from the web


I ran into this HTML:

<div class='featured'>
    <h1>
        How to extract this?
        <span>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,sunt in culpa qui officia deserunt mollit anim id est laborum.</span>
        <span class="moredetail ">
            <a href="/hello" title="hello">hello</a>
        </span>
        <div class="clear"></div>
    </h1>
</div>

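I want to extract the <h1> text "How to extract this?". How can I do that?

I tried the following code, but the result also includes the other elements. I'm not sure how to exclude them so that I get only the text of the <h1> itself.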

require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open(url))  # Kernel#open no longer opens URLs on Ruby 3.0+
records = doc.css(".featured h1")

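Solution

#css returns a collection; use #at_css to get the first matching node. Everything the node contains, text included, is one of its children, and in this case the text is its first child. You can also do something like children.reject(&:element?) if you want all of the children that are not elements.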

data = '
<div class="featured">
    <h1>
        How to extract this?
        <span>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,sunt in culpa qui officia deserunt mollit anim id est laborum.</span>
        <span class="moredetail ">
            <a href="/hello" title="hello">hello</a>
        </span>
        <div class="clear"></div>
    </h1>
</div>
'

require 'nokogiri'
text = Nokogiri::HTML(data).at_css('.featured h1').children.first.text
text # => "\n        How to extract this?\n        "

Alternatively, you can use XPath:

Nokogiri::HTML(data).at_xpath('//*[@class="featured"]/h1/text()').text

(Editor: 李大同)
