A PHP substr() function that lets you set start and stop points and keeps the HTML formatting intact?

Published: 2020-12-13 16:49:16 | Category: PHP Tutorials | Source: compiled from the web

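With PHP's ordinary substr() function you can decide where to "start" cutting the string as well as set a length. The length is probably what gets used most, but in this case I need to cut off roughly 120 characters from the beginning. The catch is that I need to keep the HTML in the string intact and cut only the actual text inside the tags.

I have found a few custom functions for this, but not a single one that lets you set the starting point, i.e. where in the string you want the cut to begin.

Here is one I found: Using PHP substr() and strip_tags() while retaining formatting and without breaking HTML

So I basically need a substr() function that works exactly like the original, except that it preserves the formatting.

Any suggestions?

Example content to modify: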

<p>Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going <a href="#">through the cites</a> of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus</p> <p>Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the <strong>Renaissance</strong>. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.</p>

After cutting off 5 characters from the start:

<p>ary to popular belief, [...] comes from a line in section 1.10.32.</p>

And with 5 characters cut off both the start and the end:

<p>ary to popular belief, [...] comes from a line in section 1.1</p>

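Yeah, you catch my drift?

If the cut would land in the middle of a word, I would prefer that it drop the whole word, but that is not very important.

**Edit:** Fixed the quotes.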

Solution

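What you are asking for involves a lot of complexity (essentially, generating a valid HTML subset from string offsets). It would really be better to restate the problem in terms of the number of characters of text you want to keep, rather than as cutting an arbitrary string that happens to contain HTML. Once you do that, the problem gets easier, because you can use a real HTML parser. You won't have to worry about: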

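- Accidentally cutting elements in half.
- Accidentally cutting entities in half.
- Not counting the text inside elements.
- Making sure character entities count as a single character.
- Making sure all elements are properly closed.
- Making sure you don't corrupt the string by using substr() on a UTF-8 string.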
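Several of the pitfalls listed above are easy to reproduce with plain substr(). The following is only a minimal sketch: the input strings are made up for illustration and are not taken from the question.

```php
<?php
// Hypothetical sample input, chosen only to illustrate the pitfalls.
$html = '<p>Caf&eacute; <strong>menu</strong></p>';

// 1. Byte offsets ignore markup: this cut lands inside the <strong> tag.
$cutTag = substr($html, 0, 20);      // '<p>Caf&eacute; <stro'

// 2. A cut can also split a character entity in half.
$cutEntity = substr($html, 0, 9);    // '<p>Caf&ea'

// 3. substr() counts bytes, so it can split a multibyte UTF-8 character.
$utf8     = "\xC3\xA9!";                       // "é!" where "é" is two bytes in UTF-8
$cutBytes = substr($utf8, 0, 1);               // a single byte, not valid UTF-8 on its own
$cutChars = mb_substr($utf8, 0, 1, 'UTF-8');   // the whole "é" character

var_dump($cutTag, $cutEntity);
var_dump(mb_check_encoding($cutBytes, 'UTF-8'), $cutChars === "\xC3\xA9");
```

None of the three cuts leaves the result in a state that is safe to print as HTML, which is why a parser-based approach is attractive here.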

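It is possible to do this with regular expressions (using the u flag), mb_substr() and a tag stack (I have done it before), but there are a lot of edge cases and you will generally have a hard time.

The DOM solution, however, is fairly simple: walk all the text nodes, count their string lengths, and remove or substring their text content as needed. The code below does this: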

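$html = <<<'EOT'
<p>Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going <a href="#">through the cites</a> of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus</p> <p>Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the <strong>Renaissance</strong>. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.</p>
EOT;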

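/**
 * substr() for HTML: $start and $length count characters of text only
 * (tags are not counted); the markup around the kept text is preserved.
 * If $removeemptyelements is true, elements left without children are dropped.
 */
function substr_html($html, $start, $length = null, $removeemptyelements = true) {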
    if (is_int($length)) {
        if ($length===0) return '';
        $end = $start + $length;
    } else {
        $end = null;
    }
    $d = new DOMDocument();
    $d->loadHTML('<html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><title></title></head><body>'.$html.'</body>');
    $body = $d->getElementsByTagName('body')->item(0);
    $dxp = new DOMXPath($d);
    $t_start = 0; // text node's start pos relative to all text
    $t_end   = null; // text node's end pos relative to all text

    // copy to an array because the nodes are modified or removed while iterating
    $textnodes = iterator_to_array($dxp->query('/descendant::*/text()', $body));

    // PHP 5.2 doesn't seem to implement Traversable on DOMNodeList,
    // so `iterator_to_array()` won't work. Use this instead:
    // $textnodelist = $dxp->query('/descendant::*/text()', $body);
    // $textnodes = array();
    // for ($i = 0; $i < $textnodelist->length; $i++) {
    //     $textnodes[] = $textnodelist->item($i);
    // }
    // unset($textnodelist);

    foreach($textnodes as $text) {
        $t_end = $t_start + $text->length;
        $parent = $text->parentNode;
        if ($start >= $t_end || ($end!==null && $end < $t_start)) {
            $parent->removeChild($text);
        } else {
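            $n_offset = max($start - $t_start, 0);
            // substringData() takes (offset, count), so the count is measured from $n_offset;
            // otherwise a range that starts and ends inside one text node would be too long
            $n_length = ($end === null) ? $text->length : $end - $t_start - $n_offset;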
            if (!($n_offset===0 && $n_length >= $text->length)) {
                $substr = $text->substringData($n_offset,$n_length);
                if (strlen($substr)) {
                    $text->deleteData(0,$text->length);
                    $text->appendData($substr);
                } else {
                    $parent->removeChild($text);
                }
            }
        }

        // if removing this text emptied the parent of nodes,remove the node!
        if ($removeemptyelements && !$parent->hasChildNodes()) {
            $parent->parentNode->removeChild($parent);
        }

        $t_start = $t_end;
    }
    unset($textnodes);
    $newstr = $d->saveHTML($body);

    // mb_substr() is to remove <body></body> tags
    return mb_substr($newstr,6,-7,'utf-8');
}


echo substr_html($html,480,30);

This will output:

<p> of "de Finibus</p> <p>Bonorum et Mal</p>

Note that it is not confused by the fact that your "substring" spans multiple p elements.
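To see why element boundaries do not matter, it can help to list the text nodes together with their running offsets, which is the bookkeeping substr_html() does with $t_start and $t_end. The helper below is a hypothetical illustration (text_node_offsets() is not part of the answer's code), and the input is a made-up fragment:

```php
<?php
// Hypothetical helper: list every text node under <body> together with its
// [start, end) offsets in the concatenated text of the fragment.
function text_node_offsets($html) {
    $d = new DOMDocument();
    // Same wrapper idea as substr_html(): the meta tag declares the input as UTF-8.
    $d->loadHTML('<html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><title></title></head><body>' . $html . '</body></html>');
    $dxp = new DOMXPath($d);
    $offsets = array();
    $pos = 0;
    foreach ($dxp->query('//body//text()') as $text) {
        // $text->length is the node's length in characters, not bytes
        $offsets[] = array($pos, $pos + $text->length, $text->data);
        $pos += $text->length;
    }
    return $offsets;
}

foreach (text_node_offsets('<p>ab<a href="#">cd</a>ef</p><p>gh</p>') as $row) {
    list($start, $end, $data) = $row;
    printf("[%d, %d) \"%s\"\n", $start, $end, $data);
}
// [0, 2) "ab"
// [2, 4) "cd"
// [4, 6) "ef"
// [6, 8) "gh"
```

The offsets simply keep running across the a and p boundaries, so a range such as 3 to 7 touches four nodes in two paragraphs without any special handling. In the answer's own example output, the 30 characters appear to include the single space between the two paragraphs (15 + 1 + 14), which suggests whitespace between elements is counted as text too.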

(Editor: 李大同)