PHP substr() function that lets you set start and stop points while keeping HTML formatting?
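With the normal substr() function in PHP, you decide where to start cutting the string and, optionally, set a length. The length is probably the more commonly used argument, but in this case I need to cut about 120 characters off the beginning. The problem is that I need to keep the HTML in the string intact and cut only the actual text inside the tags.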
I have found a few custom functions for this, but not a single one that lets you set the starting point, i.e. where you want to start cutting the string.

Here is one I found: Using PHP substr() and strip_tags() while retaining formatting and without breaking HTML

So, I basically need a substr() function that works exactly like the original, except that it keeps the formatting.

Any suggestions?

Example content to modify:

    <p>Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going <a href="#">through the cites</a> of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus</p>
    <p>Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the <strong>Renaissance</strong>. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.</p>

After cutting 5 characters off the start:

    <p>ary to popular belief, ... comes from a line in section 1.10.32.</p>

And with 5 cut off both the start and the end:

    <p>ary to popular belief, ... comes from a line in section 1.1</p>

Yeah, you catch my drift?

If the cut lands in the middle of a word, I would prefer that it remove the whole word, but that is not very important.

**Edit:** Fixed the quotes.

Solution
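What you're asking for involves a lot of complexity (essentially, generating a valid HTML subset given string offsets). It would really be better to reformulate the problem in terms of the number of text characters you want to keep, rather than as cutting up an arbitrary string that happens to contain HTML. If you do that, the problem becomes easier, because you can use a real HTML parser. You don't have to worry about: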
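- Accidentally cutting elements in half.

It is possible to do this with regular expressions (with the u flag), mb_substr(), and a tag stack (I've done it before), but there are a lot of edge cases and you will generally have a hard time.

The DOM solution, however, is rather straightforward: walk all the text nodes, count their string lengths, and remove or substring their text content as needed. The code below does this:

    // Sample content from the question.
    $html = <<<'EOT'
    <p>Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going <a href="#">through the cites</a> of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus</p>
    <p>Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the <strong>Renaissance</strong>. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.</p>
    EOT;

    function substr_html($html, $start, $length = null, $removeemptyelements = true)
    {
        if (is_int($length)) {
            if ($length === 0) return '';
            $end = $start + $length;
        } else {
            $end = null;
        }

        $d = new DOMDocument();
        $d->loadHTML('<html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><title></title></head><body>'.$html.'</body>');
        $body = $d->getElementsByTagName('body')->item(0);
        $dxp = new DOMXPath($d);

        $t_start = 0;   // text node's start pos relative to all text
        $t_end = null;  // text node's end pos relative to all text

        // copy because we may modify result of $textnodes
        $textnodes = iterator_to_array($dxp->query('/descendant::*/text()', $body));

        // PHP 5.2 doesn't seem to implement Traversable on DOMNodeList,
        // so `iterator_to_array()` won't work. Use this instead:
        // $textnodelist = $dxp->query('/descendant::*/text()', $body);
        // $textnodes = array();
        // for ($i = 0; $i < $textnodelist->length; $i++) {
        //     $textnodes[] = $textnodelist->item($i);
        // }
        // unset($textnodelist);

        foreach ($textnodes as $text) {
            $t_end = $t_start + $text->length;
            $parent = $text->parentNode;
            if ($start >= $t_end || ($end !== null && $end < $t_start)) {
                // node lies entirely outside the requested range
                $parent->removeChild($text);
            } else {
                $n_offset = max($start - $t_start, 0);
                // the count is measured from $n_offset, so subtract the offset
                // from the node-relative end position
                $n_length = ($end === null)
                    ? $text->length - $n_offset
                    : $end - $t_start - $n_offset;
                if (!($n_offset === 0 && $n_length >= $text->length)) {
                    $substr = $text->substringData($n_offset, $n_length);
                    if (strlen($substr)) {
                        $text->deleteData(0, $text->length);
                        $text->appendData($substr);
                    } else {
                        $parent->removeChild($text);
                    }
                }
            }

            // if removing this text emptied the parent of nodes, remove the node!
            if ($removeemptyelements && !$parent->hasChildNodes()) {
                $parent->parentNode->removeChild($parent);
            }

            $t_start = $t_end;
        }
        unset($textnodes);

        $newstr = $d->saveHTML($body);

        // mb_substr() is to remove <body></body> tags
        return mb_substr($newstr, 6, -7, 'utf-8');
    }

    echo substr_html($html, 480, 30);

This will output:

    <p> of "de Finibus</p>
    <p>Bonorum et Mal</p>

Notice that it is not confused by the fact that your "substring" spans multiple p elements.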
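The question asks to trim a fixed number of characters from both ends, whereas substr_html() takes a start offset and a length. One way to bridge the two, sketched here under assumptions, is to measure the total text length first. html_text_length() is a hypothetical helper, not part of the original answer; it assumes the same wrapper markup and XPath query as substr_html() so that both count characters the same way.

```php
<?php
// Hypothetical helper (an illustration, not from the original answer):
// returns the number of characters held in text nodes of an HTML fragment.
function html_text_length($html)
{
    $d = new DOMDocument();
    // Same wrapper document as substr_html(); @ silences libxml warnings
    // that sloppy markup may trigger.
    @$d->loadHTML('<html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><title></title></head><body>' . $html . '</body></html>');
    $dxp = new DOMXPath($d);

    $total = 0;
    // Same query as substr_html(): every text node that has an element parent.
    foreach ($dxp->query('/descendant::*/text()') as $text) {
        $total += $text->length;
    }
    return $total;
}

// Trim 5 characters from each end: start at 5, keep everything but 10.
$sample = '<p>Hello <strong>big</strong> world</p>';
$total  = html_text_length($sample);   // "Hello " + "big" + " world" = 15
$start  = 5;
$length = max($total - 10, 0);         // 5
```

With substr_html() in scope, these values could then be passed as substr_html($sample, $start, $length). Whitespace between block elements counts toward the total, just as it does inside substr_html().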