PHP string truncation: an example for mixed Chinese and English strings encoded in UTF-8 or GBK

Published: 2020-12-12 20:06:17. Category: PHP Tutorials. Source: collected from the web.


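Weibo posts have a length limit, and characters are counted as follows: a Chinese character counts as 2, an English letter as 1, a full-width character as 2, and a half-width character as 1.
PHP's built-in strlen returns the number of bytes, so a Chinese character encoded in UTF-8 is reported as 3, which does not meet this requirement.
mb_strlen computes the length according to the character set, so a Chinese character in UTF-8 counts as 1. That does not match Weibo's rule either, because a Chinese character must count as 2.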
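The difference between the two built-in functions can be seen in a minimal sketch. It assumes that the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is available.
$text = "你好"; // two Chinese characters

$bytes = strlen($text);             // byte count: 3 bytes per character in UTF-8
$chars = mb_strlen($text, 'UTF-8'); // character count: 1 per character

echo $bytes, "\n"; // 6
echo $chars, "\n"; // 2
```

Neither value is the 4 that the Weibo rule expects for this string, which is why a custom counting function is needed.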
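After some searching on Google, I found a class in Discuz that truncates strings in various encodings, adapted it, and tested it. The $charset parameter supports only gbk and utf-8; any value other than utf-8 is handled by the GBK branch.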

The code is as follows:

$a = "s＠@你好";
var_dump(strlen_weibo($a,'utf-8'));

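The output is 8: the letter s counts as 1, the full-width ＠ as 2, the half-width @ as 1, and the two Chinese characters as 4. The source code is as follows: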

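function strlen_weibo($string, $charset = 'utf-8')
{
    $n = $count = 0;
    $length = strlen($string); // length in bytes
    if (strtolower($charset) == 'utf-8')
    {
        // Walk the string by lead byte: ASCII counts 1, any multi-byte character counts 2.
        while ($n < $length)
        {
            $currentByte = ord($string[$n]);
            if ($currentByte == 9 ||
                $currentByte == 10 ||
                (32 <= $currentByte && $currentByte <= 126))
            {
                // Tab, line feed, or printable ASCII
                $n++;
                $count++;
            } elseif (194 <= $currentByte && $currentByte <= 223)
            {
                // Lead byte of a 2-byte sequence
                $n += 2;
                $count += 2;
            } elseif (224 <= $currentByte && $currentByte <= 239)
            {
                // Lead byte of a 3-byte sequence (most Chinese characters)
                $n += 3;
                $count += 2;
            } elseif (240 <= $currentByte && $currentByte <= 247)
            {
                // Lead byte of a 4-byte sequence
                $n += 4;
                $count += 2;
            } elseif (248 <= $currentByte && $currentByte <= 251)
            {
                // 5-byte lead byte (legacy form, not valid UTF-8 under RFC 3629)
                $n += 5;
                $count += 2;
            } elseif ($currentByte == 252 || $currentByte == 253)
            {
                // 6-byte lead byte (legacy form, not valid UTF-8 under RFC 3629)
                $n += 6;
                $count += 2;
            } else
            {
                // Other control bytes and stray bytes count 1
                $n++;
                $count++;
            }
        }
        return $count;
    } else
    {
        // GBK: a byte above 127 starts a 2-byte character, which counts 2.
        for ($i = 0; $i < $length; $i++)
        {
            if (ord($string[$i]) > 127)
            {
                $i++; // skip the trail byte
                $count++;
            }
            $count++;
        }
        return $count;
    }
}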
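Since the title is about truncation while the function above only counts, the following is a minimal sketch of how the same counting rule could drive a truncation helper for UTF-8 input. The names weibo_chars_utf8, weibo_width_utf8 and weibo_truncate_utf8 are hypothetical and are not part of the Discuz-derived code. The sketch assumes valid UTF-8 input and splits it into characters with preg_split and the /u modifier instead of inspecting lead bytes.

```php
<?php
// Hypothetical helpers for illustration; not part of the original article's code.
// Assumption: $s is valid UTF-8 (preg_split() with /u returns false otherwise).

function weibo_chars_utf8(string $s): array
{
    // Split into individual characters; an empty string yields an empty array.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $chars;
}

function weibo_width_utf8(string $s): int
{
    $width = 0;
    foreach (weibo_chars_utf8($s) as $ch) {
        // Single-byte (ASCII) characters count 1, multi-byte characters count 2.
        $width += (strlen($ch) === 1) ? 1 : 2;
    }
    return $width;
}

function weibo_truncate_utf8(string $s, int $maxWidth): string
{
    $out = '';
    $width = 0;
    foreach (weibo_chars_utf8($s) as $ch) {
        $w = (strlen($ch) === 1) ? 1 : 2;
        if ($width + $w > $maxWidth) {
            break; // stop before a character that would exceed the limit
        }
        $out .= $ch;
        $width += $w;
    }
    return $out;
}

$a = "s＠@你好";
echo weibo_width_utf8($a), "\n";       // 8
echo weibo_truncate_utf8($a, 5), "\n"; // s＠@
```

For valid UTF-8 input, weibo_width_utf8 should agree with strlen_weibo($string, 'utf-8'). Truncating to a width of 5 stops after the half-width @, because adding the next Chinese character would bring the total to 6.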

(Editor: 李大同)
