加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 站长学院 > PHP教程 > 正文

php – 如何反转Unicode字符串

发布时间:2020-12-13 16:30:39 所属栏目:PHP教程 来源:网络整理
导读:在 comment to an answer to this question中暗示了PHP不能反转Unicode字符串. As for Unicode,it works in PHP because most apps process it as binary. Yes,PHP is 8-bit clean. Try the equivalent of this in PHP: perl -Mutf8 -e ‘print scalar rever
在 comment to an answer to this question中暗示了PHP不能反转Unicode字符串.

As for Unicode,it works in PHP
because most apps process it as
binary. Yes,PHP is 8-bit clean. Try
the equivalent of this in PHP: perl
-Mutf8 -e ‘print scalar reverse(“ほげほげ”)’ You will get garbage,
not “げほげほ”. – jrockway

不幸的是,PHPs unicode支持atm是最好的“缺乏”是正确的.这将是hopefully change drastically with PHP6.

PHP MultiByte functions确实提供了处理unicode所需的基本功能,但它不一致,缺少很多功能.其中之一是反转字符串的功能.

我当然想把这个文本翻译成没有其他原因,然后弄清楚是否有可能.我做了一个功能来完成这个巨大的复杂的任务来扭转这个Unicode文本,所以你可以放松一下,直到PHP6.

测试代码:

$enc = 'UTF-8';
$text = "ほげほげ";
$defaultEnc = mb_internal_encoding();

echo "Showing results with encoding $defaultEnc.nn";

$revNormal = strrev($text);
$revInt = mb_strrev($text);
$revEnc = mb_strrev($text,$enc);

echo "Original text is: $text .n";
echo "Normal strrev output: " . $revNormal . ".n";
echo "mb_strrev without encoding output: $revInt.n";
echo "mb_strrev with encoding $enc output: $revEnc.n";

if (mb_internal_encoding($enc)) {
    echo "nSetting internal encoding to $enc from $defaultEnc.nn";

    $revNormal = strrev($text);
    $revInt = mb_strrev($text);
    $revEnc = mb_strrev($text,$enc);

    echo "Original text is: $text .n";
    echo "Normal strrev output: " . $revNormal . ".n";
    echo "mb_strrev without encoding output: $revInt.n";
    echo "mb_strrev with encoding $enc output: $revEnc.n";

} else {
    echo "nCould not set internal encoding to $enc!n";
}
Grapheme功能处理UTF-8字符串比mbstring和PCRE功能更正确/ Mbstring和PCRE可能会中断字符.您可以通过执行以下代码来看到它们之间的差异.
function str_to_array($string)
{
    $length = grapheme_strlen($string);
    $ret = [];

    for ($i = 0; $i < $length; $i += 1) {

        $ret[] = grapheme_substr($string,$i,1);
    }

    return $ret;
}

function str_to_array2($string)
{
    $length = mb_strlen($string,"UTF-8");
    $ret = [];

    for ($i = 0; $i < $length; $i += 1) {

    $ret[] = mb_substr($string,1,"UTF-8");
}

    return $ret;
}

function str_to_array3($string)
{
    return preg_split('//u',$string,-1,PREG_SPLIT_NO_EMPTY);
}

function utf8_strrev($string)
{
    return implode(array_reverse(str_to_array($string)));
}

function utf8_strrev2($string)
{
    return implode(array_reverse(str_to_array2($string)));
}

function utf8_strrev3($string)
{
    return implode(array_reverse(str_to_array3($string)));
}

// http://www.php.net/manual/en/function.grapheme-strlen.php
$string = "axCCx8A"  // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5)
         ."oxCCx88"; // 'LATIN SMALL LETTER O WITH DIAERESIS'  (U+00F6)

var_dump(array_map(function($elem) { return strtoupper(bin2hex($elem)); },[
  'should be' => "oxCCx88"."axCCx8A",'grapheme' => utf8_strrev($string),'mbstring' => utf8_strrev2($string),'pcre' => utf8_strrev3($string)
]));

结果就在这里.

array(4) {
  ["should be"]=>
  string(12) "6FCC8861CC8A"
  ["grapheme"]=>
  string(12) "6FCC8861CC8A"
  ["mbstring"]=>
  string(12) "CC886FCC8A61"
  ["pcre"]=>
  string(12) "CC886FCC8A61"
}

IntlBreakIterator可以使用PHP 5.5(intl 3.0);

function utf8_strrev($str)
{
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($str);

    $ret = '';
    $pos = 0;
    $prev = 0;

    foreach ($it as $pos) {
        $ret = substr($str,$prev,$pos - $prev) . $ret;
        $prev = $pos;
    }

    return $ret;  
}

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读