php – How to reverse a Unicode string
A comment on an answer to this question implies that PHP cannot reverse Unicode strings.
Unfortunately, the remark that PHP's Unicode support is currently "lacking" at best is correct. This will hopefully change drastically with PHP6. The PHP MultiByte functions do provide the basic functionality needed to handle Unicode, but they are inconsistent and many functions are missing. One of the missing ones is a function to reverse a string.

Of course I wanted to reverse this text for no other reason than to find out whether it was possible. I wrote a function to accomplish this enormously complex task of reversing Unicode text, so you can relax a little longer until PHP6.

Test code:

    $enc = 'UTF-8';
    $text = "ほげほげ";
    $defaultEnc = mb_internal_encoding();

    echo "Showing results with encoding $defaultEnc.\n\n";

    $revNormal = strrev($text);
    $revInt = mb_strrev($text);
    $revEnc = mb_strrev($text, $enc);

    echo "Original text is: $text.\n";
    echo "Normal strrev output: " . $revNormal . ".\n";
    echo "mb_strrev without encoding output: $revInt.\n";
    echo "mb_strrev with encoding $enc output: $revEnc.\n";

    if (mb_internal_encoding($enc)) {
        echo "\nSetting internal encoding to $enc from $defaultEnc.\n\n";

        $revNormal = strrev($text);
        $revInt = mb_strrev($text);
        $revEnc = mb_strrev($text, $enc);

        echo "Original text is: $text.\n";
        echo "Normal strrev output: " . $revNormal . ".\n";
        echo "mb_strrev without encoding output: $revInt.\n";
        echo "mb_strrev with encoding $enc output: $revEnc.\n";
    } else {
        echo "\nCould not set internal encoding to $enc!\n";
    }
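The test code calls mb_strrev(), but the function body itself does not appear in this text. The following is a minimal sketch of what such a function could look like, not the author's original. It assumes the mbstring extension and PHP 7.1 or later, and that a missing encoding argument should fall back to mb_internal_encoding().

```php
<?php
// Hypothetical reconstruction: the author's own mb_strrev() is not shown here.
// Assumes the mbstring extension is loaded.
function mb_strrev(string $text, ?string $encoding = null): string
{
    // Fall back to the current internal encoding when none is given.
    $encoding = $encoding ?? mb_internal_encoding();

    $length = mb_strlen($text, $encoding);
    $output = '';

    // Walk the string from the last code point to the first.
    for ($i = $length - 1; $i >= 0; $i--) {
        $output .= mb_substr($text, $i, 1, $encoding);
    }

    return $output;
}

echo mb_strrev("ほげほげ", 'UTF-8'), "\n"; // prints げほげほ
```

Note that this reverses code points, not user-perceived characters, so a base letter followed by a combining mark would come out with the mark in front of the letter.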
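The grapheme functions handle UTF-8 strings more correctly than the mbstring and PCRE functions. mbstring and PCRE may break a character apart when it consists of a base letter followed by combining marks. You can see the difference between them by running the following code.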
    // Split into grapheme clusters (intl extension).
    function str_to_array($string)
    {
        $length = grapheme_strlen($string);
        $ret = [];

        for ($i = 0; $i < $length; $i += 1) {
            $ret[] = grapheme_substr($string, $i, 1);
        }

        return $ret;
    }

    // Split into code points with mbstring.
    function str_to_array2($string)
    {
        $length = mb_strlen($string, "UTF-8");
        $ret = [];

        for ($i = 0; $i < $length; $i += 1) {
            $ret[] = mb_substr($string, $i, 1, "UTF-8");
        }

        return $ret;
    }

    // Split into code points with PCRE.
    function str_to_array3($string)
    {
        return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    }

    function utf8_strrev($string)
    {
        return implode(array_reverse(str_to_array($string)));
    }

    function utf8_strrev2($string)
    {
        return implode(array_reverse(str_to_array2($string)));
    }

    function utf8_strrev3($string)
    {
        return implode(array_reverse(str_to_array3($string)));
    }

    // http://www.php.net/manual/en/function.grapheme-strlen.php
    $string = "a\xCC\x8A"  // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5), decomposed as a + U+030A
             ."o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS' (U+00F6), decomposed as o + U+0308

    var_dump(array_map(function($elem) { return strtoupper(bin2hex($elem)); }, [
        'should be' => "o\xCC\x88"."a\xCC\x8A",
        'grapheme'  => utf8_strrev($string),
        'mbstring'  => utf8_strrev2($string),
        'pcre'      => utf8_strrev3($string)
    ]));

The result is:

    array(4) {
      ["should be"]=>
      string(12) "6FCC8861CC8A"
      ["grapheme"]=>
      string(12) "6FCC8861CC8A"
      ["mbstring"]=>
      string(12) "CC886FCC8A61"
      ["pcre"]=>
      string(12) "CC886FCC8A61"
    }

IntlBreakIterator can be used as of PHP 5.5 (intl 3.0):

    function utf8_strrev($str)
    {
        // Boundaries fall between code points, so a combining mark
        // is still separated from its base letter.
        $it = IntlBreakIterator::createCodePointInstance();
        $it->setText($str);

        $ret = '';
        $pos = 0;
        $prev = 0;

        // Each iteration yields the byte offset of the next boundary.
        foreach ($it as $pos) {
            $ret = substr($str, $prev, $pos - $prev) . $ret;
            $prev = $pos;
        }

        return $ret;
    }
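The IntlBreakIterator version walks code point boundaries, so it would split combining sequences the same way the mbstring and PCRE versions do. As a sketch only, the same loop can be pointed at grapheme cluster boundaries by using IntlBreakIterator::createCharacterInstance() instead. The function name utf8_strrev_graphemes is hypothetical, and the code assumes the intl extension is loaded.

```php
<?php
// Hypothetical variant; requires the intl extension (PHP 5.5 or later).
function utf8_strrev_graphemes($str)
{
    // Character instance: boundaries fall between grapheme clusters,
    // so a base letter stays together with its combining marks.
    $it = IntlBreakIterator::createCharacterInstance();
    $it->setText($str);

    $ret = '';
    $prev = 0;

    // Each iteration yields the byte offset of the next boundary.
    foreach ($it as $pos) {
        // Prepend each cluster so the result ends up in reverse order.
        $ret = substr($str, $prev, $pos - $prev) . $ret;
        $prev = $pos;
    }

    return $ret;
}

$string = "a\xCC\x8A" . "o\xCC\x88"; // a + U+030A, o + U+0308
echo strtoupper(bin2hex(utf8_strrev_graphemes($string))), "\n";
```

With the sample string from the comparison, this should produce the same bytes as the "should be" and "grapheme" rows, because each letter stays attached to its combining mark.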