¼ÓÈëÊÕ²Ø | ÉèΪÊ×Ò³ | »áÔ±ÖÐÐÄ | ÎÒҪͶ¸å Àî´óͬ £¨https://www.lidatong.com.cn/£©- ¿Æ¼¼¡¢½¨Õ¾¡¢¾­Ñé¡¢ÔÆ¼ÆËã¡¢5G¡¢´óÊý¾Ý,Õ¾³¤Íø!
µ±Ç°Î»Ö㺠Ê×Ò³ > Õ¾³¤Ñ§Ôº > PHP½Ì³Ì > ÕýÎÄ

PHPÖÐmb_detect_order()µÄÆæ¹ÖÐÐΪ

·¢²¼Ê±¼ä£º2020-12-13 18:15:19 ËùÊôÀ¸Ä¿£ºPHP½Ì³Ì À´Ô´£ºÍøÂçÕûÀí
µ¼¶Á£ºÎÒÏë¼ì²âһЩÎı¾µÄ±àÂë(ʹÓà PHP). Ϊ´Ë,ÎÒʹÓÃmb_detect_encoding()º¯Êý. ÎÊÌâÊÇÈç¹ûÎÒÓÃmb_detect_order()º¯Êý¸Ä±ä¿ÉÄܵıàÂë˳Ðò,º¯Êý»á·µ»Ø²»Í¬µÄ½á¹û. Ç뿼ÂÇÒÔÏÂʾÀý $html = STR¤Á¤ç¤Ã¤È¤Î¥¢¥¯¥»¥¹¤ÇÂä¤Á¤Æ¤·¤Þ¤Ã¤¿¤ê¡¢¥µ©`¥Ð©`ÕϺ¦¤¬¶à¤¤¥ì¥ó¥¿¥ë¥µ
ÎÒÏë¼ì²âһЩÎı¾µÄ±àÂë(ʹÓà PHP).
Ϊ´Ë,ÎÒʹÓÃmb_detect_encoding()º¯Êý.

ÎÊÌâÊÇÈç¹ûÎÒÓÃmb_detect_order()º¯Êý¸Ä±ä¿ÉÄܵıàÂë˳Ðò,º¯Êý»á·µ»Ø²»Í¬µÄ½á¹û.

Ç뿼ÂÇÒÔÏÂʾÀý

$html = <<< STR
¤Á¤ç¤Ã¤È¤Î¥¢¥¯¥»¥¹¤ÇÂä¤Á¤Æ¤·¤Þ¤Ã¤¿¤ê¡¢¥µ©`¥Ð©`ÕϺ¦¤¬¶à¤¤¥ì¥ó¥¿¥ë¥µ©`¥Ð©`¤òßx¤Ö¤È¤¢¤Ê¤¿¤Î¥Ó¥¸¥Í¥¹µÈ¤Ë¤«¤Ê¤ê¤ÎӰ푤¬¤Ç¤Æ¤·¤Þ¤¦¿ÉÄÜÐÔ¤¬¤¢¤ê¤Þ¤¹.ÌØ¤ËÉ̉Ӥò¤µ¤ì¤Æ¤¤¤ë‚€Èˤη½¡¢·¨Èˤη½¤ÏšÝ¤ò¤Ä¤±¤ë¤è¤¦¤Ë¤·¤Æ¤¯¤À¤µ¤¤
STR;
mb_detect_order(array('UTF-8','EUC-JP','SJIS','eucJP-win','SJIS-win','JIS','ISO-2022-JP','ISO-8859-1','ISO-8859-2'));
$originalEncoding = mb_detect_encoding($str);
die($originalEncoding); // $originalEncoding = 'UTF-8'

µ«ÊÇ,Èç¹ûÄú¸ü¸Ämb_detect_order()ÖеıàÂë˳Ðò,½á¹û½«»áÓÐËù²»Í¬£º

mb_detect_order(array('EUC-JP','UTF-8','ISO-8859-2'));        
die($originalEncoding); // $originalEncoding = 'EUC-JP'

ËùÒÔÎÒµÄÎÊÌâÊÇ£º
Ϊʲô»áÕâÑù£¿
PHPÖÐÓÐûÓÐÒ»ÖÖ·½·¨¿ÉÒÔÕýÈ·ÎÞÎ󵨼ì²âÎı¾µÄ±àÂ룿

Õâ¾ÍÊÇÎÒÆÚÍû·¢ÉúµÄÊÂÇé.

¼ì²âËã·¨¿ÉÄÜÖ»Êǰ´Ë³Ðò¼ÌÐø³¢ÊÔÔÚmb_detect_orderÖÐÖ¸¶¨µÄ±àÂë,È»ºó·µ»Ø×Ö½ÚÁ÷ÓÐЧµÄµÚÒ»¸ö±àÂë.

¸üÖÇÄܵĶ«Î÷ÐèҪͳ¼Æ·½·¨(ÎÒÈÏΪͨ³£Ê¹ÓûúÆ÷ѧϰ).

±à¼­£º²Î¼ûÀýÈçthis article¸üÖÇÄܵķ½·¨.

Due to its importance,automatic charset detection is already implemented in major Internet applications such as Mozilla or Internet Explorer. They are very accurate and fast,but the implementation applies many domain specific knowledges in case-by-case basis. As opposed to their methods,we aimed at a simple algorithm which can be uniformly applied to every charset,and the algorithm is based on well-established,standard machine learning techniques. We also studied the relationship between language and charset detection,and compared byte-based algorithms and character-based algorithms. We used Naive Bayes (NB) and Support Vector Machine (SVM).

£¨±à¼­£ºÀî´óͬ£©

¡¾ÉùÃ÷¡¿±¾Õ¾ÄÚÈݾùÀ´×ÔÍøÂ磬ÆäÏà¹ØÑÔÂÛ½ö´ú±í×÷Õ߸öÈ˹۵㣬²»´ú±í±¾Õ¾Á¢³¡¡£ÈôÎÞÒâÇÖ·¸µ½ÄúµÄȨÀû£¬Ç뼰ʱÓëÁªÏµÕ¾³¤É¾³ýÏà¹ØÄÚÈÝ!

    ÍÆ¼öÎÄÕÂ
      ÈȵãÔĶÁ