在perl中解码UTF-8 JSON的问题
使用
JSON库处理时会破坏UTF-8字符(可能这类似于
Problem with decoding unicode JSON in perl,但设置binmode只会产生另一个问题).
我已将问题减少到以下示例: (hlovdal) localhost:/tmp/my_test>cat my_test.pl #!/usr/bin/perl -w use strict; use warnings; use JSON; use File::Slurp; use Getopt::Long; use Encode; my $set_binmode = 0; GetOptions("set-binmode" => $set_binmode); if ($set_binmode) { binmode(STDIN,":encoding(UTF-8)"); binmode(STDOUT,":encoding(UTF-8)"); binmode(STDERR,":encoding(UTF-8)"); } sub check { my $text = shift; return "is_utf8(): " . (Encode::is_utf8($text) ? "1" : "0") . ",is_utf8(1): " . (Encode::is_utf8($text,1) ? "1" : "0"). ". "; } my $my_test = "hei p? deg"; my $json_text = read_file('my_test.json'); my $hash_ref = JSON->new->utf8->decode($json_text); print check($my_test),"$my_test = $my_testn"; print check($json_text),"$json_text = $json_text"; print check($$hash_ref{'my_test'}),"$$hash_ref{'my_test'} = " . $$hash_ref{'my_test'} . "n"; (hlovdal) localhost:/tmp/my_test> 在运行测试时,文本由于某种原因被压缩到iso-8859-1中.设置binmode排序可以解决它,但随后会导致其他字符串的双重编码. (hlovdal) localhost:/tmp/my_test>cat my_test.json { "my_test" : "hei p? deg" } (hlovdal) localhost:/tmp/my_test>file my_test.json my_test.json: UTF-8 Unicode text (hlovdal) localhost:/tmp/my_test>hexdump -c my_test.json 0000000 { " m y _ t e s t " : " h 0000010 e i p 303 245 d e g " } n 000001e (hlovdal) localhost:/tmp/my_test> (hlovdal) localhost:/tmp/my_test>perl my_test.pl is_utf8(): 0,is_utf8(1): 0. $my_test = hei p? deg is_utf8(): 0,is_utf8(1): 0. $json_text = { "my_test" : "hei p? deg" } is_utf8(): 1,is_utf8(1): 1. $$hash_ref{'my_test'} = hei p? deg (hlovdal) localhost:/tmp/my_test>perl my_test.pl --set-binmode is_utf8(): 0,is_utf8(1): 0. $my_test = hei p?¥ deg is_utf8(): 0,is_utf8(1): 0. $json_text = { "my_test" : "hei p?¥ deg" } is_utf8(): 1,is_utf8(1): 1. $$hash_ref{'my_test'} = hei p? deg (hlovdal) localhost:/tmp/my_test> 是什么导致了这个以及如何解决? 这是一个新安装的和最新的Fedora 15系统. (hlovdal) localhost:/tmp/my_test>perl --version | grep version This is perl 5,version 12,subversion 4 (v5.12.4) built for x86_64-linux-thread-multi (hlovdal) localhost:/tmp/my_test>rpm -q perl-JSON perl-JSON-2.51-1.fc15.noarch (hlovdal) localhost:/tmp/my_test>locale LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= (hlovdal) localhost:/tmp/my_test> 更新:添加使用utf8无法解决它,字符仍然没有正确处理(虽然与以前略有不同): (hlovdal) localhost:/tmp/my_test>perl my_test.pl is_utf8(): 1,is_utf8(1): 1. $my_test = hei p? deg is_utf8(): 0,is_utf8(1): 1. $$hash_ref{'my_test'} = hei p? deg (hlovdal) localhost:/tmp/my_test>perl my_test.pl --set-binmode is_utf8(): 1,is_utf8(1): 1. $my_test = hei p? deg is_utf8(): 0,is_utf8(1): 1. $$hash_ref{'my_test'} = hei p? deg (hlovdal) localhost:/tmp/my_test> 如perlunifaq所述
解决方法
你用UTF-8保存了程序,但忘了告诉Perl.添加使用utf8;.
而且,你编程太复杂了. JSON函数DWYM.要检查内容,请使用Devel :: Peek. use utf8; # for the following line my $my_test = 'hei p? deg'; use Devel::Peek qw(Dump); use File::Slurp (read_file); use JSON qw(decode_json); my $hash_ref = decode_json(read_file('my_test.json')); Dump $hash_ref; # Perl character strings Dump $my_test; # Perl character string (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |