加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 大数据 > 正文

在perl中解码UTF-8 JSON的问题

发布时间:2020-12-16 06:18:19 所属栏目:大数据 来源:网络整理
导读:使用 JSON库处理时会破坏UTF-8字符(可能这类似于 Problem with decoding unicode JSON in perl,但设置binmode只会产生另一个问题). 我已将问题减少到以下示例: (hlovdal) localhost:/tmp/my_testcat my_test.pl#!/usr/bin/perl -wuse strict;use warnings;u
使用 JSON库处理时会破坏UTF-8字符(可能这类似于 Problem with decoding unicode JSON in perl,但设置binmode只会产生另一个问题).

我已将问题减少到以下示例:

(hlovdal) localhost:/tmp/my_test>cat my_test.pl
#!/usr/bin/perl -w
use strict;
use warnings;
use JSON;
use File::Slurp;
use Getopt::Long;
use Encode;

my $set_binmode = 0;
GetOptions("set-binmode" => $set_binmode);

if ($set_binmode) {
        binmode(STDIN,":encoding(UTF-8)");
        binmode(STDOUT,":encoding(UTF-8)");
        binmode(STDERR,":encoding(UTF-8)");
}

sub check {
        my $text = shift;
        return "is_utf8(): " . (Encode::is_utf8($text) ? "1" : "0") . ",is_utf8(1): " . (Encode::is_utf8($text,1) ? "1" : "0"). ". ";
}

my $my_test = "hei p? deg";
my $json_text = read_file('my_test.json');
my $hash_ref = JSON->new->utf8->decode($json_text);

print check($my_test),"$my_test = $my_testn";
print check($json_text),"$json_text = $json_text";
print check($$hash_ref{'my_test'}),"$$hash_ref{'my_test'} = " . $$hash_ref{'my_test'} .  "n";

(hlovdal) localhost:/tmp/my_test>

在运行测试时,文本由于某种原因被压缩到iso-8859-1中.设置binmode排序可以解决它,但随后会导致其他字符串的双重编码.

(hlovdal) localhost:/tmp/my_test>cat my_test.json 
{ "my_test" : "hei p? deg" }
(hlovdal) localhost:/tmp/my_test>file my_test.json 
my_test.json: UTF-8 Unicode text
(hlovdal) localhost:/tmp/my_test>hexdump -c my_test.json 
0000000   {       "   m   y   _   t   e   s   t   "       :       "   h
0000010   e   i       p 303 245       d   e   g   "       }  n        
000001e
(hlovdal) localhost:/tmp/my_test>
(hlovdal) localhost:/tmp/my_test>perl my_test.pl
is_utf8(): 0,is_utf8(1): 0. $my_test = hei p? deg
is_utf8(): 0,is_utf8(1): 0. $json_text = { "my_test" : "hei p? deg" }
is_utf8(): 1,is_utf8(1): 1. $$hash_ref{'my_test'} = hei p? deg
(hlovdal) localhost:/tmp/my_test>perl my_test.pl --set-binmode
is_utf8(): 0,is_utf8(1): 0. $my_test = hei p?¥ deg
is_utf8(): 0,is_utf8(1): 0. $json_text = { "my_test" : "hei p?¥ deg" }
is_utf8(): 1,is_utf8(1): 1. $$hash_ref{'my_test'} = hei p? deg
(hlovdal) localhost:/tmp/my_test>

是什么导致了这个以及如何解决?

这是一个新安装的和最新的Fedora 15系统.

(hlovdal) localhost:/tmp/my_test>perl --version | grep version
This is perl 5,version 12,subversion 4 (v5.12.4) built for x86_64-linux-thread-multi
(hlovdal) localhost:/tmp/my_test>rpm -q perl-JSON
perl-JSON-2.51-1.fc15.noarch
(hlovdal) localhost:/tmp/my_test>locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
(hlovdal) localhost:/tmp/my_test>

更新:添加使用utf8无法解决它,字符仍然没有正确处理(虽然与以前略有不同):

(hlovdal) localhost:/tmp/my_test>perl  my_test.pl
is_utf8(): 1,is_utf8(1): 1. $my_test = hei p? deg
is_utf8(): 0,is_utf8(1): 1. $$hash_ref{'my_test'} = hei p? deg
(hlovdal) localhost:/tmp/my_test>perl  my_test.pl --set-binmode
is_utf8(): 1,is_utf8(1): 1. $my_test = hei p? deg
is_utf8(): 0,is_utf8(1): 1. $$hash_ref{'my_test'} = hei p? deg
(hlovdal) localhost:/tmp/my_test>

如perlunifaq所述

Can I use Unicode in my Perl sources?

Yes,you can! If your sources are
UTF-8 encoded,you can indicate that
with the use utf8 pragma.

06004

This doesn’t do anything to your
input,or to your output. It only
influences the way your sources are
read. You can use Unicode in string
literals,in identifiers (but they
still have to be “word characters”
according to w ),and even in custom
delimiters.

解决方法

你用UTF-8保存了程序,但忘了告诉Perl.添加使用utf8;.

而且,你编程太复杂了. JSON函数DWYM.要检查内容,请使用Devel :: Peek.

use utf8; # for the following line
my $my_test = 'hei p? deg';

use Devel::Peek qw(Dump);
use File::Slurp (read_file);
use JSON qw(decode_json);

my $hash_ref = decode_json(read_file('my_test.json'));

Dump $hash_ref; # Perl character strings
Dump $my_test;  # Perl character string

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读