Perl Tip

Published: 2020-12-16 00:17:13 | Category: Big Data | Source: compiled from the web
Perl one-liners as an iconv replacement

perl -mEncode -npe 'Encode::from_to($_,"utf-8","gbk")'

perl -mEncode -npe '$_=Encode::encode("gbk",Encode::decode("utf-8",$_))'
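The two one-liners above can also be written as a small script. This is a minimal sketch: the helper name utf8_to_gbk and the sample bytes are assumptions made for illustration, while decode, encode and from_to are the Encode functions the one-liners already use.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

# Hypothetical helper: convert a UTF-8 byte string to GBK bytes.
sub utf8_to_gbk {
    my ($bytes) = @_;
    # decode() turns UTF-8 bytes into Perl characters,
    # encode() turns those characters into GBK bytes.
    return encode('gbk', decode('utf-8', $bytes));
}

my $utf8 = "\xE4\xBD\xA0\xE5\xA5\xBD";    # the UTF-8 bytes of 你好
my $gbk  = utf8_to_gbk($utf8);

# from_to() performs the same conversion in place on a copy.
my $copy = $utf8;
from_to($copy, 'utf-8', 'gbk');

# Show the converted bytes as hex so the terminal encoding does not matter.
print unpack('H*', $gbk), "\n";
```

The decode/encode pair leaves the input untouched, whereas from_to overwrites its first argument, which is why the sketch works on a copy.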

------------------------------------------------------------------------------

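use Encode;
# Assumes this source file is saved as cp936 (GBK), so $_ holds GBK bytes.
$_="abc你好wert";
# Decode the bytes into characters so that \p{Han} can match them.
$a=decode('cp936',$_);
# Capture the first run of Han characters.
($x)=($a=~m/(\p{Han}+)/);
# Encode back to cp936 before printing.
print encode('cp936',$x),"\n";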

Match any non-Han character: \P{Han}
Match any Han character: \p{Han}
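As a hedged illustration of the two properties, the sketch below splits a mixed string into Han and non-Han runs. The sample string and variable names are assumptions; the characters are written as \x{...} escapes so the example does not depend on the encoding of the source file.

```perl
use strict;
use warnings;

# "abc你好wert世界" written with code point escapes.
my $text = "abc\x{4F60}\x{597D}wert\x{4E16}\x{754C}";

my @han_runs     = $text =~ /(\p{Han}+)/g;      # runs of Han characters
my @non_han_runs = $text =~ /(\P{Han}+)/g;      # runs of everything else
my $han_count    = () = $text =~ /\p{Han}/g;    # number of Han characters

binmode STDOUT, ':encoding(UTF-8)';             # encode characters on output
print "Han runs:     @han_runs\n";
print "non-Han runs: @non_han_runs\n";
print "Han count:    $han_count\n";
```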

The Perl FAQ entry "How do I strip blank space from the beginning/end of a string?" states that using

s/^\s+|\s+$//g;

is slower than doing it in two steps:

s/^\s+//;
s/\s+$//;

Why is this combined statement noticeably slower than the separate ones (for any input string)?

The Perl regex engine runs much faster when it works with 'fixed' or 'anchored' substrings rather than 'floating' ones. A substring is fixed when it can be locked to a certain place in the source string, and both '^' and '$' provide that anchoring. However, when you use the alternation '|', the compiler doesn't recognize the choices as fixed, so it uses less optimized code to scan the whole string. In the end, looking for fixed strings twice is much faster than looking for a floating string once. On a related note, reading perl's regcomp.c will make you go blind.
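The claim can be checked on your own machine with the core Benchmark module. This is only a sketch: the sub names and the test string are assumptions, and the size of the gap depends on the Perl version and on the input, so no timings are quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test string: padding on both sides of some text.
my $input = (' ' x 10) . ('word ' x 200) . (' ' x 10);

# Single substitution with alternation.
sub trim_combined {
    my $s = shift;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Two anchored substitutions.
sub trim_separate {
    my $s = shift;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Both approaches must agree before their speed is worth comparing.
die "results differ" unless trim_combined($input) eq trim_separate($input);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    combined => sub { trim_combined($input) },
    separate => sub { trim_separate($input) },
});
```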

Update: Here are some additional details. If perl was compiled with debugging support, you can run it with the '-Dr' flag and it will dump the regex compilation data. Here's what you get:

~# debugperl -Dr -e 's/^\s+//g'
Compiling REx `^\s+'
size 4 Got 36 bytes for offset annotations.
first at 2
synthetic stclass "ANYOF[\11\12\14\15 {unicode_all}]".
   1: BOL(2)
   2: PLUS(4)
   3:   SPACE(0)
   4: END(0)
stclass "ANYOF[\11\12\14\15 {unicode_all}]" anchored(BOL) minlen 1
# debugperl -Dr -e 's/^\s+|\s+$//g'
Compiling REx `^\s+|\s+$'
size 9 Got 76 bytes for offset annotations.

   1: BRANCH(5)
   2:   BOL(3)
   3:   PLUS(9)
   4:     SPACE(0)
   5: BRANCH(9)
   6:   PLUS(8)
   7:     SPACE(0)
   8:   EOL(9)
   9: END(0)
minlen 1

Note the word 'anchored' in the first dump.
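If your perl was not built with debugging support, the re pragma offers a similar dump on an ordinary build. The sketch below is an alternative to '-Dr', not what the answer above used; the variable name is an assumption, and the exact wording of the dump differs between Perl versions.

```perl
use strict;
use warnings;

my $padded = "  padded  ";

{
    # Lexically scoped: only regexes compiled inside this block
    # are dumped, and the dump goes to STDERR.
    use re 'debug';
    $padded =~ s/^\s+//;
}

print "[$padded]\n";    # the leading whitespace is gone, the trailing stays
```

Swapping in the combined pattern lets you compare the two dumps and look for the word 'anchored' yourself.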

How do I strip blank space from the beginning/end of a string?

(contributed by brian d foy)

A substitution can do this for you. For a single line, you want to replace all the leading or trailing whitespace with nothing. You can do that with a pair of substitutions:

s/^\s+//;
s/\s+$//;

You can also write that as a single substitution, although it turns out the combined statement is slower than the separate ones. That might not matter to you, though:

s/^\s+|\s+$//g;

In this regular expression, the alternation matches either at the beginning or at the end of the string, because alternation has the lowest precedence and each anchor therefore applies only to its own branch. With the /g flag, the substitution makes all possible matches, so it gets both. Remember, the trailing newline matches the \s+, and the $ anchor can match at the absolute end of the string, so the newline disappears too. Just add the newline to the output, which has the added benefit of preserving "blank" (consisting entirely of whitespace) lines, which the ^\s+ would remove all by itself:

while( <> ) {
    s/^\s+|\s+$//g;
    print "$_\n";
}
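The same idea can be wrapped in a small function and tried on in-memory lines instead of standard input. A minimal sketch: the helper name trim and the sample lines are assumptions, not part of the FAQ.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy without leading or trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;    # also removes the trailing newline
    return $s;
}

my @lines   = ("  hello world \n", "\t\n", "no padding\n");
my @trimmed = map { trim($_) } @lines;

# Add the newline back on output; the whitespace-only line
# survives as an empty line.
print "[$_]\n" for @trimmed;
```

Working on a copy keeps the caller's string intact, at the cost of one extra copy per call.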

For a multi-line string, you can apply the regular expression to each logical line in the string by adding the /m flag (for "multi-line"). With the /m flag, the $ matches before an embedded newline, so it doesn't remove it. This pattern still removes the newline at the end of the string:

$string =~ s/^\s+|\s+$//gm;

Remember that lines consisting entirely of whitespace will disappear, since the first part of the alternation can match the entire line and replace it with nothing. If you need to keep embedded blank lines, you have to do a little more work. Instead of matching any whitespace (since that includes a newline), just match the other whitespace:

$string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
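The difference between the two multi-line patterns is easiest to see side by side. In this sketch the sample string and variable names are assumptions; the output is passed through a small escaping step so the newlines are visible.

```perl
use strict;
use warnings;

# Three logical lines; the middle one holds only spaces.
my $string = "first  \n   \n  second\n";

# \s also matches newlines, so the whitespace-only line and the
# final newline are removed.
(my $any_ws = $string) =~ s/^\s+|\s+$//gm;

# Only tabs, form feeds and spaces are stripped, so every newline
# (and therefore the blank line) is kept.
(my $horizontal = $string) =~ s/^[\t\f ]+|[\t\f ]+$//mg;

for my $result ($any_ws, $horizontal) {
    (my $shown = $result) =~ s/\n/\\n/g;    # make newlines visible
    print "[$shown]\n";
}
```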

(Editor: 李大同)
