Perl Notes 3

Published: 2020-12-16 00:05:57. Category: Big Data. Source: compiled from the web.

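22 Hashes
A hash is a data structure that differs from an array in how it is indexed: an array is indexed by number, a hash by name.
A hash is made up of key-value pairs. Keys must be unique strings; values can be numbers, strings, undef, or any mix of these.
(1) Accessing hash elements
Syntax:
$hash{$some_key}
Note the difference from arrays: arrays use [] and their indices are numbers.
$family_name{'fred'} = 'flintstone';
$family_name{'barney'} = 'rubble';
foreach my $person (qw/fred barney/)
{
    print "I've heard of $person $family_name{$person}.\n";
}
Result:
I've heard of fred flintstone.
I've heard of barney rubble.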

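(2) Accessing the whole hash
To refer to the entire hash, use a percent sign (%) as the prefix,
so the hash above is %family_name.

A hash can be built from a list and converted back into one:
%some_hash = ('foo',35,'bar',12.4,2.5,'hello','wilma',1.45e30,'betty',"bye");
In list context, a hash unwinds into a flat list of key-value pairs:
@any_array = %some_hash;
print "@any_array\n";
The result looks like this (the order is not necessarily the insertion order):
betty bye bar 12.4 wilma 1.45e+30 foo 35 2.5 hello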

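(3) Hash assignment
Copying a whole hash is not a common operation:
my %new_hash = %old_hash;
Explanation: this is not a simple copy of a block of memory. Roughly, Perl first unwinds %old_hash into a list, then rebuilds the key-value pairs from that list to form the new hash.

%ip_address = reverse %host_name;
# Swaps the keys and values of %host_name to build the new hash %ip_address. Only do this when the values of the original hash are unique. Otherwise the rule is "last one wins": a pair later in the list overwrites an earlier pair with the same key.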
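
The "last one wins" rule means an inverted hash can silently lose pairs. A minimal sketch of one way to detect that, using made-up addresses and host names (the data is hypothetical): compare the number of keys before and after the inversion.

```perl
use strict;
use warnings;

# Hypothetical data: two addresses map to the same host name.
my %host_name = (
    '10.0.0.1' => 'alpha',
    '10.0.0.2' => 'beta',
    '10.0.0.3' => 'alpha',
);

my %ip_address = reverse %host_name;

# keys in scalar context gives the number of pairs.
my $before = keys %host_name;
my $after  = keys %ip_address;

# If any value was duplicated, the inverted hash ends up with fewer keys.
my $lossless = $before == $after ? 1 : 0;
my $verdict  = $lossless ? "every pair survived" : "some pairs were overwritten";

print "original pairs: $before\n";
print "inverted pairs: $after\n";
print "$verdict\n";
```

Which of the two addresses ends up stored under alpha depends on the hash's internal order, so the sketch deliberately does not print it.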

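(4) The fat arrow
The fat arrow (=>) is just another way to write a comma; it makes keys and values easier to tell apart.
my %last_name = (
    fred => 'dog',      # a bareword to the left of the fat arrow is quoted automatically
    dino => 'cat',
    barney => 'apple',
    betty => 'apple',
    );

If the key is a special character such as the plus sign (+), the quotes cannot be omitted.

Likewise, when looking up an element by key inside curly braces, the quotes around the key can be omitted.
$score{'fred'} can be written as $score{fred}
But watch out for special cases:
$hash{bar.foo} = 1; # bar.foo is evaluated as a string expression first, so the key is barfoo (under use strict this is a compile-time error)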

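(5) Hash functions
Function 1: keys and values
keys returns a list of the hash's keys.
values returns the corresponding list of values, in the same order.

Example:
my %hash = ('a'=>1,'b'=>2,'c'=>3);
my @k = keys %hash;
my @v = values %hash;
print "The %hash's keys are : @k\n";
print "The %hash's values are : @v\n";

Test result (the order may differ from run to run):
The %hash's keys are : c a b
The %hash's values are : 3 1 2

In scalar context, both functions return the number of elements (key-value pairs) in the hash:
my $count = keys %hash; # gives 3

A hash can also be tested as a Boolean expression:
if (%hash)
{
    print "That was a true value!\n";
}
It is true as long as the hash holds at least one key-value pair.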

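Function 2: each
It returns one key-value pair as a two-element list.
In practice, the only sensible place to use each is in a while loop:
my %hash = ('a'=>1,'b'=>2,'c'=>3);
while (my ($key,$value) = each %hash) # the list assignment is evaluated in scalar context and yields the number of elements returned by each; the loop runs while that number is not 0
{
    print "$key => $value\n";
}
Result:
c => 3
a => 1
b => 2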

Example:
Print the key-value pairs of a hash sorted by key
my %hash = ('a'=>1,'b'=>2,'c'=>3);
foreach my $key (sort keys %hash)
{
    print "$key => $hash{$key}\n";
}
Result:
a => 1
b => 2
c => 3

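(6) Using hash functions
The exists function
Tests whether a key exists in the hash and returns true or false, regardless of the value stored under that key.
if (exists $books{"dino"})
{
    print "Hey,there's a library card for dino!\n";
}

The delete function
Removes the given key and its value from the hash.
my $person = "betty";
delete $books{$person};

Interpolating hash elements:
$books{'fred'} = 3;
$books{'wilma'} = 1;
$books{'hello'} = 0;
$books{'tom'} = undef;
foreach my $person (sort keys %books)
{
    if ($books{$person})
    {
        print "$person has $books{$person} items\n";
    }
}
Result (hello and tom are skipped because 0 and undef are false):
fred has 3 items
wilma has 1 items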
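
The loop above tests only whether a value is true, which hides the difference between a key that is missing, a key whose value is undef, and a key whose value is 0. A minimal sketch that separates the three cases, reusing the %books data from above (key_status is a hypothetical helper name, and dino is deliberately absent from the hash):

```perl
use strict;
use warnings;

my %books = (fred => 3, wilma => 1, hello => 0, tom => undef);

# Classify a key by the strongest test it passes.
sub key_status {
    my ($key) = @_;
    return 'missing'           unless exists  $books{$key};
    return 'exists but undef'  unless defined $books{$key};
    return 'defined but false' unless $books{$key};
    return 'true';
}

foreach my $person (qw/fred hello tom dino/) {
    print "$person: ", key_status($person), "\n";
}
```

Testing with exists does not create the key, so asking about dino leaves the hash unchanged.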

(7) The %ENV hash
Read the value of the operating system's PATH environment variable from %ENV:
print "$ENV{'PATH'}\n";
Result:
/opt/vertica/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/oracle/product/10.2.0/db_1/bin:...
This matches the value of the PATH environment variable as shown by the Linux shell:
[root@etl10 hash]# echo $PATH
/opt/vertica/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/oracle/product/10.2.0/db_1/bin:...

Test: count how many times each word appears in the input (one word per line)
my (@words,%count,$word);
chomp(@words = <STDIN>);
foreach $word (@words)
{
    $count{$word} += 1;
}

foreach $word (keys %count)
{
    print "$word was seen $count{$word} times.\n";
}

Test result (the words are typed in first; the counts are printed at end of input):
[root@etl10 hash]# perl 3.pl
hello
dog
cat
dog
cat
dog
ket
cat was seen 2 times.
ket was seen 1 times.
dog was seen 3 times.
hello was seen 1 times.
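
The counts above come out in the hash's internal order. A minimal sketch of printing them from most to least frequent instead, with the input words hard-coded (a stand-in for the lines read from STDIN) so that it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical input, standing in for the lines read from STDIN.
my @words = qw/hello dog cat dog cat dog ket/;

my %count;
$count{$_} += 1 foreach @words;

# Highest count first; words with the same count are ordered alphabetically.
my @by_count = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

foreach my $word (@by_count) {
    print "$word was seen $count{$word} times.\n";
}
```

The second comparison (cmp) only runs when the counts are equal, which keeps the output stable between runs.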

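23 Regular expressions
(30) Character class shortcuts
Perl 5.14 introduced a new modifier, /a. Use it when the shortcuts must match strictly within the ASCII range, for example when matching digits.
use 5.014;
$_ = 'The HAL-900 requires authorization to continue.';
if (/HAL-[\d]+/a) # the a goes at the very end
{
    say 'The string mentions some model of HAL computer.';
}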
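
To see what /a changes, here is a minimal sketch that tests one non-ASCII digit. The character U+0663 (ARABIC-INDIC DIGIT THREE) is chosen purely for illustration; the variable names are hypothetical.

```perl
use strict;
use warnings;
use 5.014;

# U+0663 is a digit according to Unicode, but it is not an ASCII digit.
my $arabic_three = "\x{0663}";

my $unicode_match = $arabic_three =~ /\A\d\z/  ? 1 : 0;   # Unicode rules
my $ascii_match   = $arabic_three =~ /\A\d\z/a ? 1 : 0;   # ASCII-only rules
my $plain_match   = "3"           =~ /\A\d\z/a ? 1 : 0;   # ordinary ASCII digit

say "U+0663 without /a: $unicode_match";
say "U+0663 with /a:    $ascii_match";
say "ASCII 3 with /a:   $plain_match";
```

Only the 0 or 1 results are printed, never the character itself, so no output encoding needs to be set.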

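24 Matching with regular expressions
(1) Anchors
An anchor makes the pattern match only at a particular position in the string.
\A, the beginning-of-string anchor, matches the absolute start of the string
m{\Ahttps?://}i tests whether the string starts with http:// or https://
\z, the end-of-string anchor, matches the absolute end of the string
m{\.png\z}i matches a string that ends in .png
\Z is also an end-of-string anchor, but it allows a newline after it

while (<STDIN>)
{
    print if /\.png\Z/; # \Z tolerates the newline at the end of the line
}

while (<STDIN>)
{
    chomp; # with \z, remove the trailing newline by hand
    print if /\.png\z/;
}

Using \A and \Z together:
/\A\s*\Z/ matches a blank line, which may still contain some whitespace characters
\A, \Z and \z are regular expression features of Perl 5.
In Perl 4, the anchor for the start of the string was the caret (^) and the anchor for the end was $. Both still work in Perl 5, and with the /m modifier they match at the start and end of every line inside the string:

$_ = 'This is a wilma line
barney is on another_line
but this ends in fred
and a final dino line';

if (/fred$/m)
{
    print "OK!\n";
}

Stick to \A and \z unless you really need to match within multi-line text.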
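
The differences between these anchors are easiest to see side by side. A minimal sketch with made-up file names (all variable names are hypothetical):

```perl
use strict;
use warnings;

my $line = "logo.png\n";    # one input line, newline still attached

my $with_z      = $line =~ /\.png\z/ ? 1 : 0;   # \z: nothing may follow
my $with_Z      = $line =~ /\.png\Z/ ? 1 : 0;   # \Z: one trailing newline may follow
my $with_dollar = $line =~ /\.png$/  ? 1 : 0;   # $ without /m behaves like \Z

my $text = "logo.png\nnotes.txt\n";             # .png ends the first line only

my $no_m   = $text =~ /\.png$/  ? 1 : 0;        # $ looks only at the end of the string
my $with_m = $text =~ /\.png$/m ? 1 : 0;        # with /m, $ also matches at each line end

print "\\z: $with_z, \\Z: $with_Z, \$: $with_dollar\n";
print "\$ without /m: $no_m, \$ with /m: $with_m\n";
```

This is why the chomp is needed before matching with \z in the loop above.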

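(2) Word anchors
\b, the word-boundary anchor, matches at either end of a word
/\bfred\b/ matches only the word fred, not frederick, alfred or manfredmann
A word here is a run of \w characters, so \b matches only at the start or end of such a run

\B, the non-word-boundary anchor, matches at every position where \b does not
/\bsearch\B/ matches searches, searching and searched, but not search or researching.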

(3) The binding operator =~
By default a pattern match operates on $_. The binding operator tells Perl to match the pattern on the right against the string on the left instead of $_.
my $some_other = "I dream of betty rubble.";
if ($some_other =~ /\brub/)
{
    print "Aye,there's the rub.\n";
}

print "Do you like Perl? ";
my $likes_perl = (<STDIN> =~ /\byes\b/i); # or: my $likes_perl = <STDIN> =~ /\byes\b/i;
if ($likes_perl)
{
    print "You said earlier that you like Perl,so...\n";
}

(4) Interpolating into patterns
my $what = "larry"; # could also come from the command line: my $what = shift @ARGV;
while (<>)
{
    if (/\A($what)/)
    {
        print "We saw $what in beginning of $_\n";
    }
}

(5) Capture variables
\4: a back reference; it refers to what the group matched while the match is still in progress
$4: refers to the captured text after the match has finished
Exam1:
$_ = "Hello there,neighbor";
if (/\s([a-zA-Z]+),/)
{
    print "The word is $1!\n";
}

Result:
The word is there!

Exam2:
$_ = "Hello there,neighbor";
if (/(\S+) (\S+),(\S+)/) # for comparison, [\s\S] matches any whitespace or non-whitespace character, that is, any character at all
{
    print "The words are $1 $2 $3\n";
}

Result:
The words are Hello there neighbor

Exam3:
my $var = 'I think that I will see you in 1000 years';
if ($var =~ /([0-9]+) years/)
{
    print "The \$1 value is $1\n";
}

Result:
The $1 value is 1000

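(6) Lifetime of capture variables
Capture variables keep their values until the next successful match. A successful match overwrites them; a failed match leaves the values from the previous successful match in place.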
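
A minimal sketch of why this matters: after a failed match, $1 still holds the value from the earlier success, so it should only be read when the match is known to have succeeded. The strings and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my $first  = 'I will see you in 1000 years';
my $second = 'no number here';

$first =~ /([0-9]+) years/;           # succeeds and sets $1
my $after_success = $1;

$second =~ /([0-9]+) years/;          # fails and leaves $1 untouched
my $after_failure = $1;               # stale value from the first match

print "after success: $after_success\n";
print "after failure: $after_failure (left over from the first match)\n";

# Safer: read $1 only when the match succeeded.
my $years = $second =~ /([0-9]+) years/ ? $1 : 'unknown';
print "years: $years\n";
```

Putting the match inside an if or a conditional expression, as the earlier examples do, avoids picking up a stale capture.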

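(7) Noncapturing parentheses
if (/(bronto)?saurus (steak|burger)/) # bronto is only there for grouping; (steak|burger) is what we actually want to capture
{
    print "Fred wants a $2\n"; # must use $2, because $1 is taken by (bronto)?
}

The fix:
Use noncapturing parentheses: ?: right after the opening parenthesis tells Perl that this pair exists purely for grouping.

my $var = 'brontosaurus BBQ steak';
if ($var =~ /(?:bronto)?saurus (?:BBQ )?(steak|burger)/)
{
    print "Fred wants a $1\n"; # still captures the steak or burger he likes
}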

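(8) Named captures
An example shows where the problem comes from:
use 5.010;
my $names = 'Fred or Barney';
if ( $names =~ m/(\w+) and (\w+)/) # does not match: $names contains "or" but the pattern expects "and"
{
    say "I saw $1 and $2";
}

Revision 1:

use 5.010;
my $names = 'Fred or Barney';
if ( $names =~ m/(\w+) (or|and) (\w+)/) # matches, but $2 does not hold the value we want
{
    say "I saw $1 and $2";
}
Output:
I saw Fred and or

Revision 2:
use 5.010;
my $names = 'Fred or Barney';
if ( $names =~ m/(\w+) (?:or|and) (\w+)/) # matches, and $2 now holds the value we want
{
    say "I saw $1 and $2";
}

Output:
I saw Fred and Barney

But when a pattern has many captures, keeping track of the numbers this way becomes awkward.
To avoid having to remember numbered variables such as $1, Perl 5.10 added a syntax for naming captures directly.
The captured text ends up in the special hash %+: the key is the label used in the capture, and the value is the captured string.
The syntax:
(?<LABEL>PATTERN) # LABEL is a name you choose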

Revision 3:
use 5.010;
my $names = 'Fred or Barney';
if ( $names =~ m/(?<name2>\w+) (or|and) (?<name1>\w+)/) # defines the capture labels name1 and name2
{
    say "I saw $+{name1} and $+{name2}"; # read the captured values by label
}

Output:
I saw Barney and Fred

Named captures also change how back references are written.
Where you previously wrote \1 or \g{1}, you can now write \g{label}
use 5.010;
my $names = 'Fred or Barney Fred';
if ( $names =~ m/(?<name2>\w+) (?:or|and) (?<name1>\w+) \g{name2}/) # \g{1} would also work here, but it is harder to maintain
{
    say "I saw $+{name1} and $+{name2}";
}

Output:
I saw Barney and Fred

# Note: \k<label> is another way to write the back reference
use 5.010;
my $names = 'Fred or Barney Fred';
if ( $names =~ m/(?<name2>\w+) (?:or|and) (?<name1>\w+) \k<name2>/) # \k<name2> in place of \g{name2}
{
    say "I saw $+{name1} and $+{name2}";
}

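(9) Automatic match variables
$& : the part of the string that actually matched the pattern is stored in $& automatically
$name = "hello here,byebey!";
if ($name =~ /\s(\w+),/)
{
    print "$&\n";
}

Result:
 here,   # note the leading space

Note: the first capture is stored in $1, while $& holds the entire matched section.
Whatever came before the matched section is stored in $`, and whatever came after it is stored in $'.
$name = "hello here,byebey!";
if ($name =~ /\s(\w+),/)
{
    print "\$`:$`\n\$&:$&\n\$':$'\n";
    print "$`$&$'\n"; # joining the three pieces gives back the original string
}

Result:
$`:hello    # text before the matched section
$&: here,   # the matched section
$':byebey!  # text after the matched section
hello here,byebey!

An important caveat:
Using the automatic match variables slows down every other regular expression in the program (this penalty applies to Perl versions before 5.20).
A workaround is to wrap the whole pattern in () and use $1 instead of $&.

With Perl 5.10 or later, the /p modifier makes the same information available for that one regular expression only.
Use ${^PREMATCH} ${^MATCH} ${^POSTMATCH} in place of $` $& $'

$name = "hello here,byebey!";
if ($name =~ /\s(\w+),/p)
{
    print "PREMATCH is ${^PREMATCH}\n";
    print "MATCH is ${^MATCH}\n";
    print "POSTMATCH is ${^POSTMATCH}\n";
    print "${^PREMATCH}${^MATCH}${^POSTMATCH}";
}

Result:

PREMATCH is hello
MATCH is  here,
POSTMATCH is byebey!
hello here,byebey!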

(10) General quantifiers
*  :  same as {0,}
+  :  same as {1,}
?  :  same as {0,1}
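
The equivalences above can be checked mechanically. A minimal sketch that runs each shorthand and its general {min,max} form against a few sample strings and counts disagreements. It uses qr// and array references to hold the patterns, which these notes have not introduced; the sample strings are arbitrary.

```perl
use strict;
use warnings;

# Each shorthand quantifier paired with its general form.
my @pairs = (
    [ qr/\Aa*\z/, qr/\Aa{0,}\z/  ],
    [ qr/\Aa+\z/, qr/\Aa{1,}\z/  ],
    [ qr/\Aa?\z/, qr/\Aa{0,1}\z/ ],
);

my $mismatches = 0;
foreach my $string ('', 'a', 'aa', 'aaa', 'b') {
    foreach my $pair (@pairs) {
        my ($short, $general) = @$pair;
        my $short_result   = $string =~ $short   ? 1 : 0;
        my $general_result = $string =~ $general ? 1 : 0;
        $mismatches++ if $short_result != $general_result;
    }
}

print "mismatches: $mismatches\n";
```

The general form is the one to reach for when the bounds are something other than 0 or 1, for example a{5,15}.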

(11) Regular expression precedence (highest to lowest)
Parentheses            (...) (?:...) (?<LABEL>...)
Quantifiers            a* a+ a? a{n,m}
Anchors and sequence   abc ^ $ \A \b \z \Z
Alternation            a|b|c
Atoms                  a [abc] \d \1 \g{2}

Example:
while (<>)
{
    chomp;
    if (/\s\z/) # the line ends in whitespace, not counting the newline removed by chomp
    {
        print "$_#\n";
    }

}

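Chapter 9 Processing text with regular expressions
(1) Substitution with s///
m// is the pattern match operator; it searches
s/// is the substitution operator; it searches and replaces

$_ = "He's out bowling with Barney tonight.";
s/with (\w+)/against $1's team/;
print "$_\n";
Result:
He's out bowling against Barney's team tonight.

s/// returns a Boolean value: true if a substitution was made, false otherwise.

(2) Global replacement with /g
A fairly common global replacement is collapsing runs of whitespace:

$_ = "He's   out   bowling  \twith   Barney   tonight.";
s/\s+/ /g;
print "$_\n";

Result:
He's out bowling with Barney tonight.

s/^\s+//; # replace leading whitespace with the empty string
s/\s+$//; # replace trailing whitespace with the empty string
s/^\s+|\s+$//g; # strip both leading and trailing whitespace; this runs slightly slower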
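
The two-step form is easy to wrap in a subroutine. A minimal sketch, where trim is a hypothetical helper name and the input string is made up; it works on a copy so the caller's string is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy with leading and trailing whitespace removed.
sub trim {
    my ($text) = @_;
    $text =~ s/\A\s+//;
    $text =~ s/\s+\z//;
    return $text;
}

my $raw   = "  \tHe's out   bowling \n";
my $clean = trim($raw);

# Collapse the remaining inner runs of whitespace on a second copy.
(my $squeezed = $clean) =~ s/\s+/ /g;

print "[$clean]\n";
print "[$squeezed]\n";
```

The brackets in the output make any leftover whitespace at either end visible.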

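(3) Different delimiters
With a character that has no left and right form, use it three times:
s#^https://#http://#;

With paired characters that have a left and a right form, you must use two pairs: one around the pattern and one around the replacement string. The two pairs do not have to be the same.
s{hello}{OK};
s[hello](OK);
s<hello>#OK#;

(4) Substitution modifiers
Besides /g, you can also use /i, /x, /s and so on.
s#wilma#Wilma#gi; # replace every WilmA, WILMA and so on with Wilma
s{_END_.*}{}s; # delete the _END_ marker and everything after it

(5) The binding operator
$file_name =~ s#^.*/##s; # apply the substitution to $file_name instead of $_: removes everything up to the last slash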

(6) Nondestructive substitution
my $hello = "Fred ate 1 rib";
my $copy = $hello;
$copy =~ s/\d+ ribs?/10 ribs/; # this line and the one above can be combined: (my $copy = $hello) =~ s/\d+ ribs?/10 ribs/;
print "\$hello value is '$hello'\n";
print "\$copy value is '$copy'\n";

Result:
$hello value is 'Fred ate 1 rib'
$copy value is 'Fred ate 10 ribs'

Perl 5.14 added the /r modifier, which leaves the original string untouched and returns the modified copy:
use 5.014;
my $hello = "Fred ate 1 rib";
my $copy = $hello =~ s/\d+ ribs?/10 ribs/r; # the substitution happens first, then its result is assigned to $copy
print "\$hello value is '$hello'\n";
print "\$copy value is '$copy'\n";

Case shifting in the replacement string:
$_ = "I saw Barney with Fred.";
s/(fred|barney)/\U$1/gi; # \U converts everything after it to uppercase
print "$_\n";

s/(fred|barney)/\L$1/gi; # \L converts everything after it to lowercase
print "$_\n";
Result:
I saw BARNEY with FRED.
I saw barney with fred.

By default these affect the rest of the (replacement) string; use \E to turn case shifting off:
s/(\w+) with (\w+)/\U$2\E with $1/i;
print "$_\n";
Result:
I saw FRED with barney.

Without the \E:
$_ = "I saw Barney with Fred.";
s/(\w+) with (\w+)/\U$1 with $2/i;
print "$_\n";
Output:
I saw BARNEY WITH FRED.

The lowercase forms (\l and \u) affect only the single character that follows.
$_ = "I saw Barney with fred.";
s/(fred|barney)/\u$1/gi; # \u converts only the next character to uppercase
print "$_\n";

Result:
I saw Barney with Fred.

Combined: lowercase everything, then capitalize the first letter
s/(fred|barney)/\u\L$1/ig;

The same escapes also work inside double-quoted strings:
print "\u\Lyou are a \Ugood\E boy!\n";
Result:
You are a GOOD boy!

(7) The split operator
Format:
my @fields = split /separator/,$string;
Examples:
my @fields = split /:/,"abc:def:g:h"; # gives ("abc","def","g","h")
my @fields = split /:/,"abc:def::g:h"; # gives ("abc","def","","g","h")
One rule to remember:
split keeps leading empty fields but discards trailing empty fields
my @fields = split /:/,":::a:b:c:::"; # gives ("","","","a","b","c")
my @fields = split /:/,":::a:b:c:::",-1; # gives ("","","","a","b","c","","","")

Splitting on whitespace with the /\s+/ pattern
$input = "This  is a \t test.\n";
my @args = split /\s+/,$input;
print "@args\n";

Result:
This is a test.

By default split breaks up $_ on whitespace:
my @fields = split; # like split /\s+/,$_; except that leading whitespace does not produce an empty first field
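
The rules about empty fields and the optional third argument are easy to mix up, so here is a minimal sketch that counts the fields in each case. The input strings and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my $record = ":::a:b:c:::";

my @default = split /:/, $record;        # trailing empty fields are dropped
my @all     = split /:/, $record, -1;    # a negative limit keeps them
my @two     = split /:/, "a:b:c:d", 2;   # a positive limit caps the number of fields

my $padded = "  This is a test.";
my @regex  = split /\s+/, $padded;       # leading whitespace yields an empty first field
my @space  = split ' ', $padded;         # a single-space string skips leading whitespace

print scalar(@default), " fields by default, ", scalar(@all), " with limit -1\n";
print "limit 2 gives: $two[0] and $two[1]\n";
print scalar(@regex), " fields with /\\s+/, ", scalar(@space), " with ' '\n";
```

The single-space string ' ' is the form that matches the behavior of split with no arguments.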

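(8) The join function
join glues a list of pieces together into a single string
my $result = join $glue,@pieces; # think of the first argument as the glue
my $x = join ":",4,6,8,10,12;

Result:
4:6:8:10:12

The glue appears only between pieces.
The list needs at least two elements, otherwise there is nowhere for the glue to go.

my $x = join ":",4,6,8,10,12;
my @values = split /:/,$x;
my $z = join "-",@values;
Result:
4-6-8-10-12

Note: the first argument to join is a string, not a pattern.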

(9) m// in list context
In list context, a successful match returns the list of captured strings; with /g it returns the captures from every match.
$_ = "Hello there,neighbor!";
my($first,$second,$third) = /(\S+) (\S+),(\S+)/;

my $text = "Fred dropped a 5 ton granite block on Mr. Slate";
my @words = ($text =~ /([a-z]+)/ig);

my $data = "Barney Rubble Fred Flintstone Wilma Flintstone";
my %last_name = ($data =~ /(\w+)\s+(\w+)/g);
my @keys = keys(%last_name);
my @values = values(%last_name);
print "@keys\n";
print "@values\n";
Output (the order may differ):
Wilma Barney Fred
Flintstone Rubble Flintstone

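(10) Nongreedy quantifiers
$_ = "fred and barney went bowling last night";
/fred.+barney/ # after fred matches, .+ swallows everything that is left; barney cannot match there, so .+ gives back one character at a time until barney matches or the match fails
/fred.+?barney/ # after fred matches, .+? swallows as little as possible (one character); if barney does not match, it takes one more, and so on until barney matches or the match fails

Case study:
$_ = "I thought you said Fred and <BOLD>Velma</BOLD>,not <BOLD>Wilma</BOLD>";
Task: remove <BOLD> and </BOLD>
s#<BOLD>(.*)</BOLD>#$1#g; # pairs the first <BOLD> with the last </BOLD>, which is not what we want
Output: I thought you said Fred and Velma</BOLD>,not <BOLD>Wilma
s#<BOLD>(.*?)</BOLD>#$1#g; # pairs the first <BOLD> with the first </BOLD> and removes them, then does the same for the second pair
Output: I thought you said Fred and Velma,not Wilma

The nongreedy versions:
+?
*?
{5,9}?
?? # still matches once or zero times, but prefers zero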
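
The same difference shows up when the tagged text is extracted rather than replaced. A minimal sketch that reuses the <BOLD> string from the case study and collects the captures with m//g in list context (the variable names are hypothetical):

```perl
use strict;
use warnings;

my $text = "I thought you said Fred and <BOLD>Velma</BOLD>,not <BOLD>Wilma</BOLD>";

# Greedy: .* runs to the last </BOLD>, so there is a single oversized capture.
my @greedy = $text =~ m#<BOLD>(.*)</BOLD>#g;

# Nongreedy: .*? stops at the nearest </BOLD>, giving one capture per tag pair.
my @nongreedy = $text =~ m#<BOLD>(.*?)</BOLD>#g;

print scalar(@greedy), " greedy capture: $greedy[0]\n";
print scalar(@nongreedy), " nongreedy captures: @nongreedy\n";
```

Counting the captures is a quick check that a pattern is pairing the tags the way you intended.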

(11) Matching multi-line text

Note: by default, ^ and $ anchor the start and end of the whole string.
With the /m modifier they can match at the start and end of every line inside the string.
$_ = "I'm much better\nthan Barney is\nat bowling,\nWilma.\n";
print "$_\n";
print "Found 'wilma' at start of line\n" if /^wilma\b/im; # OK

Example:
use strict;
use POSIX; # for the strftime function
my $start_time = strftime("%Y-%m-%d %H:%M:%S",localtime);
print "$start_time\n";
my $filename = "D:many_lines.txt";
open FILE,$filename
    or die "Can't open '$filename': $!";
my $lines = join '',<FILE>;
print "$lines\n";
$lines =~ s/^/$filename: /mg; # without /m, the prefix is added to the first line only
print "$lines\n";

my $end_time = strftime("%Y-%m-%d %H:%M:%S",localtime);
print "$end_time\n";

Result:
2012-11-12 13:53:21
A Tom
B is
C a
D good
E boy.
D:many_lines.txt: A Tom
D:many_lines.txt: B is
D:many_lines.txt: C a
D:many_lines.txt: D good
D:many_lines.txt: E boy.
2012-11-12 13:53:21

Write a Perl script that modifies selected content in a file:
[root@etl10 scott]# cat name.txt # the file to be modified
Name: NIOS
Age: 3 years
phone: 10086
Code: 210000
Time: 2012-11-12 14:04:03

[root@etl10 scott]# cat test.pl # the script that modifies the file
#!/usr/bin/env perl
use 5.010;
chomp(my $time = `date`); # or: use POSIX; my $time = strftime("%Y-%m-%d %H:%M:%S",localtime);
$^I = ".bak";
while (<>)
{
    s/Code.*/Code: 210002/g;
    s/Name.*/Name: BICP/g;
    s/age.*\n//gi;
    s/Time.*/Time: $time/g;
    print;
}

[root@etl10 scott]# perl test.pl name.txt # run the modification
[root@etl10 scott]# cat name.txt # file content after the modification
Name: BICP
phone: 10086
Code: 210002
Time: Tue Nov 12 14:10:11 CST 2013
[root@etl10 scott]# cat name.txt.bak # the content saved before the modification; to follow the Linux convention, set $^I = "~" so that the backup file is named name.txt~
Name: NIOS
Age: 3 years
phone: 10086
Code: 210000
Time: 2012-11-12 14:04:03

(12) Editing directly from the command line
perl -p -i.bak -w -e 's/Randall/Randal/g' fred*.dat
-p : makes perl wrap your code in a small program that looks like this:
while (<>)
{
    print;
}
With -n instead of -p, the automatic print is left out
-i : sets $^I to ".bak"; to skip the backup, write just -i
-w : turns on warnings
-e : is followed by the code to execute

The command above is equivalent to:
use warnings;
$^I = ".bak";
while (<>)
{
    s/Randall/Randal/g;
    print;
}

Example:
$^I = "~";
while (<>)
{
    if (/\A#!/)
    {
         $_ .= "## Copyright (C) 20XX by Yours Truly\n";
    }
    print; # required: without it nothing is written back to the file
}

This adds one line below the line that starts with #!: ## Copyright (C) 20XX by Yours Truly
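
In-place editing can also be confined to one part of a larger program by localizing $^I and @ARGV. A minimal sketch, assuming a throwaway file created with File::Temp (the file content and the slurp helper are hypothetical, made up for the demonstration):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file, created only for this demonstration.
my ($fh, $path) = tempfile();
print {$fh} "Randall wrote this.\nAsk Randall.\n";
close $fh or die "Can't close '$path': $!";

{
    # Localizing both variables limits in-place editing to this block.
    local ($^I, @ARGV) = ('.bak', $path);
    while (<>) {
        s/Randall/Randal/g;
        print;              # written into the edited file, not to the terminal
    }
}

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($file) = @_;
    open my $in, '<', $file or die "Can't open '$file': $!";
    local $/;               # no record separator: read everything at once
    my $content = <$in>;
    close $in;
    return $content;
}

my $edited = slurp($path);
my $backup = slurp("$path.bak");
print "edited:\n$edited";
print "backup:\n$backup";

unlink $path, "$path.bak";  # remove the demonstration files
```

Once the block ends, $^I and @ARGV return to their previous values, so later reads through <> are not affected.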

(Editor: 李大同)
