
Using multiple regular expressions in Perl

Published: 2020-12-16 06:20:55  Category: Big Data  Source: compiled from the web
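Hi, this site has already helped me fix my Perl problems several times.
This is the first time I am asking a question, because I could not find an answer on Google or Stack Overflow.

What I want to do is get the content between two words, but the pattern they have to match keeps changing. I am trying to fetch product details: brand, description, name, and so on. I tried running one regex match after another, but unfortunately that does not work because $1 stays defined. Trying to undef $1 gives me a "read-only" error message, which is logical. I will post my code below; maybe someone has an idea how to get it working.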

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use IO::File;
use utf8;
my $nfh = IO::File->new('test.html','w');
my $site = 'http://www.test.de/dp/';
my $sku = '1550043196';
my $url = join('',$site,$sku);
my $content = get $url;
my $name = $1 if ($content =~ m{<span id="productTitle" class="a-size-large">(.*?)</span>}gism);
print "$name\n";

# My attempt at undefining $1
#undef $1;

my $marke = $1 if ($content =~ m{data-brand="(.*?)"}gism);
print "$marke\n";

Any suggestions?

Solution

First of all, never use a construct like this:

my $var = $val if( $some );

According to the documentation:

NOTE: The behaviour of a my, state, or our modified with a statement
modifier conditional or loop construct (for example, my $x if ...) is
undefined. The value of the my variable may be undef, any previously
assigned value, or possibly anything else. Don't rely on it. Future
versions of perl might do something different from the version of Perl
you try it out on. Here be dragons.
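
One way to avoid the undefined construct is to declare the variable unconditionally and assign to it only inside an ordinary if block. The following is a minimal sketch, not the asker's actual code: the helper name brand_of and the sample HTML are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the brand, or undef when the pattern does not match.
sub brand_of {
    my ($content) = @_;
    my $brand;                                  # declared unconditionally
    if ($content =~ m{data-brand="(.*?)"}s) {
        $brand = $1;                            # $1 is read only after a successful match
    }
    return $brand;
}

my $html = q{<div data-brand="Acme">Widget</div>};    # made-up sample input
print 'brand=', brand_of($html) // 'undefined', "\n";                       # brand=Acme
print 'brand=', brand_of('<div>no brand here</div>') // 'undefined', "\n";  # brand=undefined
```

Because the assignment happens only when the match succeeds, a stale $1 from an earlier match can never end up in $brand.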

When the m// operator is used with the /g modifier in list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. So, as @Сухой27 said in the comment above, you should use:

my ($some) = $str =~ m/...(...).../g;

A simple example:

use strict;
use warnings;

my $u="undefined";
my $str = q{some="string" another="one"};

#will match
my ($m1) = $str =~ m/some="(.*?)"/g;
print 'm1=',$m1 // $u,'= $1=',$1 // $u,"=\n";

#will NOT match
my ($m2) = $str =~ m/nothere="(.*?)"/g;
print 'm2=',$m2 // $u,'= $1=',$1 // $u,"=\n";

#will match another
my ($m3) = $str =~ m/another="(.*?)"/g;
print 'm3=',$m3 // $u,'= $1=',$1 // $u,"=\n";

Prints:

m1=string= $1=string=
m2=undefined= $1=string=   # $1 still holds the previously matched value
m3=one= $1=one=
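
Applied to the question, the same list-context assignment can be driven from a table of patterns. This is only a sketch under assumptions: the field names, the patterns, the price field, and the sample HTML are all hypothetical and would have to be adapted to the real page.

```perl
use strict;
use warnings;

# Hypothetical field table: names and patterns are illustrative only.
my %pattern_for = (
    name  => qr{<span id="productTitle"[^>]*>(.*?)</span>}is,
    brand => qr{data-brand="(.*?)"}is,
    price => qr{data-price="(.*?)"}is,
);

# Returns a hash ref of field => first capture (undef when a pattern fails).
sub extract_fields {
    my ($content) = @_;
    my %found;
    for my $field (keys %pattern_for) {
        # List-context assignment: a failed match yields an empty list,
        # so the value becomes undef no matter what $1 held before.
        ($found{$field}) = $content =~ $pattern_for{$field};
    }
    return \%found;
}

# Made-up sample input with a name and a brand but no price.
my $html = q{<span id="productTitle" class="a-size-large">Widget</span> <div data-brand="Acme"></div>};
my $details = extract_fields($html);
for my $field (sort keys %$details) {
    print "$field=", $details->{$field} // 'undefined', "\n";
}
```

With this sample input the price pattern does not match, so its entry stays undef instead of inheriting the brand.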

As you can see, $1 keeps its value when a match fails. The documentation says:

These special variables, like the %+ hash and the numbered match
variables ($1, $2, $3, etc.) are dynamically scoped until the end
of the enclosing block or until the next successful match, whichever
comes first. (See Compound Statements in perlsyn.)

NOTE: Failed matches in Perl do not reset the match variables, which
makes it easier to write code that tests for a series of more specific
cases and remembers the best match.
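
Since a failed match leaves $1 untouched, the return value of the match itself, not the definedness of $1, has to decide whether a capture is used. The sketch below tries several candidate patterns for one field and stops at the first that matches; first_match, the patterns, and the sample markup are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: try the patterns in order and return the first capture
# of the first pattern that matches, or undef (in scalar context) if none does.
sub first_match {
    my ($content, @patterns) = @_;
    for my $re (@patterns) {
        # A list assignment in boolean context yields the number of values
        # the match returned, so this is true only when the match succeeded.
        if (my ($value) = $content =~ $re) {
            return $value;
        }
    }
    return;    # nothing matched
}

# Two made-up ways a page might expose the brand.
my @brand_patterns = (
    qr{data-brand="(.*?)"}is,
    qr{<a id="brand"[^>]*>(.*?)</a>}is,
);
print 'brand=', first_match(q{<a id="brand" href="#">Acme</a>}, @brand_patterns) // 'undefined', "\n";
```

This may suit pages where the same detail appears under changing markup, because each pattern is checked on its own result rather than on leftovers from an earlier match.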

So, if you do not want $1 to stay defined, you can enclose the match in a block, for example:

use strict;
use warnings;

my $u="undefined";
my $str = q{some="string" another="one"};
my($m1,$m2,$m3);

{($m1) = $str =~ m/some="(.*?)"/g;}
print 'm1=',$m1 // $u,'= $1=',$1 // $u,"=\n";

{($m2) = $str =~ m/nothere="(.*?)"/g;}
print 'm2=',$m2 // $u,'= $1=',$1 // $u,"=\n";

{($m3) = $str =~ m/another="(.*?)"/g;}
print 'm3=',$m3 // $u,'= $1=',$1 // $u,"=\n";

Which prints:

m1=string= $1=undefined=
m2=undefined= $1=undefined=
m3=one= $1=undefined=
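
A subroutine body is also a block, so wrapping the match in a small helper should keep its captures out of the caller's $1 in the same way. This is a sketch under that assumption; the helper name capture is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: the match variables set inside this sub are scoped to
# its body, so the caller's $1 is restored when the sub returns.
sub capture {
    my ($str, $re) = @_;
    my ($value) = $str =~ $re;    # undef when the pattern does not match
    return $value;
}

my $str = q{some="string" another="one"};
my $m1  = capture($str, qr/some="(.*?)"/);
my $m2  = capture($str, qr/nothere="(.*?)"/);
print 'm1=', $m1 // 'undefined', ' $1=', $1 // 'undefined', "\n";
print 'm2=', $m2 // 'undefined', ' $1=', $1 // 'undefined', "\n";
```

Compared with bare blocks, a helper like this also removes the repeated match-and-assign boilerplate.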

PS: I am not a Perl guru; maybe others will extend or correct this answer.

