Perl:2GB后的index / rindex?
发布时间:2020-12-15 23:22:34 所属栏目:大数据 来源:网络整理
导读:index和rindex在短字符串上工作得很好: $perl -e '$a=("x"x((230)-2))."b";print index($a,"b")'2147483646$perl -e '$a=("x"x((230)-2))."b";print rindex($a,"b")'2147483646 但是,在长字符串上,它们的行为完全不同: $perl -e '$a=("x"x((230)-1))."b";p
index和rindex在短字符串上工作得很好:
$perl -e '$a=("x"x((2<<30)-2))."b";print index($a,"b")' 2147483646 $perl -e '$a=("x"x((2<<30)-2))."b";print rindex($a,"b")' 2147483646 但是,在长字符串上,它们的行为完全不同: $perl -e '$a=("x"x((2<<30)-1))."b";print index($a,"b")' Segmentation fault (core dumped) $perl -e '$a=("x"x((2<<30)-1))."b";print rindex($a,"b")' -1 这是预期的吗?有一个很好的解决方法吗? $perl -V Summary of my perl5 (revision 5 version 18 subversion 2) configuration: Platform: osname=linux,osvers=3.2.0-58-generic,archname=x86_64-linux-gnu-thread-multi uname='linux brownie 3.2.0-58-generic #88-ubuntu smp tue dec 3 17:37:58 utc 2013 x86_64 x86_64 x86_64 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.18 -Darchlib=/usr/lib/perl/5.18 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.18.2 -Dsitearch=/usr/local/lib/perl/5.18.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.18.2 -des' hint=recommended,useposix=true,d_sigaction=define useithreads=define,usemultiplicity=define useperlio=define,d_sfio=undef,uselargefiles=define,usesocks=undef use64bitint=define,use64bitall=define,uselongdouble=undef usemymalloc=n,bincompat5005=undef Compiler: cc='cc',ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',optimize='-O2 -g',cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include' ccversion='',gccversion='4.8.2',gccosandvers='' intsize=4,longsize=8,ptrsize=8,doublesize=8,byteorder=12345678 d_longlong=define,longlongsize=8,d_longdbl=define,longdblsize=16 ivtype='long',ivsize=8,nvtype='double',nvsize=8,Off_t='off_t',lseeksize=8 alignbytes=8,prototype=define Linker and Libraries: ld='cc',ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=,so=so,useshrplib=true,libperl=libperl.so.5.18.2 gnulibc_version='2.19' Dynamic Linking: dlsrc=dl_dlopen.xs,dlext=so,d_dlsymun=undef,ccdlflags='-Wl,-E' cccdlflags='-fPIC',lddlflags='-shared -L/usr/local/lib -fstack-protector' Characteristics of this binary (from libperl): Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS PERL_DONT_CREATE_GVSV PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_ALL USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF USE_REENTRANT_API Locally applied patches: DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN. DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check. DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information. DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories. DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes. DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking DEBPKG:fixes/respect_umask - Respect umask during installation DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories DEBPKG:debian/extutils_set_libperl_path - EU:MM: Set location of libperl.a to /usr/lib DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets. DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor. DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy. DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable. DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need. DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default. DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl. DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/702096 Point users to Debian packages of deprecated core modules DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.18.2-2ubuntu1 in patchlevel.h DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags} DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text DEBPKG:debian/hurd_test_skip_stack - http://bugs.debian.org/650175 Disable failing GNU/Hurd tests dist/threads/t/stack.t DEBPKG:fixes/manpage_name_Test-Harness - http://bugs.debian.org/650451 [rt.cpan.org #73399] cpan/Test-Harness: add NAME headings in modules with POD DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/660195 [rt.cpan.org #28632] Make EU::MM pass LD through to recursive Makefile.PL invocations DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] http://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected DEBPKG:fixes/net_ftp_failed_command - [rt.cpan.org #37700] http://bugs.debian.org/491062 Net::FTP: cope gracefully with a failed command DEBPKG:fixes/perlbug-patchlist - [3541c11] http://bugs.debian.org/710842 [perl #118433] Make perlbug look up the list of local patches at run time DEBPKG:fixes/module_metadata_security_doc - [68cdd4b] CVE-2013-1437 documentation fix DEBPKG:fixes/module_metadata_taint_fix - [bff978f] http://bugs.debian.org/722210 [rt.cpan.org #88576] untaint version,if needed,in Module::Metadata DEBPKG:fixes/IPC-SysV-spelling - http://bugs.debian.org/730558 [rt.cpan.org #86736] Fix spelling of IPC_CREAT in IPC-SysV documentation DEBPKG:fixes/fix-undef-source - Built under linux Compiled at Mar 27 2014 18:30:28 %ENV: PERL_MB_OPT="--install_base "/home/tange/perl5"" PERL_MM_OPT="INSTALL_BASE=/home/tange/perl5" @INC: /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl . 解决方法
This bug was fixed并将在5.22.0中,应该在5月份出来.它没有进入5.20.x系列.该补丁很简单,您应该能够将其应用于5.20.2或5.18.2代码库并重新编译.
正则表达式有同样的问题,它也在5.21中得到修复. 解决方法是在内存中没有2个gig字符串,这是一般的好习惯.如果从文件中读取它,可能以块的形式读取它并在每个块上使用索引. 如果您必须有一个2 gig字符串,请使用substr()以块为单位进行检查.不幸的是,这意味着你必须将字符串复制成碎片.下面的代码故意在循环外部初始化$substr,因此Perl不会多次重新分配内存. use v5.18; use strict; use warnings; sub big_index { my $ref = shift; my $match = shift; state $block_size = 2**29; my $strlen = length($$ref); my $matchlen = length($match); # No point in doing extra work if we don't need to. return index($$ref,$match) if $strlen < $block_size; my $substr = ''; my $offset = 0; for( my $offset = 0; $offset < $strlen; $offset += ($block_size - $matchlen - 1) ) { $substr = substr($$ref,$offset,$block_size); my $ret = index $substr,$match; return $ret + $offset if $ret != -1; } return -1; } (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |