
Trying to read a PDF, parse the data, and write the desired data to a spreadsheet using Perl on Linux

Published: 2020-12-13 22:56:46 | Category: Linux | Source: compiled from the web
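I'm trying to extract data from my credit card statements and enter it into a spreadsheet for tax purposes. What I have so far involves several steps, but I'm relatively new to Perl and am working from what I know. Here are the two separate scripts I've written so far: one reads all the data from the PDF and writes it to a text file, and the other parses that text (not perfectly) and writes the result to another text file. Then I'd like to either create a CSV file to import into a spreadsheet or write directly to a spreadsheet. I'd prefer to do this with one script, but two or three would be fine.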

First script:

#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;

my $file = "/home/cd/Documents/Jan14.pdf";
my $pdf  = CAM::PDF->new($file) or die "Could not open '$file': $CAM::PDF::errstr\n";

my $doc = "";
my $filename = 'report.txt';
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";

# Append the extracted text of every page (pages are numbered from 1)
for my $i (1 .. $pdf->numPages()) {
    $doc .= $pdf->getPageText($i);
}
print $fh " $doc\n";
close $fh;
print "done\n";

Second script:

#!/usr/bin/perl
use strict;
use warnings;

# Slurp all of report.txt into one string; local $/ limits
# 'slurp' mode to this block instead of the whole script.
my $file = do {
    local $/;
    open(my $in, '<', 'report.txt') or die "Could not open report.txt: $!";
    <$in>;
};

my $filename = 'data.txt';
# '>>' appends; a mode-less open($fh, $filename) would open data.txt for reading
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";

# First block: from "Date of Transaction" up to the first "CONTINUED"
if (my ($stuff_that_interests_me) = $file =~ m/(Date of Transaction.*?CONTINUED)/s) {
    print "$stuff_that_interests_me\n";
    print $fh " $stuff_that_interests_me\n";
}

# Second block: from "Page 2 " up to the first "TRANSACTIONS THIS CYCLE"
if (my ($other_stuff_that_interests_me) = $file =~ m/(Page 2 .*?TRANSACTIONS THIS CYCLE)/s) {
    print "$other_stuff_that_interests_me\n";
    print $fh " $other_stuff_that_interests_me\n";
}

close $fh or die "Could not close '$filename': $!";
print "done\n";
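One way to get closer to spreadsheet rows is to split an extracted block into individual transactions. This is only a sketch under a stated assumption: it supposes each transaction sits on one line as transaction date, posting date, description and amount. That layout, the sample lines and the `parse_transactions` helper are all hypothetical, and the regex would need adjusting to what report.txt actually contains.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL line layout (an assumption, not taken from a real statement):
#   MM/DD  MM/DD  DESCRIPTION ...  AMOUNT
my $line_re = qr{
    ^\s*
    (\d{2}/\d{2}) \s+          # transaction date
    (\d{2}/\d{2}) \s+          # posting date
    (.+?) \s+                  # description, lazy so it stops before the amount
    (-?[\d,]+\.\d{2})          # amount with optional minus sign and thousands commas
    \s*$
}x;

# Hypothetical helper: turn a block of text into [date, posted, description, amount] rows.
sub parse_transactions {
    my ($text) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        next unless $line =~ $line_re;
        my ($date, $posted, $desc, $amount) = ($1, $2, $3, $4);
        $amount =~ tr/,//d;    # drop thousands separators so the value is numeric
        push @rows, [ $date, $posted, $desc, $amount ];
    }
    return @rows;
}

# Invented sample text; lines that do not match the layout are skipped.
my $sample = <<'END';
Date of Transaction  Posting Date  Description  Amount
01/05 01/06 GROCERY STORE #123 45.67
01/09 01/10 HARDWARE SUPPLY 1,204.50
CONTINUED
END

for my $row (parse_transactions($sample)) {
    print join('|', @$row), "\n";
}
```

The resulting array of rows is what a CSV or XLSX writer would consume.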

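Update:
I found a module on CPAN (CAM::PDF) that is perfect for what I'm trying to do... it even presents the data in a format I can use more easily for the spreadsheet. However, I haven't figured out how to print its output to a .txt file. Any suggestions?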

#!/usr/bin/perl -w

package main;

use warnings;
use strict;
use CAM::PDF;
use Getopt::Long;
use Pod::Usage;
use English qw(-no_match_vars);

our $VERSION = '1.60';

my %opts = (
   density   => undef,
   xdensity  => undef,
   ydensity  => undef,
   check     => 0,
   renderer  => 'CAM::PDF::Renderer::Dump',
   verbose   => 0,
   help      => 0,
   version   => 0,
);

Getopt::Long::Configure('bundling');
# GetOptions needs references so it can store the parsed values in %opts
GetOptions(
   'r|renderer=s' => \$opts{renderer},
   'd|density=f'  => \$opts{density},
   'x|xdensity=f' => \$opts{xdensity},
   'y|ydensity=f' => \$opts{ydensity},
   'c|check'      => \$opts{check},
   'v|verbose'    => \$opts{verbose},
   'h|help'       => \$opts{help},
   'V|version'    => \$opts{version},
) or pod2usage(1);
if ($opts{help})
{
   pod2usage(-exitstatus => 0,-verbose => 2);
}
if ($opts{version})
{
   print "CAM::PDF v$CAM::PDF::VERSION\n";
   exit 0;
}

if (defined $opts{density})
{
   $opts{xdensity} = $opts{ydensity} = $opts{density};
}
if (defined $opts{xdensity} || defined $opts{ydensity})
{
   if (!eval "require $opts{renderer}")  ## no critic (StringyEval)
   {
      die $EVAL_ERROR;
   }
   if (defined $opts{xdensity})
   {
      no strict 'refs'; ## no critic(ProhibitNoStrict)
      my $varname = $opts{renderer}.'::xdensity';
      ${$varname} = $opts{xdensity};
   }
   if (defined $opts{ydensity})
   {
      no strict 'refs'; ## no critic(ProhibitNoStrict)
      my $varname = $opts{renderer}.'::ydensity';
      ${$varname} = $opts{ydensity};
   }
}

if (@ARGV < 1)
{
   pod2usage(1);
}

my $file = shift;
my $pagelist = shift;

my $doc = CAM::PDF->new($file) || die "$CAM::PDF::errstr\n";

foreach my $p ($doc->rangeToArray(1,$doc->numPages(),$pagelist))
{
   my $tree = $doc->getPageContentTree($p,$opts{verbose});
   if ($opts{check})
   {
      print "Checking page $p\n";
      if (!$tree->validate())
      {
         print "  Failed\n";
      }
   }
   $tree->render($opts{renderer});
}
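The script opens no output file of its own; assuming the renderer prints its results to STDOUT, the simplest way to get a .txt file is shell redirection, e.g. `perl yourscript.pl Jan14.pdf > report.txt` (script name assumed). To do it from inside Perl instead, STDOUT can be temporarily reopened on a file. A minimal sketch: `with_stdout_to_file` is a hypothetical helper, and the two `print` calls stand in for the `foreach` render loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $code with STDOUT redirected into $path,
# then restore the original STDOUT (even if $code dies).
sub with_stdout_to_file {
    my ($path, $code) = @_;
    open(my $saved, '>&', \*STDOUT) or die "Cannot dup STDOUT: $!";
    open(STDOUT, '>', $path)        or die "Cannot open '$path': $!";
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    close(STDOUT);                  # flushes everything written to $path
    open(STDOUT, '>&', $saved)      or die "Cannot restore STDOUT: $!";
    die $err unless $ok;
}

# Stand-in for the render loop that calls $tree->render($opts{renderer})
with_stdout_to_file('report.txt', sub {
    print "page 1 text\n";
    print "page 2 text\n";
});
print "done\n";    # goes to the terminal again
```

In the script above, the whole `foreach` loop would go inside the `sub { ... }`.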

Solution

I’d like to either create a csv file to import into a spreadsheet or
write directly to a spreadsheet.

You can write directly to a spreadsheet; have a look at Excel::Writer::XLSX.
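A minimal Excel::Writer::XLSX sketch. The rows, the file name `statement.xlsx` and the worksheet name are placeholders; in practice the rows would come from the parsing step.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Excel::Writer::XLSX;

# Placeholder rows; real ones would come from the parsed statement text.
my @rows = (
    [ '01/05', 'GROCERY STORE #123', 45.67 ],
    [ '01/09', 'HARDWARE SUPPLY',    1204.50 ],
);

my $workbook = Excel::Writer::XLSX->new('statement.xlsx')
    or die "Cannot create statement.xlsx: $!";
my $worksheet = $workbook->add_worksheet('Jan14');

# Header row, then one spreadsheet row per transaction (row/column indexes start at 0)
$worksheet->write_row(0, 0, [ 'Date', 'Description', 'Amount' ]);
for my $i (0 .. $#rows) {
    $worksheet->write_row($i + 1, 0, $rows[$i]);
}

$workbook->close();    # the file is only complete after close()
```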

If you want to create a CSV file instead, try Text::CSV or Text::CSV_XS.
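A minimal Text::CSV sketch. Text::CSV uses Text::CSV_XS as its backend when that is installed and falls back to a pure-Perl implementation otherwise, so the same code works either way. The rows and the file name `statement.csv` are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Placeholder rows; real ones would come from the parsed statement text.
my @rows = (
    [ 'Date',  'Description',        'Amount'  ],
    [ '01/05', 'GROCERY STORE #123', '45.67'   ],
    [ '01/09', 'ACME, INC.',         '1204.50' ],    # embedded comma forces quoting
);

# binary => 1 permits non-ASCII characters; eol appends a line ending to each row
my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open(my $fh, '>:encoding(UTF-8)', 'statement.csv') or die "Cannot open statement.csv: $!";
$csv->print($fh, $_) for @rows;
close($fh) or die "Cannot close statement.csv: $!";
```

Using the module rather than `join(',', ...)` matters here because merchant names can contain commas or quotes.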


