加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 百科 > 正文

ios – 从Objective-C中的PDF中提取可编辑字段

发布时间:2020-12-14 17:56:44 所属栏目:百科 来源:网络整理
导读:我一直在研究在我的iOS应用程序中使用PDF.我已经想出了一些难题,比如扫描运算符并在UIWebView中显示PDF.但是,我真正需要做的是识别PDF文档中的可编辑字段. 理想情况下,我希望能够直接与字段进行交互,但这听起来非常困难,而且不是明显的第一步.我已经与可以通
我一直在研究在我的iOS应用程序中使用PDF.我已经想出了一些难题,比如扫描运算符并在UIWebView中显示PDF.但是,我真正需要做的是识别PDF文档中的可编辑字段.

理想情况下,我希望能够直接与字段进行交互,但这听起来非常困难,而且不是明显的第一步.我已经与可以通过这种方式操作PDF的Windows服务连接,并且可以确定可识别的字段,在表单视图中从用户收集字段数据,以及将该数据发送回服务器.问题是我无法看到如何识别字段.我正在与政府发行的PDF格式(如I-9和W-4)进行交互,因此我无法控制PDF的创建或字段的命名.这就是我需要动态提取它们的原因.任何帮助和/或参考将不胜感激.

我正在使用Apple的Quatrz 2D编程指南中的[此参考](https://developer.apple.com/library/mac/#documentation/graphicsimaging/conceptual/drawingwithquartz2d/dq_pdf_scan/dq_pdf_scan.html“PDF文档解析”)扫描PDF时触发运算符回调,但这无法帮助我找到可编辑的字段.

我也只是加载带有PDF数据的UIWebView以显示给用户.

[_webView loadData:decodedData MIMEType:@"application/pdf" textEncodingName:@"utf-8" baseURL:nil];

更新:

我构建了一个PDF Helper类(如下所示)来遍历目录中所有可能的对象类型.最初我没有处理数组中的嵌套字典,所以我没有看到表单字段.一旦我修复了,我意识到有必须考虑的父引用,以避免循环递归调用,这将启动无限循环.下面的代码显示了文档目录中的大量信息.现在我只需要解析它以隔离我需要的表单字段.

PDFHelper.h

#import <Foundation/Foundation.h>

id selfClass;

@interface PDFHelper : NSObject

@property (nonatomic,strong) NSData *pdfData;
@property (nonatomic,strong) NSMutableDictionary *pdfDict;
@property (nonatomic) int catalogLevel;


-(NSArray *) copyPDFArray:(CGPDFArrayRef)arr referencingDictionary:(CGPDFDictionaryRef)dict referencingKey:(const char *)key;
-(NSArray *) getFormFields;
-(CGPDFDictionaryRef) getDocumentCatalog;

@end

PDFHelper.m

#import "PDFHelper.h"
#import "FileHelpers.h"
#import "Log.h"

@implementation PDFHelper

@synthesize pdfData = _pdfData;
@synthesize pdfDict = _pdfDict;
@synthesize catalogLevel = _catalogLevel;

-(id)init
{
    self = [super init];
    if(self)
    {
        selfClass = self;
        _pdfDict = [[NSMutableDictionary alloc] init];
        _catalogLevel = 1;
    }

    return self;
}

-(NSArray *) getFormFields
{
    CGPDFDictionaryRef acroForm = NULL;
    if (CGPDFDictionaryGetDictionary([self getPdfDocDictionary],"AcroForm",&acroForm))
        CGPDFDictionaryApplyFunction(acroForm,getDictionaryObjects,acroForm);
    return [_pdfDict objectForKey:@"XFA"];
}

-(CGPDFDictionaryRef) getDocumentCatalog
{
    CGPDFDictionaryRef docCatalog = [self getPdfDocDictionary];
    CGPDFDictionaryApplyFunction(docCatalog,docCatalog);
    return docCatalog;
}

-(CGPDFDictionaryRef) getPdfDocDictionary
{
    NSURL *pdf = [[NSURL alloc] initFileURLWithPath:[FileHelpers pathInLibraryDirectory:@"file.pdf"]];

    [_pdfData writeToFile:[pdf path] atomically:YES];

    CGPDFDocumentRef pdfDocument = CGPDFDocumentCreateWithURL((__bridge CFURLRef)pdf);
    CGPDFDictionaryRef returnDict = CGPDFDocumentGetCatalog(pdfDocument);
    return returnDict;
}

void getDictionaryObjects (const char *key,CGPDFObjectRef object,void *info) {

    NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"key: %s",key]];
    for (int i = 0; i < [selfClass catalogLevel]; i++)
        logString = [NSString stringWithFormat:@"-%@",logString];
    [Log LogDebug:logString];

    CGPDFDictionaryRef contentDict = (CGPDFDictionaryRef)info;

    CGPDFObjectType type = CGPDFObjectGetType(object);
    switch (type) {
        case kCGPDFObjectTypeNull: {            
                [Log LogDebug:[NSString stringWithFormat:@"*****pdf null value"]];
            break;
        }
        case kCGPDFObjectTypeBoolean: {
            CGPDFBoolean objectBoolean;
            if (CGPDFObjectGetValue(object,kCGPDFObjectTypeBoolean,&objectBoolean)) {
                NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf boolean value: %@",[NSNumber numberWithBool:objectBoolean]]];
                for (int i = 0; i < [selfClass catalogLevel]; i++)
                    logString = [NSString stringWithFormat:@"-%@",logString];
                [Log LogDebug:logString];
                [[selfClass pdfDict] setObject:[NSNumber numberWithBool:objectBoolean]
                                        forKey:[NSString stringWithCString:key encoding:NSUTF8StringEncoding]];
            }
            break;
        }
        case kCGPDFObjectTypeInteger: {
            CGPDFInteger objectInteger;
            if (CGPDFObjectGetValue(object,kCGPDFObjectTypeInteger,&objectInteger)) {
                NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf integer value: %ld",(long int)objectInteger]];
                for (int i = 0; i < [selfClass catalogLevel]; i++)
                    logString = [NSString stringWithFormat:@"-%@",logString];
                [Log LogDebug:logString];
                [[selfClass pdfDict] setObject:[NSNumber numberWithInt:objectInteger]
                                        forKey:[NSString stringWithCString:key encoding:NSUTF8StringEncoding]];
            }
            break;
        }
        case kCGPDFObjectTypeReal: {
            CGPDFReal objectReal;
            if (CGPDFObjectGetValue(object,kCGPDFObjectTypeReal,&objectReal)) {
                NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf real value: %ld",(long int)objectReal]];
                for (int i = 0; i < [selfClass catalogLevel]; i++)
                    logString = [NSString stringWithFormat:@"-%@",logString];
                [Log LogDebug:logString];
                [[selfClass pdfDict] setObject:[NSNumber numberWithInt:objectReal]
                                        forKey:[NSString stringWithCString:key encoding:NSUTF8StringEncoding]];
            }
            break;
        }
        case kCGPDFObjectTypeName: {
            const char *name;
            if (CGPDFDictionaryGetName(contentDict,key,&name))
            {
                NSString *dictName = [[NSString alloc] initWithCString:name encoding:NSUTF8StringEncoding];
                if (dictName)
                {
                    NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf name value: %@",dictName]];
                    for (int i = 0; i < [selfClass catalogLevel]; i++)
                        logString = [NSString stringWithFormat:@"-%@",logString];
                    [Log LogDebug:logString];
                    [[selfClass pdfDict] setObject:dictName
                                            forKey:[NSString stringWithCString:key encoding:NSUTF8StringEncoding]];
                }
            }
            break;
        }
        case kCGPDFObjectTypeString: {
            CGPDFStringRef objectString;
            if (CGPDFObjectGetValue(object,kCGPDFObjectTypeString,&objectString)) {
                NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf string value: %@",(__bridge NSString *)CGPDFStringCopyTextString(objectString)]];
                for (int i = 0; i < [selfClass catalogLevel]; i++)
                    logString = [NSString stringWithFormat:@"-%@",logString];
                [Log LogDebug:logString];
                [[selfClass pdfDict] setObject:(__bridge NSString *)CGPDFStringCopyTextString(objectString)
                                        forKey:[NSString stringWithCString:key encoding:NSUTF8StringEncoding]];
            }
            break;
        }
        case kCGPDFObjectTypeArray: {
            CGPDFArrayRef objectArray;
            if (CGPDFObjectGetValue(object,kCGPDFObjectTypeArray,&objectArray)) {
                NSArray *myArray=[selfClass copyPDFArray:objectArray referencingDictionary:contentDict referencingKey:key];
                [[selfClass pdfDict] setObject:myArray
                                        forKey:[NSString stringWithCString:key encoding:NSUTF8StringEncoding]];

            }
            break;
        }
        case kCGPDFObjectTypeDictionary: {
            CGPDFDictionaryRef objectDictionary;
            if (CGPDFObjectGetValue(object,kCGPDFObjectTypeDictionary,&objectDictionary)) {
                NSString *logString = @"Found dictionary";
                for (int i = 0; i < [selfClass catalogLevel]; i++)
                    logString = [NSString stringWithFormat:@"-%@",logString];
                //[Log LogDebug:logString];
                NSString *keyCheck = [[NSString alloc] initWithUTF8String:key];
                if (![keyCheck isEqualToString:@"Parent"] && ![keyCheck isEqualToString:@"P"])
                {
                    [selfClass setCatalogLevel:[selfClass catalogLevel] + 1];
                    CGPDFDictionaryApplyFunction(objectDictionary,objectDictionary);
                    [selfClass setCatalogLevel:[selfClass catalogLevel] - 1];
                }
            }
            break;
        }
        case kCGPDFObjectTypeStream: {
            CGPDFStreamRef objectStream;
            if (CGPDFObjectGetValue(object,kCGPDFObjectTypeStream,&objectStream)) {

                CGPDFDictionaryRef dict = CGPDFStreamGetDictionary( objectStream );

                CGPDFDataFormat fmt = CGPDFDataFormatRaw;
                CFDataRef streamData = CGPDFStreamCopyData(objectStream,&fmt);
                NSData *data = [[NSData alloc] initWithData:(__bridge NSData *)(streamData)];
                [data writeToFile:[FileHelpers pathInDocumentDirectory:@"data.dat"] atomically:YES];
                NSString *dataString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
                //if (!dataString) {
                //    dataString = [[NSString alloc] initWithData:(__bridge NSData *)(streamData) encoding:NSUTF16StringEncoding];
               // }

                NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf stream length: %ld - %@",(long int)CFDataGetLength( streamData ),dataString]];

                for (int i = 0; i < [selfClass catalogLevel]; i++)
                    logString = [NSString stringWithFormat:@"-%@",logString];
                [Log LogDebug:logString];

                NSString *keyCheck = [[NSString alloc] initWithUTF8String:key];
                if( dict && ![keyCheck isEqualToString:@"Parent"] && ![keyCheck isEqualToString:@"P"])
                {
                    [selfClass setCatalogLevel:[selfClass catalogLevel] + 1];
                    CGPDFDictionaryApplyFunction(dict,dict);
                    [selfClass setCatalogLevel:[selfClass catalogLevel] - 1];
                }
            }
        }
    }
}

- (NSArray *)copyPDFArray:(CGPDFArrayRef)arr referencingDictionary:(CGPDFDictionaryRef)dict referencingKey:(const char *)key
{
    int i = 0;
    NSMutableArray *temp = [[NSMutableArray alloc] init];

    NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf array count: %zu",CGPDFArrayGetCount(arr)]];
    for (int i = 0; i < [selfClass catalogLevel]; i++)
        logString = [NSString stringWithFormat:@"-%@",logString];
    [Log LogDebug:logString];

    for(i=0; i<CGPDFArrayGetCount(arr); i++){
        CGPDFObjectRef object;
        CGPDFArrayGetObject(arr,i,&object);
        CGPDFObjectType type = CGPDFObjectGetType(object);
        switch(type){
            case kCGPDFObjectTypeNull: {
                NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf array null(%d)",i]];
                for (int i = 0; i < [selfClass catalogLevel]; i++)
                    logString = [NSString stringWithFormat:@"-%@",logString];
                [Log LogDebug:logString];
                break;
            }
            case kCGPDFObjectTypeBoolean: {
                CGPDFBoolean objectBool;
                if (CGPDFObjectGetValue(object,&objectBool)) {
                    NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf array boolean value(%d): %@",[NSNumber numberWithBool:objectBool]]];
                    for (int i = 0; i < [selfClass catalogLevel]; i++)
                        logString = [NSString stringWithFormat:@"-%@",logString];
                    [Log LogDebug:logString];
                    [temp addObject:[NSNumber numberWithBool:objectBool]];
                }
                break;
            }
            case kCGPDFObjectTypeInteger: {
                CGPDFInteger objectInteger;
                if (CGPDFObjectGetValue(object,&objectInteger)) {
                    NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf array integer value(%d): %ld",(long int)objectInteger]];
                    for (int i = 0; i < [selfClass catalogLevel]; i++)
                        logString = [NSString stringWithFormat:@"-%@",logString];
                    [Log LogDebug:logString];
                    [temp addObject:[NSNumber numberWithInt:objectInteger]];
                }
                break;
            }
            case kCGPDFObjectTypeReal:
            {
                CGPDFReal objectReal;
                if (CGPDFObjectGetValue(object,&objectReal))
                {
                    NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf array real(%d): %ld",(long int)objectReal]];
                    for (int i = 0; i < [selfClass catalogLevel]; i++)
                        logString = [NSString stringWithFormat:@"-%@",logString];
                    [Log LogDebug:logString];
                    [temp addObject:[NSNumber numberWithInt:objectReal]];
                }
                break;
            }
            case kCGPDFObjectTypeName:
            {
                const char *name;
                if (CGPDFDictionaryGetName(dict,&name))
                {
                    NSString *dictName = [[NSString alloc] initWithCString:name encoding:NSUTF8StringEncoding];

                    if (dictName)
                    {
                        NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf array name value(%d): %@",dictName]];
                        for (int i = 0; i < [selfClass catalogLevel]; i++)
                            logString = [NSString stringWithFormat:@"-%@",logString];
                        [Log LogDebug:logString];
                        [[selfClass pdfDict] setObject:dictName
                                                forKey:[NSString stringWithCString:key encoding:NSUTF8StringEncoding]];
                    }
                }
                break;
            }
            case kCGPDFObjectTypeString:
            {
                CGPDFStringRef objectString;
                if (CGPDFObjectGetValue(object,&objectString))
                {
                    NSString *tempStr = (__bridge NSString *)CGPDFStringCopyTextString(objectString);
                    NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf array string(%d): %@",tempStr]];
                    for (int i = 0; i < [selfClass catalogLevel]; i++)
                        logString = [NSString stringWithFormat:@"-%@",logString];
                    [Log LogDebug:logString];
                    [temp addObject:tempStr];
                }
                break;
            }
            case kCGPDFObjectTypeArray :
            {
                CGPDFArrayRef objectArray;
                if (CGPDFObjectGetValue(object,&objectArray))
                {
                    NSArray *tempArr = [selfClass copyPDFArray:objectArray referencingDictionary:dict referencingKey:key];
                    [temp addObject:tempArr];
                }
                break;
            }
            case kCGPDFObjectTypeDictionary :
            {
                CGPDFDictionaryRef objectDict;
                NSString *keyCheck = [[NSString alloc] initWithUTF8String:key];
                if (CGPDFObjectGetValue(object,&objectDict) && ![keyCheck isEqualToString:@"Parent"] && ![keyCheck isEqualToString:@"P"])
                {
                    [selfClass setCatalogLevel:[selfClass catalogLevel] + 1];
                    CGPDFDictionaryApplyFunction( objectDict,objectDict);
                    [selfClass setCatalogLevel:[selfClass catalogLevel] - 1];
                }
                break;
            }
            case kCGPDFObjectTypeStream :
            {
                CGPDFStreamRef objectStream;
                if (CGPDFObjectGetValue(object,&objectStream))
                {
                    CGPDFDictionaryRef streamDict = CGPDFStreamGetDictionary( objectStream );
                    CGPDFDataFormat fmt = CGPDFDataFormatRaw;
                    CFDataRef streamData = CGPDFStreamCopyData(objectStream,&fmt);
                    NSString *dataString = [[NSString alloc] initWithData:(__bridge NSData *)(streamData) encoding:NSUTF8StringEncoding];

                    NSString *logString = [[NSString alloc] initWithString:[NSString stringWithFormat:@"pdf array stream length: (%d): %ld - %@",dataString]];

                    for (int i = 0; i < [selfClass catalogLevel]; i++)
                        logString = [NSString stringWithFormat:@"-%@",logString];
                    [Log LogDebug:logString];


                    NSString *keyCheck = [[NSString alloc] initWithUTF8String:key];
                    if( streamDict && ![keyCheck isEqualToString:@"Parent"] && ![keyCheck isEqualToString:@"P"])
                    {
                        [selfClass setCatalogLevel:[selfClass catalogLevel] + 1];
                        CGPDFDictionaryApplyFunction( streamDict,streamDict );
                        [selfClass setCatalogLevel:[selfClass catalogLevel] - 1];
                    }
                }
            }

        }
    }
    return temp;
}

@end

解决方法

“可编辑字段”是指可以使用Acrobat或Adobe Reader填写的表单元素的类型?

这些字段不是实际页面描述的一部分.如果查看PDF规范文档,您将在第12.7章中找到“交互式表单”的说明,该说明解释了文档的字段字典是从文档目录中名为“AcroForm”的元素开始存储的.

据我所知,iOS确实允许您访问文档目录,因此您必须在该目录字典中找到“AcroForm”字段,然后进入字段字典结构以收集所需的信息.完整文档中的所有字段都以分层方式存储在此处.

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读