
python – PDF bleed detection

Published: 2020-12-20 10:33:25 | Category: Python | Source: compiled from the web
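I am currently writing a small tool (Python + pyPdf) to test PDFs for printer conformity.

Alas, I am already stuck on the first task: detecting whether the PDF has at least 3 mm of "bleed" (a border around the pages where nothing is printed). I have already worked out that I cannot detect the bleed for the complete document, since there does not seem to be a document-wide setting. On each page, however, I can detect a total of five different boxes: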

> mediaBox
> bleedBox
> trimBox
> cropBox
> artBox

I read the pyPdf documentation about these boxes, but the only one I understood is the mediaBox, which seems to represent the overall page size (i.e. the paper).
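For reference, the five boxes can be collected into plain tuples as sketched below. This is a minimal sketch, not a definitive implementation: it assumes each box attribute behaves like a sequence of four numbers (lower-left x, lower-left y, upper-right x, upper-right y), which is how pyPdf's rectangle objects behave. The helper names `page_boxes` and `box_size` are made up for illustration, and newer releases of the library (pypdf) spell the attributes in lowercase (`mediabox`, `cropbox`, ...), so the names may need adjusting.

```python
# Attribute names as used by pyPdf page objects (see the list above).
BOX_NAMES = ("mediaBox", "cropBox", "bleedBox", "trimBox", "artBox")


def page_boxes(page):
    """Return {box name: (llx, lly, urx, ury)} for one page-like object.

    Assumption: every attribute in BOX_NAMES is a sequence of four numbers.
    """
    boxes = {}
    for name in BOX_NAMES:
        rect = getattr(page, name)
        boxes[name] = tuple(float(v) for v in rect)
    return boxes


def box_size(box):
    """Return (width, height) of a box in user space units."""
    llx, lly, urx, ury = box
    return (urx - llx, ury - lly)


# Usage with pyPdf (not executed here; "example.pdf" is a hypothetical file):
#   from pyPdf import PdfFileReader
#   reader = PdfFileReader(open("example.pdf", "rb"))
#   print(page_boxes(reader.getPage(0)))
```

Comparing the resulting tuples per page shows quickly whether a file distinguishes its boxes at all or reports the same rectangle five times.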

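The bleedBox quite obviously ought to define the bleed, but that does not always seem to be the case.

Another thing I noticed: in one example PDF, all of these boxes have exactly the same size on every page (implying no bleed at all), yet when I open the file there is a large amount of bleed. This leads me to think that the individual text elements have their own offsets.

So, obviously, simply calculating the bleed from the mediaBox and the bleedBox is not a viable option.

I would be more than delighted if someone could shed some light on what these boxes actually are and what I can conclude from them (e.g. whether one box is always smaller than another).

Bonus question: can someone tell me what exactly the "default user space unit" mentioned in the documentation is? I am fairly sure it refers to mm on my machine, but I would like to enforce mm everywhere.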

Solution

Quoting the PDF specification ISO 32000-1:2008, as published by Adobe:

14.11.2 Page Boundaries

14.11.2.1 General

A PDF page may be prepared either for a finished medium, such as a
sheet of paper, or as part of a prepress process in which the content
of the page is placed on an intermediate medium, such as film or an
imposed reproduction plate. In the latter case, it is important to
distinguish between the intermediate page and the finished page. The
intermediate page may often include additional production-related
content, such as bleeds or printer marks, that falls outside the
boundaries of the finished page. To handle such cases, a PDF page
may define as many as five separate boundaries to control various
aspects of the imaging process:

  • The media box defines the boundaries of the physical medium on which
    the page is to be printed. It may include any extended area
    surrounding the finished page for bleed, printing marks, or other such
    purposes. It may also include areas close to the edges of the medium
    that cannot be marked because of physical limitations of the output
    device. Content falling outside this boundary may safely be discarded
    without affecting the meaning of the PDF file.

  • The crop box defines the region to which the contents of the page
    shall be clipped (cropped) when displayed or printed. Unlike the other
    boxes, the crop box has no defined meaning in terms of physical page
    geometry or intended use; it merely imposes clipping on the page
    contents. However, in the absence of additional information (such as
    imposition instructions specified in a JDF or PJTF job ticket), the
    crop box determines how the page’s contents shall be positioned on the
    output medium. The default value is the page’s media box.

  • The bleed box (PDF 1.3) defines the region to which the contents of
    the page shall be clipped when output in a production environment.
    This may include any extra bleed area needed to accommodate the
    physical limitations of cutting, folding, and trimming equipment. The
    actual printed page may include printing marks that fall outside the
    bleed box. The default value is the page’s crop box.

  • The trim box (PDF 1.3) defines the intended dimensions of the
    finished page after trimming. It may be smaller than the media box to
    allow for production-related content, such as printing instructions,
    cut marks, or colour bars. The default value is the page’s crop box.

  • The art box (PDF 1.3) defines the extent of the page’s meaningful
    content (including potential white space) as intended by the page’s
    creator. The default value is the page’s crop box.

The page object dictionary specifies these boundaries in the MediaBox,
CropBox, BleedBox, TrimBox, and ArtBox entries, respectively (see
Table 30). All of them are rectangles expressed in default user space
units. The crop, bleed, trim, and art boxes shall not ordinarily
extend beyond the boundaries of the media box. If they do, they are
effectively reduced to their intersection with the media box. Figure
86 illustrates the relationships among these boundaries. (The crop box
is not shown in the figure because it has no defined relationship with
any of the other boundaries.)
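The default and clipping rules in the quoted passage can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: boxes are `(llx, lly, urx, ury)` tuples in user space units, `None` stands for an entry that is absent from the page dictionary, and the function names `intersect` and `effective_boxes` are invented for this example rather than taken from any library.

```python
def intersect(box, limit):
    """Clip rectangle `box` to rectangle `limit`; both are (llx, lly, urx, ury)."""
    return (
        max(box[0], limit[0]),
        max(box[1], limit[1]),
        min(box[2], limit[2]),
        min(box[3], limit[3]),
    )


def effective_boxes(media, crop=None, bleed=None, trim=None, art=None):
    """Apply the defaults quoted above to possibly missing boxes.

    - a missing crop box defaults to the media box;
    - missing bleed, trim, and art boxes default to the crop box;
    - a box that extends beyond the media box is reduced to its
      intersection with the media box.
    """
    crop = intersect(crop, media) if crop is not None else media

    def resolve(box):
        return intersect(box, media) if box is not None else crop

    return {
        "media": media,
        "crop": crop,
        "bleed": resolve(bleed),
        "trim": resolve(trim),
        "art": resolve(art),
    }


# A page that only sets the media box resolves to five identical rectangles.
letter = (0.0, 0.0, 612.0, 792.0)
only_media = effective_boxes(letter)
```

Under these rules a file that sets nothing but the MediaBox reports the same rectangle for all five boxes, which is consistent with the observation in the question that every box had the same size.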

This is followed by a nice figure (Figure 86 of the specification) that shows the boxes in relation to each other.

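The reasons why in many cases only the media box is set are:

> if the PDF is meant for electronic consumption (i.e. reading on a computer), the other boxes hardly matter; and
> even in a prepress context they are no longer as necessary as they used to be; see the article Pedro mentions in his comment.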

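Regarding your "bonus question": the user space unit is 1/72 inch by default. Since PDF 1.6, however, it can be changed to any (not necessarily integer) multiple of that size using the UserUnit entry in the page dictionary. Changing it in an existing PDF effectively scales the page, because the user space unit is the basic unit of the page's device-independent coordinate system. So unless you are prepared to update every command in the page description that refers to coordinates in order to preserve the page dimensions, you do not want to enforce a millimetre user space unit... ;)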
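Rather than changing the user space unit, the box values can simply be converted to millimetres when they are checked. The sketch below assumes boxes given as `(llx, lly, urx, ury)` tuples and a `user_unit` factor taken from the page's UserUnit entry (1.0 when the entry is absent). The function names and the small tolerance are illustrative choices, not part of pyPdf or of the specification.

```python
MM_PER_INCH = 25.4
DEFAULT_UNITS_PER_INCH = 72.0  # one default user space unit is 1/72 inch


def units_to_mm(value, user_unit=1.0):
    """Convert a length in user space units to millimetres."""
    return value * user_unit / DEFAULT_UNITS_PER_INCH * MM_PER_INCH


def bleed_margins_mm(trim, bleed, user_unit=1.0):
    """Return (left, bottom, right, top) distances in mm from trim box to bleed box."""
    margins = (
        trim[0] - bleed[0],  # left
        trim[1] - bleed[1],  # bottom
        bleed[2] - trim[2],  # right
        bleed[3] - trim[3],  # top
    )
    return tuple(units_to_mm(m, user_unit) for m in margins)


def has_min_bleed(trim, bleed, min_mm=3.0, user_unit=1.0, tolerance_mm=0.01):
    """True if the bleed box exceeds the trim box by at least min_mm on every side.

    tolerance_mm absorbs rounding in the stored box values (an assumption
    made for this sketch).
    """
    margins = bleed_margins_mm(trim, bleed, user_unit)
    return all(m >= min_mm - tolerance_mm for m in margins)


# Example: 9 units of bleed on every side is 3.175 mm at the default unit.
trim_box = (9.0, 9.0, 603.0, 783.0)
bleed_box = (0.0, 0.0, 612.0, 792.0)
enough = has_min_bleed(trim_box, bleed_box)
```

This only inspects the declared boxes. As the question notes, a file may report identical boxes even though its content visibly bleeds, so a check like this says nothing about where the page content actually lies.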
