<?xml version="1.0" encoding="utf-8"?>
|
|
<!DOCTYPE book [
|
|
<!ENTITY ldquo "“">
|
|
<!ENTITY rdquo "”">
|
|
<!ENTITY mdash "—">
|
|
<!ENTITY ndash "–">
|
|
<!ENTITY nbsp " ">
|
|
<!ENTITY swversion "4.0.1">
|
|
<!ENTITY lastreleased "January 17, 2013">
|
|
]>
|
|
<book>
|
|
<bookinfo>
|
|
<title>QPDF Manual</title>
|
|
<subtitle>For QPDF Version &swversion;, &lastreleased;</subtitle>
|
|
<author>
|
|
<firstname>Jay</firstname><surname>Berkenbilt</surname>
|
|
</author>
|
|
<copyright>
|
|
<year>2005–2013</year>
|
|
<holder>Jay Berkenbilt</holder>
|
|
</copyright>
|
|
</bookinfo>
|
|
<preface id="acknowledgments">
|
|
<title>General Information</title>
|
|
<para>
|
|
QPDF is a program that does structural, content-preserving
|
|
transformations on PDF files. QPDF's website is located at <ulink
|
|
url="http://qpdf.sourceforge.net/">http://qpdf.sourceforge.net/</ulink>.
|
|
QPDF's source code is hosted on GitHub at <ulink
|
|
url="https://github.com/qpdf/qpdf">https://github.com/qpdf/qpdf</ulink>.
|
|
</para>
|
|
<para>
|
|
QPDF has been released under the terms of <ulink
|
|
url="http://www.opensource.org/licenses/artistic-license-2.0.php">Version
|
|
2.0 of the Artistic License</ulink>, a copy of which appears in the
|
|
file <filename>Artistic-2.0</filename> in the source distribution.
|
|
</para>
|
|
<para>
|
|
QPDF was originally created in 2001 and modified periodically
|
|
between 2001 and 2005 during my employment at <ulink
|
|
url="http://www.apexcovantage.com">Apex CoVantage</ulink>. Upon my
|
|
departure from Apex, the company graciously allowed me to take
|
|
ownership of the software and continue maintaining it as an open
|
|
source project, a decision for which I am very grateful. I have
|
|
made considerable enhancements to it since that time. I feel
|
|
fortunate to have worked for people who would make such a decision.
|
|
This work would not have been possible without their support.
|
|
</para>
|
|
</preface>
|
|
<chapter id="ref.overview">
|
|
<title>What is QPDF?</title>
|
|
<para>
|
|
QPDF is a program that does structural, content-preserving
|
|
transformations on PDF files. It could have been called something
|
|
like <emphasis>pdf-to-pdf</emphasis>. It also provides many useful
|
|
capabilities for developers of PDF-producing software and for people
who just want to look at the innards of PDF files to learn more
about how they work.
|
|
</para>
|
|
<para>
|
|
With QPDF, it is possible to copy objects from one PDF file into
|
|
another and to manipulate the list of pages in a PDF file. This
|
|
makes it possible to merge and split PDF files. The QPDF library
|
|
also makes it possible for you to create PDF files from scratch.
|
|
In this mode, you are responsible for supplying all the contents of
|
|
the file, while the QPDF library takes care of all the syntactical
representation of the objects, creation of cross-reference tables
|
|
and, if you use them, object streams, encryption, linearization,
|
|
and other syntactic details. You are still responsible for
|
|
generating PDF content on your own.
|
|
</para>
|
|
<para>
|
|
QPDF has been designed with very few external dependencies, and it
|
|
is intentionally very lightweight. QPDF is
|
|
<emphasis>not</emphasis> a PDF content creation library, a PDF
|
|
viewer, or a program capable of converting PDF into other formats.
|
|
In particular, QPDF knows nothing about the semantics of PDF
|
|
content streams. If you are looking for something that can do
|
|
that, you should look elsewhere. However, once you have a valid
|
|
PDF file, QPDF can be used to transform that file in ways that perhaps
your original PDF creation software can't handle. For example, many
|
|
programs generate simple PDF files but can't password-protect them,
|
|
web-optimize them, or perform other transformations of that type.
|
|
</para>
|
|
</chapter>
|
|
<chapter id="ref.installing">
|
|
<title>Building and Installing QPDF</title>
|
|
<para>
|
|
This chapter describes how to build and install qpdf. Please see
|
|
also the <filename>README</filename> and
|
|
<filename>INSTALL</filename> files in the source distribution.
|
|
</para>
|
|
<sect1 id="ref.prerequisites">
|
|
<title>System Requirements</title>
|
|
<para>
|
|
The qpdf package has relatively few external dependencies. In
|
|
order to build qpdf, the following packages are required:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
zlib: <ulink url="http://www.zlib.net/">http://www.zlib.net/</ulink>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
pcre: <ulink url="http://www.pcre.org/">http://www.pcre.org/</ulink>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
GNU make 3.81 or newer: <ulink url="http://www.gnu.org/software/make">http://www.gnu.org/software/make</ulink>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
perl version 5.8 or newer:
|
|
<ulink url="http://www.perl.org/">http://www.perl.org/</ulink>;
|
|
required for <command>fix-qdf</command> and the test suite.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
GNU diffutils (any version): <ulink
|
|
url="http://www.gnu.org/software/diffutils/">http://www.gnu.org/software/diffutils/</ulink>
|
|
is required to run the test suite. Note that this is the
|
|
version of diff present on virtually all GNU/Linux systems.
|
|
This is required because the test suite uses <command>diff
|
|
-u</command>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A C++ compiler that works well with STL and has the <type>long
|
|
long</type> type. Most modern C++ compilers should fit the
|
|
bill fine. QPDF is tested with gcc and Microsoft Visual C++.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Part of qpdf's test suite does comparisons of the contents of PDF
files by converting them to images and comparing the images. The
|
|
image comparison tests are disabled by default. Those tests are
|
|
not required for determining correctness of a qpdf build if you
|
|
have not modified the code since the test suite also contains
|
|
expected output files that are compared literally. The image
|
|
comparison tests provide an extra check to make sure that any
|
|
content transformations don't break the rendering of pages.
|
|
Transformations that affect the content streams themselves are off
|
|
by default and are only provided to help developers look into the
|
|
contents of PDF files. If you are making deep changes to the
|
|
library that cause changes in the contents of the files that qpdf
|
|
generates, then you should enable the image comparison tests.
|
|
Enable them by running <command>configure</command> with the
<option>--enable-test-compare-images</option> flag. If you enable
this, the following additional packages are required by the
test suite. Note that in no case are these items required to use
|
|
qpdf.
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
libtiff: <ulink url="http://www.remotesensing.org/libtiff/">http://www.remotesensing.org/libtiff/</ulink>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
GhostScript version 8.60 or newer: <ulink
|
|
url="http://www.ghostscript.com">http://www.ghostscript.com</ulink>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
If you do not enable this, then you do not need to have libtiff and
GhostScript.
|
|
</para>
|
|
<para>
|
|
If Adobe Reader is installed as <command>acroread</command>, some
|
|
additional test cases will be enabled. These test cases simply
|
|
verify that Adobe Reader can open the files that qpdf creates.
|
|
They require version 8.0 or newer to pass. However, in order to
|
|
avoid having qpdf depend on non-free (as in liberty) software, the
|
|
test suite will still pass without Adobe Reader, and the test
|
|
suite still exercises the full functionality of the software.
|
|
</para>
|
|
<para>
|
|
Pre-built documentation is distributed with qpdf, so you should
|
|
generally not need to rebuild the documentation. In order to
|
|
build the documentation from its docbook sources, you need the
|
|
docbook XML style sheets (<ulink
|
|
url="http://downloads.sourceforge.net/docbook/">http://downloads.sourceforge.net/docbook/</ulink>).
|
|
To build the PDF version of the documentation, you need Apache fop
|
|
(<ulink
|
|
url="http://xml.apache.org/fop/">http://xml.apache.org/fop/</ulink>)
|
|
version 0.94 or higher.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.building">
|
|
<title>Build Instructions</title>
|
|
<para>
|
|
Building qpdf on UNIX is generally just a matter of running
|
|
<programlisting>./configure
make
</programlisting>
|
|
You can also run <command>make check</command> to run the test
|
|
suite and <command>make install</command> to install. Please run
|
|
<command>./configure --help</command> for options on what can be
|
|
configured. You can also set the value of
|
|
<varname>DESTDIR</varname> during installation to install to a
|
|
temporary location, as is common with many open source packages.
|
|
Please see also the <filename>README</filename> and
|
|
<filename>INSTALL</filename> files in the source distribution.
|
|
</para>
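<para>
As a rough sketch of a typical build followed by a staged install,
the sequence might look like the following. The installation
prefix and the staging directory are illustrative assumptions, not
values required by qpdf; <option>--prefix</option> is the standard
<application>autoconf</application> option for choosing the
installation prefix.
<programlisting>./configure --prefix=/usr/local
make
make check
make install DESTDIR=/tmp/qpdf-staging
</programlisting>
With these settings, the files would be staged under
<filename>/tmp/qpdf-staging/usr/local</filename> rather than being
written directly to <filename>/usr/local</filename>.
</para>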
|
|
<para>
|
|
Building on Windows is a little bit more complicated. For
|
|
details, please see <filename>README-windows.txt</filename> in the
|
|
source distribution. You can also download a binary distribution
|
|
for Windows. There is a port of qpdf to Visual C++ version 6 in
|
|
the <filename>contrib</filename> area generously contributed by
|
|
Jian Ma. This is also discussed in more detail in
|
|
<filename>README-windows.txt</filename>.
|
|
</para>
|
|
<para>
|
|
There are some other things you can do with the build. Although
|
|
qpdf uses <application>autoconf</application>, it does not use
|
|
<application>automake</application> but instead uses a
|
|
hand-crafted non-recursive Makefile that requires GNU make. If
|
|
you're really interested, please read the comments in the
|
|
top-level <filename>Makefile</filename>.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.using">
|
|
<title>Running QPDF</title>
|
|
<para>
|
|
This chapter describes how to run the qpdf program from the command
|
|
line.
|
|
</para>
|
|
<sect1 id="ref.invocation">
|
|
<title>Basic Invocation</title>
|
|
<para>
|
|
When running qpdf, the basic invocation is as follows:
|
|
<programlisting><command>qpdf</command><option> [ <replaceable>options</replaceable> ] <replaceable>infilename</replaceable> [ <replaceable>outfilename</replaceable> ]</option>
</programlisting>
|
|
This converts PDF file <option>infilename</option> to PDF file
|
|
<option>outfilename</option>. The output file is functionally
|
|
identical to the input file but may have been structurally
|
|
reorganized. Also, orphaned objects will be removed from the
|
|
file. Many transformations are available as controlled by the
|
|
options below. In place of <option>infilename</option>, the
|
|
parameter <option>--empty</option> may be specified. This causes
|
|
qpdf to use a dummy input file that contains zero pages. The only
|
|
normal use case for using <option>--empty</option> would be if you
|
|
were going to add pages from another source, as discussed in <xref
|
|
linkend="ref.page-selection"/>.
|
|
</para>
|
|
<para>
|
|
<option>outfilename</option> does not have to be seekable, even
|
|
when generating linearized files. Specifying
|
|
“<option>-</option>” as <option>outfilename</option>
|
|
means to write to standard output. However, you can't specify the
|
|
same file as both the input and the output because qpdf reads data
|
|
from the input file as it writes to the output file.
|
|
</para>
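<para>
For illustration, the simplest invocations might look like the
following, where <filename>infile.pdf</filename> and
<filename>outfile.pdf</filename> are placeholder names. The first
command rewrites a file with no special options, and the second
writes the result to standard output, which the shell then
redirects to a file.
<programlisting><command>qpdf</command> <option>infile.pdf outfile.pdf</option>
<command>qpdf</command> <option>infile.pdf -</option> &gt; outfile.pdf
</programlisting>
</para>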
|
|
<para>
|
|
Most options require an output file, but some testing or
|
|
inspection commands do not. These are specifically noted.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.basic-options">
|
|
<title>Basic Options</title>
|
|
<para>
|
|
The following options are the most common ones and perform
|
|
commonly needed transformations.
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--password=password</option></term>
|
|
<listitem>
|
|
<para>
|
|
Specifies a password for accessing encrypted files.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--linearize</option></term>
|
|
<listitem>
|
|
<para>
|
|
Causes generation of a linearized (web-optimized) output file.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--copy-encryption=file</option></term>
|
|
<listitem>
|
|
<para>
|
|
Encrypt the file using the same encryption parameters,
|
|
including user and owner password, as the specified file. Use
<option>--encryption-file-password</option> to specify a password
|
|
if one is needed to open this file. Note that copying the
|
|
encryption parameters from a file also copies the first half
|
|
of <literal>/ID</literal> from the file since this is part of
|
|
the encryption parameters.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
<term><option>--encryption-file-password=password</option></term>
|
|
<listitem>
|
|
<para>
|
|
If the file specified with <option>--copy-encryption</option>
|
|
requires a password, specify the password using this option.
|
|
Note that only one of the user or owner password is required.
|
|
Both passwords will be preserved since QPDF does not
|
|
distinguish between the two passwords. It is possible to
|
|
preserve encryption parameters, including the owner password,
|
|
from a file even if you don't know the file's owner password.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--encrypt options --</option></term>
|
|
<listitem>
|
|
<para>
|
|
Causes generation of an encrypted output file. Please see <xref
|
|
linkend="ref.encryption-options"/> for details on how to
|
|
specify encryption parameters.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--decrypt</option></term>
|
|
<listitem>
|
|
<para>
|
|
Removes any encryption on the file. A password must be
|
|
supplied if the file is password protected.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--pages options --</option></term>
|
|
<listitem>
|
|
<para>
|
|
Select specific pages from one or more input files. See <xref
|
|
linkend="ref.page-selection"/> for details on how to do page
|
|
selection (splitting and merging).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
|
|
<para>
|
|
Password-protected files may be opened by specifying a password.
|
|
By default, qpdf will preserve any encryption data associated with
|
|
a file. If <option>--decrypt</option> is specified, qpdf will
|
|
attempt to remove any encryption information. If
|
|
<option>--encrypt</option> is specified, qpdf will replace the
|
|
document's encryption parameters with whatever is specified.
|
|
</para>
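<para>
As a sketch of how these options are used in practice, the
commands below linearize a file and remove encryption from a
password-protected file. The file names and the password
<literal>secret</literal> are placeholders.
<programlisting><command>qpdf</command> <option>--linearize infile.pdf outfile.pdf</option>
<command>qpdf</command> <option>--password=secret --decrypt encrypted.pdf decrypted.pdf</option>
</programlisting>
</para>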
|
|
<para>
|
|
Note that qpdf does not obey encryption restrictions already
|
|
imposed on the file. Doing so would be meaningless since qpdf can
|
|
be used to remove encryption from the file entirely. This
|
|
functionality is not intended to be used for bypassing copyright
|
|
restrictions or other restrictions placed on files by their
|
|
producers.
|
|
</para>
|
|
<para>
|
|
In all cases where qpdf allows specification of a password, care
|
|
must be taken if the password contains characters that fall
|
|
outside of the 7-bit US-ASCII character range to ensure that the
|
|
exact correct byte sequence is provided. It is possible that a
|
|
future version of qpdf may handle this more gracefully. For
|
|
example, if a file was encrypted using a password that was
|
|
encoded in ISO-8859-1 and your terminal is configured to use
|
|
UTF-8, the password you supply may not work properly. There are
|
|
various approaches to handling this. For example, if you are
|
|
using Linux and have the iconv executable (typically provided by
the GNU C library) installed, you could pass <option>--password=`echo
|
|
<replaceable>password</replaceable> | iconv -t
|
|
iso-8859-1`</option> to qpdf where
|
|
<replaceable>password</replaceable> is a password specified in
|
|
your terminal's locale. A detailed discussion of this is out of
|
|
scope for this manual, but just be aware of this issue if you have
|
|
trouble with a password that contains 8-bit characters.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.encryption-options">
|
|
<title>Encryption Options</title>
|
|
<para>
|
|
To change the encryption parameters of a file, use the <option>--encrypt</option>
|
|
flag. The syntax is
|
|
<programlisting><option>--encrypt <replaceable>user-password</replaceable> <replaceable>owner-password</replaceable> <replaceable>key-length</replaceable> [ <replaceable>restrictions</replaceable> ] --</option>
</programlisting>
|
|
Note that “<option>--</option>” terminates parsing of
|
|
encryption flags and must be present even if no restrictions are
|
|
present.
|
|
</para>
|
|
<para>
|
|
Either or both of the user password and the owner password may be
|
|
empty strings.
|
|
</para>
|
|
<para>
|
|
The value for
|
|
<option><replaceable>key-length</replaceable></option> may be 40,
|
|
128, or 256. The restriction flags are dependent upon key length.
|
|
When no additional restrictions are given, the default is to be
|
|
fully permissive.
|
|
</para>
|
|
<para>
|
|
If <option><replaceable>key-length</replaceable></option> is 40,
|
|
the following restriction options are available:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--print=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow printing.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--modify=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow document modification.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--extract=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow text/image extraction.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--annotate=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow comments and form fill-in
|
|
and signing.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
If <option><replaceable>key-length</replaceable></option> is 128,
|
|
the following restriction options are available:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--accessibility=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow accessibility to the visually
impaired.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--extract=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow text/graphic extraction.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--print=<replaceable>print-opt</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Controls printing access.
|
|
<option><replaceable>print-opt</replaceable></option> may be
|
|
one of the following:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<option>full</option>: allow full printing
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>low</option>: allow low-resolution printing only
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>none</option>: disallow printing
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--modify=<replaceable>modify-opt</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Controls modify access.
|
|
<option><replaceable>modify-opt</replaceable></option> may be
|
|
one of the following, each of which implies all the options
|
|
that follow it:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<option>all</option>: allow full document modification
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>annotate</option>: allow comment authoring and form operations
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>form</option>: allow form field fill-in and signing
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>assembly</option>: allow document assembly only
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>none</option>: allow no modifications
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--cleartext-metadata</option></term>
|
|
<listitem>
|
|
<para>
|
|
If specified, any metadata stream in the document will be left
|
|
unencrypted even if the rest of the document is encrypted.
|
|
This also forces the PDF version to be at least 1.5.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--use-aes=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
If <option>--use-aes=y</option> is specified, AES encryption
|
|
will be used instead of RC4 encryption. This forces the PDF
|
|
version to be at least 1.6.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--force-V4</option></term>
|
|
<listitem>
|
|
<para>
|
|
Use of this option forces the <literal>/V</literal> and
|
|
<literal>/R</literal> parameters in the document's encryption
|
|
dictionary to be set to the value <literal>4</literal>. As
|
|
qpdf will automatically do this when required, there is no
|
|
reason to ever use this option. It exists primarily for use
|
|
in testing qpdf itself. This option also forces the PDF
|
|
version to be at least 1.5.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
If <option><replaceable>key-length</replaceable></option> is 256,
|
|
the minimum PDF version is 1.7 with extension level 8, and the
|
|
AES-based encryption format used is the PDF 2.0 encryption method
|
|
supported by Acrobat X. The same options are available as with
|
|
128 bits with the following exceptions:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--use-aes</option></term>
|
|
<listitem>
|
|
<para>
|
|
This option is not available with 256-bit keys. AES is always
|
|
used with 256-bit encryption keys.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--force-V4</option></term>
|
|
<listitem>
|
|
<para>
|
|
This option is not available with 256-bit keys.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--force-R5</option></term>
|
|
<listitem>
|
|
<para>
|
|
If specified, qpdf sets the minimum version to 1.7 at
|
|
extension level 3 and writes the deprecated encryption format
|
|
used by Acrobat version IX. This option should not be used in
|
|
practice to generate PDF files that will be in general use,
|
|
but it can be useful to generate files if you are trying to
|
|
test proper support in another application for PDF files
|
|
encrypted in this way.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
The default for each permission option is to be fully permissive.
|
|
</para>
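<para>
The following commands sketch how the pieces of
<option>--encrypt</option> fit together. The passwords
<literal>user-pw</literal> and <literal>owner-pw</literal> and the
file names are placeholders. The first command uses a 40-bit key
with an empty user password and disallows printing. The second
uses a 128-bit key with AES, allows only low-resolution printing,
and disallows modification and extraction. The third uses a
256-bit key with no restrictions.
<programlisting><command>qpdf</command> <option>--encrypt "" owner-pw 40 --print=n -- infile.pdf outfile.pdf</option>
<command>qpdf</command> <option>--encrypt user-pw owner-pw 128 --use-aes=y --print=low --modify=none --extract=n -- infile.pdf outfile.pdf</option>
<command>qpdf</command> <option>--encrypt user-pw owner-pw 256 -- infile.pdf outfile.pdf</option>
</programlisting>
In each case the restriction flags appear between the key length
and the terminating <option>--</option>.
</para>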
|
|
</sect1>
|
|
<sect1 id="ref.page-selection">
|
|
<title>Page Selection Options</title>
|
|
<para>
|
|
Starting with qpdf 3.0, it is possible to split and merge PDF
|
|
files by selecting pages from one or more input files. Whatever
|
|
file is given as the primary input file is used as the starting
|
|
point, but its pages are replaced with pages as specified.
|
|
<programlisting><option>--pages <replaceable>input-file</replaceable> [ <replaceable>--password=password</replaceable> ] <replaceable>page-range</replaceable> [ ... ] --</option>
</programlisting>
|
|
Multiple input files may be specified. Each one is given as the
|
|
name of the input file, an optional password (if required to open
|
|
the file), and the range of pages. Note that
|
|
“<option>--</option>” terminates parsing of page
|
|
selection flags.
|
|
</para>
|
|
<para>
|
|
For each file that pages should be taken from, specify the file, a
|
|
password needed to open the file (if any), and a page range. The
|
|
password needs to be given only once per file. If any of the
|
|
input files are the same as the primary input file or the file
|
|
used to copy encryption parameters (if specified), you do not need
|
|
to repeat the password here. The same file can be repeated
|
|
multiple times. If a file that is repeated has a password, the
|
|
password only has to be given the first time. All non-page data
|
|
(info, outlines, page numbers, etc.) are taken from the primary
|
|
input file. To discard these, use <option>--empty</option> as the
|
|
primary input.
|
|
</para>
|
|
<para>
|
|
It is not presently possible to specify the same page from the
|
|
same file directly more than once, but you can make this work by
|
|
specifying two different paths to the same file (such as by
|
|
putting <filename>./</filename> somewhere in the path). This can
|
|
also be used if you want to repeat a page from one of the input
|
|
files in the output file. This may be made more convenient in a
|
|
future version of qpdf if there is enough demand for this feature.
|
|
</para>
|
|
<para>
|
|
The page range is a set of numbers separated by commas, ranges of
|
|
numbers separated by dashes, or combinations of those. The character
|
|
“z” represents the last page. Pages can appear in any
|
|
order. Ranges can appear with a high number followed by a low
|
|
number, which causes the pages to appear in reverse. Repeating a
|
|
number will cause an error, but you can use the workaround
|
|
discussed above should you really want to include the same page
|
|
twice.
|
|
</para>
|
|
<para>
|
|
Example page ranges:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
<literal>1,3,5-9,15-12</literal>: pages 1, 3, 5, 6, 7, 8,
9, 15, 14, 13, and 12.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>z-1</literal>: all pages in the document in reverse
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
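<para>
As an illustration of a range used in a complete command, the
range <literal>1-z</literal> selects every page of a file in its
original order. Assuming two placeholder input files named
<filename>file1.pdf</filename> and <filename>file2.pdf</filename>,
a command along these lines would concatenate all of their pages:
<programlisting><command>qpdf</command> <option>--empty --pages file1.pdf 1-z file2.pdf 1-z -- outfile.pdf</option>
</programlisting>
Because <option>--empty</option> is the primary input here, no
non-page data is carried over from either file.
</para>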
|
|
<para>
|
|
Note that qpdf doesn't presently do anything special about other
|
|
constructs in a PDF file that may know about pages, so semantics
|
|
of splitting and merging vary across features. For example, the
|
|
document's outlines (bookmarks) point to actual page objects, so
|
|
if you select some pages and not others, bookmarks that point to
|
|
pages that are in the output file will work, and remaining
|
|
bookmarks will not work. On the other hand, page labels (page
|
|
numbers specified in the file) are just sequential, so page labels
|
|
will be messed up in the output file. A future version of
|
|
<command>qpdf</command> may do a better job at handling these
|
|
issues. (Note that the qpdf library already contains all of the
|
|
APIs required in order to implement this in your own application
|
|
if you need it.) In the meantime, you can always use
|
|
<option>--empty</option> as the primary input file to avoid
|
|
copying all of that from the first file. For example, to take
|
|
pages 1 through 5 from <filename>infile.pdf</filename> while
|
|
preserving all metadata associated with that file, you could use
|
|
<programlisting><command>qpdf</command> <option>infile.pdf --pages infile.pdf 1-5 -- outfile.pdf</option>
</programlisting>
|
|
If you wanted pages 1 through 5 from
|
|
<filename>infile.pdf</filename> but you wanted the rest of the
|
|
metadata to be dropped, you could instead run
|
|
<programlisting><command>qpdf</command> <option>--empty --pages infile.pdf 1-5 -- outfile.pdf</option>
</programlisting>
|
|
If you wanted to take pages 1–5 from
|
|
<filename>file1.pdf</filename> and pages 11–15 from
|
|
<filename>file2.pdf</filename> in reverse, you would run
|
|
<programlisting><command>qpdf</command> <option>file1.pdf --pages file1.pdf 1-5 file2.pdf 15-11 -- outfile.pdf</option>
</programlisting>
|
|
If, for some reason, you wanted to take the first page of an
|
|
encrypted file called <filename>encrypted.pdf</filename> with
|
|
password <literal>pass</literal> and repeat it twice in an output
|
|
file, and if you wanted to drop metadata (like page numbers and
|
|
outlines) but preserve encryption, you would use
|
|
<programlisting><command>qpdf</command> <option>--empty --copy-encryption=encrypted.pdf --encryption-file-password=pass
--pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 --
outfile.pdf</option>
</programlisting>
|
|
Note that we had to specify the password all three times because
|
|
giving a password as <option>--encryption-file-password</option>
|
|
doesn't count for page selection, and as far as qpdf is concerned,
|
|
<filename>encrypted.pdf</filename> and
<filename>./encrypted.pdf</filename> are separate files. These
|
|
are all corner cases that most users should hopefully never have
|
|
to be bothered with.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.advanced-transformation">
|
|
<title>Advanced Transformation Options</title>
|
|
<para>
|
|
These transformation options control fine points of how qpdf
|
|
creates the output file. Mostly these are of use only to people
|
|
who are very familiar with the PDF file format or who are PDF
|
|
developers. The following options are available:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--stream-data=<replaceable>option</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Controls transformation of stream data. The value of
|
|
<option><replaceable>option</replaceable></option> may be one
|
|
of the following:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<option>compress</option>: recompress stream data when
|
|
possible (default)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>preserve</option>: leave all stream data as is
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>uncompress</option>: uncompress stream data when
|
|
possible
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--normalize-content=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Enables or disables normalization of content streams.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--suppress-recovery</option></term>
|
|
<listitem>
|
|
<para>
|
|
Prevents qpdf from attempting to recover damaged files.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--object-streams=<replaceable>mode</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Controls handling of object streams. The value of
|
|
<option><replaceable>mode</replaceable></option> may be one of
|
|
the following:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<option>preserve</option>: preserve original object streams
|
|
(default)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>disable</option>: don't write any object streams
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>generate</option>: use object streams wherever
|
|
possible
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--ignore-xref-streams</option></term>
|
|
<listitem>
|
|
<para>
|
|
Tells qpdf to ignore any cross-reference streams.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--qdf</option></term>
|
|
<listitem>
|
|
<para>
|
|
Turns on QDF mode. For additional information on QDF, please
|
|
see <xref linkend="ref.qdf"/>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--min-version=<replaceable>version</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Forces the PDF version of the output file to be at least
|
|
<replaceable>version</replaceable>. In other words, if the
|
|
input file has a lower version than the specified version, the
|
|
specified version will be used. If the input file has a
|
|
higher version, the input file's original version will be
|
|
used. It is seldom necessary to use this option since qpdf
|
|
will automatically increase the version as needed when adding
|
|
features that require newer PDF readers.
|
|
</para>
|
|
<para>
|
|
The version number may be expressed in the form
|
|
<replaceable>major.minor.extension-level</replaceable>, in
|
|
which case the version is interpreted as
|
|
<replaceable>major.minor</replaceable> at extension level
|
|
<replaceable>extension-level</replaceable>. For example,
|
|
version <literal>1.7.8</literal> represents version 1.7 at
|
|
extension level 8. Note that minimal syntax checking is done
|
|
on the command line.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--force-version=<replaceable>version</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
This option forces the PDF version to be the exact version
|
|
specified <emphasis>even when the file may have content that
|
|
is not supported in that version</emphasis>. The version
|
|
number is interpreted in the same way as with
|
|
<option>--min-version</option> so that extension levels can be
|
|
set. In some cases, forcing the output file's PDF version to
|
|
be lower than that of the input file will cause qpdf to
|
|
disable certain features of the document. Specifically,
|
|
256-bit keys are disabled if the version is less than 1.7 with
|
|
extension level 8 (except R5 is disabled if less than 1.7 with
|
|
extension level 3), AES encryption is disabled if the version
|
|
is less than 1.6, cleartext metadata and object streams are
|
|
disabled if less than 1.5, 128-bit encryption keys are
|
|
disabled if less than 1.4, and all encryption is disabled if
|
|
less than 1.3. Even with these precautions, qpdf won't be
|
|
able to do things like eliminate use of newer image
|
|
compression schemes, transparency groups, or other features
|
|
that may have been added in more recent versions of PDF.
|
|
</para>
|
|
<para>
|
|
As a general rule, with the exception of big structural things
|
|
like the use of object streams or AES encryption, PDF viewers
|
|
are supposed to ignore features in files that they don't
|
|
support from newer versions. This means that forcing the
|
|
version to a lower version may make it possible to open your
|
|
PDF file with an older version, though bear in mind that some
|
|
of the original document's functionality may be lost.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
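<para>
The version rules for <option>--min-version</option> and
<option>--force-version</option> can be sketched in a few lines of
Python. This is an illustrative model only, not qpdf's implementation:
the names <function>parse_version</function>,
<function>min_version</function>, and
<function>disabled_features</function> are hypothetical, and treating
a missing extension level as 0 is an assumption. The thresholds are
copied from the description of <option>--force-version</option>.
</para>
<programlisting><![CDATA[
```python
def parse_version(text):
    """Parse 'major.minor' or 'major.minor.extension-level' into a tuple."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) == 2:
        parts.append(0)  # assumption: no extension level means level 0
    if len(parts) != 3:
        raise ValueError("expected major.minor[.extension-level]: " + text)
    return tuple(parts)


def min_version(input_version, requested):
    """Model of --min-version: the output gets the higher of the two."""
    return max(parse_version(input_version), parse_version(requested))


# Minimum version each feature needs, per the --force-version text.
FEATURE_THRESHOLDS = [
    ("256-bit keys (other than R5)", (1, 7, 8)),
    ("256-bit keys (R5)", (1, 7, 3)),
    ("AES encryption", (1, 6, 0)),
    ("cleartext metadata", (1, 5, 0)),
    ("object streams", (1, 5, 0)),
    ("128-bit keys", (1, 4, 0)),
    ("all encryption", (1, 3, 0)),
]


def disabled_features(forced):
    """Model of --force-version: features disabled at the forced version."""
    version = parse_version(forced)
    return [name for name, needed in FEATURE_THRESHOLDS if version < needed]
```
]]></programlisting>
<para>
In this model, forcing version <literal>1.4</literal> disables both
kinds of 256-bit keys, AES encryption, cleartext metadata, and object
streams, while 128-bit keys remain available.
</para>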
|
|
<para>
|
|
By default, when a stream is encoded using non-lossy filters that
|
|
qpdf understands and is not already compressed using a good
|
|
compression scheme, qpdf will uncompress and recompress streams.
|
|
Assuming proper filter implementations, this is safe and generally
|
|
results in smaller files. This behavior may also be explicitly
|
|
requested with <option>--stream-data=compress</option>.
|
|
</para>
|
|
<para>
|
|
When <option>--stream-data=preserve</option> is specified, qpdf
|
|
will never attempt to change the filtering of any stream data.
|
|
</para>
|
|
<para>
|
|
When <option>--stream-data=uncompress</option> is specified, qpdf
|
|
will attempt to remove any non-lossy filters that it supports.
|
|
This includes <literal>/FlateDecode</literal>,
|
|
<literal>/LZWDecode</literal>, <literal>/ASCII85Decode</literal>,
|
|
and <literal>/ASCIIHexDecode</literal>. This can be very useful
|
|
for inspecting the contents of various streams.
|
|
</para>
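<para>
To make the idea of removing non-lossy filters concrete, the following
Python sketch undoes a chain of filters using only the standard
library. It is not how qpdf implements its filters. The function
<function>remove_filters</function> and its helpers are hypothetical
names, and <literal>/LZWDecode</literal> is left out because the Python
standard library has no LZW decoder.
</para>
<programlisting><![CDATA[
```python
import base64
import zlib


def _ascii_hex_decode(data):
    # Drop whitespace and the '>' end-of-data marker; pad odd digit counts.
    digits = b"".join(data.split()).rstrip(b">")
    if len(digits) % 2:
        digits += b"0"
    return bytes.fromhex(digits.decode("ascii"))


def _ascii85_decode(data):
    # PDF terminates ASCII85 data with '~>'; strip it before decoding.
    body = data.strip()
    if body.endswith(b"~>"):
        body = body[:-2]
    return base64.a85decode(body)


DECODERS = {
    "/FlateDecode": zlib.decompress,
    "/ASCIIHexDecode": _ascii_hex_decode,
    "/ASCII85Decode": _ascii85_decode,
}


def remove_filters(data, filters):
    """Apply each decoder in order; refuse filters this sketch lacks."""
    for name in filters:
        if name not in DECODERS:
            raise ValueError("unsupported filter: " + name)
        data = DECODERS[name](data)
    return data
```
]]></programlisting>
<para>
Filters are undone in the order in which they are listed, so a stream
listed as <literal>/ASCII85Decode</literal> followed by
<literal>/FlateDecode</literal> is first ASCII85-decoded and then
inflated.
</para>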
|
|
<para>
|
|
When <option>--normalize-content=y</option> is specified, qpdf
|
|
will attempt to normalize whitespace and newlines in page content
|
|
streams. This is generally safe but could, in some cases, cause
|
|
damage to the content streams. This option is intended for people
|
|
who wish to study PDF content streams or to debug PDF content.
|
|
You should not use this for “production” PDF files.
|
|
</para>
|
|
<para>
|
|
Ordinarily, qpdf will attempt to recover from certain types of
|
|
errors in PDF files. These include errors in the cross-reference
|
|
table, certain types of object numbering errors, and certain types
|
|
of stream length errors. Sometimes, qpdf may think it has
|
|
recovered but may not have actually recovered, so care should be
|
|
taken when relying on recovery, as some data loss is possible. The
|
|
<option>--suppress-recovery</option> option will prevent qpdf from
|
|
attempting recovery. In this case, it will fail on the first
|
|
error that it encounters.
|
|
</para>
|
|
<para>
|
|
Object streams, also known as compressed objects, were introduced
|
|
into the PDF specification at version 1.5, corresponding to
|
|
Acrobat 6. Some older PDF viewers may not support files with
|
|
object streams. qpdf can be used to transform files with object
|
|
streams to files without object streams or vice versa. As
|
|
mentioned above, there are three object stream modes:
|
|
<option>preserve</option>, <option>disable</option>, and
|
|
<option>generate</option>.
|
|
</para>
|
|
<para>
|
|
In <option>preserve</option> mode, the relationship between objects and
|
|
the streams that contain them is preserved from the original file.
|
|
In <option>disable</option> mode, all objects are written as
|
|
regular, uncompressed objects. The resulting file should be
|
|
readable by older PDF viewers. (Of course, the content of the
|
|
files may include features not supported by older viewers, but at
|
|
least the structure will be supported.) In
|
|
<option>generate</option> mode, qpdf will create its own object
|
|
streams. This will usually result in more compact PDF files,
|
|
though they may not be readable by older viewers. In this mode,
|
|
qpdf will also make sure the PDF version number in the header is
|
|
at least 1.5.
|
|
</para>
|
|
<para>
|
|
Ordinarily, qpdf reads cross-reference streams when they are
|
|
present in a PDF file. If <option>--ignore-xref-streams</option>
|
|
is specified, qpdf will ignore any cross-reference streams for
|
|
hybrid PDF files. The purpose of hybrid files is to make some
|
|
content available to viewers that are not aware of cross-reference
|
|
streams. It is almost never desirable to ignore them. The only
|
|
time when you might want to use this feature is if you are testing
|
|
creation of hybrid PDF files and wish to see how a PDF consumer
|
|
that doesn't understand object and cross-reference streams would
|
|
interpret such a file.
|
|
</para>
|
|
<para>
|
|
The <option>--qdf</option> flag turns on QDF mode, which changes
|
|
some of the defaults described above. Specifically, in QDF mode,
|
|
by default, stream data is uncompressed, content streams are
|
|
normalized, and encryption is removed. These defaults can still
|
|
be overridden by specifying the appropriate options as described
|
|
above. Additionally, in QDF mode, stream lengths are stored as
|
|
indirect objects, objects are laid out in a less efficient but
|
|
more readable fashion, and the documents are interspersed with
|
|
comments that make it easier for the user to find things and also
|
|
make it possible for <command>fix-qdf</command> to work properly.
|
|
QDF mode is intended for people, mostly developers, who wish to
|
|
inspect or modify PDF files in a text editor. For details, please
|
|
see <xref linkend="ref.qdf"/>.
|
|
</para>
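<para>
The way QDF mode changes defaults while still honoring explicit options
can be modeled as follows. This is a sketch, not qpdf's option-handling
code: <function>effective_options</function> is a hypothetical name,
and the non-QDF defaults for content normalization and decryption
(both off) are inferred from the text rather than stated by it.
</para>
<programlisting><![CDATA[
```python
def effective_options(qdf=False, stream_data=None, normalize_content=None,
                      decrypt=None):
    """Resolve defaults; explicit (non-None) values always win."""
    defaults = {
        "stream_data": "uncompress" if qdf else "compress",
        "normalize_content": qdf,  # assumption: off unless QDF mode is on
        "decrypt": qdf,            # assumption: off unless QDF mode is on
    }
    explicit = {
        "stream_data": stream_data,
        "normalize_content": normalize_content,
        "decrypt": decrypt,
    }
    return {key: defaults[key] if value is None else value
            for key, value in explicit.items()}
```
]]></programlisting>
<para>
For example, combining QDF mode with an explicit
<option>--stream-data=preserve</option> corresponds to
<literal>effective_options(qdf=True, stream_data="preserve")</literal>,
which keeps the other QDF defaults but leaves stream filtering alone.
</para>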
|
|
</sect1>
|
|
<sect1 id="ref.testing-options">
|
|
<title>Testing, Inspection, and Debugging Options</title>
|
|
<para>
|
|
These options can be useful for digging into PDF files or for use
|
|
in automated test suites for software that uses the qpdf library.
|
|
When any of the options in this section are specified, no output
|
|
file should be given. The following options are available:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--static-id</option></term>
|
|
<listitem>
|
|
<para>
|
|
Causes generation of a fixed value for /ID. This is intended
|
|
for testing only. Never use it for production files.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--static-aes-iv</option></term>
|
|
<listitem>
|
|
<para>
|
|
Causes use of a static initialization vector for AES-CBC.
|
|
This is intended for testing only so that output files can be
|
|
reproducible. Never use it for production files. This option
|
|
in particular is not secure since it significantly weakens the
|
|
encryption.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--no-original-object-ids</option></term>
|
|
<listitem>
|
|
<para>
|
|
Suppresses inclusion of original object ID comments in QDF
|
|
files. This can be useful when generating QDF files for test
|
|
purposes, particularly when comparing them to determine
|
|
whether two PDF files have identical content.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-encryption</option></term>
|
|
<listitem>
|
|
<para>
|
|
Shows document encryption parameters. Also shows the
|
|
document's user password if the owner password is given.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--check-linearization</option></term>
|
|
<listitem>
|
|
<para>
|
|
Checks file integrity and linearization status.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-linearization</option></term>
|
|
<listitem>
|
|
<para>
|
|
Checks and displays all data in the linearization hint tables.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-xref</option></term>
|
|
<listitem>
|
|
<para>
|
|
Shows the contents of the cross-reference table in a
|
|
human-readable form. This is especially useful for files with
|
|
cross-reference streams, which are stored in a binary format.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-object=obj[,gen]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Shows the contents of the given object. This is especially
|
|
useful for inspecting objects that are inside of object
|
|
streams (also known as “compressed objects”).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--raw-stream-data</option></term>
|
|
<listitem>
|
|
<para>
|
|
When used along with the <option>--show-object</option>
|
|
option, if the object is a stream, shows the raw stream data
|
|
instead of the object's contents.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--filtered-stream-data</option></term>
|
|
<listitem>
|
|
<para>
|
|
When used along with the <option>--show-object</option>
|
|
option, if the object is a stream, shows the filtered stream
|
|
data instead of the object's contents. If the stream is filtered
|
|
using filters that qpdf does not support, an error will be
|
|
issued.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-pages</option></term>
|
|
<listitem>
|
|
<para>
|
|
Shows the object and generation number for each page
|
|
dictionary object and for each content stream associated with
|
|
the page. Having this information makes it more convenient to
|
|
inspect objects from a particular page.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--with-images</option></term>
|
|
<listitem>
|
|
<para>
|
|
When used along with <option>--show-pages</option>, also shows
|
|
the object and generation numbers for the image objects on
|
|
each page. (At present, information about images in shared
|
|
resource dictionaries is not output by this command. This is
|
|
discussed in a comment in the source code.)
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--check</option></term>
|
|
<listitem>
|
|
<para>
|
|
Checks file structure as well as encryption, linearization,
|
|
and encoding of stream data. A file for which
|
|
<option>--check</option> reports no errors may still have
|
|
errors in stream data content but should otherwise be
|
|
structurally sound. If <option>--check</option> finds any errors,
|
|
qpdf will exit with a status of 2. There are some recoverable
|
|
conditions that <option>--check</option> detects. These are
|
|
issued as warnings instead of errors. If qpdf finds no errors
|
|
but finds warnings, it will exit with a status of 3 (as of
|
|
version 2.0.4).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
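<para>
A test suite that calls <option>--check</option> usually needs to tell
errors apart from warnings. The following Python sketch shows one way
to do that. The helper names are hypothetical, the sketch assumes a
<command>qpdf</command> executable on the search path, and it assumes
that an exit status of 0 means neither errors nor warnings were
reported.
</para>
<programlisting><![CDATA[
```python
import subprocess


def describe_check_status(status):
    """Map a qpdf --check exit status to a short description."""
    if status == 0:
        return "ok"  # assumption: 0 means no errors and no warnings
    if status == 2:
        return "errors"
    if status == 3:
        return "warnings only"
    return "unexpected status %d" % status


def run_check(path):
    """Run qpdf --check on path and describe the result."""
    result = subprocess.run(
        ["qpdf", "--check", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_check_status(result.returncode)
```
]]></programlisting>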
|
|
<para>
|
|
The <option>--raw-stream-data</option> and
|
|
<option>--filtered-stream-data</option> options are ignored unless
|
|
<option>--show-object</option> is given. Either of these options
|
|
will cause the stream data to be written to standard output. In
|
|
order to avoid commingling of stream data with other output, it is
|
|
recommended that these options not be combined with other
|
|
test/inspection options.
|
|
</para>
|
|
<para>
|
|
If <option>--filtered-stream-data</option> is given and
|
|
<option>--normalize-content=y</option> is also given, qpdf will
|
|
attempt to normalize the stream data as if it is a page content
|
|
stream. This attempt will be made even if it is not a page
|
|
content stream, in which case it will produce unusable results.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.qdf">
|
|
<title>QDF Mode</title>
|
|
<para>
|
|
In QDF mode, qpdf creates PDF files in what we call <firstterm>QDF
|
|
form</firstterm>. A PDF file in QDF form, sometimes called a QDF
|
|
file, is a completely valid PDF file that has
|
|
<literal>%QDF-1.0</literal> as its third line (after the PDF header
|
|
and binary characters) and has certain other characteristics. The
|
|
purpose of QDF form is to make it possible to edit PDF files, with
|
|
some restrictions, in an ordinary text editor. This can be very
|
|
useful for experimenting with different PDF constructs or for
|
|
making one-off edits to PDF files (though there are other reasons
|
|
why this may not always work).
|
|
</para>
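<para>
Because the marker always sits on the third line, a quick test for QDF
form is straightforward. The following Python sketch uses a
hypothetical function name and only looks at the marker line. It does
not verify any of the other characteristics of a QDF file.
</para>
<programlisting><![CDATA[
```python
def looks_like_qdf(data):
    """Return True if the third line of the file is exactly %QDF-1.0."""
    lines = data.split(b"\n", 3)
    return len(lines) >= 3 and lines[2].rstrip(b"\r") == b"%QDF-1.0"
```
]]></programlisting>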
|
|
<para>
|
|
It is ordinarily very difficult to edit PDF files in a text editor
|
|
for two reasons: most meaningful data in PDF files is compressed,
|
|
and PDF files are full of offset and length information that makes
|
|
it hard to add or remove data. A QDF file is organized in a manner
|
|
such that, if edits are kept within certain constraints, the
|
|
<command>fix-qdf</command> program, distributed with qpdf, is able
|
|
to restore edited files to a correct state. The
|
|
<command>fix-qdf</command> program takes no command-line
|
|
arguments. It reads a possibly edited QDF file from standard input
|
|
and writes a repaired file to standard output.
|
|
</para>
|
|
<para>
|
|
The following attributes characterize a QDF file:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
All objects appear in numerical order in the PDF file, including
|
|
when objects appear in object streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Objects are printed in an easy-to-read format, and all line
|
|
endings are normalized to UNIX line endings.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Unless specifically overridden, streams appear uncompressed
|
|
(when qpdf supports the filters and they are compressed with a
|
|
non-lossy compression scheme), and most content streams are
|
|
normalized (line endings are converted to just UNIX-style
|
|
linefeeds).
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
All stream lengths are represented as indirect objects, and the
|
|
stream length object is always the next object after the stream.
|
|
If the stream data does not end with a newline, an extra newline
|
|
is inserted, and a special comment appears after the stream
|
|
indicating that this has been done.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If the PDF file contains object streams, if object stream
|
|
<emphasis>n</emphasis> contains <emphasis>k</emphasis> objects,
|
|
those objects are numbered from <emphasis>n+1</emphasis> through
|
|
<emphasis>n+k</emphasis>, and the object number/offset pairs
|
|
appear on a separate line for each object. Additionally, each
|
|
object in the object stream is preceded by a comment indicating
|
|
its object number and index. This makes it very easy to find
|
|
objects in object streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
All beginnings of objects, <literal>stream</literal> tokens,
|
|
<literal>endstream</literal> tokens, and
|
|
<literal>endobj</literal> tokens appear on lines by themselves.
|
|
A blank line follows every <literal>endobj</literal> token.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If there is a cross-reference stream, it is unfiltered.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Page dictionaries and page content streams are marked with
|
|
special comments that make them easy to find.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Comments precede each object indicating the object number of the
|
|
corresponding object in the original file.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
When editing a QDF file, any edits can be made as long as the above
|
|
constraints are maintained. This means that you can freely edit a
|
|
page's content without worrying about messing up the QDF file. It
|
|
is also possible to add new objects so long as those objects are
|
|
added after the last object in the file or subsequent objects are
|
|
renumbered. If a QDF file has object streams in it, you can always
|
|
add the new objects before the xref stream and then change the
|
|
number of the xref stream, since nothing generally ever references
|
|
it by number.
|
|
</para>
|
|
<para>
|
|
It is not generally practical to remove objects from QDF files
|
|
without messing up object numbering, but if you remove all
|
|
references to an object, you can run qpdf on the file (after
|
|
running <command>fix-qdf</command>), and qpdf will omit the
|
|
now-orphaned object.
|
|
</para>
|
|
<para>
|
|
When <command>fix-qdf</command> is run, it goes through the file
|
|
and recomputes the following parts of the file:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
the <literal>/N</literal>, <literal>/W</literal>, and
|
|
<literal>/First</literal> keys of all object stream dictionaries
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
the pairs of numbers representing object numbers and offsets of
|
|
objects in object streams
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
all stream lengths
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
the cross-reference table or cross-reference stream
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
the offset to the cross-reference table or cross-reference
|
|
stream following the <literal>startxref</literal> token
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
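<para>
To illustrate why the QDF layout makes this recomputation tractable,
here is a Python sketch that rebuilds a classic cross-reference table
by scanning for object headers that sit on lines by themselves. It is
not the <command>fix-qdf</command> program: the function name is
hypothetical, and the sketch ignores object streams, stream lengths,
and the possibility that stream data contains a line that looks like
an object header.
</para>
<programlisting><![CDATA[
```python
import re

# In QDF form, the beginning of every object is on a line by itself.
OBJ_START = re.compile(rb"^(\d+) (\d+) obj$", re.MULTILINE)


def build_xref_table(data):
    """Return a classic xref section for the objects found in data."""
    offsets = {}
    for match in OBJ_START.finditer(data):
        offsets[int(match.group(1))] = (match.start(), int(match.group(2)))
    size = max(offsets) + 1 if offsets else 1
    # Object 0 is always the head of the free list.
    lines = [b"xref\n", b"0 %d\n" % size, b"0000000000 65535 f \n"]
    for number in range(1, size):
        if number not in offsets:
            raise ValueError("object %d is missing" % number)
        offset, generation = offsets[number]
        lines.append(b"%010d %05d n \n" % (offset, generation))
    return b"".join(lines)
```
]]></programlisting>
<para>
The requirement that objects appear in numerical order with no gaps is
what lets the sketch emit a single contiguous subsection starting at
object 0.
</para>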
|
|
</chapter>
|
|
<chapter id="ref.using-library">
|
|
<title>Using the QPDF Library</title>
|
|
<para>
|
|
The source tree for the qpdf package has an
|
|
<filename>examples</filename> directory that contains a few
|
|
example programs. The <filename>qpdf/qpdf.cc</filename> source
|
|
file also serves as a useful example since it exercises almost all
|
|
of the qpdf library's public interface. The best source of
|
|
documentation on the library itself is reading comments in
|
|
<filename>include/qpdf/QPDF.hh</filename>,
|
|
<filename>include/qpdf/QPDFWriter.hh</filename>, and
|
|
<filename>include/qpdf/QPDFObjectHandle.hh</filename>.
|
|
</para>
|
|
<para>
|
|
All header files are installed in the <filename>include/qpdf</filename> directory. It
|
|
is recommended that you use <literal>#include
|
|
&lt;qpdf/QPDF.hh&gt;</literal> rather than adding
|
|
<filename>include/qpdf</filename> to your include path.
|
|
</para>
|
|
<para>
|
|
When linking against the qpdf static library, you may also need to
|
|
specify <literal>-lpcre -lz</literal> on your link command. If
|
|
your system understands how to read libtool
|
|
<filename>.la</filename> files, this may not be necessary.
|
|
</para>
|
|
<para>
|
|
The qpdf library is safe to use in a multithreaded program, but no
|
|
individual <type>QPDF</type> object instance (including
|
|
<type>QPDF</type>, <type>QPDFObjectHandle</type>, or
|
|
<type>QPDFWriter</type>) can be used in more than one thread at a
|
|
time. Multiple threads may simultaneously work with different
|
|
instances of these and all other QPDF objects.
|
|
</para>
|
|
</chapter>
|
|
<chapter id="ref.design">
|
|
<title>Design and Library Notes</title>
|
|
<sect1 id="ref.design.intro">
|
|
<title>Introduction</title>
|
|
<para>
|
|
This section was written prior to the implementation of the qpdf
|
|
package and was subsequently modified to reflect the
|
|
implementation. In some cases, for purposes of explanation, it
|
|
may differ slightly from the actual implementation. As always,
|
|
the source code and test suite are authoritative. Even if there
|
|
are some errors, this document should serve as a road map to
|
|
understanding how this code works.
|
|
</para>
|
|
<para>
|
|
In general, one should adhere strictly to a specification when
|
|
writing but be liberal in reading. This way, the product of our
|
|
software will be accepted by the widest range of other programs,
|
|
and we will accept the widest range of input files. This library
|
|
attempts to conform to that philosophy whenever possible but also
|
|
aims to provide strict checking for people who want to validate
|
|
PDF files. If you don't want to see warnings and are trying to
|
|
write something that is tolerant, you can call
|
|
<literal>setSuppressWarnings(true)</literal>. If you want to fail
|
|
on the first error, you can call
|
|
<literal>setAttemptRecovery(false)</literal>. The default
|
|
behavior is to generate warnings for recoverable problems. Note
|
|
that recovery will not always produce the desired results even if
|
|
it is able to get through the file. Unlike most other PDF tools
|
|
that produce generic warnings such as “This file is
|
|
damaged,” qpdf generally issues a detailed error message
|
|
that would be most useful to a PDF developer. This is by design
|
|
as there seems to be a shortage of PDF validation tools out
|
|
there. (This was, in fact, one of the major motivations behind
|
|
the initial creation of qpdf.)
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.design-goals">
|
|
<title>Design Goals</title>
|
|
<para>
|
|
The QPDF package includes support for reading and rewriting PDF
|
|
files. It aims to hide from the user details involving object
|
|
locations, modified (appended) PDF files, the
|
|
directness/indirectness of objects, and stream filters including
|
|
encryption. It does not aim to hide knowledge of the object
|
|
hierarchy or content stream contents. Put another way, a user of
|
|
the qpdf library is expected to have knowledge about how PDF files
|
|
work, but is not expected to have to keep track of bookkeeping
|
|
details such as file positions.
|
|
</para>
|
|
<para>
|
|
A user of the library never has to care whether an object is
|
|
direct or indirect. All access to objects deals with this
|
|
transparently. All memory management details are also handled by
|
|
the library.
|
|
</para>
|
|
<para>
|
|
The <classname>PointerHolder</classname> object is used internally
|
|
by the library to deal with memory management. This is basically
|
|
a smart pointer object very similar in spirit to the Boost
|
|
library's <classname>shared_ptr</classname> object, but predating
|
|
it by several years. This library also makes use of a technique
|
|
for giving fine-grained access to methods in one class to other
|
|
classes by using public subclasses with friends and only private
|
|
members that in turn call private methods of the containing class.
|
|
See <classname>QPDFObjectHandle::Factory</classname> as an
|
|
example.
|
|
</para>
|
|
<para>
|
|
The top-level qpdf class is <classname>QPDF</classname>. A
|
|
<classname>QPDF</classname> object represents a PDF file. The
|
|
library provides methods for both accessing and mutating PDF
|
|
files.
|
|
</para>
|
|
<para>
|
|
<classname>QPDFObject</classname> is the basic PDF Object class.
|
|
It is an abstract base class from which are derived classes for
|
|
each type of PDF object. Clients do not interact with Objects
|
|
directly but instead interact with
|
|
<classname>QPDFObjectHandle</classname>.
|
|
</para>
|
|
<para>
|
|
<classname>QPDFObjectHandle</classname> contains
|
|
<classname>PointerHolder&lt;QPDFObject&gt;</classname> and
|
|
includes accessor methods that are type-safe proxies to the
|
|
methods of the derived object classes as well as methods for
|
|
querying object types. They can be passed around by value,
|
|
copied, stored in containers, etc. with very low overhead.
|
|
Instances of <classname>QPDFObjectHandle</classname> always
|
|
contain a reference back to the <classname>QPDF</classname> object
|
|
from which they were created. A
|
|
<classname>QPDFObjectHandle</classname> may be direct or indirect.
|
|
If indirect, the <classname>QPDFObject</classname> the
|
|
<classname>PointerHolder</classname> initially points to is a null
|
|
pointer. In this case, the first attempt to access the underlying
|
|
<classname>QPDFObject</classname> will result in the
|
|
<classname>QPDFObject</classname> being resolved via a call to the
|
|
referenced <classname>QPDF</classname> instance. This makes it
|
|
essentially impossible to make coding errors in which certain
|
|
things will work for some PDF files and not for others based on
|
|
which objects are direct and which objects are indirect.
|
|
</para>
|
|
<para>
|
|
Instances of <classname>QPDFObjectHandle</classname> can be
|
|
directly created and modified using static factory methods in the
|
|
<classname>QPDFObjectHandle</classname> class. There are factory
|
|
methods for each type of object as well as a convenience method
|
|
<function>QPDFObjectHandle::parse</function> that creates an
|
|
object from a string representation of the object. Existing
|
|
instances of <classname>QPDFObjectHandle</classname> can also be
|
|
modified in several ways. See comments in
|
|
<filename>QPDFObjectHandle.hh</filename> for details.
|
|
</para>
|
|
<para>
|
|
When the <classname>QPDF</classname> class creates a new object,
|
|
it dynamically allocates the appropriate type of
|
|
<classname>QPDFObject</classname> and immediately hands the
|
|
pointer to an instance of <classname>QPDFObjectHandle</classname>.
|
|
The parser reads a token from the current file position. If the
|
|
token is neither a dictionary opener nor an array opener, an object is
|
|
immediately constructed from the single token and the parser
|
|
returns. Otherwise, the parser is invoked recursively in a
|
|
special mode in which it accumulates objects until it finds a
|
|
balancing closer. During this process, the
|
|
“<literal>R</literal>” keyword is recognized and an
|
|
indirect <classname>QPDFObjectHandle</classname> may be
|
|
constructed.
|
|
</para>
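<para>
The recursive parsing strategy described above can be sketched in
Python. This is a simplified model rather than qpdf's parser: it works
on a list of tokens that have already been converted to Python values,
and <classname>IndirectRef</classname> is a hypothetical stand-in for
an unresolved indirect <classname>QPDFObjectHandle</classname>.
</para>
<programlisting><![CDATA[
```python
class IndirectRef:
    """Hypothetical stand-in for an unresolved indirect object handle."""

    def __init__(self, obj_id, generation):
        self.obj_id = obj_id
        self.generation = generation

    def __eq__(self, other):
        return (isinstance(other, IndirectRef)
                and (self.obj_id, self.generation)
                == (other.obj_id, other.generation))


CLOSERS = {"<<": ">>", "[": "]"}


def parse(tokens, closer=None):
    """Consume tokens from the front of the list and return one object.

    With closer set, accumulate objects on a stack until it appears.
    """
    if closer is None:
        token = tokens.pop(0)
        if token in CLOSERS:
            return parse(tokens, CLOSERS[token])
        return token  # an object built from a single token
    stack = []
    while True:
        token = tokens.pop(0)
        if token == closer:
            break
        if token in CLOSERS:
            stack.append(parse(tokens, CLOSERS[token]))
        elif (token == "R" and len(stack) >= 2
              and all(type(item) is int for item in stack[-2:])):
            # Two integers followed by R form an indirect reference.
            generation = stack.pop()
            obj_id = stack.pop()
            stack.append(IndirectRef(obj_id, generation))
        else:
            stack.append(token)
    if closer == ">>":
        # Pair up alternating keys and values.
        return dict(zip(stack[0::2], stack[1::2]))
    return stack
```
]]></programlisting>
<para>
Parsing the tokens of <literal>/Root 1 0 R</literal> inside a
dictionary in this model yields a dictionary whose
<literal>/Root</literal> value is an
<classname>IndirectRef</classname> with object ID 1 and generation 0.
Nothing is read from the referenced object at parse time.
</para>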
|
|
<para>
|
|
The <function>QPDF::resolve()</function> method, which is used to
|
|
resolve an indirect object, may be invoked from the
|
|
<classname>QPDFObjectHandle</classname> class. It first checks a
|
|
cache to see whether this object has already been read. If not,
|
|
it reads the object from the PDF file and caches it. It then
|
|
returns the resulting <classname>QPDFObjectHandle</classname>.
|
|
The calling object handle then replaces its
|
|
<classname>PointerHolder&lt;QPDFObject&gt;</classname> with the one
|
|
from the newly returned <classname>QPDFObjectHandle</classname>.
|
|
In this way, only a single copy of any direct object need exist
|
|
and clients can access objects transparently without knowing
|
|
or caring whether they are direct or indirect objects. Additionally,
|
|
no object is ever read from the file more than once. That means
|
|
that only the portions of the PDF file that are actually needed
|
|
are ever read from the input file, thus allowing the qpdf package
|
|
to take advantage of this important design goal of PDF files.
|
|
</para>
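<para>
The interplay between lazy resolution and the object cache can be
modeled in a few lines of Python. The classes
<classname>Document</classname> and <classname>Handle</classname> are
hypothetical stand-ins for <classname>QPDF</classname> and
<classname>QPDFObjectHandle</classname>. They illustrate the caching
idea only and do not mirror the real class interfaces.
</para>
<programlisting><![CDATA[
```python
class Document:
    """Hypothetical stand-in for QPDF: owns the object cache."""

    def __init__(self, stored_objects):
        self._stored = stored_objects  # simulates objects in the file
        self._cache = {}
        self.reads = 0  # number of simulated reads from the file

    def resolve(self, obj_id, generation):
        key = (obj_id, generation)
        if key not in self._cache:
            self.reads += 1
            self._cache[key] = self._stored[key]
        return self._cache[key]


class Handle:
    """Hypothetical stand-in for QPDFObjectHandle."""

    def __init__(self, document, value=None, obj_id=None, generation=0):
        self._document = document
        self._value = value  # None while an indirect handle is unresolved
        self._obj_id = obj_id
        self._generation = generation

    def is_indirect(self):
        return self._obj_id is not None

    def value(self):
        # Resolve on first access; direct handles already hold a value.
        if self._value is None and self.is_indirect():
            self._value = self._document.resolve(self._obj_id,
                                                 self._generation)
        return self._value
```
]]></programlisting>
<para>
Two handles that refer to the same indirect object end up sharing one
underlying value, and the simulated file is read only once no matter
how many handles are resolved.
</para>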
|
|
<para>
|
|
If the requested object is inside of an object stream, the object
|
|
stream itself is first read into memory. Then the tokenizer reads
|
|
objects from the memory stream based on the offset information
|
|
stored in the stream. Those individual objects are cached, after
|
|
which the temporary buffer holding the object stream contents is
|
|
discarded. In this way, the first time an object in an object
|
|
stream is requested, all objects in the stream are cached.
|
|
</para>
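<para>
As a rough illustration of reading every object out of an object
stream in one pass, the Python sketch below splits already-unfiltered
stream data using the number/offset pairs at its start. The function
name is hypothetical, the <literal>/N</literal> and
<literal>/First</literal> values are passed in by the caller, and the
sketch assumes the offsets appear in increasing order.
</para>
<programlisting><![CDATA[
```python
def load_object_stream(data, n, first):
    """Return {object number: raw object bytes} for an object stream.

    data is the unfiltered stream content; n and first are the /N and
    /First values from the stream dictionary.
    """
    header = [int(field) for field in data[:first].split()]
    if len(header) != 2 * n:
        raise ValueError("expected %d number/offset pairs" % n)
    numbers = header[0::2]
    # Offsets in the header are relative to the first object.
    starts = [first + offset for offset in header[1::2]]
    ends = starts[1:] + [len(data)]
    return {number: data[start:end].strip()
            for number, start, end in zip(numbers, starts, ends)}
```
]]></programlisting>
<para>
A caller would store the returned dictionary in its object cache and
then drop <varname>data</varname>, which mirrors the discarding of the
temporary buffer described above.
</para>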
|
|
<para>
|
|
An instance of <classname>QPDF</classname> is constructed by using
|
|
the class's default constructor. If desired, the
|
|
<classname>QPDF</classname> object may be configured with various
|
|
methods that change its default behavior. Then the
|
|
<function>QPDF::processFile()</function> method is passed the name
|
|
of a PDF file, which permanently associates the file with that
|
|
QPDF object. A password may also be given for access to
|
|
password-protected files. QPDF does not enforce encryption
|
|
parameters and will treat user and owner passwords equivalently.
|
|
Either password may be used to access an encrypted file.
|
|
<footnote>
|
|
<para>
|
|
As pointed out earlier, the intention is not for qpdf to be used
|
|
to bypass security on files, but as any open source PDF consumer
|
|
may be easily modified to bypass basic PDF document security,
|
|
and qpdf offers many transformations that can do this as well,
|
|
there seems to be little point in the added complexity of
|
|
conditionally enforcing document security.
|
|
</para>
|
|
</footnote>
|
|
<classname>QPDF</classname> will allow recovery of a user password
|
|
given an owner password. The input PDF file must be seekable.
|
|
(Output files written by <classname>QPDFWriter</classname> need
|
|
not be seekable, even when creating linearized files.) During
|
|
construction, <classname>QPDF</classname> validates the PDF file's
|
|
header, and then reads the cross reference tables and trailer
|
|
dictionaries. The <classname>QPDF</classname> class keeps only
|
|
the first trailer dictionary though it does read all of them so it
|
|
can check the <literal>/Prev</literal> key.
|
|
<classname>QPDF</classname> class users may request the root
|
|
object and the trailer dictionary specifically. The cross
|
|
reference table is kept private. Objects may then be requested by
|
|
number or by walking the object tree.
|
|
</para>
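<para>
The handling of multiple trailer dictionaries can be sketched as a
loop over <literal>/Prev</literal> links. In this Python model, a
dictionary keyed by file offset simulates seeking and parsing. The
function name is hypothetical, and the check for a loop in the chain
is a safeguard added for the sketch, not something described above.
</para>
<programlisting><![CDATA[
```python
def read_trailers(trailers_by_offset, startxref):
    """Follow /Prev links; return (kept trailer, offsets visited).

    trailers_by_offset maps the offset of a cross-reference section to
    the trailer dictionary parsed there.
    """
    kept = None
    visited = []
    offset = startxref
    while offset is not None:
        if offset in visited:
            raise ValueError("loop in /Prev chain")  # sketch-only safeguard
        visited.append(offset)
        trailer = trailers_by_offset[offset]
        if kept is None:
            kept = trailer  # only the first trailer read is kept
        offset = trailer.get("/Prev")
    return kept, visited
```
]]></programlisting>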
|
|
<para>
|
|
When a PDF file has a cross-reference stream instead of a
|
|
cross-reference table and trailer, requesting the document's
|
|
trailer dictionary returns the stream dictionary from the
|
|
cross-reference stream instead.
|
|
</para>
|
|
<para>
|
|
There are some convenience routines for very common operations
|
|
such as walking the page tree and returning a vector of all page
|
|
objects. For full details, please see the header file
|
|
<filename>QPDF.hh</filename>.
|
|
</para>
|
|
<para>
|
|
The following example should clarify how
|
|
<classname>QPDF</classname> processes a simple file.
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Client constructs <classname>QPDF</classname>
|
|
<varname>pdf</varname> and calls
|
|
<function>pdf.processFile("a.pdf");</function>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <classname>QPDF</classname> class checks the beginning of
|
|
<filename>a.pdf</filename> for
|
|
<literal>%PDF-1.[0-9]+</literal>. It then reads the cross
|
|
reference table mentioned at the end of the file, ensuring that
|
|
it is looking before the last <literal>%%EOF</literal>. After
|
|
getting to the <literal>trailer</literal> keyword, it invokes the
|
|
parser.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The parser sees “<literal>&lt;&lt;</literal>”, so
|
|
it calls itself recursively in dictionary creation mode.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
In dictionary creation mode, the parser keeps accumulating
|
|
objects until it encounters
|
|
“<literal>>></literal>”. Each object that is
|
|
read is pushed onto a stack. If
|
|
“<literal>R</literal>” is read, the last two
|
|
objects on the stack are inspected. If they are integers, they
|
|
are popped off the stack and their values are used to construct
|
|
an indirect object handle which is then pushed onto the stack.
|
|
When “<literal>>></literal>” is finally read,
|
|
the stack is converted into a
|
|
<classname>QPDF_Dictionary</classname> which is placed in a
|
|
<classname>QPDFObjectHandle</classname> and returned.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The resulting dictionary is saved as the trailer dictionary.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <literal>/Prev</literal> key is searched. If present,
|
|
<classname>QPDF</classname> seeks to that point and repeats
|
|
except that the new trailer dictionary is not saved. If
|
|
<literal>/Prev</literal> is not present, the initial parsing
|
|
process is complete.
|
|
</para>
|
|
<para>
|
|
If there is an encryption dictionary, the document's encryption
|
|
parameters are initialized.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The client requests the root object. The
<classname>QPDF</classname> class gets the value of the
<literal>/Root</literal> key from the trailer dictionary and
returns it. It is an unresolved
|
|
indirect <classname>QPDFObjectHandle</classname>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The client requests the <literal>/Pages</literal> key from the root
|
|
<classname>QPDFObjectHandle</classname>. The
|
|
<classname>QPDFObjectHandle</classname> notices that it is
|
|
indirect so it asks <classname>QPDF</classname> to resolve it.
|
|
<classname>QPDF</classname> looks in the object cache for an
|
|
object with the root dictionary's object ID and generation
|
|
number. Upon not seeing it, it checks the cross reference
|
|
table, gets the offset, and reads the object present at that
|
|
offset. It stores the result in the object cache and returns
|
|
the cached result. The calling
|
|
<classname>QPDFObjectHandle</classname> replaces its object
|
|
pointer with the one from the resolved
|
|
<classname>QPDFObjectHandle</classname>, verifies that it is a
|
|
valid dictionary object, and returns the (unresolved indirect)
|
|
<classname>QPDFObjectHandle</classname> for the top of the
|
|
Pages hierarchy.
|
|
</para>
|
|
<para>
|
|
As the client continues to request objects, the same process is
|
|
followed for each new requested object.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
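<para>
The stack-based handling of “<literal>R</literal>”
described above can be sketched in a few lines. This is a toy
illustration in Python, not qpdf's C++ parser: the names
<literal>IndirectRef</literal>, <literal>parse_dictionary</literal>,
and <literal>parse_trailer</literal> are hypothetical, tokens are
assumed to be separated by whitespace, and only integers, names,
and nested dictionaries are handled.
</para>
<programlisting><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndirectRef:
    """Toy stand-in for an unresolved indirect object handle."""
    obj_id: int
    generation: int


def parse_dictionary(tokens):
    """Consume tokens up to the closing '>>' and return a dict.

    `tokens` is an iterator positioned just after an opening '<<'.
    """
    stack = []
    for tok in tokens:
        if tok == ">>":
            # The stack now holds alternating keys and values.
            return dict(zip(stack[0::2], stack[1::2]))
        if tok == "<<":
            # Nested dictionary: recurse on the same token iterator.
            stack.append(parse_dictionary(tokens))
        elif tok == "R":
            # Inspect the last two objects; both must be plain integers.
            if len(stack) >= 2 and all(type(x) is int for x in stack[-2:]):
                generation = stack.pop()
                obj_id = stack.pop()
                stack.append(IndirectRef(obj_id, generation))
            else:
                raise ValueError("'R' not preceded by two integers")
        elif tok.lstrip("-").isdigit():
            stack.append(int(tok))
        else:
            stack.append(tok)  # names and anything else stay as strings
    raise ValueError("dictionary not terminated by '>>'")


def parse_trailer(text):
    """Parse a whitespace-tokenized trailer dictionary."""
    tokens = iter(text.split())
    if next(tokens) != "<<":
        raise ValueError("expected '<<'")
    return parse_dictionary(tokens)
```
]]></programlisting>
<para>
Because the two integers are only combined once
“<literal>R</literal>” is seen, the sketch needs no
lookahead, which mirrors the description above.
</para>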
|
|
</sect1>
|
|
<sect1 id="ref.encryption">
|
|
<title>Encryption</title>
|
|
<para>
|
|
Encryption is supported transparently by qpdf. When opening a PDF
|
|
file, if an encryption dictionary exists, the
|
|
<classname>QPDF</classname> object processes this dictionary using
|
|
the password (if any) provided. The primary decryption key is
|
|
computed and cached. No further access is made to the encryption
|
|
dictionary after that time. When an object is read from a file,
|
|
the object ID and generation of the object in which it is
|
|
contained is always known. Using this information along with the
|
|
stored encryption key, all stream and string objects are
|
|
transparently decrypted. Raw encrypted objects are never stored
|
|
in memory. This way, nothing in the library ever has to know or
|
|
care whether it is reading an encrypted file.
|
|
</para>
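<para>
To illustrate why the object ID and generation matter, the
following sketch shows the per-object key derivation that the PDF
specification defines for RC4 encryption with 40- to 128-bit
keys. It is written in Python for brevity and is not qpdf's own
code; the function name <literal>object_key</literal> is
hypothetical, and other encryption schemes (AES, 256-bit keys)
derive keys differently.
</para>
<programlisting><![CDATA[
```python
import hashlib


def object_key(file_key: bytes, obj_id: int, generation: int) -> bytes:
    """Derive the RC4 key for one object (40- to 128-bit file keys)."""
    material = (
        file_key
        + obj_id.to_bytes(4, "little")[:3]      # low-order 3 bytes of the ID
        + generation.to_bytes(4, "little")[:2]  # low-order 2 bytes of the generation
    )
    digest = hashlib.md5(material).digest()
    # The key is the first n + 5 bytes of the hash, capped at 16 bytes.
    return digest[: min(len(file_key) + 5, 16)]
```
]]></programlisting>
<para>
Every string and stream inside a given indirect object is
decrypted with the key derived from that object's ID and
generation, which is why the reader must always know which object
it is in.
</para>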
|
|
<para>
|
|
An interface is also provided for writing encrypted streams and
|
|
strings given an encryption key. This is used by
|
|
<classname>QPDFWriter</classname> when it rewrites encrypted
|
|
files.
|
|
</para>
|
|
<para>
|
|
When copying encrypted files, unless otherwise directed, qpdf will
|
|
preserve any encryption in force in the original file. qpdf can
|
|
do this with either the user or the owner password. There is no
|
|
difference in capability based on which password is used. When 40
|
|
or 128 bit encryption keys are used, the user password can be
|
|
recovered with the owner password. With 256-bit keys, the user and
|
|
owner passwords are used independently to encrypt the actual
|
|
encryption key, so while either can be used, the owner password
|
|
can no longer be used to recover the user password.
|
|
</para>
|
|
<para>
|
|
Starting with version 4.0.0, qpdf can read files that are not
|
|
encrypted but that contain encrypted attachments, but it cannot
|
|
write such files. qpdf also requires the password to be specified
|
|
in order to open the file, not just to extract attachments, since
|
|
once the file is open, all decryption is handled transparently.
|
|
When copying files like this while preserving encryption, qpdf
|
|
will apply the file's encryption to everything in the file, not
|
|
just to the attachments. When decrypting the file, qpdf will
|
|
decrypt the attachments. In general, when copying PDF files with
|
|
multiple encryption formats, qpdf will choose the newest format.
|
|
The only exception to this is that clear-text metadata will be
|
|
preserved as clear-text if it is that way in the original file.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.adding-and-remove-pages">
|
|
<title>Adding and Removing Pages</title>
|
|
<para>
|
|
While qpdf's API has supported adding and modifying objects for
|
|
some time, version 3.0 introduced specific methods for adding and
|
|
removing pages. These are largely convenience routines that
|
|
handle two tricky issues: pushing inheritable resources from the
|
|
<literal>/Pages</literal> tree down to individual pages and
|
|
manipulation of the <literal>/Pages</literal> tree itself. For
|
|
details, see <function>addPage</function> and surrounding methods
|
|
in <filename>QPDF.hh</filename>.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.reserved-objects">
|
|
<title>Reserving Object Numbers</title>
|
|
<para>
|
|
Version 3.0 of qpdf introduced the concept of reserved objects.
|
|
These are seldom needed for ordinary operations, but there are
|
|
cases in which you may want to add a series of indirect objects
|
|
with references to each other to a <classname>QPDF</classname>
|
|
object. This causes a problem because you can't determine the
|
|
object ID that a new indirect object will have until you add it to
|
|
the <classname>QPDF</classname> object with
|
|
<function>QPDF::makeIndirectObject</function>. The only way to
|
|
add two mutually referential objects to a
|
|
<classname>QPDF</classname> object prior to version 3.0 would be
|
|
to add the new objects first and then make them refer to each
|
|
other after adding them. Now it is possible to create a
|
|
<firstterm>reserved object</firstterm> using
|
|
<function>QPDFObjectHandle::newReserved</function>. This is an
|
|
indirect object that stays “unresolved” even if it is
|
|
queried for its type. So now, if you want to create a set of
|
|
mutually referential objects, you can create reservations for each
|
|
one of them and use those reservations to construct the
|
|
references. When finished, you can call
|
|
<function>QPDF::replaceReserved</function> to replace the reserved
|
|
objects with the real ones. This functionality will never be
|
|
needed by most applications, but it is used internally by QPDF
|
|
when copying objects from other PDF files, as discussed in <xref
|
|
linkend="ref.foreign-objects"/>. For an example of how to use
|
|
reserved objects, search for <function>newReserved</function> in
|
|
<filename>test_driver.cc</filename> in qpdf's sources.
|
|
</para>
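<para>
The reservation pattern can be sketched with a toy object table.
This Python example is only an illustration of the idea; the
class <literal>ToyDocument</literal> and its methods are
hypothetical and do not reproduce the signatures of
<function>QPDFObjectHandle::newReserved</function> or
<function>QPDF::replaceReserved</function>.
</para>
<programlisting><![CDATA[
```python
class ToyDocument:
    """Minimal object table: object ID -> value (None means reserved)."""

    def __init__(self):
        self.objects = {}
        self.next_id = 1

    def new_reserved(self):
        # Hand out an object ID now; the value is filled in later.
        obj_id = self.next_id
        self.next_id += 1
        self.objects[obj_id] = None
        return obj_id

    def replace_reserved(self, obj_id, value):
        # Only an outstanding reservation may be replaced.
        if self.objects.get(obj_id, "missing") is not None:
            raise ValueError(f"object {obj_id} is not a reservation")
        self.objects[obj_id] = value


doc = ToyDocument()
parent = doc.new_reserved()
child = doc.new_reserved()
# Each dictionary can now name the other object's ID before either exists.
doc.replace_reserved(parent, {"/Kids": [("ref", child)]})
doc.replace_reserved(child, {"/Parent": ("ref", parent)})
```
]]></programlisting>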
|
|
</sect1>
|
|
<sect1 id="ref.foreign-objects">
|
|
<title>Copying Objects From Other PDF Files</title>
|
|
<para>
|
|
Version 3.0 of qpdf introduced the ability to copy objects into a
|
|
<classname>QPDF</classname> object from a different
|
|
<classname>QPDF</classname> object, which we refer to as
|
|
<firstterm>foreign objects</firstterm>. This allows arbitrary
|
|
merging of PDF files. The <command>qpdf</command> command-line
|
|
tool provides limited support for basic page selection, including
|
|
merging in pages from other files, but the library's API makes it
|
|
possible to implement arbitrarily complex merging operations. The
|
|
main method for copying foreign objects is
|
|
<function>QPDF::copyForeignObject</function>. This takes an
|
|
indirect object from another <classname>QPDF</classname> and
|
|
copies it recursively into this object while preserving all object
|
|
structure, including circular references. This means you can add
|
|
a direct object that you create from scratch to a
|
|
<classname>QPDF</classname> object with
|
|
<function>QPDF::makeIndirectObject</function>, and you can add an
|
|
indirect object from another file with
|
|
<function>QPDF::copyForeignObject</function>. The fact that
|
|
<function>QPDF::makeIndirectObject</function> does not
|
|
automatically detect a foreign object and copy it is an explicit
|
|
design decision. Copying a foreign object seems like a
|
|
sufficiently significant thing to do that it should be done
|
|
explicitly.
|
|
</para>
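<para>
A recursive copy that preserves circular references needs a map
from source object IDs to destination object IDs, with each
destination slot claimed before its contents are copied. The
following Python sketch shows that idea on toy data structures.
It is an assumption about the general technique, not a
transcription of <function>QPDF::copyForeignObject</function>;
<literal>Ref</literal> and <literal>copy_foreign</literal> are
hypothetical names.
</para>
<programlisting><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Toy indirect reference: an object ID within one document."""
    obj_id: int


def copy_foreign(src, dst, ref, mapping=None):
    """Copy the object `ref` points to in `src` into `dst`, recursively.

    `src` and `dst` map object IDs to values (dicts, lists, Refs, scalars).
    `mapping` records source ID -> destination ID so each object is copied once.
    """
    mapping = {} if mapping is None else mapping
    if ref.obj_id in mapping:
        return Ref(mapping[ref.obj_id])      # already copied or in progress
    new_id = max(dst, default=0) + 1
    mapping[ref.obj_id] = new_id
    dst[new_id] = None                       # reserve the slot before recursing

    def translate(value):
        if isinstance(value, Ref):
            return copy_foreign(src, dst, value, mapping)
        if isinstance(value, dict):
            return {k: translate(v) for k, v in value.items()}
        if isinstance(value, list):
            return [translate(v) for v in value]
        return value

    dst[new_id] = translate(src[ref.obj_id])
    return Ref(new_id)
```
]]></programlisting>
<para>
Reserving the destination slot before descending is what keeps a
cycle from recursing forever: the second visit finds the source
ID in the map and simply returns the reference.
</para>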
|
|
<para>
|
|
The other way to copy foreign objects is by passing a page from
|
|
one <classname>QPDF</classname> to another by calling
|
|
<function>QPDF::addPage</function>. In contrast to
|
|
<function>QPDF::makeIndirectObject</function>, this method
|
|
automatically distinguishes between indirect objects in the
|
|
current file, foreign objects, and direct objects.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.rewriting">
|
|
<title>Writing PDF Files</title>
|
|
<para>
|
|
The qpdf library supports file writing of
|
|
<classname>QPDF</classname> objects to PDF files through the
|
|
<classname>QPDFWriter</classname> class. The
|
|
<classname>QPDFWriter</classname> class has two writing modes: one
|
|
for non-linearized files, and one for linearized files. See <xref
|
|
linkend="ref.linearization"/> for a description of how linearization
|
|
is implemented. This section describes how we write
|
|
non-linearized files including the creation of QDF files (see
|
|
<xref linkend="ref.qdf"/>).
|
|
</para>
|
|
<para>
|
|
This outline was written prior to implementation and is not
|
|
exactly accurate, but it provides a correct “notional”
|
|
idea of how writing works. Look at the code in
|
|
<classname>QPDFWriter</classname> for exact details.
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Initialize state:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
next object number = 1
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
object queue = empty
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
renumber table: old object id/generation to new id/0 = empty
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
xref table: new id -> offset = empty
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Create a QPDF object from a file.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Write header for new PDF file.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Request the trailer dictionary.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
For each value that is an indirect object, grab the next object
|
|
number (via an operation that returns and increments the
|
|
number). Map object to new number in renumber table. Push
|
|
object onto queue.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
While there are more objects on the queue:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Pop queue.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Look up object's new number <emphasis>n</emphasis> in the
|
|
renumbering table.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Store current offset into xref table.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Write <literal><replaceable>n</replaceable> 0 obj</literal>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If object is null, whether direct or indirect, write out
|
|
null, thus eliminating unresolvable indirect object
|
|
references.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If the object is a stream, write the stream contents,
|
|
piped through any filters as required, to a memory buffer.
|
|
Use this buffer to determine the stream length.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If object is not a stream, array, or dictionary, write out
|
|
its contents.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If object is an array or dictionary (including stream),
|
|
traverse its elements (for array) or values (for
|
|
dictionaries), handling recursive dictionaries and arrays,
|
|
looking for indirect objects. When an indirect object is
|
|
found, if it is not resolvable, ignore. (This case is
|
|
handled when writing it out.) Otherwise, look it up in the
|
|
renumbering table. If not found, grab the next available
|
|
object number, assign to the referenced object in the
|
|
renumbering table, and push the referenced object onto the
|
|
queue. As a special case, when writing out a stream
|
|
dictionary, replace length, filters, and decode parameters
|
|
as required.
|
|
</para>
|
|
<para>
|
|
Write out dictionary or array, replacing any unresolvable
|
|
indirect object references with null (the PDF specification
says that a reference to a non-existent object is legal and resolves to
|
|
null) and any resolvable ones with references to the
|
|
renumbered objects.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If the object is a stream, write
|
|
<literal>stream\n</literal>, the stream contents (from the
|
|
memory buffer), and <literal>\nendstream\n</literal>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When done, write <literal>endobj</literal>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Once we have finished the queue, all referenced objects will have
|
|
been written out and all deleted objects or unreferenced objects
|
|
will have been skipped. The new cross-reference table will
|
|
contain an offset for every new object number from 1 up to the
|
|
number of objects written. This can be used to write out a new
|
|
xref table. Finally we can write out the trailer dictionary with
|
|
appropriately computed /ID (see spec, 8.3, File Identifiers), the
|
|
cross reference table offset, and <literal>%%EOF</literal>.
|
|
</para>
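<para>
The queue-driven renumbering described in this section can be
sketched as follows. This is a simplified Python illustration of
the notional outline, not the code in
<classname>QPDFWriter</classname>: streams, the header, and the
trailer are omitted, and the names <literal>Ref</literal> and
<literal>write_objects</literal> are hypothetical.
</para>
<programlisting><![CDATA[
```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Toy indirect reference (object ID and generation)."""
    obj_id: int
    generation: int = 0


def write_objects(objects, trailer):
    """Return (body_bytes, xref) for everything reachable from `trailer`.

    `objects` maps (id, generation) -> value; values are dicts, lists,
    Refs, ints or strings. `xref` maps new object number -> byte offset.
    """
    renumber, queue, xref = {}, deque(), {}
    out = bytearray()

    def enqueue(ref):
        key = (ref.obj_id, ref.generation)
        if key not in renumber:
            renumber[key] = len(renumber) + 1   # next object number
            queue.append(key)
        return renumber[key]

    def unparse(value):
        if isinstance(value, Ref):
            if (value.obj_id, value.generation) not in objects:
                return "null"                   # unresolvable reference
            return f"{enqueue(value)} 0 R"
        if isinstance(value, dict):
            inner = " ".join(f"{k} {unparse(v)}" for k, v in value.items())
            return f"<< {inner} >>"
        if isinstance(value, list):
            return "[ " + " ".join(unparse(v) for v in value) + " ]"
        return str(value)

    unparse(trailer)                            # seeds the queue from the trailer
    while queue:
        key = queue.popleft()
        n = renumber[key]
        xref[n] = len(out)                      # offset of "n 0 obj"
        out += f"{n} 0 obj\n{unparse(objects[key])}\nendobj\n".encode()
    return bytes(out), xref
```
]]></programlisting>
<para>
Objects that are never reached from the trailer are never
enqueued, so they are dropped without any separate garbage
collection step.
</para>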
|
|
</sect1>
|
|
<sect1 id="ref.filtered-streams">
|
|
<title>Filtered Streams</title>
|
|
<para>
|
|
Support for streams is implemented through the
|
|
<classname>Pipeline</classname> interface which was designed for
|
|
this package.
|
|
</para>
|
|
<para>
|
|
When reading streams, create a series of
|
|
<classname>Pipeline</classname> objects. The
|
|
<classname>Pipeline</classname> abstract base requires
|
|
implementations of <function>write()</function> and
|
|
<function>finish()</function> and provides an implementation of
|
|
<function>getNext()</function>. Each pipeline object, upon
|
|
receiving data, does whatever it is going to do and then writes
|
|
the data (possibly modified) to its successor. Alternatively, a
|
|
pipeline may be an end-of-the-line pipeline that does something
|
|
like store its output to a file or a memory buffer ignoring a
|
|
successor. For additional details, look at
|
|
<filename>Pipeline.hh</filename>.
|
|
</para>
|
|
<para>
|
|
<classname>QPDF</classname> can read raw or filtered streams.
|
|
When reading a filtered stream, the <classname>QPDF</classname>
|
|
class creates one <classname>Pipeline</classname> object for
each applicable filter and chains them together. The last
|
|
filter should write to whatever type of output is required. The
|
|
<classname>QPDF</classname> class has an interface to write raw or
|
|
filtered stream contents to a given pipeline.
|
|
</para>
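<para>
The chaining idea can be illustrated with a small Python model.
It is a sketch of the pattern only: the class names
<literal>Pipeline</literal>, <literal>FlateDecode</literal>, and
<literal>Buffer</literal> below are hypothetical Python classes,
not the C++ classes declared in qpdf's headers.
</para>
<programlisting><![CDATA[
```python
import zlib


class Pipeline:
    """Base class: subclasses implement write() and finish()."""

    def __init__(self, next_pipeline=None):
        self._next = next_pipeline

    def get_next(self):
        if self._next is None:
            raise RuntimeError("end-of-the-line pipeline has no successor")
        return self._next

    def write(self, data: bytes):
        raise NotImplementedError

    def finish(self):
        raise NotImplementedError


class FlateDecode(Pipeline):
    """Inflates zlib-compressed data and passes the result downstream."""

    def __init__(self, next_pipeline):
        super().__init__(next_pipeline)
        self._inflater = zlib.decompressobj()

    def write(self, data):
        self.get_next().write(self._inflater.decompress(data))

    def finish(self):
        # Flush anything still buffered, then tell the successor we are done.
        self.get_next().write(self._inflater.flush())
        self.get_next().finish()


class Buffer(Pipeline):
    """End-of-the-line pipeline that collects everything in memory."""

    def __init__(self):
        super().__init__(None)
        self.data = bytearray()

    def write(self, data):
        self.data += data

    def finish(self):
        pass
```
]]></programlisting>
<para>
A caller would construct the sink first, wrap it in one decoder
per filter, write the raw stream data to the outermost pipeline,
and call <literal>finish()</literal> once at the end.
</para>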
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.linearization">
|
|
<title>Linearization</title>
|
|
<para>
|
|
This chapter describes how <classname>QPDF</classname> and
|
|
<classname>QPDFWriter</classname> implement creation and processing
|
|
of linearized PDFs.
|
|
</para>
|
|
<sect1 id="ref.linearization-strategy">
|
|
<title>Basic Strategy for Linearization</title>
|
|
<para>
|
|
To avoid the incestuous problem of having the qpdf library
|
|
validate its own linearized files, we have a special linearized
|
|
file checking mode which can be invoked via <command>qpdf
|
|
--check-linearization</command> (or <command>qpdf
|
|
--check</command>). This mode reads the linearization parameter
|
|
dictionary and the hint streams and validates that object
|
|
ordering, parameters, and hint stream contents are correct. The
|
|
validation code was first tested against linearized files created
|
|
by external tools (Acrobat and pdlin) and then used to validate
|
|
files created by <classname>QPDFWriter</classname> itself.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.linearized.preparation">
|
|
<title>Preparing For Linearization</title>
|
|
<para>
|
|
Before creating a linearized PDF file from any other PDF file, the
|
|
PDF file must be altered such that all page attributes are
|
|
propagated down to the page level (and not inherited from parents
|
|
in the <literal>/Pages</literal> tree). We also have to know
|
|
which objects refer to which other objects, being concerned with
|
|
page boundaries and a few other cases. We refer to this part of
|
|
preparing the PDF file as <firstterm>optimization</firstterm>,
|
|
discussed in <xref linkend="ref.optimization"/>. Note that, in
|
|
this context, the term <firstterm>optimization</firstterm> is a
|
|
qpdf term, and the term <firstterm>linearization</firstterm> is a
|
|
term from the PDF specification. Do not be confused by the fact
|
|
that many applications refer to linearization as optimization or
|
|
web optimization.
|
|
</para>
|
|
<para>
|
|
When creating linearized PDF files from optimized PDF files, there
|
|
are really only a few issues that need to be dealt with:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Creation of hints tables
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Placing objects in the correct order
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Filling in offsets and byte sizes
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.optimization">
|
|
<title>Optimization</title>
|
|
<para>
|
|
In order to perform various operations such as linearization and
|
|
splitting files into pages, it is necessary to know which objects
|
|
are referenced by which pages, page thumbnails, and root and
|
|
trailer dictionary keys. It is also necessary to ensure that all
|
|
page-level attributes appear directly at the page level and are
|
|
not inherited from parents in the pages tree.
|
|
</para>
|
|
<para>
|
|
We refer to the process of enforcing these constraints as
|
|
<firstterm>optimization</firstterm>. As mentioned above, note
|
|
that some applications refer to linearization as optimization.
|
|
Although this optimization was initially motivated by the need to
|
|
create linearized files, we are using these terms separately.
|
|
</para>
|
|
<para>
|
|
PDF file optimization is implemented in the
|
|
<filename>QPDF_optimization.cc</filename> source file. That file
|
|
is richly commented and serves as the primary reference for the
|
|
optimization process.
|
|
</para>
|
|
<para>
|
|
After optimization has been completed, the private member
|
|
variables <varname>obj_user_to_objects</varname> and
|
|
<varname>object_to_obj_users</varname> in
|
|
<classname>QPDF</classname> have been populated. Any object that
|
|
has more than one value in the
|
|
<varname>object_to_obj_users</varname> table is shared. Any
|
|
object that has exactly one value in the
|
|
<varname>object_to_obj_users</varname> table is private. To find
|
|
all the private objects in a page or a trailer or root dictionary
|
|
key, one merely has to make this determination for each element in
|
|
the <varname>obj_user_to_objects</varname> table for the given
|
|
page or key.
|
|
</para>
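<para>
The shared/private test can be expressed compactly. The Python
sketch below models the two tables as dictionaries of sets; the
table names follow the member variables mentioned above, but the
functions <literal>build_tables</literal> and
<literal>private_objects</literal> and the tuple representation
of object users are assumptions made for illustration.
</para>
<programlisting><![CDATA[
```python
from collections import defaultdict


def build_tables(references):
    """Build both lookup tables from (user, object) pairs."""
    obj_user_to_objects = defaultdict(set)
    object_to_obj_users = defaultdict(set)
    for user, obj in references:
        obj_user_to_objects[user].add(obj)
        object_to_obj_users[obj].add(user)
    return obj_user_to_objects, object_to_obj_users


def private_objects(user, obj_user_to_objects, object_to_obj_users):
    """Objects used by `user` and by no other object user."""
    return {
        obj
        for obj in obj_user_to_objects[user]
        if len(object_to_obj_users[obj]) == 1
    }
```
]]></programlisting>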
|
|
<para>
|
|
Note that pages and thumbnails have different object user types,
|
|
so the above test on a page will not include objects referenced by
|
|
the page's thumbnail dictionary and nothing else.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.linearization.writing">
|
|
<title>Writing Linearized Files</title>
|
|
<para>
|
|
We will create files with only primary hint streams. We will
|
|
never write overflow hint streams. (As of PDF version 1.4,
|
|
Acrobat doesn't either, and they are never necessary.) The hint
|
|
streams contain object offsets that point to where the objects
would be if the hint stream were not present. This means
|
|
that we have to calculate all object positions before we can
|
|
generate and write the hint table, so we have to
|
|
generate the file in two passes. To make this reliable,
|
|
<classname>QPDFWriter</classname> in linearization mode invokes
|
|
exactly the same code twice to write the file to a pipeline.
|
|
</para>
|
|
<para>
|
|
In the first pass, the target pipeline is a count pipeline chained
|
|
to a discard pipeline. The count pipeline simply passes its data
|
|
through to the next pipeline in the chain but can return the
|
|
number of bytes passed through it at any intermediate point. The
|
|
discard pipeline is an end-of-the-line pipeline that just throws its
|
|
data away. The hint stream is not written and dummy values with
|
|
adequate padding are stored in the first cross reference table,
|
|
linearization parameter dictionary, and /Prev key of the first
|
|
trailer dictionary. All the offset, length, object renumbering
|
|
information, and anything else we need for the second pass is
|
|
stored.
|
|
</para>
|
|
<para>
|
|
At the end of the first pass, this information is passed to the
|
|
<classname>QPDF</classname> class which constructs a compressed
|
|
hint stream in a memory buffer and returns it.
|
|
<classname>QPDFWriter</classname> uses this information to write a
|
|
complete hint stream object into a memory buffer. At this point,
|
|
the length of the hint stream is known.
|
|
</para>
|
|
<para>
|
|
In the second pass, the end of the pipeline chain is a regular
|
|
file instead of a discard pipeline, and we have known values for
|
|
all the offsets and lengths that we didn't have in the first pass.
|
|
We have to adjust offsets that appear after the start of the hint
|
|
stream by the length of the hint stream, which is known. Anything
|
|
that is of variable length is padded, with the padding code
|
|
surrounding any writing code that differs in the two passes. This
|
|
ensures that changes to the way things are represented never
|
|
result in offsets that were gathered during the first pass
|
|
becoming incorrect for the second pass.
|
|
</para>
|
|
<para>
|
|
Using this strategy, we can write linearized files to a
|
|
non-seekable output stream with only a single pass to disk or
|
|
wherever the output is going.
|
|
</para>
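<para>
The two-pass approach can be modeled in miniature. The following
Python sketch invents a toy file layout (a fixed-width offset
field, a hint blob, then the objects) purely to show the
mechanics of counting in the first pass and shifting offsets in
the second; the names <literal>Count</literal>,
<literal>Discard</literal>, and <literal>write_file</literal> are
hypothetical and the layout is not the linearized PDF format.
</para>
<programlisting><![CDATA[
```python
import io


class Count:
    """Passes data through to `sink` and tracks how many bytes went by."""

    def __init__(self, sink):
        self.sink = sink
        self.count = 0

    def write(self, data: bytes):
        self.count += len(data)
        self.sink.write(data)


class Discard:
    """End-of-the-line sink that throws its data away."""

    def write(self, data: bytes):
        pass


def write_file(out, objects, hint, hint_offset_field):
    """Write a toy file: a fixed-width offset field, the hint, then objects.

    Returns the offset at which each object started.
    """
    counter = Count(out)
    # Fixed-width (padded) field so both passes emit the same number of bytes.
    counter.write(b"%010d\n" % hint_offset_field)
    counter.write(hint)
    offsets = []
    for obj in objects:
        offsets.append(counter.count)
        counter.write(obj)
    return offsets


objects = [b"1 0 obj\nnull\nendobj\n", b"2 0 obj\n42\nendobj\n"]

# Pass 1: no hint data yet; output is discarded, only offsets are kept.
pass1 = write_file(Discard(), objects, b"", 0)

# Build the hint from pass-1 offsets (where objects would be without it).
hint = (" ".join(str(o) for o in pass1) + "\n").encode()

# Pass 2: same code, real sink; later offsets shift by the hint length.
real = io.BytesIO()
pass2 = write_file(real, objects, hint, 11)
```
]]></programlisting>
<para>
Because the padded field has the same width in both passes, the
only difference between the two sets of offsets is the length of
the hint data, which is known before the second pass begins.
</para>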
|
|
</sect1>
|
|
<sect1 id="ref.linearization-data">
|
|
<title>Calculating Linearization Data</title>
|
|
<para>
|
|
Once a file is optimized, we have information about which objects
|
|
access which other objects. We can then process these tables to
|
|
decide which part (as described in “Linearized PDF Document
|
|
Structure” in the PDF specification) each object is
|
|
contained within. This tells us the exact order in which objects
|
|
are written. The <classname>QPDFWriter</classname> class asks for
|
|
this information and enqueues objects for writing in the proper
|
|
order. It also turns on a check that causes an exception to be
|
|
thrown if an object is encountered that has not already been
|
|
queued. (This could happen only if there were a bug in the
|
|
traversal code used to calculate the linearization data.)
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.linearization-issues">
|
|
<title>Known Issues with Linearization</title>
|
|
<para>
|
|
There are a handful of known issues with this linearization code.
|
|
These issues do not appear to impact the behavior of linearized
|
|
files which still work as intended: it is possible for a web
|
|
browser to begin to display them before they are fully
|
|
downloaded. In fact, it seems that various other programs that
|
|
create linearized files have many of these same issues. These
|
|
items make reference to terminology used in the linearization
|
|
appendix of the PDF specification.
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Thread Dictionary information keys appear in part 4 with the
|
|
rest of Threads instead of in part 9. Objects in part 9 are
|
|
not grouped together functionally.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
We are not calculating numerators for shared object positions
|
|
within content streams or interleaving them within content
|
|
streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
We generate only page offset, shared object, and outline hint
|
|
tables. It would be relatively easy to add some additional
|
|
tables. We gather most of the information needed to create
|
|
thumbnail hint tables. There are comments in the code about
|
|
this.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.linearization-debugging">
|
|
<title>Debugging Note</title>
|
|
<para>
|
|
The <command>qpdf --show-linearization</command> command can show
|
|
the complete contents of linearization hint streams. To look at
|
|
the raw data, you can extract the filtered contents of the
|
|
linearization hint tables using <command>qpdf --show-object=n
|
|
--filtered-stream-data</command>. Then, to convert this into a
|
|
bit stream (since linearization tables are bit streams written
|
|
without regard to byte boundaries), you can pipe the resulting
|
|
data through the following perl code:
|
|
|
|
<programlisting>use bytes;
|
|
binmode STDIN;
|
|
undef $/;
|
|
my $a = &lt;STDIN&gt;;
|
|
my @ch = split(//, $a);
|
|
map { printf("%08b", ord($_)) } @ch;
|
|
print "\n";
|
|
</programlisting>
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.object-and-xref-streams">
|
|
<title>Object and Cross-Reference Streams</title>
|
|
<para>
|
|
This chapter provides information about the implementation of
|
|
object stream and cross-reference stream support in qpdf.
|
|
</para>
|
|
<sect1 id="ref.object-streams">
|
|
<title>Object Streams</title>
|
|
<para>
|
|
Object streams can contain any regular object except the
|
|
following:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
stream objects
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
objects with generation > 0
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
the encryption dictionary
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
objects containing the /Length of another stream
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
In addition, Adobe Reader (at least as of version 8.0.0) appears
|
|
to not be able to handle having the document catalog appear in an
|
|
object stream if the file is encrypted, though this is not
|
|
specifically disallowed by the specification.
|
|
</para>
|
|
<para>
|
|
There are additional restrictions for linearized files. See <xref
|
|
linkend="ref.object-streams-linearization"/> for details.
|
|
</para>
|
|
<para>
|
|
The PDF specification refers to objects in object streams as
|
|
“compressed objects” regardless of whether the object
|
|
stream is compressed.
|
|
</para>
|
|
<para>
|
|
The generation number of every object in an object stream must be
|
|
zero. It is possible to delete and replace an object in an object
|
|
stream with a regular object.
|
|
</para>
|
|
<para>
|
|
The object stream dictionary has the following keys:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<literal>/N</literal>: number of objects
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/First</literal>: byte offset of first object
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Extends</literal>: indirect reference to stream that
|
|
this extends
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Stream collections are formed with <literal>/Extends</literal>.
|
|
They must form a directed acyclic graph. These can be used for
|
|
semantic information and are not meaningful to the PDF document's
|
|
syntactic structure. Although qpdf preserves stream collections,
|
|
it never generates them and doesn't make use of this information
|
|
in any way.
|
|
</para>
|
|
<para>
|
|
The specification recommends limiting the number of objects in
|
|
an object stream for efficiency in reading and decoding. Acrobat 6
|
|
uses no more than 100 objects per object stream for linearized
|
|
files and no more than 200 objects per stream for non-linearized files.
|
|
<classname>QPDFWriter</classname>, in object stream generation
|
|
mode, never puts more than 100 objects in an object stream.
|
|
</para>
|
|
<para>
|
|
Object stream contents consist of <emphasis>N</emphasis> pairs of
|
|
integers, each of which is the object number and the byte offset
|
|
of the object relative to the first object in the stream, followed
|
|
by the objects themselves, concatenated.
|
|
</para>
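<para>
As an illustration of this layout, the following Python sketch
splits already-decoded object stream data into its individual
objects. The function name
<literal>split_object_stream</literal> is hypothetical, and the
sketch assumes that the offsets in the header appear in
increasing order.
</para>
<programlisting><![CDATA[
```python
def split_object_stream(data: bytes, n: int, first: int):
    """Return {object number: raw object bytes} for a decoded object stream.

    `n` and `first` are the /N and /First values from the stream dictionary.
    """
    header = data[:first].split()
    if len(header) != 2 * n:
        raise ValueError("object stream header does not contain N pairs")
    numbers = [int(tok) for tok in header[0::2]]
    offsets = [int(tok) for tok in header[1::2]]
    # Each object runs from its offset to the start of the next one.
    # This assumes offsets are in increasing order.
    ends = offsets[1:] + [len(data) - first]
    return {
        num: data[first + start : first + end].strip()
        for num, start, end in zip(numbers, offsets, ends)
    }
```
]]></programlisting>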
|
|
</sect1>
|
|
<sect1 id="ref.xref-streams">
|
|
<title>Cross-Reference Streams</title>
|
|
<para>
|
|
For non-hybrid files, the value following
|
|
<literal>startxref</literal> is the byte offset to the xref stream
|
|
rather than the word <literal>xref</literal>.
|
|
</para>
|
|
<para>
|
|
For hybrid files (files containing both xref tables and
|
|
cross-reference streams), the xref table's trailer dictionary
|
|
contains the key <literal>/XRefStm</literal> whose value is the
|
|
byte offset to a cross-reference stream that supplements the xref
|
|
table. A PDF 1.5-compliant application should read the xref table
|
|
first. Then it should replace any object that it has already seen
|
|
with any defined in the xref stream. Then it should follow any
|
|
<literal>/Prev</literal> pointer in the original xref table's
|
|
trailer dictionary. The specification is not clear about what
|
|
should be done, if anything, with a <literal>/Prev</literal>
|
|
pointer in the xref stream referenced by an xref table. The
|
|
<classname>QPDF</classname> class ignores it, which is probably
|
|
reasonable since, if this case were to appear for any sensible PDF
|
|
file, the previous xref table would probably have a corresponding
|
|
<literal>/XRefStm</literal> pointer of its own. For example, if a
|
|
hybrid file were appended, the appended section would have its own
|
|
xref table and <literal>/XRefStm</literal>. The appended xref
|
|
table would point to the previous xref table which would point to the
|
|
<literal>/XRefStm</literal>, meaning that the new
|
|
<literal>/XRefStm</literal> doesn't have to point to it.
|
|
</para>
|
|
<para>
|
|
Since xref streams must be read very early, they may not be
|
|
encrypted, and they may not contain indirect objects for keys
|
|
required to read them, which are these:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Type</literal>: value <literal>/XRef</literal>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Size</literal>: value <emphasis>n+1</emphasis>: where
|
|
<emphasis>n</emphasis> is highest object number (same as
|
|
<literal>/Size</literal> in the trailer dictionary)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Index</literal> (optional): value
|
|
<literal>[<replaceable>n count</replaceable> ...]</literal>
|
|
used to determine which objects' information is stored in this
|
|
stream. The default is <literal>[0 /Size]</literal>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Prev</literal>: value
|
|
<replaceable>offset</replaceable>: byte offset of previous xref
|
|
stream (same as <literal>/Prev</literal> in the trailer
|
|
dictionary)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/W [...]</literal>: sizes of each field in the xref
|
|
table
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
The other fields in the xref stream, which may be indirect if
|
|
desired, are the union of those from the xref table's trailer
|
|
dictionary.
|
|
</para>
|
|
<sect2 id="ref.xref-stream-data">
|
|
<title>Cross-Reference Stream Data</title>
|
|
<para>
|
|
The stream data is binary and encoded in big-endian byte order.
|
|
Entries are concatenated, and each entry has a length equal to
|
|
the total of the entries in <literal>/W</literal> above. Each
|
|
entry consists of one or more fields, the first of which is the
|
|
type of the field. The number of bytes for each field is given
|
|
by <literal>/W</literal> above. A 0 in <literal>/W</literal>
|
|
indicates that the field is omitted and has the default value.
|
|
The default value for the field type is
|
|
“<literal>1</literal>”. All other default values are
|
|
“<literal>0</literal>”.
|
|
</para>
|
|
<para>
|
|
PDF 1.5 has three field types:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
0: for free objects. Format: <literal>0 obj
|
|
next-generation</literal>, same as the free table in a
|
|
traditional cross-reference table
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
1: regular non-compressed object. Format: <literal>1 offset
|
|
generation</literal>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
2: for objects in object streams. Format: <literal>2
|
|
object-stream-number index</literal>, the number of the object
|
|
stream containing the object and the index within the object
|
|
stream of the object.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
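<para>
As an illustration of the entry layout described above, the
following Python sketch decodes stream data that has already been
unfiltered into one tuple per object. It is not taken from qpdf:
the function name <function>decode_xref_stream</function> and the
sample bytes are hypothetical, and the sketch assumes that
<literal>/W</literal> has exactly three elements and that the
caller supplies <literal>/Index</literal> explicitly.
</para>
<programlisting><![CDATA[
```python
def decode_xref_stream(data, w, index):
    """Decode cross-reference stream data into a dict keyed by object number.

    data  -- stream bytes after any filters have been removed
    w     -- the three /W field widths in bytes, e.g. [1, 2, 1]
    index -- flat /Index list: [first, count, first, count, ...]
    """
    entry_len = sum(w)
    defaults = (1, 0, 0)  # field type defaults to 1; other fields default to 0
    entries = {}
    pos = 0
    for i in range(0, len(index), 2):
        first, count = index[i], index[i + 1]
        for objnum in range(first, first + count):
            entry = data[pos:pos + entry_len]
            if len(entry) != entry_len:
                raise ValueError("truncated cross-reference stream data")
            fields = []
            offset = 0
            for width, default in zip(w, defaults):
                if width == 0:
                    # A width of 0 means the field is omitted.
                    fields.append(default)
                else:
                    # Fields are stored in big-endian byte order.
                    fields.append(
                        int.from_bytes(entry[offset:offset + width], "big"))
                offset += width
            entries[objnum] = tuple(fields)
            pos += entry_len
    return entries


# Hypothetical sample: three entries with /W [1 2 1] and /Index [0 3].
sample = bytes([
    0, 0, 0, 0,   # object 0: free, next free object 0, generation 0
    1, 0, 15, 0,  # object 1: uncompressed, byte offset 15, generation 0
    2, 0, 3, 1,   # object 2: in object stream 3 at index 1
])
table = decode_xref_stream(sample, [1, 2, 1], [0, 3])
```
]]></programlisting>
<para>
With the sample data, <varname>table</varname> maps object 1 to
<literal>(1, 15, 0)</literal> and object 2 to <literal>(2, 3,
1)</literal>. When <literal>/Index</literal> is absent, a caller
would pass the default of <literal>[0 /Size]</literal>.
</para>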
|
|
<para>
|
|
It seems standard to have the first entry in the table be
|
|
<literal>0 0 0</literal> instead of <literal>0 0 ffff</literal>
|
|
if there are no deleted objects.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
<sect1 id="ref.object-streams-linearization">
|
|
<title>Implications for Linearized Files</title>
|
|
<para>
|
|
For linearized files, the linearization dictionary, document
|
|
catalog, and page objects may not be contained in object streams.
|
|
</para>
|
|
<para>
|
|
Objects stored within object streams are given the highest range
|
|
of object numbers within the main and first-page cross-reference
|
|
sections.
|
|
</para>
|
|
<para>
|
|
It is okay to use cross-reference streams in place of regular xref
|
|
tables. There are no special considerations.
|
|
</para>
|
|
<para>
|
|
Hint data refers to object streams themselves, not the objects in
|
|
the streams. Shared object references should also be made to the
|
|
object streams. There are no references in any hint tables to the
|
|
object numbers of compressed objects (objects within object
|
|
streams).
|
|
</para>
|
|
<para>
|
|
When numbering objects, all shared objects within both the first
|
|
and second halves of the linearized file must be numbered
|
|
consecutively after all normal uncompressed objects in that half.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.object-stream-implementation">
|
|
<title>Implementation Notes</title>
|
|
<para>
|
|
There are three modes for writing object streams:
|
|
<option>disable</option>, <option>preserve</option>, and
|
|
<option>generate</option>. In disable mode, we do not generate
|
|
any object streams, and we also generate an xref table rather than
|
|
xref streams. This can be used to generate PDF files that are
|
|
viewable with older readers. In preserve mode, we write object
|
|
streams such that written object streams contain the same objects
|
|
and <literal>/Extends</literal> relationships as in the original
|
|
file. This is equivalent to disable mode if the file has no object streams.
|
|
In generate mode, we create object streams ourselves by grouping
|
|
objects that are allowed in object streams together in sets of no
|
|
more than 100 objects. We also ensure that the PDF version is at
|
|
least 1.5 in generate mode, but we preserve the version header in
|
|
the other modes. The default is <option>preserve</option>.
|
|
</para>
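<para>
The grouping limit used in generate mode can be pictured with the
following Python sketch. It only illustrates the idea of splitting
eligible objects into sets of no more than 100; it does not
necessarily match how qpdf itself partitions objects, and the
function name <function>group_for_object_streams</function> and the
list of eligible object numbers are hypothetical.
</para>
<programlisting><![CDATA[
```python
def group_for_object_streams(eligible, max_per_stream=100):
    """Split eligible object numbers into consecutive groups.

    Each group holds at most max_per_stream objects and stands for the
    contents of one object stream.
    """
    return [eligible[i:i + max_per_stream]
            for i in range(0, len(eligible), max_per_stream)]


# Hypothetical input: 250 eligible objects numbered 1 through 250.
groups = group_for_object_streams(list(range(1, 251)))
sizes = [len(group) for group in groups]  # [100, 100, 50]
```
]]></programlisting>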
|
|
<para>
|
|
We do not support creation of hybrid files. When we write files,
|
|
even in preserve mode, we will lose any xref tables and merge any
|
|
appended sections.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
<appendix id="ref.release-notes">
|
|
<title>Release Notes</title>
|
|
<para>
|
|
For a detailed list of changes, please see the file
|
|
<filename>ChangeLog</filename> in the source distribution.
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>4.0.1: January 17, 2013</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix detection of binary attachments in test suite to avoid
|
|
false test failures on some platforms.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add clarifying comment in <filename>QPDF.hh</filename> to
|
|
methods that return the user password explaining that it is no
|
|
longer possible with newer encryption formats to recover the
|
|
user password knowing the owner password. In earlier
|
|
encryption formats, the user password was encrypted in the
|
|
file using the owner password. In newer encryption formats, a
|
|
separate encryption key is used on the file, and that key is
|
|
independently encrypted using both the user password and the
|
|
owner password.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>4.0.0: December 31, 2012</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Major enhancement: support has been added for newer encryption
|
|
schemes supported by version X of Adobe Acrobat. This
|
|
includes use of 127-character passwords, 256-bit encryption
|
|
keys, and the encryption scheme specified in ISO 32000-2, the
|
|
PDF 2.0 specification. This scheme can be chosen from the
|
|
command line by specifying use of 256-bit keys. qpdf also
|
|
supports the deprecated encryption method used by Acrobat IX.
|
|
This encryption style has known security weaknesses and should
|
|
not be used in practice. However, such files exist “in
|
|
the wild,” so support for this scheme is still useful.
|
|
New methods
|
|
<function>QPDFWriter::setR6EncryptionParameters</function>
|
|
(for the PDF 2.0 scheme) and
|
|
<function>QPDFWriter::setR5EncryptionParameters</function>
|
|
(for the deprecated scheme) have been added to enable these
|
|
new encryption schemes. Corresponding functions have been
|
|
added to the C API as well.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Full support for Adobe extension levels in PDF version
|
|
information. Starting with PDF version 1.7, corresponding to
|
|
ISO 32000, Adobe adds new functionality by increasing the
|
|
extension level rather than increasing the version. This
|
|
support includes addition of the
|
|
<function>QPDF::getExtensionLevel</function> method for
|
|
retrieving the document's extension level, addition of
|
|
versions of
|
|
<function>QPDFWriter::setMinimumPDFVersion</function> and
|
|
<function>QPDFWriter::forcePDFVersion</function> that accept
|
|
an extension level, and extended syntax for specifying forced
|
|
and minimum versions on the command line as described in <xref
|
|
linkend="ref.advanced-transformation"/>. Corresponding
|
|
functions have been added to the C API as well.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Minor fixes to prevent qpdf from referencing objects in the
|
|
file that are not referenced in the file's overall structure.
|
|
Most files don't have any such objects, but some files
|
|
contain unreferenced objects with errors, so these fixes
|
|
prevent qpdf from needlessly rejecting or complaining about
|
|
such objects.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new generalized methods for reading and writing files
|
|
from/to programmer-defined sources. The method
|
|
<function>QPDF::processInputSource</function> allows the
|
|
programmer to use any input source for the input file, and
|
|
<function>QPDFWriter::setOutputPipeline</function> allows the
|
|
programmer to write the output file through any pipeline.
|
|
These methods would make it possible to perform any number of
|
|
specialized operations, such as accessing external storage
|
|
systems, creating bindings for qpdf in other programming
|
|
languages that have their own I/O systems, etc.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new method <function>QPDF::getEncryptionKey</function> for
|
|
retrieving the underlying encryption key used in the file.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
This release includes a small handful of non-compatible API
|
|
changes. While effort is made to avoid such changes, all the
|
|
non-compatible API changes in this version were to parts of
|
|
the API that would likely never be used outside the library
|
|
itself. In all cases, the altered methods or structures were
|
|
parts of the <classname>QPDF</classname> class that were public to
|
|
enable them to be called from
|
|
<classname>QPDFWriter</classname> or were part of validation
|
|
code that was over-zealous in reporting problems in parts of
|
|
the file that would not ordinarily be referenced. In no case
|
|
did any of the removed methods do anything worse than falsely
|
|
report error conditions in files that were broken in ways that
|
|
didn't matter. The following public parts of the
|
|
<classname>QPDF</classname> class were changed in a
|
|
non-compatible way:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Updated nested <classname>QPDF::EncryptionData</classname>
|
|
class to add fields needed by the newer encryption formats,
|
|
member variables changed to private so that future changes
|
|
will not require breaking backward compatibility.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Added additional parameters to
|
|
<function>compute_data_key</function>, which is used by
|
|
<classname>QPDFWriter</classname> to compute the encryption
|
|
key used to encrypt a specific object.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Removed the method
|
|
<function>flattenScalarReferences</function>. This method
|
|
was previously used prior to writing a new PDF file, but it
|
|
has the undesired side effect of causing qpdf to read
|
|
objects in the file that were not referenced. Some
|
|
otherwise valid files have unreferenced objects with errors in
|
|
them, so this could cause qpdf to reject files that would
|
|
be accepted by virtually all other PDF readers. In fact,
|
|
qpdf relied on only a very small part of what
|
|
flattenScalarReferences did, so only this part has been
|
|
preserved, and it is now done directly inside
|
|
<classname>QPDFWriter</classname>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Removed the method <function>decodeStreams</function>.
|
|
This method was used by the <option>--check</option> option
|
|
of the <command>qpdf</command> command-line tool to force
|
|
all streams in the file to be decoded, but it also suffered
|
|
from the problem of opening otherwise unreferenced streams
|
|
and thus could report false positives. The
|
|
<option>--check</option> option now causes qpdf to go
|
|
through all the motions of writing a new file based on the
|
|
original one, so it will always reference and check exactly
|
|
those parts of a file that any ordinary viewer would check.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Removed the method
|
|
<function>trimTrailerForWrite</function>. This method was
|
|
used by <classname>QPDFWriter</classname> to modify the
|
|
original QPDF object by removing fields from the trailer
|
|
dictionary that wouldn't apply to the newly written file.
|
|
This functionality, though generally harmless, was a poor
|
|
implementation and has been replaced by having QPDFWriter
|
|
filter these out when copying the trailer rather than
|
|
modifying the original QPDF object. (Note that qpdf never
|
|
modifies the original file itself.)
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Allow the PDF header to appear anywhere in the first 1024
|
|
bytes of the file. This is consistent with what other readers
|
|
do.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Fix the <command>pkg-config</command> files to list zlib and
|
|
pcre in <function>Requires.private</function> to better
|
|
support static linking using <command>pkg-config</command>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>3.0.2: September 6, 2012</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: <function>QPDFWriter::setOutputMemory</function> did
|
|
not work when not used with
|
|
<function>QPDFWriter::setStaticID</function>, which made it
|
|
pretty much useless. This has been fixed.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New API call
|
|
<function>QPDFWriter::setExtraHeaderText</function> inserts
|
|
additional text near the header of the PDF file. The intended
|
|
use case is to insert comments that may be consumed by a
|
|
downstream application, though other use cases may exist.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>3.0.1: August 11, 2012</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Version 3.0.0 included addition of files for
|
|
<command>pkg-config</command>, but this was not mentioned in
|
|
the release notes. The release notes for 3.0.0 were updated
|
|
to mention this.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: if an object stream ended with a scalar object not
|
|
followed by space, qpdf would incorrectly report that it
|
|
encountered a premature EOF. This bug has been in qpdf since
|
|
version 2.0.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>3.0.0: August 2, 2012</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Acknowledgment: I would like to express gratitude for the
|
|
contributions of Tobias Hoffmann toward the release of qpdf
|
|
version 3.0. He is responsible for most of the implementation
|
|
and design of the new API for manipulating pages, and
|
|
contributed code and ideas for many of the improvements made
|
|
in version 3.0. Without his work, this release would
|
|
certainly not have happened as soon as it did, if at all.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>Non-compatible API change:</emphasis> The version of
|
|
<function>QPDFObjectHandle::replaceStreamData</function> that
|
|
uses a <classname>StreamDataProvider</classname> no longer
|
|
requires (or accepts) a <varname>length</varname> parameter.
|
|
See <xref linkend="ref.upgrading-to-3.0"/> for an explanation.
|
|
While care is taken to avoid non-compatible API changes in
|
|
general, an exception was made this time because the new
|
|
interface offers an opportunity to significantly simplify
|
|
calling code.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Support has been added for large files. The test suite
|
|
verifies support for files larger than 4 gigabytes, and manual
|
|
testing has verified support for files larger than 10
|
|
gigabytes. Large file support is available for both 32-bit
|
|
and 64-bit platforms as long as the compiler and underlying
|
|
platforms support it.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Support for page selection (splitting and merging PDF files)
|
|
has been added to the <command>qpdf</command> command-line
|
|
tool. See <xref linkend="ref.page-selection"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Options have been added to the <command>qpdf</command>
|
|
command-line tool for copying encryption parameters from
|
|
another file. See <xref linkend="ref.basic-options"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New methods have been added to the <classname>QPDF</classname>
|
|
object for adding and removing pages. See <xref
|
|
linkend="ref.adding-and-remove-pages"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New methods have been added to the <classname>QPDF</classname>
|
|
object for copying objects from other PDF files. See <xref
|
|
linkend="ref.foreign-objects"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A new method <function>QPDFObjectHandle::parse</function> has
|
|
been added for constructing
|
|
<classname>QPDFObjectHandle</classname> objects from a string
|
|
description.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Methods have been added to <classname>QPDFWriter</classname>
|
|
to allow writing to an already open stdio <type>FILE*</type>
|
|
in addition to writing to standard output or a named file.
|
|
Methods have been added to <classname>QPDF</classname> to be
|
|
able to process a file from an already open stdio
|
|
<type>FILE*</type>. This makes it possible to read and write
|
|
PDF from secure temporary files that have been unlinked prior
|
|
to being fully read or written.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <function>QPDF::emptyPDF</function> method can be used to allow
|
|
creation of PDF files from scratch. The example
|
|
<filename>examples/pdf-create.cc</filename> illustrates how it
|
|
can be used.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Several methods that take
|
|
<classname>PointerHolder&lt;Buffer&gt;</classname> can now
|
|
also accept <type>std::string</type> arguments.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Many new convenience methods have been added to the library,
|
|
most in <classname>QPDFObjectHandle</classname>. See
|
|
<filename>ChangeLog</filename> for a full list.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When building on a platform that supports ELF shared libraries
|
|
(such as Linux), symbol versions are enabled by default. They
|
|
can be disabled by passing
|
|
<option>--disable-ld-version-script</option> to
|
|
<command>./configure</command>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The file <filename>libqpdf.pc</filename> is now installed to
|
|
support <command>pkg-config</command>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Image comparison tests are off by default now since they are
|
|
not needed to verify a correct build or port of qpdf. They
|
|
are needed only when changing the actual PDF output generated
|
|
by qpdf. You should enable them if you are making deep
|
|
changes to qpdf itself. See <filename>README</filename> for
|
|
details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Large file tests are off by default but can be turned on with
|
|
<command>./configure</command> or by setting an environment
|
|
variable before running the test suite. See
|
|
<filename>README</filename> for details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When qpdf's test suite fails, failures are not printed to the
|
|
terminal anymore by default. Instead, find them in
|
|
<filename>build/qtest.log</filename>. For packagers who are
|
|
building with an autobuilder, you can add the
|
|
<option>--enable-show-failed-test-output</option> option to
|
|
<command>./configure</command> to restore the old behavior.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>2.3.1: December 28, 2011</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix thread-safety problem resulting from non-thread-safe use
|
|
of the PCRE library.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Made a few minor documentation fixes.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add workaround for a bug that appears in some versions of
|
|
ghostscript to the test suite.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Fix minor build issue for Visual C++ 2010.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.3.0: August 11, 2011</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: when preserving existing encryption on encrypted
|
|
files with cleartext metadata, older qpdf versions would
|
|
generate password-protected files with no valid password.
|
|
This operation now works. This bug only affected files
|
|
created by copying existing encryption parameters; explicit
|
|
encryption with specification of cleartext metadata worked
|
|
before and continues to work.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Enhance <classname>QPDFWriter</classname> with a new
|
|
constructor that allows you to delay the specification of the
|
|
output file. When using this constructor, you may now call
|
|
<function>QPDFWriter::setOutputFilename</function> to specify
|
|
the output file, or you may use
|
|
<function>QPDFWriter::setOutputMemory</function> to cause
|
|
<classname>QPDFWriter</classname> to write the resulting PDF
|
|
file to a memory buffer. You may then use
|
|
<function>QPDFWriter::getBuffer</function> to retrieve the
|
|
memory buffer.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new API call <function>QPDF::replaceObject</function> for
|
|
replacing objects by object ID.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new API call <function>QPDF::swapObjects</function> for
|
|
swapping two objects by object ID.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add <function>QPDFObjectHandle::getDictAsMap</function> and
|
|
<function>QPDFObjectHandle::getArrayAsVector</function> to
|
|
allow retrieval of dictionary objects as maps and array
|
|
objects as vectors.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add functions <function>qpdf_get_info_key</function> and
|
|
<function>qpdf_set_info_key</function> to the C API for
|
|
manipulating string fields of the document's
|
|
<literal>/Info</literal> dictionary.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add functions <function>qpdf_init_write_memory</function>,
|
|
<function>qpdf_get_buffer_length</function>, and
|
|
<function>qpdf_get_buffer</function> to the C API for writing
|
|
PDF files to a memory buffer instead of a file.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>2.2.4: June 25, 2011</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix installation and compilation issues; no functionality
|
|
changes.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.2.3: April 30, 2011</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Handle some damaged streams with incorrect characters
|
|
following the stream keyword.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Improve handling of inline images when normalizing content
|
|
streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Enhance error recovery to properly handle files that use
|
|
object 0 as a regular object, which is specifically disallowed
|
|
by the spec.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.2.2: October 4, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add new function <function>qpdf_read_memory</function>
|
|
to the C API to call
|
|
<function>QPDF::processMemoryFile</function>. This was an
|
|
omission in qpdf 2.2.1.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.2.1: October 1, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add new method <function>QPDF::setOutputStreams</function>
|
|
to replace <varname>std::cout</varname> and
|
|
<varname>std::cerr</varname> with other streams for generation
|
|
of diagnostic messages and error messages. This can be useful
|
|
for GUIs or other applications that want to capture any output
|
|
generated by the library to present to the user in some other
|
|
way. Note that QPDF does not write to
|
|
<varname>std::cout</varname> (or the specified output stream)
|
|
except where explicitly mentioned in
|
|
<filename>QPDF.hh</filename>, and that the only use of the
|
|
error stream is for warnings. Note also that output of
|
|
warnings is suppressed when
|
|
<literal>setSuppressWarnings(true)</literal> is called.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new method <function>QPDF::processMemoryFile</function>
|
|
for operating on PDF files that are loaded into memory rather
|
|
than in a file on disk.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Give a warning but otherwise ignore empty PDF objects by
|
|
treating them as null. Empty objects are not permitted by the
|
|
PDF specification but have been known to appear in some actual
|
|
PDF files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Handle inline image filter abbreviations when they appear as
|
|
stream filter abbreviations. The PDF specification does not
|
|
allow use of stream filter abbreviations in this way, but
|
|
Adobe Reader and some other PDF readers accept them since they
|
|
sometimes appear incorrectly in actual PDF files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Implement miscellaneous enhancements to
|
|
<classname>PointerHolder</classname> and
|
|
<classname>Buffer</classname> to support other changes.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.2.0: August 14, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add new methods to <classname>QPDFObjectHandle</classname>
|
|
(<function>newStream</function> and
|
|
<function>replaceStreamData</function>) for creating new
|
|
streams and replacing stream data. This makes it possible to
|
|
perform a wide range of operations that were not previously
|
|
possible.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new helper method in
|
|
<classname>QPDFObjectHandle</classname>
|
|
(<function>addPageContents</function>) for appending or
|
|
prepending new content streams to a page. This method makes
|
|
it possible to manipulate content streams without having to be
|
|
concerned whether a page's contents are a single stream or an
|
|
array of streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new method in <classname>QPDFObjectHandle</classname>:
|
|
<function>replaceOrRemoveKey</function>, which replaces a
|
|
dictionary key
|
|
with a given value unless the value is null, in which case it
|
|
removes the key instead.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new method in <classname>QPDFObjectHandle</classname>:
|
|
<function>getRawStreamData</function>, which returns the raw
|
|
(unfiltered) stream data in a buffer. This complements the
|
|
<function>getStreamData</function> method, which returns the
|
|
filtered (uncompressed) stream data and can only be used when
|
|
the stream's data is filterable.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Provide two new examples:
|
|
<command>pdf-double-page-size</command> and
|
|
<command>pdf-invert-images</command> that illustrate the newly
|
|
added interfaces.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Fix a memory leak that would cause loss of a few bytes for
|
|
every object involved in a cycle of object references. Thanks
|
|
to Jian Ma for calling my attention to the leak.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.5: April 25, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Remove restriction of file identifier strings to 16 bytes.
|
|
This unnecessary restriction was preventing qpdf from being
|
|
able to encrypt or decrypt files with identifier strings that
|
|
were not exactly 16 bytes long. The specification imposes no
|
|
such restriction.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.4: April 18, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Apply the same padding calculation fix from version 2.1.2 to
|
|
the main cross reference stream as well.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Since <command>qpdf --check</command> only performs limited
|
|
checks, clarify the output to make it clear that there still
|
|
may be errors that qpdf can't check. This should make it less
|
|
surprising to people when another PDF reader is unable to read
|
|
a file that qpdf thinks is okay.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.3: March 27, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix bug that could cause a failure when rewriting PDF files
|
|
that contain object streams with unreferenced objects that in
|
|
turn reference indirect scalars.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Don't complain about (invalid) AES streams that aren't a
|
|
multiple of 16 bytes. Instead, pad them before decrypting.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.2: January 24, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix bug in padding around first half cross reference stream in
|
|
linearized files. The bug could cause an assertion failure
|
|
when linearizing certain unlucky files.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.1: December 14, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
No changes in functionality; insert missing include in an
|
|
internal library header file to support gcc 4.4, and update
|
|
test suite to ignore broken Adobe Reader installations.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1: October 30, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
This is the first version of qpdf to include Windows support.
|
|
On Windows, it is possible to build a DLL. Additionally, a
|
|
partial C-language API has been introduced, which makes it
|
|
possible to call qpdf functions from non-C++ environments. I
|
|
am very grateful to <!-- Žarko Gajić --> Zarko Gajic (<ulink
|
|
url="http://delphi.about.com/">http://delphi.about.com/</ulink>)
|
|
for tirelessly testing numerous pre-release versions of this
|
|
DLL and providing many excellent suggestions on improving the
|
|
interface.
|
|
</para>
|
|
<para>
|
|
For programming to the C interface, please see the header file
|
|
<filename>qpdf/qpdf-c.h</filename> and the example
|
|
<filename>examples/pdf-linearize.c</filename>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Zarko Gajic has written a Delphi wrapper for qpdf, which can
|
|
be downloaded from qpdf's download site. Zarko's Delphi
|
|
wrapper is released with the same licensing terms as qpdf
|
|
itself and comes with this disclaimer: “Delphi wrapper
|
|
unit <filename>qpdf.pas</filename> created by Zarko Gajic
|
|
(<ulink
|
|
url="http://delphi.about.com/">http://delphi.about.com/</ulink>).
|
|
Use at your own risk and for whatever purpose you want. No
|
|
support is provided. Sample code is provided.”
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Support has been added for AES encryption and crypt filters.
|
|
Although qpdf does not presently support files that use
|
|
PKI-based encryption, with the addition of AES and crypt
|
|
filters, qpdf is now able to open most encrypted files
|
|
created with newer versions of Acrobat or other PDF creation
|
|
software. Note that I have not been able to get very many
|
|
files encrypted in this way, so it's possible there could
|
|
still be some cases that qpdf can't handle. Please report
|
|
them if you find them.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Many error messages have been improved to include more
|
|
information in hopes of making qpdf a more useful tool for PDF
|
|
experts to use in manually recovering damaged PDF files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Attempt to avoid compressing metadata streams if possible.
|
|
This is consistent with other PDF creation applications.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Provide new command-line options for AES encryption, cleartext
|
|
metadata, and setting the minimum and forced PDF versions of
|
|
output files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add additional methods to the <classname>QPDF</classname>
|
|
object for querying the document's permissions. Although qpdf
|
|
does not enforce these permissions, it does make them
|
|
available so that applications that use qpdf can enforce
|
|
permissions.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <option>--check</option> option to <command>qpdf</command>
|
|
has been extended to include some additional information.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
There have been a handful of non-compatible API changes. For
|
|
details, see <xref linkend="ref.upgrading-to-2.1"/>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.6: May 3, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Do not attempt to uncompress streams that have decode
|
|
parameters we don't recognize. Earlier versions of qpdf would
|
|
have rejected files with such streams.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.5: March 10, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Improve error handling in the LZW decoder, and fix a small
|
|
error introduced in the previous version with regard to
|
|
handling full tables. The LZW decoder has been more strongly
|
|
verified in this release.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.4: February 21, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Include proper support for LZW streams encoded without the
|
|
“early code change” flag. Special thanks to Atom
|
|
Smasher who reported the problem and provided an input file
|
|
compressed in this way, which I did not previously have.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Implement some improvements to file recovery logic.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.3: February 15, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Compile cleanly with gcc 4.4.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Handle strings encoded as UTF-16BE properly.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.2: June 30, 2008</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Update test suite to work properly with a
|
|
non-<command>bash</command> <filename>/bin/sh</filename> and
|
|
with Perl 5.10. No changes were made to the actual qpdf
|
|
source code itself for this release.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.1: May 6, 2008</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
No changes in functionality or interface. This release
|
|
includes fixes to the source code so that qpdf compiles
|
|
properly and passes its test suite on a broader range of
|
|
platforms. See <filename>ChangeLog</filename> in the source
|
|
distribution for details.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0: April 29, 2008</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
First public release.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</appendix>
|
|
<appendix id="ref.upgrading-to-2.1">
|
|
<title>Upgrading from 2.0 to 2.1</title>
|
|
<para>
|
|
Although, as a general rule, we like to avoid introducing
|
|
source-level incompatibilities in qpdf's interface, there were a
|
|
few non-compatible changes made in this version. A considerable
|
|
amount of source code that uses qpdf will probably compile without
|
|
any changes, but in some cases, you may have to update your code.
|
|
The changes are enumerated here. There are also some new
|
|
interfaces; for those, please refer to the header files.
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
QPDF's exception handling mechanism now uses
|
|
<classname>std::logic_error</classname> for internal errors and
|
|
<classname>std::runtime_error</classname> for runtime errors in
|
|
favor of the now removed <classname>QEXC</classname> classes used
|
|
in previous versions. The <classname>QEXC</classname> exception
|
|
classes predated the addition of the
|
|
<filename>&lt;stdexcept&gt;</filename> header file to the C++
|
|
standard library. Most of the exceptions thrown by the qpdf
|
|
library itself are still of type <classname>QPDFExc</classname>
|
|
which is now derived from
|
|
<classname>std::runtime_error</classname>. Programs that caught
|
|
an instance of <classname>std::exception</classname> and
|
|
displayed it by calling the <function>what()</function> method
|
|
will not need to be changed.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <classname>QPDFExc</classname> class now internally
|
|
represents various fields of the error condition and provides
|
|
interfaces for querying them. Among the fields is a numeric
|
|
error code that can help applications act differently on (a small
|
|
number of) different error conditions. See
|
|
<filename>QPDFExc.hh</filename> for details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Warnings can be retrieved from qpdf as instances of
|
|
<classname>QPDFExc</classname> instead of strings.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The nested <classname>QPDF::EncryptionData</classname> class's
|
|
constructor takes an additional argument. This class is
|
|
primarily intended to be used by
|
|
<classname>QPDFWriter</classname>. There's not really anything
|
|
useful an end-user application could do with it. It probably
|
|
shouldn't really be part of the public interface to begin with.
|
|
Likewise, some of the methods for computing internal encryption
|
|
dictionary parameters have changed to support
|
|
<literal>/R=4</literal> encryption.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The method <function>QPDF::getUserPassword</function> has been
|
|
removed since it didn't do what people would think it did. There
|
|
are now two new methods:
|
|
<function>QPDF::getPaddedUserPassword</function> and
|
|
<function>QPDF::getTrimmedUserPassword</function>. The first one
|
|
does what the old <function>QPDF::getUserPassword</function>
|
|
method used to do, which is to return the password with possible
|
|
binary padding as specified by the PDF specification. The second
|
|
one returns a human-readable password string.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The enumerated types that used to be nested in
|
|
<classname>QPDFWriter</classname> have moved to top-level
|
|
enumerated types and are now defined in the file
|
|
<filename>qpdf/Constants.h</filename>. This enables them to be
|
|
shared by both the C and C++ interfaces.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
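<para>
The exception changes listed above can be illustrated with a minimal
sketch. It does not use the qpdf library: the class
<classname>DemoPdfError</classname> and the function
<function>describe_failure</function> are hypothetical names
introduced only for this example, with
<classname>DemoPdfError</classname> standing in for a library
exception derived from <classname>std::runtime_error</classname>.
Assuming that relationship, code that catches standard exception
types and calls <function>what()</function> keeps working.
</para>
<programlisting><![CDATA[
```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a library exception such as QPDFExc. It is
// not the real class; it only mirrors the "derived from
// std::runtime_error" relationship described in the list above.
class DemoPdfError : public std::runtime_error
{
  public:
    explicit DemoPdfError(std::string const& message) :
        std::runtime_error(message)
    {
    }
};

// Run an operation and turn any standard exception into a display
// string. Internal errors (std::logic_error) and runtime errors
// (std::runtime_error) are reported separately; anything else derived
// from std::exception falls through to the generic handler.
template <typename Operation>
std::string describe_failure(Operation op)
{
    try {
        op();
    } catch (std::logic_error const& e) {
        return std::string("internal error: ") + e.what();
    } catch (std::runtime_error const& e) {
        return std::string("runtime error: ") + e.what();
    } catch (std::exception const& e) {
        return std::string("error: ") + e.what();
    }
    return "ok";
}
```
]]></programlisting>
<para>
A caller that only cares about displaying the message could keep a
single handler for <classname>std::exception</classname>; the
separate handlers in this sketch are useful only when internal
errors and runtime errors should be treated differently.
</para>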
|
|
</appendix>
|
|
<appendix id="ref.upgrading-to-3.0">
|
|
<title>Upgrading to 3.0</title>
|
|
<para>
|
|
For the most part, the API for qpdf version 3.0 is backward
|
|
compatible with versions 2.1 and later. There are two exceptions:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
The method
|
|
<function>QPDFObjectHandle::replaceStreamData</function> that
|
|
uses a <classname>StreamDataProvider</classname> to provide the
|
|
stream data no longer takes a <varname>length</varname>
|
|
parameter. While it would have been easy enough to keep the
|
|
parameter for backward compatibility, in this case, the
|
|
parameter was removed since this provides the user an
|
|
opportunity to simplify the calling code. This method was
|
|
introduced in version 2.2. At the time, the
|
|
<varname>length</varname> parameter was required in order to
|
|
ensure that calls to the stream data provider returned the same
|
|
length for a specific stream every time they were invoked. In
|
|
particular, the linearization code depends on this. Instead,
|
|
qpdf 3.0 and newer check for that constraint explicitly. The
|
|
first time the stream data provider is called for a specific
|
|
stream, the actual length is saved, and subsequent calls are
|
|
required to return the same number of bytes. This means the
|
|
calling code no longer has to compute the length in advance,
|
|
which can be a significant simplification. If your code fails
|
|
to compile because of the extra argument and you don't want to
|
|
make other changes to your code, just omit the argument.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Many methods take <type>long long</type> instead of other
|
|
integer types. Most if not all existing code should compile
|
|
fine with this change since such parameters had always
|
|
previously been smaller types. This change was required to
|
|
support files larger than two gigabytes in size.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
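<para>
The length constraint described above can be sketched as follows.
This is not qpdf's implementation: the class
<classname>LengthConsistencyChecker</classname>, its
<function>record</function> method, and the integer stream
identifier are hypothetical and exist only to illustrate the idea of
saving the length on the first call and requiring later calls to
match it.
</para>
<programlisting><![CDATA[
```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of qpdf. It remembers how many bytes
// were produced for each stream the first time data is provided and
// rejects any later call that produces a different number of bytes.
class LengthConsistencyChecker
{
  public:
    // stream_id is an illustrative key for "a specific stream".
    // Returns the length recorded for that stream.
    std::size_t record(int stream_id, std::string const& data)
    {
        auto it = lengths.find(stream_id);
        if (it == lengths.end()) {
            // First call for this stream: save the actual length.
            lengths[stream_id] = data.size();
            return data.size();
        }
        if (it->second != data.size()) {
            throw std::runtime_error(
                "stream data provider returned a different length");
        }
        return it->second;
    }

  private:
    std::map<int, std::size_t> lengths;
};
```
]]></programlisting>
<para>
With a check of this kind performed by the library, the calling code
does not have to compute the length in advance; it only has to
produce the same number of bytes each time it is asked for a given
stream.
</para>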
|
|
</appendix>
|
|
<appendix id="ref.upgrading-to-4.0">
|
|
<title>Upgrading to 4.0</title>
|
|
<para>
|
|
While version 4.0 includes a few non-compatible API changes, it is
|
|
very unlikely that anyone's code would have used any of those parts
|
|
of the API since they generally required information that would
|
|
only be available inside the library. In the unlikely event that
|
|
you should run into trouble, please see the ChangeLog. See also
|
|
<xref linkend="ref.release-notes"/> for a complete list of the
|
|
non-compatible API changes made in this version.
|
|
</para>
|
|
</appendix>
|
|
</book>
|