<?xml version="1.0" encoding="utf-8"?>
|
|
<!DOCTYPE book [
|
|
<!ENTITY ldquo "“">
|
|
<!ENTITY rdquo "”">
|
|
<!ENTITY mdash "—">
|
|
<!ENTITY ndash "–">
|
|
<!ENTITY nbsp "&#xA0;">
|
|
<!ENTITY swversion "8.2.1">
|
|
<!ENTITY lastreleased "August 18, 2018">
|
|
]>
|
|
<book>
|
|
<bookinfo>
|
|
<title>QPDF Manual</title>
|
|
<subtitle>For QPDF Version &swversion;, &lastreleased;</subtitle>
|
|
<author>
|
|
<firstname>Jay</firstname><surname>Berkenbilt</surname>
|
|
</author>
|
|
<copyright>
|
|
<year>2005–2018</year>
|
|
<holder>Jay Berkenbilt</holder>
|
|
</copyright>
|
|
</bookinfo>
|
|
<preface id="acknowledgments">
|
|
<title>General Information</title>
|
|
<para>
|
|
QPDF is a program that does structural, content-preserving
|
|
transformations on PDF files. QPDF's website is located at <ulink
|
|
url="http://qpdf.sourceforge.net/">http://qpdf.sourceforge.net/</ulink>.
|
|
QPDF's source code is hosted on GitHub at <ulink
|
|
url="https://github.com/qpdf/qpdf">https://github.com/qpdf/qpdf</ulink>.
|
|
</para>
|
|
<para>
|
|
QPDF is licensed under <ulink
|
|
url="http://www.apache.org/licenses/LICENSE-2.0">the Apache
|
|
License, Version 2.0</ulink> (the "License"). Unless required by
|
|
applicable law or agreed to in writing, software distributed under
|
|
the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
|
|
OR CONDITIONS OF ANY KIND, either express or implied. See the
|
|
License for the specific language governing permissions and
|
|
limitations under the License.
|
|
</para>
|
|
<para>
|
|
Versions of qpdf prior to version 7 were released under the terms
|
|
of <ulink url="https://opensource.org/licenses/Artistic-2.0">the
|
|
Artistic License, version 2.0</ulink>. At your option, you may
|
|
continue to consider qpdf to be licensed under those terms. The
|
|
Apache License 2.0 permits everything that the Artistic License 2.0
|
|
permits but is slightly less restrictive. Allowing the Artistic
|
|
License to continue being used is primarily to help people who may
|
|
have to get specific approval to use qpdf in their products.
|
|
</para>
|
|
<para>
|
|
QPDF is intentionally released with a permissive license. However,
|
|
if there is some reason that the licensing terms don't work for
|
|
your requirements, please feel free to contact the copyright holder
|
|
to make other arrangements.
|
|
</para>
|
|
<para>
|
|
QPDF was originally created in 2001 and modified periodically
|
|
between 2001 and 2005 during my employment at <ulink
|
|
url="http://www.apexcovantage.com">Apex CoVantage</ulink>. Upon my
|
|
departure from Apex, the company graciously allowed me to take
|
|
ownership of the software and continue maintaining it as an open
|
|
source project, a decision for which I am very grateful. I have
|
|
made considerable enhancements to it since that time. I feel
|
|
fortunate to have worked for people who would make such a decision.
|
|
This work would not have been possible without their support.
|
|
</para>
|
|
</preface>
|
|
<chapter id="ref.overview">
|
|
<title>What is QPDF?</title>
|
|
<para>
|
|
QPDF is a program that does structural, content-preserving
|
|
transformations on PDF files. It could have been called something
|
|
like <emphasis>pdf-to-pdf</emphasis>. It also provides many useful
|
|
capabilities to developers of PDF-producing software or for people
|
|
who just want to look at the innards of PDF files to learn more
about how they work.
|
|
</para>
|
|
<para>
|
|
With QPDF, it is possible to copy objects from one PDF file into
|
|
another and to manipulate the list of pages in a PDF file. This
|
|
makes it possible to merge and split PDF files. The QPDF library
|
|
also makes it possible for you to create PDF files from scratch.
|
|
In this mode, you are responsible for supplying all the contents of
|
|
the file, while the QPDF library takes care of all the syntactical
representation of the objects, creation of cross-reference tables
|
|
and, if you use them, object streams, encryption, linearization,
|
|
and other syntactic details. You are still responsible for
|
|
generating PDF content on your own.
|
|
</para>
|
|
<para>
|
|
QPDF has been designed with very few external dependencies, and it
|
|
is intentionally very lightweight. QPDF is
|
|
<emphasis>not</emphasis> a PDF content creation library, a PDF
|
|
viewer, or a program capable of converting PDF into other formats.
|
|
In particular, QPDF knows nothing about the semantics of PDF
|
|
content streams. If you are looking for something that can do
|
|
that, you should look elsewhere. However, once you have a valid
|
|
PDF file, QPDF can be used to transform that file in ways perhaps
|
|
your original PDF creation software can't handle. For example, many
|
|
programs generate simple PDF files but can't password-protect them,
|
|
web-optimize them, or perform other transformations of that type.
|
|
</para>
|
|
</chapter>
|
|
<chapter id="ref.installing">
|
|
<title>Building and Installing QPDF</title>
|
|
<para>
|
|
This chapter describes how to build and install qpdf. Please see
|
|
also the <filename>README.md</filename> and
|
|
<filename>INSTALL</filename> files in the source distribution.
|
|
</para>
|
|
<sect1 id="ref.prerequisites">
|
|
<title>System Requirements</title>
|
|
<para>
|
|
The qpdf package has few external dependencies. In order to build
|
|
qpdf, the following packages are required:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
zlib: <ulink url="http://www.zlib.net/">http://www.zlib.net/</ulink>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
jpeg: <ulink
|
|
url="http://www.ijg.org/files/">http://www.ijg.org/files/</ulink>
|
|
or <ulink
|
|
url="https://libjpeg-turbo.org/">https://libjpeg-turbo.org/</ulink>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
GNU make 3.81 or newer: <ulink url="http://www.gnu.org/software/make">http://www.gnu.org/software/make</ulink>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
perl version 5.8 or newer:
|
|
<ulink url="http://www.perl.org/">http://www.perl.org/</ulink>;
|
|
required for <command>fix-qdf</command> and the test suite.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
GNU diffutils (any version): <ulink
|
|
url="http://www.gnu.org/software/diffutils/">http://www.gnu.org/software/diffutils/</ulink>
|
|
is required to run the test suite. Note that this is the
|
|
version of diff present on virtually all GNU/Linux systems.
|
|
This is required because the test suite uses <command>diff
|
|
-u</command>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A C++ compiler that works well with STL and has the <type>long
|
|
long</type> type. Most modern C++ compilers should fit the bill
|
|
fine. QPDF is tested with gcc, clang, and Microsoft Visual C++.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Part of qpdf's test suite does comparisons of the contents of PDF
files by converting them to images and comparing the images. The
|
|
image comparison tests are disabled by default. Those tests are
|
|
not required for determining correctness of a qpdf build if you
|
|
have not modified the code since the test suite also contains
|
|
expected output files that are compared literally. The image
|
|
comparison tests provide an extra check to make sure that any
|
|
content transformations don't break the rendering of pages.
|
|
Transformations that affect the content streams themselves are off
|
|
by default and are only provided to help developers look into the
|
|
contents of PDF files. If you are making deep changes to the
|
|
library that cause changes in the contents of the files that qpdf
|
|
generates, then you should enable the image comparison tests.
|
|
Enable them by running <command>configure</command> with the
|
|
<option>--enable-test-compare-images</option> flag. If you enable
|
|
this, the following additional packages are required by the
|
|
test suite. Note that in no case are these items required to use
|
|
qpdf.
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
libtiff: <ulink url="http://www.remotesensing.org/libtiff/">http://www.remotesensing.org/libtiff/</ulink>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
GhostScript version 8.60 or newer: <ulink
|
|
url="http://www.ghostscript.com">http://www.ghostscript.com</ulink>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
If you do not enable this, then you do not need to have tiff and
|
|
ghostscript.
|
|
</para>
|
|
<para>
|
|
If Adobe Reader is installed as <command>acroread</command>, some
|
|
additional test cases will be enabled. These test cases simply
|
|
verify that Adobe Reader can open the files that qpdf creates.
|
|
They require version 8.0 or newer to pass. However, in order to
|
|
avoid having qpdf depend on non-free (as in liberty) software, the
|
|
test suite will still pass without Adobe Reader, and the test
|
|
suite still exercises the full functionality of the software.
|
|
</para>
|
|
<para>
|
|
Pre-built documentation is distributed with qpdf, so you should
|
|
generally not need to rebuild the documentation. In order to
|
|
build the documentation from its docbook sources, you need the
|
|
docbook XML style sheets (<ulink
|
|
url="http://downloads.sourceforge.net/docbook/">http://downloads.sourceforge.net/docbook/</ulink>).
|
|
To build the PDF version of the documentation, you need Apache fop
|
|
(<ulink
|
|
url="http://xml.apache.org/fop/">http://xml.apache.org/fop/</ulink>)
|
|
version 0.94 or higher.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.building">
|
|
<title>Build Instructions</title>
|
|
<para>
|
|
Building qpdf on UNIX is generally just a matter of running
|
|
|
|
<programlisting>./configure
|
|
make
|
|
</programlisting>
|
|
You can also run <command>make check</command> to run the test
|
|
suite and <command>make install</command> to install. Please run
|
|
<command>./configure --help</command> for options on what can be
|
|
configured. You can also set the value of
|
|
<varname>DESTDIR</varname> during installation to install to a
|
|
temporary location, as is common with many open source packages.
|
|
Please see also the <filename>README.md</filename> and
|
|
<filename>INSTALL</filename> files in the source distribution.
|
|
</para>
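<para>
As a rough sketch of the steps above, the following commands
configure, build, test, and install qpdf into a staging area. The
<filename>/usr/local</filename> prefix and the
<filename>/tmp/qpdf-stage</filename> staging directory are
illustrative assumptions; substitute whatever is appropriate for
your system.

<programlisting>./configure --prefix=/usr/local
make
make check
make install DESTDIR=/tmp/qpdf-stage
</programlisting>
With these values, the installed files would be expected to appear
under <filename>/tmp/qpdf-stage/usr/local</filename>, from which
they can be packaged or copied into place.
</para>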
|
|
<para>
|
|
Building on Windows is a little bit more complicated. For
|
|
details, please see <filename>README-windows.md</filename> in the
|
|
source distribution. You can also download a binary distribution
|
|
for Windows. There is a port of qpdf to Visual C++ version 6 in
|
|
the <filename>contrib</filename> area generously contributed by
|
|
Jian Ma. This is also discussed in more detail in
|
|
<filename>README-windows.md</filename>.
|
|
</para>
|
|
<para>
|
|
There are some other things you can do with the build. Although
|
|
qpdf uses <application>autoconf</application>, it does not use
|
|
<application>automake</application> but instead uses a
|
|
hand-crafted non-recursive Makefile that requires GNU make. If
|
|
you're really interested, please read the comments in the
|
|
top-level <filename>Makefile</filename>.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.using">
|
|
<title>Running QPDF</title>
|
|
<para>
|
|
This chapter describes how to run the qpdf program from the command
|
|
line.
|
|
</para>
|
|
<sect1 id="ref.invocation">
|
|
<title>Basic Invocation</title>
|
|
<para>
|
|
When running qpdf, the basic invocation is as follows:
|
|
|
|
<programlisting><command>qpdf</command><option> [ <replaceable>options</replaceable> ] <replaceable>infilename</replaceable> [ <replaceable>outfilename</replaceable> ]</option>
|
|
</programlisting>
|
|
This converts PDF file <option>infilename</option> to PDF file
|
|
<option>outfilename</option>. The output file is functionally
|
|
identical to the input file but may have been structurally
|
|
reorganized. Also, orphaned objects will be removed from the
|
|
file. Many transformations are available as controlled by the
|
|
options below. In place of <option>infilename</option>, the
|
|
parameter <option>--empty</option> may be specified. This causes
|
|
qpdf to use a dummy input file that contains zero pages. The only
|
|
normal use case for using <option>--empty</option> would be if you
|
|
were going to add pages from another source, as discussed in <xref
|
|
linkend="ref.page-selection"/>.
|
|
</para>
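<para>
For illustration, the following invocations sketch the two forms
described above. All file names are placeholders, and the arguments
to <option>--pages</option> follow the syntax described in <xref
linkend="ref.page-selection"/>.

<programlisting># rewrite in.pdf as out.pdf, dropping any orphaned objects
qpdf in.pdf out.pdf

# start from an empty document and add all pages of two other files
qpdf --empty --pages a.pdf 1-z b.pdf 1-z -- merged.pdf
</programlisting>
</para>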
|
|
<para>
|
|
If <option>@filename</option> appears anywhere in the
|
|
command-line, it will be read line by line, and each line will be
|
|
treated as a command-line argument. The <option>@-</option> option
|
|
allows arguments to be read from standard input. This allows qpdf
|
|
to be invoked with an arbitrary number of arbitrarily long
|
|
arguments. It is also very useful for avoiding having to pass
|
|
passwords on the command line.
|
|
</para>
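<para>
A minimal sketch of both forms follows. The file name
<filename>qpdf-args.txt</filename>, the password
<literal>secret</literal>, and the PDF file names are placeholders
chosen for this example.

<programlisting># put one argument on each line of a file
printf '%s\n' '--password=secret' '--decrypt' &gt; qpdf-args.txt
qpdf @qpdf-args.txt in.pdf out.pdf

# supply the same arguments on standard input instead
printf '%s\n' '--password=secret' '--decrypt' | qpdf @- in.pdf out.pdf
</programlisting>
In both cases the password is read from a file or a pipe, so it
does not appear among the arguments of the
<command>qpdf</command> process itself.
</para>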
|
|
<para>
|
|
<option>outfilename</option> does not have to be seekable, even
|
|
when generating linearized files. Specifying
|
|
“<option>-</option>” as <option>outfilename</option>
|
|
means to write to standard output. However, you can't specify the
|
|
same file as both the input and the output because qpdf reads data
|
|
from the input file as it writes to the output file. QPDF attempts
|
|
to detect this case and fail without overwriting the output file.
|
|
</para>
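<para>
The following sketch shows writing to standard output and one
possible way to work around the restriction on using the same file
as both input and output. The temporary file name
<filename>in.pdf.tmp</filename> is an assumption made for this
example.

<programlisting># write a linearized copy to standard output
qpdf --linearize in.pdf - &gt; linearized.pdf

# update a file "in place" by way of a temporary file; the original
# is replaced only if qpdf exits successfully
qpdf --linearize in.pdf in.pdf.tmp &amp;&amp; mv in.pdf.tmp in.pdf
</programlisting>
</para>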
|
|
<para>
|
|
Most options require an output file, but some testing or
|
|
inspection commands do not. These are specifically noted.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.shell-completion">
|
|
<title>Shell Completion</title>
|
|
<para>
|
|
Starting in qpdf version 8.3.0, qpdf provides its own completion
|
|
support for zsh and bash. You can enable bash completion with
|
|
<command>eval $(qpdf --completion-bash)</command> and zsh
|
|
completion with <command>eval $(qpdf --completion-zsh)</command>.
|
|
If <command>qpdf</command> is not in your path, you should invoke
|
|
it above with an absolute path. If you invoke it with a relative
|
|
path, it will warn you, and the completion won't work if you're in
|
|
a different directory.
|
|
</para>
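<para>
For example, assuming qpdf is installed as
<filename>/usr/local/bin/qpdf</filename> (adjust the path for your
system), lines such as the following could be added to your shell
startup files.

<programlisting># in ~/.bashrc
eval $(/usr/local/bin/qpdf --completion-bash)

# in ~/.zshrc
eval $(/usr/local/bin/qpdf --completion-zsh)
</programlisting>
</para>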
|
|
</sect1>
|
|
<sect1 id="ref.basic-options">
|
|
<title>Basic Options</title>
|
|
<para>
|
|
The following options are the most common ones and perform
|
|
commonly needed transformations.
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--help</option></term>
|
|
<listitem>
|
|
<para>
|
|
Display command-line invocation help.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--version</option></term>
|
|
<listitem>
|
|
<para>
|
|
Display the current version of qpdf.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--copyright</option></term>
|
|
<listitem>
|
|
<para>
|
|
Show detailed copyright information.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--completion-bash</option></term>
|
|
<listitem>
|
|
<para>
|
|
Output a completion command you can eval to enable shell
|
|
completion from bash.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--completion-zsh</option></term>
|
|
<listitem>
|
|
<para>
|
|
Output a completion command you can eval to enable shell
|
|
completion from zsh.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--password=password</option></term>
|
|
<listitem>
|
|
<para>
|
|
Specifies a password for accessing encrypted files. Note that
|
|
you can use <option>@filename</option> or <option>@-</option>
|
|
as described above to put the password in a file or pass it
|
|
via standard input so you can avoid specifying it on the
|
|
command line.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--verbose</option></term>
|
|
<listitem>
|
|
<para>
|
|
Increase verbosity of output. For now, this just prints some
|
|
indication of any file that it creates.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--progress</option></term>
|
|
<listitem>
|
|
<para>
|
|
Indicate progress while writing files.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--no-warn</option></term>
|
|
<listitem>
|
|
<para>
|
|
Suppress writing of warnings to stderr. If warnings were
|
|
detected and suppressed, <command>qpdf</command> will still
|
|
exit with exit code 3.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--linearize</option></term>
|
|
<listitem>
|
|
<para>
|
|
Causes generation of a linearized (web-optimized) output file.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--copy-encryption=file</option></term>
|
|
<listitem>
|
|
<para>
|
|
Encrypt the file using the same encryption parameters,
|
|
including user and owner password, as the specified file. Use
|
|
<option>--encrypt-file-password</option> to specify a password
|
|
if one is needed to open this file. Note that copying the
|
|
encryption parameters from a file also copies the first half
|
|
of <literal>/ID</literal> from the file since this is part of
|
|
the encryption parameters.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--encrypt-file-password=password</option></term>
|
|
<listitem>
|
|
<para>
|
|
If the file specified with <option>--copy-encryption</option>
|
|
requires a password, specify the password using this option.
|
|
Note that only one of the user or owner password is required.
|
|
Both passwords will be preserved since QPDF does not
|
|
distinguish between the two passwords. It is possible to
|
|
preserve encryption parameters, including the owner password,
|
|
from a file even if you don't know the file's owner password.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--encrypt options --</option></term>
|
|
<listitem>
|
|
<para>
|
|
Causes generation of an encrypted output file. Please see <xref
|
|
linkend="ref.encryption-options"/> for details on how to
|
|
specify encryption parameters.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--decrypt</option></term>
|
|
<listitem>
|
|
<para>
|
|
Removes any encryption on the file. A password must be
|
|
supplied if the file is password protected.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--password-is-hex-key</option></term>
|
|
<listitem>
|
|
<para>
|
|
Overrides the usual computation/retrieval of the PDF file's
|
|
encryption key from user/owner password with an explicit
|
|
specification of the encryption key. When this option is
|
|
specified, the argument to the <option>--password</option>
|
|
option is interpreted as a hexadecimal-encoded key value. This
|
|
only applies to the password used to open the main input file.
|
|
It does not apply to other files opened by
|
|
<option>--pages</option> or other options or to files being
|
|
written.
|
|
</para>
|
|
<para>
|
|
Most users will never have a need for this option, and no
|
|
standard viewers support this mode of operation, but it can be
|
|
useful for forensic or investigatory purposes. For example, if
|
|
a PDF file is encrypted with an unknown password, a
|
|
brute-force attack using the key directly is sometimes more
|
|
efficient than one using the password. Also, if a file is
|
|
heavily damaged, it may be possible to derive the encryption
|
|
key and recover parts of the file using it directly. To expose
|
|
the encryption key used by an encrypted file that you can open
|
|
normally, use the <option>--show-encryption-key</option>
|
|
option.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--rotate=[+|-]angle[:page-range]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Apply rotation to specified pages. The
|
|
<option>page-range</option> portion of the option value has
|
|
the same format as page ranges in <xref
|
|
linkend="ref.page-selection"/>. If the page range is omitted,
|
|
the rotation is applied to all pages. The
|
|
<option>angle</option> portion of the parameter may be either
|
|
90, 180, or 270. If preceded by <option>+</option> or
|
|
<option>-</option>, the angle is added to or subtracted from
|
|
the specified pages' original rotations. Otherwise the pages'
|
|
rotations are set to the exact value. For example, the command
|
|
<command>qpdf in.pdf out.pdf --rotate=+90:2,4,6
|
|
--rotate=180:7-8</command> would rotate pages 2, 4, and 6 90
|
|
degrees clockwise from their original rotation and force the
|
|
rotation of pages 7 through 8 to 180 degrees regardless of
|
|
their original rotation, and the command <command>qpdf in.pdf
|
|
out.pdf --rotate=180</command> would rotate all pages by 180
|
|
degrees.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--keep-files-open=<replaceable>[yn]</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
This option controls whether qpdf keeps individual files open
|
|
while merging. Prior to version 8.1.0, qpdf always kept all
|
|
files open, but this meant that the number of files that could
|
|
be merged was limited by the operating system's open file
|
|
limit. Version 8.1.0 opened files as they were referenced and
|
|
closed them after each read, but this caused a major
|
|
performance impact. Version 8.2.0 optimized the performance
|
|
but did so in a way that, for local file systems, there was a
|
|
small but unavoidable performance hit, but for networked file
|
|
systems, the performance impact could be very high. Starting
|
|
with version 8.2.1, the default behavior is that files are
|
|
kept open if no more than 200 files are specified, but that
|
|
the behavior can be explicitly overridden with the
|
|
<option>--keep-files-open</option> flag. If you are merging
|
|
more than 200 files but fewer than the operating system's max
|
|
open files limit, you may want to use
|
|
<option>--keep-files-open=y</option>, especially if working
|
|
over a networked file system. If you are using a local file
|
|
system where the overhead is low and you might sometimes merge
|
|
more files than the operating system's open file limit from a script and are
|
|
not worried about a few seconds additional processing time,
|
|
you may want to specify <option>--keep-files-open=n</option>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--pages options --</option></term>
|
|
<listitem>
|
|
<para>
|
|
Select specific pages from one or more input files. See <xref
|
|
linkend="ref.page-selection"/> for details on how to do page
|
|
selection (splitting and merging).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--split-pages=[n]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Write each group of <option>n</option> pages to a separate
|
|
output file. If <option>n</option> is not specified, create
|
|
single pages. Output file names are generated as follows:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
If the string <literal>%d</literal> appears in the output
|
|
file name, it is replaced with a range of zero-padded page
|
|
numbers starting from 1.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Otherwise, if the output file name ends in
|
|
<filename>.pdf</filename> (case insensitive), a zero-padded
|
|
page range, preceded by a dash, is inserted before the file
|
|
extension.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Otherwise, the file name is appended with a zero-padded
|
|
page range preceded by a dash.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Page ranges are a single number in the case of single-page
|
|
groups or two numbers separated by a dash otherwise.
|
|
For example, if <filename>infile.pdf</filename> has 12 pages:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<command>qpdf --split-pages infile.pdf %d-out</command>
|
|
would generate files <filename>01-out</filename> through
|
|
<filename>12-out</filename>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<command>qpdf --split-pages=2 infile.pdf
|
|
outfile.pdf</command> would generate files
|
|
<filename>outfile-01-02.pdf</filename> through
|
|
<filename>outfile-11-12.pdf</filename>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<command>qpdf --split-pages infile.pdf
|
|
something.else</command> would generate files
|
|
<filename>something.else-01</filename> through
|
|
<filename>something.else-12</filename>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Note that outlines, threads, and other global features of the
|
|
original PDF file are not preserved. For each page of output,
|
|
this option creates an empty PDF and copies a single page from
|
|
the input into it. If you require the global data, you will
|
|
have to run <command>qpdf</command> with the
|
|
<option>--pages</option> option once for each file. Using
|
|
<option>--split-pages</option> is much faster if you don't
|
|
require the global data.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
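<para>
As a sketch of how some of the options above combine, consider the
following invocations. The file names and the password
<literal>secret</literal> are placeholders.

<programlisting># encrypt out.pdf with the same parameters as encrypted.pdf, which
# itself needs a password to be opened
qpdf --copy-encryption=encrypted.pdf --encrypt-file-password=secret in.pdf out.pdf

# merge files while explicitly keeping all input files open
qpdf --keep-files-open=y --empty --pages a.pdf 1-z b.pdf 1-z -- merged.pdf

# linearize the output and report progress while writing
qpdf --linearize --progress in.pdf out.pdf
</programlisting>
</para>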
|
|
<para>
|
|
Password-protected files may be opened by specifying a password.
|
|
By default, qpdf will preserve any encryption data associated with
|
|
a file. If <option>--decrypt</option> is specified, qpdf will
|
|
attempt to remove any encryption information. If
|
|
<option>--encrypt</option> is specified, qpdf will replace the
|
|
document's encryption parameters with whatever is specified.
|
|
</para>
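<para>
For example, with placeholder file names and the placeholder
password <literal>secret</literal>, the first two cases might look
like this:

<programlisting># preserve the existing encryption (the default) while linearizing
qpdf --password=secret --linearize encrypted.pdf out.pdf

# remove the encryption entirely
qpdf --password=secret --decrypt encrypted.pdf decrypted.pdf
</programlisting>
</para>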
|
|
<para>
|
|
Note that qpdf does not obey encryption restrictions already
|
|
imposed on the file. Doing so would be meaningless since qpdf can
|
|
be used to remove encryption from the file entirely. This
|
|
functionality is not intended to be used for bypassing copyright
|
|
restrictions or other restrictions placed on files by their
|
|
producers.
|
|
</para>
|
|
<para>
|
|
In all cases where qpdf allows specification of a password, care
|
|
must be taken if the password contains characters that fall
|
|
outside of the 7-bit US-ASCII character range to ensure that the
|
|
exact correct byte sequence is provided. It is possible that a
|
|
future version of qpdf may handle this more gracefully. For
|
|
example, if a file was encrypted using a password that was
|
|
encoded in ISO-8859-1 and your terminal is configured to use
|
|
UTF-8, the password you supply may not work properly. There are
|
|
various approaches to handling this. For example, if you are
|
|
using Linux and have the iconv executable installed, you could
|
|
pass <option>--password=`echo <replaceable>password</replaceable>
|
|
| iconv -t iso-8859-1`</option> to qpdf where
|
|
<replaceable>password</replaceable> is a password specified in
|
|
your terminal's locale. A detailed discussion of this is out of
|
|
scope for this manual, but just be aware of this issue if you have
|
|
trouble with a password that contains 8-bit characters.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.encryption-options">
|
|
<title>Encryption Options</title>
|
|
<para>
|
|
To change the encryption parameters of a file, use the
<option>--encrypt</option> flag. The syntax is
|
|
|
|
<programlisting><option>--encrypt <replaceable>user-password</replaceable> <replaceable>owner-password</replaceable> <replaceable>key-length</replaceable> [ <replaceable>restrictions</replaceable> ] --</option>
|
|
</programlisting>
|
|
Note that “<option>--</option>” terminates parsing of
|
|
encryption flags and must be present even if no restrictions are
|
|
present.
|
|
</para>
|
|
<para>
|
|
Either or both of the user password and the owner password may be
|
|
empty strings.
|
|
</para>
|
|
<para>
|
|
The value for
|
|
<option><replaceable>key-length</replaceable></option> may be 40,
|
|
128, or 256. The restriction flags are dependent upon key length.
|
|
When no additional restrictions are given, the default is to be
|
|
fully permissive.
|
|
</para>
|
|
<para>
|
|
If <option><replaceable>key-length</replaceable></option> is 40,
|
|
the following restriction options are available:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--print=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow printing.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--modify=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow document modification.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--extract=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow text/image extraction.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--annotate=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow comments and form fill-in
|
|
and signing.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
If <option><replaceable>key-length</replaceable></option> is 128,
|
|
the following restriction options are available:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--accessibility=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow accessibility to the visually
impaired.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--extract=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Determines whether or not to allow text/graphic extraction.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--print=<replaceable>print-opt</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Controls printing access.
|
|
<option><replaceable>print-opt</replaceable></option> may be
|
|
one of the following:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<option>full</option>: allow full printing
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>low</option>: allow low-resolution printing only
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>none</option>: disallow printing
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--modify=<replaceable>modify-opt</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Controls modify access.
|
|
<option><replaceable>modify-opt</replaceable></option> may be
|
|
one of the following, each of which implies all the options
|
|
that follow it:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<option>all</option>: allow full document modification
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>annotate</option>: allow comment authoring and form operations
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>form</option>: allow form field fill-in and signing
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>assembly</option>: allow document assembly only
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>none</option>: allow no modifications
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--cleartext-metadata</option></term>
|
|
<listitem>
|
|
<para>
|
|
If specified, any metadata stream in the document will be left
|
|
unencrypted even if the rest of the document is encrypted.
|
|
This also forces the PDF version to be at least 1.5.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--use-aes=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
If <option>--use-aes=y</option> is specified, AES encryption
|
|
will be used instead of RC4 encryption. This forces the PDF
|
|
version to be at least 1.6.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--force-V4</option></term>
|
|
<listitem>
|
|
<para>
|
|
Use of this option forces the <literal>/V</literal> and
|
|
<literal>/R</literal> parameters in the document's encryption
|
|
dictionary to be set to the value <literal>4</literal>. As
|
|
qpdf will automatically do this when required, there is no
|
|
reason to ever use this option. It exists primarily for use
|
|
in testing qpdf itself. This option also forces the PDF
|
|
version to be at least 1.5.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
If <option><replaceable>key-length</replaceable></option> is 256,
|
|
the minimum PDF version is 1.7 with extension level 8, and the
|
|
AES-based encryption format used is the PDF 2.0 encryption method
|
|
supported by Acrobat X. The same options are available as with
|
|
128 bits with the following exceptions:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--use-aes</option></term>
|
|
<listitem>
|
|
<para>
|
|
This option is not available with 256-bit keys. AES is always
|
|
used with 256-bit encryption keys.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--force-V4</option></term>
|
|
<listitem>
|
|
<para>
|
|
This option is not available with 256-bit keys.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--force-R5</option></term>
|
|
<listitem>
|
|
<para>
|
|
If specified, qpdf sets the minimum version to 1.7 at
|
|
extension level 3 and writes the deprecated encryption format
|
|
used by Acrobat version IX. This option should not be used in
|
|
practice to generate PDF files that will be in general use,
|
|
but it can be useful to generate files if you are trying to
|
|
test proper support in another application for PDF files
|
|
encrypted in this way.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
The default for each permission option is to be fully permissive.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.page-selection">
|
|
<title>Page Selection Options</title>
|
|
<para>
|
|
Starting with qpdf 3.0, it is possible to split and merge PDF
|
|
files by selecting pages from one or more input files. Whatever
|
|
file is given as the primary input file is used as the starting
|
|
point, but its pages are replaced with pages as specified.
|
|
|
|
<programlisting><option>--pages <replaceable>input-file</replaceable> [ <replaceable>--password=password</replaceable> ] [ <replaceable>page-range</replaceable> ] [ ... ] --</option>
|
|
</programlisting>
|
|
Multiple input files may be specified. Each one is given as the
|
|
name of the input file, an optional password (if required to open
|
|
the file), and the range of pages. Note that
|
|
“<option>--</option>” terminates parsing of page
|
|
selection flags.
|
|
</para>
|
|
<para>
|
|
For each file that pages should be taken from, specify the file, a
|
|
password needed to open the file (if any), and a page range. The
|
|
password needs to be given only once per file. If any of the
|
|
input files are the same as the primary input file or the file
|
|
used to copy encryption parameters (if specified), you do not need
|
|
to repeat the password here. The same file can be repeated
|
|
multiple times. If a file that is repeated has a password, the
|
|
password only has to be given the first time. All non-page data
|
|
(info, outlines, page numbers, etc.) are taken from the primary
|
|
input file. To discard these, use <option>--empty</option> as the
|
|
primary input.
|
|
</para>
|
|
<para>
|
|
Starting with qpdf 5.0.0, it is possible to omit the page range.
|
|
If qpdf sees a value in the place where it expects a page range
|
|
and that value is not a valid range but is a valid file name, qpdf
|
|
will implicitly use the range <literal>1-z</literal>, meaning that
|
|
it will include all pages in the file. This makes it possible to
|
|
easily combine all pages in a set of files with a command like
|
|
<command>qpdf --empty out.pdf --pages *.pdf --</command>.
|
|
</para>
|
|
<para>
|
|
It is not presently possible to specify the same page from the
|
|
same file directly more than once, but you can make this work by
|
|
specifying two different paths to the same file (such as by
|
|
putting <filename>./</filename> somewhere in the path). This can
|
|
also be used if you want to repeat a page from one of the input
|
|
files in the output file. This may be made more convenient in a
|
|
future version of qpdf if there is enough demand for this feature.
|
|
</para>
|
|
<para>
|
|
The page range is a set of numbers separated by commas, ranges of
|
|
numbers separated by dashes, or combinations of those. The character
|
|
“z” represents the last page. A number preceded by an
|
|
“r” indicates to count from the end, so
|
|
<literal>r3-r1</literal> would be the last three pages of the
|
|
document. Pages can appear in any order. Ranges can appear with a
|
|
high number followed by a low number, which causes the pages to
|
|
appear in reverse. Repeating a number will cause an error, but you
|
|
can use the workaround discussed above should you really want to
|
|
include the same page twice.
|
|
</para>
|
|
<para>
|
|
Example page ranges:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<literal>1,3,5-9,15-12</literal>: pages 1, 3, 5, 6, 7, 8,
|
|
9, 15, 14, 13, and 12 in that order.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>z-1</literal>: all pages in the document in reverse
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>r3-r1</literal>: the last three pages of the document
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>r1-r3</literal>: the last three pages of the document
|
|
in reverse order
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
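<para>
  The range rules above can be sketched in a few lines of Python.
  This is only an illustration of the documented syntax, not qpdf's
  actual parser; the name <function>expand_page_range</function> and
  its <varname>npages</varname> parameter are hypothetical.
</para>
<programlisting><![CDATA[
def expand_page_range(spec, npages):
    """Expand a qpdf-style page range into a list of 1-based page numbers."""

    def page_number(token):
        # "z" is the last page; "rN" counts back from the end (r1 is last).
        if token == "z":
            n = npages
        elif token.startswith("r"):
            n = npages + 1 - int(token[1:])
        else:
            n = int(token)
        if not 1 <= n <= npages:
            raise ValueError("page %r is outside 1-%d" % (token, npages))
        return n

    pages = []
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        start = page_number(first)
        end = page_number(last) if dash else start
        step = 1 if start <= end else -1  # a high-to-low range runs in reverse
        pages.extend(range(start, end + step, step))
    # Repeating a page within one range is an error, as described above.
    if len(set(pages)) != len(pages):
        raise ValueError("page range %r repeats a page" % spec)
    return pages
]]></programlisting>
<para>
  Under these assumptions, expanding <literal>r1-r3</literal> for a
  ten-page document yields pages 10, 9, and 8 in that order.
</para>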
|
|
<para>
|
|
Starting in qpdf version 8.3, when you split and merge files, any
|
|
page labels (page numbers) are preserved in the final file. It is
|
|
expected that more document features will be preserved by
|
|
splitting and merging. In the meantime, semantics of splitting
|
|
and merging vary across features. For example, the document's
|
|
outlines (bookmarks) point to actual page objects, so if you
|
|
select some pages and not others, bookmarks that point to pages
|
|
that are in the output file will work, and remaining bookmarks
|
|
will not work. A future version of <command>qpdf</command> may do
|
|
a better job at handling these issues. (Note that the qpdf library
|
|
already contains all of the APIs required in order to implement
|
|
this in your own application if you need it.) In the meantime,
|
|
you can always use <option>--empty</option> as the primary input
|
|
file to avoid copying all of that from the first file. For
|
|
example, to take pages 1 through 5 from
|
|
<filename>infile.pdf</filename> while preserving all metadata
|
|
associated with that file, you could use
|
|
|
|
<programlisting><command>qpdf</command> <option>infile.pdf --pages infile.pdf 1-5 -- outfile.pdf</option>
|
|
</programlisting>
|
|
If you wanted pages 1 through 5 from
|
|
<filename>infile.pdf</filename> but you wanted the rest of the
|
|
metadata to be dropped, you could instead run
|
|
|
|
<programlisting><command>qpdf</command> <option>--empty --pages infile.pdf 1-5 -- outfile.pdf</option>
|
|
</programlisting>
|
|
If you wanted to take pages 1–5 from
|
|
<filename>file1.pdf</filename> and pages 11–15 from
|
|
<filename>file2.pdf</filename> in reverse, you would run
|
|
|
|
<programlisting><command>qpdf</command> <option>file1.pdf --pages file1.pdf 1-5 file2.pdf 15-11 -- outfile.pdf</option>
|
|
</programlisting>
|
|
If, for some reason, you wanted to take the first page of an
|
|
encrypted file called <filename>encrypted.pdf</filename> with
|
|
password <literal>pass</literal> and repeat it twice in an output
|
|
file, and if you wanted to drop document-level metadata but
|
|
preserve encryption, you would use
|
|
|
|
<programlisting><command>qpdf</command> <option>--empty --copy-encryption=encrypted.pdf --encryption-file-password=pass
|
|
--pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 --
|
|
outfile.pdf</option>
|
|
</programlisting>
|
|
Note that we had to specify the password all three times because
|
|
giving a password as <option>--encryption-file-password</option>
|
|
doesn't count for page selection, and as far as qpdf is concerned,
|
|
<filename>encrypted.pdf</filename> and
|
|
<filename>./encrypted.pdf</filename> are separate files. These
|
|
are all corner cases that most users should hopefully never have
|
|
to be bothered with.
|
|
</para>
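<para>
  When qpdf is driven from a script, the
  <option>--pages</option> portion of the command line can be
  assembled programmatically. The following Python sketch shows one
  way to do so using only the syntax described in this section. The
  function name <function>pages_arguments</function> and the tuple
  layout of its input are assumptions made for illustration.
</para>
<programlisting><![CDATA[
def pages_arguments(selections):
    """Build the "--pages ... --" portion of a qpdf command line.

    selections is a list of (path, password, page_range) tuples;
    password and page_range may each be None.
    """
    args = ["--pages"]
    seen = set()
    for path, password, page_range in selections:
        args.append(path)
        # A password is needed only the first time a given path appears.
        if password is not None and path not in seen:
            args.append("--password=" + password)
        seen.add(path)
        # Omitting the range selects all pages of the file.
        if page_range is not None:
            args.append(page_range)
    args.append("--")  # terminates parsing of page selection flags
    return args
]]></programlisting>
<para>
  The returned list can be placed between the primary input file (or
  <option>--empty</option>) and the output file name. Because
  <filename>encrypted.pdf</filename> and
  <filename>./encrypted.pdf</filename> are different paths, the sketch
  emits the password for each of them, matching the last example
  above.
</para>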
|
|
</sect1>
|
|
<sect1 id="ref.advanced-parsing">
|
|
<title>Advanced Parsing Options</title>
|
|
<para>
|
|
These options control aspects of how qpdf reads PDF files. Mostly
|
|
these are of use to people who are working with damaged files.
|
|
There is little reason to use these options unless you are trying
|
|
to solve specific problems. The following options are available:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--suppress-recovery</option></term>
|
|
<listitem>
|
|
<para>
|
|
Prevents qpdf from attempting to recover damaged files.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--ignore-xref-streams</option></term>
|
|
<listitem>
|
|
<para>
|
|
Tells qpdf to ignore any cross-reference streams.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
|
|
<para>
|
|
Ordinarily, qpdf will attempt to recover from certain types of
|
|
errors in PDF files. These include errors in the cross-reference
|
|
table, certain types of object numbering errors, and certain types
|
|
of stream length errors. Sometimes, qpdf may think it has
|
|
recovered but may not have actually recovered, so care should be
|
|
taken when using this option as some data loss is possible. The
|
|
<option>--suppress-recovery</option> option will prevent qpdf from
|
|
attempting recovery. In this case, it will fail on the first
|
|
error that it encounters.
|
|
</para>
|
|
<para>
|
|
Ordinarily, qpdf reads cross-reference streams when they are
|
|
present in a PDF file. If <option>--ignore-xref-streams</option>
|
|
is specified, qpdf will ignore any cross-reference streams for
|
|
hybrid PDF files. The purpose of hybrid files is to make some
|
|
content available to viewers that are not aware of cross-reference
|
|
streams. It is almost never desirable to ignore them. The only
|
|
time when you might want to use this feature is if you are testing
|
|
creation of hybrid PDF files and wish to see how a PDF consumer
|
|
that doesn't understand object and cross-reference streams would
|
|
interpret such a file.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.advanced-transformation">
|
|
<title>Advanced Transformation Options</title>
|
|
<para>
|
|
These transformation options control fine points of how qpdf
|
|
creates the output file. Mostly these are of use only to people
|
|
who are very familiar with the PDF file format or who are PDF
|
|
developers. The following options are available:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--compress-streams=<replaceable>[yn]</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
By default, or with <option>--compress-streams=y</option>,
|
|
qpdf will compress any stream with no other filters applied to
|
|
it with the <literal>/FlateDecode</literal> filter when it
|
|
writes it. To suppress this behavior and preserve uncompressed
|
|
streams as uncompressed, use
|
|
<option>--compress-streams=n</option>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--decode-level=<replaceable>option</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Controls which streams qpdf tries to decode. The default is
|
|
<option>generalized</option>. The following options are
|
|
available:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<option>none</option>: do not attempt to decode any streams
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>generalized</option>: decode streams filtered with
|
|
supported generalized filters: <option>/LZWDecode</option>,
|
|
<option>/FlateDecode</option>,
|
|
<option>/ASCII85Decode</option>, and
|
|
<option>/ASCIIHexDecode</option>. We define generalized
|
|
filters as those to be used for general-purpose compression
|
|
or encoding, as opposed to filters specifically designed
|
|
for image data.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>specialized</option>: in addition to generalized,
|
|
decode streams with supported non-lossy specialized
|
|
filters; currently this is just <option>/RunLengthDecode</option>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>all</option>: in addition to generalized and
|
|
specialized, decode streams with supported lossy filters;
|
|
currently this is just <option>/DCTDecode</option> (JPEG)
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--stream-data=<replaceable>option</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Controls transformation of stream data. This option predates
|
|
the <option>--compress-streams</option> and
|
|
<option>--decode-level</option> options. Those options can be
|
|
used to achieve the same effect with more control. The value
|
|
of <option><replaceable>option</replaceable></option> may be
|
|
one of the following:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<option>compress</option>: recompress stream data when
|
|
possible (default); equivalent to
|
|
<option>--compress-streams=y</option>
|
|
<option>--decode-level=generalized</option>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>preserve</option>: leave all stream data as is;
|
|
equivalent to <option>--compress-streams=n</option>
|
|
<option>--decode-level=none</option>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>uncompress</option>: uncompress stream data
|
|
compressed with generalized filters when possible;
|
|
equivalent to <option>--compress-streams=n</option>
|
|
<option>--decode-level=generalized</option>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--normalize-content=[yn]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Enables or disables normalization of content streams. Content
|
|
normalization is enabled by default in QDF mode. Please see
|
|
<xref linkend="ref.qdf"/> for additional discussion of QDF
|
|
mode.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--object-streams=<replaceable>mode</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Controls handling of object streams. The value of
|
|
<option><replaceable>mode</replaceable></option> may be one of
|
|
the following:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<option>preserve</option>: preserve original object streams
|
|
(default)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>disable</option>: don't write any object streams
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>generate</option>: use object streams wherever
|
|
possible
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--preserve-unreferenced</option></term>
|
|
<listitem>
|
|
<para>
|
|
Tells qpdf to preserve objects that are not referenced when
|
|
writing the file. Ordinarily any object that is not referenced
|
|
in a traversal of the document from the trailer dictionary
|
|
will be discarded. This may be useful in working with some
|
|
damaged files or inspecting files with known unreferenced
|
|
objects.
|
|
</para>
|
|
<para>
|
|
This flag is ignored for linearized files and has the effect
|
|
of causing objects in the new file to be written in order by
|
|
object ID from the original file. This does not mean that
|
|
object numbers will be the same since qpdf may create stream
|
|
lengths as direct or indirect differently from the original
|
|
file, and the original file may have gaps in its numbering.
|
|
</para>
|
|
<para>
|
|
See also <option>--preserve-unreferenced-resources</option>,
|
|
which does something completely different.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--preserve-unreferenced-resources</option></term>
|
|
<listitem>
|
|
<para>
|
|
Starting with qpdf 8.1, when splitting pages, qpdf ordinarily
|
|
attempts to remove images and fonts that are not used by a
|
|
page even if they are referenced in the page's resources
|
|
dictionary. This option suppresses that behavior. The only
|
|
reason to use this is if you suspect that qpdf is removing
|
|
resources it shouldn't be removing. If you encounter that
|
|
case, please report it as a bug.
|
|
</para>
|
|
<para>
|
|
See also <option>--preserve-unreferenced</option>,
|
|
which does something completely different.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--newline-before-endstream</option></term>
|
|
<listitem>
|
|
<para>
|
|
Tells qpdf to insert a newline before the
|
|
<literal>endstream</literal> keyword, not counted in the
|
|
length, after any stream content even if the last character of
|
|
the stream was a newline. This may result in two newlines in
|
|
some cases. This is a requirement of PDF/A. While qpdf doesn't
|
|
specifically know how to generate PDF/A-compliant PDFs, this
|
|
at least prevents it from removing compliance on already
|
|
compliant files.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--linearize-pass1=<replaceable>file</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Write the first pass of linearization to the named file. The
|
|
resulting file is not a valid PDF file. This option is useful
|
|
only for debugging <classname>QPDFWriter</classname>'s
|
|
linearization code. When qpdf linearizes files, it writes the
|
|
file in two passes, using the first pass to calculate sizes
|
|
and offsets that are required for hint tables and the
|
|
linearization dictionary. Ordinarily, the first pass is
|
|
discarded. This option enables it to be captured.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--coalesce-contents</option></term>
|
|
<listitem>
|
|
<para>
|
|
When a page's contents are split across multiple streams, this
|
|
option causes qpdf to combine them into a single stream. Use
|
|
of this option is never necessary for ordinary usage, but it
|
|
can help when working with some files in some cases. For
|
|
example, some PDF writers split page contents into small
|
|
streams at arbitrary points that may fall in the middle of
|
|
lexical tokens within the content, and some PDF readers may
|
|
get confused on such files. If you use qpdf to coalesce the
|
|
content streams, such readers may be able to work with the
|
|
file more easily. This can also be combined with QDF mode or
|
|
content normalization to make it easier to look at all of a
|
|
page's contents at once.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--qdf</option></term>
|
|
<listitem>
|
|
<para>
|
|
Turns on QDF mode. For additional information on QDF, please
|
|
see <xref linkend="ref.qdf"/>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--min-version=<replaceable>version</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
Forces the PDF version of the output file to be at least
|
|
<replaceable>version</replaceable>. In other words, if the
|
|
input file has a lower version than the specified version, the
|
|
specified version will be used. If the input file has a
|
|
higher version, the input file's original version will be
|
|
used. It is seldom necessary to use this option since qpdf
|
|
will automatically increase the version as needed when adding
|
|
features that require newer PDF readers.
|
|
</para>
|
|
<para>
|
|
The version number may be expressed in the form
|
|
<replaceable>major.minor.extension-level</replaceable>, in
|
|
which case the version is interpreted as
|
|
<replaceable>major.minor</replaceable> at extension level
|
|
<replaceable>extension-level</replaceable>. For example,
|
|
version <literal>1.7.8</literal> represents version 1.7 at
|
|
extension level 8. Note that minimal syntax checking is done
|
|
on the command line.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--force-version=<replaceable>version</replaceable></option></term>
|
|
<listitem>
|
|
<para>
|
|
This option forces the PDF version to be the exact version
|
|
specified <emphasis>even when the file may have content that
|
|
is not supported in that version</emphasis>. The version
|
|
number is interpreted in the same way as with
|
|
<option>--min-version</option> so that extension levels can be
|
|
set. In some cases, forcing the output file's PDF version to
|
|
be lower than that of the input file will cause qpdf to
|
|
disable certain features of the document. Specifically,
|
|
256-bit keys are disabled if the version is less than 1.7 with
|
|
extension level 8 (except R5 is disabled if less than 1.7 with
|
|
extension level 3), AES encryption is disabled if the version
|
|
is less than 1.6, cleartext metadata and object streams are
|
|
disabled if less than 1.5, 128-bit encryption keys are
|
|
disabled if less than 1.4, and all encryption is disabled if
|
|
less than 1.3. Even with these precautions, qpdf won't be
|
|
able to do things like eliminate use of newer image
|
|
compression schemes, transparency groups, or other features
|
|
that may have been added in more recent versions of PDF.
|
|
</para>
|
|
<para>
|
|
As a general rule, with the exception of big structural things
|
|
like the use of object streams or AES encryption, PDF viewers
|
|
are supposed to ignore features in files that they don't
|
|
support from newer versions. This means that forcing the
|
|
version to a lower version may make it possible to open your
|
|
PDF file with an older version, though bear in mind that some
|
|
of the original document's functionality may be lost.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
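<para>
  The version thresholds listed under
  <option>--force-version</option> can be summarized with a short
  Python sketch. It is an illustration of the rules as documented
  here, not qpdf's implementation. The names
  <function>parse_version</function>,
  <varname>FEATURE_MINIMUMS</varname>, and
  <function>disabled_features</function> are hypothetical, and
  treating a missing extension level as 0 is an assumption.
</para>
<programlisting><![CDATA[
def parse_version(text):
    """Split "major.minor[.extension-level]" into (major, minor, extension)."""
    fields = [int(field) for field in text.split(".")]
    if len(fields) == 2:
        fields.append(0)  # assumption: no extension level means level 0
    if len(fields) != 3:
        raise ValueError("expected major.minor[.extension-level]: %r" % text)
    return tuple(fields)


# Minimum version for each feature, as listed under --force-version.
FEATURE_MINIMUMS = [
    ("256-bit keys", (1, 7, 8)),
    ("256-bit keys (R5)", (1, 7, 3)),
    ("AES encryption", (1, 6, 0)),
    ("cleartext metadata", (1, 5, 0)),
    ("object streams", (1, 5, 0)),
    ("128-bit encryption keys", (1, 4, 0)),
    ("encryption", (1, 3, 0)),
]


def disabled_features(forced_version):
    """Return the features that the forced version is too low to keep."""
    version = parse_version(forced_version)
    # Tuples compare element by element, so (1, 7, 3) < (1, 7, 8).
    return [name for name, minimum in FEATURE_MINIMUMS if version < minimum]
]]></programlisting>
<para>
  For example, under these assumptions, forcing version
  <literal>1.5</literal> would disable 256-bit keys (both variants)
  and AES encryption while leaving object streams and cleartext
  metadata available.
</para>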
|
|
<para>
|
|
By default, when a stream is encoded using non-lossy filters that
|
|
qpdf understands and is not already compressed using a good
|
|
compression scheme, qpdf will uncompress and recompress streams.
|
|
Assuming proper filter implementations, this is safe and generally
|
|
results in smaller files. This behavior may also be explicitly
|
|
requested with <option>--stream-data=compress</option>.
|
|
</para>
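<para>
  The relationship between <option>--stream-data</option>,
  <option>--compress-streams</option>, and
  <option>--decode-level</option> described above can be written
  down as data. The following Python sketch restates the tables from
  this section; the names <varname>DECODE_LEVELS</varname>,
  <varname>STREAM_DATA_EQUIVALENTS</varname>,
  <function>decoded_filters</function>, and
  <function>equivalent_options</function> are hypothetical and do
  not correspond to anything in the qpdf library.
</para>
<programlisting><![CDATA[
# Filters that each --decode-level value adds, from least to most aggressive.
DECODE_LEVELS = [
    ("none", []),
    ("generalized", ["/LZWDecode", "/FlateDecode",
                     "/ASCII85Decode", "/ASCIIHexDecode"]),
    ("specialized", ["/RunLengthDecode"]),
    ("all", ["/DCTDecode"]),
]

# --stream-data values and the option pair each one is equivalent to.
STREAM_DATA_EQUIVALENTS = {
    "compress": {"compress-streams": "y", "decode-level": "generalized"},
    "preserve": {"compress-streams": "n", "decode-level": "none"},
    "uncompress": {"compress-streams": "n", "decode-level": "generalized"},
}


def decoded_filters(level):
    """Return every filter covered by the given decode level."""
    filters = []
    for name, added in DECODE_LEVELS:
        filters.extend(added)  # each level includes all lower levels
        if name == level:
            return filters
    raise ValueError("unknown decode level: %r" % level)


def equivalent_options(stream_data):
    """Translate a --stream-data value into the newer option pair."""
    options = STREAM_DATA_EQUIVALENTS[stream_data]
    return ["--%s=%s" % (name, value) for name, value in options.items()]
]]></programlisting>
<para>
  Modeling the levels as a cumulative list reflects the wording
  above, in which <option>specialized</option> and
  <option>all</option> each extend the level before them.
</para>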
|
|
<para>
|
|
When <option>--normalize-content=y</option> is specified, qpdf
|
|
will attempt to normalize whitespace and newlines in page content
|
|
streams. This is generally safe but could, in some cases, cause
|
|
damage to the content streams. This option is intended for people
|
|
who wish to study PDF content streams or to debug PDF content.
|
|
You should not use this for “production” PDF files.
|
|
</para>
|
|
<para>
|
|
This paragraph discusses edge cases of content normalization that
|
|
are not of concern to most users and are not relevant when content
|
|
normalization is not enabled. When normalizing content, if qpdf
|
|
runs into any lexical errors, it will print a warning indicating
|
|
that content may be damaged. The only situation in which qpdf is
|
|
known to cause damage during content normalization is when a
|
|
page's contents are split across multiple streams and streams are
|
|
split in the middle of a lexical token such as a string, name, or
|
|
inline image. There may be some pathological cases in which qpdf
|
|
could damage content without noticing this, such as if the partial
|
|
tokens at the end of one stream and the beginning of the next
|
|
stream are both valid, but usually qpdf will be able to detect
|
|
this case. For slightly increased safety, you can specify
|
|
<option>--coalesce-contents</option> in addition to
|
|
<option>--normalize-content</option> or <option>--qdf</option>.
|
|
This will cause qpdf to combine all the content streams into one,
|
|
thus recombining any split tokens. However, doing this will prevent
|
|
you from being able to see the original layout of the content
|
|
streams. If you must inspect the original content streams in an
|
|
uncompressed format, you can always run with <option>--qdf
|
|
--normalize-content=n</option> for a QDF file without content
|
|
normalization, or alternatively
|
|
<option>--stream-data=uncompress</option> for a regular non-QDF
|
|
mode file with uncompressed streams. These will both uncompress
|
|
all the streams but will not attempt to normalize content. Please
|
|
note that if you are using content normalization or QDF mode for
|
|
the purpose of manually inspecting files, you don't have to care
|
|
about this.
|
|
</para>
|
|
<para>
|
|
Object streams, also known as compressed objects, were introduced
|
|
into the PDF specification at version 1.5, corresponding to
|
|
Acrobat 6. Some older PDF viewers may not support files with
|
|
object streams. qpdf can be used to transform files with object
|
|
streams to files without object streams or vice versa. As
|
|
mentioned above, there are three object stream modes:
|
|
<option>preserve</option>, <option>disable</option>, and
|
|
<option>generate</option>.
|
|
</para>
|
|
<para>
|
|
In <option>preserve</option> mode, the relationship between objects and
|
|
the streams that contain them is preserved from the original file.
|
|
In <option>disable</option> mode, all objects are written as
|
|
regular, uncompressed objects. The resulting file should be
|
|
readable by older PDF viewers. (Of course, the content of the
|
|
files may include features not supported by older viewers, but at
|
|
least the structure will be supported.) In
|
|
<option>generate</option> mode, qpdf will create its own object
|
|
streams. This will usually result in more compact PDF files,
|
|
though they may not be readable by older viewers. In this mode,
|
|
qpdf will also make sure the PDF version number in the header is
|
|
at least 1.5.
|
|
</para>
|
|
<para>
|
|
The <option>--qdf</option> flag turns on QDF mode, which changes
|
|
some of the defaults described above. Specifically, in QDF mode,
|
|
by default, stream data is uncompressed, content streams are
|
|
normalized, and encryption is removed. These defaults can still
|
|
be overridden by specifying the appropriate options as described
|
|
above. Additionally, in QDF mode, stream lengths are stored as
|
|
indirect objects, objects are laid out in a less efficient but
|
|
more readable fashion, and the documents are interspersed with
|
|
comments that make it easier for the user to find things and also
|
|
make it possible for <command>fix-qdf</command> to work properly.
|
|
QDF mode is intended for people, mostly developers, who wish to
|
|
inspect or modify PDF files in a text editor. For details, please
|
|
see <xref linkend="ref.qdf"/>.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.testing-options">
|
|
<title>Testing, Inspection, and Debugging Options</title>
|
|
<para>
|
|
These options can be useful for digging into PDF files or for use
|
|
in automated test suites for software that uses the qpdf library.
|
|
When any of the options in this section are specified, no output
|
|
file should be given. The following options are available:
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>--deterministic-id</option></term>
|
|
<listitem>
|
|
<para>
|
|
Causes generation of a deterministic value for /ID. This
|
|
prevents use of timestamp and output file name information in
|
|
the /ID generation. Instead, at some slight additional runtime
|
|
cost, the /ID field is generated to include a digest of the
|
|
significant parts of the content of the output PDF file. This
|
|
means that a given qpdf operation should generate the same /ID
|
|
each time it is run, which can be useful when caching results
|
|
or for generation of some test data. Use of this flag is not
|
|
compatible with creation of encrypted files.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--static-id</option></term>
|
|
<listitem>
|
|
<para>
|
|
Causes generation of a fixed value for /ID. This is intended
|
|
for testing only. Never use it for production files. If you
|
|
are trying to get the same /ID each time for a given file and
|
|
you are not generating encrypted files, consider using the
|
|
<option>--deterministic-id</option> option.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--static-aes-iv</option></term>
|
|
<listitem>
|
|
<para>
|
|
Causes use of a static initialization vector for AES-CBC.
|
|
This is intended for testing only so that output files can be
|
|
reproducible. Never use it for production files. This option
|
|
in particular is not secure since it significantly weakens the
|
|
encryption.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--no-original-object-ids</option></term>
|
|
<listitem>
|
|
<para>
|
|
Suppresses inclusion of original object ID comments in QDF
|
|
files. This can be useful when generating QDF files for test
|
|
purposes, particularly when comparing them to determine
|
|
whether two PDF files have identical content.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-encryption</option></term>
|
|
<listitem>
|
|
<para>
|
|
Shows document encryption parameters. Also shows the
|
|
document's user password if the owner password is given.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-encryption-key</option></term>
|
|
<listitem>
|
|
<para>
|
|
When encryption information is being displayed, as when
|
|
<option>--check</option> or <option>--show-encryption</option>
|
|
is given, display the computed or retrieved encryption key as
|
|
a hexadecimal string. This value is not ordinarily useful to
|
|
users, but it can be used as the argument to
|
|
<option>--password</option> if the
|
|
<option>--password-is-hex-key</option> is specified. Note
|
|
that, when PDF files are encrypted, passwords and other
|
|
metadata are used only to compute an encryption key, and the
|
|
encryption key is what is actually used for encryption. This
|
|
enables retrieval of that key.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--check-linearization</option></term>
|
|
<listitem>
|
|
<para>
|
|
Checks file integrity and linearization status.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-linearization</option></term>
|
|
<listitem>
|
|
<para>
|
|
Checks and displays all data in the linearization hint tables.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-xref</option></term>
|
|
<listitem>
|
|
<para>
|
|
Shows the contents of the cross-reference table in a
|
|
human-readable form. This is especially useful for files with
|
|
cross-reference streams, which are stored in a binary format.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-object=trailer|obj[,gen]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Show the contents of the given object. This is especially
|
|
useful for inspecting objects that are inside of object
|
|
streams (also known as “compressed objects”).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--raw-stream-data</option></term>
|
|
<listitem>
|
|
<para>
|
|
When used along with the <option>--show-object</option>
|
|
option, if the object is a stream, shows the raw stream data
|
|
instead of the object's contents.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--filtered-stream-data</option></term>
|
|
<listitem>
|
|
<para>
|
|
When used along with the <option>--show-object</option>
|
|
option, if the object is a stream, shows the filtered stream
|
|
data instead of the object's contents. If the stream is filtered
|
|
using filters that qpdf does not support, an error will be
|
|
issued.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-npages</option></term>
|
|
<listitem>
|
|
<para>
|
|
Prints the number of pages in the input file on a line by
|
|
itself. Since the number of pages appears by itself on a
|
|
line, this option can be useful for scripting if you need to
|
|
know the number of pages in a file.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--show-pages</option></term>
|
|
<listitem>
|
|
<para>
|
|
Shows the object and generation number for each page
|
|
dictionary object and for each content stream associated with
|
|
the page. Having this information makes it more convenient to
|
|
inspect objects from a particular page.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--with-images</option></term>
|
|
<listitem>
|
|
<para>
|
|
When used along with <option>--show-pages</option>, also shows
|
|
the object and generation numbers for the image objects on
|
|
each page. (At present, information about images in shared
|
|
resource dictionaries is not output by this command. This is
|
|
discussed in a comment in the source code.)
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--json</option></term>
|
|
<listitem>
|
|
<para>
|
|
Generate a json representation of the file. This is described
|
|
in depth in <xref linkend="ref.json"/>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--json-help</option></term>
|
|
<listitem>
|
|
<para>
|
|
Describe the format of the json output.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--json-key=key</option></term>
|
|
<listitem>
|
|
<para>
|
|
This option is repeatable. If specified, only top-level keys
|
|
specified will be included in the json output. If not
|
|
specified, all keys will be shown.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--json-object=trailer|obj[,gen]</option></term>
|
|
<listitem>
|
|
<para>
|
|
This option is repeatable. If specified, only specified
|
|
objects will be shown in the
|
|
“<literal>objects</literal>” key of the json
|
|
output. If absent, all objects will be shown.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>--check</option></term>
|
|
<listitem>
|
|
<para>
|
|
Checks file structure as well as encryption, linearization,
|
|
and encoding of stream data. A file for which
|
|
<option>--check</option> reports no errors may still have
|
|
errors in stream data content but should otherwise be
|
|
structurally sound. If <option>--check</option> reports any errors,
|
|
qpdf will exit with a status of 2. There are some recoverable
|
|
conditions that <option>--check</option> detects. These are
|
|
issued as warnings instead of errors. If qpdf finds no errors
|
|
but finds warnings, it will exit with a status of 3 (as of
|
|
version 2.0.4). When <option>--check</option> is combined
|
|
with other options, checks are always performed before any
|
|
other options are processed. For erroneous files,
|
|
<option>--check</option> will cause qpdf to attempt to
|
|
recover, after which other options are effectively operating
|
|
on the recovered file. Combining <option>--check</option> with
|
|
other options in this way can be useful for manually
|
|
recovering severely damaged files.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
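<para>
  The exit statuses of <option>--check</option> and the single-line
  output of <option>--show-npages</option> lend themselves to
  scripting. The following is a minimal sketch, not part of qpdf,
  written in Python. The helper names are hypothetical, and it
  assumes that an exit status of 0 means that neither errors nor
  warnings were reported and that a <command>qpdf</command>
  executable is on the path.
</para>
<programlisting><![CDATA[
```python
import subprocess

# Exit statuses described above for --check: 2 means errors were found,
# 3 means only warnings were found. Treating 0 as a clean run is an assumption.
CHECK_STATUS = {0: "ok", 2: "errors", 3: "warnings"}


def describe_check_status(returncode):
    """Map a qpdf exit status to a short label; other codes are "unknown"."""
    return CHECK_STATUS.get(returncode, "unknown")


def parse_npages(output):
    """Parse --show-npages output, which is one integer on a line by itself."""
    return int(output.strip())


def inspect(path, qpdf="qpdf"):
    """Run qpdf twice on path and return (check label, page count).

    Raises ValueError if --show-npages prints no page count, which may
    happen for files that are too damaged to read.
    """
    check = subprocess.run([qpdf, "--check", path],
                           capture_output=True, text=True)
    pages = subprocess.run([qpdf, "--show-npages", path],
                           capture_output=True, text=True)
    return describe_check_status(check.returncode), parse_npages(pages.stdout)
```
]]></programlisting>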
|
|
<para>
|
|
The <option>--raw-stream-data</option> and
|
|
<option>--filtered-stream-data</option> options are ignored unless
|
|
<option>--show-object</option> is given. Either of these options
|
|
will cause the stream data to be written to standard output. In
|
|
order to avoid commingling of stream data with other output, it is
|
|
recommended that these options not be combined with other
|
|
test/inspection options.
|
|
</para>
|
|
<para>
|
|
If <option>--filtered-stream-data</option> is given and
|
|
<option>--normalize-content=y</option> is also given, qpdf will
|
|
attempt to normalize the stream data as if it is a page content
|
|
stream. This attempt will be made even if it is not a page
|
|
content stream, in which case it will produce unusable results.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.qdf">
|
|
<title>QDF Mode</title>
|
|
<para>
|
|
In QDF mode, qpdf creates PDF files in what we call <firstterm>QDF
|
|
form</firstterm>. A PDF file in QDF form, sometimes called a QDF
|
|
file, is a completely valid PDF file that has
|
|
<literal>%QDF-1.0</literal> as its third line (after the PDF header
|
|
and binary characters) and has certain other characteristics. The
|
|
purpose of QDF form is to make it possible to edit PDF files, with
|
|
some restrictions, in an ordinary text editor. This can be very
|
|
useful for experimenting with different PDF constructs or for
|
|
making one-off edits to PDF files (though there are other reasons
|
|
why this may not always work).
|
|
</para>
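<para>
  Because the marker is always on the third line, a script can test
  whether a file claims to be in QDF form before handing it to an
  editor. This is a small sketch in Python under that assumption;
  the function name is hypothetical, and the check only looks at the
  marker, not at the other characteristics of QDF files.
</para>
<programlisting><![CDATA[
```python
def is_qdf(data):
    """Return True if the third line of the file contents is the QDF marker.

    data is the raw file contents as bytes. The first two lines are the
    PDF header and the line of binary characters.
    """
    lines = data.split(b"\n", 3)
    return len(lines) >= 3 and lines[2].rstrip(b"\r") == b"%QDF-1.0"
```
]]></programlisting>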
|
|
<para>
|
|
It is ordinarily very difficult to edit PDF files in a text editor
|
|
for two reasons: most meaningful data in PDF files is compressed,
|
|
and PDF files are full of offset and length information that makes
|
|
it hard to add or remove data. A QDF file is organized in a manner
|
|
such that, if edits are kept within certain constraints, the
|
|
<command>fix-qdf</command> program, distributed with qpdf, is able
|
|
to restore edited files to a correct state. The
|
|
<command>fix-qdf</command> program takes no command-line
|
|
arguments. It reads a possibly edited QDF file from standard input
|
|
and writes a repaired file to standard output.
|
|
</para>
|
|
<para>
|
|
The following attributes characterize a QDF file:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
All objects appear in numerical order in the PDF file, including
|
|
when objects appear in object streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Objects are printed in an easy-to-read format, and all line
|
|
endings are normalized to UNIX line endings.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Unless specifically overridden, streams appear uncompressed
|
|
(when qpdf supports the filters and they are compressed with a
|
|
non-lossy compression scheme), and most content streams are
|
|
normalized (line endings are converted to just UNIX-style
|
|
linefeeds).
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
All stream lengths are represented as indirect objects, and the
|
|
stream length object is always the next object after the stream.
|
|
If the stream data does not end with a newline, an extra newline
|
|
is inserted, and a special comment appears after the stream
|
|
indicating that this has been done.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If the PDF file contains object streams, if object stream
|
|
<emphasis>n</emphasis> contains <emphasis>k</emphasis> objects,
|
|
those objects are numbered from <emphasis>n+1</emphasis> through
|
|
<emphasis>n+k</emphasis>, and the object number/offset pairs
|
|
appear on a separate line for each object. Additionally, each
|
|
object in the object stream is preceded by a comment indicating
|
|
its object number and index. This makes it very easy to find
|
|
objects in object streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
All beginnings of objects, <literal>stream</literal> tokens,
|
|
<literal>endstream</literal> tokens, and
|
|
<literal>endobj</literal> tokens appear on lines by themselves.
|
|
A blank line follows every <literal>endobj</literal> token.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If there is a cross-reference stream, it is unfiltered.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Page dictionaries and page content streams are marked with
|
|
special comments that make them easy to find.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Comments precede each object indicating the object number of the
|
|
corresponding object in the original file.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
When editing a QDF file, any edits can be made as long as the above
|
|
constraints are maintained. This means that you can freely edit a
|
|
page's content without worrying about messing up the QDF file. It
|
|
is also possible to add new objects so long as those objects are
|
|
added after the last object in the file or subsequent objects are
|
|
renumbered. If a QDF file has object streams in it, you can always
|
|
add the new objects before the xref stream and then change the
|
|
number of the xref stream, since nothing generally ever references
|
|
it by number.
|
|
</para>
|
|
<para>
|
|
It is not generally practical to remove objects from QDF files
|
|
without messing up object numbering, but if you remove all
|
|
references to an object, you can run qpdf on the file (after
|
|
running <command>fix-qdf</command>), and qpdf will omit the
|
|
now-orphaned object.
|
|
</para>
|
|
<para>
|
|
When <command>fix-qdf</command> is run, it goes through the file
|
|
and recomputes the following parts of the file:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
the <literal>/N</literal>, <literal>/W</literal>, and
|
|
<literal>/First</literal> keys of all object stream dictionaries
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
the pairs of numbers representing object numbers and offsets of
|
|
objects in object streams
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
all stream lengths
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
the cross-reference table or cross-reference stream
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
the offset to the cross-reference table or cross-reference
|
|
stream following the <literal>startxref</literal> token
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</chapter>
|
|
<chapter id="ref.using-library">
|
|
<title>Using the QPDF Library</title>
|
|
<sect1 id="ref.using.from-cxx">
|
|
<title>Using QPDF from C++</title>
|
|
<para>
|
|
The source tree for the qpdf package has an
|
|
<filename>examples</filename> directory that contains a few
|
|
example programs. The <filename>qpdf/qpdf.cc</filename> source
|
|
file also serves as a useful example since it exercises almost all
|
|
of the qpdf library's public interface. The best source of
|
|
documentation on the library itself is reading comments in
|
|
<filename>include/qpdf/QPDF.hh</filename>,
|
|
<filename>include/qpdf/QPDFWriter.hh</filename>, and
|
|
<filename>include/qpdf/QPDFObjectHandle.hh</filename>.
|
|
</para>
|
|
<para>
|
|
All header files are installed in the <filename>include/qpdf</filename> directory. It
|
|
is recommended that you use <literal>#include
|
|
&lt;qpdf/QPDF.hh&gt;</literal> rather than adding
|
|
<filename>include/qpdf</filename> to your include path.
|
|
</para>
|
|
<para>
|
|
When linking against the qpdf static library, you may also need to
|
|
specify <literal>-lz -ljpeg</literal> on your link command. If
|
|
your system understands how to read libtool
|
|
<filename>.la</filename> files, this may not be necessary.
|
|
</para>
|
|
<para>
|
|
The qpdf library is safe to use in a multithreaded program, but no
|
|
individual <type>QPDF</type> object instance (including
|
|
<type>QPDF</type>, <type>QPDFObjectHandle</type>, or
|
|
<type>QPDFWriter</type>) can be used in more than one thread at a
|
|
time. Multiple threads may simultaneously work with different
|
|
instances of these and all other QPDF objects.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.using.other-languages">
|
|
<title>Using QPDF from other languages</title>
|
|
<para>
|
|
The qpdf library is implemented in C++, which makes it hard to use
|
|
directly in other languages. There are a few things that can help.
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>“C”</term>
|
|
<listitem>
|
|
<para>
|
|
The qpdf library includes a “C” language interface
|
|
that provides a subset of the overall capabilities. The header
|
|
file <filename>qpdf/qpdf-c.h</filename> includes information
|
|
about its use. As long as you use a C++ linker, you can link C
|
|
programs with qpdf and use the C API. For languages that can
|
|
directly load methods from a shared library, the C API can also
|
|
be useful. People have reported success using the C API from
|
|
other languages on Windows by directly calling functions in the
|
|
DLL.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>Python</term>
|
|
<listitem>
|
|
<para>
|
|
A Python module called <ulink
|
|
url="https://pypi.org/project/pikepdf/">pikepdf</ulink>
|
|
provides a clean and highly functional set of Python bindings
|
|
to the qpdf library. Using pikepdf, you can work with PDF files
|
|
in a natural way and combine qpdf's capabilities with other
|
|
functionality provided by Python's rich standard library and
|
|
available modules.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>Other Languages</term>
|
|
<listitem>
|
|
<para>
|
|
Starting with version 8.3.0, the <command>qpdf</command>
|
|
command-line tool can produce a json representation of the PDF
|
|
file's non-content data. This can facilitate interacting
|
|
programmatically with PDF files through qpdf's command line
|
|
interface. For more information, please see <xref
|
|
linkend="ref.json"/>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.json">
|
|
<title>QPDF JSON</title>
|
|
<sect1 id="ref.json-overview">
|
|
<title>Overview</title>
|
|
<para>
|
|
Beginning with qpdf version 8.3.0, the <command>qpdf</command>
|
|
command-line program can produce a json representation of the
|
|
non-content data in a PDF file. It includes a dump in json format
|
|
of all objects in the PDF file excluding the content of streams.
|
|
This json representation makes it very easy to look in detail at
|
|
the structure of a given PDF file, and it also provides a great way
|
|
to work with PDF files programmatically from the command-line in
|
|
languages that can't call or link with the qpdf library directly.
|
|
Note that stream data can be extracted from PDF files using other
|
|
qpdf command-line options.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.json-guarantees">
|
|
<title>JSON Guarantees</title>
|
|
<para>
|
|
The qpdf json representation includes a json serialization of the
|
|
raw objects in the PDF file as well as some computed information in
|
|
a more easily extracted format. QPDF provides some guarantees about
|
|
its json format. These guarantees are designed to simplify the
|
|
experience of a developer working with the JSON format.
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>Compatibility</term>
|
|
<listitem>
|
|
<para>
|
|
The top-level json object output is a dictionary. The json
|
|
output contains various nested dictionaries and arrays. With
|
|
the exception of dictionaries that are populated by the fields
|
|
of objects from the file, all instances of a dictionary are
|
|
guaranteed to have exactly the same keys. Future versions of
|
|
qpdf are free to add additional keys but not to remove keys or
|
|
change the type of object that a key points to. The qpdf
|
|
program validates this guarantee, and in the unlikely event
|
|
that a bug in qpdf should cause it to generate data that
|
|
doesn't conform to this rule, it will ask you to file a bug
|
|
report.
|
|
</para>
|
|
<para>
|
|
The top-level json structure contains a
|
|
“<literal>version</literal>” key whose value is
|
|
a simple integer. The value of the <literal>version</literal> key
|
|
will be incremented if a non-compatible change is made. A
|
|
non-compatible change would be any change that involves removal
|
|
of a key, a change to the format of data pointed to by a key,
|
|
or a semantic change that requires a different interpretation
|
|
of a previously existing key. A strong effort will be made to
|
|
avoid breaking compatibility.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>Documentation</term>
|
|
<listitem>
|
|
<para>
|
|
The <command>qpdf</command> command can be invoked with the
|
|
<option>--json-help</option> option. This will output a json
|
|
structure that has the same structure as the json output that
|
|
qpdf generates, except that each field in the help output is a
|
|
description of the corresponding field in the json output. The
|
|
specific guarantees are as follows:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
A dictionary in the help output means that the corresponding
|
|
location in the actual json output is also a dictionary with
|
|
exactly the same keys; that is, no keys present in help are
|
|
absent in the real output, and no keys will be present in
|
|
the real output that are not in help.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A string in the help output is a description of the item
|
|
that appears in the corresponding location of the actual
|
|
output. The corresponding output can have any format.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
An array in the help output always contains a single
|
|
element. It indicates that the corresponding location in the
|
|
actual output is also an array, and that each element of the
|
|
array has whatever format is implied by the single element
|
|
of the help output's array.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
For example, the help output includes a
|
|
“<literal>pagelabels</literal>” key whose value is
|
|
an array of one element. That element is a dictionary with keys
|
|
“<literal>index</literal>” and
|
|
“<literal>label</literal>”. In addition to
|
|
describing the meaning of those keys, this tells you that the
|
|
actual json output will contain a <literal>pagelabels</literal>
|
|
array, each of whose elements is a dictionary that contains an
|
|
<literal>index</literal> key, a <literal>label</literal> key,
|
|
and no other keys.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>Directness and Simplicity</term>
|
|
<listitem>
|
|
<para>
|
|
The json output contains the value of every object in the file,
|
|
but it also contains some processed data. This is analogous to
|
|
how qpdf's library interface works. The processed data is
|
|
similar to the helper functions in that it allows you to look
|
|
at certain aspects of the PDF file without having to understand
|
|
all the nuances of the PDF specification, while the raw objects
|
|
allow you to mine the PDF for anything that the higher-level
|
|
interfaces are lacking.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
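<para>
  The documentation guarantees can be read as a recursive rule. The
  following Python sketch shows one way a consumer might compare
  parsed json output against parsed <option>--json-help</option>
  output. It is an illustration, not qpdf's own validation code. The
  function name is hypothetical, and the sketch makes no allowance
  for the dictionaries that are populated from the fields of objects
  in the file, for null values, or for keys omitted because of
  <option>--json-key</option>.
</para>
<programlisting><![CDATA[
```python
def conforms(help_node, actual):
    """Return True if actual has the shape described by help_node."""
    if isinstance(help_node, dict):
        # A dictionary in the help output: the real output must be a
        # dictionary with exactly the same keys, checked recursively.
        return (
            isinstance(actual, dict)
            and set(actual) == set(help_node)
            and all(conforms(help_node[key], actual[key]) for key in help_node)
        )
    if isinstance(help_node, list):
        # An array in the help output has a single element that describes
        # every element of the corresponding real array.
        return isinstance(actual, list) and all(
            conforms(help_node[0], item) for item in actual
        )
    # A string in the help output is only a description; the real value
    # at that location can have any format.
    return True
```
]]></programlisting>
<para>
  With the <literal>pagelabels</literal> example above, an element
  that lacks the <literal>label</literal> key or that has an extra
  key would be rejected by this check, while the value stored under
  <literal>label</literal> is not inspected further.
</para>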
|
|
</sect1>
|
|
<sect1 id="json.limitations">
|
|
<title>Limitations of JSON Representation</title>
|
|
<para>
|
|
There are a few limitations to be aware of with the json structure:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Strings, names, and indirect object references in the original
|
|
PDF file are all converted to strings in the json
|
|
representation. In the case of a “normal” PDF file,
|
|
you can tell the difference because a name starts with a slash
|
|
(<literal>/</literal>), and an indirect object reference looks
|
|
like <literal>n n R</literal>, but if there were to be a string
|
|
that looked like a name or indirect object reference, there
|
|
would be no way to tell this from the json output. Note that
|
|
there are certain cases where you know for sure what something
|
|
is, such as knowing that dictionary keys in objects are always
|
|
names and that certain things in the higher-level computed data
|
|
are known to contain indirect object references.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The json format doesn't support binary data very well. Mostly
|
|
the details are not important, but they are presented here for
|
|
information. When qpdf outputs a string in the json
|
|
representation, it converts the string to UTF-8, assuming usual
|
|
PDF string semantics. Specifically, if the original string is
|
|
UTF-16, it is converted to UTF-8. Otherwise, it is assumed to
|
|
have PDF doc encoding, and is converted to UTF-8 with that
|
|
assumption. This causes strange things to happen to binary
|
|
strings. For example, if you had the binary string
|
|
<literal>&lt;038051&gt;</literal>, this would be output to the
|
|
json as <literal>\u0003•Q</literal> because
|
|
<literal>03</literal> is not a printable character and
|
|
<literal>80</literal> is the bullet character in PDF doc
|
|
encoding and is mapped to the Unicode value
|
|
<literal>2022</literal>. Since <literal>51</literal> is
|
|
<literal>Q</literal>, it is output as is. If you wanted to
|
|
convert back from here to a binary string, you would have to
|
|
recognize Unicode values whose code points are higher than
|
|
<literal>0xFF</literal> and map those back to their
|
|
corresponding PDF doc encoding characters. There is no way to
|
|
tell the difference between a Unicode string that was originally
|
|
encoded as UTF-16 or one that was converted from PDF doc
|
|
encoding. In other words, it's best if you don't try to use the
|
|
json format to extract binary strings from the PDF file, but if
|
|
you really had to, it could be done. Note that qpdf's
|
|
<option>--show-object</option> option does not have this
|
|
limitation and will reveal the string as encoded in the original
|
|
file.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
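<para>
  Both limitations can be illustrated with a short Python sketch.
  The function names are hypothetical. The classification is only
  the heuristic described above and cannot detect a string that
  merely looks like a name or reference. The reverse mapping table
  is deliberately incomplete: it contains only the bullet character
  from the example, and a usable table would need every PDF doc
  encoding character whose Unicode code point is above
  <literal>0xFF</literal>.
</para>
<programlisting><![CDATA[
```python
import re

# Matches the "n n R" form of an indirect object reference.
INDIRECT_REF = re.compile(r"^[0-9]+ [0-9]+ R$")

# Partial reverse map from Unicode code points to PDF doc encoding bytes.
# Only the bullet from the example in the text is filled in.
UNICODE_TO_PDFDOC = {0x2022: 0x80}


def guess_kind(value):
    """Guess whether a json string came from a reference, a name, or a string."""
    if INDIRECT_REF.match(value):
        return "reference"
    if value.startswith("/"):
        return "name"
    return "string"


def to_binary(text):
    """Map a json string value back to bytes, assuming PDF doc encoding.

    Code points up to 0xFF are kept as single bytes; higher code points
    are looked up in the partial table, raising KeyError if unknown.
    """
    out = bytearray()
    for ch in text:
        code_point = ord(ch)
        if code_point <= 0xFF:
            out.append(code_point)
        else:
            out.append(UNICODE_TO_PDFDOC[code_point])
    return bytes(out)
```
]]></programlisting>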
|
|
</sect1>
|
|
<sect1 id="json.considerations">
|
|
<title>JSON: Special Considerations</title>
|
|
<para>
|
|
For the most part, the built-in JSON help tells you everything you
|
|
need to know about the JSON format, but there are a few
|
|
non-obvious things to be aware of:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
While qpdf guarantees that keys present in the help will be
|
|
present in the output, those fields may be null or empty if the
|
|
information is not known or absent in the file. Also, if you
|
|
specify <option>--json-key</option>, the keys that are not
|
|
listed will be excluded entirely except for those that
|
|
<option>--json-help</option> says are always present.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
In a few places, there are keys with names containing
|
|
<literal>pageposfrom1</literal>. The values of these keys are
|
|
null or an integer. If an integer, they point to a page index
|
|
within the file numbering from 1. Note that json indexes from
|
|
0, and you would also use 0-based indexing using the API.
|
|
However, 1-based indexing is easier in this case because the
|
|
command-line syntax for specifying page ranges is 1-based. If
|
|
you were going to write a program that looked through the json
|
|
for information about specific pages and then use the
|
|
command-line to extract those pages, 1-based indexing is
|
|
easier. Besides, it's more convenient to subtract 1 in a
|
|
program in a real programming language than it is to add 1 in
|
|
shell code.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The image information included in the <literal>page</literal>
|
|
section of the json output includes the key
|
|
“<literal>filterable</literal>”. Note that the
|
|
value of this field may depend on the
|
|
<option>--decode-level</option> that you invoke qpdf with. The
|
|
json output includes a top-level key
|
|
“<literal>parameters</literal>” that indicates the
|
|
decode level used for computing whether a stream was
|
|
filterable. For example, jpeg images will be shown as not
|
|
filterable by default, but they will be shown as filterable if
|
|
you run <command>qpdf --json --decode-level=all</command>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
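<para>
  As a small illustration of the 1-based convention, the following
  Python sketch converts <literal>pageposfrom1</literal> values for
  use with 0-based indexing and builds a page range string suitable
  for the command line. The function names are hypothetical, and
  the input is assumed to be a list of values already collected from
  the json output.
</para>
<programlisting><![CDATA[
```python
def to_zero_based(pageposfrom1):
    """Convert a pageposfrom1 value to a 0-based index, passing null through."""
    return None if pageposfrom1 is None else pageposfrom1 - 1


def page_range(positions):
    """Join 1-based page positions into a comma-separated range, skipping nulls."""
    return ",".join(str(pos) for pos in positions if pos is not None)
```
]]></programlisting>
<para>
  A string such as <literal>3,7</literal> produced this way can be
  used directly as a page range, for example in <command>qpdf
  --empty --pages in.pdf 3,7 -- out.pdf</command>, without any
  adjustment to the numbers.
</para>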
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.design">
|
|
<title>Design and Library Notes</title>
|
|
<sect1 id="ref.design.intro">
|
|
<title>Introduction</title>
|
|
<para>
|
|
This section was written prior to the implementation of the qpdf
|
|
package and was subsequently modified to reflect the
|
|
implementation. In some cases, for purposes of explanation, it
|
|
may differ slightly from the actual implementation. As always,
|
|
the source code and test suite are authoritative. Even if there
|
|
are some errors, this document should serve as a road map to
|
|
understanding how this code works.
|
|
</para>
|
|
<para>
|
|
In general, one should adhere strictly to a specification when
|
|
writing but be liberal in reading. This way, the product of our
|
|
software will be accepted by the widest range of other programs,
|
|
and we will accept the widest range of input files. This library
|
|
attempts to conform to that philosophy whenever possible but also
|
|
aims to provide strict checking for people who want to validate
|
|
PDF files. If you don't want to see warnings and are trying to
|
|
write something that is tolerant, you can call
|
|
<literal>setSuppressWarnings(true)</literal>. If you want to fail
|
|
on the first error, you can call
|
|
<literal>setAttemptRecovery(false)</literal>. The default behavior
|
|
is to generate warnings for recoverable problems. Note that
|
|
recovery will not always produce the desired results even if it is
|
|
able to get through the file. Unlike most other PDF tools that
|
|
produce generic warnings such as “This file is
|
|
damaged,” qpdf generally issues a detailed error message
|
|
that would be most useful to a PDF developer. This is by design as
|
|
there seems to be a shortage of PDF validation tools out there.
|
|
This was, in fact, one of the major motivations behind the initial
|
|
creation of qpdf.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.design-goals">
|
|
<title>Design Goals</title>
|
|
<para>
|
|
The QPDF package includes support for reading and rewriting PDF
|
|
files. It aims to hide from the user details involving object
|
|
locations, modified (appended) PDF files, the
|
|
directness/indirectness of objects, and stream filters including
|
|
encryption. It does not aim to hide knowledge of the object
|
|
hierarchy or content stream contents. Put another way, a user of
|
|
the qpdf library is expected to have knowledge about how PDF files
|
|
work, but is not expected to have to keep track of bookkeeping
|
|
details such as file positions.
|
|
</para>
|
|
<para>
|
|
A user of the library never has to care whether an object is
|
|
direct or indirect, though it is possible to determine whether an
|
|
object is direct or not if this information is needed. All access
|
|
to objects deals with this transparently. All memory management
|
|
details are also handled by the library.
|
|
</para>
|
|
<para>
|
|
The <classname>PointerHolder</classname> object is used internally
|
|
by the library to deal with memory management. This is basically a
|
|
smart pointer object very similar in spirit to C++11's
|
|
<classname>std::shared_ptr</classname> object, but predating it by
|
|
several years. This library also makes use of a technique for
|
|
giving fine-grained access to methods in one class to other
|
|
classes by using public subclasses with friends and only private
|
|
members that in turn call private methods of the containing class.
|
|
See <classname>QPDFObjectHandle::Factory</classname> as an
|
|
example.
|
|
</para>
|
|
<para>
|
|
The top-level qpdf class is <classname>QPDF</classname>. A
|
|
<classname>QPDF</classname> object represents a PDF file. The
|
|
library provides methods for both accessing and mutating PDF
|
|
files.
|
|
</para>
|
|
<para>
|
|
The primary class for interacting with PDF objects is
|
|
<classname>QPDFObjectHandle</classname>. Instances of this class
|
|
can be passed around by value, copied, stored in containers, etc.
|
|
with very low overhead. Instances of
|
|
<classname>QPDFObjectHandle</classname> created by reading from a
|
|
file will always contain a reference back to the
|
|
<classname>QPDF</classname> object from which they were created. A
|
|
<classname>QPDFObjectHandle</classname> may be direct or indirect.
|
|
If indirect, the <classname>QPDFObject</classname> the
|
|
<classname>PointerHolder</classname> initially points to is a null
|
|
pointer. In this case, the first attempt to access the underlying
|
|
<classname>QPDFObject</classname> will result in the
|
|
<classname>QPDFObject</classname> being resolved via a call to the
|
|
referenced <classname>QPDF</classname> instance. This makes it
|
|
essentially impossible to make coding errors in which certain
|
|
things will work for some PDF files and not for others based on
|
|
which objects are direct and which objects are indirect.
|
|
</para>
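<para>
  The lazy resolution described above can be pictured with a small
  model. The following Python sketch is an illustration of the
  pattern only and is not how the library is implemented: the class
  names <literal>Document</literal> and <literal>Handle</literal>
  are hypothetical stand-ins for <classname>QPDF</classname> and
  <classname>QPDFObjectHandle</classname>, and keeping the resolved
  value inside the handle is an assumption made for simplicity.
</para>
<programlisting><![CDATA[
```python
_UNRESOLVED = object()  # sentinel playing the role of the initial null pointer


class Document:
    """Stand-in for QPDF: owns the objects and resolves them on request."""

    def __init__(self, objects):
        self._objects = objects  # maps (object number, generation) to a value
        self.resolve_count = 0

    def resolve(self, objgen):
        self.resolve_count += 1
        return self._objects[objgen]


class Handle:
    """Stand-in for QPDFObjectHandle: direct value or lazily resolved reference."""

    def __init__(self, value=_UNRESOLVED, owner=None, objgen=None):
        self._value = value
        self._owner = owner
        self._objgen = objgen

    @classmethod
    def direct(cls, value):
        return cls(value=value)

    @classmethod
    def indirect(cls, owner, objgen):
        return cls(owner=owner, objgen=objgen)

    def value(self):
        # The first access to an indirect handle asks the owning document
        # to resolve it; callers never need to know which kind they hold.
        if self._value is _UNRESOLVED:
            self._value = self._owner.resolve(self._objgen)
        return self._value
```
]]></programlisting>
<para>
  Code that calls <literal>value()</literal> behaves the same for
  direct and indirect handles, which is the property the design aims
  for.
</para>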
|
|
<para>
|
|
Instances of <classname>QPDFObjectHandle</classname> can be
|
|
directly created and modified using static factory methods in the
|
|
<classname>QPDFObjectHandle</classname> class. There are factory
|
|
methods for each type of object as well as a convenience method
|
|
<function>QPDFObjectHandle::parse</function> that creates an
|
|
object from a string representation of the object. Existing
|
|
instances of <classname>QPDFObjectHandle</classname> can also be
|
|
modified in several ways. See comments in
|
|
<filename>QPDFObjectHandle.hh</filename> for details.
|
|
</para>
|
|
<para>
|
|
An instance of <classname>QPDF</classname> is constructed by using
|
|
the class's default constructor. If desired, the
|
|
<classname>QPDF</classname> object may be configured with various
|
|
methods that change its default behavior. Then the
|
|
<function>QPDF::processFile()</function> method is passed the name
|
|
of a PDF file, which permanently associates the file with that
|
|
QPDF object. A password may also be given for access to
|
|
password-protected files. QPDF does not enforce encryption
|
|
parameters and will treat user and owner passwords equivalently.
|
|
Either password may be used to access an encrypted file.
|
|
<footnote>
|
|
<para>
|
|
As pointed out earlier, the intention is not for qpdf to be used
|
|
to bypass security on files, but as any open source PDF consumer
|
|
may be easily modified to bypass basic PDF document security,
|
|
and qpdf offers many transformations that can do this as well,
|
|
there seems to be little point in the added complexity of
|
|
conditionally enforcing document security.
|
|
</para>
|
|
</footnote>
|
|
<classname>QPDF</classname> will allow recovery of a user password
|
|
given an owner password. The input PDF file must be seekable.
|
|
(Output files written by <classname>QPDFWriter</classname> need
|
|
not be seekable, even when creating linearized files.) During
|
|
construction, <classname>QPDF</classname> validates the PDF file's
|
|
header, and then reads the cross reference tables and trailer
|
|
dictionaries. The <classname>QPDF</classname> class keeps only
|
|
the first trailer dictionary though it does read all of them so it
|
|
can check the <literal>/Prev</literal> key.
|
|
<classname>QPDF</classname> class users may request the root
|
|
object and the trailer dictionary specifically. The cross
|
|
reference table is kept private. Objects may then be requested by
|
|
number or by walking the object tree.
|
|
</para>
|
|
<para>
|
|
When a PDF file has a cross-reference stream instead of a
|
|
cross-reference table and trailer, requesting the document's
|
|
trailer dictionary returns the stream dictionary from the
|
|
cross-reference stream instead.
|
|
</para>
|
|
<para>
|
|
There are some convenience routines for very common operations
|
|
such as walking the page tree and returning a vector of all page
|
|
objects. For full details, please see the header files
|
|
<filename>QPDF.hh</filename> and
|
|
<filename>QPDFObjectHandle.hh</filename>. There are also some
|
|
additional helper classes that provide higher level API functions
|
|
for certain document constructions. These are discussed in <xref
|
|
linkend="ref.helper-classes"/>.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.helper-classes">
|
|
<title>Helper Classes</title>
|
|
<para>
|
|
QPDF version 8.1 introduced the concept of helper classes. Helper
|
|
classes are intended to contain higher level APIs that allow
|
|
developers to work with certain document constructs at an
|
|
abstraction level above that of
|
|
<classname>QPDFObjectHandle</classname> while staying true to
|
|
qpdf's philosophy of not hiding document structure from the
|
|
developer. As with qpdf in general, the goal is to take away some of
|
|
the more tedious bookkeeping aspects of working with PDF files,
|
|
not to remove the need for the developer to understand how the PDF
|
|
construction in question works. The driving factor behind the
|
|
creation of helper classes was to allow the evolution of higher
|
|
level interfaces in qpdf without polluting the interfaces of the
|
|
main top-level classes <classname>QPDF</classname> and
|
|
<classname>QPDFObjectHandle</classname>.
|
|
</para>
|
|
<para>
|
|
There are two kinds of helper classes:
|
|
<emphasis>document</emphasis> helpers and
|
|
<emphasis>object</emphasis> helpers. Document helpers are
|
|
constructed with a reference to a <classname>QPDF</classname>
|
|
object and provide methods for working with structures that are at
|
|
the document level. Object helpers are constructed with an
|
|
instance of a <classname>QPDFObjectHandle</classname> and provide
|
|
methods for working with specific types of objects.
|
|
</para>
|
|
<para>
|
|
Examples of document helpers include
|
|
<classname>QPDFPageDocumentHelper</classname>, which contains
|
|
methods for operating on the document's page trees, such as
|
|
enumerating all pages of a document and adding and removing pages;
|
|
and <classname>QPDFAcroFormDocumentHelper</classname>, which
|
|
contains document-level methods related to interactive forms, such
|
|
as enumerating form fields and creating mappings between form
|
|
fields and annotations.
|
|
</para>
|
|
<para>
|
|
Examples of object helpers include
|
|
<classname>QPDFPageObjectHelper</classname> for performing
|
|
operations on pages such as page rotation and some operations on
|
|
content streams, <classname>QPDFFormFieldObjectHelper</classname>
|
|
for performing operations related to interactive form fields, and
|
|
<classname>QPDFAnnotationObjectHelper</classname> for working with
|
|
annotations.
|
|
</para>
|
|
<para>
|
|
It is always possible to retrieve the underlying
|
|
<classname>QPDF</classname> reference from a document helper and
|
|
the underlying <classname>QPDFObjectHandle</classname> reference
|
|
from an object helper. Helpers are designed to be helpers, not
|
|
wrappers. The intention is that, in general, it is safe to freely
|
|
intermix operations that use helpers with operations that use the
|
|
underlying objects. Document and object helpers do not attempt to
|
|
provide a complete interface for working with the things they are
|
|
helping with, nor do they attempt to encapsulate underlying
|
|
structures. They just provide a few methods to help with
|
|
error-prone, repetitive, or complex tasks. In some cases, a helper
|
|
object may cache some information that is expensive to gather. In
|
|
such cases, the helper classes are implemented so that their own
|
|
methods keep the cache consistent, and the header file will
|
|
provide a method to invalidate the cache and a description of what
|
|
kinds of operations would make the cache invalid. If in doubt, you
|
|
can always discard a helper class and create a new one with the
|
|
same underlying objects, which will ensure that you have discarded
|
|
any stale information.
|
|
</para>
|
|
<para>
|
|
By convention, document helpers are called
|
|
<classname>QPDFSomethingDocumentHelper</classname> and are derived
|
|
from <classname>QPDFDocumentHelper</classname>, and object helpers
|
|
are called <classname>QPDFSomethingObjectHelper</classname> and
|
|
are derived from <classname>QPDFObjectHelper</classname>. For
|
|
details on specific helpers, please see their header files. You
|
|
can find them by looking at
|
|
<filename>include/qpdf/QPDF*DocumentHelper.hh</filename> and
|
|
<filename>include/qpdf/QPDF*ObjectHelper.hh</filename>.
|
|
</para>
|
|
<para>
|
|
In order to avoid creation of circular dependencies, the following
|
|
general guidelines are followed with helper classes:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Core class interfaces do not know about helper classes. For
|
|
example, no methods of <classname>QPDF</classname> or
|
|
<classname>QPDFObjectHandle</classname> will include helper
|
|
classes in their interfaces.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Interfaces of object helpers will usually not use document
|
|
helpers in their interfaces. This is because it is much more
|
|
useful for document helpers to have methods that return object
|
|
helpers. Most operations in PDF files start at the document
|
|
level and go from there to the object level rather than the
|
|
other way around. It can sometimes be useful to map back from
|
|
object-level structures to document-level structures. If there
|
|
is a desire to do this, it will generally be provided by a
|
|
method in the document helper class.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Most of the time, object helpers don't know about other object
|
|
helpers. However, in some cases, one type of object may be a
|
|
container for another type of object, in which case it may make
|
|
sense for the outer object to know about the inner object. For
|
|
example, there are methods in the
|
|
<classname>QPDFPageObjectHelper</classname> that know
|
|
<classname>QPDFAnnotationObjectHelper</classname> because
|
|
references to annotations are contained in page dictionaries.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Any helper or core library class may use helpers in their
|
|
implementations.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Prior to qpdf version 8.1, higher level interfaces were added as
|
|
“convenience functions” in either
|
|
<classname>QPDF</classname> or
|
|
<classname>QPDFObjectHandle</classname>. For compatibility, older
|
|
convenience functions for operating with pages will remain in
|
|
those classes even as alternatives are provided in helper classes.
|
|
Going forward, new higher level interfaces will be provided using
|
|
helper classes.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.implementation-notes">
|
|
<title>Implementation Notes</title>
|
|
<para>
|
|
This section contains a few notes about QPDF's internal
|
|
implementation, particularly around what it does when it first
|
|
processes a file. This section is a bit of a simplification of
|
|
what it actually does, but it could serve as a starting point to
|
|
someone trying to understand the implementation. There is nothing
|
|
in this section that you need to know to use the qpdf library.
|
|
</para>
|
|
<para>
|
|
<classname>QPDFObject</classname> is the basic PDF Object class.
|
|
It is an abstract base class from which are derived classes for
|
|
each type of PDF object. Clients do not interact with Objects
|
|
directly but instead interact with
|
|
<classname>QPDFObjectHandle</classname>.
|
|
</para>
|
|
<para>
|
|
When the <classname>QPDF</classname> class creates a new object,
|
|
it dynamically allocates the appropriate type of
|
|
<classname>QPDFObject</classname> and immediately hands the
|
|
pointer to an instance of <classname>QPDFObjectHandle</classname>.
|
|
The parser reads a token from the current file position. If the
|
|
token is neither a dictionary nor an array opener, an object is
|
|
immediately constructed from the single token and the parser
|
|
returns. Otherwise, the parser iterates in a special mode in which
|
|
it accumulates objects until it finds a balancing closer. During
|
|
this process, the “<literal>R</literal>” keyword is
|
|
recognized and an indirect <classname>QPDFObjectHandle</classname>
|
|
may be constructed.
|
|
</para>
|
|
<para>
|
|
The <function>QPDF::resolve()</function> method, which is used to
|
|
resolve an indirect object, may be invoked from the
|
|
<classname>QPDFObjectHandle</classname> class. It first checks a
|
|
cache to see whether this object has already been read. If not,
|
|
it reads the object from the PDF file and caches it. It then
|
|
returns the resulting <classname>QPDFObjectHandle</classname>.
|
|
The calling object handle then replaces its
|
|
<classname>PointerHolder<QPDFObject></classname> with the one
|
|
from the newly returned <classname>QPDFObjectHandle</classname>.
|
|
In this way, only a single copy of any direct object need exist
|
|
and clients can access objects transparently without knowing
|
|
or caring whether they are direct or indirect objects. Additionally,
|
|
no object is ever read from the file more than once. That means
|
|
that only the portions of the PDF file that are actually needed
|
|
are ever read from the input file, thus allowing the qpdf package
|
|
to take advantage of this important design goal of PDF files.
|
|
</para>
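<para>
  The resolve-once behavior described above can be illustrated with a
  small, self-contained sketch. This is not qpdf's implementation:
  <classname>ToyObject</classname>,
  <classname>ToyResolver</classname>, and the
  <varname>reader</varname> callback are hypothetical names invented
  for this example, with the callback standing in for the step of
  seeking to an offset and parsing an object.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for a parsed PDF object; not qpdf's QPDFObject.
struct ToyObject {
    std::string text;
};

// (object ID, generation) pair used as the cache key.
using ObjGen = std::pair<int, int>;

// Hypothetical resolver that reads each object at most once.
class ToyResolver {
public:
    // `reader` plays the role of "look up the offset and parse the object".
    explicit ToyResolver(std::function<std::string(ObjGen)> reader)
        : reader_(std::move(reader)) {}

    // Return the cached object, invoking the reader only on the first request.
    std::shared_ptr<ToyObject> resolve(ObjGen og) {
        auto it = cache_.find(og);
        if (it != cache_.end()) {
            return it->second;
        }
        ++reads_;
        auto obj = std::make_shared<ToyObject>(ToyObject{reader_(og)});
        cache_[og] = obj;
        return obj;
    }

    // Number of times the reader was actually invoked.
    int reads() const { return reads_; }

private:
    std::function<std::string(ObjGen)> reader_;
    std::map<ObjGen, std::shared_ptr<ToyObject>> cache_;
    int reads_ = 0;
};
```
]]></programlisting>
<para>
  Resolving the same object ID and generation twice returns the same
  shared pointer and invokes the reader only once, which mirrors the
  claim that no object is read from the file more than once.
</para>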
|
|
<para>
|
|
If the requested object is inside of an object stream, the object
|
|
stream itself is first read into memory. Then the tokenizer reads
|
|
objects from the memory stream based on the offset information
|
|
stored in the stream. Those individual objects are cached, after
|
|
which the temporary buffer holding the object stream contents is
|
|
discarded. In this way, the first time an object in an object
|
|
stream is requested, all objects in the stream are cached.
|
|
</para>
|
|
<para>
|
|
The following example should clarify how
|
|
<classname>QPDF</classname> processes a simple file.
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Client constructs <classname>QPDF</classname>
|
|
<varname>pdf</varname> and calls
|
|
<function>pdf.processFile("a.pdf");</function>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <classname>QPDF</classname> class checks the beginning of
|
|
<filename>a.pdf</filename> for a PDF header. It then reads the
|
|
cross reference table mentioned at the end of the file,
|
|
ensuring that it is looking before the last
|
|
<literal>%%EOF</literal>. After getting to the
|
|
<literal>trailer</literal> keyword, it invokes the parser.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The parser sees “<literal><<</literal>”, so
|
|
it calls itself recursively in dictionary creation mode.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
In dictionary creation mode, the parser keeps accumulating
|
|
objects until it encounters
|
|
“<literal>>></literal>”. Each object that is
|
|
read is pushed onto a stack. If
|
|
“<literal>R</literal>” is read, the last two
|
|
objects on the stack are inspected. If they are integers, they
|
|
are popped off the stack and their values are used to construct
|
|
an indirect object handle which is then pushed onto the stack.
|
|
When “<literal>>></literal>” is finally read,
|
|
the stack is converted into a
|
|
<classname>QPDF_Dictionary</classname> which is placed in a
|
|
<classname>QPDFObjectHandle</classname> and returned.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The resulting dictionary is saved as the trailer dictionary.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <literal>/Prev</literal> key is searched. If present,
|
|
<classname>QPDF</classname> seeks to that point and repeats
|
|
except that the new trailer dictionary is not saved. If
|
|
<literal>/Prev</literal> is not present, the initial parsing
|
|
process is complete.
|
|
</para>
|
|
<para>
|
|
If there is an encryption dictionary, the document's encryption
|
|
parameters are initialized.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The client requests the root object. The
|
|
<classname>QPDF</classname> class gets the value of the root key
|
|
from the trailer dictionary and returns it. It is an unresolved
|
|
indirect <classname>QPDFObjectHandle</classname>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The client requests the <literal>/Pages</literal> key from root
|
|
<classname>QPDFObjectHandle</classname>. The
|
|
<classname>QPDFObjectHandle</classname> notices that it is
|
|
indirect so it asks <classname>QPDF</classname> to resolve it.
|
|
<classname>QPDF</classname> looks in the object cache for an
|
|
object with the root dictionary's object ID and generation
|
|
number. Upon not seeing it, it checks the cross reference
|
|
table, gets the offset, and reads the object present at that
|
|
offset. It stores the result in the object cache and returns
|
|
the cached result. The calling
|
|
<classname>QPDFObjectHandle</classname> replaces its object
|
|
pointer with the one from the resolved
|
|
<classname>QPDFObjectHandle</classname>, verifies that it is a
|
|
valid dictionary object, and returns the (unresolved indirect)
|
|
<classname>QPDFObjectHandle</classname> for the top of the
|
|
Pages hierarchy.
|
|
</para>
|
|
<para>
|
|
As the client continues to request objects, the same process is
|
|
followed for each new requested object.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
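<para>
  The dictionary creation mode in the walk-through above can be
  sketched as a small stack-based parser. This is a simplified
  illustration under stated assumptions, not qpdf's parser: tokens
  are split on whitespace only, just integers, names, indirect
  references, and dictionaries are handled, and
  <classname>ToyValue</classname>, <function>tokenize</function>,
  and <function>parseObject</function> are hypothetical names. The
  sketch also chooses to throw when
  “<literal>R</literal>” does not follow two integers,
  which is a decision made for this example only.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Toy value: an integer, a name, an indirect reference, or a dictionary.
struct ToyValue {
    enum Kind { Integer, Name, Ref, Dict };
    Kind kind = Integer;
    int num = 0;  // integer value, or object ID when kind == Ref
    int gen = 0;  // generation when kind == Ref
    std::string name;
    std::map<std::string, std::shared_ptr<ToyValue>> dict;
};

// Split on whitespace; a real PDF tokenizer is far more involved.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) {
        tokens.push_back(t);
    }
    return tokens;
}

// Parse one object starting at tokens[pos] and advance pos past it.
ToyValue parseObject(const std::vector<std::string>& tokens, std::size_t& pos) {
    if (pos >= tokens.size()) {
        throw std::runtime_error("unexpected end of input");
    }
    const std::string tok = tokens[pos++];
    ToyValue v;
    if (tok != "<<") {
        // Not an opener: construct the object from this single token.
        if (tok[0] == '/') {
            v.kind = ToyValue::Name;
            v.name = tok;
        } else {
            v.kind = ToyValue::Integer;
            v.num = std::stoi(tok);
        }
        return v;
    }
    // Dictionary creation mode: accumulate objects until ">>".
    std::vector<ToyValue> stack;
    while (true) {
        if (pos >= tokens.size()) {
            throw std::runtime_error("unterminated dictionary");
        }
        if (tokens[pos] == ">>") {
            ++pos;
            break;
        }
        if (tokens[pos] == "R") {
            ++pos;
            const std::size_t n = stack.size();
            if (n < 2 || stack[n - 1].kind != ToyValue::Integer ||
                stack[n - 2].kind != ToyValue::Integer) {
                throw std::runtime_error("R must follow two integers");
            }
            // Replace "id gen" on the stack with one indirect reference.
            ToyValue ref;
            ref.kind = ToyValue::Ref;
            ref.num = stack[n - 2].num;
            ref.gen = stack[n - 1].num;
            stack.pop_back();
            stack.pop_back();
            stack.push_back(ref);
            continue;
        }
        stack.push_back(parseObject(tokens, pos));
    }
    // Convert the stack of alternating names and values into a dictionary.
    if (stack.size() % 2 != 0) {
        throw std::runtime_error("dictionary has a key without a value");
    }
    v.kind = ToyValue::Dict;
    for (std::size_t i = 0; i < stack.size(); i += 2) {
        if (stack[i].kind != ToyValue::Name) {
            throw std::runtime_error("dictionary key is not a name");
        }
        v.dict[stack[i].name] = std::make_shared<ToyValue>(stack[i + 1]);
    }
    return v;
}
```
]]></programlisting>
<para>
  Parsing the token sequence for a trailer-like dictionary such as
  one containing <literal>/Root 1 0 R</literal> yields a dictionary
  whose <literal>/Root</literal> entry is an unresolved reference to
  object 1, generation 0. Nothing is looked up until a caller asks
  for that entry to be resolved.
</para>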
|
|
</sect1>
|
|
<sect1 id="ref.casting">
|
|
<title>Casting Policy</title>
|
|
<para>
|
|
This section describes the casting policy followed by qpdf's
|
|
implementation. This is of no concern to qpdf's end users and
|
|
largely of no concern to people writing code that uses qpdf, but
|
|
it could be of interest to people who are porting qpdf to a new
|
|
platform or who are making modifications to the code.
|
|
</para>
|
|
<para>
|
|
The C++ code in qpdf is free of old-style casts except where
|
|
unavoidable (e.g. where the old-style cast is in a macro provided
|
|
by a third-party header file). When there is a need for a cast,
|
|
it is handled, in order of preference, by rewriting the code to
|
|
avoid the need for a cast, calling
|
|
<function>const_cast</function>, calling
|
|
<function>static_cast</function>, calling
|
|
<function>reinterpret_cast</function>, or calling some combination
|
|
of the above. As a last resort, a compiler-specific
|
|
<literal>#pragma</literal> may be used to suppress a warning that
|
|
we don't want to fix. Examples may include suppressing warnings
|
|
about the use of old-style casts in code that is shared between C
|
|
and C++ code.
|
|
</para>
|
|
<para>
|
|
The casting policy explicitly prohibits casting between integer
|
|
sizes for no purpose other than to quiet a compiler warning when
|
|
there is no reasonable chance of a problem resulting. The reason
|
|
for this exclusion is that the practice of adding these additional
|
|
casts precludes future use of additional compiler warnings as a
|
|
tool for making future improvements to this aspect of the code,
|
|
and it also damages the readability of the code.
|
|
</para>
|
|
<para>
|
|
There are a few significant areas where casting is common in the
|
|
qpdf sources or where casting would be required to quiet higher
|
|
levels of compiler warnings but is omitted at present:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<type>char</type> vs. <type>unsigned char</type>. For
|
|
historical reasons, there are a lot of places in qpdf's
|
|
internals that deal with <type>unsigned char</type>, which
|
|
means that a lot of casting is required to interoperate with
|
|
standard library calls and <type>std::string</type>. In
|
|
retrospect, qpdf should have probably used regular (signed)
|
|
<type>char</type> and <type>char*</type> everywhere and just
|
|
cast to <type>unsigned char</type> when needed, but it's too
|
|
late to make that change now. There are
|
|
<function>reinterpret_cast</function> calls to go between
|
|
<type>char*</type> and <type>unsigned char*</type>, and there
|
|
are <function>static_cast</function> calls to go between
|
|
<type>char</type> and <type>unsigned char</type>. These should
|
|
always be safe.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Non-const <type>unsigned char*</type> used in the
|
|
<type>Pipeline</type> interface. The pipeline interface has a
|
|
<function>write</function> call that uses <type>unsigned
|
|
char*</type> without a <type>const</type> qualifier. The main
|
|
reason for this is to support pipelines that make calls to
|
|
third-party libraries, such as zlib, that don't include
|
|
<type>const</type> in their interfaces. Unfortunately, there
|
|
are many places in the code where it is desirable to have
|
|
<type>const char*</type> with pipelines. None of the pipeline
|
|
implementations in qpdf currently modify the data passed to
|
|
write, and doing so would be counter to the intent of
|
|
<type>Pipeline</type>, but there is nothing in the code to
|
|
prevent this from being done. There are places in the code
|
|
where <function>const_cast</function> is used to remove the
|
|
const-ness of pointers going into <type>Pipeline</type>s. This
|
|
could theoretically be unsafe, but there is adequate testing to
|
|
assert that it is safe and will remain safe in qpdf's code.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<type>size_t</type> vs. <type>qpdf_offset_t</type>. This is
|
|
pretty much unavoidable since sizes are unsigned types and
|
|
offsets are signed types. Whenever it is necessary to seek by
|
|
an amount given by a <type>size_t</type>, it becomes necessary
|
|
to mix and match between <type>size_t</type> and
|
|
<type>qpdf_offset_t</type>. Additionally, qpdf sometimes
|
|
treats memory buffers like files (as with
|
|
<type>BufferInputSource</type>), and those seek interfaces have
|
|
to be consistent with file-based input sources. Neither gcc
|
|
nor MSVC gives warnings for this case by default, but both have
|
|
warning flags that can enable this. (MSVC:
|
|
<option>/W14267</option> or <option>/W3</option>, which also
|
|
enables some additional warnings that we ignore; gcc:
|
|
<option>-Wconversion -Wsign-conversion</option>). This could
|
|
matter for files whose sizes are larger than
|
|
2<superscript>63</superscript> bytes, but it is reasonable to
|
|
expect that a world where such files are common would also have
|
|
larger <type>size_t</type> and <type>qpdf_offset_t</type> types
|
|
in it. On most 64-bit systems at the time of this writing (the
|
|
release of version 4.1.0 of qpdf), both <type>size_t</type> and
|
|
<type>qpdf_offset_t</type> are 64-bit integer types, while on
|
|
many current 32-bit systems, <type>size_t</type> is a 32-bit
|
|
type while <type>qpdf_offset_t</type> is a 64-bit type. I am
|
|
not aware of any cases where 32-bit systems that have
|
|
<type>size_t</type> smaller than <type>qpdf_offset_t</type>
|
|
could run into problems. Although I can't conclusively rule
|
|
out the possibility of such problems existing, I suspect any
|
|
cases would be pretty contrived. In the event that someone
|
|
should produce a file that qpdf can't handle because of what is
|
|
suspected to be issues involving the handling of
|
|
<type>size_t</type> vs. <type>qpdf_offset_t</type> (such files
|
|
may behave properly on 64-bit systems but not on 32-bit systems
|
|
because they have very large embedded files or streams, for
|
|
example), the above mentioned warning flags could be enabled
|
|
and all those implicit conversions could be carefully
|
|
scrutinized. (I have already gone through that exercise once
|
|
in adding support for files larger than 4 GB in size.) I
|
|
continue to be committed to supporting large files on 32-bit
|
|
systems, but I would not go to any lengths to support corner
|
|
cases involving large embedded files or large streams that work
|
|
on 64-bit systems but not on 32-bit systems because of
|
|
<type>size_t</type> being too small. It is reasonable to
|
|
assume that anyone working with such files would be using a
|
|
64-bit system anyway since many 32-bit applications would have
|
|
similar difficulties.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<type>size_t</type> vs. <type>int</type> or <type>long</type>.
|
|
There are some cases where <type>size_t</type> and
|
|
<type>int</type> or <type>long</type> or <type>size_t</type>
|
|
and <type>unsigned int</type> or <type>unsigned long</type> are
|
|
used interchangeably. These cases occur when working with very
|
|
small amounts of memory, such as with the bit readers (where
|
|
we're working with just a few bytes at a time), some cases of
|
|
<function>strlen</function>, and a few other cases. I have
|
|
scrutinized all of these cases and determined them to be safe,
|
|
but there is no mechanism in the code to ensure that new unsafe
|
|
conversions between <type>int</type> and <type>size_t</type>
|
|
aren't introduced short of good testing and strong awareness of
|
|
the issues. Again, if any such bugs are suspected in the
|
|
future, enabling the additional warning flags and scrutinizing
|
|
the warnings would be in order.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
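<para>
  As an illustration of the conversions discussed in the list above,
  the following sketch shows one way a size-to-offset conversion
  could be made explicit and range-checked when auditing such code,
  along with the two kinds of <type>char</type> casts. It is not
  taken from qpdf's sources: <type>toy_offset_t</type> is a stand-in
  that assumes a signed 64-bit offset type, and
  <function>toOffset</function>, <function>toSize</function>, and
  <function>asUnsigned</function> are hypothetical helpers.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in for qpdf_offset_t, assumed here to be a signed 64-bit integer.
using toy_offset_t = std::int64_t;

// Convert a size to an offset, throwing instead of silently wrapping.
toy_offset_t toOffset(std::size_t n) {
    const auto max_offset =
        static_cast<std::uintmax_t>(std::numeric_limits<toy_offset_t>::max());
    if (n > max_offset) {
        throw std::range_error("size does not fit in a signed offset");
    }
    return static_cast<toy_offset_t>(n);
}

// Convert an offset to a size; negative or too-large offsets are rejected.
// The second check matters where size_t is narrower than the offset type,
// as on many 32-bit systems.
std::size_t toSize(toy_offset_t off) {
    if (off < 0) {
        throw std::range_error("negative offset cannot be a size");
    }
    if (static_cast<std::uintmax_t>(off) >
        std::numeric_limits<std::size_t>::max()) {
        throw std::range_error("offset does not fit in size_t");
    }
    return static_cast<std::size_t>(off);
}

// Going between char* and unsigned char* takes reinterpret_cast, whereas
// going between char and unsigned char values only takes static_cast.
inline const unsigned char* asUnsigned(const char* p) {
    return reinterpret_cast<const unsigned char*>(p);
}
```
]]></programlisting>
<para>
  A checked helper of this kind differs from a bare cast added only
  to silence a warning: it changes behavior by reporting values that
  do not fit, so it does not hide a potential problem from future
  compiler diagnostics.
</para>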
|
|
<para>
|
|
To be clear, I believe qpdf to be well-behaved with respect to
|
|
sizes and offsets, and qpdf's test suite includes actual
|
|
generation and full processing of files larger than 4 GB in
|
|
size. The issues raised here are largely academic and should not
|
|
in any way be interpreted to mean that qpdf has practical problems
|
|
involving sloppiness with integer types. I also believe that
|
|
appropriate measures have been taken in the code to avoid problems
|
|
with signed vs. unsigned integers from resulting in memory
|
|
overwrites or other issues with potential security implications,
|
|
though there are never any absolute guarantees.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.encryption">
|
|
<title>Encryption</title>
|
|
<para>
|
|
Encryption is supported transparently by qpdf. When opening a PDF
|
|
file, if an encryption dictionary exists, the
|
|
<classname>QPDF</classname> object processes this dictionary using
|
|
the password (if any) provided. The primary decryption key is
|
|
computed and cached. No further access is made to the encryption
|
|
dictionary after that time. When an object is read from a file,
|
|
the object ID and generation of the object in which it is
|
|
contained is always known. Using this information along with the
|
|
stored encryption key, all stream and string objects are
|
|
transparently decrypted. Raw encrypted objects are never stored
|
|
in memory. This way, nothing in the library ever has to know or
|
|
care whether it is reading an encrypted file.
|
|
</para>
|
|
<para>
|
|
An interface is also provided for writing encrypted streams and
|
|
strings given an encryption key. This is used by
|
|
<classname>QPDFWriter</classname> when it rewrites encrypted
|
|
files.
|
|
</para>
|
|
<para>
|
|
When copying encrypted files, unless otherwise directed, qpdf will
|
|
preserve any encryption in force in the original file. qpdf can
|
|
do this with either the user or the owner password. There is no
|
|
difference in capability based on which password is used. When 40
|
|
or 128 bit encryption keys are used, the user password can be
|
|
recovered with the owner password. With 256-bit keys, the user and
|
|
owner passwords are used independently to encrypt the actual
|
|
encryption key, so while either can be used, the owner password
|
|
can no longer be used to recover the user password.
|
|
</para>
|
|
<para>
|
|
Starting with version 4.0.0, qpdf can read files that are not
|
|
encrypted but that contain encrypted attachments, but it cannot
|
|
write such files. qpdf also requires the password to be specified
|
|
in order to open the file, not just to extract attachments, since
|
|
once the file is open, all decryption is handled transparently.
|
|
When copying files like this while preserving encryption, qpdf
|
|
will apply the file's encryption to everything in the file, not
|
|
just to the attachments. When decrypting the file, qpdf will
|
|
decrypt the attachments. In general, when copying PDF files with
|
|
multiple encryption formats, qpdf will choose the newest format.
|
|
The only exception to this is that clear-text metadata will be
|
|
preserved as clear-text if it is that way in the original file.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.random-numbers">
|
|
<title>Random Number Generation</title>
|
|
<para>
|
|
QPDF generates random numbers to support generation of encrypted
|
|
data. Versions prior to 5.0.1 used <function>random</function> or
|
|
<function>rand</function> from <filename>stdlib</filename> to
|
|
generate random numbers. Version 5.0.1 used operating
system-provided secure random number generation instead when available,
|
|
enabling use of <filename>stdlib</filename> random number
|
|
generation only if enabled by a compile-time option. Starting in
|
|
version 5.1.0, use of insecure random numbers was disabled unless
|
|
enabled at compile time. Starting in version 5.1.0, it is also
|
|
possible for you to disable use of OS-provided secure random
|
|
numbers. This is especially useful on Windows if you want to
|
|
avoid a dependency on Microsoft's cryptography API. In this case,
|
|
you must provide your own random data provider. Regardless of how
|
|
you compile qpdf, starting in version 5.1.0, it is possible for
|
|
you to provide your own random data provider at runtime. This
|
|
would enable you to use some software-based secure pseudorandom
|
|
number generator and to avoid use of whatever the operating system
|
|
provides. For details on how to do this, please refer to the
|
|
top-level README.md file in the source distribution and to comments
|
|
in <filename>QUtil.hh</filename>.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.adding-and-remove-pages">
|
|
<title>Adding and Removing Pages</title>
|
|
<para>
|
|
While qpdf's API has supported adding and modifying objects for
|
|
some time, version 3.0 introduces specific methods for adding and
|
|
removing pages. These are largely convenience routines that
|
|
handle two tricky issues: pushing inheritable resources from the
|
|
<literal>/Pages</literal> tree down to individual pages and
|
|
manipulation of the <literal>/Pages</literal> tree itself. For
|
|
details, see <function>addPage</function> and surrounding methods
|
|
in <filename>QPDF.hh</filename>.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.reserved-objects">
|
|
<title>Reserving Object Numbers</title>
|
|
<para>
|
|
Version 3.0 of qpdf introduced the concept of reserved objects.
|
|
These are seldom needed for ordinary operations, but there are
|
|
cases in which you may want to add a series of indirect objects
|
|
with references to each other to a <classname>QPDF</classname>
|
|
object. This causes a problem because you can't determine the
|
|
object ID that a new indirect object will have until you add it to
|
|
the <classname>QPDF</classname> object with
|
|
<function>QPDF::makeIndirectObject</function>. The only way to
|
|
add two mutually referential objects to a
|
|
<classname>QPDF</classname> object prior to version 3.0 would be
|
|
to add the new objects first and then make them refer to each
|
|
other after adding them. Now it is possible to create a
|
|
<firstterm>reserved object</firstterm> using
|
|
<function>QPDFObjectHandle::newReserved</function>. This is an
|
|
indirect object that stays “unresolved” even if it is
|
|
queried for its type. So now, if you want to create a set of
|
|
mutually referential objects, you can create reservations for each
|
|
one of them and use those reservations to construct the
|
|
references. When finished, you can call
|
|
<function>QPDF::replaceReserved</function> to replace the reserved
|
|
objects with the real ones. This functionality will never be
|
|
needed by most applications, but it is used internally by QPDF
|
|
when copying objects from other PDF files, as discussed in <xref
|
|
linkend="ref.foreign-objects"/>. For an example of how to use
|
|
reserved objects, search for <function>newReserved</function> in
|
|
<filename>test_driver.cc</filename> in qpdf's sources.
|
|
</para>
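<para>
  The idea of reserving an object ID before the object's contents
  exist can be shown with a toy model. This sketch does not use
  qpdf's API: <classname>ToyDoc</classname> and its methods are
  hypothetical, and each “object” is reduced to the list
  of object IDs it refers to.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy document in which an indirect object is just the list of object IDs
// it references. This is an illustration only, not qpdf's QPDF class.
class ToyDoc {
public:
    // Hand out an object ID now; the contents are supplied later.
    int reserve() {
        const int id = next_id_++;
        reserved_.insert(id);
        return id;
    }

    // Fill in a previously reserved object with its real contents.
    void replaceReserved(int id, std::vector<int> refs) {
        if (reserved_.erase(id) == 0) {
            throw std::logic_error("object is not reserved");
        }
        objects_[id] = std::move(refs);
    }

    // True while the object still has no real contents.
    bool isReserved(int id) const { return reserved_.count(id) != 0; }

    // IDs referenced by a filled-in object; throws if it was never filled.
    const std::vector<int>& refs(int id) const { return objects_.at(id); }

private:
    int next_id_ = 1;
    std::set<int> reserved_;
    std::map<int, std::vector<int>> objects_;
};
```
]]></programlisting>
<para>
  With two reservations in hand, each object can be built with a
  reference to the other's ID before either one is filled in, which
  is the situation that is awkward when an ID is only known after
  the object has been added.
</para>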
|
|
</sect1>
|
|
<sect1 id="ref.foreign-objects">
|
|
<title>Copying Objects From Other PDF Files</title>
|
|
<para>
|
|
Version 3.0 of qpdf introduced the ability to copy objects into a
|
|
<classname>QPDF</classname> object from a different
|
|
<classname>QPDF</classname> object, which we refer to as
|
|
<firstterm>foreign objects</firstterm>. This allows arbitrary
|
|
merging of PDF files. The “from”
|
|
<classname>QPDF</classname> object must remain valid after the
|
|
copy as discussed in the note below. The <command>qpdf</command>
|
|
command-line tool provides limited support for basic page
|
|
selection, including merging in pages from other files, but the
|
|
library's API makes it possible to implement arbitrarily complex
|
|
merging operations. The main method for copying foreign objects is
|
|
<function>QPDF::copyForeignObject</function>. This takes an
|
|
indirect object from another <classname>QPDF</classname> and
|
|
copies it recursively into this object while preserving all object
|
|
structure, including circular references. This means you can add a
|
|
direct object that you create from scratch to a
|
|
<classname>QPDF</classname> object with
|
|
<function>QPDF::makeIndirectObject</function>, and you can add an
|
|
indirect object from another file with
|
|
<function>QPDF::copyForeignObject</function>. The fact that
|
|
<function>QPDF::makeIndirectObject</function> does not
|
|
automatically detect a foreign object and copy it is an explicit
|
|
design decision. Copying a foreign object seems like a
|
|
sufficiently significant thing to do that it should be done
|
|
explicitly.
|
|
</para>
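<para>
  A recursive copy that preserves circular references can be
  sketched on a toy object graph. This is only an illustration of
  the general technique, not a description of how
  <function>QPDF::copyForeignObject</function> is written:
  <type>ToyGraph</type> and <function>copyForeign</function> are
  hypothetical, and objects are again reduced to lists of referenced
  IDs. The key step is assigning the destination ID before
  descending into the object's references.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy object graph: object ID -> IDs of the objects it references.
using ToyGraph = std::map<int, std::vector<int>>;

// Copy `id` and everything reachable from it from `src` into `dst`.
// `copied` maps source IDs to destination IDs. Recording the mapping before
// descending is what makes circular references terminate.
int copyForeign(const ToyGraph& src, int id, ToyGraph& dst,
                std::map<int, int>& copied, int& next_id) {
    auto done = copied.find(id);
    if (done != copied.end()) {
        return done->second;  // already copied, or copy in progress
    }
    const int new_id = next_id++;
    copied[id] = new_id;  // claim the destination ID first

    std::vector<int> new_refs;
    auto found = src.find(id);
    if (found != src.end()) {
        for (int ref : found->second) {
            new_refs.push_back(copyForeign(src, ref, dst, copied, next_id));
        }
    }
    dst[new_id] = new_refs;
    return new_id;
}
```
]]></programlisting>
<para>
  Copying an object that refers to a second object which refers
  back to the first produces two destination objects that refer to
  each other, with each source object copied exactly once.
</para>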
|
|
<para>
|
|
The other way to copy foreign objects is by passing a page from
|
|
one <classname>QPDF</classname> to another by calling
|
|
<function>QPDF::addPage</function>. In contrast to
|
|
<function>QPDF::makeIndirectObject</function>, this method
|
|
automatically distinguishes between indirect objects in the
|
|
current file, foreign objects, and direct objects.
|
|
</para>
|
|
<para>
|
|
Please note: when you copy objects from one
|
|
<classname>QPDF</classname> to another, the source
|
|
<classname>QPDF</classname> object must remain valid until you
|
|
have finished with the destination object. This is because the
|
|
original object is still used to retrieve any referenced stream
|
|
data from the copied object.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.rewriting">
|
|
<title>Writing PDF Files</title>
|
|
<para>
|
|
The qpdf library supports file writing of
|
|
<classname>QPDF</classname> objects to PDF files through the
|
|
<classname>QPDFWriter</classname> class. The
|
|
<classname>QPDFWriter</classname> class has two writing modes: one
|
|
for non-linearized files, and one for linearized files. See <xref
|
|
linkend="ref.linearization"/> for a description of how linearization
|
|
is implemented. This section describes how we write
|
|
non-linearized files including the creation of QDF files (see
|
|
<xref linkend="ref.qdf"/>).
|
|
</para>
|
|
<para>
|
|
This outline was written prior to implementation and is not
|
|
exactly accurate, but it provides a correct “notional”
|
|
idea of how writing works. Look at the code in
|
|
<classname>QPDFWriter</classname> for exact details.
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Initialize state:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
next object number = 1
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
object queue = empty
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
renumber table: old object id/generation to new id/0 = empty
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
xref table: new id -> offset = empty
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Create a QPDF object from a file.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Write header for new PDF file.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Request the trailer dictionary.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
For each value that is an indirect object, grab the next object
|
|
number (via an operation that returns and increments the
|
|
number). Map object to new number in renumber table. Push
|
|
object onto queue.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
While there are more objects on the queue:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Pop queue.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Look up object's new number <emphasis>n</emphasis> in the
|
|
renumbering table.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Store current offset into xref table.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Write <literal><replaceable>n</replaceable> 0 obj</literal>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If object is null, whether direct or indirect, write out
|
|
null, thus eliminating unresolvable indirect object
|
|
references.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If the object is a stream, write stream contents,
|
|
piped through any filters as required, to a memory buffer.
|
|
Use this buffer to determine the stream length.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If object is not a stream, array, or dictionary, write out
|
|
its contents.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If object is an array or dictionary (including stream),
|
|
traverse its elements (for array) or values (for
|
|
dictionaries), handling recursive dictionaries and arrays,
|
|
looking for indirect objects. When an indirect object is
|
|
found, if it is not resolvable, ignore. (This case is
|
|
handled when writing it out.) Otherwise, look it up in the
|
|
renumbering table. If not found, grab the next available
|
|
object number, assign to the referenced object in the
|
|
renumbering table, and push the referenced object onto the
|
|
queue. As a special case, when writing out a stream
|
|
dictionary, replace length, filters, and decode parameters
|
|
as required.
|
|
</para>
|
|
<para>
|
|
Write out dictionary or array, replacing any unresolvable
|
|
indirect object references with null (the PDF specification says a
reference to a non-existent object is legal and resolves to
|
|
null) and any resolvable ones with references to the
|
|
renumbered objects.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If the object is a stream, write
|
|
<literal>stream\n</literal>, the stream contents (from the
|
|
memory buffer), and <literal>\nendstream\n</literal>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When done, write <literal>endobj</literal>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Once we have finished the queue, all referenced objects will have
|
|
been written out and all deleted objects or unreferenced objects
|
|
will have been skipped. The new cross-reference table will
|
|
contain an offset for every new object number from 1 up to the
|
|
number of objects written. This can be used to write out a new
|
|
xref table. Finally we can write out the trailer dictionary with
|
|
appropriately computed /ID (see spec, 8.3, File Identifiers), the
|
|
cross reference table offset, and <literal>%%EOF</literal>.
|
|
</para>
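<para>
The renumbering and queueing steps of the outline above can be
sketched in a few lines of Python. This is a minimal illustration
under stated assumptions, not the code in
<classname>QPDFWriter</classname>: the <literal>Ref</literal> class,
the <literal>objects</literal> dictionary, and the
<literal>plan_write</literal> function are hypothetical names
invented for this example, and offsets, streams, and actual output
are left out.
</para>
<programlisting>
```python
from collections import deque


class Ref:
    """Hypothetical stand-in for an indirect reference to an old object number."""

    def __init__(self, num):
        self.num = num


def plan_write(objects, trailer):
    """Return (renumber, order).

    objects  -- dict mapping old object numbers to values; a value is a
                scalar, a list, a dict, or a Ref
    trailer  -- the trailer dictionary (a direct dict)
    renumber -- old object number to new object number
    order    -- old object numbers in the order they would be written
    """
    renumber = {}
    queue = deque()
    order = []

    def enqueue(value):
        # Walk arrays and dictionaries recursively, looking for references.
        if isinstance(value, Ref):
            if value.num not in objects:
                return  # unresolvable: would be written out as null
            if value.num not in renumber:
                renumber[value.num] = len(renumber) + 1
                queue.append(value.num)
        elif isinstance(value, list):
            for item in value:
                enqueue(item)
        elif isinstance(value, dict):
            for item in value.values():
                enqueue(item)

    enqueue(trailer)
    while queue:
        old = queue.popleft()
        order.append(old)
        enqueue(objects[old])
    return renumber, order
```
</programlisting>
<para>
Objects that are never reached from the trailer never receive a new
number, which is how unreferenced objects are dropped in this model.
</para>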
|
|
</sect1>
|
|
<sect1 id="ref.filtered-streams">
|
|
<title>Filtered Streams</title>
|
|
<para>
|
|
Support for streams is implemented through the
|
|
<classname>Pipeline</classname> interface which was designed for
|
|
this package.
|
|
</para>
|
|
<para>
|
|
When reading streams, create a series of
|
|
<classname>Pipeline</classname> objects. The
|
|
<classname>Pipeline</classname> abstract base class requires
implementation of <function>write()</function> and
|
|
<function>finish()</function> and provides an implementation of
|
|
<function>getNext()</function>. Each pipeline object, upon
|
|
receiving data, does whatever it is going to do and then writes
|
|
the data (possibly modified) to its successor. Alternatively, a
|
|
pipeline may be an end-of-the-line pipeline that does something
|
|
like store its output to a file or a memory buffer ignoring a
|
|
successor. For additional details, look at
|
|
<filename>Pipeline.hh</filename>.
|
|
</para>
|
|
<para>
|
|
<classname>QPDF</classname> can read raw or filtered streams.
|
|
When reading a filtered stream, the <classname>QPDF</classname>
|
|
class creates a <classname>Pipeline</classname> object for
|
|
each appropriate filter object and chains them together. The last
|
|
filter should write to whatever type of output is required. The
|
|
<classname>QPDF</classname> class has an interface to write raw or
|
|
filtered stream contents to a given pipeline.
|
|
</para>
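<para>
The chaining idea can be illustrated with a small Python analogue.
The classes below are hypothetical and only mirror the shape
described above (a base with <function>write()</function>,
<function>finish()</function>, and a way to reach the successor);
they are not qpdf's C++ classes, and the use of Python's
<literal>zlib</literal> module as the filter is an assumption made
for the example.
</para>
<programlisting>
```python
import zlib


class Pipeline:
    """Base class: subclasses implement write() and finish()."""

    def __init__(self, next_pipeline=None):
        self._next = next_pipeline

    def get_next(self):
        return self._next

    def write(self, data):
        raise NotImplementedError

    def finish(self):
        raise NotImplementedError


class Inflate(Pipeline):
    """Decompresses zlib data and passes the result to its successor."""

    def __init__(self, next_pipeline):
        super().__init__(next_pipeline)
        self._z = zlib.decompressobj()

    def write(self, data):
        self.get_next().write(self._z.decompress(data))

    def finish(self):
        self.get_next().write(self._z.flush())
        self.get_next().finish()


class Buffer(Pipeline):
    """End-of-the-line pipeline that collects its input in memory."""

    def __init__(self):
        super().__init__(None)
        self.data = b""

    def write(self, data):
        self.data += data

    def finish(self):
        pass
```
</programlisting>
<para>
Constructing <literal>Inflate(Buffer())</literal>, writing
zlib-compressed bytes to it in any number of pieces, and calling
<literal>finish()</literal> leaves the decompressed bytes in the
buffer's <literal>data</literal> attribute.
</para>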
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.linearization">
|
|
<title>Linearization</title>
|
|
<para>
|
|
This chapter describes how <classname>QPDF</classname> and
|
|
<classname>QPDFWriter</classname> implement creation and processing
|
|
of linearized PDFs.
|
|
</para>
|
|
<sect1 id="ref.linearization-strategy">
|
|
<title>Basic Strategy for Linearization</title>
|
|
<para>
|
|
To avoid the incestuous problem of having the qpdf library
|
|
validate its own linearized files, we have a special linearized
|
|
file checking mode which can be invoked via <command>qpdf
|
|
--check-linearization</command> (or <command>qpdf
|
|
--check</command>). This mode reads the linearization parameter
|
|
dictionary and the hint streams and validates that object
|
|
ordering, parameters, and hint stream contents are correct. The
|
|
validation code was first tested against linearized files created
|
|
by external tools (Acrobat and pdlin) and then used to validate
|
|
files created by <classname>QPDFWriter</classname> itself.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.linearized.preparation">
|
|
<title>Preparing For Linearization</title>
|
|
<para>
|
|
Before creating a linearized PDF file from any other PDF file, the
|
|
PDF file must be altered such that all page attributes are
|
|
propagated down to the page level (and not inherited from parents
|
|
in the <literal>/Pages</literal> tree). We also have to know
|
|
which objects refer to which other objects, being concerned with
|
|
page boundaries and a few other cases. We refer to this part of
|
|
preparing the PDF file as <firstterm>optimization</firstterm>,
|
|
discussed in <xref linkend="ref.optimization"/>. Note that, in
|
|
this context, the term <firstterm>optimization</firstterm> is a
|
|
qpdf term, and the term <firstterm>linearization</firstterm> is a
|
|
term from the PDF specification. Do not be confused by the fact
|
|
that many applications refer to linearization as optimization or
|
|
web optimization.
|
|
</para>
|
|
<para>
|
|
When creating linearized PDF files from optimized PDF files, there
|
|
are really only a few issues that need to be dealt with:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Creation of hints tables
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Placing objects in the correct order
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Filling in offsets and byte sizes
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.optimization">
|
|
<title>Optimization</title>
|
|
<para>
|
|
In order to perform various operations such as linearization and
|
|
splitting files into pages, it is necessary to know which objects
|
|
are referenced by which pages, page thumbnails, and root and
|
|
trailer dictionary keys. It is also necessary to ensure that all
|
|
page-level attributes appear directly at the page level and are
|
|
not inherited from parents in the pages tree.
|
|
</para>
|
|
<para>
|
|
We refer to the process of enforcing these constraints as
|
|
<firstterm>optimization</firstterm>. As mentioned above, note
|
|
that some applications refer to linearization as optimization.
|
|
Although this optimization was initially motivated by the need to
|
|
create linearized files, we are using these terms separately.
|
|
</para>
|
|
<para>
|
|
PDF file optimization is implemented in the
|
|
<filename>QPDF_optimization.cc</filename> source file. That file
|
|
is richly commented and serves as the primary reference for the
|
|
optimization process.
|
|
</para>
|
|
<para>
|
|
After optimization has been completed, the private member
|
|
variables <varname>obj_user_to_objects</varname> and
|
|
<varname>object_to_obj_users</varname> in
|
|
<classname>QPDF</classname> have been populated. Any object that
|
|
has more than one value in the
|
|
<varname>object_to_obj_users</varname> table is shared. Any
|
|
object that has exactly one value in the
|
|
<varname>object_to_obj_users</varname> table is private. To find
|
|
all the private objects in a page or a trailer or root dictionary
|
|
key, one merely has to make this determination for each element in
|
|
the <varname>obj_user_to_objects</varname> table for the given
|
|
page or key.
|
|
</para>
|
|
<para>
|
|
Note that pages and thumbnails have different object user types,
|
|
so the above test on a page will not include objects that are
referenced only by the page's thumbnail dictionary.
|
|
</para>
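<para>
As a rough illustration of the shared versus private test, the
following Python sketch models the two tables as dictionaries of
sets. The table names follow the member variables mentioned above,
but the data layout, the <literal>invert</literal> helper, and the
tuples used to identify object users are assumptions made for this
example rather than a description of the C++ data structures.
</para>
<programlisting>
```python
def invert(obj_user_to_objects):
    """Build an object-to-users table from a user-to-objects table."""
    result = {}
    for user, objs in obj_user_to_objects.items():
        for obj in objs:
            result.setdefault(obj, set()).add(user)
    return result


def private_objects(user, obj_user_to_objects, object_to_obj_users):
    """Objects referenced by `user` and by no other object user."""
    return sorted(
        obj
        for obj in obj_user_to_objects.get(user, ())
        if len(object_to_obj_users[obj]) == 1
    )
```
</programlisting>
<para>
With page 1 using objects 5, 6, and 7 and page 2 using objects 7 and
8, object 7 has two users and is shared, so the private objects of
page 1 are 5 and 6. A thumbnail would appear as a separate user with
its own entry.
</para>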
|
|
</sect1>
|
|
<sect1 id="ref.linearization.writing">
|
|
<title>Writing Linearized Files</title>
|
|
<para>
|
|
We will create files with only primary hint streams. We will
|
|
never write overflow hint streams. (As of PDF version 1.4,
|
|
Acrobat doesn't either, and they are never necessary.) The hint
|
|
streams contain offset information to objects that point to where
|
|
they would be if the hint stream were not present. This means
|
|
that we have to calculate all object positions before we can
|
|
generate and write the hint table. This means that we have to
|
|
generate the file in two passes. To make this reliable,
|
|
<classname>QPDFWriter</classname> in linearization mode invokes
|
|
exactly the same code twice to write the file to a pipeline.
|
|
</para>
|
|
<para>
|
|
In the first pass, the target pipeline is a count pipeline chained
|
|
to a discard pipeline. The count pipeline simply passes its data
|
|
through to the next pipeline in the chain but can return the
|
|
number of bytes passed through it at any intermediate point. The
|
|
discard pipeline is an end of line pipeline that just throws its
|
|
data away. The hint stream is not written and dummy values with
|
|
adequate padding are stored in the first cross reference table,
|
|
linearization parameter dictionary, and /Prev key of the first
|
|
trailer dictionary. All the offset, length, object renumbering
|
|
information, and anything else we need for the second pass is
|
|
stored.
|
|
</para>
|
|
<para>
|
|
At the end of the first pass, this information is passed to the
|
|
<classname>QPDF</classname> class which constructs a compressed
|
|
hint stream in a memory buffer and returns it.
|
|
<classname>QPDFWriter</classname> uses this information to write a
|
|
complete hint stream object into a memory buffer. At this point,
|
|
the length of the hint stream is known.
|
|
</para>
|
|
<para>
|
|
In the second pass, the end of the pipeline chain is a regular
|
|
file instead of a discard pipeline, and we have known values for
|
|
all the offsets and lengths that we didn't have in the first pass.
|
|
We have to adjust offsets that appear after the start of the hint
|
|
stream by the length of the hint stream, which is known. Anything
|
|
that is of variable length is padded, with the padding code
|
|
surrounding any writing code that differs in the two passes. This
|
|
ensures that changes to the way things are represented never
|
|
result in offsets that were gathered during the first pass
|
|
becoming incorrect for the second pass.
|
|
</para>
|
|
<para>
|
|
Using this strategy, we can write linearized files to a
|
|
non-seekable output stream with only a single pass to disk or
|
|
wherever the output is going.
|
|
</para>
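<para>
The two-pass approach can be shown with a toy file format in Python.
This sketch is only an analogy: the format (a header of fixed-width
offsets followed by the bodies), the <literal>Count</literal> class,
and the function names are invented for the example, and the
adjustment for the hint stream length is not modelled. What it does
show is the same code running twice, first into a counting sink that
discards its data and then into real output, with padding keeping
the layout identical in both passes.
</para>
<programlisting>
```python
import io


class Count:
    """Passes data through and tracks how many bytes it has seen."""

    def __init__(self, sink=None):
        self.count = 0
        self._sink = sink  # None behaves like a discard pipeline

    def write(self, data):
        self.count += len(data)
        if self._sink is not None:
            self._sink.write(data)


def write_file(out, bodies, offsets=None):
    """Write the toy file and return the offset at which each body starts.

    Every offset is written in a fixed-width, zero-padded field, so the
    header has the same size whether or not real offsets are known yet.
    """
    values = offsets if offsets is not None else [0] * len(bodies)
    for value in values:
        out.write(b"%010d\n" % value)
    found = []
    for body in bodies:
        found.append(out.count)
        out.write(body)
    return found


def write_two_pass(bodies):
    first = write_file(Count(), bodies)  # pass 1: output is discarded
    sink = io.BytesIO()
    second = write_file(Count(sink), bodies, first)  # pass 2: real output
    assert first == second  # padding kept the layout stable
    return sink.getvalue(), first
```
</programlisting>
<para>
Because the real output is written sequentially in the second pass
only, the sink never needs to seek.
</para>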
|
|
</sect1>
|
|
<sect1 id="ref.linearization-data">
|
|
<title>Calculating Linearization Data</title>
|
|
<para>
|
|
Once a file is optimized, we have information about which objects
|
|
access which other objects. We can then process these tables to
|
|
decide which part (as described in “Linearized PDF Document
|
|
Structure” in the PDF specification) each object is
|
|
contained within. This tells us the exact order in which objects
|
|
are written. The <classname>QPDFWriter</classname> class asks for
|
|
this information and enqueues objects for writing in the proper
|
|
order. It also turns on a check that causes an exception to be
|
|
thrown if an object is encountered that has not already been
|
|
queued. (This could happen only if there were a bug in the
|
|
traversal code used to calculate the linearization data.)
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.linearization-issues">
|
|
<title>Known Issues with Linearization</title>
|
|
<para>
|
|
There are a handful of known issues with this linearization code.
|
|
These issues do not appear to impact the behavior of linearized
|
|
files which still work as intended: it is possible for a web
|
|
browser to begin to display them before they are fully
|
|
downloaded. In fact, it seems that various other programs that
|
|
create linearized files have many of these same issues. These
|
|
items make reference to terminology used in the linearization
|
|
appendix of the PDF specification.
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Thread Dictionary information keys appear in part 4 with the
|
|
rest of Threads instead of in part 9. Objects in part 9 are
|
|
not grouped together functionally.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
We are not calculating numerators for shared object positions
|
|
within content streams or interleaving them within content
|
|
streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
We generate only page offset, shared object, and outline hint
|
|
tables. It would be relatively easy to add some additional
|
|
tables. We gather most of the information needed to create
|
|
thumbnail hint tables. There are comments in the code about
|
|
this.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.linearization-debugging">
|
|
<title>Debugging Note</title>
|
|
<para>
|
|
The <command>qpdf --show-linearization</command> command can show
|
|
the complete contents of linearization hint streams. To look at
|
|
the raw data, you can extract the filtered contents of the
|
|
linearization hint tables using <command>qpdf --show-object=n
|
|
--filtered-stream-data</command>. Then, to convert this into a
|
|
bit stream (since linearization tables are bit streams written
|
|
without regard to byte boundaries), you can pipe the resulting
|
|
data through the following Perl code:
|
|
|
|
<programlisting>use bytes;
binmode STDIN;
# slurp mode: read all of standard input at once
undef $/;
my $a = &lt;STDIN&gt;;
my @ch = split(//, $a);
# print each byte as eight binary digits
map { printf("%08b", ord($_)) } @ch;
print "\n";
</programlisting>
|
|
</para>
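<para>
For readers who prefer Python, the same conversion can be sketched
as follows, together with a helper that splits the bit stream into
fields of given widths. Both function names are hypothetical, and
the helper assumes that fields are packed with the most significant
bit first; the widths themselves have to be taken from the hint
table headers.
</para>
<programlisting>
```python
def to_bits(data):
    """Render bytes as a string of 0s and 1s, most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)


def read_fields(data, widths):
    """Split a bit stream into unsigned integers of the given bit widths."""
    bits = to_bits(data)
    values = []
    pos = 0
    for width in widths:
        # A zero-width field carries no bits and is read as 0.
        values.append(int(bits[pos:pos + width], 2) if width else 0)
        pos += width
    return values
```
</programlisting>
<para>
For example, <literal>read_fields(b"\xa5\x01", [3, 5, 8])</literal>
returns <literal>[5, 5, 1]</literal>.
</para>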
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="ref.object-and-xref-streams">
|
|
<title>Object and Cross-Reference Streams</title>
|
|
<para>
|
|
This chapter provides information about the implementation of
|
|
object stream and cross-reference stream support in qpdf.
|
|
</para>
|
|
<sect1 id="ref.object-streams">
|
|
<title>Object Streams</title>
|
|
<para>
|
|
Object streams can contain any regular object except the
|
|
following:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
stream objects
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
objects with generation > 0
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
the encryption dictionary
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
objects containing the /Length of another stream
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
In addition, Adobe Reader (at least as of version 8.0.0) appears
|
|
to not be able to handle having the document catalog appear in an
|
|
object stream if the file is encrypted, though this is not
|
|
specifically disallowed by the specification.
|
|
</para>
|
|
<para>
|
|
There are additional restrictions for linearized files. See <xref
|
|
linkend="ref.object-streams-linearization"/> for details.
|
|
</para>
|
|
<para>
|
|
The PDF specification refers to objects in object streams as
|
|
“compressed objects” regardless of whether the object
|
|
stream is compressed.
|
|
</para>
|
|
<para>
|
|
The generation number of every object in an object stream must be
|
|
zero. It is possible to delete and replace an object in an object
|
|
stream with a regular object.
|
|
</para>
|
|
<para>
|
|
The object stream dictionary has the following keys:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<literal>/N</literal>: number of objects
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/First</literal>: byte offset of first object
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Extends</literal>: indirect reference to stream that
|
|
this extends
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Stream collections are formed with <literal>/Extends</literal>.
|
|
They must form a directed acyclic graph. These can be used for
|
|
semantic information and are not meaningful to the PDF document's
|
|
syntactic structure. Although qpdf preserves stream collections,
|
|
it never generates them and doesn't make use of this information
|
|
in any way.
|
|
</para>
|
|
<para>
|
|
The specification recommends limiting the number of objects in
|
|
an object stream for efficiency in reading and decoding. Acrobat 6
|
|
uses no more than 100 objects per object stream for linearized
|
|
files and no more than 200 objects per stream for non-linearized files.
|
|
<classname>QPDFWriter</classname>, in object stream generation
|
|
mode, never puts more than 100 objects in an object stream.
|
|
</para>
|
|
<para>
|
|
Object stream contents consist of <emphasis>N</emphasis> pairs of
|
|
integers, each of which is the object number and the byte offset
|
|
of the object relative to the first object in the stream, followed
|
|
by the objects themselves, concatenated.
|
|
</para>
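<para>
The layout just described can be made concrete with a short Python
sketch that splits already-decoded object stream contents into
individual objects. The function name is hypothetical, the values of
<literal>/N</literal> and <literal>/First</literal> are passed in by
the caller, and the sketch assumes that the objects are stored in
increasing offset order so that each object ends where the next one
begins.
</para>
<programlisting>
```python
def split_object_stream(data, n, first):
    """Return {object number: raw bytes} for decoded object stream contents.

    data  -- decoded stream contents
    n     -- value of /N (number of objects)
    first -- value of /First (byte offset of the first object)
    """
    header = [int(tok) for tok in data[:first].split()]
    if len(header) != 2 * n:
        raise ValueError("expected %d integers in header" % (2 * n))
    pairs = list(zip(header[0::2], header[1::2]))
    result = {}
    for i, (num, offset) in enumerate(pairs):
        start = first + offset
        # An object runs up to the start of the next one (or end of data).
        end = first + pairs[i + 1][1] if i + 1 != n else len(data)
        result[num] = data[start:end].strip()
    return result
```
</programlisting>
<para>
For the contents <literal>11 0 12 5 true (hi)</literal> with
<literal>/N</literal> 2 and <literal>/First</literal> 10, this
yields object 11 as <literal>true</literal> and object 12 as
<literal>(hi)</literal>.
</para>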
|
|
</sect1>
|
|
<sect1 id="ref.xref-streams">
|
|
<title>Cross-Reference Streams</title>
|
|
<para>
|
|
For non-hybrid files, the value following
|
|
<literal>startxref</literal> is the byte offset to the xref stream
|
|
rather than the word <literal>xref</literal>.
|
|
</para>
|
|
<para>
|
|
For hybrid files (files containing both xref tables and
|
|
cross-reference streams), the xref table's trailer dictionary
|
|
contains the key <literal>/XRefStm</literal> whose value is the
|
|
byte offset to a cross-reference stream that supplements the xref
|
|
table. A PDF 1.5-compliant application should read the xref table
|
|
first. Then it should replace any object that it has already seen
|
|
with any defined in the xref stream. Then it should follow any
|
|
<literal>/Prev</literal> pointer in the original xref table's
|
|
trailer dictionary. The specification is not clear about what
|
|
should be done, if anything, with a <literal>/Prev</literal>
|
|
pointer in the xref stream referenced by an xref table. The
|
|
<classname>QPDF</classname> class ignores it, which is probably
|
|
reasonable since, if this case were to appear for any sensible PDF
|
|
file, the previous xref table would probably have a corresponding
|
|
<literal>/XRefStm</literal> pointer of its own. For example, if a
|
|
hybrid file were appended, the appended section would have its own
|
|
xref table and <literal>/XRefStm</literal>. The appended xref
|
|
table would point to the previous xref table which would point to the
|
|
<literal>/XRefStm</literal>, meaning that the new
|
|
<literal>/XRefStm</literal> doesn't have to point to it.
|
|
</para>
|
|
<para>
|
|
Since xref streams must be read very early, they may not be
|
|
encrypted, and they may not contain indirect objects for keys
|
|
required to read them, which are these:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Type</literal>: value <literal>/XRef</literal>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Size</literal>: value <emphasis>n+1</emphasis>: where
|
|
<emphasis>n</emphasis> is highest object number (same as
|
|
<literal>/Size</literal> in the trailer dictionary)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Index</literal> (optional): value
|
|
<literal>[<replaceable>n count</replaceable> ...]</literal>
|
|
used to determine which objects' information is stored in this
|
|
stream. The default is <literal>[0 /Size]</literal>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/Prev</literal>: value
|
|
<replaceable>offset</replaceable>: byte offset of previous xref
|
|
stream (same as <literal>/Prev</literal> in the trailer
|
|
dictionary)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>/W [...]</literal>: sizes of each field in the xref
|
|
table
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
The other fields in the xref stream, which may be indirect if
|
|
desired, are the union of those from the xref table's trailer
|
|
dictionary.
|
|
</para>
|
|
<sect2 id="ref.xref-stream-data">
|
|
<title>Cross-Reference Stream Data</title>
|
|
<para>
|
|
The stream data is binary and encoded in big-endian byte order.
|
|
Entries are concatenated, and each entry has a length equal to
|
|
the total of the entries in <literal>/W</literal> above. Each
|
|
entry consists of one or more fields, the first of which is the
|
|
type of the field. The number of bytes for each field is given
|
|
by <literal>/W</literal> above. A 0 in <literal>/W</literal>
|
|
indicates that the field is omitted and has the default value.
|
|
The default value for the field type is
|
|
“<literal>1</literal>”. All other default values are
|
|
“<literal>0</literal>”.
|
|
</para>
|
|
<para>
|
|
PDF 1.5 has three field types:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
0: for free objects. Format: <literal>0 obj
|
|
next-generation</literal>, same as the free table in a
|
|
traditional cross-reference table
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
1: regular non-compressed object. Format: <literal>1 offset
|
|
generation</literal>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
2: for objects in object streams. Format: <literal>2
|
|
object-stream-number index</literal>, the number of the object
|
|
stream containing the object and the index within the object
|
|
stream of the object.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
It seems standard to have the first entry in the table be
|
|
<literal>0 0 0</literal> instead of <literal>0 0 ffff</literal>
|
|
if there are no deleted objects.
|
|
</para>
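<para>
A decoder for this entry format can be sketched in Python as shown
below. The function name and its return shape are assumptions made
for the example. The stream data is taken to be already decoded, and
<literal>/W</literal> and <literal>/Index</literal> are passed in as
plain lists of integers.
</para>
<programlisting>
```python
def decode_xref_stream(data, w, index):
    """Decode cross-reference stream entries.

    data  -- decoded stream bytes
    w     -- the three field widths from /W
    index -- flat list [first, count, ...] from /Index
    Returns {object number: (type, field2, field3)}.
    """
    entries = {}
    pos = 0
    for first, count in zip(index[0::2], index[1::2]):
        for obj in range(first, first + count):
            fields = []
            for i, width in enumerate(w):
                if width == 0:
                    # Omitted field: the type defaults to 1, others to 0.
                    fields.append(1 if i == 0 else 0)
                else:
                    chunk = data[pos:pos + width]
                    fields.append(int.from_bytes(chunk, "big"))
                pos += width
            entries[obj] = tuple(fields)
    if pos != len(data):
        raise ValueError("stream length does not match /W and /Index")
    return entries
```
</programlisting>
<para>
With <literal>/W [1 2 1]</literal> and <literal>/Index [0 3]</literal>,
each entry occupies four bytes, so twelve bytes of data describe
objects 0 through 2.
</para>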
|
|
</sect2>
|
|
</sect1>
|
|
<sect1 id="ref.object-streams-linearization">
|
|
<title>Implications for Linearized Files</title>
|
|
<para>
|
|
For linearized files, the linearization dictionary, document
|
|
catalog, and page objects may not be contained in object streams.
|
|
</para>
|
|
<para>
|
|
Objects stored within object streams are given the highest range
|
|
of object numbers within the main and first-page cross-reference
|
|
sections.
|
|
</para>
|
|
<para>
|
|
It is okay to use cross-reference streams in place of regular xref
|
|
tables. There are no special considerations.
|
|
</para>
|
|
<para>
|
|
Hint data refers to object streams themselves, not the objects in
|
|
the streams. Shared object references should also be made to the
|
|
object streams. There are no references in any hint tables to the
|
|
object numbers of compressed objects (objects within object
|
|
streams).
|
|
</para>
|
|
<para>
|
|
When numbering objects, all shared objects within both the first
|
|
and second halves of the linearized files must be numbered
|
|
consecutively after all normal uncompressed objects in that half.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="ref.object-stream-implementation">
|
|
<title>Implementation Notes</title>
|
|
<para>
|
|
There are three modes for writing object streams:
|
|
<option>disable</option>, <option>preserve</option>, and
|
|
<option>generate</option>. In disable mode, we do not generate
|
|
any object streams, and we also generate an xref table rather than
|
|
xref streams. This can be used to generate PDF files that are
|
|
viewable with older readers. In preserve mode, we write object
|
|
streams such that written object streams contain the same objects
|
|
and <literal>/Extends</literal> relationships as in the original
|
|
file. This is equivalent to disable mode if the file has no object streams.
|
|
In generate, we create object streams ourselves by grouping
|
|
objects that are allowed in object streams together in sets of no
|
|
more than 100 objects. We also ensure that the PDF version is at
|
|
least 1.5 in generate mode, but we preserve the version header in
|
|
the other modes. The default is <option>preserve</option>.
|
|
</para>
|
|
<para>
|
|
We do not support creation of hybrid files. When we write files,
|
|
even in preserve mode, we will lose any xref tables and merge any
|
|
appended sections.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
<appendix id="ref.release-notes">
|
|
<title>Release Notes</title>
|
|
<para>
|
|
For a detailed list of changes, please see the file
|
|
<filename>ChangeLog</filename> in the source distribution.
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>8.2.1: August 18, 2018</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Command-line Enhancements
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add
|
|
<option>--keep-files-open=<replaceable>[yn]</replaceable></option>
|
|
to override default determination of whether to keep files
|
|
open when merging. Please see the discussion of
|
|
<option>--keep-files-open</option> in <xref
|
|
linkend="ref.basic-options"/> for additional details.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>8.2.0: August 16, 2018</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Command-line Enhancements
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add <option>--no-warn</option> option to suppress issuing
|
|
warning messages. If there are any conditions that would
|
|
have caused warnings to be issued, the exit status is still
|
|
3.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug Fixes and Optimizations
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Performance fix: optimize page merging operation to avoid
|
|
unnecessary open/close calls on files being merged. This
|
|
solves a dramatic slow-down that was observed when merging
|
|
certain types of files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Optimize how memory was used for the TIFF predictor,
|
|
drastically improving performance and memory usage for files
|
|
containing high-resolution images compressed with Flate
|
|
using the TIFF predictor.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: end of line characters were not properly handled
|
|
inside strings in some cases.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: using <option>--progress</option> on very small
|
|
files could cause an infinite loop.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
API enhancements
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add new class <classname>QPDFSystemError</classname>, derived
|
|
from <classname>std::runtime_error</classname>, which is now
|
|
thrown by <function>QUtil::throw_system_error</function>.
|
|
This enables the triggering <classname>errno</classname>
|
|
value to be retrieved.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add <function>ClosedFileInputSource::stayOpen</function>
|
|
method, enabling a
|
|
<classname>ClosedFileInputSource</classname> to stay open
|
|
during manually indicated periods of high activity, thus
|
|
reducing the overhead of frequent open/close operations.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Build Changes
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
For the mingw builds, change the name of the DLL import
|
|
library from <filename>libqpdf.a</filename> to
|
|
<filename>libqpdf.dll.a</filename> to more accurately
|
|
reflect that it is an import library rather than a static
|
|
library. This potentially clears the way for supporting a
|
|
static library in the future, though presently, the qpdf
|
|
Windows build only builds the DLL and executables.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>8.1.0: June 23, 2018</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Usability Improvements
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
When splitting files, qpdf detects fonts and images that the
|
|
document metadata claims are referenced from a page but are
|
|
not actually referenced and omits them from the output file.
|
|
This change can cause a significant reduction in the size of
|
|
split PDF files for files created by some software packages.
|
|
Prior versions of qpdf would believe the document metadata
|
|
and sometimes include all the images from all the other
|
|
pages even though the pages were no longer present. In the
|
|
unlikely event that the old behavior should be desired, it
|
|
can be enabled by specifying
|
|
<option>--preserve-unreferenced-resources</option>. For
|
|
additional details, please see <xref
|
|
linkend="ref.advanced-transformation"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When merging multiple PDF files, qpdf no longer leaves all
|
|
the files open. This makes it possible to merge numbers of
|
|
files that may exceed the operating system's limit for the
|
|
maximum number of open files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <option>--rotate</option> option's syntax has been
|
|
extended to make the page range optional. If you specify
|
|
<option>--rotate=<replaceable>angle</replaceable></option>
|
|
without specifying a page range, the rotation will be
|
|
applied to all pages. This can be especially useful for
|
|
adjusting a PDF created from a multi-page document that
|
|
was scanned upside down.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When merging multiple files, the <option>--verbose</option>
|
|
option now prints information about each file as it operates
|
|
on that file.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When the <option>--progress</option> option is specified,
|
|
qpdf will print a running indicator of its best guess at how
|
|
far through the writing process it is. Note that, as with
|
|
all progress meters, it's an approximation. This option is
|
|
implemented in a way that makes it useful for software that
|
|
uses the qpdf library; see API Enhancements below.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug Fixes
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Properly decrypt files that use revision 3 of the standard
|
|
security handler but use 40 bit keys (even though revision 3
|
|
supports 128-bit keys).
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Limit depth of nested data structures to prevent crashes
|
|
from certain types of malformed (malicious) PDFs.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
In “newline before endstream” mode, insert the
|
|
required extra newline before the
|
|
<literal>endstream</literal> at the end of object streams.
|
|
This one case was previously omitted.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
API Enhancements
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
The first round of higher level “helper”
|
|
interfaces has been introduced. These are designed to
|
|
provide a more convenient way of interacting with certain
|
|
document features than using
|
|
<classname>QPDFObjectHandle</classname> directly. For
|
|
details on helpers, see <xref
|
|
linkend="ref.helper-classes"/>. Specific additional
|
|
interfaces are described below.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add two new document helper classes:
|
|
<classname>QPDFPageDocumentHelper</classname> for working
|
|
with pages, and
|
|
<classname>QPDFAcroFormDocumentHelper</classname> for
|
|
working with interactive forms. No old methods have been
|
|
removed, but <classname>QPDFPageDocumentHelper</classname>
|
|
is now the preferred way to perform operations on pages
|
|
rather than calling the old methods in
|
|
<classname>QPDFObjectHandle</classname> and
|
|
<classname>QPDF</classname> directly. Comments in the header
|
|
files direct you to the new interfaces. Please see the
|
|
header files and <filename>ChangeLog</filename> for
|
|
additional details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add three new object helper classes:
|
|
<classname>QPDFPageObjectHelper</classname> for pages,
|
|
<classname>QPDFFormFieldObjectHelper</classname> for
|
|
interactive form fields, and
|
|
<classname>QPDFAnnotationObjectHelper</classname> for
|
|
annotations. All three classes are fairly sparse at the
|
|
moment, but they have some useful, basic functionality.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A new example program
|
|
<filename>examples/pdf-set-form-values.cc</filename> has
|
|
been added that illustrates use of the new document and
|
|
object helpers.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The method
|
|
<function>QPDFWriter::registerProgressReporter</function>
|
|
has been added. This method allows you to register a
function that <classname>QPDFWriter</classname> calls to
report its estimate of the percentage of the output it has
written so far. Client programs can use this to
|
|
implement reasonably accurate progress meters. The
|
|
<command>qpdf</command> command line tool uses this to
|
|
implement its <option>--progress</option> option.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New methods
|
|
<function>QPDFObjectHandle::newUnicodeString</function> and
|
|
<function>QPDFObject::unparseBinary</function> have been
|
|
added to allow for more convenient creation of strings that
|
|
are explicitly encoded using big-endian UTF-16. This is
|
|
useful for creating strings that appear outside of content
|
|
streams, such as labels, form fields, outlines, document
|
|
metadata, etc.
|
|
</para>
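<para>
As a rough illustration of what “explicitly encoded using
big-endian UTF-16” means at the byte level, the following
sketch encodes a sequence of Unicode code points. The helper
<function>to_utf16be</function> is hypothetical and is not part
of the qpdf API. It only shows the byte layout: a leading
<literal>FE FF</literal> byte order mark followed by two bytes
per code unit, most significant byte first, with surrogate pairs
for code points above <literal>U+FFFF</literal>.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a qpdf function): encode Unicode code points
// as big-endian UTF-16 with a leading byte order mark (0xFE 0xFF).
std::string to_utf16be(const std::u32string& code_points)
{
    std::string out("\xfe\xff");
    // Append one 16-bit code unit, most significant byte first.
    auto put16 = [&out](unsigned int unit) {
        out += static_cast<char>((unit >> 8) & 0xff);
        out += static_cast<char>(unit & 0xff);
    };
    for (char32_t cp : code_points) {
        // Assumption: input holds valid scalar values only.
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x10000) {
            put16(cp);
        } else {
            // Code points above U+FFFF become a surrogate pair.
            char32_t v = cp - 0x10000;
            put16(0xD800 + (v >> 10));
            put16(0xDC00 + (v & 0x3FF));
        }
    }
    return out;
}
```
]]></programlisting>
<para>
A byte string built this way is the kind of value that a PDF
text string outside of a content stream may hold. In real code,
prefer the library methods named above over a hand-written
encoder.
</para>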
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A new class
|
|
<classname>QPDFObjectHandle::Rectangle</classname> has been
|
|
added to ease working with PDF rectangles, which are just
|
|
arrays of four numeric values.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>8.0.2: March 6, 2018</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
When a loop is detected while following cross reference
|
|
streams or tables, treat this as damage instead of silently
|
|
ignoring the previous table. This prevents loss of otherwise
|
|
recoverable data in some damaged files.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Properly handle pages with no contents.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>8.0.1: March 4, 2018</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Disregard data check errors when uncompressing
|
|
<literal>/FlateDecode</literal> streams. This is consistent with
|
|
most other PDF readers and allows qpdf to recover data from
|
|
another class of malformed PDF files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
On the command line when specifying page ranges, support
|
|
preceding a page number by “r” to indicate that it
|
|
should be counted from the end. For example, the range
|
|
<literal>r3-r1</literal> would indicate the last three pages
|
|
of a document.
|
|
</para>
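<para>
The counting rule can be sketched as follows. The function
<function>resolve_page</function> is a hypothetical helper
written for this illustration, not qpdf's own parser. It assumes
1-based page numbers and treats <literal>r1</literal> as the
last page.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of qpdf): convert one page token such
// as "7" or "r2" into a 1-based page number for a document that has
// page_count pages.
int resolve_page(const std::string& token, int page_count)
{
    assert(page_count > 0);
    bool from_end = !token.empty() && token[0] == 'r';
    std::string digits = from_end ? token.substr(1) : token;
    if (digits.empty() ||
        digits.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument("bad page token: " + token);
    }
    int n = std::stoi(digits);
    // "r1" is the last page, "r2" the one before it, and so on.
    int page = from_end ? page_count + 1 - n : n;
    if (page < 1 || page > page_count) {
        throw std::out_of_range("page out of range: " + token);
    }
    return page;
}
```
]]></programlisting>
<para>
Under these assumptions, in a ten-page document the range
<literal>r3-r1</literal> resolves to pages 8 through 10.
</para>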
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>8.0.0: February 25, 2018</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Packaging and Distribution Changes
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
QPDF is now distributed as an <ulink
|
|
url="https://appimage.org/">AppImage</ulink> in addition to
|
|
all the other ways it is distributed. The AppImage can be
|
|
found in the download area with the other packages. Thanks
|
|
to Kurt Pfeifle and Simon Peter for their contributions.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug Fixes
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<function>QPDFObjectHandle::getUTF8Val</function> now
|
|
properly treats non-Unicode strings as encoded with PDF Doc
|
|
Encoding.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Improvements to handling of objects in PDF files that are
|
|
not of the expected type. In most cases, qpdf will be able
|
|
to warn for such cases rather than fail with an exception.
|
|
Previous versions of qpdf would sometimes fail with errors
|
|
such as “operation for dictionary object attempted on
|
|
object of wrong type”. This situation should be mostly
|
|
or entirely eliminated now.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Enhancements to the <command>qpdf</command> Command-line Tool.
|
|
All new options listed here are documented in more detail in
|
|
<xref linkend="ref.using"/>.
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
The option
|
|
<option>--linearize-pass1=<replaceable>file</replaceable></option>
|
|
has been added for debugging qpdf's linearization code.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The option <option>--coalesce-contents</option> can be used
|
|
to combine content streams of a page whose contents are an
|
|
array of streams into a single stream.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
API Enhancements. All new API calls are documented in their
|
|
respective classes' header files. There are no incompatible
|
|
changes to the API.
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add function <function>qpdf_check_pdf</function> to the C API.
|
|
This function does basic checking that is a subset of what
|
|
<command>qpdf --check</command> performs.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Major enhancements to the lexical layer of qpdf. For a
|
|
complete list of enhancements, please refer to the
|
|
<filename>ChangeLog</filename> file. Most of the changes
|
|
result in improvements to qpdf's ability to handle erroneous
|
|
files. It is also possible for programs to handle
|
|
whitespace, comments, and inline images as tokens.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New API for working with PDF content streams at a lexical
|
|
level. The new class
|
|
<classname>QPDFObjectHandle::TokenFilter</classname> allows
|
|
the developer to provide token handlers. Token filters can be
|
|
used with several different methods in
|
|
<classname>QPDFObjectHandle</classname> as well as with a
|
|
lower-level interface. See comments in
|
|
<filename>QPDFObjectHandle.hh</filename> as well as the new
|
|
examples <filename>examples/pdf-filter-tokens.cc</filename>
|
|
and <filename>examples/pdf-count-strings.cc</filename> for
|
|
details.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>7.1.1: February 4, 2018</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: files whose /ID fields were other than 16 bytes long
|
|
can now be properly linearized.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A few compile and link issues have been corrected for some
|
|
platforms.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>7.1.0: January 14, 2018</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
PDF files contain streams that may be compressed with various
|
|
compression algorithms which, in some cases, may be enhanced
|
|
by various predictor functions. Previously only the PNG up
|
|
predictor was supported. In this version, all the PNG
|
|
predictors as well as the TIFF predictor are supported. This
|
|
increases the range of files that qpdf is able to handle.
|
|
</para>
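<para>
To give a sense of what a predictor does, the sketch below
reverses two of them for 8-bit samples: the PNG “Up”
filter and TIFF horizontal differencing. This is an illustration
only and is not qpdf's implementation. The function names are
hypothetical, one byte per pixel is assumed, and the per-row
filter-type byte that precedes each PNG-predicted row in a real
stream is assumed to have been removed already.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// PNG "Up" filter: each stored byte is the difference from the byte
// directly above it in the previous row (zero for the first row).
std::vector<std::uint8_t> undo_png_up(
    const std::vector<std::uint8_t>& filtered, std::size_t row_bytes)
{
    assert(row_bytes > 0 && filtered.size() % row_bytes == 0);
    std::vector<std::uint8_t> out(filtered.size());
    for (std::size_t i = 0; i < filtered.size(); ++i) {
        std::uint8_t above = (i >= row_bytes) ? out[i - row_bytes] : 0;
        out[i] = static_cast<std::uint8_t>(filtered[i] + above);
    }
    return out;
}

// TIFF horizontal differencing: each stored byte is the difference
// from the byte to its left in the same row (zero at the row start).
std::vector<std::uint8_t> undo_tiff_horizontal(
    const std::vector<std::uint8_t>& filtered, std::size_t row_bytes)
{
    assert(row_bytes > 0 && filtered.size() % row_bytes == 0);
    std::vector<std::uint8_t> out(filtered.size());
    for (std::size_t i = 0; i < filtered.size(); ++i) {
        std::uint8_t left = (i % row_bytes == 0) ? 0 : out[i - 1];
        out[i] = static_cast<std::uint8_t>(filtered[i] + left);
    }
    return out;
}
```
]]></programlisting>
<para>
Both functions rely on unsigned 8-bit wraparound, so a stored
difference that overflows still reconstructs the original byte.
</para>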
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
QPDF now allows a raw encryption key to be specified in place
|
|
of a password when opening encrypted files, and will
|
|
optionally display the encryption key used by a file. This is
|
|
a non-standard operation, but it can be useful in certain
|
|
situations. Please see the discussion of
|
|
<option>--password-is-hex-key</option> in <xref
|
|
linkend="ref.basic-options"/> or the comments around
|
|
<function>QPDF::setPasswordIsHexKey</function> in
|
|
<filename>QPDF.hh</filename> for additional details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: numbers ending with a trailing decimal point are now
|
|
properly recognized as numbers.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: when building qpdf from source on some platforms
|
|
(especially MacOS), the build could get confused by older
|
|
versions of qpdf installed on the system. This has been
|
|
corrected.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>7.0.0: September 15, 2017</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Packaging and Distribution Changes
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
QPDF's primary license is now <ulink
|
|
url="http://www.apache.org/licenses/LICENSE-2.0">version 2.0
|
|
of the Apache License</ulink> rather than version 2.0 of the
|
|
Artistic License. You may still, at your option, consider
|
|
qpdf to be licensed with version 2.0 of the Artistic
|
|
License.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
QPDF no longer has a dependency on the PCRE (Perl-Compatible
|
|
Regular Expression) library. QPDF now has an added
|
|
dependency on the JPEG library.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Bug Fixes
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
This release contains many bug fixes for various infinite
|
|
loops, memory leaks, and other memory errors that could be
|
|
encountered with specially crafted or otherwise erroneous
|
|
PDF files.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
New Features
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
QPDF now supports reading and writing streams encoded with
|
|
JPEG or RunLength encoding. Library API enhancements and
|
|
command-line options have been added to control this
|
|
behavior. See command-line options
|
|
<option>--compress-streams</option> and
|
|
<option>--decode-level</option> and methods
|
|
<function>QPDFWriter::setCompressStreams</function> and
|
|
<function>QPDFWriter::setDecodeLevel</function>.
|
|
</para>
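<para>
For readers unfamiliar with RunLength encoding, the following is
a minimal decoder sketch. It is not qpdf's implementation, and
<function>run_length_decode</function> is a hypothetical name.
It assumes well-formed input in which a length byte of 0 to 127
is followed by that many bytes plus one to copy literally, a
length byte of 129 to 255 is followed by a single byte to repeat
257 minus the length times, and 128 marks the end of data.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a qpdf function): decode RunLength data.
// Input is assumed to be well formed; the asserts guard that assumption.
std::string run_length_decode(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char len = static_cast<unsigned char>(in[i++]);
        if (len == 128) {
            break; // end-of-data marker
        }
        if (len < 128) {
            // Literal run: copy the next len + 1 bytes unchanged.
            std::size_t n = len + 1u;
            assert(i + n <= in.size());
            out.append(in, i, n);
            i += n;
        } else {
            // Repeated run: emit the next byte 257 - len times.
            assert(i < in.size());
            out.append(257u - len, in[i++]);
        }
    }
    return out;
}
```
]]></programlisting>
<para>
For example, the bytes <literal>02 61 62 63 FE 7A 80</literal>
decode to <literal>abczzz</literal> under these rules.
</para>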
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
QPDF is much better at recovering from broken files. In most
|
|
cases, qpdf will skip invalid objects and will preserve
|
|
broken stream data by not attempting to filter broken
|
|
streams. QPDF is now able to recover or at least not crash
|
|
on dozens of broken test files I have received over the past
|
|
few years.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Page rotation is now supported and accessible from both the
|
|
library and the command line.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<classname>QPDFWriter</classname> supports writing files in
|
|
a way that preserves PCLm compliance in support of
|
|
driverless printing. This is very specialized and is only
|
|
useful to applications that already know how to create PCLm
|
|
files.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Enhancements to the <command>qpdf</command> Command-line Tool.
|
|
All new options listed here are documented in more detail in
|
|
<xref linkend="ref.using"/>.
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Command-line arguments can now be read from files or
|
|
standard input using <literal>@file</literal> or
|
|
<literal>@-</literal> syntax. Please see <xref
|
|
linkend="ref.invocation"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>--rotate</option>: request page rotation
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>--newline-before-endstream</option>: ensure that a
|
|
newline appears before every <literal>endstream</literal>
|
|
keyword in the file; used to prevent qpdf from breaking
|
|
PDF/A compliance on already compliant files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>--preserve-unreferenced</option>: preserve
|
|
unreferenced objects in the input PDF
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>--split-pages</option>: break output into chunks
|
|
with fixed numbers of pages
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>--verbose</option>: print the name of each output
|
|
file that is created
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<option>--compress-streams</option> and
|
|
<option>--decode-level</option> replace
|
|
<option>--stream-data</option> for improving granularity of
|
|
controlling compression and decompression of stream data.
|
|
The <option>--stream-data</option> option will remain
|
|
available.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When running <command>qpdf --check</command> with other
|
|
options, checks are always run first. This enables qpdf to
|
|
perform its full recovery logic before outputting other
|
|
information. This can be especially useful when manually
|
|
recovering broken files, looking at qpdf's regenerated cross
|
|
reference table, or other similar operations.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Process <option>--pages</option> earlier so that other
|
|
options like <option>--show-pages</option> or
|
|
<option>--split-pages</option> can operate on the file after
|
|
page splitting/merging has occurred.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
API Changes. All new API calls are documented in their
|
|
respective classes' header files.
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<function>QPDFObjectHandle::rotatePage</function>: apply
|
|
rotation to a page object
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<function>QPDFWriter::setNewlineBeforeEndstream</function>:
|
|
force newline to appear before <literal>endstream</literal>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<function>QPDFWriter::setPreserveUnreferencedObjects</function>:
|
|
preserve unreferenced objects that appear in the input PDF.
|
|
The default behavior is to discard them.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New <classname>Pipeline</classname> types
|
|
<classname>Pl_RunLength</classname> and
|
|
<classname>Pl_DCT</classname> are available for developers
|
|
who wish to produce or consume RunLength or DCT stream data
|
|
directly. The <filename>examples/pdf-create.cc</filename>
|
|
example illustrates their use.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<function>QPDFWriter::setCompressStreams</function> and
|
|
<function>QPDFWriter::setDecodeLevel</function> methods
|
|
control handling of different types of stream compression.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new C API functions
|
|
<function>qpdf_set_compress_streams</function>,
|
|
<function>qpdf_set_decode_level</function>,
|
|
<function>qpdf_set_preserve_unreferenced_objects</function>,
|
|
and <function>qpdf_set_newline_before_endstream</function>
|
|
corresponding to the new <classname>QPDFWriter</classname>
|
|
methods.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>6.0.0: November 10, 2015</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Implement <option>--deterministic-id</option> command-line
|
|
option and <function>QPDFWriter::setDeterministicID</function>
|
|
as well as C API function
|
|
<function>qpdf_set_deterministic_ID</function> for generating
|
|
a deterministic ID for non-encrypted files. When this option
|
|
is selected, the ID of the file depends on the contents of the
|
|
output file, and not on transient items such as the timestamp
|
|
or output file name.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Make qpdf more tolerant of files whose xref table entries are
|
|
not the correct length.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>5.1.3: May 24, 2015</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: fix-qdf was not properly handling files that
|
|
contained object streams with more than 255 objects in them.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: qpdf was not properly initializing Microsoft's secure
|
|
crypto provider on fresh Windows installations that had not
|
|
had any keys created yet.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Fix a few errors found by Gynvael Coldwind and
|
|
Mateusz Jurczyk of the Google Security Team. Please see the
|
|
ChangeLog for details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Properly handle pages that have no contents at all. There were
|
|
many cases in which qpdf handled this fine, but a few methods
|
|
blindly obtained page contents without handling the possibility
|
|
that there were no contents.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Make qpdf more robust for a few more kinds of problems that
|
|
may occur in invalid PDF files.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>5.1.2: June 7, 2014</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: linearizing files could create a corrupted output
|
|
file under extremely unlikely file size circumstances. See
|
|
ChangeLog for details. The odds of getting hit by this are
|
|
very low, though one person did encounter it.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: qpdf would fail to write files that had streams with
|
|
decode parameters referencing other streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New example program: <command>pdf-split-pages</command>:
|
|
efficiently split PDF files into individual pages. The example
|
|
program does this more efficiently than using <command>qpdf
|
|
--pages</command> to do it.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Packaging fix: Visual C++ binaries did not support Windows XP.
|
|
This has been rectified by updating the compilers used to
|
|
generate the release binaries.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>5.1.1: January 14, 2014</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Performance fix: copying foreign objects could be very slow
|
|
with certain types of files. This was most likely to be
|
|
visible during page splitting and was due to traversing the
|
|
same objects multiple times in some cases.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>5.1.0: December 17, 2013</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Added runtime option
|
|
(<function>QUtil::setRandomDataProvider</function>) to supply
|
|
your own random data provider. You can use this if you want
|
|
to avoid using the OS-provided secure random number generation
|
|
facility or stdlib's less secure version. See comments in
|
|
include/qpdf/QUtil.hh for details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Fixed image comparison tests to not create 12-bit-per-pixel
|
|
images since some versions of tiffcmp have bugs in comparing
|
|
them in some cases. This increases the disk space required by
|
|
the image comparison tests, which are off by default anyway.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Introduce a number of small fixes for compilation on the
|
|
latest clang in MacOS and the latest Visual C++ in Windows.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Be able to handle broken files that end the xref table header
|
|
with a space instead of a newline.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>5.0.1: October 18, 2013</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Thanks to a detailed review by Florian Weimer and the Red Hat
|
|
Product Security Team, this release includes a number of
|
|
non-user-visible security hardening changes. Please see the
|
|
ChangeLog file in the source distribution for the complete
|
|
list.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When available, operating system-specific secure random number
|
|
generation is used for generating initialization vectors and
|
|
other random values used during encryption or file creation.
|
|
For the Windows build, this results in an added dependency on
|
|
Microsoft's cryptography API. To disable the OS-specific
|
|
cryptography and use the old version, pass the
|
|
<option>--enable-insecure-random</option> option to
|
|
<command>./configure</command>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <command>qpdf</command> command-line tool now issues a
|
|
warning when <option>--accessibility=n</option> is specified
|
|
for newer encryption versions stating that the option is
|
|
ignored. qpdf, per the spec, has always ignored this flag,
|
|
but it previously did so silently. This warning is issued
|
|
only by the command-line tool, not by the library. The
|
|
library's handling of this flag is unchanged.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>5.0.0: July 10, 2013</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: previous versions of qpdf would lose objects with
|
|
generation != 0 when generating object streams. Fixing this
|
|
required changes to the public API.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Removed methods from public API that were only supposed to be
|
|
called by QPDFWriter and couldn't realistically be called
|
|
anywhere else. See ChangeLog for details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New <type>QPDFObjGen</type> class added to represent an object
|
|
ID/generation pair.
|
|
<function>QPDFObjectHandle::getObjGen()</function> is now
|
|
preferred over
|
|
<function>QPDFObjectHandle::getObjectID()</function> and
|
|
<function>QPDFObjectHandle::getGeneration()</function> as it
|
|
makes it less likely for people to accidentally write code
|
|
that ignores the generation number. See
|
|
<filename>QPDF.hh</filename> and
|
|
<filename>QPDFObjectHandle.hh</filename> for additional notes.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add <option>--show-npages</option> command-line option to the
|
|
<command>qpdf</command> command to show the number of pages in
|
|
a file.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Allow omission of the page range within
|
|
<option>--pages</option> for the <command>qpdf</command>
|
|
command. When omitted, the page range is implicitly taken to
|
|
be all the pages in the file.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Various enhancements were made to support different types of
|
|
broken files or broken readers. Details can be found in
|
|
<filename>ChangeLog</filename>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>4.1.0: April 14, 2013</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Note to people including qpdf in distributions: the
|
|
<filename>.la</filename> files generated by libtool are now
|
|
installed by qpdf's <command>make install</command> target.
|
|
Before, they were not installed. This means that if your
|
|
distribution does not want to include <filename>.la</filename>
|
|
files, you must remove them as part of your packaging process.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Major enhancement: API enhancements have been made to support
|
|
parsing of content streams. This enhancement includes the
|
|
following changes:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<function>QPDFObjectHandle::parseContentStream</function>
|
|
method parses objects in a content stream and calls
|
|
handlers in a callback class. The example
|
|
<filename>examples/pdf-parse-content.cc</filename>
|
|
illustrates how this may be used.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<type>QPDFObjectHandle</type> can now represent operators
|
|
and inline images, object types that may only appear in
|
|
content streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Method <function>QPDFObjectHandle::getTypeCode()</function>
|
|
returns an enumerated type value representing the
|
|
underlying object type. Method
|
|
<function>QPDFObjectHandle::getTypeName()</function>
|
|
returns a text string describing the name of the type of a
|
|
<type>QPDFObjectHandle</type> object. These methods can be
|
|
used for more efficient parsing and debugging/diagnostic
|
|
messages.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<command>qpdf --check</command> now parses all pages' content
|
|
streams in addition to doing other checks. While there are
|
|
still many types of errors that cannot be detected, syntactic
|
|
errors in content streams will now be reported.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Minor compilation enhancements have been made to facilitate
|
|
easier support for a broader range of compilers and
|
|
compiler versions.
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Warning flags have been moved into a separate variable in
|
|
<filename>autoconf.mk</filename>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The configure flag <option>--enable-werror</option> works
for Microsoft compilers.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
All MSVC CRT security warnings have been resolved.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
All C-style casts in C++ code have been replaced by C++
|
|
casts, and many casts that had been included to suppress
|
|
higher warning levels for some compilers have been removed,
|
|
primarily for clarity. Places where integer type coercion
|
|
occurs have been scrutinized. A new casting policy has
|
|
been documented in the manual. This is of concern mainly
|
|
to people porting qpdf to new platforms or compilers. It
|
|
is not visible to programmers writing code that uses the
|
|
library.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Some internal limits have been removed in code that
|
|
converts numbers to strings. This is largely invisible to
|
|
users, but it does trigger a bug in some older versions of
|
|
mingw-w64's C++ library. See
|
|
<filename>README-windows.md</filename> in the source
|
|
distribution if you think this may affect you. The copy of
|
|
the DLL distributed with qpdf's binary distribution is not
|
|
affected by this problem.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The RPM spec file previously included with qpdf has been
|
|
removed. This is because virtually all Linux distributions
|
|
include qpdf now that it is a dependency of CUPS filters.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A few bug fixes are included:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Overridden compressed objects are properly handled.
|
|
Before, there were certain constructs that could cause qpdf
|
|
to see old versions of some objects. The most common
|
|
manifestation of this was loss of filled in form values for
|
|
certain files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Installation no longer uses GNU/Linux-specific versions of
|
|
some commands, so <command>make install</command> works on
|
|
Solaris with native tools.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The 64-bit mingw Windows binary package no longer includes
|
|
a 32-bit DLL.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>4.0.1: January 17, 2013</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix detection of binary attachments in test suite to avoid
|
|
false test failures on some platforms.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add clarifying comment in <filename>QPDF.hh</filename> to
|
|
methods that return the user password explaining that it is no
|
|
longer possible with newer encryption formats to recover the
|
|
user password knowing the owner password. In earlier
|
|
encryption formats, the user password was encrypted in the
|
|
file using the owner password. In newer encryption formats, a
|
|
separate encryption key is used on the file, and that key is
|
|
independently encrypted using both the user password and the
|
|
owner password.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>4.0.0: December 31, 2012</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Major enhancement: support has been added for newer encryption
|
|
schemes supported by version X of Adobe Acrobat. This
|
|
includes use of 127-character passwords, 256-bit encryption
|
|
keys, and the encryption scheme specified in ISO 32000-2, the
|
|
PDF 2.0 specification. This scheme can be chosen from the
|
|
command line by specifying use of 256-bit keys. qpdf also
|
|
supports the deprecated encryption method used by Acrobat IX.
|
|
This encryption style has known security weaknesses and should
|
|
not be used in practice. However, such files exist “in
|
|
the wild,” so support for this scheme is still useful.
|
|
New methods
|
|
<function>QPDFWriter::setR6EncryptionParameters</function>
|
|
(for the PDF 2.0 scheme) and
|
|
<function>QPDFWriter::setR5EncryptionParameters</function>
|
|
(for the deprecated scheme) have been added to enable these
|
|
new encryption schemes. Corresponding functions have been
|
|
added to the C API as well.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Full support for Adobe extension levels in PDF version
|
|
information. Starting with PDF version 1.7, corresponding to
|
|
ISO 32000, Adobe adds new functionality by increasing the
|
|
extension level rather than increasing the version. This
|
|
support includes addition of the
|
|
<function>QPDF::getExtensionLevel</function> method for
|
|
retrieving the document's extension level, addition of
|
|
versions of
|
|
<function>QPDFWriter::setMinimumPDFVersion</function> and
|
|
<function>QPDFWriter::forcePDFVersion</function> that accept
|
|
an extension level, and extended syntax for specifying forced
|
|
and minimum versions on the command line as described in <xref
|
|
linkend="ref.advanced-transformation"/>. Corresponding
|
|
functions have been added to the C API as well.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Minor fixes to prevent qpdf from referencing objects in the
|
|
file that are not referenced in the file's overall structure.
|
|
Most files don't have any such objects, but some files
|
|
contain unreferenced objects with errors, so these fixes
|
|
prevent qpdf from needlessly rejecting or complaining about
|
|
such objects.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new generalized methods for reading and writing files
|
|
from/to programmer-defined sources. The method
|
|
<function>QPDF::processInputSource</function> allows the
|
|
programmer to use any input source for the input file, and
|
|
<function>QPDFWriter::setOutputPipeline</function> allows the
|
|
programmer to write the output file through any pipeline.
|
|
These methods would make it possible to perform any number of
|
|
specialized operations, such as accessing external storage
|
|
systems, creating bindings for qpdf in other programming
|
|
languages that have their own I/O systems, etc.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new method <function>QPDF::getEncryptionKey</function> for
|
|
retrieving the underlying encryption key used in the file.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
This release includes a small handful of non-compatible API
|
|
changes. While effort is made to avoid such changes, all the
|
|
non-compatible API changes in this version were to parts of
|
|
the API that would likely never be used outside the library
|
|
itself. In all cases, the altered methods or structures either were
parts of the <classname>QPDF</classname> class that were public to
enable them to be called from
|
|
<classname>QPDFWriter</classname> or were part of validation
|
|
code that was over-zealous in reporting problems in parts of
|
|
the file that would not ordinarily be referenced. In no case
|
|
did any of the removed methods do anything worse than falsely
|
|
report error conditions in files that were broken in ways that
|
|
didn't matter. The following public parts of the
|
|
<classname>QPDF</classname> class were changed in a
|
|
non-compatible way:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Updated nested <classname>QPDF::EncryptionData</classname>
|
|
class to add fields needed by the newer encryption formats,
|
|
member variables changed to private so that future changes
|
|
will not require breaking backward compatibility.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Added additional parameters to
|
|
<function>compute_data_key</function>, which is used by
|
|
<classname>QPDFWriter</classname> to compute the encryption
|
|
key used to encrypt a specific object.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Removed the method
|
|
<function>flattenScalarReferences</function>. This method
|
|
was previously used prior to writing a new PDF file, but it
|
|
has the undesired side effect of causing qpdf to read
|
|
objects in the file that were not referenced. Some
|
|
otherwise valid files have unreferenced objects with errors in
|
|
them, so this could cause qpdf to reject files that would
|
|
be accepted by virtually all other PDF readers. In fact,
|
|
qpdf relied on only a very small part of what
|
|
<function>flattenScalarReferences</function> did, so only this part has been
|
|
preserved, and it is now done directly inside
|
|
<classname>QPDFWriter</classname>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Removed the method <function>decodeStreams</function>.
|
|
This method was used by the <option>--check</option> option
|
|
of the <command>qpdf</command> command-line tool to force
|
|
all streams in the file to be decoded, but it also suffered
|
|
from the problem of opening otherwise unreferenced streams
|
|
and thus could report false positives. The
|
|
<option>--check</option> option now causes qpdf to go
|
|
through all the motions of writing a new file based on the
|
|
original one, so it will always reference and check exactly
|
|
those parts of a file that any ordinary viewer would check.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Removed the method
|
|
<function>trimTrailerForWrite</function>. This method was
|
|
used by <classname>QPDFWriter</classname> to modify the
|
|
original QPDF object by removing fields from the trailer
|
|
dictionary that wouldn't apply to the newly written file.
|
|
This functionality, though generally harmless, was a poor
|
|
implementation and has been replaced by having QPDFWriter
|
|
filter these out when copying the trailer rather than
|
|
modifying the original QPDF object. (Note that qpdf never
|
|
modifies the original file itself.)
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Allow the PDF header to appear anywhere in the first 1024
|
|
bytes of the file. This is consistent with what other readers
|
|
do.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Fix the <command>pkg-config</command> files to list zlib and
|
|
pcre in <function>Requires.private</function> to better
|
|
support static linking using <command>pkg-config</command>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
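<para>
As a rough illustration of the header rule mentioned in the list
above, the following sketch looks for the <literal>%PDF-</literal>
marker within the first 1024 bytes of a buffer. The function name
<literal>find_pdf_header</literal> is hypothetical and is not part
of the qpdf API; qpdf's own implementation may well differ.
</para>
<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of qpdf: return the offset of the
// "%PDF-" marker if the entire marker lies within the first 1024
// bytes of data, or std::string::npos otherwise.
std::size_t find_pdf_header(std::string const& data)
{
    static std::string const marker = "%PDF-";
    std::size_t const window = 1024;
    // Only the first `window` bytes of the input are examined.
    std::string const head = data.substr(0, std::min(window, data.size()));
    return head.find(marker);
}
```
]]></programlisting>
<para>
A caller would treat a result of <literal>std::string::npos</literal>
as a missing header and any other result as the offset at which the
header begins.
</para>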
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>3.0.2: September 6, 2012</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: <function>QPDFWriter::setOutputMemory</function> did
|
|
not work when not used with
|
|
<function>QPDFWriter::setStaticID</function>, which made it
|
|
pretty much useless. This has been fixed.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New API call
|
|
<function>QPDFWriter::setExtraHeaderText</function> inserts
|
|
additional text near the header of the PDF file. The intended
|
|
use case is to insert comments that may be consumed by a
|
|
downstream application, though other use cases may exist.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>3.0.1: August 11, 2012</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Version 3.0.0 included addition of files for
|
|
<command>pkg-config</command>, but this was not mentioned in
|
|
the release notes. The release notes for 3.0.0 were updated
|
|
to mention this.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: if an object stream ended with a scalar object not
|
|
followed by space, qpdf would incorrectly report that it
|
|
encountered a premature EOF. This bug has been in qpdf since
|
|
version 2.0.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>3.0.0: August 2, 2012</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Acknowledgment: I would like to express gratitude for the
|
|
contributions of Tobias Hoffmann toward the release of qpdf
|
|
version 3.0. He is responsible for most of the implementation
|
|
and design of the new API for manipulating pages, and
|
|
contributed code and ideas for many of the improvements made
|
|
in version 3.0. Without his work, this release would
|
|
certainly not have happened as soon as it did, if at all.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>Non-compatible API change:</emphasis> The version of
|
|
<function>QPDFObjectHandle::replaceStreamData</function> that
|
|
uses a <classname>StreamDataProvider</classname> no longer
|
|
requires (or accepts) a <varname>length</varname> parameter.
|
|
See <xref linkend="ref.upgrading-to-3.0"/> for an explanation.
|
|
While care is taken to avoid non-compatible API changes in
|
|
general, an exception was made this time because the new
|
|
interface offers an opportunity to significantly simplify
|
|
calling code.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Support has been added for large files. The test suite
|
|
verifies support for files larger than 4 gigabytes, and manual
|
|
testing has verified support for files larger than 10
|
|
gigabytes. Large file support is available for both 32-bit
|
|
and 64-bit platforms as long as the compiler and underlying
|
|
platforms support it.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Support for page selection (splitting and merging PDF files)
|
|
has been added to the <command>qpdf</command> command-line
|
|
tool. See <xref linkend="ref.page-selection"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Options have been added to the <command>qpdf</command>
|
|
command-line tool for copying encryption parameters from
|
|
another file. See <xref linkend="ref.basic-options"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New methods have been added to the <classname>QPDF</classname>
|
|
object for adding and removing pages. See <xref
|
|
linkend="ref.adding-and-remove-pages"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
New methods have been added to the <classname>QPDF</classname>
|
|
object for copying objects from other PDF files. See <xref
|
|
linkend="ref.foreign-objects"/>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
A new method <function>QPDFObjectHandle::parse</function> has
|
|
been added for constructing
|
|
<classname>QPDFObjectHandle</classname> objects from a string
|
|
description.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Methods have been added to <classname>QPDFWriter</classname>
|
|
to allow writing to an already open stdio <type>FILE*</type>
|
|
in addition to writing to standard output or a named file.
|
|
Methods have been added to <classname>QPDF</classname> to be
|
|
able to process a file from an already open stdio
|
|
<type>FILE*</type>. This makes it possible to read and write
|
|
PDF from secure temporary files that have been unlinked prior
|
|
to being fully read or written.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The method <function>QPDF::emptyPDF</function> can be used to allow
|
|
creation of PDF files from scratch. The example
|
|
<filename>examples/pdf-create.cc</filename> illustrates how it
|
|
can be used.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Several methods that take
|
|
<classname>PointerHolder&lt;Buffer&gt;</classname> can now
|
|
also accept <type>std::string</type> arguments.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Many new convenience methods have been added to the library,
|
|
most in <classname>QPDFObjectHandle</classname>. See
|
|
<filename>ChangeLog</filename> for a full list.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When building on a platform that supports ELF shared libraries
|
|
(such as Linux), symbol versions are enabled by default. They
|
|
can be disabled by passing
|
|
<option>--disable-ld-version-script</option> to
|
|
<command>./configure</command>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The file <filename>libqpdf.pc</filename> is now installed to
|
|
support <command>pkg-config</command>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Image comparison tests are off by default now since they are
|
|
not needed to verify a correct build or port of qpdf. They
|
|
are needed only when changing the actual PDF output generated
|
|
by qpdf. You should enable them if you are making deep
|
|
changes to qpdf itself. See <filename>README.md</filename> for
|
|
details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Large file tests are off by default but can be turned on with
|
|
<command>./configure</command> or by setting an environment
|
|
variable before running the test suite. See
|
|
<filename>README.md</filename> for details.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
When qpdf's test suite fails, failures are not printed to the
|
|
terminal anymore by default. Instead, find them in
|
|
<filename>build/qtest.log</filename>. For packagers who are
|
|
building with an autobuilder, you can add the
|
|
<option>--enable-show-failed-test-output</option> option to
|
|
<command>./configure</command> to restore the old behavior.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.3.1: December 28, 2011</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix thread-safety problem resulting from non-thread-safe use
|
|
of the PCRE library.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Made a few minor documentation fixes.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add workaround for a bug that appears in some versions of
|
|
ghostscript to the test suite.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Fix minor build issue for Visual C++ 2010.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.3.0: August 11, 2011</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Bug fix: when preserving existing encryption on encrypted
|
|
files with cleartext metadata, older qpdf versions would
|
|
generate password-protected files with no valid password.
|
|
This operation now works. This bug only affected files
|
|
created by copying existing encryption parameters; explicit
|
|
encryption with specification of cleartext metadata worked
|
|
before and continues to work.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Enhance <classname>QPDFWriter</classname> with a new
|
|
constructor that allows you to delay the specification of the
|
|
output file. When using this constructor, you may now call
|
|
<function>QPDFWriter::setOutputFilename</function> to specify
|
|
the output file, or you may use
|
|
<function>QPDFWriter::setOutputMemory</function> to cause
|
|
<classname>QPDFWriter</classname> to write the resulting PDF
|
|
file to a memory buffer. You may then use
|
|
<function>QPDFWriter::getBuffer</function> to retrieve the
|
|
memory buffer.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new API call <function>QPDF::replaceObject</function> for
|
|
replacing objects by object ID.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new API call <function>QPDF::swapObjects</function> for
|
|
swapping two objects by object ID.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add <function>QPDFObjectHandle::getDictAsMap</function> and
|
|
<function>QPDFObjectHandle::getArrayAsVector</function> to
|
|
allow retrieval of dictionary objects as maps and array
|
|
objects as vectors.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add functions <function>qpdf_get_info_key</function> and
|
|
<function>qpdf_set_info_key</function> to the C API for
|
|
manipulating string fields of the document's
|
|
<literal>/Info</literal> dictionary.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add functions <function>qpdf_init_write_memory</function>,
|
|
<function>qpdf_get_buffer_length</function>, and
|
|
<function>qpdf_get_buffer</function> to the C API for writing
|
|
PDF files to a memory buffer instead of a file.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.2.4: June 25, 2011</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix installation and compilation issues; no functionality
|
|
changes.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.2.3: April 30, 2011</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Handle some damaged streams with incorrect characters
|
|
following the stream keyword.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Improve handling of inline images when normalizing content
|
|
streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Enhance error recovery to properly handle files that use
|
|
object 0 as a regular object, which is specifically disallowed
|
|
by the spec.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.2.2: October 4, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add new function <function>qpdf_read_memory</function>
|
|
to the C API to call
|
|
<function>QPDF::processMemoryFile</function>. This was an
|
|
omission in qpdf 2.2.1.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.2.1: October 1, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add new method <function>QPDF::setOutputStreams</function>
|
|
to replace <varname>std::cout</varname> and
|
|
<varname>std::cerr</varname> with other streams for generation
|
|
of diagnostic messages and error messages. This can be useful
|
|
for GUIs or other applications that want to capture any output
|
|
generated by the library to present to the user in some other
|
|
way. Note that QPDF does not write to
|
|
<varname>std::cout</varname> (or the specified output stream)
|
|
except where explicitly mentioned in
|
|
<filename>QPDF.hh</filename>, and that the only use of the
|
|
error stream is for warnings. Note also that output of
|
|
warnings is suppressed when
|
|
<literal>setSuppressWarnings(true)</literal> is called.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new method <function>QPDF::processMemoryFile</function>
|
|
for operating on PDF files that are loaded into memory rather
|
|
than in a file on disk.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Give a warning but otherwise ignore empty PDF objects by
|
|
treating them as null. Empty objects are not permitted by the
|
|
PDF specification but have been known to appear in some actual
|
|
PDF files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Handle inline image filter abbreviations when they appear as
|
|
stream filter abbreviations. The PDF specification does not
|
|
allow use of stream filter abbreviations in this way, but
|
|
Adobe Reader and some other PDF readers accept them since they
|
|
sometimes appear incorrectly in actual PDF files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Implement miscellaneous enhancements to
|
|
<classname>PointerHolder</classname> and
|
|
<classname>Buffer</classname> to support other changes.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
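<para>
The handling of inline image filter abbreviations noted in the list
above can be pictured as a simple name lookup. The sketch below
uses the abbreviations that the PDF specification defines for
inline images; the function name
<literal>expand_filter_abbreviation</literal> is hypothetical and
does not correspond to any qpdf interface.
</para>
<programlisting><![CDATA[
```cpp
#include <map>
#include <string>

// Hypothetical helper, not part of qpdf: expand an inline image
// filter abbreviation to the corresponding full filter name. Names
// that are not abbreviations are returned unchanged.
std::string expand_filter_abbreviation(std::string const& name)
{
    static std::map<std::string, std::string> const abbreviations = {
        {"/AHx", "/ASCIIHexDecode"},
        {"/A85", "/ASCII85Decode"},
        {"/LZW", "/LZWDecode"},
        {"/Fl", "/FlateDecode"},
        {"/RL", "/RunLengthDecode"},
        {"/CCF", "/CCITTFaxDecode"},
        {"/DCT", "/DCTDecode"},
    };
    auto const iter = abbreviations.find(name);
    return (iter == abbreviations.end()) ? name : iter->second;
}
```
]]></programlisting>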
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.2.0: August 14, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Add new methods to <classname>QPDFObjectHandle</classname>
|
|
(<function>newStream</function> and
|
|
<function>replaceStreamData</function>) for creating new
|
|
streams and replacing stream data. This makes it possible to
|
|
perform a wide range of operations that were not previously
|
|
possible.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new helper method in
|
|
<classname>QPDFObjectHandle</classname>
|
|
(<function>addPageContents</function>) for appending or
|
|
prepending new content streams to a page. This method makes
|
|
it possible to manipulate content streams without having to be
|
|
concerned whether a page's contents are a single stream or an
|
|
array of streams.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new method in <classname>QPDFObjectHandle</classname>:
|
|
<function>replaceOrRemoveKey</function>, which replaces a
|
|
dictionary key
|
|
with a given value unless the value is null, in which case it
|
|
removes the key instead.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add new method in <classname>QPDFObjectHandle</classname>:
|
|
<function>getRawStreamData</function>, which returns the raw
|
|
(unfiltered) stream data in a buffer. This complements the
|
|
<function>getStreamData</function> method, which returns the
|
|
filtered (uncompressed) stream data and can only be used when
|
|
the stream's data is filterable.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Provide two new examples:
|
|
<command>pdf-double-page-size</command> and
|
|
<command>pdf-invert-images</command> that illustrate the newly
|
|
added interfaces.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Fix a memory leak that would cause loss of a few bytes for
|
|
every object involved in a cycle of object references. Thanks
|
|
to Jian Ma for calling my attention to the leak.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
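<para>
The replace-or-remove behavior described in the list above can be
sketched with ordinary standard library containers. The
<literal>Value</literal> and <literal>Dictionary</literal> types
below are stand-ins invented for this illustration and are not qpdf
classes; only the decision logic is meant to mirror the description.
</para>
<programlisting><![CDATA[
```cpp
#include <map>
#include <string>

// Stand-in types for illustration only; these are not qpdf classes.
struct Value
{
    bool is_null;
    std::string text;
};
typedef std::map<std::string, Value> Dictionary;

// Replace the key with the given value unless the value is null, in
// which case remove the key instead.
void replace_or_remove_key(
    Dictionary& dict, std::string const& key, Value const& value)
{
    if (value.is_null) {
        dict.erase(key);
    } else {
        dict[key] = value;
    }
}
```
]]></programlisting>
<para>
Folding both cases into one call spares the caller from testing for
null before every dictionary update.
</para>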
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.5: April 25, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Remove restriction of file identifier strings to 16 bytes.
|
|
This unnecessary restriction was preventing qpdf from being
|
|
able to encrypt or decrypt files with identifier strings that
|
|
were not exactly 16 bytes long. The specification imposes no
|
|
such restriction.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.4: April 18, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Apply the same padding calculation fix from version 2.1.2 to
|
|
the main cross reference stream as well.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Since <command>qpdf --check</command> only performs limited
|
|
checks, clarify the output to make it clear that there still
|
|
may be errors that qpdf can't check. This should make it less
|
|
surprising to people when another PDF reader is unable to read
|
|
a file that qpdf thinks is okay.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.3: March 27, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix bug that could cause a failure when rewriting PDF files
|
|
that contain object streams with unreferenced objects that in
|
|
turn reference indirect scalars.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Don't complain about (invalid) AES streams that aren't a
|
|
multiple of 16 bytes. Instead, pad them before decrypting.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.2: January 24, 2010</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Fix bug in padding around first half cross reference stream in
|
|
linearized files. The bug could cause an assertion failure
|
|
when linearizing certain unlucky files.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1.1: December 14, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
No changes in functionality; insert missing include in an
|
|
internal library header file to support gcc 4.4, and update
|
|
test suite to ignore broken Adobe Reader installations.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.1: October 30, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
This is the first version of qpdf to include Windows support.
|
|
On Windows, it is possible to build a DLL. Additionally, a
|
|
partial C-language API has been introduced, which makes it
|
|
possible to call qpdf functions from non-C++ environments. I
|
|
am very grateful to Žarko <!-- Gajić --> Gajic (<ulink
|
|
url="http://zarko-gajic.iz.hr/">http://zarko-gajic.iz.hr/</ulink>)
|
|
for tirelessly testing numerous pre-release versions of this
|
|
DLL and providing many excellent suggestions on improving the
|
|
interface.
|
|
</para>
|
|
<para>
|
|
For programming to the C interface, please see the header file
|
|
<filename>qpdf/qpdf-c.h</filename> and the example
|
|
<filename>examples/pdf-linearize.c</filename>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Žarko Gajic has written a Delphi wrapper for qpdf, which can
|
|
be downloaded from qpdf's download site. Žarko's Delphi
|
|
wrapper is released with the same licensing terms as qpdf
|
|
itself and comes with this disclaimer: “Delphi wrapper
|
|
unit <filename>qpdf.pas</filename> created by Žarko Gajic
|
|
(<ulink
|
|
url="http://zarko-gajic.iz.hr/">http://zarko-gajic.iz.hr/</ulink>).
|
|
Use at your own risk and for whatever purpose you want. No
|
|
support is provided. Sample code is provided.”
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Support has been added for AES encryption and crypt filters.
|
|
Although qpdf does not presently support files that use
|
|
PKI-based encryption, with the addition of AES and crypt
|
|
filters, qpdf is now able to open most encrypted files
|
|
created with newer versions of Acrobat or other PDF creation
|
|
software. Note that I have not been able to get very many
|
|
files encrypted in this way, so it's possible there could
|
|
still be some cases that qpdf can't handle. Please report
|
|
them if you find them.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Many error messages have been improved to include more
|
|
information in hopes of making qpdf a more useful tool for PDF
|
|
experts to use in manually recovering damaged PDF files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Attempt to avoid compressing metadata streams if possible.
|
|
This is consistent with other PDF creation applications.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Provide new command-line options for AES encryption, cleartext
|
|
metadata, and setting the minimum and forced PDF versions of
|
|
output files.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Add additional methods to the <classname>QPDF</classname>
|
|
object for querying the document's permissions. Although qpdf
|
|
does not enforce these permissions, it does make them
|
|
available so that applications that use qpdf can enforce
|
|
permissions.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The <option>--check</option> option to <command>qpdf</command>
|
|
has been extended to include some additional information.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
There have been a handful of non-compatible API changes. For
|
|
details, see <xref linkend="ref.upgrading-to-2.1"/>.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.6: May 3, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Do not attempt to uncompress streams that have decode
|
|
parameters we don't recognize. Earlier versions of qpdf would
|
|
have rejected files with such streams.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.5: March 10, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Improve error handling in the LZW decoder, and fix a small
|
|
error introduced in the previous version with regard to
|
|
handling full tables. The LZW decoder has been more strongly
|
|
verified in this release.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.4: February 21, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Include proper support for LZW streams encoded without the
|
|
“early code change” flag. Special thanks to Atom
|
|
Smasher who reported the problem and provided an input file
|
|
compressed in this way, which I did not previously have.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Implement some improvements to file recovery logic.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.3: February 15, 2009</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Compile cleanly with gcc 4.4.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Handle strings encoded as UTF-16BE properly.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.2: June 30, 2008</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Update test suite to work properly with a
|
|
non-<command>bash</command> <filename>/bin/sh</filename> and
|
|
with Perl 5.10. No changes were made to the actual qpdf
|
|
source code itself for this release.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0.1: May 6, 2008</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
No changes in functionality or interface. This release
|
|
includes fixes to the source code so that qpdf compiles
|
|
properly and passes its test suite on a broader range of
|
|
platforms. See <filename>ChangeLog</filename> in the source
|
|
distribution for details.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>2.0: April 29, 2008</term>
|
|
<listitem>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
First public release.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</appendix>
|
|
<appendix id="ref.upgrading-to-2.1">
|
|
<title>Upgrading from 2.0 to 2.1</title>
|
|
<para>
|
|
Although, as a general rule, we like to avoid introducing
|
|
source-level incompatibilities in qpdf's interface, there were a
|
|
few non-compatible changes made in this version. A considerable
|
|
amount of source code that uses qpdf will probably compile without
|
|
any changes, but in some cases, you may have to update your code.
|
|
The changes are enumerated here. There are also some new
|
|
interfaces; for those, please refer to the header files.
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
QPDF's exception handling mechanism now uses
|
|
<classname>std::logic_error</classname> for internal errors and
|
|
<classname>std::runtime_error</classname> for runtime errors in
|
|
favor of the now removed <classname>QEXC</classname> classes used
|
|
in previous versions. The <classname>QEXC</classname> exception
|
|
classes predated the addition of the
|
|
<filename>&lt;stdexcept&gt;</filename> header file to the C++
|
|
standard library. Most of the exceptions thrown by the qpdf
|
|
library itself are still of type <classname>QPDFExc</classname>,
|
|
which is now derived from
|
|
<classname>std::runtime_error</classname>. Programs that caught
|
|
an instance of <classname>std::exception</classname> and
|
|
displayed it by calling the <function>what()</function> method
|
|
will not need to be changed.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
<para>
The <classname>QPDFExc</classname> class now stores the
individual fields of an error condition and provides interfaces
for querying them. Among the fields is a numeric error code that
lets applications handle a small number of error conditions
differently. See <filename>QPDFExc.hh</filename> for details.
</para>
</listitem>
<listitem>
<para>
Warnings can be retrieved from qpdf as instances of
<classname>QPDFExc</classname> instead of strings.
</para>
</listitem>
<listitem>
<para>
The nested <classname>QPDF::EncryptionData</classname> class's
constructor takes an additional argument. This class is
primarily intended to be used by
<classname>QPDFWriter</classname>. There is little an end-user
application could usefully do with it, and it probably should not
have been part of the public interface to begin with. Likewise,
some of the methods for computing internal encryption dictionary
parameters have changed to support <literal>/R=4</literal>
encryption.
</para>
</listitem>
<listitem>
<para>
The method <function>QPDF::getUserPassword</function> has been
removed because it did not do what its name suggested. There are
now two new methods:
<function>QPDF::getPaddedUserPassword</function> and
<function>QPDF::getTrimmedUserPassword</function>. The first one
does what the old <function>QPDF::getUserPassword</function>
method used to do, which is to return the password with possible
binary padding as specified by the PDF specification. The second
one returns a human-readable password string.
</para>
</listitem>
<listitem>
<para>
The enumerated types that used to be nested in
<classname>QPDFWriter</classname> are now top-level enumerated
types defined in the file <filename>qpdf/Constants.h</filename>.
This enables them to be shared by both the C and C++ interfaces.
</para>
</listitem>
</itemizedlist>
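<para>
As a rough illustration of the exception changes listed above, the
following sketch orders handlers from most to least specific. It
does not use the qpdf headers: <classname>DemoExc</classname>, its
<function>getCode</function> method, and the
<function>classify</function> helper are hypothetical names chosen
for this example. Only the derivation from
<classname>std::runtime_error</classname> mirrors what is described
for <classname>QPDFExc</classname>; the accessors that
<classname>QPDFExc</classname> actually provides are documented in
<filename>QPDFExc.hh</filename>.
</para>
<programlisting><![CDATA[
```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for QPDFExc. It is NOT the real class; it is
// only an exception derived from std::runtime_error that carries a
// numeric error code.
class DemoExc: public std::runtime_error
{
  public:
    DemoExc(int code, std::string const& message) :
        std::runtime_error(message),
        code_(code)
    {
    }

    int
    getCode() const
    {
        return code_;
    }

  private:
    int code_;
};

// Run an operation and report how its failure, if any, was classified.
std::string
classify(std::function<void()> const& op)
{
    try {
        op();
        return "ok";
    } catch (DemoExc const& e) {
        // Most specific handler first: an error reported by the library.
        return "library error " + std::to_string(e.getCode()) + ": " +
            e.what();
    } catch (std::logic_error const& e) {
        // Internal error: a bug rather than bad input.
        return std::string("internal error: ") + e.what();
    } catch (std::exception const& e) {
        // Generic handler: code that only calls what() keeps working.
        return std::string("error: ") + e.what();
    }
}
```
]]></programlisting>
<para>
With these definitions, an operation that throws
<classname>DemoExc</classname> is reported by the first handler, a
<classname>std::logic_error</classname> by the second, and any other
standard exception by the last. The generic handler is the pattern
that needs no changes when upgrading.
</para>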
</appendix>
<appendix id="ref.upgrading-to-3.0">
<title>Upgrading to 3.0</title>
<para>
For the most part, the API for qpdf version 3.0 is backward
compatible with versions 2.1 and later. There are two exceptions:
<itemizedlist>
<listitem>
<para>
The <function>QPDFObjectHandle::replaceStreamData</function>
method that uses a <classname>StreamDataProvider</classname> to
provide the stream data no longer takes a
<varname>length</varname> parameter. Although it would have been
easy to keep the parameter for backward compatibility, it was
removed because this gives users an opportunity to simplify the
calling code. This method was introduced in version 2.2. At the
time, the <varname>length</varname> parameter was required in
order to ensure that calls to the stream data provider returned
the same length for a specific stream every time they were
invoked. In particular, the linearization code depends on this.
qpdf 3.0 and newer instead check for that constraint explicitly:
the first time the stream data provider is called for a specific
stream, the actual length is saved, and subsequent calls are
required to return the same number of bytes. This means the
calling code no longer has to compute the length in advance,
which can be a significant simplification. If your code fails
to compile because of the extra argument and you don't want to
make other changes to your code, just omit the argument.
</para>
</listitem>
<listitem>
<para>
Many methods now take <type>long long</type> instead of other
integer types. Most if not all existing code should compile
without changes, since these parameters were previously always
smaller types. This change was required to support files larger
than two gigabytes in size.
</para>
</listitem>
</itemizedlist>
</para>
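<para>
The length check described above can be sketched roughly as follows.
This is an illustration only and is not qpdf's implementation: the
<classname>LengthCheckedProvider</classname> class and its
<function>provide</function> method are hypothetical names, and a
plain callback stands in for a
<classname>StreamDataProvider</classname>. The sketch records the
length seen on the first call and rejects any later call that yields
a different number of bytes.
</para>
<programlisting><![CDATA[
```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical illustration; not qpdf's implementation. Wraps a
// callback that produces stream data and enforces that every call
// yields the same number of bytes as the first call did.
class LengthCheckedProvider
{
  public:
    explicit LengthCheckedProvider(std::function<std::string()> provider) :
        provider_(std::move(provider))
    {
    }

    std::string
    provide()
    {
        std::string data = provider_();
        if (!have_length_) {
            // First call: remember the actual length instead of
            // requiring the caller to compute it in advance.
            length_ = data.size();
            have_length_ = true;
        } else if (data.size() != length_) {
            // Later calls must match the saved length.
            throw std::runtime_error(
                "stream data provider returned " +
                std::to_string(data.size()) + " bytes; expected " +
                std::to_string(length_));
        }
        return data;
    }

  private:
    std::function<std::string()> provider_;
    std::size_t length_ = 0;
    bool have_length_ = false;
};
```
]]></programlisting>
<para>
In this sketch the caller never passes a length. A callback that
returns the same data on every call passes the check each time,
while one whose output changes size between calls causes
<function>provide</function> to throw
<classname>std::runtime_error</classname> on the mismatching call.
</para>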
</appendix>
<appendix id="ref.upgrading-to-4.0">
<title>Upgrading to 4.0</title>
<para>
While version 4.0 includes a few incompatible API changes, it is
very unlikely that anyone's code used any of the affected parts of
the API, since they generally required information that would only
be available inside the library. In the unlikely event that you
run into trouble, please see the ChangeLog. See also
<xref linkend="ref.release-notes"/> for a complete list of the
incompatible API changes made in this version.
</para>
</appendix>
</book>