mirror of
https://github.com/qpdf/qpdf.git
synced 2025-01-05 08:02:11 +00:00
6124 lines
244 KiB
XML
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book [
<!ENTITY ldquo "“">
<!ENTITY rdquo "”">
<!ENTITY mdash "—">
<!ENTITY ndash "–">
<!ENTITY nbsp " ">
<!ENTITY swversion "8.2.1">
<!ENTITY lastreleased "August 18, 2018">
]>
<book>
|
||
<bookinfo>
|
||
<title>QPDF Manual</title>
|
||
<subtitle>For QPDF Version &swversion;, &lastreleased;</subtitle>
|
||
<author>
|
||
<firstname>Jay</firstname><surname>Berkenbilt</surname>
|
||
</author>
|
||
<copyright>
|
||
<year>2005–2018</year>
|
||
<holder>Jay Berkenbilt</holder>
|
||
</copyright>
|
||
</bookinfo>
|
||
<preface id="acknowledgments">
|
||
<title>General Information</title>
|
||
<para>
|
||
QPDF is a program that does structural, content-preserving
|
||
transformations on PDF files. QPDF's website is located at <ulink
|
||
url="http://qpdf.sourceforge.net/">http://qpdf.sourceforge.net/</ulink>.
|
||
QPDF's source code is hosted on GitHub at <ulink
|
||
url="https://github.com/qpdf/qpdf">https://github.com/qpdf/qpdf</ulink>.
|
||
</para>
|
||
<para>
|
||
QPDF is licensed under <ulink
|
||
url="http://www.apache.org/licenses/LICENSE-2.0">the Apache
|
||
License, Version 2.0</ulink> (the "License"). Unless required by
|
||
applicable law or agreed to in writing, software distributed under
|
||
the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
|
||
OR CONDITIONS OF ANY KIND, either express or implied. See the
|
||
License for the specific language governing permissions and
|
||
limitations under the License.
|
||
</para>
|
||
<para>
|
||
Versions of qpdf prior to version 7 were released under the terms
|
||
of <ulink url="https://opensource.org/licenses/Artistic-2.0">the
|
||
Artistic License, version 2.0</ulink>. At your option, you may
|
||
continue to consider qpdf to be licensed under those terms. The
|
||
Apache License 2.0 permits everything that the Artistic License 2.0
|
||
permits but is slightly less restrictive. Allowing the Artistic
|
||
License to continue being used is primarily to help people who may
|
||
have to get specific approval to use qpdf in their products.
|
||
</para>
|
||
<para>
|
||
QPDF is intentionally released with a permissive license. However,
|
||
if there is some reason that the licensing terms don't work for
|
||
your requirements, please feel free to contact the copyright holder
|
||
to make other arrangements.
|
||
</para>
|
||
<para>
|
||
QPDF was originally created in 2001 and modified periodically
|
||
between 2001 and 2005 during my employment at <ulink
|
||
url="http://www.apexcovantage.com">Apex CoVantage</ulink>. Upon my
|
||
departure from Apex, the company graciously allowed me to take
|
||
ownership of the software and continue maintaining it as an open
|
||
source project, a decision for which I am very grateful. I have
|
||
made considerable enhancements to it since that time. I feel
|
||
fortunate to have worked for people who would make such a decision.
|
||
This work would not have been possible without their support.
|
||
</para>
|
||
</preface>
|
||
<chapter id="ref.overview">
|
||
<title>What is QPDF?</title>
|
||
<para>
|
||
QPDF is a program that does structural, content-preserving
|
||
transformations on PDF files. It could have been called something
|
||
like <emphasis>pdf-to-pdf</emphasis>. It also provides many useful
|
||
capabilities to developers of PDF-producing software or to people
who just want to look at the innards of a PDF file to learn more
about how PDF files work.
|
||
</para>
|
||
<para>
|
||
With QPDF, it is possible to copy objects from one PDF file into
|
||
another and to manipulate the list of pages in a PDF file. This
|
||
makes it possible to merge and split PDF files. The QPDF library
|
||
also makes it possible for you to create PDF files from scratch.
|
||
In this mode, you are responsible for supplying all the contents of
|
||
the file, while the QPDF library takes care of all the syntactical
representation of the objects, creation of cross-reference tables
|
||
and, if you use them, object streams, encryption, linearization,
|
||
and other syntactic details. You are still responsible for
|
||
generating PDF content on your own.
|
||
</para>
|
||
<para>
|
||
QPDF has been designed with very few external dependencies, and it
|
||
is intentionally very lightweight. QPDF is
|
||
<emphasis>not</emphasis> a PDF content creation library, a PDF
|
||
viewer, or a program capable of converting PDF into other formats.
|
||
In particular, QPDF knows nothing about the semantics of PDF
|
||
content streams. If you are looking for something that can do
|
||
that, you should look elsewhere. However, once you have a valid
|
||
PDF file, QPDF can be used to transform that file in ways that perhaps
your original PDF creation tool can't handle. For example, many
|
||
programs generate simple PDF files but can't password-protect them,
|
||
web-optimize them, or perform other transformations of that type.
|
||
</para>
|
||
</chapter>
|
||
<chapter id="ref.installing">
|
||
<title>Building and Installing QPDF</title>
|
||
<para>
|
||
This chapter describes how to build and install qpdf. Please see
|
||
also the <filename>README.md</filename> and
|
||
<filename>INSTALL</filename> files in the source distribution.
|
||
</para>
|
||
<sect1 id="ref.prerequisites">
|
||
<title>System Requirements</title>
|
||
<para>
|
||
The qpdf package has few external dependencies. In order to build
|
||
qpdf, the following packages are required:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
zlib: <ulink url="http://www.zlib.net/">http://www.zlib.net/</ulink>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
jpeg: <ulink
|
||
url="http://www.ijg.org/files/">http://www.ijg.org/files/</ulink>
|
||
or <ulink
|
||
url="https://libjpeg-turbo.org/">https://libjpeg-turbo.org/</ulink>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
GNU make 3.81 or newer: <ulink url="http://www.gnu.org/software/make">http://www.gnu.org/software/make</ulink>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
perl version 5.8 or newer:
|
||
<ulink url="http://www.perl.org/">http://www.perl.org/</ulink>;
|
||
required for <command>fix-qdf</command> and the test suite.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
GNU diffutils (any version): <ulink
|
||
url="http://www.gnu.org/software/diffutils/">http://www.gnu.org/software/diffutils/</ulink>
|
||
is required to run the test suite. Note that this is the
|
||
version of diff present on virtually all GNU/Linux systems.
|
||
This is required because the test suite uses <command>diff
|
||
-u</command>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
A C++ compiler that works well with STL and has the <type>long
|
||
long</type> type. Most modern C++ compilers should fit the bill
|
||
fine. QPDF is tested with gcc, clang, and Microsoft Visual C++.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
<para>
|
||
Part of qpdf's test suite does comparisons of the contents of PDF
files by converting them to images and comparing the images. The
|
||
image comparison tests are disabled by default. Those tests are
|
||
not required for determining correctness of a qpdf build if you
|
||
have not modified the code since the test suite also contains
|
||
expected output files that are compared literally. The image
|
||
comparison tests provide an extra check to make sure that any
|
||
content transformations don't break the rendering of pages.
|
||
Transformations that affect the content streams themselves are off
|
||
by default and are only provided to help developers look into the
|
||
contents of PDF files. If you are making deep changes to the
|
||
library that cause changes in the contents of the files that qpdf
|
||
generates, then you should enable the image comparison tests.
|
||
Enable them by running <command>configure</command> with the
|
||
<option>--enable-test-compare-images</option> flag. If you enable
|
||
this, the following additional packages are required by the
|
||
test suite. Note that in no case are these items required to use
|
||
qpdf.
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
libtiff: <ulink url="http://www.remotesensing.org/libtiff/">http://www.remotesensing.org/libtiff/</ulink>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
GhostScript version 8.60 or newer: <ulink
|
||
url="http://www.ghostscript.com">http://www.ghostscript.com</ulink>
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
If you do not enable this, then you do not need to have tiff and
|
||
ghostscript.
|
||
</para>
|
||
<para>
|
||
If Adobe Reader is installed as <command>acroread</command>, some
|
||
additional test cases will be enabled. These test cases simply
|
||
verify that Adobe Reader can open the files that qpdf creates.
|
||
They require version 8.0 or newer to pass. However, in order to
|
||
avoid having qpdf depend on non-free (as in liberty) software, the
|
||
test suite will still pass without Adobe Reader, and the test
|
||
suite still exercises the full functionality of the software.
|
||
</para>
|
||
<para>
|
||
Pre-built documentation is distributed with qpdf, so you should
|
||
generally not need to rebuild the documentation. In order to
|
||
build the documentation from its docbook sources, you need the
|
||
docbook XML style sheets (<ulink
|
||
url="http://downloads.sourceforge.net/docbook/">http://downloads.sourceforge.net/docbook/</ulink>).
|
||
To build the PDF version of the documentation, you need Apache fop
|
||
(<ulink
|
||
url="http://xml.apache.org/fop/">http://xml.apache.org/fop/</ulink>)
|
||
version 0.94 or higher.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.building">
|
||
<title>Build Instructions</title>
|
||
<para>
|
||
Building qpdf on UNIX is generally just a matter of running
|
||
|
||
<programlisting>./configure
make
</programlisting>
|
||
You can also run <command>make check</command> to run the test
|
||
suite and <command>make install</command> to install. Please run
|
||
<command>./configure --help</command> for options on what can be
|
||
configured. You can also set the value of
|
||
<varname>DESTDIR</varname> during installation to install to a
|
||
temporary location, as is common with many open source packages.
|
||
Please see also the <filename>README.md</filename> and
|
||
<filename>INSTALL</filename> files in the source distribution.
|
||
</para>
|
||
<para>
|
||
Building on Windows is a little bit more complicated. For
|
||
details, please see <filename>README-windows.md</filename> in the
|
||
source distribution. You can also download a binary distribution
|
||
for Windows. There is a port of qpdf to Visual C++ version 6 in
|
||
the <filename>contrib</filename> area generously contributed by
|
||
Jian Ma. This is also discussed in more detail in
|
||
<filename>README-windows.md</filename>.
|
||
</para>
|
||
<para>
|
||
There are some other things you can do with the build. Although
|
||
qpdf uses <application>autoconf</application>, it does not use
|
||
<application>automake</application> but instead uses a
|
||
hand-crafted non-recursive Makefile that requires GNU make. If
|
||
you're really interested, please read the comments in the
|
||
top-level <filename>Makefile</filename>.
|
||
</para>
|
||
</sect1>
|
||
</chapter>
|
||
<chapter id="ref.using">
|
||
<title>Running QPDF</title>
|
||
<para>
|
||
This chapter describes how to run the qpdf program from the command
|
||
line.
|
||
</para>
|
||
<sect1 id="ref.invocation">
|
||
<title>Basic Invocation</title>
|
||
<para>
|
||
When running qpdf, the basic invocation is as follows:
|
||
|
||
<programlisting><command>qpdf</command><option> [ <replaceable>options</replaceable> ] <replaceable>infilename</replaceable> [ <replaceable>outfilename</replaceable> ]</option>
</programlisting>
|
||
This converts PDF file <option>infilename</option> to PDF file
|
||
<option>outfilename</option>. The output file is functionally
|
||
identical to the input file but may have been structurally
|
||
reorganized. Also, orphaned objects will be removed from the
|
||
file. Many transformations are available as controlled by the
|
||
options below. In place of <option>infilename</option>, the
|
||
parameter <option>--empty</option> may be specified. This causes
|
||
qpdf to use a dummy input file that contains zero pages. The only
|
||
normal use case for using <option>--empty</option> would be if you
|
||
were going to add pages from another source, as discussed in <xref
|
||
linkend="ref.page-selection"/>.
|
||
</para>
|
||
<para>
|
||
If <option>@filename</option> appears anywhere in the
|
||
command-line, it will be read line by line, and each line will be
|
||
treated as a command-line argument. The <option>@-</option> option
|
||
allows arguments to be read from standard input. This allows qpdf
|
||
to be invoked with an arbitrary number of arbitrarily long
|
||
arguments. It is also very useful for avoiding having to pass
|
||
passwords on the command line.
|
||
</para>
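<para>
As a rough illustration of the argument-file mechanism described
above, the following sketch writes one argument per line to a
temporary file and passes that file to <command>qpdf</command> as
<option>@filename</option>. The names <filename>in.pdf</filename>
and <filename>out.pdf</filename> and the password are made-up
placeholders. The script only invokes <command>qpdf</command> if it
is installed and the input file exists.
</para>

```shell
#!/bin/sh
# Sketch: keep a password off the command line by placing arguments in a file.
# in.pdf, out.pdf, and the password are hypothetical placeholders.

args_file=$(mktemp)

# One argument per line. Per the description above, each line is treated as
# a single argument, so the space in the password is not split by the shell.
printf '%s\n' '--password=example secret' '--decrypt' 'in.pdf' 'out.pdf' > "$args_file"

# Count the arguments that were written (one per line).
arg_count=$(wc -l "$args_file" | awk '{print $1}')
echo "wrote $arg_count arguments to $args_file"

# Run qpdf only if it is installed and the placeholder input file exists.
if command -v qpdf >/dev/null 2>/dev/null; then
    if [ -f in.pdf ]; then
        qpdf "@$args_file"
    fi
fi

# Remove the file so the password does not linger on disk.
rm -f "$args_file"
```

<para>
The same idea applies to <option>@-</option>: the argument lines
would be written to the standard input of <command>qpdf</command>
instead of to a file.
</para>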
|
||
<para>
|
||
<option>outfilename</option> does not have to be seekable, even
|
||
when generating linearized files. Specifying
|
||
“<option>-</option>” as <option>outfilename</option>
|
||
means to write to standard output. However, you can't specify the
|
||
same file as both the input and the output because qpdf reads data
|
||
from the input file as it writes to the output file. QPDF attempts
|
||
to detect this case and fail without overwriting the output file.
|
||
</para>
|
||
<para>
|
||
Most options require an output file, but some testing or
|
||
inspection commands do not. These are specifically noted.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.shell-completion">
|
||
<title>Shell Completion</title>
|
||
<para>
|
||
Starting in qpdf version 8.3.0, qpdf provides its own completion
|
||
support for zsh and bash. You can enable bash completion with
|
||
<command>eval $(qpdf --completion-bash)</command> and zsh
|
||
completion with <command>eval $(qpdf --completion-zsh)</command>.
|
||
If <command>qpdf</command> is not in your path, you should invoke
|
||
it above with an absolute path. If you invoke it with a relative
|
||
path, it will warn you, and the completion won't work if you're in
|
||
a different directory.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.basic-options">
|
||
<title>Basic Options</title>
|
||
<para>
|
||
The following options are the most common ones and perform
|
||
commonly needed transformations.
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term><option>--help</option></term>
|
||
<listitem>
|
||
<para>
|
||
Display command-line invocation help.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--version</option></term>
|
||
<listitem>
|
||
<para>
|
||
Display the current version of qpdf.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--copyright</option></term>
|
||
<listitem>
|
||
<para>
|
||
Show detailed copyright information.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--completion-bash</option></term>
|
||
<listitem>
|
||
<para>
|
||
Output a completion command you can eval to enable shell
|
||
completion from bash.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--completion-zsh</option></term>
|
||
<listitem>
|
||
<para>
|
||
Output a completion command you can eval to enable shell
|
||
completion from zsh.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--password=password</option></term>
|
||
<listitem>
|
||
<para>
|
||
Specifies a password for accessing encrypted files. Note that
|
||
you can use <option>@filename</option> or <option>@-</option>
|
||
as described above to put the password in a file or pass it
|
||
via standard input so you can avoid specifying it on the
|
||
command line.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--verbose</option></term>
|
||
<listitem>
|
||
<para>
|
||
Increase verbosity of output. For now, this just prints some
|
||
indication of any file that it creates.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--progress</option></term>
|
||
<listitem>
|
||
<para>
|
||
Indicate progress while writing files.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--no-warn</option></term>
|
||
<listitem>
|
||
<para>
|
||
Suppress writing of warnings to stderr. If warnings were
|
||
detected and suppressed, <command>qpdf</command> will still
|
||
exit with exit code 3.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--linearize</option></term>
|
||
<listitem>
|
||
<para>
|
||
Causes generation of a linearized (web-optimized) output file.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--copy-encryption=file</option></term>
|
||
<listitem>
|
||
<para>
|
||
Encrypt the file using the same encryption parameters,
|
||
including user and owner password, as the specified file. Use
|
||
<option>--encrypt-file-password</option> to specify a password
|
||
if one is needed to open this file. Note that copying the
|
||
encryption parameters from a file also copies the first half
|
||
of <literal>/ID</literal> from the file since this is part of
|
||
the encryption parameters.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--encrypt-file-password=password</option></term>
|
||
<listitem>
|
||
<para>
|
||
If the file specified with <option>--copy-encryption</option>
|
||
requires a password, specify the password using this option.
|
||
Note that only one of the user or owner password is required.
|
||
Both passwords will be preserved since QPDF does not
|
||
distinguish between the two passwords. It is possible to
|
||
preserve encryption parameters, including the owner password,
|
||
from a file even if you don't know the file's owner password.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--encrypt options --</option></term>
|
||
<listitem>
|
||
<para>
|
||
Causes generation of an encrypted output file. Please see <xref
|
||
linkend="ref.encryption-options"/> for details on how to
|
||
specify encryption parameters.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--decrypt</option></term>
|
||
<listitem>
|
||
<para>
|
||
Removes any encryption on the file. A password must be
|
||
supplied if the file is password protected.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--password-is-hex-key</option></term>
|
||
<listitem>
|
||
<para>
|
||
Overrides the usual computation/retrieval of the PDF file's
|
||
encryption key from user/owner password with an explicit
|
||
specification of the encryption key. When this option is
|
||
specified, the argument to the <option>--password</option>
|
||
option is interpreted as a hexadecimal-encoded key value. This
|
||
only applies to the password used to open the main input file.
|
||
It does not apply to other files opened by
|
||
<option>--pages</option> or other options or to files being
|
||
written.
|
||
</para>
|
||
<para>
|
||
Most users will never have a need for this option, and no
|
||
standard viewers support this mode of operation, but it can be
|
||
useful for forensic or investigatory purposes. For example, if
|
||
a PDF file is encrypted with an unknown password, a
|
||
brute-force attack using the key directly is sometimes more
|
||
efficient than one using the password. Also, if a file is
|
||
heavily damaged, it may be possible to derive the encryption
|
||
key and recover parts of the file using it directly. To expose
|
||
the encryption key used by an encrypted file that you can open
|
||
normally, use the <option>--show-encryption-key</option>
|
||
option.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--rotate=[+|-]angle[:page-range]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Apply rotation to specified pages. The
|
||
<option>page-range</option> portion of the option value has
|
||
the same format as page ranges in <xref
|
||
linkend="ref.page-selection"/>. If the page range is omitted,
|
||
the rotation is applied to all pages. The
|
||
<option>angle</option> portion of the parameter may be either
|
||
90, 180, or 270. If preceded by <option>+</option> or
|
||
<option>-</option>, the angle is added to or subtracted from
|
||
the specified pages' original rotations. Otherwise the pages'
|
||
rotations are set to the exact value. For example, the command
|
||
<command>qpdf in.pdf out.pdf --rotate=+90:2,4,6
|
||
--rotate=180:7-8</command> would rotate pages 2, 4, and 6 90
|
||
degrees clockwise from their original rotation and force the
|
||
rotation of pages 7 and 8 to 180 degrees regardless of
|
||
their original rotation, and the command <command>qpdf in.pdf
|
||
out.pdf --rotate=180</command> would rotate all pages by 180
|
||
degrees.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--keep-files-open=<replaceable>[yn]</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
This option controls whether qpdf keeps individual files open
|
||
while merging. Prior to version 8.1.0, qpdf always kept all
|
||
files open, but this meant that the number of files that could
|
||
be merged was limited by the operating system's open file
|
||
limit. Version 8.1.0 opened files as they were referenced and
|
||
closed them after each read, but this caused a major
|
||
performance impact. Version 8.2.0 optimized the performance
|
||
but did so in a way that, for local file systems, there was a
|
||
small but unavoidable performance hit, but for networked file
|
||
systems, the performance impact could be very high. Starting
|
||
with version 8.2.1, the default behavior is that files are
|
||
kept open if no more than 200 files are specified, but that
|
||
the behavior can be explicitly overridden with the
|
||
<option>--keep-files-open</option> flag. If you are merging
|
||
more than 200 files but fewer than the operating system's max
|
||
open files limit, you may want to use
|
||
<option>--keep-files-open=y</option>, especially if working
|
||
over a networked file system. If you are using a local file
|
||
system where the overhead is low and you might sometimes merge
more files than the OS limit allows from a script and are
|
||
not worried about a few seconds additional processing time,
|
||
you may want to specify <option>--keep-files-open=n</option>.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--pages options --</option></term>
|
||
<listitem>
|
||
<para>
|
||
Select specific pages from one or more input files. See <xref
|
||
linkend="ref.page-selection"/> for details on how to do page
|
||
selection (splitting and merging).
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--collate</option></term>
|
||
<listitem>
|
||
<para>
|
||
When specified, collate rather than concatenate pages from
|
||
files specified with <option>--pages</option>. See <xref
|
||
linkend="ref.page-selection"/> for additional details.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--split-pages=[n]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Write each group of <option>n</option> pages to a separate
|
||
output file. If <option>n</option> is not specified, create
|
||
single pages. Output file names are generated as follows:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
If the string <literal>%d</literal> appears in the output
|
||
file name, it is replaced with a range of zero-padded page
|
||
numbers starting from 1.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Otherwise, if the output file name ends in
|
||
<filename>.pdf</filename> (case insensitive), a zero-padded
|
||
page range, preceded by a dash, is inserted before the file
|
||
extension.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Otherwise, the file name is appended with a zero-padded
|
||
page range preceded by a dash.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
<para>
|
||
Page ranges are a single number in the case of single-page
|
||
groups or two numbers separated by a dash otherwise.
|
||
For example, if <filename>infile.pdf</filename> has 12 pages
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<command>qpdf --split-pages infile.pdf %d-out</command>
|
||
would generate files <filename>01-out</filename> through
|
||
<filename>12-out</filename>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<command>qpdf --split-pages=2 infile.pdf
|
||
outfile.pdf</command> would generate files
|
||
<filename>outfile-01-02.pdf</filename> through
|
||
<filename>outfile-11-12.pdf</filename>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<command>qpdf --split-pages infile.pdf
|
||
something.else</command> would generate files
|
||
<filename>something.else-01</filename> through
|
||
<filename>something.else-12</filename>
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
<para>
|
||
Note that outlines, threads, and other global features of the
|
||
original PDF file are not preserved. For each output file,
this option creates an empty PDF and copies that file's pages from
the input into it. If you require the global data, you will
|
||
have to run <command>qpdf</command> with the
|
||
<option>--pages</option> option once for each file. Using
|
||
<option>--split-pages</option> is much faster if you don't
|
||
require the global data.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
</para>
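<para>
The output file naming rules for <option>--split-pages</option>
described above can be sketched as a small shell function. This is
an illustration only, not code taken from qpdf:
<function>split_name</function> is a hypothetical helper, only the
first <literal>%d</literal> is replaced, and the padding width is
assumed to be the number of digits in the total page count, which
matches the 12-page examples above.
</para>

```shell
#!/bin/sh
# Sketch of the --split-pages output naming rules (not qpdf's own code).
# Assumption: page numbers are zero-padded to the number of digits in the
# total page count.

# split_name PATTERN FIRST LAST TOTAL
split_name() {
    pattern=$1 first=$2 last=$3 total=$4
    width=${#total}

    # A single-page group is one number; otherwise two numbers joined by a dash.
    if [ "$first" -eq "$last" ]; then
        range=$(printf "%0${width}d" "$first")
    else
        range=$(printf "%0${width}d-%0${width}d" "$first" "$last")
    fi

    case $pattern in
        *%d*)
            # Rule 1: replace the first %d with the page range.
            printf '%s%s%s\n' "${pattern%%\%d*}" "$range" "${pattern#*\%d}"
            ;;
        *.[Pp][Dd][Ff])
            # Rule 2: insert "-range" before a .pdf extension (any case).
            printf '%s-%s.%s\n' "${pattern%.*}" "$range" "${pattern##*.}"
            ;;
        *)
            # Rule 3: append "-range" to the name.
            printf '%s-%s\n' "$pattern" "$range"
            ;;
    esac
}

split_name %d-out 1 1 12            # prints 01-out
split_name outfile.pdf 11 12 12     # prints outfile-11-12.pdf
split_name something.else 12 12 12  # prints something.else-12
```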
|
||
<para>
|
||
Password-protected files may be opened by specifying a password.
|
||
By default, qpdf will preserve any encryption data associated with
|
||
a file. If <option>--decrypt</option> is specified, qpdf will
|
||
attempt to remove any encryption information. If
|
||
<option>--encrypt</option> is specified, qpdf will replace the
|
||
document's encryption parameters with whatever is specified.
|
||
</para>
|
||
<para>
|
||
Note that qpdf does not obey encryption restrictions already
|
||
imposed on the file. Doing so would be meaningless since qpdf can
|
||
be used to remove encryption from the file entirely. This
|
||
functionality is not intended to be used for bypassing copyright
|
||
restrictions or other restrictions placed on files by their
|
||
producers.
|
||
</para>
|
||
<para>
|
||
In all cases where qpdf allows specification of a password, care
|
||
must be taken if the password contains characters that fall
|
||
outside of the 7-bit US-ASCII character range to ensure that the
|
||
exact correct byte sequence is provided. It is possible that a
|
||
future version of qpdf may handle this more gracefully. For
|
||
example, if a file was encrypted using a password that was
|
||
encoded in ISO-8859-1 and your terminal is configured to use
|
||
UTF-8, the password you supply may not work properly. There are
|
||
various approaches to handling this. For example, if you are
|
||
using Linux and have the iconv executable installed, you could
|
||
pass <option>--password=`echo <replaceable>password</replaceable>
|
||
| iconv -t iso-8859-1`</option> to qpdf where
|
||
<replaceable>password</replaceable> is a password specified in
|
||
your terminal's locale. A detailed discussion of this is out of
|
||
scope for this manual, but just be aware of this issue if you have
|
||
trouble with a password that contains 8-bit characters.
|
||
</para>
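<para>
To make the encoding issue concrete, the following sketch re-encodes
a UTF-8 password as ISO-8859-1 and prints the bytes of both forms.
It assumes that <command>iconv</command> and <command>od</command>
are installed, and the password <quote>café</quote> is a made-up
example. The script does not invoke <command>qpdf</command>.
</para>

```shell
#!/bin/sh
# Sketch: show how the same password differs in UTF-8 and ISO-8859-1.
# The password "café" is a made-up example, written with octal escapes so
# the script does not depend on the encoding of this file.

utf8_password=$(printf 'caf\303\251')
latin1_password=$(printf '%s' "$utf8_password" | iconv -f utf-8 -t iso-8859-1)

# Dump each form as hexadecimal bytes with no separators.
utf8_hex=$(printf '%s' "$utf8_password" | od -An -tx1 | tr -d ' \n')
latin1_hex=$(printf '%s' "$latin1_password" | od -An -tx1 | tr -d ' \n')

# The final character is two bytes in UTF-8 and one byte in ISO-8859-1.
echo "UTF-8:      $utf8_hex"
echo "ISO-8859-1: $latin1_hex"

# The re-encoded value could then be supplied as --password="$latin1_password".
```

<para>
If the two byte sequences differ, as they do here, a password typed
in a UTF-8 terminal will not match one that was originally supplied
in ISO-8859-1 unless it is converted first.
</para>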
|
||
</sect1>
|
||
<sect1 id="ref.encryption-options">
|
||
<title>Encryption Options</title>
|
||
<para>
|
||
To change the encryption parameters of a file, use the <option>--encrypt</option>
|
||
flag. The syntax is
|
||
|
||
<programlisting><option>--encrypt <replaceable>user-password</replaceable> <replaceable>owner-password</replaceable> <replaceable>key-length</replaceable> [ <replaceable>restrictions</replaceable> ] --</option>
</programlisting>
|
||
Note that “<option>--</option>” terminates parsing of
|
||
encryption flags and must be present even if no restrictions are
|
||
present.
|
||
</para>
|
||
<para>
|
||
Either or both of the user password and the owner password may be
|
||
empty strings.
|
||
</para>
|
||
<para>
|
||
The value for
|
||
<option><replaceable>key-length</replaceable></option> may be 40,
|
||
128, or 256. The restriction flags are dependent upon key length.
|
||
When no additional restrictions are given, the default is to be
|
||
fully permissive.
|
||
</para>
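<para>
The shape of the <option>--encrypt</option> argument group described
above can be sketched with a small shell function that checks the
key length and always emits the terminating
<quote><option>--</option></quote>. The function name
<function>encrypt_args</function> and the password
<literal>owner-secret</literal> are hypothetical. The sketch only
prints the arguments and does not run <command>qpdf</command>.
</para>

```shell
#!/bin/sh
# Sketch: assemble the --encrypt argument group (hypothetical helper).

# encrypt_args USER_PW OWNER_PW KEY_LENGTH [RESTRICTION...]
# Prints one argument per line. Returns 1 for an unsupported key length.
encrypt_args() {
    user_pw=$1 owner_pw=$2 key_length=$3
    shift 3

    # Only the key lengths named in the text are accepted.
    case $key_length in
        40|128|256) ;;
        *) return 1 ;;
    esac

    # The closing "--" is emitted even when no restrictions are given.
    printf '%s\n' --encrypt "$user_pw" "$owner_pw" "$key_length" "$@" --
}

# Empty user password, 128-bit key, low-resolution printing, no modification.
encrypt_args "" owner-secret 128 --print=low --modify=none

# The equivalent direct invocation, with placeholder file names, would be:
#   qpdf --encrypt "" owner-secret 128 --print=low --modify=none -- in.pdf out.pdf
```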
|
||
<para>
|
||
If <option><replaceable>key-length</replaceable></option> is 40,
|
||
the following restriction options are available:
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term><option>--print=[yn]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Determines whether or not to allow printing.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--modify=[yn]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Determines whether or not to allow document modification.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--extract=[yn]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Determines whether or not to allow text/image extraction.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--annotate=[yn]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Determines whether or not to allow comments and form fill-in
|
||
and signing.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
If <option><replaceable>key-length</replaceable></option> is 128,
|
||
the following restriction options are available:
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term><option>--accessibility=[yn]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Determines whether or not to allow accessibility to the visually
|
||
impaired.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--extract=[yn]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Determines whether or not to allow text/graphic extraction.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--print=<replaceable>print-opt</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Controls printing access.
|
||
<option><replaceable>print-opt</replaceable></option> may be
|
||
one of the following:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<option>full</option>: allow full printing
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>low</option>: allow low-resolution printing only
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>none</option>: disallow printing
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--modify=<replaceable>modify-opt</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Controls modify access.
|
||
<option><replaceable>modify-opt</replaceable></option> may be
|
||
one of the following, each of which implies all the options
|
||
that follow it:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<option>all</option>: allow full document modification
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>annotate</option>: allow comment authoring and form operations
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>form</option>: allow form field fill-in and signing
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>assembly</option>: allow document assembly only
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>none</option>: allow no modifications
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--cleartext-metadata</option></term>
|
||
<listitem>
|
||
<para>
|
||
If specified, any metadata stream in the document will be left
|
||
unencrypted even if the rest of the document is encrypted.
|
||
This also forces the PDF version to be at least 1.5.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--use-aes=[yn]</option></term>
|
||
<listitem>
|
||
<para>
|
||
If <option>--use-aes=y</option> is specified, AES encryption
|
||
will be used instead of RC4 encryption. This forces the PDF
|
||
version to be at least 1.6.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--force-V4</option></term>
|
||
<listitem>
|
||
<para>
|
||
Use of this option forces the <literal>/V</literal> and
|
||
<literal>/R</literal> parameters in the document's encryption
|
||
dictionary to be set to the value <literal>4</literal>. As
|
||
qpdf will automatically do this when required, there is no
|
||
reason to ever use this option. It exists primarily for use
|
||
in testing qpdf itself. This option also forces the PDF
|
||
version to be at least 1.5.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
If <option><replaceable>key-length</replaceable></option> is 256,
|
||
the minimum PDF version is 1.7 with extension level 8, and the
|
||
AES-based encryption format used is the PDF 2.0 encryption method
|
||
supported by Acrobat X. The same options are available as with
|
||
128 bits with the following exceptions:
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term><option>--use-aes</option></term>
|
||
<listitem>
|
||
<para>
|
||
This option is not available with 256-bit keys. AES is always
|
||
used with 256-bit encryption keys.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--force-V4</option></term>
|
||
<listitem>
|
||
<para>
|
||
This option is not available with 256-bit keys.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--force-R5</option></term>
|
||
<listitem>
|
||
<para>
|
||
If specified, qpdf sets the minimum version to 1.7 at
|
||
extension level 3 and writes the deprecated encryption format
|
||
used by Acrobat version IX. This option should not be used in
|
||
practice to generate PDF files that will be in general use,
|
||
but it can be useful to generate files if you are trying to
|
||
test proper support in another application for PDF files
|
||
encrypted in this way.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
The default for each permission option is to be fully permissive.
|
||
</para>
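<para>
As an illustrative sketch of how the 128-bit restriction options
combine on the command line, the following invocation requests AES
encryption while allowing only low-resolution printing, no
modification, and no text/graphic extraction. The file names
<filename>infile.pdf</filename> and <filename>outfile.pdf</filename>
and the passwords <literal>user-pass</literal> and
<literal>owner-pass</literal> are placeholders chosen for this
example.

<programlisting><command>qpdf</command> <option>--encrypt user-pass owner-pass 128 --print=low --modify=none --extract=n --use-aes=y -- infile.pdf outfile.pdf</option>
</programlisting>
Because <option>--use-aes=y</option> is given, the output file's PDF
version should be at least 1.6. Any permission option that is
omitted keeps its fully permissive default.
</para>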
|
||
</sect1>
|
||
<sect1 id="ref.page-selection">
|
||
<title>Page Selection Options</title>
|
||
<para>
|
||
Starting with qpdf 3.0, it is possible to split and merge PDF
|
||
files by selecting pages from one or more input files. Whatever
|
||
file is given as the primary input file is used as the starting
|
||
point, but its pages are replaced with pages as specified.
|
||
|
||
<programlisting><option>--pages <replaceable>input-file</replaceable> [ <replaceable>--password=password</replaceable> ] [ <replaceable>page-range</replaceable> ] [ ... ] --</option>
|
||
</programlisting>
|
||
Multiple input files may be specified. Each one is given as the
|
||
name of the input file, an optional password (if required to open
|
||
the file), and the range of pages. Note that
|
||
“<option>--</option>” terminates parsing of page
|
||
selection flags.
|
||
</para>
|
||
<para>
|
||
For each file that pages should be taken from, specify the file, a
|
||
password needed to open the file (if any), and a page range. The
|
||
password needs to be given only once per file. If any of the
|
||
input files are the same as the primary input file or the file
|
||
used to copy encryption parameters (if specified), you do not need
|
||
to repeat the password here. The same file can be repeated
|
||
multiple times. If a file that is repeated has a password, the
|
||
password only has to be given the first time. All non-page data
|
||
(info, outlines, page numbers, etc.) are taken from the primary
|
||
input file. To discard these, use <option>--empty</option> as the
|
||
primary input.
|
||
</para>
|
||
<para>
|
||
Starting with qpdf 5.0.0, it is possible to omit the page range.
|
||
If qpdf sees a value in the place where it expects a page range
|
||
and that value is not a valid range but is a valid file name, qpdf
|
||
will implicitly use the range <literal>1-z</literal>, meaning that
|
||
it will include all pages in the file. This makes it possible to
|
||
easily combine all pages in a set of files with a command like
|
||
<command>qpdf --empty out.pdf --pages *.pdf --</command>.
|
||
</para>
|
||
<para>
|
||
It is not presently possible to specify the same page from the
|
||
same file directly more than once, but you can make this work by
|
||
specifying two different paths to the same file (such as by
|
||
putting <filename>./</filename> somewhere in the path). This can
|
||
also be used if you want to repeat a page from one of the input
|
||
files in the output file. This may be made more convenient in a
|
||
future version of qpdf if there is enough demand for this feature.
|
||
</para>
|
||
<para>
|
||
The page range is a set of numbers separated by commas, ranges of
|
||
numbers separated by dashes, or combinations of those. The character
|
||
“z” represents the last page. A number preceded by an
|
||
“r” indicates to count from the end, so
|
||
<literal>r3-r1</literal> would be the last three pages of the
|
||
document. Pages can appear in any order. Ranges can appear with a
|
||
high number followed by a low number, which causes the pages to
|
||
appear in reverse. Repeating a number will cause an error, but you
|
||
can use the workaround discussed above should you really want to
|
||
include the same page twice.
|
||
</para>
|
||
<para>
|
||
Example page ranges:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<literal>1,3,5-9,15-12</literal>: pages 1, 3, 5, 6, 7, 8,
|
||
9, 15, 14, 13, and 12 in that order.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<literal>z-1</literal>: all pages in the document in reverse
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<literal>r3-r1</literal>: the last three pages of the document
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<literal>r1-r3</literal>: the last three pages of the document
|
||
in reverse order
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
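<para>
The ranges above can be tried directly on the command line. In this
sketch, <filename>infile.pdf</filename> is a placeholder for a file
assumed to have at least 15 pages, and the output file names are
likewise arbitrary. The first command selects pages in the order
given by the first example range, and the second writes all pages
in reverse order.

<programlisting><command>qpdf</command> <option>--empty --pages infile.pdf 1,3,5-9,15-12 -- selected.pdf</option>
<command>qpdf</command> <option>--empty --pages infile.pdf z-1 -- reversed.pdf</option>
</programlisting>
</para>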
|
||
<para>
|
||
Starting in qpdf version 8.3, you can specify the
|
||
<option>--collate</option> option. Note that this option is
|
||
specified outside of <option>--pages ... --</option>.
|
||
When <option>--collate</option> is specified, it changes the
|
||
meaning of <option>--pages</option> so that the specified files,
|
||
as modified by page ranges, are collated rather than concatenated.
|
||
For example, if you have the files <filename>odd.pdf</filename> and
|
||
<filename>even.pdf</filename> containing odd and even pages of a
|
||
document respectively, you could run <command>qpdf --collate
|
||
odd.pdf --pages odd.pdf even.pdf -- all.pdf</command> to collate
|
||
the pages. This would pick page 1 from odd, page 1 from even, page
|
||
2 from odd, page 2 from even, etc. until all pages have been
|
||
included. Any number of files and page ranges can be specified. If
|
||
any file has fewer pages, that file is just skipped when its pages
|
||
have all been included. For example, if you ran <command>qpdf
|
||
--collate --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf r1 --
|
||
out.pdf</command>, you would get the following pages in this
|
||
order:
|
||
<itemizedlist>
|
||
<listitem><para>a.pdf page 1</para></listitem>
|
||
<listitem><para>b.pdf page 6</para></listitem>
|
||
<listitem><para>c.pdf last page</para></listitem>
|
||
<listitem><para>a.pdf page 2</para></listitem>
|
||
<listitem><para>b.pdf page 5</para></listitem>
|
||
<listitem><para>a.pdf page 3</para></listitem>
|
||
<listitem><para>b.pdf page 4</para></listitem>
|
||
<listitem><para>a.pdf page 4</para></listitem>
|
||
<listitem><para>a.pdf page 5</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
<para>
|
||
Starting in qpdf version 8.3, when you split and merge files, any
|
||
page labels (page numbers) are preserved in the final file. It is
|
||
expected that more document features will be preserved by
|
||
splitting and merging. In the meantime, semantics of splitting
|
||
and merging vary across features. For example, the document's
|
||
outlines (bookmarks) point to actual page objects, so if you
|
||
select some pages and not others, bookmarks that point to pages
|
||
that are in the output file will work, and remaining bookmarks
|
||
will not work. A future version of <command>qpdf</command> may do
|
||
a better job at handling these issues. (Note that the qpdf library
|
||
already contains all of the APIs required in order to implement
|
||
this in your own application if you need it.) In the meantime,
|
||
you can always use <option>--empty</option> as the primary input
|
||
file to avoid copying all of that from the first file. For
|
||
example, to take pages 1 through 5 from
|
||
<filename>infile.pdf</filename> while preserving all metadata
|
||
associated with that file, you could use
|
||
|
||
<programlisting><command>qpdf</command> <option>infile.pdf --pages infile.pdf 1-5 -- outfile.pdf</option>
|
||
</programlisting>
|
||
If you wanted pages 1 through 5 from
|
||
<filename>infile.pdf</filename> but you wanted the rest of the
|
||
metadata to be dropped, you could instead run
|
||
|
||
<programlisting><command>qpdf</command> <option>--empty --pages infile.pdf 1-5 -- outfile.pdf</option>
|
||
</programlisting>
|
||
If you wanted to take pages 1–5 from
|
||
<filename>file1.pdf</filename> and pages 11–15 from
|
||
<filename>file2.pdf</filename> in reverse, you would run
|
||
|
||
<programlisting><command>qpdf</command> <option>file1.pdf --pages file1.pdf 1-5 file2.pdf 15-11 -- outfile.pdf</option>
|
||
</programlisting>
|
||
If, for some reason, you wanted to take the first page of an
|
||
encrypted file called <filename>encrypted.pdf</filename> with
|
||
password <literal>pass</literal> and repeat it twice in an output
|
||
file, and if you wanted to drop document-level metadata but
|
||
preserve encryption, you would use
|
||
|
||
<programlisting><command>qpdf</command> <option>--empty --copy-encryption=encrypted.pdf --encryption-file-password=pass
|
||
--pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 --
|
||
outfile.pdf</option>
|
||
</programlisting>
|
||
Note that we had to specify the password all three times because
|
||
giving a password as <option>--encryption-file-password</option>
|
||
doesn't count for page selection, and as far as qpdf is concerned,
|
||
<filename>encrypted.pdf</filename> and
|
||
<filename>./encrypted.pdf</filename> are separate files. These
|
||
are all corner cases that most users should hopefully never have
|
||
to be bothered with.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.advanced-parsing">
|
||
<title>Advanced Parsing Options</title>
|
||
<para>
|
||
These options control aspects of how qpdf reads PDF files. Mostly
|
||
these are of use to people who are working with damaged files.
|
||
There is little reason to use these options unless you are trying
|
||
to solve specific problems. The following options are available:
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term><option>--suppress-recovery</option></term>
|
||
<listitem>
|
||
<para>
|
||
Prevents qpdf from attempting to recover damaged files.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--ignore-xref-streams</option></term>
|
||
<listitem>
|
||
<para>
|
||
Tells qpdf to ignore any cross-reference streams.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
</para>
|
||
<para>
|
||
Ordinarily, qpdf will attempt to recover from certain types of
|
||
errors in PDF files. These include errors in the cross-reference
|
||
table, certain types of object numbering errors, and certain types
|
||
of stream length errors. Sometimes, qpdf may think it has
|
||
recovered but may not have actually recovered, so care should be
taken with recovered files, as some data loss is possible. The
|
||
<option>--suppress-recovery</option> option will prevent qpdf from
|
||
attempting recovery. In this case, it will fail on the first
|
||
error that it encounters.
|
||
</para>
|
||
<para>
|
||
Ordinarily, qpdf reads cross-reference streams when they are
|
||
present in a PDF file. If <option>--ignore-xref-streams</option>
|
||
is specified, qpdf will ignore any cross-reference streams for
|
||
hybrid PDF files. The purpose of hybrid files is to make some
|
||
content available to viewers that are not aware of cross-reference
|
||
streams. It is almost never desirable to ignore them. The only
|
||
time when you might want to use this feature is if you are testing
|
||
creation of hybrid PDF files and wish to see how a PDF consumer
|
||
that doesn't understand object and cross-reference streams would
|
||
interpret such a file.
|
||
</para>
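<para>
As a minimal sketch of how these two options might be used, the
first command below makes qpdf fail on the first error rather than
attempt recovery, which can be a quick way to check whether a file
needs repair. The second reads a hybrid file the way a consumer
unaware of cross-reference streams would. The file names
<filename>infile.pdf</filename>, <filename>hybrid.pdf</filename>,
and <filename>outfile.pdf</filename> are placeholders.

<programlisting><command>qpdf</command> <option>--suppress-recovery infile.pdf outfile.pdf</option>
<command>qpdf</command> <option>--ignore-xref-streams hybrid.pdf outfile.pdf</option>
</programlisting>
</para>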
|
||
</sect1>
|
||
<sect1 id="ref.advanced-transformation">
|
||
<title>Advanced Transformation Options</title>
|
||
<para>
|
||
These transformation options control fine points of how qpdf
|
||
creates the output file. Mostly these are of use only to people
|
||
who are very familiar with the PDF file format or who are PDF
|
||
developers. The following options are available:
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term><option>--compress-streams=<replaceable>[yn]</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
By default, or with <option>--compress-streams=y</option>,
|
||
qpdf will compress any stream with no other filters applied to
|
||
it with the <literal>/FlateDecode</literal> filter when it
|
||
writes it. To suppress this behavior and preserve uncompressed
|
||
streams as uncompressed, use
|
||
<option>--compress-streams=n</option>.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--decode-level=<replaceable>option</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Controls which streams qpdf tries to decode. The default is
|
||
<option>generalized</option>. The following options are
|
||
available:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<option>none</option>: do not attempt to decode any streams
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>generalized</option>: decode streams filtered with
|
||
supported generalized filters: <option>/LZWDecode</option>,
|
||
<option>/FlateDecode</option>,
|
||
<option>/ASCII85Decode</option>, and
|
||
<option>/ASCIIHexDecode</option>. We define generalized
|
||
filters as those to be used for general-purpose compression
|
||
or encoding, as opposed to filters specifically designed
|
||
for image data.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>specialized</option>: in addition to generalized,
|
||
decode streams with supported non-lossy specialized
|
||
filters; currently this is just <option>/RunLengthDecode</option>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>all</option>: in addition to generalized and
|
||
specialized, decode streams with supported lossy filters;
|
||
currently this is just <option>/DCTDecode</option> (JPEG)
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
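<para>
For illustration, the following command asks qpdf to decode
streams that use <literal>/RunLengthDecode</literal> in addition to
the generalized filters. The file names are placeholders.

<programlisting><command>qpdf</command> <option>--decode-level=specialized infile.pdf outfile.pdf</option>
</programlisting>
</para>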
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--stream-data=<replaceable>option</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Controls transformation of stream data. This option predates
|
||
the <option>--compress-streams</option> and
|
||
<option>--decode-level</option> options. Those options can be
|
||
used to achieve the same effect with more control. The value
|
||
of <option><replaceable>option</replaceable></option> may be
|
||
one of the following:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<option>compress</option>: recompress stream data when
|
||
possible (default); equivalent to
|
||
<option>--compress-streams=y</option>
|
||
<option>--decode-level=generalized</option>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>preserve</option>: leave all stream data as is;
|
||
equivalent to <option>--compress-streams=n</option>
|
||
<option>--decode-level=none</option>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>uncompress</option>: uncompress stream data
|
||
compressed with generalized filters when possible;
|
||
equivalent to <option>--compress-streams=n</option>
|
||
<option>--decode-level=generalized</option>
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
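<para>
As a sketch of the equivalence described above, the following two
commands should produce the same kind of output, with streams
compressed by generalized filters written uncompressed. The file
names are placeholders.

<programlisting><command>qpdf</command> <option>--stream-data=uncompress infile.pdf outfile.pdf</option>
<command>qpdf</command> <option>--compress-streams=n --decode-level=generalized infile.pdf outfile.pdf</option>
</programlisting>
</para>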
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--normalize-content=[yn]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Enables or disables normalization of content streams. Content
|
||
normalization is enabled by default in QDF mode. Please see
|
||
<xref linkend="ref.qdf"/> for additional discussion of QDF
|
||
mode.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--object-streams=<replaceable>mode</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Controls handling of object streams. The value of
|
||
<option><replaceable>mode</replaceable></option> may be one of
|
||
the following:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<option>preserve</option>: preserve original object streams
|
||
(default)
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>disable</option>: don't write any object streams
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>generate</option>: use object streams wherever
|
||
possible
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
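<para>
For example, assuming placeholder file names, the first command
below writes as many objects as possible into object streams, and
the second writes a file with no object streams, which may help
with older consumers that do not understand them.

<programlisting><command>qpdf</command> <option>--object-streams=generate infile.pdf outfile.pdf</option>
<command>qpdf</command> <option>--object-streams=disable infile.pdf outfile.pdf</option>
</programlisting>
</para>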
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--preserve-unreferenced</option></term>
|
||
<listitem>
|
||
<para>
|
||
Tells qpdf to preserve objects that are not referenced when
|
||
writing the file. Ordinarily any object that is not referenced
|
||
in a traversal of the document from the trailer dictionary
|
||
will be discarded. This may be useful in working with some
|
||
damaged files or inspecting files with known unreferenced
|
||
objects.
|
||
</para>
|
||
<para>
|
||
This flag is ignored for linearized files and has the effect
|
||
of causing objects in the new file to be written in order by
|
||
object ID from the original file. This does not mean that
|
||
object numbers will be the same since qpdf may create stream
|
||
lengths as direct or indirect differently from the original
|
||
file, and the original file may have gaps in its numbering.
|
||
</para>
|
||
<para>
|
||
See also <option>--preserve-unreferenced-resources</option>,
|
||
which does something completely different.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--preserve-unreferenced-resources</option></term>
|
||
<listitem>
|
||
<para>
|
||
Starting with qpdf 8.1, when splitting pages, qpdf ordinarily
|
||
attempts to remove images and fonts that are not used by a
|
||
page even if they are referenced in the page's resources
|
||
dictionary. This option suppresses that behavior. The only
|
||
reason to use this is if you suspect that qpdf is removing
|
||
resources it shouldn't be removing. If you encounter that
|
||
case, please report it as a bug.
|
||
</para>
|
||
<para>
|
||
See also <option>--preserve-unreferenced</option>,
|
||
which does something completely different.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--newline-before-endstream</option></term>
|
||
<listitem>
|
||
<para>
|
||
Tells qpdf to insert a newline before the
|
||
<literal>endstream</literal> keyword, not counted in the
|
||
length, after any stream content even if the last character of
|
||
the stream was a newline. This may result in two newlines in
|
||
some cases. This is a requirement of PDF/A. While qpdf doesn't
|
||
specifically know how to generate PDF/A-compliant PDFs, this
|
||
at least prevents it from removing compliance on already
|
||
compliant files.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--linearize-pass1=<replaceable>file</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Write the first pass of linearization to the named file. The
|
||
resulting file is not a valid PDF file. This option is useful
|
||
only for debugging <classname>QPDFWriter</classname>'s
|
||
linearization code. When qpdf linearizes files, it writes the
|
||
file in two passes, using the first pass to calculate sizes
|
||
and offsets that are required for hint tables and the
|
||
linearization dictionary. Ordinarily, the first pass is
|
||
discarded. This option enables it to be captured.
|
||
</para>
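<para>
A possible invocation, shown only as a sketch, combines this
option with <option>--linearize</option>. The name
<filename>pass1.out</filename> is an arbitrary placeholder for the
captured first pass, and the other file names are placeholders as
well.

<programlisting><command>qpdf</command> <option>--linearize --linearize-pass1=pass1.out infile.pdf outfile.pdf</option>
</programlisting>
</para>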
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--coalesce-contents</option></term>
|
||
<listitem>
|
||
<para>
|
||
When a page's contents are split across multiple streams, this
|
||
option causes qpdf to combine them into a single stream. Use
|
||
of this option is never necessary for ordinary usage, but it
|
||
can help when working with some files in some cases. For
|
||
example, some PDF writers split page contents into small
|
||
streams at arbitrary points that may fall in the middle of
|
||
lexical tokens within the content, and some PDF readers may
|
||
get confused on such files. If you use qpdf to coalesce the
|
||
content streams, such readers may be able to work with the
|
||
file more easily. This can also be combined with QDF mode or
|
||
content normalization to make it easier to look at all of a
|
||
page's contents at once.
|
||
</para>
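<para>
As an example of the combination mentioned above, the following
command writes a QDF file in which each page's contents appear as a
single stream. The file names are placeholders.

<programlisting><command>qpdf</command> <option>--qdf --coalesce-contents infile.pdf outfile.pdf</option>
</programlisting>
</para>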
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--flatten-annotations=<replaceable>option</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
This option collapses annotations into the pages' contents
|
||
with special handling for form fields. Ordinarily, an
|
||
annotation is rendered separately and on top of the page.
|
||
Combining annotations into the page's contents effectively
|
||
freezes the placement of the annotations, making them look
|
||
right after various page transformations. The library
|
||
functionality backing this option was added for the benefit of
|
||
programs that want to create <emphasis>n-up</emphasis> page
|
||
layouts and other similar things that don't work well with
|
||
annotations. The <replaceable>option</replaceable> parameter
|
||
may be any of the following:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<option>all</option>: include all annotations that are not
|
||
marked invisible or hidden
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>print</option>: only include annotations that
|
||
indicate that they should appear when the page is printed
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>screen</option>: omit annotations that indicate
|
||
they should not appear on the screen
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
<para>
|
||
Note that form fields are special because the annotations that
|
||
are used to render filled-in form fields may become out of
|
||
date with respect to the fields' values if the form is filled in by a
|
||
program that doesn't know how to update the appearances. If
|
||
qpdf detects this case, its default behavior is not to flatten
|
||
those annotations because doing so would cause the value of
|
||
the form field to be lost. This gives you a chance to go back
|
||
and resave the form with a program that knows how to generate
|
||
appearances. QPDF itself can generate appearances with some
|
||
limitations. See the <option>--generate-appearances</option>
|
||
option below.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--generate-appearances</option></term>
|
||
<listitem>
|
||
<para>
|
||
If a file contains interactive form fields and indicates that
|
||
the appearances are out of date with the values of the form,
|
||
this flag will regenerate appearances, subject to a few
|
||
limitations. Note that there is not usually a reason to do
|
||
this, but it can be necessary before using the
|
||
<option>--flatten-annotations</option> option. The limitations
are as follows; most of them are not a problem with well-behaved
PDF files:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Radio button and checkbox appearances use the pre-set
|
||
values in the PDF file. QPDF just makes sure that the
|
||
correct appearance is displayed based on the value of the
|
||
field. This is fine for PDF files that create their forms
|
||
properly. Some PDF writers save appearances for fields when
|
||
they change, which could cause some controls to have
|
||
inconsistent appearances.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
For text fields and list boxes, any characters that fall
|
||
outside of US-ASCII will be replaced by the
|
||
<literal>?</literal> character.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Quadding is ignored. Quadding is used to specify whether
|
||
the contents of a field should be left, center, or right
|
||
aligned with the field.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Rich text, multi-line, and other more elaborate formatting
|
||
directives are ignored.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
There is no support for multi-select fields or signature
|
||
fields.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
If qpdf doesn't do a good enough job with your form, use an
|
||
external application to save your filled-in form before
|
||
processing it with qpdf.
|
||
</para>
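<para>
As a sketch of the typical use, the following command regenerates
appearances and then flattens all visible annotations in one run.
The names <filename>form.pdf</filename> and
<filename>flattened.pdf</filename> are placeholders, and the
limitations listed above still apply to the generated appearances.

<programlisting><command>qpdf</command> <option>--generate-appearances --flatten-annotations=all form.pdf flattened.pdf</option>
</programlisting>
</para>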
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--optimize-images</option></term>
|
||
<listitem>
|
||
<para>
|
||
This flag causes qpdf to recompress all images that are not
|
||
compressed with DCT (JPEG) using DCT compression as long as
|
||
doing so decreases the size in bytes of the image data and the
|
||
image does not fall below minimum specified dimensions. Useful
|
||
information is provided when used in combination with
|
||
<option>--verbose</option>. See also the
|
||
<option>--oi-min-width</option>,
|
||
<option>--oi-min-height</option>, and
|
||
<option>--oi-min-area</option> options.
|
||
</para>
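<para>
For illustration, the following command optimizes images while
lowering the minimum dimensions and reports what it does. The
threshold values and file names are arbitrary choices for this
example, not recommendations.

<programlisting><command>qpdf</command> <option>--optimize-images --oi-min-width=64 --oi-min-height=64 --oi-min-area=4096 --verbose infile.pdf outfile.pdf</option>
</programlisting>
</para>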
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--oi-min-width=<replaceable>width</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Avoid optimizing images whose width is below the specified
|
||
amount. If omitted, the default is 128 pixels. Use 0 for no
|
||
minimum.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--oi-min-height=<replaceable>height</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Avoid optimizing images whose height is below the specified
|
||
amount. If omitted, the default is 128 pixels. Use 0 for no
|
||
minimum.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--oi-min-area=<replaceable>area-in-pixels</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Avoid optimizing images whose pixel count
|
||
(width × height) is below the specified amount. If
|
||
omitted, the default is 16,384 pixels. Use 0 for no minimum.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--qdf</option></term>
|
||
<listitem>
|
||
<para>
|
||
Turns on QDF mode. For additional information on QDF, please
|
||
see <xref linkend="ref.qdf"/>.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--min-version=<replaceable>version</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
Forces the PDF version of the output file to be at least
|
||
<replaceable>version</replaceable>. In other words, if the
|
||
input file has a lower version than the specified version, the
|
||
specified version will be used. If the input file has a
|
||
higher version, the input file's original version will be
|
||
used. It is seldom necessary to use this option since qpdf
|
||
will automatically increase the version as needed when adding
|
||
features that require newer PDF readers.
|
||
</para>
|
||
<para>
|
||
The version number may be expressed in the form
|
||
<replaceable>major.minor.extension-level</replaceable>, in
|
||
which case the version is interpreted as
|
||
<replaceable>major.minor</replaceable> at extension level
|
||
<replaceable>extension-level</replaceable>. For example,
|
||
version <literal>1.7.8</literal> represents version 1.7 at
|
||
extension level 8. Note that minimal syntax checking is done
|
||
on the command line.
|
||
</para>
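<para>
For example, with placeholder file names, the following command
requests that the output be at least version 1.7 at extension
level 8:

<programlisting><command>qpdf</command> <option>--min-version=1.7.8 infile.pdf outfile.pdf</option>
</programlisting>
</para>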
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--force-version=<replaceable>version</replaceable></option></term>
|
||
<listitem>
|
||
<para>
|
||
This option forces the PDF version to be the exact version
|
||
specified <emphasis>even when the file may have content that
|
||
is not supported in that version</emphasis>. The version
|
||
number is interpreted in the same way as with
|
||
<option>--min-version</option> so that extension levels can be
|
||
set. In some cases, forcing the output file's PDF version to
|
||
be lower than that of the input file will cause qpdf to
|
||
disable certain features of the document. Specifically,
|
||
256-bit keys are disabled if the version is less than 1.7 with
|
||
extension level 8 (except R5 is disabled if less than 1.7 with
|
||
extension level 3), AES encryption is disabled if the version
|
||
is less than 1.6, cleartext metadata and object streams are
|
||
disabled if less than 1.5, 128-bit encryption keys are
|
||
disabled if less than 1.4, and all encryption is disabled if
|
||
less than 1.3. Even with these precautions, qpdf won't be
|
||
able to do things like eliminate use of newer image
|
||
compression schemes, transparency groups, or other features
|
||
that may have been added in more recent versions of PDF.
|
||
</para>
|
||
<para>
|
||
As a general rule, with the exception of big structural things
|
||
like the use of object streams or AES encryption, PDF viewers
|
||
are supposed to ignore features in files that they don't
|
||
support from newer versions. This means that forcing the
|
||
version to a lower version may make it possible to open your
|
||
PDF file with an older version, though bear in mind that some
|
||
of the original document's functionality may be lost.
|
||
</para>
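<para>
As a sketch, the following command forces the output to PDF
version 1.4. The file names are placeholders. According to the
rules above, this disables object streams, cleartext metadata, AES
encryption, and 256-bit keys in the output file.

<programlisting><command>qpdf</command> <option>--force-version=1.4 infile.pdf outfile.pdf</option>
</programlisting>
</para>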
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
</para>
|
||
<para>
|
||
By default, when a stream is encoded using non-lossy filters that
|
||
qpdf understands and is not already compressed using a good
|
||
compression scheme, qpdf will uncompress and recompress streams.
|
||
Assuming proper filter implementations, this is safe and generally
|
||
results in smaller files. This behavior may also be explicitly
|
||
requested with <option>--stream-data=compress</option>.
|
||
</para>
|
||
<para>
|
||
When <option>--normalize-content=y</option> is specified, qpdf
|
||
will attempt to normalize whitespace and newlines in page content
|
||
streams. This is generally safe but could, in some cases, cause
|
||
damage to the content streams. This option is intended for people
|
||
who wish to study PDF content streams or to debug PDF content.
|
||
You should not use this for “production” PDF files.
|
||
</para>
|
||
<para>
|
||
This paragraph discusses edge cases of content normalization that
|
||
are not of concern to most users and are not relevant when content
|
||
normalization is not enabled. When normalizing content, if qpdf
|
||
runs into any lexical errors, it will print a warning indicating
|
||
that content may be damaged. The only situation in which qpdf is
|
||
known to cause damage during content normalization is when a
|
||
page's contents are split across multiple streams and streams are
|
||
split in the middle of a lexical token such as a string, name, or
|
||
inline image. There may be some pathological cases in which qpdf
|
||
could damage content without noticing this, such as if the partial
|
||
tokens at the end of one stream and the beginning of the next
|
||
stream are both valid, but usually qpdf will be able to detect
|
||
this case. For slightly increased safety, you can specify
|
||
<option>--coalesce-contents</option> in addition to
|
||
<option>--normalize-content</option> or <option>--qdf</option>.
|
||
This will cause qpdf to combine all the content streams into one,
|
||
   thus recombining any split tokens. However, doing this will prevent
|
||
you from being able to see the original layout of the content
|
||
streams. If you must inspect the original content streams in an
|
||
uncompressed format, you can always run with <option>--qdf
|
||
--normalize-content=n</option> for a QDF file without content
|
||
normalization, or alternatively
|
||
<option>--stream-data=uncompress</option> for a regular non-QDF
|
||
mode file with uncompressed streams. These will both uncompress
|
||
all the streams but will not attempt to normalize content. Please
|
||
note that if you are using content normalization or QDF mode for
|
||
the purpose of manually inspecting files, you don't have to care
|
||
about this.
|
||
</para>
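  <para>
   The combinations of options discussed above can be collected in a
   small Python helper. This is a minimal sketch: the helper name and
   the mode labels are hypothetical, and only the
   <command>qpdf</command> options named in the preceding paragraphs
   are used. The helper builds an argument list and does not run
   qpdf itself.
  </para>

```python
# Hypothetical mode labels mapped to the qpdf options named in the text.
INSPECTION_OPTIONS = {
    # QDF file whose content streams keep their original layout
    "qdf-unnormalized": ["--qdf", "--normalize-content=n"],
    # regular (non-QDF) file with uncompressed streams
    "uncompressed": ["--stream-data=uncompress"],
    # normalized content with all content streams merged into one
    "normalized-coalesced": ["--normalize-content=y", "--coalesce-contents"],
    # the default behavior, requested explicitly
    "recompress": ["--stream-data=compress"],
}


def inspection_command(infile, outfile, mode):
    """Build (but do not run) a qpdf command line for the given mode."""
    if mode not in INSPECTION_OPTIONS:
        raise ValueError("unknown mode: " + mode)
    return ["qpdf"] + INSPECTION_OPTIONS[mode] + [infile, outfile]
```

  <para>
   The resulting list can be passed to Python's
   <literal>subprocess.run</literal> on a system where
   <command>qpdf</command> is installed.
  </para>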
|
||
<para>
|
||
Object streams, also known as compressed objects, were introduced
|
||
into the PDF specification at version 1.5, corresponding to
|
||
Acrobat 6. Some older PDF viewers may not support files with
|
||
object streams. qpdf can be used to transform files with object
|
||
streams to files without object streams or vice versa. As
|
||
mentioned above, there are three object stream modes:
|
||
<option>preserve</option>, <option>disable</option>, and
|
||
<option>generate</option>.
|
||
</para>
|
||
<para>
|
||
   In <option>preserve</option> mode, the relationship between objects and
|
||
the streams that contain them is preserved from the original file.
|
||
In <option>disable</option> mode, all objects are written as
|
||
regular, uncompressed objects. The resulting file should be
|
||
readable by older PDF viewers. (Of course, the content of the
|
||
files may include features not supported by older viewers, but at
|
||
least the structure will be supported.) In
|
||
<option>generate</option> mode, qpdf will create its own object
|
||
streams. This will usually result in more compact PDF files,
|
||
though they may not be readable by older viewers. In this mode,
|
||
qpdf will also make sure the PDF version number in the header is
|
||
at least 1.5.
|
||
</para>
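  <para>
   As a rough illustration of the three object stream modes, the
   following Python sketch builds a command line and models the
   header version rule for <option>generate</option> mode. It assumes
   the mode is selected with <option>--object-streams</option>, the
   option described earlier in this manual. The function names are
   hypothetical, and the version rule covers only what is stated
   above; qpdf may adjust the version for other reasons.
  </para>

```python
OBJECT_STREAM_MODES = ("preserve", "disable", "generate")


def object_streams_command(infile, outfile, mode):
    """Build (but do not run) a qpdf command line for an object stream mode."""
    if mode not in OBJECT_STREAM_MODES:
        raise ValueError("mode must be one of " + ", ".join(OBJECT_STREAM_MODES))
    return ["qpdf", "--object-streams=" + mode, infile, outfile]


def header_version_after(mode, current):
    """Model the rule that generate mode raises the header to at least 1.5.

    Versions are (major, minor) tuples. Other modes are left unchanged
    in this sketch.
    """
    if mode == "generate":
        return max(current, (1, 5))
    return current
```

  <para>
   For instance, <literal>header_version_after("generate", (1,
   3))</literal> yields <literal>(1, 5)</literal>, while a file that
   is already at version 1.7 keeps its version.
  </para>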
|
||
<para>
|
||
The <option>--qdf</option> flag turns on QDF mode, which changes
|
||
some of the defaults described above. Specifically, in QDF mode,
|
||
by default, stream data is uncompressed, content streams are
|
||
normalized, and encryption is removed. These defaults can still
|
||
be overridden by specifying the appropriate options as described
|
||
above. Additionally, in QDF mode, stream lengths are stored as
|
||
indirect objects, objects are laid out in a less efficient but
|
||
more readable fashion, and the documents are interspersed with
|
||
comments that make it easier for the user to find things and also
|
||
make it possible for <command>fix-qdf</command> to work properly.
|
||
QDF mode is intended for people, mostly developers, who wish to
|
||
inspect or modify PDF files in a text editor. For details, please
|
||
see <xref linkend="ref.qdf"/>.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.testing-options">
|
||
<title>Testing, Inspection, and Debugging Options</title>
|
||
<para>
|
||
These options can be useful for digging into PDF files or for use
|
||
in automated test suites for software that uses the qpdf library.
|
||
When any of the options in this section are specified, no output
|
||
file should be given. The following options are available:
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term><option>--deterministic-id</option></term>
|
||
<listitem>
|
||
<para>
|
||
Causes generation of a deterministic value for /ID. This
|
||
prevents use of timestamp and output file name information in
|
||
the /ID generation. Instead, at some slight additional runtime
|
||
cost, the /ID field is generated to include a digest of the
|
||
significant parts of the content of the output PDF file. This
|
||
means that a given qpdf operation should generate the same /ID
|
||
each time it is run, which can be useful when caching results
|
||
or for generation of some test data. Use of this flag is not
|
||
compatible with creation of encrypted files.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--static-id</option></term>
|
||
<listitem>
|
||
<para>
|
||
Causes generation of a fixed value for /ID. This is intended
|
||
for testing only. Never use it for production files. If you
|
||
are trying to get the same /ID each time for a given file and
|
||
you are not generating encrypted files, consider using the
|
||
<option>--deterministic-id</option> option.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--static-aes-iv</option></term>
|
||
<listitem>
|
||
<para>
|
||
Causes use of a static initialization vector for AES-CBC.
|
||
This is intended for testing only so that output files can be
|
||
reproducible. Never use it for production files. This option
|
||
in particular is not secure since it significantly weakens the
|
||
encryption.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--no-original-object-ids</option></term>
|
||
<listitem>
|
||
<para>
|
||
Suppresses inclusion of original object ID comments in QDF
|
||
files. This can be useful when generating QDF files for test
|
||
purposes, particularly when comparing them to determine
|
||
whether two PDF files have identical content.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--show-encryption</option></term>
|
||
<listitem>
|
||
<para>
|
||
Shows document encryption parameters. Also shows the
|
||
document's user password if the owner password is given.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--show-encryption-key</option></term>
|
||
<listitem>
|
||
<para>
|
||
When encryption information is being displayed, as when
|
||
<option>--check</option> or <option>--show-encryption</option>
|
||
is given, display the computed or retrieved encryption key as
|
||
a hexadecimal string. This value is not ordinarily useful to
|
||
users, but it can be used as the argument to
|
||
<option>--password</option> if the
|
||
<option>--password-is-hex-key</option> is specified. Note
|
||
that, when PDF files are encrypted, passwords and other
|
||
metadata are used only to compute an encryption key, and the
|
||
encryption key is what is actually used for encryption. This
|
||
enables retrieval of that key.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--check-linearization</option></term>
|
||
<listitem>
|
||
<para>
|
||
Checks file integrity and linearization status.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--show-linearization</option></term>
|
||
<listitem>
|
||
<para>
|
||
Checks and displays all data in the linearization hint tables.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--show-xref</option></term>
|
||
<listitem>
|
||
<para>
|
||
Shows the contents of the cross-reference table in a
|
||
human-readable form. This is especially useful for files with
|
||
cross-reference streams which are stored in a binary format.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--show-object=trailer|obj[,gen]</option></term>
|
||
<listitem>
|
||
<para>
|
||
Show the contents of the given object. This is especially
|
||
useful for inspecting objects that are inside of object
|
||
streams (also known as “compressed objects”).
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--raw-stream-data</option></term>
|
||
<listitem>
|
||
<para>
|
||
When used along with the <option>--show-object</option>
|
||
option, if the object is a stream, shows the raw stream data
|
||
       instead of the object's contents.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--filtered-stream-data</option></term>
|
||
<listitem>
|
||
<para>
|
||
When used along with the <option>--show-object</option>
|
||
option, if the object is a stream, shows the filtered stream
|
||
       data instead of the object's contents. If the stream is filtered
|
||
using filters that qpdf does not support, an error will be
|
||
issued.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--show-npages</option></term>
|
||
<listitem>
|
||
<para>
|
||
Prints the number of pages in the input file on a line by
|
||
itself. Since the number of pages appears by itself on a
|
||
line, this option can be useful for scripting if you need to
|
||
know the number of pages in a file.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--show-pages</option></term>
|
||
<listitem>
|
||
<para>
|
||
Shows the object and generation number for each page
|
||
dictionary object and for each content stream associated with
|
||
the page. Having this information makes it more convenient to
|
||
inspect objects from a particular page.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--with-images</option></term>
|
||
<listitem>
|
||
<para>
|
||
When used along with <option>--show-pages</option>, also shows
|
||
the object and generation numbers for the image objects on
|
||
each page. (At present, information about images in shared
|
||
       resource dictionaries is not output by this command. This is
|
||
discussed in a comment in the source code.)
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--json</option></term>
|
||
<listitem>
|
||
<para>
|
||
Generate a json representation of the file. This is described
|
||
       in depth in <xref linkend="ref.json"/>.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--json-help</option></term>
|
||
<listitem>
|
||
<para>
|
||
Describe the format of the json output.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--json-key=key</option></term>
|
||
<listitem>
|
||
<para>
|
||
This option is repeatable. If specified, only top-level keys
|
||
specified will be included in the json output. If not
|
||
specified, all keys will be shown.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--json-object=trailer|obj[,gen]</option></term>
|
||
<listitem>
|
||
<para>
|
||
This option is repeatable. If specified, only specified
|
||
objects will be shown in the
|
||
“<literal>objects</literal>” key of the json
|
||
output. If absent, all objects will be shown.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><option>--check</option></term>
|
||
<listitem>
|
||
<para>
|
||
       Checks file structure as well as encryption, linearization,
|
||
and encoding of stream data. A file for which
|
||
<option>--check</option> reports no errors may still have
|
||
errors in stream data content but should otherwise be
|
||
       structurally sound. If <option>--check</option> detects any errors,
|
||
qpdf will exit with a status of 2. There are some recoverable
|
||
conditions that <option>--check</option> detects. These are
|
||
issued as warnings instead of errors. If qpdf finds no errors
|
||
but finds warnings, it will exit with a status of 3 (as of
|
||
version 2.0.4). When <option>--check</option> is combined
|
||
with other options, checks are always performed before any
|
||
other options are processed. For erroneous files,
|
||
<option>--check</option> will cause qpdf to attempt to
|
||
recover, after which other options are effectively operating
|
||
on the recovered file. Combining <option>--check</option> with
|
||
other options in this way can be useful for manually
|
||
recovering severely damaged files.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
</para>
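  <para>
   Two of the options above lend themselves to scripting:
   <option>--check</option>, through its exit status, and
   <option>--show-npages</option>, through its single-line output.
   The Python sketch below shows one way a script might use them. The
   function names are hypothetical, and treating an exit status of 0
   as <quote>no errors or warnings</quote> is an assumption; the text
   above only documents the values 2 and 3.
  </para>

```python
import subprocess

# 2 and 3 come from the description of --check; 0 is an assumption.
CHECK_STATUS = {0: "ok", 2: "errors", 3: "warnings"}


def describe_check_status(returncode):
    """Translate the exit status of 'qpdf --check' into a short label."""
    return CHECK_STATUS.get(returncode, "unexpected status %d" % returncode)


def check_file(infile):
    """Run 'qpdf --check' (no output file) and describe the result.

    Requires qpdf to be installed and on the PATH.
    """
    done = subprocess.run(["qpdf", "--check", infile])
    return describe_check_status(done.returncode)


def parse_npages(output):
    """Parse the output of --show-npages: a page count on a line by itself."""
    return int(output.strip())


def count_pages(infile):
    """Run 'qpdf --show-npages' and return the page count as an int."""
    done = subprocess.run(
        ["qpdf", "--show-npages", infile],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return parse_npages(done.stdout)
```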
|
||
<para>
|
||
The <option>--raw-stream-data</option> and
|
||
<option>--filtered-stream-data</option> options are ignored unless
|
||
<option>--show-object</option> is given. Either of these options
|
||
will cause the stream data to be written to standard output. In
|
||
order to avoid commingling of stream data with other output, it is
|
||
   recommended that these options not be combined with other
|
||
test/inspection options.
|
||
</para>
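  <para>
   Because the stream data goes to standard output, a script will
   usually redirect it to a file. The following Python sketch shows
   one possible approach. The function names are hypothetical; only
   the <option>--show-object</option>,
   <option>--raw-stream-data</option>, and
   <option>--filtered-stream-data</option> options come from this
   section.
  </para>

```python
import subprocess


def show_stream_command(infile, obj, gen=0, filtered=True):
    """Build a qpdf command that writes one stream's data to standard output."""
    if filtered:
        data_option = "--filtered-stream-data"
    else:
        data_option = "--raw-stream-data"
    return ["qpdf", "--show-object=%d,%d" % (obj, gen), data_option, infile]


def save_stream(infile, obj, destination, gen=0, filtered=True):
    """Run the command and store the stream data in a file.

    Requires qpdf on the PATH. No other inspection options are added,
    so nothing else is mixed into the stream data.
    """
    command = show_stream_command(infile, obj, gen, filtered)
    with open(destination, "wb") as out:
        subprocess.run(command, stdout=out, check=True)
```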
|
||
<para>
|
||
If <option>--filtered-stream-data</option> is given and
|
||
<option>--normalize-content=y</option> is also given, qpdf will
|
||
attempt to normalize the stream data as if it is a page content
|
||
stream. This attempt will be made even if it is not a page
|
||
content stream, in which case it will produce unusable results.
|
||
</para>
|
||
</sect1>
|
||
</chapter>
|
||
<chapter id="ref.qdf">
|
||
<title>QDF Mode</title>
|
||
<para>
|
||
In QDF mode, qpdf creates PDF files in what we call <firstterm>QDF
|
||
form</firstterm>. A PDF file in QDF form, sometimes called a QDF
|
||
file, is a completely valid PDF file that has
|
||
  <literal>%QDF-1.0</literal> as its third line (after the PDF header
|
||
and binary characters) and has certain other characteristics. The
|
||
purpose of QDF form is to make it possible to edit PDF files, with
|
||
some restrictions, in an ordinary text editor. This can be very
|
||
useful for experimenting with different PDF constructs or for
|
||
making one-off edits to PDF files (though there are other reasons
|
||
why this may not always work).
|
||
</para>
|
||
<para>
|
||
It is ordinarily very difficult to edit PDF files in a text editor
|
||
for two reasons: most meaningful data in PDF files is compressed,
|
||
and PDF files are full of offset and length information that makes
|
||
it hard to add or remove data. A QDF file is organized in a manner
|
||
such that, if edits are kept within certain constraints, the
|
||
<command>fix-qdf</command> program, distributed with qpdf, is able
|
||
to restore edited files to a correct state. The
|
||
<command>fix-qdf</command> program takes no command-line
|
||
arguments. It reads a possibly edited QDF file from standard input
|
||
and writes a repaired file to standard output.
|
||
</para>
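 <para>
  Since <command>fix-qdf</command> is a plain filter from standard
  input to standard output, it is easy to drive from a script. The
  Python sketch below is one way to do so. The function names are
  hypothetical, and the <literal>command</literal> parameter exists
  only so that the helper can be tried with a stand-in program on a
  system where <command>fix-qdf</command> is not installed.
 </para>

```python
import subprocess


def looks_like_qdf(data):
    """Return True if the third line of the file is the %QDF-1.0 marker."""
    lines = data.split(b"\n")
    return len(lines) >= 3 and lines[2].rstrip(b"\r") == b"%QDF-1.0"


def repair_qdf(edited, command=("fix-qdf",)):
    """Pipe edited QDF bytes through fix-qdf and return the repaired bytes.

    fix-qdf takes no command-line arguments: it reads standard input
    and writes standard output.
    """
    done = subprocess.run(
        list(command),
        input=edited,
        stdout=subprocess.PIPE,
        check=True,
    )
    return done.stdout
```

 <para>
  A typical round trip would read the edited file in binary mode,
  pass its contents to <literal>repair_qdf</literal>, and write the
  result to a new file.
 </para>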
|
||
<para>
|
||
The following attributes characterize a QDF file:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
All objects appear in numerical order in the PDF file, including
|
||
when objects appear in object streams.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Objects are printed in an easy-to-read format, and all line
|
||
endings are normalized to UNIX line endings.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Unless specifically overridden, streams appear uncompressed
|
||
(when qpdf supports the filters and they are compressed with a
|
||
non-lossy compression scheme), and most content streams are
|
||
    normalized (line endings are converted to just UNIX-style
|
||
linefeeds).
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
    All stream lengths are represented as indirect objects, and the
|
||
stream length object is always the next object after the stream.
|
||
If the stream data does not end with a newline, an extra newline
|
||
is inserted, and a special comment appears after the stream
|
||
indicating that this has been done.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
    If the PDF file contains object streams and object stream
|
||
<emphasis>n</emphasis> contains <emphasis>k</emphasis> objects,
|
||
those objects are numbered from <emphasis>n+1</emphasis> through
|
||
<emphasis>n+k</emphasis>, and the object number/offset pairs
|
||
appear on a separate line for each object. Additionally, each
|
||
object in the object stream is preceded by a comment indicating
|
||
its object number and index. This makes it very easy to find
|
||
objects in object streams.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
All beginnings of objects, <literal>stream</literal> tokens,
|
||
<literal>endstream</literal> tokens, and
|
||
<literal>endobj</literal> tokens appear on lines by themselves.
|
||
A blank line follows every <literal>endobj</literal> token.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
If there is a cross-reference stream, it is unfiltered.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Page dictionaries and page content streams are marked with
|
||
special comments that make them easy to find.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Comments precede each object indicating the object number of the
|
||
corresponding object in the original file.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
<para>
|
||
When editing a QDF file, any edits can be made as long as the above
|
||
constraints are maintained. This means that you can freely edit a
|
||
page's content without worrying about messing up the QDF file. It
|
||
is also possible to add new objects so long as those objects are
|
||
added after the last object in the file or subsequent objects are
|
||
renumbered. If a QDF file has object streams in it, you can always
|
||
add the new objects before the xref stream and then change the
|
||
number of the xref stream, since nothing generally ever references
|
||
it by number.
|
||
</para>
|
||
<para>
|
||
It is not generally practical to remove objects from QDF files
|
||
without messing up object numbering, but if you remove all
|
||
references to an object, you can run qpdf on the file (after
|
||
running <command>fix-qdf</command>), and qpdf will omit the
|
||
now-orphaned object.
|
||
</para>
|
||
<para>
|
||
When <command>fix-qdf</command> is run, it goes through the file
|
||
and recomputes the following parts of the file:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
the <literal>/N</literal>, <literal>/W</literal>, and
|
||
<literal>/First</literal> keys of all object stream dictionaries
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
the pairs of numbers representing object numbers and offsets of
|
||
objects in object streams
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
all stream lengths
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
the cross-reference table or cross-reference stream
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
the offset to the cross-reference table or cross-reference
|
||
stream following the <literal>startxref</literal> token
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
</chapter>
|
||
<chapter id="ref.using-library">
|
||
<title>Using the QPDF Library</title>
|
||
<sect1 id="ref.using.from-cxx">
|
||
<title>Using QPDF from C++</title>
|
||
<para>
|
||
The source tree for the qpdf package has an
|
||
<filename>examples</filename> directory that contains a few
|
||
example programs. The <filename>qpdf/qpdf.cc</filename> source
|
||
file also serves as a useful example since it exercises almost all
|
||
of the qpdf library's public interface. The best source of
|
||
documentation on the library itself is reading comments in
|
||
<filename>include/qpdf/QPDF.hh</filename>,
|
||
<filename>include/qpdf/QPDFWriter.hh</filename>, and
|
||
<filename>include/qpdf/QPDFObjectHandle.hh</filename>.
|
||
</para>
|
||
<para>
|
||
All header files are installed in the <filename>include/qpdf</filename> directory. It
|
||
   is recommended that you use <literal>#include
|
||
   &lt;qpdf/QPDF.hh&gt;</literal> rather than adding
|
||
<filename>include/qpdf</filename> to your include path.
|
||
</para>
|
||
<para>
|
||
When linking against the qpdf static library, you may also need to
|
||
specify <literal>-lz -ljpeg</literal> on your link command. If
|
||
your system understands how to read libtool
|
||
<filename>.la</filename> files, this may not be necessary.
|
||
</para>
|
||
<para>
|
||
The qpdf library is safe to use in a multithreaded program, but no
|
||
individual <type>QPDF</type> object instance (including
|
||
<type>QPDF</type>, <type>QPDFObjectHandle</type>, or
|
||
<type>QPDFWriter</type>) can be used in more than one thread at a
|
||
time. Multiple threads may simultaneously work with different
|
||
instances of these and all other QPDF objects.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.using.other-languages">
|
||
<title>Using QPDF from other languages</title>
|
||
<para>
|
||
The qpdf library is implemented in C++, which makes it hard to use
|
||
directly in other languages. There are a few things that can help.
|
||
</para>
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term>“C”</term>
|
||
<listitem>
|
||
<para>
|
||
The qpdf library includes a “C” language interface
|
||
that provides a subset of the overall capabilities. The header
|
||
file <filename>qpdf/qpdf-c.h</filename> includes information
|
||
about its use. As long as you use a C++ linker, you can link C
|
||
programs with qpdf and use the C API. For languages that can
|
||
directly load methods from a shared library, the C API can also
|
||
be useful. People have reported success using the C API from
|
||
other languages on Windows by directly calling functions in the
|
||
DLL.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>Python</term>
|
||
<listitem>
|
||
<para>
|
||
A Python module called <ulink
|
||
url="https://pypi.org/project/pikepdf/">pikepdf</ulink>
|
||
provides a clean and highly functional set of Python bindings
|
||
to the qpdf library. Using pikepdf, you can work with PDF files
|
||
in a natural way and combine qpdf's capabilities with other
|
||
functionality provided by Python's rich standard library and
|
||
available modules.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>Other Languages</term>
|
||
<listitem>
|
||
<para>
|
||
Starting with version 8.3.0, the <command>qpdf</command>
|
||
command-line tool can produce a json representation of the PDF
|
||
file's non-content data. This can facilitate interacting
|
||
programmatically with PDF files through qpdf's command line
|
||
interface. For more information, please see <xref
|
||
linkend="ref.json"/>.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
</sect1>
|
||
</chapter>
|
||
<chapter id="ref.json">
|
||
<title>QPDF JSON</title>
|
||
<sect1 id="ref.json-overview">
|
||
<title>Overview</title>
|
||
<para>
|
||
Beginning with qpdf version 8.3.0, the <command>qpdf</command>
|
||
command-line program can produce a json representation of the
|
||
non-content data in a PDF file. It includes a dump in json format
|
||
of all objects in the PDF file excluding the content of streams.
|
||
This json representation makes it very easy to look in detail at
|
||
the structure of a given PDF file, and it also provides a great way
|
||
to work with PDF files programmatically from the command-line in
|
||
languages that can't call or link with the qpdf library directly.
|
||
Note that stream data can be extracted from PDF files using other
|
||
qpdf command-line options.
|
||
</para>
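  <para>
   As an illustration of working with the json output from another
   language, the Python sketch below builds the command line and
   parses the result. The function names are hypothetical. Only the
   <option>--json</option>, <option>--json-key</option>, and
   <option>--json-object</option> options described in <xref
   linkend="ref.testing-options"/> are used.
  </para>

```python
import json
import subprocess


def json_command(infile, keys=(), objects=()):
    """Build a qpdf command that writes the json representation to stdout.

    keys and objects are repeatable, so each value becomes its own option.
    """
    command = ["qpdf", "--json"]
    command += ["--json-key=" + key for key in keys]
    command += ["--json-object=" + spec for spec in objects]
    return command + [infile]


def load_qpdf_json(infile, keys=(), objects=()):
    """Run qpdf and return the parsed top-level dictionary.

    Requires a qpdf version with json support on the PATH.
    """
    done = subprocess.run(
        json_command(infile, keys, objects),
        stdout=subprocess.PIPE,
        check=True,
    )
    return json.loads(done.stdout.decode("utf-8"))
```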
|
||
</sect1>
|
||
<sect1 id="ref.json-guarantees">
|
||
<title>JSON Guarantees</title>
|
||
<para>
|
||
The qpdf json representation includes a json serialization of the
|
||
raw objects in the PDF file as well as some computed information in
|
||
a more easily extracted format. QPDF provides some guarantees about
|
||
its json format. These guarantees are designed to simplify the
|
||
experience of a developer working with the JSON format.
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term>Compatibility</term>
|
||
<listitem>
|
||
<para>
|
||
The top-level json object output is a dictionary. The json
|
||
output contains various nested dictionaries and arrays. With
|
||
the exception of dictionaries that are populated by the fields
|
||
of objects from the file, all instances of a dictionary are
|
||
guaranteed to have exactly the same keys. Future versions of
|
||
qpdf are free to add additional keys but not to remove keys or
|
||
change the type of object that a key points to. The qpdf
|
||
program validates this guarantee, and in the unlikely event
|
||
that a bug in qpdf should cause it to generate data that
|
||
doesn't conform to this rule, it will ask you to file a bug
|
||
report.
|
||
</para>
|
||
<para>
|
||
The top-level json structure contains a
|
||
“<literal>version</literal>” key whose value is
|
||
       a simple integer. The value of the <literal>version</literal> key
|
||
will be incremented if a non-compatible change is made. A
|
||
non-compatible change would be any change that involves removal
|
||
of a key, a change to the format of data pointed to by a key,
|
||
or a semantic change that requires a different interpretation
|
||
of a previously existing key. A strong effort will be made to
|
||
avoid breaking compatibility.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>Documentation</term>
|
||
<listitem>
|
||
<para>
|
||
The <command>qpdf</command> command can be invoked with the
|
||
<option>--json-help</option> option. This will output a json
|
||
structure that has the same structure as the json output that
|
||
qpdf generates, except that each field in the help output is a
|
||
description of the corresponding field in the json output. The
|
||
specific guarantees are as follows:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
A dictionary in the help output means that the corresponding
|
||
location in the actual json output is also a dictionary with
|
||
exactly the same keys; that is, no keys present in help are
|
||
absent in the real output, and no keys will be present in
|
||
the real output that are not in help.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
A string in the help output is a description of the item
|
||
that appears in the corresponding location of the actual
|
||
output. The corresponding output can have any format.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
An array in the help output always contains a single
|
||
element. It indicates that the corresponding location in the
|
||
actual output is also an array, and that each element of the
|
||
array has whatever format is implied by the single element
|
||
of the help output's array.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
       For example, the help output includes a
|
||
“<literal>pagelabels</literal>” key whose value is
|
||
an array of one element. That element is a dictionary with keys
|
||
“<literal>index</literal>” and
|
||
“<literal>label</literal>”. In addition to
|
||
describing the meaning of those keys, this tells you that the
|
||
actual json output will contain a <literal>pagelabels</literal>
|
||
array, each of whose elements is a dictionary that contains an
|
||
<literal>index</literal> key, a <literal>label</literal> key,
|
||
and no other keys.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>Directness and Simplicity</term>
|
||
<listitem>
|
||
<para>
|
||
The json output contains the value of every object in the file,
|
||
but it also contains some processed data. This is analogous to
|
||
how qpdf's library interface works. The processed data is
|
||
similar to the helper functions in that it allows you to look
|
||
at certain aspects of the PDF file without having to understand
|
||
all the nuances of the PDF specification, while the raw objects
|
||
allow you to mine the PDF for anything that the higher-level
|
||
interfaces are lacking.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
</para>
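  <para>
   The three documentation guarantees can be expressed as a short
   recursive check. The Python sketch below is a simplified
   illustration, not the validation that qpdf itself performs. The
   function name is hypothetical, and the sketch ignores two points
   for brevity: the exception for dictionaries populated from object
   fields, and the possibility that a value is null.
  </para>

```python
def conforms(help_node, actual):
    """Check actual json output against the matching part of the help output."""
    if isinstance(help_node, dict):
        # A dictionary in help means a dictionary with exactly the same keys.
        if not isinstance(actual, dict) or set(help_node) != set(actual):
            return False
        return all(conforms(help_node[key], actual[key]) for key in help_node)
    if isinstance(help_node, list):
        # A help array has one element describing every element of the output.
        if not isinstance(actual, list):
            return False
        return all(conforms(help_node[0], item) for item in actual)
    # A help string is a description; the output may have any format.
    return True
```

  <para>
   Applied to the <literal>pagelabels</literal> example, a help
   fragment whose single array element has the keys
   <literal>index</literal> and <literal>label</literal> accepts any
   number of output elements with exactly those two keys and rejects
   an element that has an extra or missing key.
  </para>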
|
||
</sect1>
|
||
<sect1 id="json.limitations">
|
||
<title>Limitations of JSON Representation</title>
|
||
<para>
|
||
There are a few limitations to be aware of with the json structure:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Strings, names, and indirect object references in the original
|
||
PDF file are all converted to strings in the json
|
||
representation. In the case of a “normal” PDF file,
|
||
you can tell the difference because a name starts with a slash
|
||
(<literal>/</literal>), and an indirect object reference looks
|
||
like <literal>n n R</literal>, but if there were to be a string
|
||
that looked like a name or indirect object reference, there
|
||
would be no way to tell this from the json output. Note that
|
||
there are certain cases where you know for sure what something
|
||
is, such as knowing that dictionary keys in objects are always
|
||
names and that certain things in the higher-level computed data
|
||
are known to contain indirect object references.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The json format doesn't support binary data very well. Mostly
|
||
the details are not important, but they are presented here for
|
||
information. When qpdf outputs a string in the json
|
||
representation, it converts the string to UTF-8, assuming usual
|
||
PDF string semantics. Specifically, if the original string is
|
||
UTF-16, it is converted to UTF-8. Otherwise, it is assumed to
|
||
have PDF doc encoding, and is converted to UTF-8 with that
|
||
assumption. This causes strange things to happen to binary
|
||
strings. For example, if you had the binary string
|
||
      <literal>&lt;038051&gt;</literal>, this would be output to the
|
||
json as <literal>\u0003•Q</literal> because
|
||
<literal>03</literal> is not a printable character and
|
||
<literal>80</literal> is the bullet character in PDF doc
|
||
encoding and is mapped to the Unicode value
|
||
<literal>2022</literal>. Since <literal>51</literal> is
|
||
<literal>Q</literal>, it is output as is. If you wanted to
|
||
      convert back from here to a binary string, you would have to
|
||
recognize Unicode values whose code points are higher than
|
||
<literal>0xFF</literal> and map those back to their
|
||
corresponding PDF doc encoding characters. There is no way to
|
||
tell the difference between a Unicode string that was originally
|
||
encoded as UTF-16 or one that was converted from PDF doc
|
||
encoding. In other words, it's best if you don't try to use the
|
||
json format to extract binary strings from the PDF file, but if
|
||
you really had to, it could be done. Note that qpdf's
|
||
<option>--show-object</option> option does not have this
|
||
limitation and will reveal the string as encoded in the original
|
||
file.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
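  <para>
   Both limitations can be made concrete with a small Python sketch.
   The first function applies the heuristic for telling names and
   indirect object references apart from ordinary strings, with the
   ambiguity described above. The second reverses the string
   conversion for the <literal>&lt;038051&gt;</literal> example. All
   names here are hypothetical, and the reverse table is deliberately
   partial: it contains only the one mapping stated in the text
   (Unicode <literal>2022</literal> back to <literal>80</literal>). A
   real program would need the complete PDF doc encoding table.
  </para>

```python
import re

# An indirect object reference looks like "n n R".
REFERENCE = re.compile(r"^\d+ \d+ R$")

# Partial reverse table: only the mapping confirmed by the text.
PDF_DOC_REVERSE = {0x2022: 0x80}


def classify(value):
    """Guess what a json string stood for in the PDF file.

    A PDF string that happens to look like a name or a reference is
    misclassified; the json output gives no way to tell them apart.
    """
    if value.startswith("/"):
        return "name"
    if REFERENCE.match(value):
        return "reference"
    return "string"


def to_binary(value, reverse_table=PDF_DOC_REVERSE):
    """Map a json string back to the bytes of a PDF doc encoded string."""
    out = bytearray()
    for char in value:
        code_point = ord(char)
        if code_point > 0xFF:
            # Code points above 0xFF must go back through the encoding table.
            if code_point not in reverse_table:
                raise ValueError("no byte known for U+%04X" % code_point)
            out.append(reverse_table[code_point])
        else:
            out.append(code_point)
    return bytes(out)
```

  <para>
   With these definitions,
   <literal>to_binary("\u0003\u2022Q")</literal> gives back the three
   bytes <literal>03</literal>, <literal>80</literal>, and
   <literal>51</literal>. The result is only meaningful if the
   original string was not UTF-16, which cannot be determined from
   the json output.
  </para>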
|
||
</sect1>
|
||
<sect1 id="json.considerations">
|
||
<title>JSON: Special Considerations</title>
|
||
<para>
|
||
For the most part, the built-in JSON help tells you everything you
|
||
need to know about the JSON format, but there are a few
|
||
non-obvious things to be aware of:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
While qpdf guarantees that keys present in the help will be
|
||
present in the output, those fields may be null or empty if the
|
||
information is not known or absent in the file. Also, if you
|
||
specify <option>--json-keys</option>, the keys that are not
|
||
listed will be excluded entirely except for those that
|
||
<option>--json-help</option> says are always present.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
In a few places, there are keys with names containing
|
||
<literal>pageposfrom1</literal>. The values of these keys are
|
||
null or an integer. If an integer, they point to a page index
|
||
within the file numbering from 1. Note that json indexes from
|
||
0, and you would also use 0-based indexing using the API.
|
||
However, 1-based indexing is easier in this case because the
|
||
command-line syntax for specifying page ranges is 1-based. If
|
||
you were going to write a program that looked through the json
|
||
for information about specific pages and then use the
|
||
command-line to extract those pages, 1-based indexing is
|
||
easier. Besides, it's more convenient to subtract 1 in a
program written in a real programming language than it is to add 1 in
shell code.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The image information included in the <literal>page</literal>
|
||
section of the json output includes the key
|
||
“<literal>filterable</literal>”. Note that the
|
||
value of this field may depend on the
|
||
<option>--decode-level</option> that you invoke qpdf with. The
|
||
json output includes a top-level key
|
||
“<literal>parameters</literal>” that indicates the
|
||
decode level used for computing whether a stream was
|
||
filterable. For example, jpeg images will be shown as not
|
||
filterable by default, but they will be shown as filterable if
|
||
you run <command>qpdf --json --decode-level=all</command>.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
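<para>
As a small illustration of the <literal>pageposfrom1</literal>
convention, the following sketch shows both directions: subtracting
1 to index into a 0-based container, and joining 1-based values into
a page range for the command line. The function names are
hypothetical and are not part of qpdf.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Convert a "pageposfrom1" value to the 0-based index you would use
// with a vector of pages in a program.
std::size_t zero_based_index(int pageposfrom1)
{
    assert(pageposfrom1 >= 1);
    return static_cast<std::size_t>(pageposfrom1 - 1);
}

// Join 1-based page numbers into a comma-separated page range that
// can be passed unchanged on the command line.
std::string page_range(std::vector<int> const& pages_from_1)
{
    std::string range;
    for (std::size_t i = 0; i < pages_from_1.size(); ++i) {
        if (i > 0) {
            range += ",";
        }
        range += std::to_string(pages_from_1[i]);
    }
    return range;
}
```
]]></programlisting>
<para>
A range built this way from the values 3, 5, and 8 could be used as
<command>qpdf in.pdf --pages in.pdf 3,5,8 -- out.pdf</command>.
</para>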
|
||
</sect1>
|
||
</chapter>
|
||
<chapter id="ref.design">
|
||
<title>Design and Library Notes</title>
|
||
<sect1 id="ref.design.intro">
|
||
<title>Introduction</title>
|
||
<para>
|
||
This section was written prior to the implementation of the qpdf
|
||
package and was subsequently modified to reflect the
|
||
implementation. In some cases, for purposes of explanation, it
|
||
may differ slightly from the actual implementation. As always,
|
||
the source code and test suite are authoritative. Even if there
|
||
are some errors, this document should serve as a road map to
|
||
understanding how this code works.
|
||
</para>
|
||
<para>
|
||
In general, one should adhere strictly to a specification when
|
||
writing but be liberal in reading. This way, the product of our
|
||
software will be accepted by the widest range of other programs,
|
||
and we will accept the widest range of input files. This library
|
||
attempts to conform to that philosophy whenever possible but also
|
||
aims to provide strict checking for people who want to validate
|
||
PDF files. If you don't want to see warnings and are trying to
|
||
write something that is tolerant, you can call
|
||
<literal>setSuppressWarnings(true)</literal>. If you want to fail
|
||
on the first error, you can call
|
||
<literal>setAttemptRecovery(false)</literal>. The default behavior
|
||
is to generate warnings for recoverable problems. Note that
|
||
recovery will not always produce the desired results even if it is
|
||
able to get through the file. Unlike most other PDF programs that
|
||
produce generic warnings such as “This file is
|
||
damaged,&rdquo; qpdf generally issues a detailed error message
|
||
that would be most useful to a PDF developer. This is by design as
|
||
there seems to be a shortage of PDF validation tools out there.
|
||
This was, in fact, one of the major motivations behind the initial
|
||
creation of qpdf.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.design-goals">
|
||
<title>Design Goals</title>
|
||
<para>
|
||
The QPDF package includes support for reading and rewriting PDF
|
||
files. It aims to hide from the user details involving object
|
||
locations, modified (appended) PDF files, the
|
||
directness/indirectness of objects, and stream filters including
|
||
encryption. It does not aim to hide knowledge of the object
|
||
hierarchy or content stream contents. Put another way, a user of
|
||
the qpdf library is expected to have knowledge about how PDF files
|
||
work, but is not expected to have to keep track of bookkeeping
|
||
details such as file positions.
|
||
</para>
|
||
<para>
|
||
A user of the library never has to care whether an object is
|
||
direct or indirect, though it is possible to determine whether an
|
||
object is direct or not if this information is needed. All access
|
||
to objects deals with this transparently. All memory management
|
||
details are also handled by the library.
|
||
</para>
|
||
<para>
|
||
The <classname>PointerHolder</classname> object is used internally
|
||
by the library to deal with memory management. This is basically a
|
||
smart pointer object very similar in spirit to C++11's
|
||
<classname>std::shared_ptr</classname> object, but predating it by
|
||
several years. This library also makes use of a technique for
|
||
giving fine-grained access to methods in one class to other
|
||
classes by using public subclasses with friends and only private
|
||
members that in turn call private methods of the containing class.
|
||
See <classname>QPDFObjectHandle::Factory</classname> as an
|
||
example.
|
||
</para>
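<para>
The fine-grained access technique mentioned above can be shown with
a minimal, self-contained sketch. The class names
<classname>Store</classname> and <classname>Loader</classname> are
invented for this example; only the shape of the pattern (a public
nested class with friends and only private members) is taken from
the description above.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>

class Loader; // the only class allowed to create values

class Store
{
  public:
    // Public nested class with only private members. Its friends,
    // and nobody else, can reach Store's private method through it.
    class Factory
    {
        friend class Loader;

      private:
        static int create(Store& s, int seed)
        {
            return s.createPrivate(seed);
        }
    };
    friend class Factory;

    int created() const { return created_; }

  private:
    int createPrivate(int seed)
    {
        assert(seed >= 0);
        ++created_;
        return seed * 2;
    }
    int created_ = 0;
};

class Loader
{
  public:
    static int load(Store& s, int seed)
    {
        // Allowed: Loader is a friend of Store::Factory.
        return Store::Factory::create(s, seed);
    }
};
```
]]></programlisting>
<para>
Code outside <classname>Loader</classname> cannot call
<function>Store::Factory::create</function> because it is private,
so access to <function>createPrivate</function> is granted to exactly
one class without making that class a friend of all of
<classname>Store</classname>.
</para>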
|
||
<para>
|
||
The top-level qpdf class is <classname>QPDF</classname>. A
|
||
<classname>QPDF</classname> object represents a PDF file. The
|
||
library provides methods for both accessing and mutating PDF
|
||
files.
|
||
</para>
|
||
<para>
|
||
The primary class for interacting with PDF objects is
|
||
<classname>QPDFObjectHandle</classname>. Instances of this class
|
||
can be passed around by value, copied, stored in containers, etc.
|
||
with very low overhead. Instances of
|
||
<classname>QPDFObjectHandle</classname> created by reading from a
|
||
file will always contain a reference back to the
|
||
<classname>QPDF</classname> object from which they were created. A
|
||
<classname>QPDFObjectHandle</classname> may be direct or indirect.
|
||
If indirect, the <classname>QPDFObject</classname> the
|
||
<classname>PointerHolder</classname> initially points to is a null
|
||
pointer. In this case, the first attempt to access the underlying
|
||
<classname>QPDFObject</classname> will result in the
|
||
<classname>QPDFObject</classname> being resolved via a call to the
|
||
referenced <classname>QPDF</classname> instance. This makes it
|
||
essentially impossible to make coding errors in which certain
|
||
things will work for some PDF files and not for others based on
|
||
which objects are direct and which objects are indirect.
|
||
</para>
|
||
<para>
|
||
Instances of <classname>QPDFObjectHandle</classname> can be
|
||
directly created and modified using static factory methods in the
|
||
<classname>QPDFObjectHandle</classname> class. There are factory
|
||
methods for each type of object as well as a convenience method
|
||
<function>QPDFObjectHandle::parse</function> that creates an
|
||
object from a string representation of the object. Existing
|
||
instances of <classname>QPDFObjectHandle</classname> can also be
|
||
modified in several ways. See comments in
|
||
<filename>QPDFObjectHandle.hh</filename> for details.
|
||
</para>
|
||
<para>
|
||
An instance of <classname>QPDF</classname> is constructed by using
|
||
the class's default constructor. If desired, the
|
||
<classname>QPDF</classname> object may be configured with various
|
||
methods that change its default behavior. Then the
|
||
<function>QPDF::processFile()</function> method is passed the name
|
||
of a PDF file, which permanently associates the file with that
|
||
QPDF object. A password may also be given for access to
|
||
password-protected files. QPDF does not enforce encryption
|
||
parameters and will treat user and owner passwords equivalently.
|
||
Either password may be used to access an encrypted file.
|
||
<footnote>
|
||
<para>
|
||
As pointed out earlier, the intention is not for qpdf to be used
|
||
to bypass security on files, but as any open source PDF consumer
|
||
may be easily modified to bypass basic PDF document security,
|
||
and qpdf offers many transformations that can do this as well,
|
||
there seems to be little point in the added complexity of
|
||
conditionally enforcing document security.
|
||
</para>
|
||
</footnote>
|
||
<classname>QPDF</classname> will allow recovery of a user password
|
||
given an owner password. The input PDF file must be seekable.
|
||
(Output files written by <classname>QPDFWriter</classname> need
|
||
not be seekable, even when creating linearized files.) During
|
||
construction, <classname>QPDF</classname> validates the PDF file's
|
||
header, and then reads the cross reference tables and trailer
|
||
dictionaries. The <classname>QPDF</classname> class keeps only
|
||
the first trailer dictionary though it does read all of them so it
|
||
can check the <literal>/Prev</literal> key.
|
||
<classname>QPDF</classname> class users may request the root
|
||
object and the trailer dictionary specifically. The cross
|
||
reference table is kept private. Objects may then be requested by
|
||
number or by walking the object tree.
|
||
</para>
|
||
<para>
|
||
When a PDF file has a cross-reference stream instead of a
|
||
cross-reference table and trailer, requesting the document's
|
||
trailer dictionary returns the stream dictionary from the
|
||
cross-reference stream instead.
|
||
</para>
|
||
<para>
|
||
There are some convenience routines for very common operations
|
||
such as walking the page tree and returning a vector of all page
|
||
objects. For full details, please see the header files
|
||
<filename>QPDF.hh</filename> and
|
||
<filename>QPDFObjectHandle.hh</filename>. There are also some
|
||
additional helper classes that provide higher level API functions
|
||
for certain document constructions. These are discussed in <xref
|
||
linkend="ref.helper-classes"/>.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.helper-classes">
|
||
<title>Helper Classes</title>
|
||
<para>
|
||
QPDF version 8.1 introduced the concept of helper classes. Helper
|
||
classes are intended to contain higher level APIs that allow
|
||
developers to work with certain document constructs at an
|
||
abstraction level above that of
|
||
<classname>QPDFObjectHandle</classname> while staying true to
|
||
qpdf's philosophy of not hiding document structure from the
|
||
developer. As with qpdf in general, the goal is to take away some of
|
||
the more tedious bookkeeping aspects of working with PDF files,
|
||
not to remove the need for the developer to understand how the PDF
|
||
construction in question works. The driving factor behind the
|
||
creation of helper classes was to allow the evolution of higher
|
||
level interfaces in qpdf without polluting the interfaces of the
|
||
main top-level classes <classname>QPDF</classname> and
|
||
<classname>QPDFObjectHandle</classname>.
|
||
</para>
|
||
<para>
|
||
There are two kinds of helper classes:
|
||
<emphasis>document</emphasis> helpers and
|
||
<emphasis>object</emphasis> helpers. Document helpers are
|
||
constructed with a reference to a <classname>QPDF</classname>
|
||
object and provide methods for working with structures that are at
|
||
the document level. Object helpers are constructed with an
|
||
instance of a <classname>QPDFObjectHandle</classname> and provide
|
||
methods for working with specific types of objects.
|
||
</para>
|
||
<para>
|
||
Examples of document helpers include
|
||
<classname>QPDFPageDocumentHelper</classname>, which contains
|
||
methods for operating on the document's page trees, such as
|
||
enumerating all pages of a document and adding and removing pages;
|
||
and <classname>QPDFAcroFormDocumentHelper</classname>, which
|
||
contains document-level methods related to interactive forms, such
|
||
as enumerating form fields and creating mappings between form
|
||
fields and annotations.
|
||
</para>
|
||
<para>
|
||
Examples of object helpers include
|
||
<classname>QPDFPageObjectHelper</classname> for performing
|
||
operations on pages such as page rotation and some operations on
|
||
content streams, <classname>QPDFFormFieldObjectHelper</classname>
|
||
for performing operations related to interactive form fields, and
|
||
<classname>QPDFAnnotationObjectHelper</classname> for working with
|
||
annotations.
|
||
</para>
|
||
<para>
|
||
It is always possible to retrieve the underlying
|
||
<classname>QPDF</classname> reference from a document helper and
|
||
the underlying <classname>QPDFObjectHandle</classname> reference
|
||
from an object helper. Helpers are designed to be helpers, not
|
||
wrappers. The intention is that, in general, it is safe to freely
|
||
intermix operations that use helpers with operations that use the
|
||
underlying objects. Document and object helpers do not attempt to
|
||
provide a complete interface for working with the things they are
|
||
helping with, nor do they attempt to encapsulate underlying
|
||
structures. They just provide a few methods to help with
|
||
error-prone, repetitive, or complex tasks. In some cases, a helper
|
||
object may cache some information that is expensive to gather. In
|
||
such cases, the helper classes are implemented so that their own
|
||
methods keep the cache consistent, and the header file will
|
||
provide a method to invalidate the cache and a description of what
|
||
kinds of operations would make the cache invalid. If in doubt, you
|
||
can always discard a helper class and create a new one with the
|
||
same underlying objects, which will ensure that you have discarded
|
||
any stale information.
|
||
</para>
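<para>
The caching behavior described above can be modeled with a toy
helper. This is a sketch under stated assumptions, not one of qpdf's
helper classes: <classname>Doc</classname> stands in for the
underlying document, and <classname>PageCountHelper</classname> and
its methods are hypothetical names.
</para>
<programlisting><![CDATA[
```cpp
#include <cstddef>
#include <vector>

// Stand-in for the underlying document; kids plays the role of a
// page list.
struct Doc
{
    std::vector<int> kids;
};

class PageCountHelper
{
  public:
    explicit PageCountHelper(Doc& doc) :
        doc_(doc), valid_(false), count_(0)
    {
    }

    // The underlying object is always retrievable.
    Doc& getDoc() { return doc_; }

    std::size_t count()
    {
        if (!valid_) {
            count_ = doc_.kids.size(); // imagine this being expensive
            valid_ = true;
        }
        return count_;
    }

    // The helper's own methods keep the cache consistent.
    void addPage(int page)
    {
        doc_.kids.push_back(page);
        valid_ = false;
    }

    // For changes made to the underlying object behind the helper's back.
    void invalidateCache() { valid_ = false; }

  private:
    Doc& doc_;
    bool valid_;
    std::size_t count_;
};
```
]]></programlisting>
<para>
If code modifies <varname>kids</varname> directly after
<function>count</function> has been called, the helper returns the
stale value until <function>invalidateCache</function> is called or
a new helper is constructed around the same object.
</para>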
|
||
<para>
|
||
By convention, document helpers are called
|
||
<classname>QPDFSomethingDocumentHelper</classname> and are derived
|
||
from <classname>QPDFDocumentHelper</classname>, and object helpers
|
||
are called <classname>QPDFSomethingObjectHelper</classname> and
|
||
are derived from <classname>QPDFObjectHelper</classname>. For
|
||
details on specific helpers, please see their header files. You
|
||
can find them by looking at
|
||
<filename>include/qpdf/QPDF*DocumentHelper.hh</filename> and
|
||
<filename>include/qpdf/QPDF*ObjectHelper.hh</filename>.
|
||
</para>
|
||
<para>
|
||
In order to avoid creation of circular dependencies, the following
|
||
general guidelines are followed with helper classes:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Core class interfaces do not know about helper classes. For
|
||
example, no methods of <classname>QPDF</classname> or
|
||
<classname>QPDFObjectHandle</classname> will include helper
|
||
classes in their interfaces.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Interfaces of object helpers will usually not use document
|
||
helpers in their interfaces. This is because it is much more
|
||
useful for document helpers to have methods that return object
|
||
helpers. Most operations in PDF files start at the document
|
||
level and go from there to the object level rather than the
|
||
other way around. It can sometimes be useful to map back from
|
||
object-level structures to document-level structures. If there
|
||
is a desire to do this, it will generally be provided by a
|
||
method in the document helper class.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Most of the time, object helpers don't know about other object
|
||
helpers. However, in some cases, one type of object may be a
|
||
container for another type of object, in which case it may make
|
||
sense for the outer object to know about the inner object. For
|
||
example, there are methods in the
|
||
<classname>QPDFPageObjectHelper</classname> that know about
|
||
<classname>QPDFAnnotationObjectHelper</classname> because
|
||
references to annotations are contained in page dictionaries.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Any helper or core library class may use helpers in their
|
||
implementations.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
<para>
|
||
Prior to qpdf version 8.1, higher level interfaces were added as
|
||
“convenience functions” in either
|
||
<classname>QPDF</classname> or
|
||
<classname>QPDFObjectHandle</classname>. For compatibility, older
|
||
convenience functions for operating with pages will remain in
|
||
those classes even as alternatives are provided in helper classes.
|
||
Going forward, new higher level interfaces will be provided using
|
||
helper classes.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.implementation-notes">
|
||
<title>Implementation Notes</title>
|
||
<para>
|
||
This section contains a few notes about QPDF's internal
|
||
implementation, particularly around what it does when it first
|
||
processes a file. This section is a bit of a simplification of
|
||
what it actually does, but it could serve as a starting point to
|
||
someone trying to understand the implementation. There is nothing
|
||
in this section that you need to know to use the qpdf library.
|
||
</para>
|
||
<para>
|
||
<classname>QPDFObject</classname> is the basic PDF Object class.
|
||
It is an abstract base class from which are derived classes for
|
||
each type of PDF object. Clients do not interact with Objects
|
||
directly but instead interact with
|
||
<classname>QPDFObjectHandle</classname>.
|
||
</para>
|
||
<para>
|
||
When the <classname>QPDF</classname> class creates a new object,
|
||
it dynamically allocates the appropriate type of
|
||
<classname>QPDFObject</classname> and immediately hands the
|
||
pointer to an instance of <classname>QPDFObjectHandle</classname>.
|
||
The parser reads a token from the current file position. If the
|
||
token is not a dictionary or array opener, an object is
|
||
immediately constructed from the single token and the parser
|
||
returns. Otherwise, the parser iterates in a special mode in which
|
||
it accumulates objects until it finds a balancing closer. During
|
||
this process, the “<literal>R</literal>” keyword is
|
||
recognized and an indirect <classname>QPDFObjectHandle</classname>
|
||
may be constructed.
|
||
</para>
|
||
<para>
|
||
The <function>QPDF::resolve()</function> method, which is used to
|
||
resolve an indirect object, may be invoked from the
|
||
<classname>QPDFObjectHandle</classname> class. It first checks a
|
||
cache to see whether this object has already been read. If not,
|
||
it reads the object from the PDF file and caches it. It then
|
||
returns the resulting <classname>QPDFObjectHandle</classname>.
|
||
The calling object handle then replaces its
|
||
<classname>PointerHolder&lt;QPDFObject&gt;</classname> with the one
|
||
from the newly returned <classname>QPDFObjectHandle</classname>.
|
||
In this way, only a single copy of any direct object need exist
|
||
and clients can access objects transparently without knowing
|
||
or caring whether they are direct or indirect objects. Additionally,
|
||
no object is ever read from the file more than once. That means
|
||
that only the portions of the PDF file that are actually needed
|
||
are ever read from the input file, thus allowing the qpdf package
|
||
to take advantage of this important design goal of PDF files.
|
||
</para>
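<para>
The resolve-and-cache behavior can be modeled in a few lines. The
sketch below is a simplification written for this explanation, not
qpdf's implementation: it uses <classname>std::shared_ptr</classname>
in place of <classname>PointerHolder</classname>, a map of strings in
place of a real file, and the invented class names
<classname>Document</classname> and <classname>Handle</classname>.
</para>
<programlisting><![CDATA[
```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Object
{
    std::string value;
};

typedef std::pair<int, int> ObjGen; // object ID and generation

class Document
{
  public:
    Document() : reads_(0) {}

    // Pretend "file": raw text of each indirect object.
    void addRaw(int id, int gen, std::string const& text)
    {
        file_[ObjGen(id, gen)] = text;
    }

    // Check the cache first; read from the "file" only on a miss.
    std::shared_ptr<Object> resolve(int id, int gen)
    {
        ObjGen og(id, gen);
        std::map<ObjGen, std::shared_ptr<Object> >::iterator it =
            cache_.find(og);
        if (it == cache_.end()) {
            ++reads_;
            std::shared_ptr<Object> obj = std::make_shared<Object>();
            obj->value = file_.at(og);
            it = cache_.insert(std::make_pair(og, obj)).first;
        }
        return it->second;
    }

    int reads() const { return reads_; }

  private:
    std::map<ObjGen, std::string> file_;
    std::map<ObjGen, std::shared_ptr<Object> > cache_;
    int reads_;
};

class Handle
{
  public:
    Handle(Document& doc, int id, int gen) :
        doc_(&doc), id_(id), gen_(gen)
    {
    }

    bool resolved() const { return static_cast<bool>(obj_); }

    // First access resolves; the handle then keeps the shared pointer.
    std::string const& value()
    {
        if (!obj_) {
            obj_ = doc_->resolve(id_, gen_);
        }
        return obj_->value;
    }

  private:
    Document* doc_;
    int id_;
    int gen_;
    std::shared_ptr<Object> obj_; // null until first access
};
```
]]></programlisting>
<para>
Two handles that refer to the same object ID and generation end up
sharing one <classname>Object</classname>, and the read counter
stays at 1 no matter how many handles access it.
</para>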
|
||
<para>
|
||
If the requested object is inside of an object stream, the object
|
||
stream itself is first read into memory. Then the tokenizer reads
|
||
objects from the memory stream based on the offset information
|
||
stored in the stream. Those individual objects are cached, after
|
||
which the temporary buffer holding the object stream contents is
|
||
discarded. In this way, the first time an object in an object
|
||
stream is requested, all objects in the stream are cached.
|
||
</para>
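<para>
To make the offset handling concrete, here is a minimal sketch of
splitting decoded object stream data into individual objects. It
assumes well-formed input with ascending offsets and does no
validation; <function>split_object_stream</function> is a
hypothetical name, and the returned map merely stands in for the
object cache. The parameters correspond to the
<literal>/N</literal> and <literal>/First</literal> keys of the
object stream dictionary.
</para>
<programlisting><![CDATA[
```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// data:  decoded contents of the object stream
// n:     number of objects in the stream (/N)
// first: offset of the first object within data (/First)
std::map<int, std::string>
split_object_stream(std::string const& data, int n, std::size_t first)
{
    // The stream begins with n pairs of integers: an object number
    // and that object's offset relative to first.
    std::istringstream header(data.substr(0, first));
    std::vector<std::pair<int, std::size_t> > index;
    for (int i = 0; i < n; ++i) {
        int objnum = 0;
        std::size_t offset = 0;
        header >> objnum >> offset;
        index.push_back(std::make_pair(objnum, offset));
    }

    // Each object runs up to the start of the next one (or to the
    // end of the data). Cache them all in one pass.
    std::map<int, std::string> cache;
    for (std::size_t i = 0; i < index.size(); ++i) {
        std::size_t begin = first + index[i].second;
        std::size_t end = (i + 1 < index.size())
            ? first + index[i + 1].second
            : data.size();
        while (end > begin &&
               std::isspace(static_cast<unsigned char>(data[end - 1]))) {
            --end;
        }
        cache[index[i].first] = data.substr(begin, end - begin);
    }
    return cache;
}
```
]]></programlisting>
<para>
Once the map has been filled, the buffer holding
<varname>data</varname> is no longer needed, which mirrors the
behavior described above.
</para>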
|
||
<para>
|
||
The following example should clarify how
|
||
<classname>QPDF</classname> processes a simple file.
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Client constructs <classname>QPDF</classname>
|
||
<varname>pdf</varname> and calls
|
||
<function>pdf.processFile("a.pdf");</function>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The <classname>QPDF</classname> class checks the beginning of
|
||
<filename>a.pdf</filename> for a PDF header. It then reads the
|
||
cross reference table mentioned at the end of the file,
|
||
ensuring that it is looking before the last
|
||
<literal>%%EOF</literal>. After getting to the
|
||
<literal>trailer</literal> keyword, it invokes the parser.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The parser sees &ldquo;<literal>&lt;&lt;</literal>&rdquo;, so
|
||
it calls itself recursively in dictionary creation mode.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
In dictionary creation mode, the parser keeps accumulating
|
||
objects until it encounters
|
||
“<literal>>></literal>”. Each object that is
|
||
read is pushed onto a stack. If
|
||
“<literal>R</literal>” is read, the last two
|
||
objects on the stack are inspected. If they are integers, they
|
||
are popped off the stack and their values are used to construct
|
||
an indirect object handle which is then pushed onto the stack.
|
||
When “<literal>>></literal>” is finally read,
|
||
the stack is converted into a
|
||
<classname>QPDF_Dictionary</classname> which is placed in a
|
||
<classname>QPDFObjectHandle</classname> and returned.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The resulting dictionary is saved as the trailer dictionary.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The <literal>/Prev</literal> key is searched. If present,
|
||
<classname>QPDF</classname> seeks to that point and repeats
|
||
except that the new trailer dictionary is not saved. If
|
||
<literal>/Prev</literal> is not present, the initial parsing
|
||
process is complete.
|
||
</para>
|
||
<para>
|
||
If there is an encryption dictionary, the document's encryption
|
||
parameters are initialized.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The client requests the root object. The
<classname>QPDF</classname> class gets the value of the root key
from the trailer dictionary and returns it. It is an unresolved
|
||
indirect <classname>QPDFObjectHandle</classname>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The client requests the <literal>/Pages</literal> key from the root
|
||
<classname>QPDFObjectHandle</classname>. The
|
||
<classname>QPDFObjectHandle</classname> notices that it is
|
||
indirect so it asks <classname>QPDF</classname> to resolve it.
|
||
<classname>QPDF</classname> looks in the object cache for an
|
||
object with the root dictionary's object ID and generation
|
||
number. Upon not seeing it, it checks the cross reference
|
||
table, gets the offset, and reads the object present at that
|
||
offset. It stores the result in the object cache and returns
|
||
the cached result. The calling
|
||
<classname>QPDFObjectHandle</classname> replaces its object
|
||
pointer with the one from the resolved
|
||
<classname>QPDFObjectHandle</classname>, verifies that it is a
|
||
valid dictionary object, and returns the (unresolved indirect)
|
||
<classname>QPDFObjectHandle</classname> for the top of the
|
||
Pages hierarchy.
|
||
</para>
|
||
<para>
|
||
As the client continues to request objects, the same process is
|
||
followed for each new requested object.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
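<para>
The dictionary creation step in the walk-through above can be
sketched as a small stack machine. This is a simplified model
written for illustration: it works on pre-split tokens, classifies
them only as integers or &ldquo;other&rdquo;, and returns the raw
stack rather than building a dictionary. The names
<classname>Item</classname> and <function>accumulate</function> are
invented for this example.
</para>
<programlisting><![CDATA[
```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Item
{
    enum Kind { integer, other, reference };
    Kind kind;
    std::string text; // token text (integer and other)
    int id;           // object ID (reference only)
    int gen;          // generation (reference only)
};

static bool is_integer(std::string const& token)
{
    if (token.empty()) {
        return false;
    }
    for (std::size_t i = 0; i < token.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(token[i]))) {
            return false;
        }
    }
    return true;
}

// Accumulate tokens until the closing ">>". When "R" follows two
// integers, replace them with a single indirect reference.
std::vector<Item> accumulate(std::vector<std::string> const& tokens)
{
    std::vector<Item> stack;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        std::string const& t = tokens[i];
        if (t == ">>") {
            break;
        }
        std::size_t n = stack.size();
        if (t == "R" && n >= 2 &&
            stack[n - 2].kind == Item::integer &&
            stack[n - 1].kind == Item::integer) {
            int id = std::stoi(stack[n - 2].text);
            int gen = std::stoi(stack[n - 1].text);
            stack.pop_back();
            stack.pop_back();
            stack.push_back(Item{Item::reference, "", id, gen});
        } else {
            Item::Kind kind = is_integer(t) ? Item::integer : Item::other;
            stack.push_back(Item{kind, t, 0, 0});
        }
    }
    return stack;
}
```
]]></programlisting>
<para>
For the tokens <literal>/Root 1 0 R /Size 7</literal>, the resulting
stack holds four items: the name, a reference to object 1 generation
0, the second name, and the integer 7. A real parser would then pair
these up as keys and values.
</para>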
|
||
</sect1>
|
||
<sect1 id="ref.casting">
|
||
<title>Casting Policy</title>
|
||
<para>
|
||
This section describes the casting policy followed by qpdf's
|
||
implementation. This is of no concern to qpdf's end users and
|
||
largely of no concern to people writing code that uses qpdf, but
|
||
it could be of interest to people who are porting qpdf to a new
|
||
platform or who are making modifications to the code.
|
||
</para>
|
||
<para>
|
||
The C++ code in qpdf is free of old-style casts except where
|
||
unavoidable (e.g. where the old-style cast is in a macro provided
|
||
by a third-party header file). When there is a need for a cast,
|
||
it is handled, in order of preference, by rewriting the code to
|
||
avoid the need for a cast, calling
|
||
<function>const_cast</function>, calling
|
||
<function>static_cast</function>, calling
|
||
<function>reinterpret_cast</function>, or calling some combination
|
||
of the above. As a last resort, a compiler-specific
|
||
<literal>#pragma</literal> may be used to suppress a warning that
|
||
we don't want to fix. Examples may include suppressing warnings
|
||
about the use of old-style casts in code that is shared between C
|
||
and C++ code.
|
||
</para>
|
||
<para>
|
||
The casting policy explicitly prohibits casting between integer
|
||
sizes for no purpose other than to quiet a compiler warning when
|
||
there is no reasonable chance of a problem resulting. The reason
|
||
for this exclusion is that the practice of adding these additional
|
||
casts precludes future use of additional compiler warnings as a
|
||
tool for making future improvements to this aspect of the code,
|
||
and it also damages the readability of the code.
|
||
</para>
|
||
<para>
|
||
There are a few significant areas where casting is common in the
|
||
qpdf sources or where casting would be required to quiet higher
|
||
levels of compiler warnings but is omitted at present:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<type>char</type> vs. <type>unsigned char</type>. For
|
||
historical reasons, there are a lot of places in qpdf's
|
||
internals that deal with <type>unsigned char</type>, which
|
||
means that a lot of casting is required to interoperate with
|
||
standard library calls and <type>std::string</type>. In
|
||
retrospect, qpdf should have probably used regular (signed)
|
||
<type>char</type> and <type>char*</type> everywhere and just
|
||
cast to <type>unsigned char</type> when needed, but it's too
|
||
late to make that change now. There are
|
||
<function>reinterpret_cast</function> calls to go between
|
||
<type>char*</type> and <type>unsigned char*</type>, and there
|
||
are <function>static_cast</function> calls to go between
|
||
<type>char</type> and <type>unsigned char</type>. These should
|
||
always be safe.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Non-const <type>unsigned char*</type> used in the
|
||
<type>Pipeline</type> interface. The pipeline interface has a
|
||
<function>write</function> call that uses <type>unsigned
|
||
char*</type> without a <type>const</type> qualifier. The main
|
||
reason for this is to support pipelines that make calls to
|
||
third-party libraries, such as zlib, that don't include
|
||
<type>const</type> in their interfaces. Unfortunately, there
|
||
are many places in the code where it is desirable to have
|
||
<type>const char*</type> with pipelines. None of the pipeline
|
||
implementations in qpdf currently modify the data passed to
|
||
write, and doing so would be counter to the intent of
|
||
<type>Pipeline</type>, but there is nothing in the code to
|
||
prevent this from being done. There are places in the code
|
||
where <function>const_cast</function> is used to remove the
|
||
const-ness of pointers going into <type>Pipeline</type>s. This
|
||
could theoretically be unsafe, but there is adequate testing to
|
||
assert that it is safe and will remain safe in qpdf's code.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<type>size_t</type> vs. <type>qpdf_offset_t</type>. This is
|
||
pretty much unavoidable since sizes are unsigned types and
|
||
offsets are signed types. Whenever it is necessary to seek by
|
||
an amount given by a <type>size_t</type>, it becomes necessary
|
||
to mix and match between <type>size_t</type> and
|
||
<type>qpdf_offset_t</type>. Additionally, qpdf sometimes
|
||
treats memory buffers like files (as with
|
||
<type>BufferInputSource</type>), and those seek interfaces have
|
||
to be consistent with file-based input sources. Neither gcc
|
||
nor MSVC gives warnings for this case by default, but both have
|
||
warning flags that can enable this. (MSVC:
|
||
<option>/w14267</option> or <option>/W3</option>, which also
|
||
enables some additional warnings that we ignore; gcc:
|
||
<option>-Wconversion -Wsign-conversion</option>). This could
|
||
matter for files whose sizes are larger than
|
||
2<superscript>63</superscript> bytes, but it is reasonable to
|
||
expect that a world where such files are common would also have
|
||
larger <type>size_t</type> and <type>qpdf_offset_t</type> types
|
||
in it. On most 64-bit systems at the time of this writing (the
|
||
release of version 4.1.0 of qpdf), both <type>size_t</type> and
|
||
<type>qpdf_offset_t</type> are 64-bit integer types, while on
|
||
many current 32-bit systems, <type>size_t</type> is a 32-bit
|
||
type while <type>qpdf_offset_t</type> is a 64-bit type. I am
|
||
not aware of any cases where 32-bit systems that have
|
||
<type>size_t</type> smaller than <type>qpdf_offset_t</type>
|
||
could run into problems. Although I can't conclusively rule
|
||
out the possibility of such problems existing, I suspect any
|
||
cases would be pretty contrived. In the event that someone
|
||
should produce a file that qpdf can't handle because of what is
|
||
suspected to be issues involving the handling of
<type>size_t</type> vs. <type>qpdf_offset_t</type> (such files
may behave properly on 64-bit systems but not on 32-bit systems
because they have very large embedded files or streams, for
example), the above-mentioned warning flags could be enabled
and all those implicit conversions could be carefully
scrutinized. (I have already gone through that exercise once
in adding support for files larger than 4 GB in size.) I
continue to be committed to supporting large files on 32-bit
systems, but I would not go to any lengths to support corner
cases involving large embedded files or large streams that work
on 64-bit systems but not on 32-bit systems because of
<type>size_t</type> being too small. It is reasonable to
assume that anyone working with such files would be using a
64-bit system anyway since many 32-bit applications would have
similar difficulties.
</para>
</listitem>
<listitem>
<para>
<type>size_t</type> vs. <type>int</type> or <type>long</type>.
There are some cases where <type>size_t</type> and
<type>int</type> or <type>long</type>, or <type>size_t</type>
and <type>unsigned int</type> or <type>unsigned long</type>, are
used interchangeably. These cases occur when working with very
small amounts of memory, such as with the bit readers (where
we're working with just a few bytes at a time), some cases of
<function>strlen</function>, and a few other cases. I have
scrutinized all of these cases and determined them to be safe,
but there is no mechanism in the code to ensure that new unsafe
conversions between <type>int</type> and <type>size_t</type>
aren't introduced short of good testing and strong awareness of
the issues. Again, if any such bugs are suspected in the
future, enabling the additional warning flags and scrutinizing
the warnings would be in order.
</para>
</listitem>
</itemizedlist>
</para>
<para>
To be clear, I believe qpdf to be well-behaved with respect to
sizes and offsets, and qpdf's test suite includes actual
generation and full processing of files larger than 4 GB in
size. The issues raised here are largely academic and should not
in any way be interpreted to mean that qpdf has practical problems
involving sloppiness with integer types. I also believe that
appropriate measures have been taken in the code to prevent problems
with signed vs. unsigned integers from resulting in memory
overwrites or other issues with potential security implications,
though there are never any absolute guarantees.
</para>
</sect1>
<sect1 id="ref.encryption">
<title>Encryption</title>
<para>
Encryption is supported transparently by qpdf. When opening a PDF
file, if an encryption dictionary exists, the
<classname>QPDF</classname> object processes this dictionary using
the password (if any) provided. The primary decryption key is
computed and cached. No further access is made to the encryption
dictionary after that time. When an object is read from a file,
the object ID and generation of the object in which it is
contained are always known. Using this information along with the
stored encryption key, all stream and string objects are
transparently decrypted. Raw encrypted objects are never stored
in memory. This way, nothing in the library ever has to know or
care whether it is reading an encrypted file.
</para>
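<para>
As an illustration of how an object's ID and generation combine with
the file's encryption key, the following Python sketch follows the
per-object key derivation that the PDF specification defines for RC4
and 128-bit AES encryption. It is not qpdf's C++ code, and the
function name <function>object_key</function> is an assumption made
for this example. With 256-bit keys, the specification uses the file
key directly, so no per-object derivation is involved.
</para>

```python
import hashlib


def object_key(file_key: bytes, obj_id: int, generation: int, aes: bool = False) -> bytes:
    """Derive the per-object key for the RC4 and AES-128 schemes."""
    # Append the low-order 3 bytes of the object number and the low-order
    # 2 bytes of the generation number, least significant byte first.
    data = file_key
    data += (obj_id % 2**24).to_bytes(3, "little")
    data += (generation % 2**16).to_bytes(2, "little")
    if aes:
        # The AES-128 variant mixes in the fixed bytes "sAlT".
        data += b"sAlT"
    # The object key is the first min(n + 5, 16) bytes of the MD5 digest,
    # where n is the length of the file key in bytes.
    return hashlib.md5(data).digest()[: min(len(file_key) + 5, 16)]
```

<para>
Because the object ID and generation are inputs to the hash, every
indirect object is encrypted with a different key even though only
one file key is cached.
</para>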
<para>
An interface is also provided for writing encrypted streams and
strings given an encryption key. This is used by
<classname>QPDFWriter</classname> when it rewrites encrypted
files.
</para>
<para>
When copying encrypted files, unless otherwise directed, qpdf will
preserve any encryption in force in the original file. qpdf can
do this with either the user or the owner password. There is no
difference in capability based on which password is used. When
40-bit or 128-bit encryption keys are used, the user password can be
recovered with the owner password. With 256-bit keys, the user and
owner passwords are used independently to encrypt the actual
encryption key, so while either can be used, the owner password
can no longer be used to recover the user password.
</para>
<para>
Starting with version 4.0.0, qpdf can read files that are not
encrypted but that contain encrypted attachments, but it cannot
write such files. qpdf also requires the password to be specified
in order to open the file, not just to extract attachments, since
once the file is open, all decryption is handled transparently.
When copying files like this while preserving encryption, qpdf
will apply the file's encryption to everything in the file, not
just to the attachments. When decrypting the file, qpdf will
decrypt the attachments. In general, when copying PDF files with
multiple encryption formats, qpdf will choose the newest format.
The only exception to this is that clear-text metadata will be
preserved as clear-text if it is that way in the original file.
</para>
</sect1>
<sect1 id="ref.random-numbers">
<title>Random Number Generation</title>
<para>
QPDF generates random numbers to support generation of encrypted
data. Versions prior to 5.0.1 used <function>random</function> or
<function>rand</function> from <filename>stdlib</filename> to
generate random numbers. Version 5.0.1 used operating
system-provided secure random number generation instead when it was
available, enabling use of <filename>stdlib</filename> random number
generation only if enabled by a compile-time option. Starting in
version 5.1.0, use of insecure random numbers was disabled unless
enabled at compile time. Starting in version 5.1.0, it is also
possible for you to disable use of OS-provided secure random
numbers. This is especially useful on Windows if you want to
avoid a dependency on Microsoft's cryptography API. In this case,
you must provide your own random data provider. Regardless of how
you compile qpdf, starting in version 5.1.0, it is possible for
you to provide your own random data provider at runtime. This
would enable you to use some software-based secure pseudorandom
number generator and to avoid use of whatever the operating system
provides. For details on how to do this, please refer to the
top-level README.md file in the source distribution and to comments
in <filename>QUtil.hh</filename>.
</para>
</sect1>
<sect1 id="ref.adding-and-remove-pages">
<title>Adding and Removing Pages</title>
<para>
While qpdf's API has supported adding and modifying objects for
some time, version 3.0 introduced specific methods for adding and
removing pages. These are largely convenience routines that
handle two tricky issues: pushing inheritable resources from the
<literal>/Pages</literal> tree down to individual pages and
manipulation of the <literal>/Pages</literal> tree itself. For
details, see <function>addPage</function> and surrounding methods
in <filename>QPDF.hh</filename>.
</para>
</sect1>
<sect1 id="ref.reserved-objects">
<title>Reserving Object Numbers</title>
<para>
Version 3.0 of qpdf introduced the concept of reserved objects.
These are seldom needed for ordinary operations, but there are
cases in which you may want to add a series of indirect objects
with references to each other to a <classname>QPDF</classname>
object. This causes a problem because you can't determine the
object ID that a new indirect object will have until you add it to
the <classname>QPDF</classname> object with
<function>QPDF::makeIndirectObject</function>. The only way to
add two mutually referential objects to a
<classname>QPDF</classname> object prior to version 3.0 would be
to add the new objects first and then make them refer to each
other after adding them. Now it is possible to create a
<firstterm>reserved object</firstterm> using
<function>QPDFObjectHandle::newReserved</function>. This is an
indirect object that stays “unresolved” even if it is
queried for its type. So now, if you want to create a set of
mutually referential objects, you can create reservations for each
one of them and use those reservations to construct the
references. When finished, you can call
<function>QPDF::replaceReserved</function> to replace the reserved
objects with the real ones. This functionality will never be
needed by most applications, but it is used internally by QPDF
when copying objects from other PDF files, as discussed in <xref
linkend="ref.foreign-objects"/>. For an example of how to use
reserved objects, search for <function>newReserved</function> in
<filename>test_driver.cc</filename> in qpdf's sources.
</para>
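<para>
The reservation idea can be modeled in a few lines of Python. This
is a toy sketch, not the qpdf API: the class
<classname>ToyDocument</classname> and its methods are hypothetical
names that only mimic the roles of
<function>newReserved</function> and
<function>replaceReserved</function>.
</para>

```python
class ToyDocument:
    """Hypothetical stand-in for an object table (not the qpdf API)."""

    RESERVED = object()  # marks a slot that has an ID but no value yet

    def __init__(self):
        self.objects = {}
        self.next_id = 1

    def new_reserved(self):
        # Hand out an object ID immediately; the value arrives later.
        obj_id = self.next_id
        self.next_id += 1
        self.objects[obj_id] = self.RESERVED
        return obj_id

    def replace_reserved(self, obj_id, value):
        if self.objects.get(obj_id) is not self.RESERVED:
            raise ValueError("object %d is not a reservation" % obj_id)
        self.objects[obj_id] = value


doc = ToyDocument()
a_id = doc.new_reserved()
b_id = doc.new_reserved()
# Each dictionary can name the other by ID before either one exists.
doc.replace_reserved(a_id, {"/Partner": ("ref", b_id)})
doc.replace_reserved(b_id, {"/Partner": ("ref", a_id)})
```

<para>
Both IDs are known before either dictionary is built, which is what
makes the mutual references possible in a single step.
</para>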
</sect1>
<sect1 id="ref.foreign-objects">
<title>Copying Objects From Other PDF Files</title>
<para>
Version 3.0 of qpdf introduced the ability to copy objects into a
<classname>QPDF</classname> object from a different
<classname>QPDF</classname> object, which we refer to as
<firstterm>foreign objects</firstterm>. This allows arbitrary
merging of PDF files. The “from”
<classname>QPDF</classname> object must remain valid after the
copy as discussed in the note below. The <command>qpdf</command>
command-line tool provides limited support for basic page
selection, including merging in pages from other files, but the
library's API makes it possible to implement arbitrarily complex
merging operations. The main method for copying foreign objects is
<function>QPDF::copyForeignObject</function>. This takes an
indirect object from another <classname>QPDF</classname> and
copies it recursively into this object while preserving all object
structure, including circular references. This means you can add a
direct object that you create from scratch to a
<classname>QPDF</classname> object with
<function>QPDF::makeIndirectObject</function>, and you can add an
indirect object from another file with
<function>QPDF::copyForeignObject</function>. The fact that
<function>QPDF::makeIndirectObject</function> does not
automatically detect a foreign object and copy it is an explicit
design decision. Copying a foreign object seems like a
sufficiently significant thing to do that it should be done
explicitly.
</para>
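<para>
To show why a recursive copy can preserve circular references, here
is a small Python sketch over toy object tables. It is an
illustration only and does not reflect how
<function>QPDF::copyForeignObject</function> is implemented; the
function names and the <literal>("ref", id)</literal> tuple used to
model an indirect reference are assumptions made for the example.
</para>

```python
def copy_foreign(src, dst, src_id, id_map=None):
    """Copy object src_id and everything it references from src into dst.

    src and dst are toy object tables (dict of ID to value). Returns the
    ID that the copy received in dst.
    """
    if id_map is None:
        id_map = {}
    if src_id in id_map:
        # Already copied or in progress: reuse the ID so cycles terminate.
        return id_map[src_id]
    new_id = max(dst, default=0) + 1
    id_map[src_id] = new_id
    dst[new_id] = None  # claim the slot before recursing
    dst[new_id] = _translate(src, dst, src[src_id], id_map)
    return new_id


def _translate(src, dst, value, id_map):
    """Rebuild value with every reference pointing at its copy in dst."""
    if isinstance(value, tuple) and value[:1] == ("ref",):
        return ("ref", copy_foreign(src, dst, value[1], id_map))
    if isinstance(value, dict):
        return {k: _translate(src, dst, v, id_map) for k, v in value.items()}
    if isinstance(value, list):
        return [_translate(src, dst, v, id_map) for v in value]
    return value
```

<para>
Recording the new ID before descending into the object is the step
that keeps a parent and child that point at each other from being
copied forever.
</para>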
<para>
The other way to copy foreign objects is by passing a page from
one <classname>QPDF</classname> to another by calling
<function>QPDF::addPage</function>. In contrast to
<function>QPDF::makeIndirectObject</function>, this method
automatically distinguishes between indirect objects in the
current file, foreign objects, and direct objects.
</para>
<para>
Please note: when you copy objects from one
<classname>QPDF</classname> to another, the source
<classname>QPDF</classname> object must remain valid until you
have finished with the destination object. This is because the
original object is still used to retrieve any referenced stream
data from the copied object.
</para>
</sect1>
<sect1 id="ref.rewriting">
<title>Writing PDF Files</title>
<para>
The qpdf library supports writing
<classname>QPDF</classname> objects to PDF files through the
<classname>QPDFWriter</classname> class. The
<classname>QPDFWriter</classname> class has two writing modes: one
for non-linearized files, and one for linearized files. See <xref
linkend="ref.linearization"/> for a description of how linearization
is implemented. This section describes how we write
non-linearized files including the creation of QDF files (see
<xref linkend="ref.qdf"/>).
</para>
<para>
This outline was written prior to implementation and is not
exactly accurate, but it provides a correct “notional”
idea of how writing works. Look at the code in
<classname>QPDFWriter</classname> for exact details.
<itemizedlist>
<listitem>
<para>
Initialize state:
<itemizedlist>
<listitem>
<para>
next object number = 1
</para>
</listitem>
<listitem>
<para>
object queue = empty
</para>
</listitem>
<listitem>
<para>
renumber table: old object id/generation to new id/0 = empty
</para>
</listitem>
<listitem>
<para>
xref table: new id -> offset = empty
</para>
</listitem>
</itemizedlist>
</para>
</listitem>
<listitem>
<para>
Create a QPDF object from a file.
</para>
</listitem>
<listitem>
<para>
Write header for new PDF file.
</para>
</listitem>
<listitem>
<para>
Request the trailer dictionary.
</para>
</listitem>
<listitem>
<para>
For each value that is an indirect object, grab the next object
number (via an operation that returns and increments the
number). Map object to new number in renumber table. Push
object onto queue.
</para>
</listitem>
<listitem>
<para>
While there are more objects on the queue:
<itemizedlist>
<listitem>
<para>
Pop queue.
</para>
</listitem>
<listitem>
<para>
Look up object's new number <emphasis>n</emphasis> in the
renumbering table.
</para>
</listitem>
<listitem>
<para>
Store current offset into xref table.
</para>
</listitem>
<listitem>
<para>
Write <literal><replaceable>n</replaceable> 0 obj</literal>.
</para>
</listitem>
<listitem>
<para>
If object is null, whether direct or indirect, write out
null, thus eliminating unresolvable indirect object
references.
</para>
</listitem>
<listitem>
<para>
If the object is a stream, write stream contents,
piped through any filters as required, to a memory buffer.
Use this buffer to determine the stream length.
</para>
</listitem>
<listitem>
<para>
If object is not a stream, array, or dictionary, write out
its contents.
</para>
</listitem>
<listitem>
<para>
If object is an array or dictionary (including stream),
traverse its elements (for array) or values (for
dictionaries), handling recursive dictionaries and arrays,
looking for indirect objects. When an indirect object is
found, if it is not resolvable, ignore. (This case is
handled when writing it out.) Otherwise, look it up in the
renumbering table. If not found, grab the next available
object number, assign to the referenced object in the
renumbering table, and push the referenced object onto the
queue. As a special case, when writing out a stream
dictionary, replace length, filters, and decode parameters
as required.
</para>
<para>
Write out dictionary or array, replacing any unresolvable
indirect object references with null (the PDF specification
says a reference to a nonexistent object is legal and resolves
to null) and any resolvable ones with references to the
renumbered objects.
</para>
</listitem>
<listitem>
<para>
If the object is a stream, write
<literal>stream\n</literal>, the stream contents (from the
memory buffer), and <literal>\nendstream\n</literal>.
</para>
</listitem>
<listitem>
<para>
When done, write <literal>endobj</literal>.
</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</itemizedlist>
</para>
<para>
Once we have finished the queue, all referenced objects will have
been written out and all deleted objects or unreferenced objects
will have been skipped. The new cross-reference table will
contain an offset for every new object number from 1 up to the
number of objects written. This can be used to write out a new
xref table. Finally we can write out the trailer dictionary with
appropriately computed /ID (see spec, 8.3, File Identifiers), the
cross reference table offset, and <literal>%%EOF</literal>.
</para>
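<para>
The queue-driven renumbering in the outline above can be sketched in
Python as follows. This is a notional model in the same spirit as
the outline, not <classname>QPDFWriter</classname>'s code: objects
live in a plain dictionary, an indirect reference is modeled as the
tuple <literal>("ref", id)</literal>, generation numbers are ignored,
and the function names are assumptions.
</para>

```python
from collections import deque


def find_refs(value):
    """Yield the IDs of all indirect references nested inside value."""
    if isinstance(value, tuple) and value[:1] == ("ref",):
        yield value[1]
    elif isinstance(value, dict):
        for item in value.values():
            yield from find_refs(item)
    elif isinstance(value, list):
        for item in value:
            yield from find_refs(item)


def plan_write(objects, trailer):
    """Return (renumber table, write order) for objects reachable from trailer."""
    renumber = {}   # old object ID to new object ID
    queue = deque()
    order = []      # old IDs in the order they would be written

    def enqueue(old_id):
        if old_id not in objects:
            return  # unresolvable reference: it would be written as null
        if old_id not in renumber:
            renumber[old_id] = len(renumber) + 1
            queue.append(old_id)

    for old_id in find_refs(trailer):
        enqueue(old_id)
    while queue:
        old_id = queue.popleft()
        order.append(old_id)
        for ref in find_refs(objects[old_id]):
            enqueue(ref)
    return renumber, order
```

<para>
Objects that nothing refers to never enter the queue, so they never
receive a new number and are dropped from the output.
</para>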
</sect1>
<sect1 id="ref.filtered-streams">
<title>Filtered Streams</title>
<para>
Support for streams is implemented through the
<classname>Pipeline</classname> interface which was designed for
this package.
</para>
<para>
When reading streams, create a series of
<classname>Pipeline</classname> objects. The
<classname>Pipeline</classname> abstract base class requires
implementation of <function>write()</function> and
<function>finish()</function> and provides an implementation of
<function>getNext()</function>. Each pipeline object, upon
receiving data, does whatever it is going to do and then writes
the data (possibly modified) to its successor. Alternatively, a
pipeline may be an end-of-the-line pipeline that does something
like store its output to a file or a memory buffer, ignoring any
successor. For additional details, look at
<filename>Pipeline.hh</filename>.
</para>
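<para>
The chaining pattern can be illustrated with a short Python sketch.
It mirrors the roles of <function>write()</function>,
<function>finish()</function>, and <function>getNext()</function>
but is not a translation of <filename>Pipeline.hh</filename>; the
classes <classname>Inflate</classname> and
<classname>Buffer</classname> are hypothetical names chosen for this
example.
</para>

```python
import zlib


class Pipeline:
    """Base class: subclasses supply write() and finish()."""

    def __init__(self, next_pipeline=None):
        self._next = next_pipeline

    def get_next(self):
        return self._next

    def write(self, data):
        raise NotImplementedError

    def finish(self):
        raise NotImplementedError


class Inflate(Pipeline):
    """Decompresses zlib data and hands the result to its successor."""

    def __init__(self, next_pipeline):
        super().__init__(next_pipeline)
        self._z = zlib.decompressobj()

    def write(self, data):
        out = self._z.decompress(data)
        if out:
            self.get_next().write(out)

    def finish(self):
        tail = self._z.flush()
        if tail:
            self.get_next().write(tail)
        self.get_next().finish()


class Buffer(Pipeline):
    """End-of-the-line pipeline that collects its input in memory."""

    def __init__(self):
        super().__init__(None)
        self.chunks = []
        self.finished = False

    def write(self, data):
        self.chunks.append(data)

    def finish(self):
        self.finished = True


compressed = zlib.compress(b"stream data")
sink = Buffer()
head = Inflate(sink)
head.write(compressed[:5])  # data may arrive in arbitrary pieces
head.write(compressed[5:])
head.finish()
result = b"".join(sink.chunks)
```

<para>
The caller only ever talks to the head of the chain; each stage
decides what reaches the next one.
</para>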
<para>
<classname>QPDF</classname> can read raw or filtered streams.
When reading a filtered stream, the <classname>QPDF</classname>
class creates one <classname>Pipeline</classname> object for
each appropriate filter and chains them together. The last
filter should write to whatever type of output is required. The
<classname>QPDF</classname> class has an interface to write raw or
filtered stream contents to a given pipeline.
</para>
</sect1>
</chapter>
<chapter id="ref.linearization">
<title>Linearization</title>
<para>
This chapter describes how <classname>QPDF</classname> and
<classname>QPDFWriter</classname> implement creation and processing
of linearized PDFs.
</para>
<sect1 id="ref.linearization-strategy">
<title>Basic Strategy for Linearization</title>
<para>
To avoid the incestuous problem of having the qpdf library
validate its own linearized files, we have a special linearized
file checking mode which can be invoked via <command>qpdf
--check-linearization</command> (or <command>qpdf
--check</command>). This mode reads the linearization parameter
dictionary and the hint streams and validates that object
ordering, parameters, and hint stream contents are correct. The
validation code was first tested against linearized files created
by external tools (Acrobat and pdlin) and then used to validate
files created by <classname>QPDFWriter</classname> itself.
</para>
</sect1>
<sect1 id="ref.linearized.preparation">
<title>Preparing For Linearization</title>
<para>
Before creating a linearized PDF file from any other PDF file, the
PDF file must be altered such that all page attributes are
propagated down to the page level (and not inherited from parents
in the <literal>/Pages</literal> tree). We also have to know
which objects refer to which other objects, being concerned with
page boundaries and a few other cases. We refer to this part of
preparing the PDF file as <firstterm>optimization</firstterm>,
discussed in <xref linkend="ref.optimization"/>. Note that, in
this context, the term <firstterm>optimization</firstterm> is a
qpdf term, and the term <firstterm>linearization</firstterm> is a
term from the PDF specification. Do not be confused by the fact
that many applications refer to linearization as optimization or
web optimization.
</para>
<para>
When creating linearized PDF files from optimized PDF files, there
are really only a few issues that need to be dealt with:
<itemizedlist>
<listitem>
<para>
Creation of hints tables
</para>
</listitem>
<listitem>
<para>
Placing objects in the correct order
</para>
</listitem>
<listitem>
<para>
Filling in offsets and byte sizes
</para>
</listitem>
</itemizedlist>
</para>
</sect1>
<sect1 id="ref.optimization">
<title>Optimization</title>
<para>
In order to perform various operations such as linearization and
splitting files into pages, it is necessary to know which objects
are referenced by which pages, page thumbnails, and root and
trailer dictionary keys. It is also necessary to ensure that all
page-level attributes appear directly at the page level and are
not inherited from parents in the pages tree.
</para>
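<para>
As a rough illustration of pushing inherited attributes down to the
page level, the following Python sketch walks a toy pages tree built
from nested dictionaries. The four attribute names are the ones the
PDF specification marks as inheritable; everything else, including
the function name <function>push_down</function> and the choice to
remove the attributes from interior nodes, is an assumption of this
sketch and not a description of
<filename>QPDF_optimization.cc</filename>.
</para>

```python
INHERITABLE = ("/Resources", "/MediaBox", "/CropBox", "/Rotate")


def push_down(node, inherited=None):
    """Copy inheritable attributes from /Pages nodes onto each page.

    node is a toy dictionary; interior nodes have "/Type": "/Pages" and a
    "/Kids" list of child dictionaries. Returns the pages in document order.
    """
    inherited = dict(inherited or {})
    if node.get("/Type") == "/Pages":
        for key in INHERITABLE:
            if key in node:
                # A value set lower in the tree overrides its ancestors.
                inherited[key] = node.pop(key)
        pages = []
        for kid in node.get("/Kids", []):
            pages.extend(push_down(kid, inherited))
        return pages
    for key, value in inherited.items():
        # A value already present on the page itself always wins.
        node.setdefault(key, value)
    return [node]
```

<para>
After the walk, every page carries its own copy of each attribute it
used to inherit, so a page can be interpreted without consulting its
ancestors.
</para>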
<para>
We refer to the process of enforcing these constraints as
<firstterm>optimization</firstterm>. As mentioned above, note
that some applications refer to linearization as optimization.
Although this optimization was initially motivated by the need to
create linearized files, we are using these terms separately.
</para>
<para>
PDF file optimization is implemented in the
<filename>QPDF_optimization.cc</filename> source file. That file
is richly commented and serves as the primary reference for the
optimization process.
</para>
<para>
After optimization has been completed, the private member
variables <varname>obj_user_to_objects</varname> and
<varname>object_to_obj_users</varname> in
<classname>QPDF</classname> have been populated. Any object that
has more than one value in the
<varname>object_to_obj_users</varname> table is shared. Any
object that has exactly one value in the
<varname>object_to_obj_users</varname> table is private. To find
all the private objects in a page or a trailer or root dictionary
key, one merely has to make this determination for each element in
the <varname>obj_user_to_objects</varname> table for the given
page or key.
</para>
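<para>
The shared versus private test is easy to picture with two
dictionaries. The Python sketch below borrows the table names from
the text but uses plain labels and integers in place of qpdf's
internal types; the function name
<function>private_objects</function> is an assumption.
</para>

```python
def private_objects(user, obj_user_to_objects, object_to_obj_users):
    """Return the objects used by user that no other user references."""
    return {
        obj
        for obj in obj_user_to_objects[user]
        if len(object_to_obj_users[obj]) == 1
    }


# Toy tables: users are labels, objects are object numbers.
obj_user_to_objects = {
    "page 1": {4, 5, 9},
    "page 2": {6, 9},
    "root /Outlines": {7},
}
# Invert the first table to obtain the second one.
object_to_obj_users = {}
for user, objs in obj_user_to_objects.items():
    for obj in objs:
        object_to_obj_users.setdefault(obj, set()).add(user)
```

<para>
Here object 9 has two users and is therefore shared, while objects 4
and 5 are private to the first page.
</para>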
<para>
Note that pages and thumbnails have different object user types,
so the above test on a page will not include objects that are
referenced only by the page's thumbnail dictionary.
</para>
</sect1>
<sect1 id="ref.linearization.writing">
<title>Writing Linearized Files</title>
<para>
We will create files with only primary hint streams. We will
never write overflow hint streams. (As of PDF version 1.4,
Acrobat doesn't either, and they are never necessary.) The hint
streams contain offset information for objects, and those offsets
point to where the objects would be if the hint stream were not
present. This means that we have to calculate all object
positions before we can generate and write the hint table, so we
have to generate the file in two passes. To make this reliable,
<classname>QPDFWriter</classname> in linearization mode invokes
exactly the same code twice to write the file to a pipeline.
</para>
<para>
In the first pass, the target pipeline is a count pipeline chained
to a discard pipeline. The count pipeline simply passes its data
through to the next pipeline in the chain but can return the
number of bytes passed through it at any intermediate point. The
discard pipeline is an end-of-the-line pipeline that just throws its
data away. The hint stream is not written and dummy values with
adequate padding are stored in the first cross reference table,
linearization parameter dictionary, and /Prev key of the first
trailer dictionary. All the offset, length, object renumbering
information, and anything else we need for the second pass is
stored.
</para>
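<para>
A count pipeline feeding a discard pipeline can be sketched in a few
lines of Python. The classes and the
<function>first_pass</function> function below are illustrative
stand-ins, not qpdf's classes, and the serialization is reduced to
the bare minimum needed to show how offsets are collected without
writing any output.
</para>

```python
class Discard:
    """End-of-the-line pipeline that throws its data away."""

    def write(self, data):
        pass


class Count:
    """Passes data through unchanged and remembers how many bytes it saw."""

    def __init__(self, next_pipeline):
        self.next = next_pipeline
        self.count = 0

    def write(self, data):
        self.count += len(data)
        self.next.write(data)


def first_pass(objects):
    """Return the byte offset at which each serialized object would start."""
    out = Count(Discard())
    out.write(b"%PDF-1.4\n")
    offsets = {}
    for obj_id, body in objects:
        # The running count is the offset of whatever is written next.
        offsets[obj_id] = out.count
        out.write(b"%d 0 obj\n" % obj_id + body + b"\nendobj\n")
    return offsets
```

<para>
Swapping <classname>Discard</classname> for a pipeline that writes
to a file is all it would take to turn the same loop into the second
pass.
</para>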
<para>
At the end of the first pass, this information is passed to the
<classname>QPDF</classname> class which constructs a compressed
hint stream in a memory buffer and returns it.
<classname>QPDFWriter</classname> uses this information to write a
complete hint stream object into a memory buffer. At this point,
the length of the hint stream is known.
</para>
<para>
In the second pass, the end of the pipeline chain is a regular
file instead of a discard pipeline, and we have known values for
all the offsets and lengths that we didn't have in the first pass.
We have to adjust offsets that appear after the start of the hint
stream by the length of the hint stream, which is known. Anything
that is of variable length is padded, with the padding code
surrounding any writing code that differs in the two passes. This
ensures that changes to the way things are represented never
result in offsets that were gathered during the first pass
becoming incorrect for the second pass.
</para>
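<para>
The two ideas in this paragraph, fixed-width padding and shifting
offsets by the hint stream length, can be shown with a minimal
Python sketch. Zero-filling to ten digits is just one possible way
to pad and is an assumption here, as are the function names; the
text does not say how <classname>QPDFWriter</classname> pads its
values.
</para>

```python
def padded_int(value, width=10):
    """Render value in exactly width bytes so its size never changes."""
    text = str(value).zfill(width).encode("ascii")
    if len(text) != width:
        raise ValueError("value does not fit in the reserved width")
    return text


def shift_offsets(offsets, hint_start, hint_length):
    """Move every offset at or beyond hint_start forward by hint_length."""
    return {
        obj: off + hint_length if off >= hint_start else off
        for obj, off in offsets.items()
    }


# Pass 1 writes a placeholder; pass 2 writes the real value in the same space.
placeholder = padded_int(0)
final = padded_int(shift_offsets({7: 1200}, 500, 320)[7])
```

<para>
Because the placeholder and the final value occupy the same number
of bytes, nothing written after them moves between the two passes.
</para>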
<para>
Using this strategy, we can write linearized files to a
non-seekable output stream with only a single pass to disk or
wherever the output is going.
</para>
</sect1>
<sect1 id="ref.linearization-data">
<title>Calculating Linearization Data</title>
<para>
Once a file is optimized, we have information about which objects
access which other objects. We can then process these tables to
decide which part (as described in “Linearized PDF Document
Structure” in the PDF specification) each object is
contained within. This tells us the exact order in which objects
are written. The <classname>QPDFWriter</classname> class asks for
this information and enqueues objects for writing in the proper
order. It also turns on a check that causes an exception to be
thrown if an object is encountered that has not already been
queued. (This could happen only if there were a bug in the
traversal code used to calculate the linearization data.)
</para>
</sect1>
<sect1 id="ref.linearization-issues">
<title>Known Issues with Linearization</title>
<para>
There are a handful of known issues with this linearization code.
These issues do not appear to impact the behavior of linearized
files, which still work as intended: it is possible for a web
browser to begin to display them before they are fully
downloaded. In fact, it seems that various other programs that
create linearized files have many of these same issues. These
items make reference to terminology used in the linearization
appendix of the PDF specification.
<itemizedlist>
<listitem>
<para>
Thread Dictionary information keys appear in part 4 with the
rest of Threads instead of in part 9. Objects in part 9 are
not grouped together functionally.
</para>
</listitem>
<listitem>
<para>
We are not calculating numerators for shared object positions
within content streams or interleaving them within content
streams.
</para>
</listitem>
<listitem>
<para>
We generate only page offset, shared object, and outline hint
tables. It would be relatively easy to add some additional
tables. We gather most of the information needed to create
thumbnail hint tables. There are comments in the code about
this.
</para>
</listitem>
</itemizedlist>
</para>
</sect1>
<sect1 id="ref.linearization-debugging">
<title>Debugging Note</title>
<para>
The <command>qpdf --show-linearization</command> command can show
the complete contents of linearization hint streams. To look at
the raw data, you can extract the filtered contents of the
linearization hint tables using <command>qpdf --show-object=n
--filtered-stream-data</command>. Then, to convert this into a
bit stream (since linearization tables are bit streams written
without regard to byte boundaries), you can pipe the resulting
data through the following Perl code:

<programlisting>use bytes;
binmode STDIN;
# Slurp all of standard input at once.
undef $/;
my $a = &lt;STDIN&gt;;
my @ch = split(//, $a);
# Print each byte as eight binary digits.
map { printf("%08b", ord($_)) } @ch;
print "\n";
</programlisting>
</para>
</sect1>
</chapter>
<chapter id="ref.object-and-xref-streams">
<title>Object and Cross-Reference Streams</title>
<para>
This chapter provides information about the implementation of
object stream and cross-reference stream support in qpdf.
</para>
<sect1 id="ref.object-streams">
<title>Object Streams</title>
<para>
Object streams can contain any regular object except the
following:
<itemizedlist>
<listitem>
<para>
stream objects
</para>
</listitem>
<listitem>
<para>
objects with generation > 0
</para>
</listitem>
<listitem>
<para>
the encryption dictionary
</para>
</listitem>
<listitem>
<para>
objects containing the /Length of another stream
</para>
</listitem>
</itemizedlist>
In addition, Adobe Reader (at least as of version 8.0.0) appears
to not be able to handle having the document catalog appear in an
object stream if the file is encrypted, though this is not
specifically disallowed by the specification.
</para>
<para>
There are additional restrictions for linearized files. See <xref
linkend="ref.object-streams-linearization"/> for details.
</para>
<para>
The PDF specification refers to objects in object streams as
“compressed objects” regardless of whether the object
stream is compressed.
</para>
<para>
The generation number of every object in an object stream must be
zero. It is possible to delete and replace an object in an object
stream with a regular object.
</para>
<para>
The object stream dictionary has the following keys:
<itemizedlist>
<listitem>
<para>
<literal>/N</literal>: number of objects
</para>
</listitem>
<listitem>
<para>
<literal>/First</literal>: byte offset of first object
</para>
</listitem>
<listitem>
<para>
<literal>/Extends</literal>: indirect reference to stream that
this extends
</para>
</listitem>
</itemizedlist>
</para>
<para>
Stream collections are formed with <literal>/Extends</literal>.
They must form a directed acyclic graph. These can be used for
semantic information and are not meaningful to the PDF document's
syntactic structure. Although qpdf preserves stream collections,
it never generates them and doesn't make use of this information
in any way.
</para>
<para>
The specification recommends limiting the number of objects in
an object stream for efficiency in reading and decoding. Acrobat 6
uses no more than 100 objects per object stream for linearized
files and no more than 200 objects per stream for non-linearized
files. <classname>QPDFWriter</classname>, in object stream generation
mode, never puts more than 100 objects in an object stream.
</para>
<para>
Object stream contents consist of <emphasis>N</emphasis> pairs of
integers, each pair giving an object number and the byte offset
of that object relative to the first object in the stream, followed
by the objects themselves, concatenated.
</para>
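<para>
The layout just described can be decoded with a short Python
sketch. It operates on already-decompressed stream data and assumes
that the offsets in the header appear in increasing order; the
function name <function>split_object_stream</function> is
hypothetical and this is not how qpdf itself parses object streams.
</para>

```python
def split_object_stream(data, n, first):
    """Split decoded object stream data into {object number: raw bytes}.

    n and first are the values of /N and /First from the stream dictionary.
    """
    header = data[:first].split()
    if len(header) != 2 * n:
        raise ValueError("expected %d integers in the header" % (2 * n))
    numbers = [int(tok) for tok in header[0::2]]
    offsets = [int(tok) for tok in header[1::2]]
    # Each object runs until the next object's offset or the end of the data.
    ends = offsets[1:] + [len(data) - first]
    return {
        num: data[first + start:first + end].strip()
        for num, start, end in zip(numbers, offsets, ends)
    }
```

<para>
Note that the offsets are relative to <literal>/First</literal>, not
to the start of the stream data, which is why
<literal>first</literal> is added to every slice boundary.
</para>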
</sect1>
<sect1 id="ref.xref-streams">
<title>Cross-Reference Streams</title>
<para>
For non-hybrid files, the value following
<literal>startxref</literal> is the byte offset to the xref stream
rather than the word <literal>xref</literal>.
</para>
<para>
For hybrid files (files containing both xref tables and
cross-reference streams), the xref table's trailer dictionary
contains the key <literal>/XRefStm</literal> whose value is the
byte offset to a cross-reference stream that supplements the xref
table. A PDF 1.5-compliant application should read the xref table
first. Then it should replace any object that it has already seen
with any defined in the xref stream. Then it should follow any
<literal>/Prev</literal> pointer in the original xref table's
trailer dictionary. The specification is not clear about what
should be done, if anything, with a <literal>/Prev</literal>
pointer in the xref stream referenced by an xref table. The
<classname>QPDF</classname> class ignores it, which is probably
reasonable since, if this case were to appear for any sensible PDF
file, the previous xref table would probably have a corresponding
<literal>/XRefStm</literal> pointer of its own. For example, if a
hybrid file were appended, the appended section would have its own
xref table and <literal>/XRefStm</literal>. The appended xref
table would point to the previous xref table which would point to the
<literal>/XRefStm</literal>, meaning that the new
<literal>/XRefStm</literal> doesn't have to point to it.
</para>
<para>
|
||
Since xref streams must be read very early, they may not be
|
||
encrypted, and the may not contain indirect objects for keys
|
||
required to read them, which are these:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<literal>/Type</literal>: value <literal>/XRef</literal>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<literal>/Size</literal>: value <emphasis>n+1</emphasis>, where
<emphasis>n</emphasis> is the highest object number (same as
<literal>/Size</literal> in the trailer dictionary)
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<literal>/Index</literal> (optional): value
|
||
<literal>[<replaceable>n count</replaceable> ...]</literal>
|
||
used to determine which objects' information is stored in this
|
||
stream. The default is <literal>[0 /Size]</literal>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<literal>/Prev</literal>: value
|
||
<replaceable>offset</replaceable>: byte offset of previous xref
|
||
stream (same as <literal>/Prev</literal> in the trailer
|
||
dictionary)
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<literal>/W [...]</literal>: size in bytes of each field in an
xref stream entry
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
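<para>
As an illustration of how <literal>/Index</literal> and
<literal>/Size</literal> interact, the following Python sketch lists
the object numbers whose entries appear in an xref stream, in the
order the entries are stored. The function name
<literal>indexed_object_numbers</literal> is hypothetical, and the
arguments are assumed to have been read from the stream dictionary
already.
</para>
<programlisting><![CDATA[
```python
def indexed_object_numbers(size, index=None):
    """Return the object numbers described by an xref stream, in order.

    `size` is the value of /Size. `index` is the /Index array as a
    flat list [first, count, first, count, ...]; when it is omitted,
    the default [0, size] applies.
    """
    if index is None:
        index = [0, size]
    if len(index) % 2 != 0:
        raise ValueError("/Index must contain pairs of integers")
    numbers = []
    # Each pair names a run of `count` consecutive object numbers.
    for first, count in zip(index[0::2], index[1::2]):
        numbers.extend(range(first, first + count))
    return numbers
```
]]></programlisting>
<para>
With <literal>/Size 4</literal> and no <literal>/Index</literal>, the
stream describes objects 0 through 3. With
<literal>/Index [0 1 10 3]</literal>, it describes object 0 followed
by objects 10, 11, and 12.
</para>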
|
||
<para>
|
||
The other fields in the xref stream, which may be indirect if
|
||
desired, are the union of those from the xref table's trailer
|
||
dictionary.
|
||
</para>
|
||
<sect2 id="ref.xref-stream-data">
|
||
<title>Cross-Reference Stream Data</title>
|
||
<para>
|
||
The stream data is binary and encoded in big-endian byte order.
Entries are concatenated, and each entry has a length in bytes
equal to the sum of the values in <literal>/W</literal> above.
Each entry consists of one or more fields, the first of which is
the type of the entry. The number of bytes for each field is
given by <literal>/W</literal> above. A 0 in
<literal>/W</literal> indicates that the field is omitted and has
the default value. The default value for the type field is
&ldquo;<literal>1</literal>&rdquo;. All other default values are
&ldquo;<literal>0</literal>&rdquo;.
|
||
</para>
|
||
<para>
|
||
PDF 1.5 defines three entry types:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
0: for free objects. Format: <literal>0 obj
|
||
next-generation</literal>, same as the free table in a
|
||
traditional cross-reference table
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
1: regular non-compressed object. Format: <literal>1 offset
|
||
generation</literal>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
2: for objects in object streams. Format: <literal>2
object-stream-number index</literal>, the number of the object
stream containing the object and the index of the object within
that object stream.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
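<para>
The layout described above can be decoded with a few lines of
Python. The sketch below is only an illustration, not qpdf's code:
the name <literal>decode_xref_stream</literal> is hypothetical, the
data is assumed to have had its stream filters removed already, and
<literal>/W</literal> is assumed to have exactly three elements.
</para>
<programlisting><![CDATA[
```python
def decode_xref_stream(data, w):
    """Split xref stream data into (type, field2, field3) tuples.

    `data` is the unfiltered stream data as bytes and `w` is the
    three-element /W array giving the width in bytes of each field.
    """
    if len(w) != 3:
        raise ValueError("this sketch expects three /W values")
    entry_len = sum(w)
    if entry_len == 0 or len(data) % entry_len != 0:
        raise ValueError("data length is not a multiple of entry size")
    defaults = (1, 0, 0)  # an omitted type field means type 1
    entries = []
    for start in range(0, len(data), entry_len):
        pos = start
        fields = []
        for width, default in zip(w, defaults):
            if width == 0:
                fields.append(default)
            else:
                # Fields are unsigned big-endian integers.
                chunk = data[pos:pos + width]
                fields.append(int.from_bytes(chunk, "big"))
            pos += width
        entries.append(tuple(fields))
    return entries
```
]]></programlisting>
<para>
With <literal>/W [1 2 1]</literal>, the twelve bytes
<literal>00 00 00 00 01 00 11 00 02 00 07 03</literal> decode to a
free entry <literal>0 0 0</literal>, an uncompressed object at offset
17 with generation 0, and an object stored at index 3 of object
stream 7.
</para>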
|
||
<para>
|
||
It seems standard to have the first entry in the table be
|
||
<literal>0 0 0</literal> instead of <literal>0 0 ffff</literal>
|
||
if there are no deleted objects.
|
||
</para>
|
||
</sect2>
|
||
</sect1>
|
||
<sect1 id="ref.object-streams-linearization">
|
||
<title>Implications for Linearized Files</title>
|
||
<para>
|
||
For linearized files, the linearization dictionary, document
|
||
catalog, and page objects may not be contained in object streams.
|
||
</para>
|
||
<para>
|
||
Objects stored within object streams are given the highest range
|
||
of object numbers within the main and first-page cross-reference
|
||
sections.
|
||
</para>
|
||
<para>
|
||
It is okay to use cross-reference streams in place of regular xref
|
||
tables. There are no special considerations.
|
||
</para>
|
||
<para>
|
||
Hint data refers to object streams themselves, not the objects in
|
||
the streams. Shared object references should also be made to the
|
||
object streams. There are no references in any hint tables to the
|
||
object numbers of compressed objects (objects within object
|
||
streams).
|
||
</para>
|
||
<para>
|
||
When numbering objects, all shared objects within both the first
|
||
and second halves of the linearized files must be numbered
|
||
consecutively after all normal uncompressed objects in that half.
|
||
</para>
|
||
</sect1>
|
||
<sect1 id="ref.object-stream-implementation">
|
||
<title>Implementation Notes</title>
|
||
<para>
|
||
There are three modes for writing object streams:
|
||
<option>disable</option>, <option>preserve</option>, and
|
||
<option>generate</option>. In disable mode, we do not generate
|
||
any object streams, and we also generate an xref table rather than
|
||
xref streams. This can be used to generate PDF files that are
|
||
viewable with older readers. In preserve mode, we write object
|
||
streams such that written object streams contain the same objects
|
||
and <literal>/Extends</literal> relationships as in the original
|
||
file. This is equivalent to disable mode if the file has no object streams.
|
||
In generate mode, we create object streams ourselves by grouping
|
||
objects that are allowed in object streams together in sets of no
|
||
more than 100 objects. We also ensure that the PDF version is at
|
||
least 1.5 in generate mode, but we preserve the version header in
|
||
the other modes. The default is <option>preserve</option>.
|
||
</para>
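<para>
The grouping and version rules for generate mode can be sketched in
Python as follows. This is an illustration under stated assumptions
rather than qpdf's code: both function names are hypothetical, the
caller is assumed to have already decided which objects are allowed
in object streams, and PDF versions are represented as
<literal>(major, minor)</literal> tuples.
</para>
<programlisting><![CDATA[
```python
def group_for_object_streams(eligible, max_per_stream=100):
    """Split eligible object numbers into groups of limited size."""
    if max_per_stream < 1:
        raise ValueError("max_per_stream must be positive")
    return [eligible[i:i + max_per_stream]
            for i in range(0, len(eligible), max_per_stream)]


def output_pdf_version(mode, input_version):
    """Return the version to write for an object stream mode.

    `mode` is "disable", "preserve", or "generate". Only generate
    mode raises the version, and only when it is below 1.5.
    """
    if mode == "generate":
        return max(input_version, (1, 5))
    return input_version
```
]]></programlisting>
<para>
For example, 250 eligible objects are split into groups of 100, 100,
and 50, and a PDF 1.3 input is written as PDF 1.5 in generate mode
but keeps its 1.3 header in the other modes.
</para>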
|
||
<para>
|
||
We do not support creation of hybrid files. When we write files,
|
||
even in preserve mode, we will lose any xref tables and merge any
|
||
appended sections.
|
||
</para>
|
||
</sect1>
|
||
</chapter>
|
||
<appendix id="ref.release-notes">
|
||
<title>Release Notes</title>
|
||
<para>
|
||
For a detailed list of changes, please see the file
|
||
<filename>ChangeLog</filename> in the source distribution.
|
||
</para>
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term>8.2.1: August 18, 2018</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Command-line Enhancements
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Add
|
||
<option>--keep-files-open=<replaceable>[yn]</replaceable></option>
|
||
to override default determination of whether to keep files
|
||
open when merging. Please see the discussion of
|
||
<option>--keep-files-open</option> in <xref
|
||
linkend="ref.basic-options"/> for additional details.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>8.2.0: August 16, 2018</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Command-line Enhancements
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Add <option>--no-warn</option> option to suppress issuing
|
||
warning messages. If there are any conditions that would
|
||
have caused warnings to be issued, the exit status is still
|
||
3.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug Fixes and Optimizations
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Performance fix: optimize page merging operation to avoid
|
||
unnecessary open/close calls on files being merged. This
|
||
solves a dramatic slow-down that was observed when merging
|
||
certain types of files.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Optimize how memory was used for the TIFF predictor,
|
||
drastically improving performance and memory usage for files
|
||
containing high-resolution images compressed with Flate
|
||
using the TIFF predictor.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: end of line characters were not properly handled
|
||
inside strings in some cases.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: using <option>--progress</option> on very small
|
||
files could cause an infinite loop.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
API enhancements
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Add new class <classname>QPDFSystemError</classname>, derived
|
||
from <classname>std::runtime_error</classname>, which is now
|
||
thrown by <function>QUtil::throw_system_error</function>.
|
||
This enables the triggering <classname>errno</classname>
|
||
value to be retrieved.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add <function>ClosedFileInputSource::stayOpen</function>
|
||
method, enabling a
|
||
<classname>ClosedFileInputSource</classname> to stay open
|
||
during manually indicated periods of high activity, thus
|
||
reducing the overhead of frequent open/close operations.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Build Changes
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
For the mingw builds, change the name of the DLL import
|
||
library from <filename>libqpdf.a</filename> to
|
||
<filename>libqpdf.dll.a</filename> to more accurately
|
||
reflect that it is an import library rather than a static
|
||
library. This potentially clears the way for supporting a
|
||
static library in the future, though presently, the qpdf
|
||
Windows build only builds the DLL and executables.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>8.1.0: June 23, 2018</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Usability Improvements
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
When splitting files, qpdf detects fonts and images that the
|
||
document metadata claims are referenced from a page but are
|
||
not actually referenced and omits them from the output file.
|
||
This change can cause a significant reduction in the size of
|
||
split PDF files for files created by some software packages.
|
||
Prior versions of qpdf would believe the document metadata
|
||
and sometimes include all the images from all the other
|
||
pages even though the pages were no longer present. In the
|
||
unlikely event that the old behavior should be desired, it
|
||
can be enabled by specifying
|
||
<option>--preserve-unreferenced-resources</option>. For
|
||
additional details, please see <xref
|
||
linkend="ref.advanced-transformation"/>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
When merging multiple PDF files, qpdf no longer leaves all
|
||
the files open. This makes it possible to merge numbers of
|
||
files that may exceed the operating system's limit for the
|
||
maximum number of open files.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The <option>--rotate</option> option's syntax has been
|
||
extended to make the page range optional. If you specify
|
||
<option>--rotate=<replaceable>angle</replaceable></option>
|
||
without specifying a page range, the rotation will be
|
||
applied to all pages. This can be especially useful for
|
||
adjusting a PDF created from a multi-page document that
|
||
was scanned upside down.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
When merging multiple files, the <option>--verbose</option>
|
||
option now prints information about each file as it operates
|
||
on that file.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
When the <option>--progress</option> option is specified,
|
||
qpdf will print a running indicator of its best guess at how
|
||
far through the writing process it is. Note that, as with
|
||
all progress meters, it's an approximation. This option is
|
||
implemented in a way that makes it useful for software that
|
||
uses the qpdf library; see API Enhancements below.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug Fixes
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Properly decrypt files that use revision 3 of the standard
|
||
security handler but use 40 bit keys (even though revision 3
|
||
supports 128-bit keys).
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Limit depth of nested data structures to prevent crashes
|
||
from certain types of malformed (malicious) PDFs.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
In “newline before endstream” mode, insert the
|
||
required extra newline before the
|
||
<literal>endstream</literal> at the end of object streams.
|
||
This one case was previously omitted.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
API Enhancements
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
The first round of higher level “helper”
|
||
interfaces has been introduced. These are designed to
|
||
provide a more convenient way of interacting with certain
|
||
document features than using
|
||
<classname>QPDFObjectHandle</classname> directly. For
|
||
details on helpers, see <xref
|
||
linkend="ref.helper-classes"/>. Specific additional
|
||
interfaces are described below.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add two new document helper classes:
|
||
<classname>QPDFPageDocumentHelper</classname> for working
|
||
with pages, and
|
||
<classname>QPDFAcroFormDocumentHelper</classname> for
|
||
working with interactive forms. No old methods have been
|
||
removed, but <classname>QPDFPageDocumentHelper</classname>
|
||
is now the preferred way to perform operations on pages
|
||
rather than calling the old methods in
|
||
<classname>QPDFObjectHandle</classname> and
|
||
<classname>QPDF</classname> directly. Comments in the header
|
||
files direct you to the new interfaces. Please see the
|
||
header files and <filename>ChangeLog</filename> for
|
||
additional details.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add three new object helper classes:
|
||
<classname>QPDFPageObjectHelper</classname> for pages,
|
||
<classname>QPDFFormFieldObjectHelper</classname> for
|
||
interactive form fields, and
|
||
<classname>QPDFAnnotationObjectHelper</classname> for
|
||
annotations. All three classes are fairly sparse at the
|
||
moment, but they have some useful, basic functionality.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
A new example program
|
||
<filename>examples/pdf-set-form-values.cc</filename> has
|
||
been added that illustrates use of the new document and
|
||
object helpers.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The method
|
||
<function>QPDFWriter::registerProgressReporter</function>
|
||
has been added. This method allows you to register a
|
||
function that is called by <classname>QPDFWriter</classname>
|
||
to report its estimate of what percentage of its output it has
written so far. Client programs can use this to
|
||
implement reasonably accurate progress meters. The
|
||
<command>qpdf</command> command line tool uses this to
|
||
implement its <option>--progress</option> option.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
New methods
|
||
<function>QPDFObjectHandle::newUnicodeString</function> and
|
||
<function>QPDFObject::unparseBinary</function> have been
|
||
added to allow for more convenient creation of strings that
|
||
are explicitly encoded using big-endian UTF-16. This is
|
||
useful for creating strings that appear outside of content
|
||
streams, such as labels, form fields, outlines, document
|
||
metadata, etc.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
A new class
|
||
<classname>QPDFObjectHandle::Rectangle</classname> has been
|
||
added to ease working with PDF rectangles, which are just
|
||
arrays of four numeric values.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>8.0.2: March 6, 2018</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
When a loop is detected while following cross reference
|
||
streams or tables, treat this as damage instead of silently
|
||
ignoring the previous table. This prevents loss of otherwise
|
||
recoverable data in some damaged files.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Properly handle pages with no contents.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>8.0.1: March 4, 2018</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Disregard data check errors when uncompressing
|
||
<option>/FlateDecode</option> streams. This is consistent with
|
||
most other PDF readers and allows qpdf to recover data from
|
||
another class of malformed PDF files.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
On the command line when specifying page ranges, support
|
||
preceding a page number by “r” to indicate that it
|
||
should be counted from the end. For example, the range
|
||
<literal>r3-r1</literal> would indicate the last three pages
|
||
of a document.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
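<para>
The meaning of the <literal>r</literal> prefix can be illustrated
with a short Python sketch. The helper names
<literal>resolve_page</literal> and <literal>expand_range</literal>
are hypothetical, and the sketch handles only a single page or a
single <literal>first-last</literal> range. It does not attempt to
reproduce the full page range syntax accepted by
<command>qpdf</command>.
</para>
<programlisting><![CDATA[
```python
def resolve_page(token, page_count):
    """Convert a token such as "4" or "r2" to a 1-based page number."""
    if token.startswith("r"):
        # r1 is the last page, r2 the one before it, and so on.
        page = page_count + 1 - int(token[1:])
    else:
        page = int(token)
    if not 1 <= page <= page_count:
        raise ValueError(f"page {token!r} is out of range")
    return page


def expand_range(spec, page_count):
    """Expand "first-last" (or a single page) into page numbers."""
    parts = spec.split("-")
    if len(parts) > 2:
        raise ValueError("this sketch accepts at most one '-'")
    first = resolve_page(parts[0], page_count)
    last = resolve_page(parts[-1], page_count)
    # Stepping backwards for a descending range is a choice made by
    # this sketch.
    step = 1 if first <= last else -1
    return list(range(first, last + step, step))
```
]]></programlisting>
<para>
For a ten-page document, <literal>r3-r1</literal> expands to pages 8,
9, and 10, which are the last three pages.
</para>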
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>8.0.0: February 25, 2018</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Packaging and Distribution Changes
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
QPDF is now distributed as an <ulink
|
||
url="https://appimage.org/">AppImage</ulink> in addition to
|
||
all the other ways it is distributed. The AppImage can be
|
||
found in the download area with the other packages. Thanks
|
||
to Kurt Pfeifle and Simon Peter for their contributions.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug Fixes
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<function>QPDFObjectHandle::getUTF8Val</function> now
|
||
properly treats non-Unicode strings as encoded with PDF Doc
|
||
Encoding.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Improvements to handling of objects in PDF files that are
|
||
not of the expected type. In most cases, qpdf will be able
|
||
to warn for such cases rather than fail with an exception.
|
||
Previous versions of qpdf would sometimes fail with errors
|
||
such as “operation for dictionary object attempted on
|
||
object of wrong type”. This situation should be mostly
|
||
or entirely eliminated now.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Enhancements to the <command>qpdf</command> Command-line Tool.
|
||
All new options listed here are documented in more detail in
|
||
<xref linkend="ref.using"/>.
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
The option
|
||
<option>--linearize-pass1=<replaceable>file</replaceable></option>
|
||
has been added for debugging qpdf's linearization code.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The option <option>--coalesce-contents</option> can be used
|
||
to combine content streams of a page whose contents are an
|
||
array of streams into a single stream.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
API Enhancements. All new API calls are documented in their
|
||
respective classes' header files. There are no non-compatible
|
||
changes to the API.
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Add function <function>qpdf_check_pdf</function> to the C API.
|
||
This function does basic checking that is a subset of what
|
||
<command>qpdf --check</command> performs.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Major enhancements to the lexical layer of qpdf. For a
|
||
complete list of enhancements, please refer to the
|
||
<filename>ChangeLog</filename> file. Most of the changes
|
||
result in improvements to qpdf's ability to handle erroneous
|
||
files. It is also possible for programs to handle
|
||
whitespace, comments, and inline images as tokens.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
New API for working with PDF content streams at a lexical
|
||
level. The new class
|
||
<classname>QPDFObjectHandle::TokenFilter</classname> allows
|
||
the developer to provide token handlers. Token filters can be
|
||
used with several different methods in
|
||
<classname>QPDFObjectHandle</classname> as well as with a
|
||
lower-level interface. See comments in
|
||
<filename>QPDFObjectHandle.hh</filename> as well as the new
|
||
examples <filename>examples/pdf-filter-tokens.cc</filename>
|
||
and <filename>examples/pdf-count-strings.cc</filename> for
|
||
details.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>7.1.1: February 4, 2018</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: files whose /ID fields were other than 16 bytes long
|
||
can now be properly linearized.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
A few compile and link issues have been corrected for some
|
||
platforms.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>7.1.0: January 14, 2018</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
PDF files contain streams that may be compressed with various
|
||
compression algorithms which, in some cases, may be enhanced
|
||
by various predictor functions. Previously only the PNG up
|
||
predictor was supported. In this version, all the PNG
|
||
predictors as well as the TIFF predictor are supported. This
|
||
increases the range of files that qpdf is able to handle.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
QPDF now allows a raw encryption key to be specified in place
|
||
of a password when opening encrypted files, and will
|
||
optionally display the encryption key used by a file. This is
|
||
a non-standard operation, but it can be useful in certain
|
||
situations. Please see the discussion of
|
||
<option>--password-is-hex-key</option> in <xref
|
||
linkend="ref.basic-options"/> or the comments around
|
||
<function>QPDF::setPasswordIsHexKey</function> in
|
||
<filename>QPDF.hh</filename> for additional details.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: numbers ending with a trailing decimal point are now
|
||
properly recognized as numbers.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: when building qpdf from source on some platforms
|
||
(especially MacOS), the build could get confused by older
|
||
versions of qpdf installed on the system. This has been
|
||
corrected.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>7.0.0: September 15, 2017</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Packaging and Distribution Changes
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
QPDF's primary license is now <ulink
|
||
url="http://www.apache.org/licenses/LICENSE-2.0">version 2.0
|
||
of the Apache License</ulink> rather than version 2.0 of the
|
||
Artistic License. You may still, at your option, consider
|
||
qpdf to be licensed with version 2.0 of the Artistic
|
||
License.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
QPDF no longer has a dependency on the PCRE (Perl-Compatible
|
||
Regular Expression) library. QPDF now has an added
|
||
dependency on the JPEG library.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Bug Fixes
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
This release contains many bug fixes for various infinite
|
||
loops, memory leaks, and other memory errors that could be
|
||
encountered with specially crafted or otherwise erroneous
|
||
PDF files.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
New Features
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
QPDF now supports reading and writing streams encoded with
|
||
JPEG or RunLength encoding. Library API enhancements and
|
||
command-line options have been added to control this
|
||
behavior. See command-line options
|
||
<option>--compress-streams</option> and
|
||
<option>--decode-level</option> and methods
|
||
<function>QPDFWriter::setCompressStreams</function> and
|
||
<function>QPDFWriter::setDecodeLevel</function>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
QPDF is much better at recovering from broken files. In most
|
||
cases, qpdf will skip invalid objects and will preserve
|
||
broken stream data by not attempting to filter broken
|
||
streams. QPDF is now able to recover or at least not crash
|
||
on dozens of broken test files I have received over the past
|
||
few years.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Page rotation is now supported and accessible from both the
|
||
library and the command line.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<classname>QPDFWriter</classname> supports writing files in
|
||
a way that preserves PCLm compliance in support of
|
||
driverless printing. This is very specialized and is only
|
||
useful to applications that already know how to create PCLm
|
||
files.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Enhancements to the <command>qpdf</command> Command-line Tool.
|
||
All new options listed here are documented in more detail in
|
||
<xref linkend="ref.using"/>.
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Command-line arguments can now be read from files or
|
||
standard input using <literal>@file</literal> or
|
||
<literal>@-</literal> syntax. Please see <xref
|
||
linkend="ref.invocation"/>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>--rotate</option>: request page rotation
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>--newline-before-endstream</option>: ensure that a
|
||
newline appears before every <literal>endstream</literal>
|
||
keyword in the file; used to prevent qpdf from breaking
|
||
PDF/A compliance on already compliant files.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>--preserve-unreferenced</option>: preserve
|
||
unreferenced objects in the input PDF
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>--split-pages</option>: break output into chunks
|
||
with fixed numbers of pages
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>--verbose</option>: print the name of each output
|
||
file that is created
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<option>--compress-streams</option> and
|
||
<option>--decode-level</option> replace
|
||
<option>--stream-data</option> for improving granularity of
|
||
controlling compression and decompression of stream data.
|
||
The <option>--stream-data</option> option will remain
|
||
available.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
When running <command>qpdf --check</command> with other
|
||
options, checks are always run first. This enables qpdf to
|
||
perform its full recovery logic before outputting other
|
||
information. This can be especially useful when manually
|
||
recovering broken files, looking at qpdf's regenerated cross
|
||
reference table, or other similar operations.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Process <command>--pages</command> earlier so that other
|
||
options like <option>--show-pages</option> or
|
||
<option>--split-pages</option> can operate on the file after
|
||
page splitting/merging has occurred.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
API Changes. All new API calls are documented in their
|
||
respective classes' header files.
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<function>QPDFObjectHandle::rotatePage</function>: apply
|
||
rotation to a page object
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<function>QPDFWriter::setNewlineBeforeEndstream</function>:
|
||
force newline to appear before <literal>endstream</literal>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<function>QPDFWriter::setPreserveUnreferencedObjects</function>:
|
||
preserve unreferenced objects that appear in the input PDF.
|
||
The default behavior is to discard them.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
New <classname>Pipeline</classname> types
|
||
<classname>Pl_RunLength</classname> and
|
||
<classname>Pl_DCT</classname> are available for developers
|
||
who wish to produce or consume RunLength or DCT stream data
|
||
directly. The <filename>examples/pdf-create.cc</filename>
|
||
example illustrates their use.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<function>QPDFWriter::setCompressStreams</function> and
|
||
<function>QPDFWriter::setDecodeLevel</function> methods
|
||
control handling of different types of stream compression.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add new C API functions
|
||
<function>qpdf_set_compress_streams</function>,
|
||
<function>qpdf_set_decode_level</function>,
|
||
<function>qpdf_set_preserve_unreferenced_objects</function>,
|
||
and <function>qpdf_set_newline_before_endstream</function>
|
||
corresponding to the new <classname>QPDFWriter</classname>
|
||
methods.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term>6.0.0: November 10, 2015</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Implement <option>--deterministic-id</option> command-line
|
||
option and <function>QPDFWriter::setDeterministicID</function>
|
||
as well as C API function
|
||
<function>qpdf_set_deterministic_ID</function> for generating
|
||
a deterministic ID for non-encrypted files. When this option
|
||
is selected, the ID of the file depends on the contents of the
|
||
output file, and not on transient items such as the timestamp
|
||
or output file name.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Make qpdf more tolerant of files whose xref table entries are
|
||
not the correct length.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>5.1.3: May 24, 2015</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: fix-qdf was not properly handling files that
|
||
contained object streams with more than 255 objects in them.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: qpdf was not properly initializing Microsoft's secure
|
||
crypto provider on fresh Windows installations that had not
|
||
had any keys created yet.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Fix a few errors found by Gynvael Coldwind and
|
||
Mateusz Jurczyk of the Google Security Team. Please see the
|
||
ChangeLog for details.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Properly handle pages that have no contents at all. There were
|
||
many cases in which qpdf handled this fine, but a few methods
|
||
blindly obtained page contents without handling the possibility
|
||
that there were no contents.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Make qpdf more robust for a few more kinds of problems that
|
||
may occur in invalid PDF files.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>5.1.2: June 7, 2014</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: linearizing files could create a corrupted output
|
||
file under extremely unlikely file size circumstances. See
|
||
ChangeLog for details. The odds of getting hit by this are
|
||
very low, though one person did.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: qpdf would fail to write files that had streams with
|
||
decode parameters referencing other streams.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
New example program: <command>pdf-split-pages</command>:
|
||
efficiently split PDF files into individual pages. The example
|
||
program does this more efficiently than using <command>qpdf
|
||
--pages</command> to do it.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Packaging fix: Visual C++ binaries did not support Windows XP.
|
||
This has been rectified by updating the compilers used to
|
||
generate the release binaries.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>5.1.1: January 14, 2014</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Performance fix: copying foreign objects could be very slow
|
||
with certain types of files. This was most likely to be
|
||
visible during page splitting and was due to traversing the
|
||
same objects multiple times in some cases.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>5.1.0: December 17, 2013</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Added runtime option
|
||
(<function>QUtil::setRandomDataProvider</function>) to supply
|
||
your own random data provider. You can use this if you want
|
||
to avoid using the OS-provided secure random number generation
|
||
facility or stdlib's less secure version. See comments in
|
||
include/qpdf/QUtil.hh for details.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Fixed image comparison tests to not create 12-bit-per-pixel
|
||
images since some versions of tiffcmp have bugs in comparing
|
||
them in some cases. This increases the disk space required by
|
||
the image comparison tests, which are off by default anyway.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Introduce a number of small fixes for compilation on the
|
||
latest clang in MacOS and the latest Visual C++ in Windows.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Be able to handle broken files that end the xref table header
|
||
with a space instead of a newline.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>5.0.1: October 18, 2013</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Thanks to a detailed review by Florian Weimer and the Red Hat
|
||
Product Security Team, this release includes a number of
|
||
non-user-visible security hardening changes. Please see the
|
||
ChangeLog file in the source distribution for the complete
|
||
list.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
When available, operating system-specific secure random number
|
||
generation is used for generating initialization vectors and
|
||
other random values used during encryption or file creation.
|
||
For the Windows build, this results in an added dependency on
|
||
Microsoft's cryptography API. To disable the OS-specific
|
||
cryptography and use the old version, pass the
|
||
<option>--enable-insecure-random</option> option to
|
||
<command>./configure</command>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The <command>qpdf</command> command-line tool now issues a
|
||
         warning when <option>--accessibility=n</option> is specified
|
||
for newer encryption versions stating that the option is
|
||
ignored. qpdf, per the spec, has always ignored this flag,
|
||
but it previously did so silently. This warning is issued
|
||
only by the command-line tool, not by the library. The
|
||
library's handling of this flag is unchanged.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>5.0.0: July 10, 2013</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: previous versions of qpdf would lose objects with
|
||
generation != 0 when generating object streams. Fixing this
|
||
required changes to the public API.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Removed methods from public API that were only supposed to be
|
||
called by QPDFWriter and couldn't realistically be called
|
||
anywhere else. See ChangeLog for details.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
New <type>QPDFObjGen</type> class added to represent an object
|
||
ID/generation pair.
|
||
<function>QPDFObjectHandle::getObjGen()</function> is now
|
||
preferred over
|
||
<function>QPDFObjectHandle::getObjectID()</function> and
|
||
<function>QPDFObjectHandle::getGeneration()</function> as it
|
||
makes it less likely for people to accidentally write code
|
||
that ignores the generation number. See
|
||
<filename>QPDF.hh</filename> and
|
||
<filename>QPDFObjectHandle.hh</filename> for additional notes.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add <option>--show-npages</option> command-line option to the
|
||
<command>qpdf</command> command to show the number of pages in
|
||
a file.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Allow omission of the page range within
|
||
<option>--pages</option> for the <command>qpdf</command>
|
||
command. When omitted, the page range is implicitly taken to
|
||
be all the pages in the file.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Various enhancements were made to support different types of
|
||
broken files or broken readers. Details can be found in
|
||
<filename>ChangeLog</filename>.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>4.1.0: April 14, 2013</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Note to people including qpdf in distributions: the
|
||
<filename>.la</filename> files generated by libtool are now
|
||
installed by qpdf's <command>make install</command> target.
|
||
Before, they were not installed. This means that if your
|
||
distribution does not want to include <filename>.la</filename>
|
||
files, you must remove them as part of your packaging process.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Major enhancement: API enhancements have been made to support
|
||
parsing of content streams. This enhancement includes the
|
||
following changes:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<function>QPDFObjectHandle::parseContentStream</function>
|
||
method parses objects in a content stream and calls
|
||
handlers in a callback class. The example
|
||
<filename>examples/pdf-parse-content.cc</filename>
|
||
illustrates how this may be used.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<type>QPDFObjectHandle</type> can now represent operators
|
||
and inline images, object types that may only appear in
|
||
content streams.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Method <function>QPDFObjectHandle::getTypeCode()</function>
|
||
returns an enumerated type value representing the
|
||
underlying object type. Method
|
||
<function>QPDFObjectHandle::getTypeName()</function>
|
||
returns a text string describing the name of the type of a
|
||
<type>QPDFObjectHandle</type> object. These methods can be
|
||
used for more efficient parsing and debugging/diagnostic
|
||
messages.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<command>qpdf --check</command> now parses all pages' content
|
||
streams in addition to doing other checks. While there are
|
||
still many types of errors that cannot be detected, syntactic
|
||
errors in content streams will now be reported.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Minor compilation enhancements have been made to facilitate
|
||
         support for a broader range of compilers and
|
||
compiler versions.
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Warning flags have been moved into a separate variable in
|
||
            <filename>autoconf.mk</filename>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
            The configure flag <option>--enable-werror</option> works
            for Microsoft compilers.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
All MSVC CRT security warnings have been resolved.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
            All C-style casts in C++ code have been replaced by C++
|
||
casts, and many casts that had been included to suppress
|
||
higher warning levels for some compilers have been removed,
|
||
primarily for clarity. Places where integer type coercion
|
||
occurs have been scrutinized. A new casting policy has
|
||
been documented in the manual. This is of concern mainly
|
||
to people porting qpdf to new platforms or compilers. It
|
||
is not visible to programmers writing code that uses the
|
||
            library.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Some internal limits have been removed in code that
|
||
converts numbers to strings. This is largely invisible to
|
||
users, but it does trigger a bug in some older versions of
|
||
mingw-w64's C++ library. See
|
||
<filename>README-windows.md</filename> in the source
|
||
distribution if you think this may affect you. The copy of
|
||
the DLL distributed with qpdf's binary distribution is not
|
||
affected by this problem.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The RPM spec file previously included with qpdf has been
|
||
removed. This is because virtually all Linux distributions
|
||
include qpdf now that it is a dependency of CUPS filters.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
A few bug fixes are included:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Overridden compressed objects are properly handled.
|
||
Before, there were certain constructs that could cause qpdf
|
||
to see old versions of some objects. The most usual
|
||
            manifestation of this was loss of filled-in form values for
|
||
certain files.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Installation no longer uses GNU/Linux-specific versions of
|
||
some commands, so <command>make install</command> works on
|
||
Solaris with native tools.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The 64-bit mingw Windows binary package no longer includes
|
||
a 32-bit DLL.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>4.0.1: January 17, 2013</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Fix detection of binary attachments in test suite to avoid
|
||
false test failures on some platforms.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add clarifying comment in <filename>QPDF.hh</filename> to
|
||
methods that return the user password explaining that it is no
|
||
longer possible with newer encryption formats to recover the
|
||
user password knowing the owner password. In earlier
|
||
encryption formats, the user password was encrypted in the
|
||
file using the owner password. In newer encryption formats, a
|
||
separate encryption key is used on the file, and that key is
|
||
independently encrypted using both the user password and the
|
||
owner password.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>4.0.0: December 31, 2012</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Major enhancement: support has been added for newer encryption
|
||
schemes supported by version X of Adobe Acrobat. This
|
||
includes use of 127-character passwords, 256-bit encryption
|
||
keys, and the encryption scheme specified in ISO 32000-2, the
|
||
PDF 2.0 specification. This scheme can be chosen from the
|
||
command line by specifying use of 256-bit keys. qpdf also
|
||
supports the deprecated encryption method used by Acrobat IX.
|
||
This encryption style has known security weaknesses and should
|
||
not be used in practice. However, such files exist “in
|
||
the wild,” so support for this scheme is still useful.
|
||
New methods
|
||
<function>QPDFWriter::setR6EncryptionParameters</function>
|
||
(for the PDF 2.0 scheme) and
|
||
<function>QPDFWriter::setR5EncryptionParameters</function>
|
||
(for the deprecated scheme) have been added to enable these
|
||
new encryption schemes. Corresponding functions have been
|
||
added to the C API as well.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Full support for Adobe extension levels in PDF version
|
||
information. Starting with PDF version 1.7, corresponding to
|
||
ISO 32000, Adobe adds new functionality by increasing the
|
||
extension level rather than increasing the version. This
|
||
support includes addition of the
|
||
<function>QPDF::getExtensionLevel</function> method for
|
||
retrieving the document's extension level, addition of
|
||
versions of
|
||
<function>QPDFWriter::setMinimumPDFVersion</function> and
|
||
<function>QPDFWriter::forcePDFVersion</function> that accept
|
||
an extension level, and extended syntax for specifying forced
|
||
and minimum versions on the command line as described in <xref
|
||
linkend="ref.advanced-transformation"/>. Corresponding
|
||
functions have been added to the C API as well.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Minor fixes to prevent qpdf from referencing objects in the
|
||
file that are not referenced in the file's overall structure.
|
||
         Most files don't have any such objects, but some files
|
||
contain unreferenced objects with errors, so these fixes
|
||
prevent qpdf from needlessly rejecting or complaining about
|
||
such objects.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add new generalized methods for reading and writing files
|
||
from/to programmer-defined sources. The method
|
||
<function>QPDF::processInputSource</function> allows the
|
||
programmer to use any input source for the input file, and
|
||
<function>QPDFWriter::setOutputPipeline</function> allows the
|
||
programmer to write the output file through any pipeline.
|
||
These methods would make it possible to perform any number of
|
||
specialized operations, such as accessing external storage
|
||
systems, creating bindings for qpdf in other programming
|
||
languages that have their own I/O systems, etc.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add new method <function>QPDF::getEncryptionKey</function> for
|
||
retrieving the underlying encryption key used in the file.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
This release includes a small handful of non-compatible API
|
||
changes. While effort is made to avoid such changes, all the
|
||
non-compatible API changes in this version were to parts of
|
||
the API that would likely never be used outside the library
|
||
itself. In all cases, the altered methods or structures were
|
||
         parts of the <classname>QPDF</classname> class that were either
         public to enable them to be called from
|
||
<classname>QPDFWriter</classname> or were part of validation
|
||
code that was over-zealous in reporting problems in parts of
|
||
the file that would not ordinarily be referenced. In no case
|
||
         did any of the removed methods do anything worse than falsely
|
||
report error conditions in files that were broken in ways that
|
||
didn't matter. The following public parts of the
|
||
<classname>QPDF</classname> class were changed in a
|
||
non-compatible way:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Updated nested <classname>QPDF::EncryptionData</classname>
|
||
class to add fields needed by the newer encryption formats,
|
||
member variables changed to private so that future changes
|
||
will not require breaking backward compatibility.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Added additional parameters to
|
||
<function>compute_data_key</function>, which is used by
|
||
<classname>QPDFWriter</classname> to compute the encryption
|
||
key used to encrypt a specific object.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Removed the method
|
||
<function>flattenScalarReferences</function>. This method
|
||
was previously used prior to writing a new PDF file, but it
|
||
has the undesired side effect of causing qpdf to read
|
||
objects in the file that were not referenced. Some
|
||
            otherwise valid files have unreferenced objects with errors in
|
||
them, so this could cause qpdf to reject files that would
|
||
be accepted by virtually all other PDF readers. In fact,
|
||
qpdf relied on only a very small part of what
|
||
flattenScalarReferences did, so only this part has been
|
||
preserved, and it is now done directly inside
|
||
<classname>QPDFWriter</classname>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Removed the method <function>decodeStreams</function>.
|
||
This method was used by the <option>--check</option> option
|
||
of the <command>qpdf</command> command-line tool to force
|
||
all streams in the file to be decoded, but it also suffered
|
||
from the problem of opening otherwise unreferenced streams
|
||
            and thus could report false positives. The
|
||
<option>--check</option> option now causes qpdf to go
|
||
through all the motions of writing a new file based on the
|
||
original one, so it will always reference and check exactly
|
||
those parts of a file that any ordinary viewer would check.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Removed the method
|
||
<function>trimTrailerForWrite</function>. This method was
|
||
used by <classname>QPDFWriter</classname> to modify the
|
||
original QPDF object by removing fields from the trailer
|
||
dictionary that wouldn't apply to the newly written file.
|
||
This functionality, though generally harmless, was a poor
|
||
implementation and has been replaced by having QPDFWriter
|
||
filter these out when copying the trailer rather than
|
||
modifying the original QPDF object. (Note that qpdf never
|
||
modifies the original file itself.)
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Allow the PDF header to appear anywhere in the first 1024
|
||
bytes of the file. This is consistent with what other readers
|
||
do.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Fix the <command>pkg-config</command> files to list zlib and
|
||
pcre in <function>Requires.private</function> to better
|
||
support static linking using <command>pkg-config</command>.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>3.0.2: September 6, 2012</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: <function>QPDFWriter::setOutputMemory</function> did
|
||
not work when not used with
|
||
<function>QPDFWriter::setStaticID</function>, which made it
|
||
pretty much useless. This has been fixed.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
New API call
|
||
<function>QPDFWriter::setExtraHeaderText</function> inserts
|
||
additional text near the header of the PDF file. The intended
|
||
use case is to insert comments that may be consumed by a
|
||
downstream application, though other use cases may exist.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>3.0.1: August 11, 2012</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Version 3.0.0 included addition of files for
|
||
<command>pkg-config</command>, but this was not mentioned in
|
||
the release notes. The release notes for 3.0.0 were updated
|
||
to mention this.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: if an object stream ended with a scalar object not
|
||
followed by space, qpdf would incorrectly report that it
|
||
encountered a premature EOF. This bug has been in qpdf since
|
||
version 2.0.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>3.0.0: August 2, 2012</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Acknowledgment: I would like to express gratitude for the
|
||
contributions of Tobias Hoffmann toward the release of qpdf
|
||
version 3.0. He is responsible for most of the implementation
|
||
and design of the new API for manipulating pages, and
|
||
contributed code and ideas for many of the improvements made
|
||
in version 3.0. Without his work, this release would
|
||
certainly not have happened as soon as it did, if at all.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<emphasis>Non-compatible API change:</emphasis> The version of
|
||
<function>QPDFObjectHandle::replaceStreamData</function> that
|
||
uses a <classname>StreamDataProvider</classname> no longer
|
||
requires (or accepts) a <varname>length</varname> parameter.
|
||
See <xref linkend="ref.upgrading-to-3.0"/> for an explanation.
|
||
While care is taken to avoid non-compatible API changes in
|
||
general, an exception was made this time because the new
|
||
interface offers an opportunity to significantly simplify
|
||
calling code.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Support has been added for large files. The test suite
|
||
verifies support for files larger than 4 gigabytes, and manual
|
||
testing has verified support for files larger than 10
|
||
gigabytes. Large file support is available for both 32-bit
|
||
and 64-bit platforms as long as the compiler and underlying
|
||
platforms support it.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Support for page selection (splitting and merging PDF files)
|
||
has been added to the <command>qpdf</command> command-line
|
||
tool. See <xref linkend="ref.page-selection"/>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Options have been added to the <command>qpdf</command>
|
||
command-line tool for copying encryption parameters from
|
||
another file. See <xref linkend="ref.basic-options"/>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
New methods have been added to the <classname>QPDF</classname>
|
||
object for adding and removing pages. See <xref
|
||
linkend="ref.adding-and-remove-pages"/>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
New methods have been added to the <classname>QPDF</classname>
|
||
object for copying objects from other PDF files. See <xref
|
||
         linkend="ref.foreign-objects"/>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
A new method <function>QPDFObjectHandle::parse</function> has
|
||
been added for constructing
|
||
<classname>QPDFObjectHandle</classname> objects from a string
|
||
description.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Methods have been added to <classname>QPDFWriter</classname>
|
||
to allow writing to an already open stdio <type>FILE*</type>
|
||
         in addition to writing to standard output or a named file.
|
||
Methods have been added to <classname>QPDF</classname> to be
|
||
able to process a file from an already open stdio
|
||
<type>FILE*</type>. This makes it possible to read and write
|
||
PDF from secure temporary files that have been unlinked prior
|
||
to being fully read or written.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
         The <function>QPDF::emptyPDF</function> method can be used to allow
|
||
creation of PDF files from scratch. The example
|
||
<filename>examples/pdf-create.cc</filename> illustrates how it
|
||
can be used.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
         Several methods that take
         <classname>PointerHolder&lt;Buffer&gt;</classname> can now
|
||
also accept <type>std::string</type> arguments.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Many new convenience methods have been added to the library,
|
||
most in <classname>QPDFObjectHandle</classname>. See
|
||
<filename>ChangeLog</filename> for a full list.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
When building on a platform that supports ELF shared libraries
|
||
(such as Linux), symbol versions are enabled by default. They
|
||
can be disabled by passing
|
||
<option>--disable-ld-version-script</option> to
|
||
<command>./configure</command>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The file <filename>libqpdf.pc</filename> is now installed to
|
||
support <command>pkg-config</command>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Image comparison tests are off by default now since they are
|
||
not needed to verify a correct build or port of qpdf. They
|
||
are needed only when changing the actual PDF output generated
|
||
by qpdf. You should enable them if you are making deep
|
||
changes to qpdf itself. See <filename>README.md</filename> for
|
||
details.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Large file tests are off by default but can be turned on with
|
||
<command>./configure</command> or by setting an environment
|
||
variable before running the test suite. See
|
||
<filename>README.md</filename> for details.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
When qpdf's test suite fails, failures are not printed to the
|
||
terminal anymore by default. Instead, find them in
|
||
<filename>build/qtest.log</filename>. For packagers who are
|
||
building with an autobuilder, you can add the
|
||
<option>--enable-show-failed-test-output</option> option to
|
||
<command>./configure</command> to restore the old behavior.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>2.3.1: December 28, 2011</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Fix thread-safety problem resulting from non-thread-safe use
|
||
of the PCRE library.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Made a few minor documentation fixes.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add workaround for a bug that appears in some versions of
|
||
         ghostscript to the test suite.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Fix minor build issue for Visual C++ 2010.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>2.3.0: August 11, 2011</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Bug fix: when preserving existing encryption on encrypted
|
||
files with cleartext metadata, older qpdf versions would
|
||
generate password-protected files with no valid password.
|
||
This operation now works. This bug only affected files
|
||
created by copying existing encryption parameters; explicit
|
||
encryption with specification of cleartext metadata worked
|
||
before and continues to work.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Enhance <classname>QPDFWriter</classname> with a new
|
||
constructor that allows you to delay the specification of the
|
||
output file. When using this constructor, you may now call
|
||
<function>QPDFWriter::setOutputFilename</function> to specify
|
||
the output file, or you may use
|
||
<function>QPDFWriter::setOutputMemory</function> to cause
|
||
<classname>QPDFWriter</classname> to write the resulting PDF
|
||
file to a memory buffer. You may then use
|
||
<function>QPDFWriter::getBuffer</function> to retrieve the
|
||
memory buffer.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add new API call <function>QPDF::replaceObject</function> for
|
||
         replacing objects by object ID.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add new API call <function>QPDF::swapObjects</function> for
|
||
         swapping two objects by object ID.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add <function>QPDFObjectHandle::getDictAsMap</function> and
|
||
<function>QPDFObjectHandle::getArrayAsVector</function> to
|
||
allow retrieval of dictionary objects as maps and array
|
||
objects as vectors.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add functions <function>qpdf_get_info_key</function> and
|
||
<function>qpdf_set_info_key</function> to the C API for
|
||
manipulating string fields of the document's
|
||
<literal>/Info</literal> dictionary.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add functions <function>qpdf_init_write_memory</function>,
|
||
<function>qpdf_get_buffer_length</function>, and
|
||
<function>qpdf_get_buffer</function> to the C API for writing
|
||
PDF files to a memory buffer instead of a file.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>2.2.4: June 25, 2011</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Fix installation and compilation issues; no functionality
|
||
changes.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>2.2.3: April 30, 2011</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Handle some damaged streams with incorrect characters
|
||
following the stream keyword.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Improve handling of inline images when normalizing content
|
||
streams.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Enhance error recovery to properly handle files that use
|
||
object 0 as a regular object, which is specifically disallowed
|
||
by the spec.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>2.2.2: October 4, 2010</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Add new function <function>qpdf_read_memory</function>
|
||
to the C API to call
|
||
<function>QPDF::processMemoryFile</function>. This was an
|
||
omission in qpdf 2.2.1.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term>2.2.1: October 1, 2010</term>
|
||
<listitem>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Add new method <function>QPDF::setOutputStreams</function>
|
||
to replace <varname>std::cout</varname> and
|
||
<varname>std::cerr</varname> with other streams for generation
|
||
of diagnostic messages and error messages. This can be useful
|
||
for GUIs or other applications that want to capture any output
|
||
generated by the library to present to the user in some other
|
||
way. Note that QPDF does not write to
|
||
<varname>std::cout</varname> (or the specified output stream)
|
||
except where explicitly mentioned in
|
||
<filename>QPDF.hh</filename>, and that the only use of the
|
||
error stream is for warnings. Note also that output of
|
||
warnings is suppressed when
|
||
<literal>setSuppressWarnings(true)</literal> is called.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Add new method <function>QPDF::processMemoryFile</function>
|
||
for operating on PDF files that are loaded into memory rather
|
||
than in a file on disk.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Give a warning but otherwise ignore empty PDF objects by
|
||
         treating them as null. Empty objects are not permitted by the
PDF specification but have been known to appear in some actual
PDF files.
</para>
</listitem>
<listitem>
<para>
Handle inline image filter abbreviations when they appear as
stream filter abbreviations. The PDF specification does not
allow use of stream filter abbreviations in this way, but
Adobe Reader and some other PDF readers accept them since they
sometimes appear incorrectly in actual PDF files.
</para>
</listitem>
<listitem>
<para>
Implement miscellaneous enhancements to
<classname>PointerHolder</classname> and
<classname>Buffer</classname> to support other changes.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.2.0: August 14, 2010</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Add new methods to <classname>QPDFObjectHandle</classname>
(<function>newStream</function> and
<function>replaceStreamData</function>) for creating new
streams and replacing stream data. This makes it possible to
perform a wide range of operations that were not previously
possible.
</para>
</listitem>
<listitem>
<para>
Add new helper method in
<classname>QPDFObjectHandle</classname>
(<function>addPageContents</function>) for appending or
prepending new content streams to a page. This method makes
it possible to manipulate content streams without having to be
concerned whether a page's contents are a single stream or an
array of streams.
</para>
</listitem>
<listitem>
<para>
Add new method in <classname>QPDFObjectHandle</classname>:
<function>replaceOrRemoveKey</function>, which replaces a
dictionary key
with a given value unless the value is null, in which case it
removes the key instead.
</para>
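<para>
The replace-or-remove behavior can be illustrated with a small
stand-alone sketch. It does not use qpdf: the dictionary is
modelled as a <classname>std::map</classname>, a null value is
modelled as an empty <classname>std::optional</classname> (which
requires C++17), and the function name is hypothetical.
</para>
<programlisting><![CDATA[
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a PDF dictionary with integer values.
using Dict = std::map<std::string, int>;

// Replace the key with the given value, or remove the key when the
// value is "null" (an empty optional in this sketch).
void replace_or_remove_key(
    Dict& dict, std::string const& key, std::optional<int> value)
{
    if (value.has_value()) {
        dict[key] = *value;
    } else {
        dict.erase(key);
    }
}
]]></programlisting>
<para>
The point of combining the two operations is that calling code can
pass along a value it obtained elsewhere without first testing
whether that value is null.
</para>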
</listitem>
<listitem>
<para>
Add new method in <classname>QPDFObjectHandle</classname>:
<function>getRawStreamData</function>, which returns the raw
(unfiltered) stream data in a buffer.  This complements the
<function>getStreamData</function> method, which returns the
filtered (uncompressed) stream data and can only be used when
the stream's data is filterable.
</para>
</listitem>
<listitem>
<para>
Provide two new examples:
<command>pdf-double-page-size</command> and
<command>pdf-invert-images</command> that illustrate the newly
added interfaces.
</para>
</listitem>
<listitem>
<para>
Fix a memory leak that would cause loss of a few bytes for
every object involved in a cycle of object references. Thanks
to Jian Ma for calling my attention to the leak.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.1.5: April 25, 2010</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Remove restriction of file identifier strings to 16 bytes.
This unnecessary restriction was preventing qpdf from being
able to encrypt or decrypt files with identifier strings that
were not exactly 16 bytes long. The specification imposes no
such restriction.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.1.4: April 18, 2010</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Apply the same padding calculation fix from version 2.1.2 to
the main cross reference stream as well.
</para>
</listitem>
<listitem>
<para>
Since <command>qpdf --check</command> only performs limited
checks, clarify the output to make it clear that there still
may be errors that qpdf can't check. This should make it less
surprising to people when another PDF reader is unable to read
a file that qpdf thinks is okay.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.1.3: March 27, 2010</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Fix bug that could cause a failure when rewriting PDF files
that contain object streams with unreferenced objects that in
turn reference indirect scalars.
</para>
</listitem>
<listitem>
<para>
Don't complain about (invalid) AES streams that aren't a
multiple of 16 bytes. Instead, pad them before decrypting.
</para>
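<para>
Padding to a multiple of the block size can be sketched as below.
The function name is hypothetical, and the choice of zero bytes as
the padding value is an assumption made for illustration; this is
not taken from qpdf's source.
</para>
<programlisting><![CDATA[
#include <cstddef>
#include <string>

// Extend data to the next multiple of block_size. Zero bytes are used
// as padding here (an assumption of this sketch).
std::string pad_to_block_size(std::string data, std::size_t block_size = 16)
{
    std::size_t remainder = data.size() % block_size;
    if (remainder != 0) {
        data.append(block_size - remainder, '\0');
    }
    return data;
}
]]></programlisting>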
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.1.2: January 24, 2010</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Fix bug in padding around first half cross reference stream in
linearized files. The bug could cause an assertion failure
when linearizing certain unlucky files.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.1.1: December 14, 2009</term>
<listitem>
<itemizedlist>
<listitem>
<para>
No changes in functionality; insert missing include in an
internal library header file to support gcc 4.4, and update
test suite to ignore broken Adobe Reader installations.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.1: October 30, 2009</term>
<listitem>
<itemizedlist>
<listitem>
<para>
This is the first version of qpdf to include Windows support.
On Windows, it is possible to build a DLL. Additionally, a
partial C-language API has been introduced, which makes it
possible to call qpdf functions from non-C++ environments. I
am very grateful to Žarko <!-- Gajić --> Gajic (<ulink
url="http://zarko-gajic.iz.hr/">http://zarko-gajic.iz.hr/</ulink>)
for tirelessly testing numerous pre-release versions of this
DLL and providing many excellent suggestions on improving the
interface.
</para>
<para>
For programming to the C interface, please see the header file
<filename>qpdf/qpdf-c.h</filename> and the example
<filename>examples/pdf-linearize.c</filename>.
</para>
</listitem>
<listitem>
<para>
Žarko Gajic has written a Delphi wrapper for qpdf, which can
be downloaded from qpdf's download site.  Žarko's Delphi
wrapper is released with the same licensing terms as qpdf
itself and comes with this disclaimer: “Delphi wrapper
unit <filename>qpdf.pas</filename> created by Žarko Gajic
(<ulink
url="http://zarko-gajic.iz.hr/">http://zarko-gajic.iz.hr/</ulink>).
Use at your own risk and for whatever purpose you want. No
support is provided. Sample code is provided.”
</para>
</listitem>
<listitem>
<para>
Support has been added for AES encryption and crypt filters.
Although qpdf does not presently support files that use
PKI-based encryption, with the addition of AES and crypt
filters, qpdf is now able to open most encrypted files
created with newer versions of Acrobat or other PDF creation
software. Note that I have not been able to get very many
files encrypted in this way, so it's possible there could
still be some cases that qpdf can't handle. Please report
them if you find them.
</para>
</listitem>
<listitem>
<para>
Many error messages have been improved to include more
information in hopes of making qpdf a more useful tool for PDF
experts to use in manually recovering damaged PDF files.
</para>
</listitem>
<listitem>
<para>
Attempt to avoid compressing metadata streams if possible.
This is consistent with other PDF creation applications.
</para>
</listitem>
<listitem>
<para>
Provide new command-line options for AES encryption, cleartext
metadata, and setting the minimum and forced PDF versions of
output files.
</para>
</listitem>
<listitem>
<para>
Add additional methods to the <classname>QPDF</classname>
object for querying the document's permissions. Although qpdf
does not enforce these permissions, it does make them
available so that applications that use qpdf can enforce
permissions.
</para>
</listitem>
<listitem>
<para>
The <option>--check</option> option to <command>qpdf</command>
has been extended to include some additional information.
</para>
</listitem>
<listitem>
<para>
There have been a handful of non-compatible API changes. For
details, see <xref linkend="ref.upgrading-to-2.1"/>.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.0.6: May 3, 2009</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Do not attempt to uncompress streams that have decode
parameters we don't recognize. Earlier versions of qpdf would
have rejected files with such streams.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.0.5: March 10, 2009</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Improve error handling in the LZW decoder, and fix a small
error introduced in the previous version with regard to
handling full tables. The LZW decoder has been more strongly
verified in this release.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.0.4: February 21, 2009</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Include proper support for LZW streams encoded without the
“early code change” flag. Special thanks to Atom
Smasher who reported the problem and provided an input file
compressed in this way, which I did not previously have.
</para>
</listitem>
<listitem>
<para>
Implement some improvements to file recovery logic.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.0.3: February 15, 2009</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Compile cleanly with gcc 4.4.
</para>
</listitem>
<listitem>
<para>
Handle strings encoded as UTF-16BE properly.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.0.2: June 30, 2008</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Update test suite to work properly with a
non-<command>bash</command> <filename>/bin/sh</filename> and
with Perl 5.10. No changes were made to the actual qpdf
source code itself for this release.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.0.1: May 6, 2008</term>
<listitem>
<itemizedlist>
<listitem>
<para>
No changes in functionality or interface. This release
includes fixes to the source code so that qpdf compiles
properly and passes its test suite on a broader range of
platforms. See <filename>ChangeLog</filename> in the source
distribution for details.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>2.0: April 29, 2008</term>
<listitem>
<itemizedlist>
<listitem>
<para>
First public release.
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
</variablelist>
</appendix>
<appendix id="ref.upgrading-to-2.1">
<title>Upgrading from 2.0 to 2.1</title>
<para>
Although, as a general rule, we like to avoid introducing
source-level incompatibilities in qpdf's interface, there were a
few non-compatible changes made in this version. A considerable
amount of source code that uses qpdf will probably compile without
any changes, but in some cases, you may have to update your code.
The changes are enumerated here. There are also some new
interfaces; for those, please refer to the header files.
</para>
<itemizedlist>
<listitem>
<para>
QPDF's exception handling mechanism now uses
<classname>std::logic_error</classname> for internal errors and
<classname>std::runtime_error</classname> for runtime errors in
favor of the now removed <classname>QEXC</classname> classes used
in previous versions. The <classname>QEXC</classname> exception
classes predated the addition of the
<filename>&lt;stdexcept&gt;</filename> header file to the C++
standard library. Most of the exceptions thrown by the qpdf
library itself are still of type <classname>QPDFExc</classname>,
which is now derived from
<classname>std::runtime_error</classname>. Programs that caught
an instance of <classname>std::exception</classname> and
displayed it by calling the <function>what()</function> method
will not need to be changed.
</para>
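<para>
The following stand-alone sketch shows why such code keeps working.
<classname>DemoExc</classname> is a hypothetical class standing in
for a library exception derived from
<classname>std::runtime_error</classname>; it is not qpdf's
<classname>QPDFExc</classname>.
</para>
<programlisting><![CDATA[
#include <stdexcept>
#include <string>

// Hypothetical library exception derived from std::runtime_error.
class DemoExc : public std::runtime_error
{
  public:
    explicit DemoExc(std::string const& message) :
        std::runtime_error(message)
    {
    }
};

// Both std::logic_error and classes derived from std::runtime_error
// are caught by a single handler for std::exception.
std::string describe_failure(bool internal_error)
{
    try {
        if (internal_error) {
            throw std::logic_error("internal error");
        }
        throw DemoExc("damaged file");
    } catch (std::exception const& e) {
        return e.what();
    }
}
]]></programlisting>
<para>
Code that needs to tell the two cases apart can add a more specific
handler ahead of the handler for
<classname>std::exception</classname>.
</para>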
</listitem>
<listitem>
<para>
The <classname>QPDFExc</classname> class now internally
represents various fields of the error condition and provides
interfaces for querying them. Among the fields is a numeric
error code that can help applications act differently on (a small
number of) different error conditions. See
<filename>QPDFExc.hh</filename> for details.
</para>
</listitem>
<listitem>
<para>
Warnings can be retrieved from qpdf as instances of
<classname>QPDFExc</classname> instead of strings.
</para>
</listitem>
<listitem>
<para>
The nested <classname>QPDF::EncryptionData</classname> class's
constructor takes an additional argument. This class is
primarily intended to be used by
<classname>QPDFWriter</classname>. There's not really anything
useful an end-user application could do with it. It probably
shouldn't really be part of the public interface to begin with.
Likewise, some of the methods for computing internal encryption
dictionary parameters have changed to support
<literal>/R=4</literal> encryption.
</para>
</listitem>
<listitem>
<para>
The method <function>QPDF::getUserPassword</function> has been
removed since it didn't do what people would think it did. There
are now two new methods:
<function>QPDF::getPaddedUserPassword</function> and
<function>QPDF::getTrimmedUserPassword</function>. The first one
does what the old <function>QPDF::getUserPassword</function>
method used to do, which is to return the password with possible
binary padding as specified by the PDF specification. The second
one returns a human-readable password string.
</para>
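<para>
The difference between a padded and a trimmed password can be
sketched as follows. This is an illustration rather than qpdf's
implementation: the function names are hypothetical, and the
32-byte padding string is the one the PDF specification gives for
the standard security handler.
</para>
<programlisting><![CDATA[
#include <cstddef>
#include <string>

// Padding string from the PDF specification (standard security handler).
static unsigned char const padding_bytes[32] = {
    0x28, 0xbf, 0x4e, 0x5e, 0x4e, 0x75, 0x8a, 0x41,
    0x64, 0x00, 0x4e, 0x56, 0xff, 0xfa, 0x01, 0x08,
    0x2e, 0x2e, 0x00, 0xb6, 0xd0, 0x68, 0x3e, 0x80,
    0x2f, 0x0c, 0xa9, 0xfe, 0x64, 0x53, 0x69, 0x7a
};

static std::string padding_string()
{
    return std::string(
        reinterpret_cast<char const*>(padding_bytes), sizeof(padding_bytes));
}

// Truncate or pad the password to exactly 32 bytes.
std::string pad_user_password(std::string const& password)
{
    std::string result = password.substr(0, 32);
    result.append(padding_string(), 0, 32 - result.size());
    return result;
}

// Remove trailing padding: find the earliest position from which the
// rest of the string matches the start of the padding string.
std::string trim_user_password(std::string const& padded)
{
    if (padded.size() != 32) {
        return padded;
    }
    std::string padding = padding_string();
    for (std::size_t i = 0; i < 32; ++i) {
        if (padded.compare(i, 32 - i, padding, 0, 32 - i) == 0) {
            return padded.substr(0, i);
        }
    }
    return padded;
}
]]></programlisting>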
</listitem>
<listitem>
<para>
The enumerated types that used to be nested in
<classname>QPDFWriter</classname> have moved to top-level
enumerated types and are now defined in the file
<filename>qpdf/Constants.h</filename>. This enables them to be
shared by both the C and C++ interfaces.
</para>
</listitem>
</itemizedlist>
</appendix>
<appendix id="ref.upgrading-to-3.0">
<title>Upgrading to 3.0</title>
<para>
For the most part, the API for qpdf version 3.0 is backward
compatible with versions 2.1 and later. There are two exceptions:
<itemizedlist>
<listitem>
<para>
The method
<function>QPDFObjectHandle::replaceStreamData</function> that
uses a <classname>StreamDataProvider</classname> to provide the
stream data no longer takes a <varname>length</varname>
parameter. While it would have been easy enough to keep the
parameter for backward compatibility, in this case, the
parameter was removed since this provides the user an
opportunity to simplify the calling code. This method was
introduced in version 2.2. At the time, the
<varname>length</varname> parameter was required in order to
ensure that calls to the stream data provider returned the same
length for a specific stream every time they were invoked. In
particular, the linearization code depends on this. Instead,
qpdf 3.0 and newer check for that constraint explicitly. The
first time the stream data provider is called for a specific
stream, the actual length is saved, and subsequent calls are
required to return the same number of bytes. This means the
calling code no longer has to compute the length in advance,
which can be a significant simplification. If your code fails
to compile because of the extra argument and you don't want to
make other changes to your code, just omit the argument.
</para>
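<para>
The explicit check described above can be pictured with the
following stand-alone sketch. The class and its method are
hypothetical and do not reflect how qpdf stores this information;
streams are identified here by a plain integer.
</para>
<programlisting><![CDATA[
#include <cstddef>
#include <map>
#include <stdexcept>

// Remembers the length produced the first time data is provided for a
// stream and rejects any later call that produces a different length.
class LengthChecker
{
  public:
    void record(int stream_id, std::size_t length)
    {
        auto iter = lengths.find(stream_id);
        if (iter == lengths.end()) {
            lengths[stream_id] = length;
        } else if (iter->second != length) {
            throw std::runtime_error(
                "stream data provider returned a different length");
        }
    }

  private:
    std::map<int, std::size_t> lengths;
};
]]></programlisting>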
</listitem>
<listitem>
<para>
Many methods take <type>long long</type> instead of other
integer types. Most if not all existing code should compile
fine with this change since such parameters had always
previously been smaller types. This change was required to
support files larger than two gigabytes in size.
</para>
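<para>
As a small illustration of why a wider type is needed, the sketch
below tests whether a file offset fits in a 32-bit signed integer.
The function name is hypothetical.
</para>
<programlisting><![CDATA[
#include <cstdint>
#include <limits>

// True if the offset can be represented as a 32-bit signed integer.
bool fits_in_32_bit_int(long long offset)
{
    return offset >= std::numeric_limits<std::int32_t>::min() &&
        offset <= std::numeric_limits<std::int32_t>::max();
}
]]></programlisting>
<para>
An offset of three gigabytes fails this test, while any value a
smaller integer type can hold converts to <type>long long</type>
without loss.
</para>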
</listitem>
</itemizedlist>
</para>
</appendix>
<appendix id="ref.upgrading-to-4.0">
<title>Upgrading to 4.0</title>
<para>
While version 4.0 includes a few non-compatible API changes, it is
very unlikely that anyone's code would have used any of those parts
of the API since they generally required information that would
only be available inside the library. In the unlikely event that
you should run into trouble, please see the ChangeLog. See also
<xref linkend="ref.release-notes"/> for a complete list of the
non-compatible API changes made in this version.
</para>
</appendix>
</book>