QPDF version |release|
======================

.. toctree::
   :maxdepth: 2
   :caption: Contents:
|
|
|
|
|
|
.. _acknowledgments:

General Information
===================

QPDF is a program that does structural, content-preserving
transformations on PDF files. QPDF's website is located at
https://qpdf.sourceforge.io/. QPDF's source code is hosted on GitHub at
https://github.com/qpdf/qpdf.

QPDF is licensed under `the Apache License, Version
2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License").
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Versions of qpdf prior to version 7 were released under the terms of
`the Artistic License, version
2.0 <https://opensource.org/licenses/Artistic-2.0>`__. At your option,
you may continue to consider qpdf to be licensed under those terms. The
Apache License 2.0 permits everything that the Artistic License 2.0
permits but is slightly less restrictive. Allowing the Artistic License
to continue being used is primarily to help people who may have to get
specific approval to use qpdf in their products.

QPDF is intentionally released with a permissive license. However, if
there is some reason that the licensing terms don't work for your
requirements, please feel free to contact the copyright holder to make
other arrangements.

QPDF was originally created in 2001 and modified periodically between
2001 and 2005 during my employment at `Apex
CoVantage <http://www.apexcovantage.com>`__. Upon my departure from
Apex, the company graciously allowed me to take ownership of the
software and continue maintaining it as an open source project, a
decision for which I am very grateful. I have made considerable
enhancements to it since that time. I feel fortunate to have worked for
people who would make such a decision. This work would not have been
possible without their support.
|
|
|
|
|
|
|
|
|
|
.. _ref.overview:

What is QPDF?
=============

QPDF is a program that does structural, content-preserving
transformations on PDF files. It could have been called something like
*pdf-to-pdf*. It also provides many useful capabilities to developers of
PDF-producing software and to people who just want to look at the
innards of PDF files to learn more about how they work.

With QPDF, it is possible to copy objects from one PDF file into another
and to manipulate the list of pages in a PDF file. This makes it
possible to merge and split PDF files. The QPDF library also makes it
possible for you to create PDF files from scratch. In this mode, you are
responsible for supplying all the contents of the file, while the QPDF
library takes care of all the syntactical representation of the
objects, creation of cross-reference tables and, if you use them,
object streams, encryption, linearization, and other syntactic details.
You are still responsible for generating PDF content on your own.

QPDF has been designed with very few external dependencies, and it is
intentionally very lightweight. QPDF is *not* a PDF content creation
library, a PDF viewer, or a program capable of converting PDF into other
formats. In particular, QPDF knows nothing about the semantics of PDF
content streams. If you are looking for something that can do that, you
should look elsewhere. However, once you have a valid PDF file, QPDF can
be used to transform that file in ways that your original PDF creation
software perhaps can't handle. For example, many programs generate
simple PDF files but can't password-protect them, web-optimize them, or
perform other transformations of that type.
|
|
|
|
|
|
|
|
|
|
.. _ref.installing:

Building and Installing QPDF
============================

This chapter describes how to build and install qpdf. Please see also
the :file:`README.md` and :file:`INSTALL` files in the source
distribution.

.. _ref.prerequisites:

System Requirements
-------------------

The qpdf package has few external dependencies. In order to build qpdf,
the following packages are required:

- A C++ compiler that supports C++14.

- zlib: http://www.zlib.net/

- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/

- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be
  able to use the gnutls crypto provider, and/or openssl:
  https://openssl.org/ to be able to use the openssl crypto provider.

- GNU make 3.81 or newer: http://www.gnu.org/software/make

- perl version 5.8 or newer: http://www.perl.org/; required for running
  the test suite. Starting with qpdf version 9.1.1, perl is no longer
  required at runtime.

- GNU diffutils (any version): http://www.gnu.org/software/diffutils/
  is required to run the test suite. Note that this is the version of
  diff present on virtually all GNU/Linux systems. This is required
  because the test suite uses :command:`diff -u`.

Part of qpdf's test suite does comparisons of the contents of PDF files
by converting them to images and comparing the images. The image
comparison tests are disabled by default. Those tests are not required
for determining correctness of a qpdf build if you have not modified the
code, since the test suite also contains expected output files that are
compared literally. The image comparison tests provide an extra check to
make sure that any content transformations don't break the rendering of
pages. Transformations that affect the content streams themselves are
off by default and are only provided to help developers look into the
contents of PDF files. If you are making deep changes to the library
that cause changes in the contents of the files that qpdf generates,
then you should enable the image comparison tests. Enable them by
running :command:`configure` with the
:samp:`--enable-test-compare-images` flag. If you enable this, the
following additional packages are required by the test suite. Note that
in no case are these items required to use qpdf.

- libtiff: http://www.remotesensing.org/libtiff/

- Ghostscript version 8.60 or newer: http://www.ghostscript.com

If you do not enable the image comparison tests, you do not need
libtiff or Ghostscript.

Pre-built documentation is distributed with qpdf, so you should
generally not need to rebuild the documentation. In order to build the
documentation from its docbook sources, you need the docbook XML style
sheets (http://downloads.sourceforge.net/docbook/). To build the PDF
version of the documentation, you need Apache fop
(http://xml.apache.org/fop/) version 0.94 or higher.
|
|
|
|
|
|
|
|
|
|
.. _ref.building:

Build Instructions
------------------

Building qpdf on UNIX is generally just a matter of running

::

   ./configure
   make

You can also run :command:`make check` to run the test
suite and :command:`make install` to install. Please run
:command:`./configure --help` for options on what can be
configured. You can also set the value of ``DESTDIR`` during
installation to install to a temporary location, as is common with many
open source packages. Please see also the :file:`README.md` and
:file:`INSTALL` files in the source distribution.

Building on Windows is a little bit more complicated. For details,
please see :file:`README-windows.md` in the source
distribution. You can also download a binary distribution for Windows.
There is a port of qpdf to Visual C++ version 6 in the
:file:`contrib` area generously contributed by Jian
Ma. This is also discussed in more detail in
:file:`README-windows.md`.

While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one
place in the public API, and it's just in a helper function. It is
possible to build qpdf on a system that doesn't have ``wchar_t``, and
it's also possible to compile a program that uses qpdf on a system
without ``wchar_t`` as long as you don't call that one method. This is a
very unusual situation. For a detailed discussion, please see the
top-level README.md file in qpdf's source distribution.

There are some other things you can do with the build. Although qpdf
uses :command:`autoconf`, it does not use
:command:`automake` but instead uses a
hand-crafted non-recursive Makefile that requires GNU make. If you're
really interested, please read the comments in the top-level
:file:`Makefile`.
|
|
|
|
|
|
|
|
|
.. _ref.crypto:

Crypto Providers
----------------

Starting with qpdf 9.1.0, the qpdf library can be built with multiple
implementations of providers of cryptographic functions, which we refer
to as "crypto providers." At the time of writing, a crypto
implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes
and RC4 and AES256 with and without CBC encryption. In the future, if
digital signature support is added to qpdf, there may be additional
requirements beyond this.

Starting with qpdf version 9.1.0, the available implementations are
``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added.
Additional implementations may be added if needed. It is also possible
for a developer to provide their own implementation without modifying
the qpdf library.

.. _ref.crypto.build:

Build Support For Crypto Providers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When building with qpdf's build system, crypto providers can be enabled
at build time using various :command:`./configure`
options. The default behavior is for
:command:`./configure` to discover which crypto providers
can be supported based on available external libraries, to build all
available crypto providers, and to use an external provider as the
default over the native one. This behavior can be changed with the
following flags to :command:`./configure`:

- :samp:`--enable-crypto-{x}`
  (where :samp:`{x}` is a supported crypto
  provider): enable the :samp:`{x}` crypto
  provider, requiring any external dependencies it needs

- :samp:`--disable-crypto-{x}`:
  disable the :samp:`{x}` provider, and do not
  link against its dependencies even if they are available

- :samp:`--with-default-crypto={x}`:
  make :samp:`{x}` the default provider even if
  a higher priority one is available

- :samp:`--disable-implicit-crypto`: only build crypto
  providers that are explicitly requested with an
  :samp:`--enable-crypto-{x}`
  option

For example, if you want to guarantee that the gnutls crypto provider is
used and that the native provider is not built, you could run
:command:`./configure --enable-crypto-gnutls --disable-implicit-crypto`.
|
|
|
|
|
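As a further illustration of the flags listed above, the following
sketch assembles a :command:`./configure` command line that builds only
the providers you name and selects one of them as the default. The
helper name ``crypto_configure_flags`` is hypothetical and not part of
qpdf; it only combines options described in this section.

.. code-block:: bash

   # Hypothetical helper (not part of qpdf): print ./configure flags that
   # build only the named crypto providers and pick one as the default.
   # usage: crypto_configure_flags DEFAULT PROVIDER...
   crypto_configure_flags() {
       default=$1
       shift
       flags="--disable-implicit-crypto --with-default-crypto=$default"
       for provider in "$@"; do
           flags="$flags --enable-crypto-$provider"
       done
       printf '%s\n' "$flags"
   }

   # Example: build openssl and native, with openssl as the default.
   # ./configure $(crypto_configure_flags openssl openssl native)

The example relies on the external libraries for each requested
provider being installed, since an explicitly enabled provider requires
its dependencies.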
|
|
|
|
If you build qpdf using your own build system, in order for qpdf to work
at all, you need to enable at least one crypto provider. The file
:file:`libqpdf/qpdf/qpdf-config.h.in` provides the
macro ``DEFAULT_CRYPTO``, whose value must be a string naming the
default crypto provider, and various symbols starting with
``USE_CRYPTO_``, at least one of which has to be enabled. Additionally,
you must compile the source files that implement a crypto provider. To
get a list of those files, look at
:file:`libqpdf/build.mk`. If you want to omit a
particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is
undefined, you can completely ignore the source files that belong to
that provider. Additionally, crypto providers may have
their own external dependencies that can be omitted if the crypto
provider is not used. For example, if you are building qpdf yourself and
are using an environment that does not support gnutls or openssl, you
can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS``
is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then
you must include the source files used in the native implementation,
some of which were added or renamed from earlier versions, to your
build, and you can ignore
:file:`QPDFCrypto_gnutls.cc`. Always consult
:file:`libqpdf/build.mk` to get the list of source
files you need to build.

.. _ref.crypto.runtime:

Runtime Crypto Provider Selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use the :samp:`--show-crypto` option to
:command:`qpdf` to get a list of available crypto
providers. The default provider is always listed first, and the rest are
listed in lexical order. Each crypto provider is listed on a line by
itself with no other text, enabling the output of this command to be
used easily in scripts.

You can override which crypto provider is used by setting the
``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to
ever do this, but you might want to do it if you were explicitly trying
to compare behavior of two different crypto providers while testing
performance or reproducing a bug. It could also be useful for people who
are implementing their own crypto providers.
|
|
|
|
|
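Because :samp:`--show-crypto` prints one provider per line with the
default first, its output is easy to consume from a script. The
following is a minimal sketch under that assumption; the function names
are hypothetical and are not provided by qpdf.

.. code-block:: bash

   # Print the default crypto provider: the first line of --show-crypto.
   qpdf_default_provider() {
       qpdf --show-crypto | head -n 1
   }

   # Run the same qpdf command once per available provider, selecting
   # the provider through the QPDF_CRYPTO_PROVIDER environment variable.
   # usage: qpdf_with_each_provider QPDF_ARGS...
   qpdf_with_each_provider() {
       for provider in $(qpdf --show-crypto); do
           echo "provider: $provider"
           QPDF_CRYPTO_PROVIDER=$provider qpdf "$@"
       done
   }

   # Example:
   # qpdf_with_each_provider in.pdf out.pdf

Running one command under every provider in this way is one approach to
comparing provider behavior when reproducing a bug.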
|
|
|
|
|
.. _ref.crypto.develop:

Crypto Provider Information for Developers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you are writing code that uses libqpdf and you want to force a
certain crypto provider to be used, you can call the method
``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of
a built-in or developer-supplied provider. To add your own crypto
provider, you have to create a class derived from ``QPDFCryptoImpl`` and
register it with ``QPDFCryptoProvider``. For additional information, see
comments in :file:`include/qpdf/QPDFCryptoImpl.hh`.

.. _ref.crypto.design:

Crypto Provider Design Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section describes a few bits of rationale for why the crypto
provider interface was set up the way it was. You don't need to know any
of this information, but it's provided for the record and in case it's
interesting.

As a general rule, I want to avoid as much as possible including large
blocks of code that are conditionally compiled such that, in most
builds, some code is never built. This is dangerous because it makes it
very easy for invalid code to creep in unnoticed. As such, I want it to
be possible to build qpdf with all available crypto providers, and this
is the way I build qpdf for local development. At the same time, if a
particular packager feels that it is a security liability for qpdf to
use crypto functionality from anything other than a library that gets
considerable scrutiny for this specific purpose (such as gnutls,
openssl, or nettle), then I want to give that packager the ability to
completely disable qpdf's native implementation. Or if someone wants to
avoid adding a dependency on one of the external crypto providers, I
don't want the availability of the provider to impose additional
external dependencies within that environment. Both of these are
situations that I know to be true for some users of qpdf.

I want registration and selection of crypto providers to be thread-safe,
and I want it to work deterministically for a developer to provide their
own crypto provider and be able to set it up as the default. This was
the primary motivation behind requiring C++11, as doing so enabled me to
exploit the guaranteed thread safety of local block static
initialization. The ``QPDFCryptoProvider`` class uses a singleton
pattern with thread-safe initialization to create the singleton instance
of ``QPDFCryptoProvider`` and exposes only static methods in its public
interface. In this way, if a developer wants to call any
``QPDFCryptoProvider`` methods, the library guarantees the
``QPDFCryptoProvider`` is fully initialized and all built-in crypto
providers are registered. Making ``QPDFCryptoProvider`` actually know
about all the built-in providers may seem a bit sad at first, but this
choice makes it extremely clear exactly what the initialization behavior
is. There's no question about provider implementations automatically
registering themselves in a nondeterministic order. It also means that
implementations do not need to know anything about the provider
interface, which makes them easier to test in isolation. Another
advantage of this approach is that a developer who wants to develop
their own crypto provider can do so in complete isolation from the qpdf
library and, with just two calls, can make qpdf use their provider in
their application. If they decided to contribute their code, plugging it
into the qpdf library would require a very small change to qpdf's source
code.

The decision to make the crypto provider selectable at runtime was one I
struggled with a little, but I decided to do it for various reasons.
Allowing an end user to switch crypto providers easily could be very
useful for reproducing a potential bug. If a user reports a bug that
some cryptographic thing is broken, I can easily ask that person to try
with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The
same could apply in the event of a performance problem. This also makes
it easier for qpdf's own test suite to exercise code with different
providers without having to make every program that links with qpdf
aware of the possibility of multiple providers. In qpdf's continuous
integration environment, the entire test suite is run for each supported
crypto provider. This is made simple by being able to select the
provider using an environment variable.

Finally, making crypto providers selectable in this way establishes a
pattern that I may follow again in the future for stream filter
providers. One could imagine a future enhancement where someone could
provide their own implementations for basic filters like
``/FlateDecode`` or for other filters that qpdf doesn't support.
Implementing the registration functions and internal storage of
registered providers was also easier using C++11's functional
interfaces, which was another reason to require C++11 at this time.
|
|
|
|
|
|
|
|
|
|
.. _ref.packaging:

Notes for Packagers
-------------------

If you are packaging qpdf for an operating system distribution, here are
some things you may want to keep in mind:

- Starting in qpdf version 9.1.1, qpdf no longer has a runtime
  dependency on perl. This is because fix-qdf was rewritten in C++.
  However, qpdf still has a build-time dependency on perl.

- Make sure you are getting the intended behavior with regard to crypto
  providers. Read :ref:`ref.crypto.build` for details.

- Passing :samp:`--enable-show-failed-test-output` to
  :command:`./configure` will cause any failed test
  output to be written to the console. This can be very useful for
  seeing test failures generated by autobuilders where you can't access
  qtest.log after the fact.

- If qpdf's build environment detects the presence of autoconf and
  related tools, it will check to ensure that automatically generated
  files are up-to-date with recorded checksums and fail if it detects a
  discrepancy. This feature is intended to prevent you from
  accidentally forgetting to regenerate automatic files after modifying
  their sources. If your packaging environment automatically refreshes
  automatic files, it can cause this check to fail. Suppress qpdf's
  checks by passing :samp:`--disable-check-autofiles`
  to :command:`./configure`. This is safe since qpdf's
  :command:`autogen.sh` just runs autotools in the
  normal way.

- QPDF's :command:`make install` does not install
  completion files by default, but as a packager, it's good if you
  install them wherever your distribution expects such files to go. You
  can find completion files to install in the
  :file:`completions` directory.

- Packagers are encouraged to install the source files from the
  :file:`examples` directory along with qpdf
  development packages.
|
|
|
|
|
|
|
|
|
.. _ref.using:

Running QPDF
============

This chapter describes how to run the qpdf program from the command
line.

.. _ref.invocation:

Basic Invocation
----------------

When running qpdf, the basic invocation is as follows:

::

   qpdf [ options ] { infilename | --empty } outfilename

This converts PDF file :samp:`infilename` to PDF file
:samp:`outfilename`. The output file is functionally
identical to the input file but may have been structurally reorganized.
Also, orphaned objects will be removed from the file. Many
transformations are available as controlled by the options below. In
place of :samp:`infilename`, the parameter
:samp:`--empty` may be specified. This causes qpdf to
use a dummy input file that contains zero pages. The only normal use
case for using :samp:`--empty` would be if you were
going to add pages from another source, as discussed in
:ref:`ref.page-selection`.
|
|
|
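As a sketch of the :samp:`--empty` use case, the helper below
concatenates the pages of several files into a new one. It assumes the
:samp:`--pages` option with its terminating :samp:`--`, which is
described in :ref:`ref.page-selection` rather than here; the function
name ``qpdf_merge`` is hypothetical.

.. code-block:: bash

   # Hypothetical helper: concatenate all pages of the given PDF files.
   # usage: qpdf_merge OUTFILE INFILE...
   qpdf_merge() {
       outfile=$1
       shift
       # --empty supplies a zero-page input; --pages ... -- adds the
       # pages of each listed file in order.
       qpdf --empty --pages "$@" -- "$outfile"
   }

   # Example:
   # qpdf_merge combined.pdf first.pdf second.pdf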
|
|
If :samp:`@filename` appears as a word anywhere in the
command line, the named file will be read line by line, and each line
will be treated as a command-line argument. Leading and trailing
whitespace is intentionally not removed from lines, which makes it
possible to handle arguments that start or end with spaces. The
:samp:`@-` option allows arguments to be read from standard input. This
allows qpdf to be invoked with an arbitrary number of arbitrarily long
arguments. It is also very useful for avoiding having to pass passwords
on the command line. Note that the :samp:`@filename` can't appear in
the middle of an argument, so constructs such as
:samp:`--arg=@option` will not work. You would have to
include the argument and its options together in the arguments file.
|
|
|
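The argument-file mechanism can be sketched as follows. The helper
name ``write_qpdf_args`` and the file names are hypothetical; the only
qpdf behavior relied on is that each line of the file becomes exactly
one argument.

.. code-block:: bash

   # Hypothetical helper: write each argument on its own line of FILE so
   # that "qpdf @FILE" reads them back as separate arguments.
   # usage: write_qpdf_args FILE ARG...
   write_qpdf_args() {
       argfile=$1
       shift
       # umask 077 makes a newly created file readable only by its
       # owner, which matters if one of the arguments is a password.
       (umask 077 && printf '%s\n' "$@" > "$argfile")
   }

   # Example: the whole --password option goes on one line of the file.
   # write_qpdf_args args.txt --password=secret in.pdf out.pdf
   # qpdf @args.txt

Since whitespace is not trimmed from lines, an argument containing
spaces is written as is, without any shell quoting inside the file.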
|
|
|
:samp:`outfilename` does not have to be seekable, even
when generating linearized files. Specifying ":samp:`-`"
as :samp:`outfilename` means to write to standard
output. If you want to overwrite the input file with the output, use the
option :samp:`--replace-input` and omit the output file
name. You can't specify the same file as both the input and the output.
If you do this, qpdf will tell you about the
:samp:`--replace-input` option.

Most options require an output file, but some testing or inspection
commands do not. These are specifically noted.
|
|
|
|
|
|
|
|
|
|
.. _ref.exit-status:

Exit Status
~~~~~~~~~~~

The exit status of :command:`qpdf` may be interpreted as
follows:

- ``0``: no errors or warnings were found. The file may still have
  problems qpdf can't detect. If
  :samp:`--warning-exit-0` was specified, exit status 0
  is used even if there are warnings.

- ``2``: errors were found. qpdf was not able to fully process the
  file.

- ``3``: qpdf encountered problems that it was able to recover from. In
  some cases, the resulting file may still be damaged. Note that qpdf
  still exits with status ``3`` if it finds warnings even when
  :samp:`--no-warn` is specified. With
  :samp:`--warning-exit-0`, warnings without errors
  exit with status 0 instead of 3.

Note that :command:`qpdf` never exits with status ``1``.
If you get an exit status of ``1``, it was something else, like the
shell not being able to find or execute :command:`qpdf`.
|
|
|
|
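A script can branch on these values. The wrapper below is a minimal
sketch; the function name ``run_qpdf`` and the message wording are
hypothetical, while the status values are the ones listed above.

.. code-block:: bash

   # Hypothetical wrapper: run qpdf and describe its exit status.
   # usage: run_qpdf QPDF_ARGS...
   run_qpdf() {
       status=0
       qpdf "$@" || status=$?
       case "$status" in
           0) echo "qpdf: success" ;;
           2) echo "qpdf: errors; the file was not fully processed" ;;
           3) echo "qpdf: warnings; recovered, but check the output" ;;
           *) echo "qpdf did not run normally (status $status)" ;;
       esac
       return "$status"
   }

   # Example:
   # run_qpdf in.pdf out.pdf || echo "inspect out.pdf before using it"

The wrapper returns the original status, so callers can still
distinguish warnings from errors.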
|
|
|
|
|
.. _ref.shell-completion:

Shell Completion
----------------

Starting in qpdf version 8.3.0, qpdf provides its own completion support
for zsh and bash. You can enable bash completion with
:command:`eval $(qpdf --completion-bash)` and zsh completion with
:command:`eval $(qpdf --completion-zsh)`. If
:command:`qpdf` is not in your path, you should invoke it
above with an absolute path. If you invoke it with a relative path, it
will warn you, and the completion won't work if you're in a different
directory.

qpdf will use ``argv[0]`` to figure out where its executable is. This
may produce unwanted results in some cases, especially if you are trying
to use completion with a copy of qpdf that is built from source. You can
specify a full path to the qpdf you want to use for completion in the
``QPDF_EXECUTABLE`` environment variable.
|
|
|
|
|
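For a shell startup file, the steps above might be wrapped as in the
following sketch. The function name ``enable_qpdf_completion`` is
hypothetical, and exporting ``QPDF_EXECUTABLE`` before generating the
completion command is an assumption about when qpdf consults it.

.. code-block:: bash

   # Hypothetical helper for ~/.bashrc: enable bash completion using a
   # specific qpdf binary, given by absolute path.
   # usage: enable_qpdf_completion /absolute/path/to/qpdf
   enable_qpdf_completion() {
       case "$1" in
           /*) ;;
           *) echo "enable_qpdf_completion: need an absolute path" >&2
              return 1 ;;
       esac
       # Tell qpdf which executable completion should use.
       export QPDF_EXECUTABLE="$1"
       eval "$("$1" --completion-bash)"
   }

   # Example:
   # enable_qpdf_completion "$HOME/src/qpdf/build/qpdf"

For zsh, the same pattern applies with :samp:`--completion-zsh` in
place of :samp:`--completion-bash`. The path in the example is a
placeholder.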
|
|
|
|
|
.. _ref.basic-options:

Basic Options
-------------

The following options are the most common ones and perform commonly
needed transformations.

:samp:`--help`
   Display command-line invocation help.

:samp:`--version`
   Display the current version of qpdf.

:samp:`--copyright`
   Show detailed copyright information.

:samp:`--show-crypto`
   Show a list of available crypto providers, each on a line by itself.
   The default provider is always listed first. See :ref:`ref.crypto`
   for more information about crypto providers.

:samp:`--completion-bash`
   Output a completion command you can eval to enable shell completion
   from bash.

:samp:`--completion-zsh`
   Output a completion command you can eval to enable shell completion
   from zsh.

:samp:`--password={password}`
   Specifies a password for accessing encrypted files. To read the
   password from a file or standard input, you can use
   :samp:`--password-file`, added in qpdf 10.2. Note
   that you can also use :samp:`@filename` or
   :samp:`@-` as described above to put the password in
   a file or pass it via standard input, but you would do so by
   specifying the entire
   :samp:`--password={password}`
   option in the file. Syntax such as
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--password=@filename` won't work since
|
|
|
|
|
:samp:`@filename` is not recognized in the middle of
|
2021-12-11 21:53:08 +00:00
|
|
|
|
an argument.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--password-file={filename}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Reads the first line from the specified file and uses it as the
|
|
|
|
|
password for accessing encrypted files.
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`{filename}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
may be ``-`` to read the password from standard input. Note that, in
|
|
|
|
|
this case, the password is echoed and there is no prompt, so use with
|
|
|
|
|
caution.
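
Keeping the password in a file avoids placing it on the command line. As
a hedged illustration, the wrapper below combines
:samp:`--password-file` with :samp:`--decrypt`; the function name and its
argument order are hypothetical.

```shell
# Hypothetical wrapper: decrypt INPUT to OUTPUT, taking the password from
# the first line of PWFILE.
# usage: decrypt_with_password_file PWFILE INPUT OUTPUT
decrypt_with_password_file() {
    local pwfile="$1" input="$2" output="$3"
    qpdf --password-file="$pwfile" --decrypt "$input" "$output"
}
```

Passing ``-`` as the first argument would read the password from standard
input, with the echo caveat noted above.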

:samp:`--is-encrypted`
   Silently exit with status 0 if the file is encrypted or status 2 if
   the file is not encrypted. This is useful for shell scripts. Other
   options are ignored if this is given. This option is mutually
   exclusive with :samp:`--requires-password`. Both this option and
   :samp:`--requires-password` exit with status 2 for non-encrypted
   files.

:samp:`--requires-password`
   Silently exit with status 0 if a password (other than as supplied) is
   required. Exit with status 2 if the file is not encrypted. Exit with
   status 3 if the file is encrypted but requires no password or the
   correct password has been supplied. This is useful for shell scripts.
   Note that any supplied password is used when opening the file. When
   used with a :samp:`--password` option, this option can be used to
   check the correctness of the password. In that case, an exit status
   of 3 means the file works with the supplied password. This option is
   mutually exclusive with :samp:`--is-encrypted`. Both this option and
   :samp:`--is-encrypted` exit with status 2 for non-encrypted files.
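
Because both options report their answer only through the exit status, a
script has to capture that status explicitly. The following sketch maps
the three documented statuses of :samp:`--requires-password` to labels;
the function name and the labels are hypothetical.

```shell
# Hypothetical helper: classify a PDF by the exit status of
# "qpdf --requires-password". Extra arguments after the file name (for
# example --password=...) are passed through to qpdf.
pdf_password_state() {
    local file="$1" status=0
    shift
    qpdf --requires-password "$@" "$file" || status=$?
    case "$status" in
        0) echo "password-required" ;;
        2) echo "not-encrypted" ;;
        3) echo "opens-with-what-was-supplied" ;;
        *) echo "unexpected-status-$status"; return 1 ;;
    esac
}
```

Calling it with a candidate :samp:`--password` argument turns the third
label into a password check, as described above.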

:samp:`--verbose`
   Increase verbosity of output. For now, this just prints some
   indication of any file that it creates.

:samp:`--progress`
   Indicate progress while writing files.

:samp:`--no-warn`
   Suppress writing of warnings to stderr. If warnings were detected and
   suppressed, :command:`qpdf` will still exit with exit code 3. See
   also :samp:`--warning-exit-0`.

:samp:`--warning-exit-0`
   If warnings are found but no errors, exit with exit code 0 instead
   of 3. When combined with :samp:`--no-warn`, the effect is for
   :command:`qpdf` to completely ignore warnings.

:samp:`--linearize`
   Causes generation of a linearized (web-optimized) output file.

:samp:`--replace-input`
   If specified, the output file name should be omitted. This option
   tells qpdf to replace the input file with the output. It does this by
   writing to :file:`{infilename}.~qpdf-temp#` and, when done,
   overwriting the input file with the temporary file. If there were any
   warnings, the original input is saved as
   :file:`{infilename}.~qpdf-orig`.

:samp:`--copy-encryption=file`
   Encrypt the file using the same encryption parameters, including user
   and owner password, as the specified file. Use
   :samp:`--encryption-file-password` to specify a password if one is
   needed to open this file. Note that copying the encryption parameters
   from a file also copies the first half of ``/ID`` from the file since
   this is part of the encryption parameters.

:samp:`--encryption-file-password=password`
   If the file specified with :samp:`--copy-encryption` requires a
   password, specify the password using this option. Note that only one
   of the user or owner password is required. Both passwords will be
   preserved since QPDF does not distinguish between the two passwords.
   It is possible to preserve encryption parameters, including the owner
   password, from a file even if you don't know the file's owner
   password.

:samp:`--allow-weak-crypto`
   Starting with version 10.4, qpdf issues warnings when requested to
   create files using RC4 encryption. This option suppresses those
   warnings. In future versions of qpdf, qpdf will refuse to create
   files with weak cryptography when this flag is not given. See
   :ref:`ref.weak-crypto` for additional details.

:samp:`--encrypt options --`
   Causes generation of an encrypted output file. Please see
   :ref:`ref.encryption-options` for details on how to specify
   encryption parameters.

:samp:`--decrypt`
   Removes any encryption on the file. A password must be supplied if
   the file is password protected.

:samp:`--password-is-hex-key`
   Overrides the usual computation/retrieval of the PDF file's
   encryption key from user/owner password with an explicit
   specification of the encryption key. When this option is specified,
   the argument to the :samp:`--password` option is interpreted as a
   hexadecimal-encoded key value. This only applies to the password used
   to open the main input file. It does not apply to other files opened
   by :samp:`--pages` or other options or to files being written.

   Most users will never have a need for this option, and no standard
   viewers support this mode of operation, but it can be useful for
   forensic or investigatory purposes. For example, if a PDF file is
   encrypted with an unknown password, a brute-force attack using the
   key directly is sometimes more efficient than one using the password.
   Also, if a file is heavily damaged, it may be possible to derive the
   encryption key and recover parts of the file using it directly. To
   expose the encryption key used by an encrypted file that you can open
   normally, use the :samp:`--show-encryption-key` option.

:samp:`--suppress-password-recovery`
   Ordinarily, qpdf attempts to automatically compensate for passwords
   specified in the wrong character encoding. This option suppresses
   that behavior. Under normal conditions, there are no reasons to use
   this option. See :ref:`ref.unicode-passwords` for a discussion.

:samp:`--password-mode={mode}`
   This option can be used to fine-tune how qpdf interprets Unicode
   (non-ASCII) password strings passed on the command line. With the
   exception of the :samp:`hex-bytes` mode, these only apply to
   passwords provided when encrypting files. The :samp:`hex-bytes` mode
   also applies to passwords specified for reading files. For additional
   discussion of the supported password modes and when you might want to
   use them, see :ref:`ref.unicode-passwords`. The following modes are
   supported:

   - :samp:`auto`: Automatically determine whether the specified
     password is a properly encoded Unicode (UTF-8) string, and
     transcode it as required by the PDF spec based on the type of
     encryption being applied. On Windows starting with version 8.4.0,
     and on almost all other modern platforms, incoming passwords will
     be properly encoded in UTF-8, so this is almost always what you
     want.

   - :samp:`unicode`: Tells qpdf that the incoming password is UTF-8,
     overriding whatever its automatic detection determines. The only
     difference between this mode and :samp:`auto` is that qpdf will
     fail with an error message if the password is not valid UTF-8
     instead of falling back to :samp:`bytes` mode with a warning.

   - :samp:`bytes`: Interpret the password as a literal byte string.
     For non-Windows platforms, this is what versions of qpdf prior to
     8.4.0 did. For Windows platforms, there is no way to specify
     strings of binary data on the command line directly, but you can
     use the :samp:`@filename` option to do it, in which case this
     option forces qpdf to respect the string of bytes as provided. This
     option will allow you to encrypt PDF files with passwords that will
     not be usable by other readers.

   - :samp:`hex-bytes`: Interpret the password as a hex-encoded string.
     This provides a way to pass binary data as a password on all
     platforms including Windows. As with :samp:`bytes`, this option may
     allow creation of files that can't be opened by other readers. This
     mode affects qpdf's interpretation of passwords specified for
     decrypting files as well as for encrypting them. It makes it
     possible to specify strings that are encoded in some manner other
     than the system's default encoding.

:samp:`--rotate=[+|-]angle[:page-range]`
   Apply rotation to specified pages. The :samp:`page-range` portion of
   the option value has the same format as page ranges in
   :ref:`ref.page-selection`. If the page range is omitted, the rotation
   is applied to all pages. The :samp:`angle` portion of the parameter
   may be either 0, 90, 180, or 270. If preceded by :samp:`+` or
   :samp:`-`, the angle is added to or subtracted from the specified
   pages' original rotations. This is almost always what you want.
   Otherwise the pages' rotations are set to the exact value, which may
   cause the appearances of the pages to be inconsistent, especially for
   scans. For example, the command
   :command:`qpdf in.pdf out.pdf --rotate=+90:2,4,6 --rotate=180:7-8`
   would rotate pages 2, 4, and 6 90 degrees clockwise from their
   original rotation and force the rotation of pages 7 through 8 to 180
   degrees regardless of their original rotation, and the command
   :command:`qpdf in.pdf out.pdf --rotate=+180` would rotate all pages
   by 180 degrees.
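
The difference between a relative and an absolute angle is easiest to see
as arithmetic on a page's existing rotation. The helper below is a sketch
only: ``resulting_rotation`` is a hypothetical name, and normalizing the
result into the range 0 to 359 is an assumption of the sketch.

```shell
# Hypothetical helper: compute the rotation a page ends up with.
# $1 is the page's current rotation in degrees; $2 is the angle part of
# --rotate, for example +90, -90, or 180.
resulting_rotation() {
    local current="$1" spec="$2" angle
    case "$spec" in
        +*) angle=$(( current + ${spec#+} )) ;;  # relative: add
        -*) angle=$(( current - ${spec#-} )) ;;  # relative: subtract
        *)  angle=$spec ;;                       # absolute: replace
    esac
    # Normalize into 0..359 (assumed), handling negative intermediates.
    echo $(( (angle % 360 + 360) % 360 ))
}
```

For a page scanned with a rotation of 90, ``resulting_rotation 90 +90``
gives 180 while ``resulting_rotation 90 180`` also gives 180; the two
forms differ as soon as pages start from different rotations, which is
why the relative form is usually the safer choice for scans.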

:samp:`--keep-files-open={[yn]}`
   This option controls whether qpdf keeps individual files open while
   merging. Prior to version 8.1.0, qpdf always kept all files open, but
   this meant that the number of files that could be merged was limited
   by the operating system's open file limit. Version 8.1.0 opened files
   as they were referenced and closed them after each read, but this
   caused a major performance impact. Version 8.2.0 optimized the
   performance but did so in a way that, for local file systems, there
   was a small but unavoidable performance hit, but for networked file
   systems, the performance impact could be very high. Starting with
   version 8.2.1, the default behavior is that files are kept open if no
   more than 200 files are specified, but that the behavior can be
   explicitly overridden with the :samp:`--keep-files-open` flag. If you
   are merging more than 200 files but fewer than the operating system's
   max open files limit, you may want to use
   :samp:`--keep-files-open=y`, especially if working over a networked
   file system. If you are using a local file system where the overhead
   is low and you might sometimes merge more than the OS limit's number
   of files from a script and are not worried about a few seconds
   additional processing time, you may want to specify
   :samp:`--keep-files-open=n`. The threshold for switching may be
   changed from the default 200 with the
   :samp:`--keep-files-open-threshold` option.
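
The default decision described above reduces to a single comparison. As
an illustration (the function name is hypothetical), a script that wants
to log or reproduce qpdf's default choice could compute it like this:

```shell
# Hypothetical helper: print the --keep-files-open value matching the
# documented default for a merge of $1 files. $2 optionally overrides
# the threshold and defaults to 200.
default_keep_files_open() {
    local nfiles="$1" threshold="${2:-200}"
    if [ "$nfiles" -le "$threshold" ]; then
        echo y   # no more than the threshold: files stay open
    else
        echo n   # above the threshold: files are opened on demand
    fi
}
```

A script merging 500 files over a network share might ignore this default
and pass :samp:`--keep-files-open=y` explicitly, provided the operating
system's open file limit allows it.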

:samp:`--keep-files-open-threshold={count}`
   If specified, overrides the default value of 200 used as the
   threshold for qpdf deciding whether or not to keep files open. See
   :samp:`--keep-files-open` for details.

:samp:`--pages options --`
   Select specific pages from one or more input files. See
   :ref:`ref.page-selection` for details on how to do page selection
   (splitting and merging).

:samp:`--collate={n}`
   When specified, collate rather than concatenate pages from files
   specified with :samp:`--pages`. With a numeric argument, collate in
   groups of :samp:`{n}`. The default is 1. See
   :ref:`ref.page-selection` for additional details.

:samp:`--flatten-rotation`
   For each page that is rotated using the ``/Rotate`` key in the page's
   dictionary, remove the ``/Rotate`` key and implement the identical
   rotation semantics by modifying the page's contents. This option can
   be useful to prepare files for buggy PDF applications that don't
   properly handle rotated pages.

:samp:`--split-pages=[n]`
   Write each group of :samp:`n` pages to a separate output file. If
   :samp:`n` is not specified, create single pages. Output file names
   are generated as follows:

   - If the string ``%d`` appears in the output file name, it is
     replaced with a range of zero-padded page numbers starting from 1.

   - Otherwise, if the output file name ends in :file:`.pdf` (case
     insensitive), a zero-padded page range, preceded by a dash, is
     inserted before the file extension.

   - Otherwise, the file name is appended with a zero-padded page range
     preceded by a dash.

   Page ranges are a single number in the case of single-page groups or
   two numbers separated by a dash otherwise. For example, if
   :file:`infile.pdf` has 12 pages:

   - :command:`qpdf --split-pages infile.pdf %d-out` would generate
     files :file:`01-out` through :file:`12-out`

   - :command:`qpdf --split-pages=2 infile.pdf outfile.pdf` would
     generate files :file:`outfile-01-02.pdf` through
     :file:`outfile-11-12.pdf`

   - :command:`qpdf --split-pages infile.pdf something.else` would
     generate files :file:`something.else-01` through
     :file:`something.else-12`

   Note that outlines, threads, and other global features of the
   original PDF file are not preserved. For each page of output, this
   option creates an empty PDF and copies a single page from the output
   into it. If you require the global data, you will have to run
   :command:`qpdf` with the :samp:`--pages` option once for each file.
   Using :samp:`--split-pages` is much faster if you don't require the
   global data.
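
The naming rules for :samp:`--split-pages` can be sketched as a small
shell function, which is handy when a script needs to predict the files
qpdf will write. This is an illustration, not qpdf's own code:
``split_output_name`` is a hypothetical name, padding to the number of
digits in the total page count is inferred from the examples, and
replacing only the first ``%d`` is an assumption.

```shell
# Hypothetical helper: print the output name for the group of pages
# FIRST..LAST of a TOTAL-page input.
# usage: split_output_name OUTFILE FIRST LAST TOTAL
split_output_name() {
    local out="$1" first="$2" last="$3" total="$4"
    local width="${#total}" marker='%d' range lower base ext
    # Build the zero-padded range: one number, or two joined by a dash.
    range=$(printf '%0*d' "$width" "$first")
    if [ "$first" -ne "$last" ]; then
        range="$range-$(printf '%0*d' "$width" "$last")"
    fi
    case "$out" in
        *"$marker"*)
            # Rule 1: substitute the range for the first %d.
            printf '%s%s%s\n' "${out%%"$marker"*}" "$range" "${out#*"$marker"}"
            return
            ;;
    esac
    lower=$(printf '%s' "$out" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
        *.pdf)
            # Rule 2: insert "-range" before the four-character extension.
            base=${out%????}
            ext=${out#"$base"}
            printf '%s-%s%s\n' "$base" "$range" "$ext"
            ;;
        *)
            # Rule 3: append "-range" to the whole name.
            printf '%s-%s\n' "$out" "$range"
            ;;
    esac
}
```

Under these assumptions, ``split_output_name outfile.pdf 11 12 12``
prints ``outfile-11-12.pdf`` and ``split_output_name %d-out 1 1 12``
prints ``01-out``, matching the examples for a 12-page input.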

:samp:`--overlay options --`
   Overlay pages from another file onto the output pages. See
   :ref:`ref.overlay-underlay` for details on overlay/underlay.

:samp:`--underlay options --`
   Underlay pages from another file beneath the output pages. See
   :ref:`ref.overlay-underlay` for details on overlay/underlay.

Password-protected files may be opened by specifying a password. By
default, qpdf will preserve any encryption data associated with a file.
If :samp:`--decrypt` is specified, qpdf will attempt to remove any
encryption information. If :samp:`--encrypt` is specified, qpdf will
replace the document's encryption parameters with whatever is specified.

Note that qpdf does not obey encryption restrictions already imposed on
the file. Doing so would be meaningless since qpdf can be used to remove
encryption from the file entirely. This functionality is not intended to
be used for bypassing copyright restrictions or other restrictions
placed on files by their producers.

Prior to 8.4.0, in the case of passwords that contain characters that
fall outside of 7-bit US-ASCII, qpdf left the burden of supplying
properly encoded encryption and decryption passwords to the user.
Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For
an in-depth discussion, please see :ref:`ref.unicode-passwords`.
Previous versions of this manual described workarounds using the
:command:`iconv` command. Such workarounds are no longer required or
recommended with qpdf 8.4.0. However, for backward compatibility, qpdf
attempts to detect those workarounds and do the right thing in most
cases.

.. _ref.encryption-options:

Encryption Options
------------------

To change the encryption parameters of a file, use the
:samp:`--encrypt` flag. The syntax is

::

   --encrypt user-password owner-password key-length [ restrictions ] --

Note that ":samp:`--`" terminates parsing of encryption flags and must
be present even if no restrictions are present.

Either or both of the user password and the owner password may be empty
strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation
of PDF files with a non-empty user password, an empty owner password,
and a 256-bit key since such files can be opened with no password. If
you want to create such files, specify the encryption option
:samp:`--allow-insecure`, as described below.
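
A script that builds :samp:`--encrypt` arguments from user input can test
for the insecure combination before calling qpdf. The check below is a
sketch of the rule stated above; the function name is hypothetical.

```shell
# Hypothetical pre-flight check.
# usage: is_insecure_encryption USER_PASSWORD OWNER_PASSWORD KEY_LENGTH
# Succeeds (status 0) when the combination is the one qpdf refuses by
# default: non-empty user password, empty owner password, 256-bit key.
is_insecure_encryption() {
    local user="$1" owner="$2" key_length="$3"
    [ -n "$user" ] && [ -z "$owner" ] && [ "$key_length" = 256 ]
}
```

A caller might refuse to continue when the check succeeds, and otherwise
run something like
:command:`qpdf --encrypt "$user" "$owner" 256 -- in.pdf out.pdf`, keeping
the terminating :samp:`--` even though no restrictions are given.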

The value for :samp:`{key-length}` may be 40, 128, or 256. The
restriction flags are dependent upon key length. When no additional
restrictions are given, the default is to be fully permissive.

If :samp:`{key-length}` is 40, the following restriction options are
available:

:samp:`--print=[yn]`
   Determines whether or not to allow printing.

:samp:`--modify=[yn]`
   Determines whether or not to allow document modification.

:samp:`--extract=[yn]`
   Determines whether or not to allow text/image extraction.

:samp:`--annotate=[yn]`
   Determines whether or not to allow comments and form fill-in and
   signing.

If :samp:`{key-length}` is 128, the following restriction options are
available:

:samp:`--accessibility=[yn]`
   Determines whether or not to allow accessibility to the visually
   impaired. The qpdf library disregards this field when AES is used or
   when 256-bit encryption is used. You should really never disable
   accessibility, but qpdf lets you do it in case you need to configure
   a file this way for testing purposes. The PDF spec says that
   conforming readers should disregard this permission and always allow
   accessibility.

:samp:`--extract=[yn]`
   Determines whether or not to allow text/graphic extraction.

:samp:`--assemble=[yn]`
   Determines whether document assembly (rotation and reordering of
   pages) is allowed.

:samp:`--annotate=[yn]`
   Determines whether modifying annotations is allowed. This includes
   adding comments and filling in form fields. Also allows editing of
   form fields if :samp:`--modify-other=y` is given.

:samp:`--form=[yn]`
   Determines whether filling form fields is allowed.

:samp:`--modify-other=[yn]`
   Allow all document editing except those controlled separately by the
   :samp:`--assemble`, :samp:`--annotate`, and :samp:`--form` options.

:samp:`--print={print-opt}`
   Controls printing access. :samp:`{print-opt}` may be one of the
   following:

   - :samp:`full`: allow full printing

   - :samp:`low`: allow low-resolution printing only

   - :samp:`none`: disallow printing

:samp:`--modify={modify-opt}`
   Controls modify access. This way of controlling modify access has
   less granularity than new options added in qpdf 8.4.
   :samp:`{modify-opt}` may be one of the following:

   - :samp:`all`: allow full document modification

   - :samp:`annotate`: allow comment authoring, form operations, and
     document assembly

   - :samp:`form`: allow form field fill-in and signing and document
     assembly

   - :samp:`assembly`: allow document assembly only

   - :samp:`none`: allow no modifications

   Using the :samp:`--modify` option does not allow you to create
   certain combinations of permissions such as allowing form filling but
   not allowing document assembly. Starting with qpdf 8.4, you can
   either just use the other options to control fields individually, or
   you can use something like :samp:`--modify=form --assemble=n` to fine
   tune.
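
One way to read the :samp:`--modify` values is as shorthand for settings
of the four granular options. The table below is a sketch of that
reading, derived from the list of values above rather than from qpdf's
source; the function name is hypothetical, and treating :samp:`all` as
also granting :samp:`--modify-other=y` is an assumption.

```shell
# Hypothetical helper: print the granular permission flags that a given
# --modify value is read as implying.
modify_opt_to_flags() {
    case "$1" in
        all)      echo "--assemble=y --annotate=y --form=y --modify-other=y" ;;
        annotate) echo "--assemble=y --annotate=y --form=y --modify-other=n" ;;
        form)     echo "--assemble=y --annotate=n --form=y --modify-other=n" ;;
        assembly) echo "--assemble=y --annotate=n --form=n --modify-other=n" ;;
        none)     echo "--assemble=n --annotate=n --form=n --modify-other=n" ;;
        *)        echo "unknown modify-opt: $1" >&2; return 1 ;;
    esac
}
```

Seen this way, every :samp:`--modify` value other than :samp:`none`
includes document assembly, which is why allowing form filling without
assembly needs one of the granular options.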
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--cleartext-metadata`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
If specified, any metadata stream in the document will be left
|
|
|
|
|
unencrypted even if the rest of the document is encrypted. This also
|
|
|
|
|
forces the PDF version to be at least 1.5.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--use-aes=[yn]`
|
|
|
|
|
If :samp:`--use-aes=y` is specified, AES encryption
|
2021-12-11 21:53:08 +00:00
|
|
|
|
will be used instead of RC4 encryption. This forces the PDF version
|
|
|
|
|
to be at least 1.6.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--allow-insecure`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
From qpdf 10.2, qpdf defaults to not allowing creation of PDF files
|
|
|
|
|
where the user password is non-empty, the owner password is empty,
|
|
|
|
|
and a 256-bit key is in use. Files created in this way are insecure
|
|
|
|
|
since they can be opened without a password. Users would ordinarily
|
|
|
|
|
never want to create such files. If you are using qpdf to
|
|
|
|
|
intentionally created strange files for testing (a definite valid use
|
|
|
|
|
of qpdf!), this option allows you to create such insecure files.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--force-V4`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Use of this option forces the ``/V`` and ``/R`` parameters in the
|
|
|
|
|
document's encryption dictionary to be set to the value ``4``. As
|
|
|
|
|
qpdf will automatically do this when required, there is no reason to
|
|
|
|
|
ever use this option. It exists primarily for use in testing qpdf
|
|
|
|
|
itself. This option also forces the PDF version to be at least 1.5.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
If :samp:`{key-length}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
is 256, the minimum PDF version is 1.7 with extension level 8, and the
|
|
|
|
|
AES-based encryption format used is the PDF 2.0 encryption method
|
|
|
|
|
supported by Acrobat X. the same options are available as with 128 bits
|
|
|
|
|
with the following exceptions:
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--use-aes`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
This option is not available with 256-bit keys. AES is always used
|
|
|
|
|
with 256-bit encryption keys.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--force-V4`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
This option is not available with 256-bit keys.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--force-R5`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
If specified, qpdf sets the minimum version to 1.7 at extension level
|
|
|
|
|
3 and writes the deprecated encryption format used by Acrobat version
|
|
|
|
|
IX. This option should not be used in practice to generate PDF files
|
|
|
|
|
that will be in general use, but it can be useful to generate files
|
|
|
|
|
if you are trying to test proper support in another application for
|
|
|
|
|
PDF files encrypted in this way.
|
|
|
|
|
|
|
|
|
|
The default for each permission option is to be fully permissive.
|
|
|
|
|
|
|
|
|
|
.. _ref.page-selection:
|
|
|
|
|
|
|
|
|
|
Page Selection Options
|
|
|
|
|
----------------------
|
|
|
|
|
|
|
|
|
|
Starting with qpdf 3.0, it is possible to split and merge PDF files by
|
|
|
|
|
selecting pages from one or more input files. Whatever file is given as
|
|
|
|
|
the primary input file is used as the starting point, but its pages are
|
|
|
|
|
replaced with pages as specified.
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
--pages input-file [ --password=password ] [ page-range ] [ ... ] --
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Multiple input files may be specified. Each one is given as the name of
|
|
|
|
|
the input file, an optional password (if required to open the file), and
|
2021-12-12 00:11:56 +00:00
|
|
|
|
the range of pages. Note that ":samp:`--`" terminates
|
2021-12-11 21:53:08 +00:00
|
|
|
|
parsing of page selection flags.
|
|
|
|
|
|
|
|
|
|
Starting with qpdf 8.4, the special input file name
|
2021-12-12 00:02:42 +00:00
|
|
|
|
":file:`.`" can be used as a shortcut for the
|
2021-12-11 21:53:08 +00:00
|
|
|
|
primary input filename.
|
|
|
|
|
|
|
|
|
|
For each file that pages should be taken from, specify the file, a
|
|
|
|
|
password needed to open the file (if any), and a page range. The
|
|
|
|
|
password needs to be given only once per file. If any of the input files
|
|
|
|
|
are the same as the primary input file or the file used to copy
|
|
|
|
|
encryption parameters (if specified), you do not need to repeat the
|
|
|
|
|
password here. The same file can be repeated multiple times. If a file
|
|
|
|
|
that is repeated has a password, the password only has to be given the
|
|
|
|
|
first time. All non-page data (info, outlines, page numbers, etc.) are
|
|
|
|
|
taken from the primary input file. To discard these, use
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--empty` as the primary input.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf
|
|
|
|
|
sees a value in the place where it expects a page range and that value
|
|
|
|
|
is not a valid range but is a valid file name, qpdf will implicitly use
|
|
|
|
|
the range ``1-z``, meaning that it will include all pages in the file.
|
|
|
|
|
This makes it possible to easily combine all pages in a set of files
|
2021-12-12 00:01:40 +00:00
|
|
|
|
with a command like :command:`qpdf --empty out.pdf --pages \*.pdf
|
|
|
|
|
--`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
The page range is a set of numbers separated by commas, ranges of
|
|
|
|
|
numbers separated by dashes, or combinations of those. The character "z"
|
|
|
|
|
represents the last page. A number preceded by an "r" indicates to count
|
|
|
|
|
from the end, so ``r3-r1`` would be the last three pages of the
|
|
|
|
|
document. Pages can appear in any order. Ranges can appear with a high
|
|
|
|
|
number followed by a low number, which causes the pages to appear in
|
|
|
|
|
reverse. Numbers may be repeated in a page range. A page range may be
|
|
|
|
|
optionally appended with ``:even`` or ``:odd`` to indicate only the even
|
|
|
|
|
or odd pages in the given range. Note that even and odd refer to the
|
|
|
|
|
positions within the specified range, not whether the original number
|
|
|
|
|
is even or odd.
|
|
|
|
|
|
|
|
|
|
Example page ranges:
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in
|
|
|
|
|
that order.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``z-1``: all pages in the document in reverse
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``r3-r1``: the last three pages of the document
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``r1-r3``: the last three pages of the document in reverse order
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``1-20:even``: even pages from 2 to 20
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd
|
|
|
|
|
positions from among the original range, which represents pages 5, 7,
|
|
|
|
|
8, 9, and 12.
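
The page range rules above can be sketched in a few lines of Python. This is only an illustration of the syntax as described in this section, not qpdf's actual parser; the function name ``parse_page_range`` and the ``num_pages`` parameter are hypothetical.

.. code-block:: python

   def parse_page_range(spec, num_pages):
       """Expand a page range string into a list of 1-based page numbers."""
       # An optional :even or :odd suffix selects by position in the range.
       parity = None
       if spec.endswith(":even"):
           spec, parity = spec[:-5], "even"
       elif spec.endswith(":odd"):
           spec, parity = spec[:-4], "odd"

       def page(token):
           if token == "z":                # last page
               n = num_pages
           elif token.startswith("r"):     # count from the end; r1 is the last page
               n = num_pages + 1 - int(token[1:])
           else:
               n = int(token)
           if not 1 <= n <= num_pages:
               raise ValueError(f"page {token!r} is out of range")
           return n

       pages = []
       for item in spec.split(","):
           first, dash, last = item.partition("-")
           start = page(first)
           end = page(last) if dash else start
           step = 1 if end >= start else -1    # high-low ranges run in reverse
           pages.extend(range(start, end + step, step))

       if parity == "odd":
           pages = pages[0::2]     # positions 1, 3, 5, ... within the range
       elif parity == "even":
           pages = pages[1::2]     # positions 2, 4, 6, ... within the range
       return pages

For a 12-page document, ``parse_page_range("5,7-9,12:odd", 12)`` yields pages 5, 8, and 12, matching the last example in the list above.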
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Starting in qpdf version 8.3, you can specify the
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--collate` option. Note that this option is
|
|
|
|
|
specified outside of :samp:`--pages ... --`. When
|
|
|
|
|
:samp:`--collate` is specified, it changes the meaning
|
|
|
|
|
of :samp:`--pages` so that the specified files, as
|
2021-12-11 21:53:08 +00:00
|
|
|
|
modified by page ranges, are collated rather than concatenated. For
|
2021-12-12 00:02:42 +00:00
|
|
|
|
example, if you have the files :file:`odd.pdf` and
|
|
|
|
|
:file:`even.pdf` containing odd and even pages of a
|
2021-12-12 00:01:40 +00:00
|
|
|
|
document respectively, you could run :command:`qpdf --collate odd.pdf
|
|
|
|
|
--pages odd.pdf even.pdf -- all.pdf` to collate the pages.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
This would pick page 1 from odd, page 1 from even, page 2 from odd, page
|
|
|
|
|
2 from even, etc. until all pages have been included. Any number of
|
|
|
|
|
files and page ranges can be specified. If any file has fewer pages,
|
|
|
|
|
that file is just skipped when its pages have all been included. For
|
2021-12-12 00:01:40 +00:00
|
|
|
|
example, if you ran :command:`qpdf --collate --empty --pages a.pdf
|
|
|
|
|
1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the
|
2021-12-11 21:53:08 +00:00
|
|
|
|
following pages in this order:
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 1
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- b.pdf page 6
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- c.pdf last page
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 2
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- b.pdf page 5
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 3
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- b.pdf page 4
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 4
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 5
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Starting in qpdf version 10.2, you may specify a numeric argument to
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--collate`. With
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--collate={n}`,
|
|
|
|
|
pull groups of :samp:`{n}` pages from each file,
|
2021-12-11 21:53:08 +00:00
|
|
|
|
again, stopping when there are no more pages. For example, if you ran
|
2021-12-12 00:01:40 +00:00
|
|
|
|
:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf
|
|
|
|
|
r1 -- out.pdf`, you would get the following pages in this
|
2021-12-11 21:53:08 +00:00
|
|
|
|
order:
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 1
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 2
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- b.pdf page 6
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- b.pdf page 5
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- c.pdf last page
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 3
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 4
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- b.pdf page 4
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- a.pdf page 5
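
The collation order in both examples above can be reproduced with a short Python sketch. It models only the ordering described in this section, under the assumption that each file's page range has already been expanded into a list; the function name ``collate`` and the string labels are hypothetical and not part of qpdf.

.. code-block:: python

   def collate(page_lists, n=1):
       """Take groups of n pages from each list in turn until all are used up."""
       remaining = [list(pages) for pages in page_lists]
       out = []
       while any(remaining):
           for pages in remaining:
               out.extend(pages[:n])   # an exhausted file contributes nothing
               del pages[:n]
       return out

   # a.pdf 1-5, b.pdf 6-4, c.pdf r1
   a = ["a.pdf:1", "a.pdf:2", "a.pdf:3", "a.pdf:4", "a.pdf:5"]
   b = ["b.pdf:6", "b.pdf:5", "b.pdf:4"]
   c = ["c.pdf:last"]

   print(collate([a, b, c]))        # ordering for --collate
   print(collate([a, b, c], n=2))   # ordering for --collate=2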
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Starting in qpdf version 8.3, when you split and merge files, any page
|
|
|
|
|
labels (page numbers) are preserved in the final file. It is expected
|
|
|
|
|
that more document features will be preserved by splitting and merging.
|
|
|
|
|
In the meantime, semantics of splitting and merging vary across
|
|
|
|
|
features. For example, the document's outlines (bookmarks) point to
|
|
|
|
|
actual page objects, so if you select some pages and not others,
|
|
|
|
|
bookmarks that point to pages that are in the output file will work, and
|
|
|
|
|
remaining bookmarks will not work. A future version of
|
2021-12-12 00:01:40 +00:00
|
|
|
|
:command:`qpdf` may do a better job at handling these
|
2021-12-11 21:53:08 +00:00
|
|
|
|
issues. (Note that the qpdf library already contains all of the APIs
|
|
|
|
|
required in order to implement this in your own application if you need
|
|
|
|
|
it.) In the meantime, you can always use
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--empty` as the primary input file to avoid
|
2021-12-11 21:53:08 +00:00
|
|
|
|
copying all of that from the first file. For example, to take pages 1
|
2021-12-12 00:02:42 +00:00
|
|
|
|
through 5 from :file:`infile.pdf` while preserving
|
2021-12-11 21:53:08 +00:00
|
|
|
|
all metadata associated with that file, you could use
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
qpdf infile.pdf --pages . 1-5 -- outfile.pdf
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
If you wanted pages 1 through 5 from
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`infile.pdf` but you wanted the rest of the
|
2021-12-11 21:53:08 +00:00
|
|
|
|
metadata to be dropped, you could instead run
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
If you wanted to take pages 1 through 5 from
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`file1.pdf` and pages 11 through 15 from
|
|
|
|
|
:file:`file2.pdf` in reverse, taking document-level
|
|
|
|
|
metadata from :file:`file2.pdf`, you would run
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
If, for some reason, you wanted to take the first page of an encrypted
|
2021-12-12 00:02:42 +00:00
|
|
|
|
file called :file:`encrypted.pdf` with password
|
2021-12-11 21:53:08 +00:00
|
|
|
|
``pass`` and repeat it twice in an output file, and if you wanted to
|
|
|
|
|
drop document-level metadata but preserve encryption, you would use
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass
|
2021-12-11 21:53:08 +00:00
|
|
|
|
--pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 --
|
2021-12-12 00:11:56 +00:00
|
|
|
|
outfile.pdf
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Note that we had to specify the password all three times because giving
|
2021-12-12 00:11:56 +00:00
|
|
|
|
a password as :samp:`--encryption-file-password` doesn't
|
2021-12-11 21:53:08 +00:00
|
|
|
|
count for page selection, and as far as qpdf is concerned,
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`encrypted.pdf` and
|
|
|
|
|
:file:`./encrypted.pdf` are separate files. These
|
2021-12-11 21:53:08 +00:00
|
|
|
|
are all corner cases that most users should hopefully never have to be
|
|
|
|
|
bothered with.
|
|
|
|
|
|
|
|
|
|
Prior to version 8.4, it was not possible to specify the same page from
|
|
|
|
|
the same file directly more than once, and the workaround of specifying
|
|
|
|
|
the same file in more than one way was required. Version 8.4 removes
|
|
|
|
|
this limitation, but there is still a valid use case. When you specify
|
|
|
|
|
the same page from the same file more than once, qpdf will share objects
|
|
|
|
|
between the pages. If you are going to do further manipulation on the
|
|
|
|
|
file and need the two instances of the same original page to be deep
|
|
|
|
|
copies, then you can specify the file in two different ways. For example
|
2021-12-12 00:01:40 +00:00
|
|
|
|
:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
would create a file with two copies of the first page of the input, and
|
|
|
|
|
the two copies would not share any objects in common. This includes fonts,
|
|
|
|
|
images, and anything else the page references.
|
|
|
|
|
|
|
|
|
|
.. _ref.overlay-underlay:
|
|
|
|
|
|
|
|
|
|
Overlay and Underlay Options
|
|
|
|
|
----------------------------
|
|
|
|
|
|
|
|
|
|
Starting with qpdf 8.4, it is possible to overlay or underlay pages from
|
|
|
|
|
other files onto the output generated by qpdf. Specify overlay or
|
|
|
|
|
underlay as follows:
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
{ --overlay | --underlay } file [ options ] --
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Overlay and underlay options are processed late, so they can be combined
|
|
|
|
|
with other operations like merging and will apply to the final output. The
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--overlay` and :samp:`--underlay`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
options work the same way, except underlay pages are drawn underneath
|
|
|
|
|
the page to which they are applied, possibly obscured by the original
|
|
|
|
|
page, and overlay pages are drawn on top of the page to which they are
|
|
|
|
|
applied, possibly obscuring the page. You can combine overlay and
|
|
|
|
|
underlay.
|
|
|
|
|
|
|
|
|
|
The default behavior of overlay and underlay is that pages are taken
|
|
|
|
|
from the overlay/underlay file in sequence and applied to corresponding
|
|
|
|
|
pages in the output until there are no more output pages. If the overlay
|
|
|
|
|
or underlay file runs out of pages, remaining output pages are left
|
|
|
|
|
alone. This behavior can be modified by options, which are provided
|
2021-12-12 00:11:56 +00:00
|
|
|
|
between the :samp:`--overlay` or
|
|
|
|
|
:samp:`--underlay` flag and the
|
|
|
|
|
:samp:`--` option. The following options are supported:
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`--password=password`: supply a password if the
|
2021-12-11 23:49:31 +00:00
|
|
|
|
overlay/underlay file is encrypted.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`--to=page-range`: a range of pages in the same
|
2021-12-12 00:31:19 +00:00
|
|
|
|
form as described in :ref:`ref.page-selection`
|
2021-12-11 23:49:31 +00:00
|
|
|
|
indicates which pages in the output should have the overlay/underlay
|
|
|
|
|
applied. If not specified, overlay/underlay are applied to all pages.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`--from=[page-range]`: a range of pages that
|
2021-12-11 23:49:31 +00:00
|
|
|
|
specifies which pages in the overlay/underlay file will be used for
|
|
|
|
|
overlay or underlay. If not specified, all pages will be used. This
|
|
|
|
|
can be explicitly specified to be empty if
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--repeat` is used.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`--repeat=page-range`: an optional range of
|
2021-12-11 23:49:31 +00:00
|
|
|
|
pages that specifies which pages in the overlay/underlay file will be
|
|
|
|
|
repeated after the "from" pages are used up. If you want to repeat a
|
|
|
|
|
range of pages starting at the beginning, you can explicitly use
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--from=`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Here are some examples.
|
|
|
|
|
|
2021-12-12 00:01:40 +00:00
|
|
|
|
- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4
|
|
|
|
|
--`: overlay the first three pages from file
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`o.pdf` onto the first three pages of the
|
|
|
|
|
output, then overlay page 4 from :file:`o.pdf`
|
2021-12-11 23:49:31 +00:00
|
|
|
|
onto pages 4 and 5 of the output. Leave remaining output pages
|
|
|
|
|
untouched.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:01:40 +00:00
|
|
|
|
- :command:`--underlay footer.pdf --from= --repeat=1,2
|
|
|
|
|
--`: Underlay page 1 of
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`footer.pdf` on all odd output pages, and
|
|
|
|
|
underlay page 2 of :file:`footer.pdf` on all even
|
2021-12-11 23:49:31 +00:00
|
|
|
|
output pages.
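
One way to read the interaction of the "to", "from", and "repeat" ranges is as a mapping from output pages to overlay/underlay pages. The Python sketch below follows that reading of the description above; it is not taken from qpdf's source, and the function name ``overlay_plan`` is hypothetical. All three arguments are lists of page numbers that have already been expanded from their page ranges.

.. code-block:: python

   def overlay_plan(to_pages, from_pages, repeat_pages=()):
       """Map each affected output page to the overlay/underlay page applied to it."""
       plan = {}
       for i, dest in enumerate(to_pages):
           if i < len(from_pages):
               # The "from" pages are consumed first, in sequence.
               plan[dest] = from_pages[i]
           elif repeat_pages:
               # Afterwards the "repeat" pages cycle for the remaining pages.
               k = (i - len(from_pages)) % len(repeat_pages)
               plan[dest] = repeat_pages[k]
           # Otherwise the output page is left alone.
       return plan

   # --overlay o.pdf --to=1-5 --from=1-3 --repeat=4 --
   print(overlay_plan([1, 2, 3, 4, 5], [1, 2, 3], [4]))

   # --underlay footer.pdf --from= --repeat=1,2 -- (six-page output)
   print(overlay_plan([1, 2, 3, 4, 5, 6], [], [1, 2]))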
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
.. _ref.attachments:
|
|
|
|
|
|
|
|
|
|
Embedded Files/Attachments Options
|
|
|
|
|
----------------------------------
|
|
|
|
|
|
|
|
|
|
Starting with qpdf 10.2, you can work with file attachments in PDF files
|
|
|
|
|
from the command line. The following options are available:
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--list-attachments`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Show the "key" and stream number for embedded files. With
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--verbose`, additional information, including
|
2021-12-11 21:53:08 +00:00
|
|
|
|
preferred file name, description, dates, and more, is also displayed.
|
|
|
|
|
The key is usually but not always equal to the file name, and is
|
|
|
|
|
needed by some of the other options.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--show-attachment={key}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Write the contents of the specified attachment to standard output as
|
|
|
|
|
binary data. The key should match one of the keys shown by
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--list-attachments`. If specified multiple
|
2021-12-11 21:53:08 +00:00
|
|
|
|
times, only the last attachment will be shown.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--add-attachment {file} {options} --`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Add or replace an attachment with the contents of
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`{file}`. This may be specified more
|
2021-12-11 21:53:08 +00:00
|
|
|
|
than once. The following additional options may appear before the
|
|
|
|
|
``--`` that ends this option:
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--key={key}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
The key to use to register the attachment in the embedded files
|
|
|
|
|
table. Defaults to the last path element of
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`{file}`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--filename={name}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
The file name to be used for the attachment. This is what is
|
|
|
|
|
usually displayed to the user and is the name most graphical PDF
|
|
|
|
|
viewers will use when saving a file. It defaults to the last path
|
2021-12-12 21:18:03 +00:00
|
|
|
|
element of :samp:`{file}`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--creationdate={date}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
The attachment's creation date in PDF format; defaults to the
|
|
|
|
|
current time. The date format is explained below.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--moddate={date}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
The attachment's modification date in PDF format; defaults to the
|
|
|
|
|
current time. The date format is explained below.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--mimetype={type/subtype}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
The mime type for the attachment, e.g. ``text/plain`` or
|
|
|
|
|
``application/pdf``. Note that the mimetype appears in a field
|
|
|
|
|
called ``/Subtype`` in the PDF but actually includes the full type
|
|
|
|
|
and subtype of the mime type.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--description={"text"}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Descriptive text for the attachment, displayed by some PDF
|
|
|
|
|
viewers.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--replace`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Indicates that any existing attachment with the same key should be
|
|
|
|
|
replaced by the new attachment. Otherwise,
|
2021-12-12 00:01:40 +00:00
|
|
|
|
:command:`qpdf` gives an error if an attachment
|
2021-12-11 21:53:08 +00:00
|
|
|
|
with that key is already present.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--remove-attachment={key}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Remove the specified attachment. This doesn't only remove the
|
|
|
|
|
attachment from the embedded files table but also clears out the file
|
|
|
|
|
specification. That means that any potential internal links to the
|
|
|
|
|
attachment will be broken. This option may be specified multiple
|
2021-12-12 00:11:56 +00:00
|
|
|
|
times. Run with :samp:`--verbose` to see status of
|
2021-12-11 21:53:08 +00:00
|
|
|
|
the removal.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--copy-attachments-from {file} {options} --`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Copy attachments from another file. This may be specified more than
|
|
|
|
|
once. The following additional options may appear before the ``--``
|
|
|
|
|
that ends this option:
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--password={password}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
If required, the password needed to open
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`{file}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--prefix={prefix}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Only required if the file from which attachments are being copied
|
|
|
|
|
has attachments with keys that conflict with attachments already
|
|
|
|
|
in the file. In this case, the specified prefix will be prepended
|
|
|
|
|
to each key. This affects only the key in the embedded files
|
|
|
|
|
table, not the file name. The PDF specification doesn't preclude
|
|
|
|
|
multiple attachments having the same file name.
|
|
|
|
|
|
|
|
|
|
When a date is required, the date should conform to the PDF date format
|
|
|
|
|
specification, which is
|
2021-12-12 21:18:03 +00:00
|
|
|
|
``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where
|
|
|
|
|
:samp:`{<z>}` is either ``Z`` for UTC or a
|
|
|
|
|
timezone offset in the form :samp:`{-hh'mm'}` or
|
|
|
|
|
:samp:`{+hh'mm'}`. Examples:
|
2021-12-11 21:53:08 +00:00
|
|
|
|
``D:20210207161528-05'00'``, ``D:20210207211528Z``.
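
If you generate these dates from a script, a helper along the following lines produces strings in the format described above. This is a minimal sketch using only the Python standard library; the function name ``pdf_date`` is hypothetical, and it assumes a timezone-aware ``datetime`` as input.

.. code-block:: python

   from datetime import datetime, timedelta, timezone

   def pdf_date(dt):
       """Format a timezone-aware datetime as a PDF date string."""
       offset = dt.utcoffset()
       if offset is None:
           raise ValueError("a timezone-aware datetime is required")
       stamp = dt.strftime("D:%Y%m%d%H%M%S")
       if offset == timedelta(0):
           return stamp + "Z"              # UTC
       minutes = int(offset.total_seconds() // 60)
       sign = "+" if minutes >= 0 else "-"
       hh, mm = divmod(abs(minutes), 60)
       return f"{stamp}{sign}{hh:02d}'{mm:02d}'"

   eastern = timezone(timedelta(hours=-5))
   print(pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern)))
   print(pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc)))

The two calls reproduce the example dates given above, which refer to the same instant in different time zones.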
|
|
|
|
|
|
|
|
|
|
.. _ref.advanced-parsing:
|
|
|
|
|
|
|
|
|
|
Advanced Parsing Options
|
|
|
|
|
------------------------
|
|
|
|
|
|
|
|
|
|
These options control aspects of how qpdf reads PDF files. Mostly these
|
|
|
|
|
are of use to people who are working with damaged files. There is little
|
|
|
|
|
reason to use these options unless you are trying to solve specific
|
|
|
|
|
problems. The following options are available:
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--suppress-recovery`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Prevents qpdf from attempting to recover damaged files.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--ignore-xref-streams`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Tells qpdf to ignore any cross-reference streams.
|
|
|
|
|
|
|
|
|
|
Ordinarily, qpdf will attempt to recover from certain types of errors in
|
|
|
|
|
PDF files. These include errors in the cross-reference table, certain
|
|
|
|
|
types of object numbering errors, and certain types of stream length
|
|
|
|
|
errors. Sometimes, qpdf may think it has recovered but may not have
|
|
|
|
|
actually recovered, so care should be taken when using this option as
|
|
|
|
|
some data loss is possible. The
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--suppress-recovery` option will prevent qpdf
|
2021-12-11 21:53:08 +00:00
|
|
|
|
from attempting recovery. In this case, it will fail on the first error
|
|
|
|
|
that it encounters.
|
|
|
|
|
|
|
|
|
|
Ordinarily, qpdf reads cross-reference streams when they are present in
|
2021-12-12 00:11:56 +00:00
|
|
|
|
a PDF file. If :samp:`--ignore-xref-streams` is
|
2021-12-11 21:53:08 +00:00
|
|
|
|
specified, qpdf will ignore any cross-reference streams for hybrid PDF
|
|
|
|
|
files. The purpose of hybrid files is to make some content available to
|
|
|
|
|
viewers that are not aware of cross-reference streams. It is almost
|
|
|
|
|
never desirable to ignore them. The only time when you might want to use
|
|
|
|
|
this feature is if you are testing creation of hybrid PDF files and wish
|
|
|
|
|
to see how a PDF consumer that doesn't understand object and
|
|
|
|
|
cross-reference streams would interpret such a file.
|
|
|
|
|
|
|
|
|
|
.. _ref.advanced-transformation:
|
|
|
|
|
|
|
|
|
|
Advanced Transformation Options
|
|
|
|
|
-------------------------------
|
|
|
|
|
|
|
|
|
|
These transformation options control fine points of how qpdf creates the
|
|
|
|
|
output file. Mostly these are of use only to people who are very
|
|
|
|
|
familiar with the PDF file format or who are PDF developers. The
|
|
|
|
|
following options are available:
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--compress-streams={[yn]}`
|
2021-12-12 00:11:56 +00:00
|
|
|
|
By default, or with :samp:`--compress-streams=y`,
|
2021-12-11 21:53:08 +00:00
|
|
|
|
qpdf will compress any stream with no other filters applied to it
|
|
|
|
|
with the ``/FlateDecode`` filter when it writes it. To suppress this
|
|
|
|
|
behavior and preserve uncompressed streams as uncompressed, use
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--compress-streams=n`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--decode-level={option}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Controls which streams qpdf tries to decode. The default is
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`generalized`. The following options are
|
2021-12-11 21:53:08 +00:00
|
|
|
|
available:
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`none`: do not attempt to decode any streams
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`generalized`: decode streams filtered with
|
2021-12-11 23:49:31 +00:00
|
|
|
|
supported generalized filters: ``/LZWDecode``, ``/FlateDecode``,
|
|
|
|
|
``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized
|
|
|
|
|
filters as those to be used for general-purpose compression or
|
|
|
|
|
encoding, as opposed to filters specifically designed for image
|
|
|
|
|
data. Note that, by default, streams already compressed with
|
|
|
|
|
``/FlateDecode`` are not uncompressed and recompressed unless you
|
2021-12-12 00:11:56 +00:00
|
|
|
|
also specify :samp:`--recompress-flate`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`specialized`: in addition to generalized,
|
2021-12-11 23:49:31 +00:00
|
|
|
|
decode streams with supported non-lossy specialized filters;
|
|
|
|
|
currently this is just ``/RunLengthDecode``
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`all`: in addition to generalized and
|
2021-12-11 23:49:31 +00:00
|
|
|
|
specialized, decode streams with supported lossy filters;
|
|
|
|
|
currently this is just ``/DCTDecode`` (JPEG)
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--stream-data={option}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Controls transformation of stream data. This option predates the
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--compress-streams` and
|
|
|
|
|
:samp:`--decode-level` options. Those options can be
|
2021-12-11 21:53:08 +00:00
|
|
|
|
used to achieve the same effect with more control. The value of
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`{option}` may
|
2021-12-11 21:53:08 +00:00
|
|
|
|
be one of the following:
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`compress`: recompress stream data when
|
2021-12-11 23:49:31 +00:00
|
|
|
|
possible (default); equivalent to
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--compress-streams=y`
|
|
|
|
|
:samp:`--decode-level=generalized`. Does not
|
2021-12-11 23:49:31 +00:00
|
|
|
|
recompress streams already compressed with ``/FlateDecode`` unless
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--recompress-flate` is also specified.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`preserve`: leave all stream data as is;
|
|
|
|
|
equivalent to :samp:`--compress-streams=n`
|
|
|
|
|
:samp:`--decode-level=none`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`uncompress`: uncompress stream data
|
2021-12-11 23:49:31 +00:00
|
|
|
|
compressed with generalized filters when possible; equivalent to
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--compress-streams=n`
|
|
|
|
|
:samp:`--decode-level=generalized`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--recompress-flate`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
By default, streams already compressed with ``/FlateDecode`` are left
|
|
|
|
|
alone rather than being uncompressed and recompressed. This option
|
|
|
|
|
causes qpdf to uncompress and recompress the streams. There is a
|
|
|
|
|
significant performance cost to using this option, but you probably
|
|
|
|
|
want to use it if you specify
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--compression-level`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--compression-level={level}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
When writing new streams that are compressed with ``/FlateDecode``,
|
|
|
|
|
use the specified compression level. The value of
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`{level}` should be a number from 1 to 9 and is
|
2021-12-11 21:53:08 +00:00
|
|
|
|
passed directly to zlib, which implements deflate compression. Note
|
|
|
|
|
that qpdf doesn't uncompress and recompress streams by default. To
|
|
|
|
|
have this option apply to already compressed streams, you should also
|
2021-12-12 00:11:56 +00:00
|
|
|
|
specify :samp:`--recompress-flate`. If your goal is
|
2021-12-11 21:53:08 +00:00
|
|
|
|
to shrink the size of PDF files, you should also use
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--object-streams=generate`.
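
To get a feel for what the zlib level means, the following Python sketch compresses the same data at levels 1 and 9 using Python's ``zlib`` module. It only illustrates zlib's level parameter; qpdf itself is written in C++ and does not run this code, and the sample data and the ``deflate`` helper are made up for the illustration.

.. code-block:: python

   import zlib

   # Made-up, highly repetitive sample data standing in for a content stream.
   data = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 200

   def deflate(data, level):
       """Compress data with zlib at a level from 1 (fastest) to 9 (smallest)."""
       if not 1 <= level <= 9:
           raise ValueError("level must be between 1 and 9")
       return zlib.compress(data, level)

   fast = deflate(data, 1)
   best = deflate(data, 9)
   print(len(data), len(fast), len(best))

Both results decompress to the same bytes; the level trades compression time against output size.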
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--normalize-content=[yn]`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Enables or disables normalization of content streams. Content
|
2021-12-12 00:31:19 +00:00
|
|
|
|
normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--object-streams={mode}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Controls handling of object streams. The value of
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`{mode}` may be
|
2021-12-11 21:53:08 +00:00
|
|
|
|
one of the following:
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`preserve`: preserve original object streams
|
2021-12-11 23:49:31 +00:00
|
|
|
|
(default)
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`disable`: don't write any object streams
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`generate`: use object streams wherever
|
2021-12-11 23:49:31 +00:00
|
|
|
|
possible
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--preserve-unreferenced`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Tells qpdf to preserve objects that are not referenced when writing
|
|
|
|
|
the file. Ordinarily any object that is not referenced in a traversal
|
|
|
|
|
of the document from the trailer dictionary will be discarded. This
|
|
|
|
|
may be useful in working with some damaged files or inspecting files
|
|
|
|
|
with known unreferenced objects.
|
|
|
|
|
|
|
|
|
|
This flag is ignored for linearized files and has the effect of
|
|
|
|
|
causing objects in the new file to be written in order by object ID
|
|
|
|
|
from the original file. This does not mean that object numbers will
|
|
|
|
|
be the same since qpdf may create stream lengths as direct or
|
|
|
|
|
indirect differently from the original file, and the original file
|
|
|
|
|
may have gaps in its numbering.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
See also :samp:`--preserve-unreferenced-resources`,
|
2021-12-11 21:53:08 +00:00
|
|
|
|
which does something completely different.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--remove-unreferenced-resources={option}`
|
|
|
|
|
The :samp:`{option}` may be ``auto``,
|
2021-12-11 21:53:08 +00:00
|
|
|
|
``yes``, or ``no``. The default is ``auto``.
|
|
|
|
|
|
|
|
|
|
Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt
|
|
|
|
|
to remove images and fonts that are not used by a page even if they
|
|
|
|
|
are referenced in the page's resources dictionary. When shared
|
|
|
|
|
resources are in use, this behavior can greatly reduce the file sizes
|
|
|
|
|
of split pages, but the analysis is very slow. In versions from 8.1
|
|
|
|
|
through 9.1.1, qpdf did this analysis by default. Starting in qpdf
|
|
|
|
|
10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file
|
|
|
|
|
to determine whether the file is likely to have unreferenced objects
|
|
|
|
|
on pages, a pattern that frequently occurs when resource dictionaries
|
|
|
|
|
are shared across multiple pages and rarely occurs otherwise. If it
|
|
|
|
|
discovers this pattern, then it will attempt to remove unreferenced
|
|
|
|
|
resources. Usually this means you get the slower splitting speed only
|
|
|
|
|
when it's actually going to create smaller files. You can suppress
|
|
|
|
|
removal of unreferenced resources altogether by specifying ``no`` or
|
|
|
|
|
force it to do the full algorithm by specifying ``yes``.
|
|
|
|
|
|
|
|
|
|
Other than cases in which you don't care about file size and care a
|
|
|
|
|
lot about runtime, there are few reasons to use this option,
|
|
|
|
|
especially now that ``auto`` mode is supported. One reason to use
|
|
|
|
|
this is if you suspect that qpdf is removing resources it shouldn't
|
|
|
|
|
be removing. If you encounter that case, please report it as a bug at
|
|
|
|
|
https://github.com/qpdf/qpdf/issues/.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--preserve-unreferenced-resources`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
This is a synonym for
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--remove-unreferenced-resources=no`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
See also :samp:`--preserve-unreferenced`, which does
|
2021-12-11 21:53:08 +00:00
|
|
|
|
something completely different.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--newline-before-endstream`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Tells qpdf to insert a newline before the ``endstream`` keyword, not
|
|
|
|
|
counted in the length, after any stream content even if the last
|
|
|
|
|
character of the stream was a newline. This may result in two
|
|
|
|
|
newlines in some cases. This is a requirement of PDF/A. While qpdf
|
|
|
|
|
doesn't specifically know how to generate PDF/A-compliant PDFs, this
|
|
|
|
|
at least prevents it from removing compliance on already compliant
|
|
|
|
|
files.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--linearize-pass1={file}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Write the first pass of linearization to the named file. The
|
|
|
|
|
resulting file is not a valid PDF file. This option is useful only
|
|
|
|
|
for debugging ``QPDFWriter``'s linearization code. When qpdf
|
|
|
|
|
linearizes files, it writes the file in two passes, using the first
|
|
|
|
|
pass to calculate sizes and offsets that are required for hint tables
|
|
|
|
|
and the linearization dictionary. Ordinarily, the first pass is
|
|
|
|
|
discarded. This option enables it to be captured.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--coalesce-contents`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
When a page's contents are split across multiple streams, this option
|
|
|
|
|
causes qpdf to combine them into a single stream. Use of this option
|
|
|
|
|
is never necessary for ordinary usage, but it can help when working
|
|
|
|
|
with some files in some cases. For example, this can be combined
|
|
|
|
|
with QDF mode or content normalization to make it easier to look at
|
|
|
|
|
all of a page's contents at once.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--flatten-annotations={option}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
This option collapses annotations into the pages' contents with
|
|
|
|
|
special handling for form fields. Ordinarily, an annotation is
|
|
|
|
|
rendered separately and on top of the page. Combining annotations
|
|
|
|
|
into the page's contents effectively freezes the placement of the
|
|
|
|
|
annotations, making them look right after various page
|
|
|
|
|
transformations. The library functionality backing this option was
|
|
|
|
|
added for the benefit of programs that want to create *n-up* page
|
|
|
|
|
layouts and other similar things that don't work well with
|
2021-12-12 21:18:03 +00:00
|
|
|
|
annotations. The :samp:`{option}` parameter
|
2021-12-11 21:53:08 +00:00
|
|
|
|
may be any of the following:
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`all`: include all annotations that are not
|
2021-12-11 23:49:31 +00:00
|
|
|
|
marked invisible or hidden
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`print`: only include annotations that
|
2021-12-11 23:49:31 +00:00
|
|
|
|
indicate that they should appear when the page is printed
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
- :samp:`screen`: omit annotations that indicate
|
2021-12-11 23:49:31 +00:00
|
|
|
|
they should not appear on the screen
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Note that form fields are special because the annotations that are
|
|
|
|
|
used to render filled-in form fields may become out of date from the
|
|
|
|
|
fields' values if the form is filled in by a program that doesn't
|
|
|
|
|
know how to update the appearances. If qpdf detects this case, its
|
|
|
|
|
default behavior is not to flatten those annotations because doing so
|
|
|
|
|
would cause the value of the form field to be lost. This gives you a
|
|
|
|
|
chance to go back and resave the form with a program that knows how
|
|
|
|
|
to generate appearances. QPDF itself can generate appearances with
|
|
|
|
|
some limitations. See the
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--generate-appearances` option below.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--generate-appearances`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
If a file contains interactive form fields and indicates that the
|
|
|
|
|
appearances are out of date with the values of the form, this flag
|
|
|
|
|
will regenerate appearances, subject to a few limitations. Note that
|
|
|
|
|
there is not usually a reason to do this, but it can be necessary
|
2021-12-12 00:11:56 +00:00
|
|
|
|
before using the :samp:`--flatten-annotations`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
option. Most of the limitations are not a problem with well-behaved PDF files.
|
|
|
|
|
The limitations are as follows:
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- Radio button and checkbox appearances use the pre-set values in
|
|
|
|
|
the PDF file. QPDF just makes sure that the correct appearance is
|
|
|
|
|
displayed based on the value of the field. This is fine for PDF
|
|
|
|
|
files that create their forms properly. Some PDF writers save
|
|
|
|
|
appearances for fields when they change, which could cause some
|
|
|
|
|
controls to have inconsistent appearances.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- For text fields and list boxes, any characters that fall outside
|
|
|
|
|
of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman"
|
|
|
|
|
encoding, will be replaced by the ``?`` character.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- Quadding is ignored. Quadding is used to specify whether the
|
|
|
|
|
contents of a field should be left, center, or right aligned with
|
|
|
|
|
the field.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- Rich text, multi-line, and other more elaborate formatting
|
|
|
|
|
directives are ignored.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- There is no support for multi-select fields or signature fields.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
If qpdf doesn't do a good enough job with your form, use an external
|
|
|
|
|
application to save your filled-in form before processing it with
|
|
|
|
|
qpdf.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--optimize-images`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
This flag causes qpdf to recompress all images that are not
|
|
|
|
|
compressed with DCT (JPEG) using DCT compression as long as doing so
|
|
|
|
|
decreases the size in bytes of the image data and the image does not
|
|
|
|
|
fall below minimum specified dimensions. Useful information is
|
|
|
|
|
provided when used in combination with
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--verbose`. See also the
|
|
|
|
|
:samp:`--oi-min-width`,
|
|
|
|
|
:samp:`--oi-min-height`, and
|
|
|
|
|
:samp:`--oi-min-area` options. By default, starting
|
2021-12-11 21:53:08 +00:00
|
|
|
|
in qpdf 8.4, inline images are converted to regular images and
|
2021-12-12 00:11:56 +00:00
|
|
|
|
optimized as well. Use :samp:`--keep-inline-images`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
to prevent inline images from being included.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--oi-min-width={width}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Avoid optimizing images whose width is below the specified amount. If
|
|
|
|
|
omitted, the default is 128 pixels. Use 0 for no minimum.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--oi-min-height={height}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Avoid optimizing images whose height is below the specified amount.
|
|
|
|
|
If omitted, the default is 128 pixels. Use 0 for no minimum.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--oi-min-area={area-in-pixels}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Avoid optimizing images whose pixel count (width × height) is below
|
|
|
|
|
the specified amount. If omitted, the default is 16,384 pixels. Use 0
|
|
|
|
|
for no minimum.
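
The three ``--oi-min-*`` thresholds can be read as independent lower bounds. The sketch below assumes that an image is considered for optimization only when it meets all three; the function name is hypothetical and this is not qpdf's own code.

```python
def meets_size_thresholds(width, height,
                          min_width=128, min_height=128, min_area=16384):
    """Return True if an image is large enough to be considered.

    Assumption: the three limits are applied independently, and a value
    of 0 disables the corresponding limit. Defaults match the documented
    defaults of the --oi-min-* options.
    """
    return (
        width >= min_width
        and height >= min_height
        and width * height >= min_area
    )
```

Under this reading, a 200 x 64 image is skipped with the defaults but qualifies once the height and area limits are set to 0.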
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--externalize-inline-images`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Convert inline images to regular images. By default, images whose
|
|
|
|
|
data is at least 1,024 bytes are converted when this option is
|
2021-12-12 00:11:56 +00:00
|
|
|
|
selected. Use :samp:`--ii-min-bytes` to change the
|
2021-12-11 21:53:08 +00:00
|
|
|
|
size threshold. This option is implicitly selected when
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--optimize-images` is selected. Use
|
|
|
|
|
:samp:`--keep-inline-images` to exclude inline images
|
2021-12-11 21:53:08 +00:00
|
|
|
|
from image optimization.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--ii-min-bytes={bytes}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Avoid converting inline images whose size is below the specified
|
|
|
|
|
minimum size to regular images. If omitted, the default is 1,024
|
|
|
|
|
bytes. Use 0 for no minimum.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--keep-inline-images`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Prevent inline images from being included in image optimization. This
|
2021-12-12 00:11:56 +00:00
|
|
|
|
option has no effect when :samp:`--optimize-images`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
is not specified.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--remove-page-labels`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Remove page labels from the output file.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--qdf`
|
2021-12-12 00:31:19 +00:00
|
|
|
|
Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
disables QDF mode.
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--min-version={version}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Forces the PDF version of the output file to be at least
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`{version}`. In other words, if the
|
2021-12-11 21:53:08 +00:00
|
|
|
|
input file has a lower version than the specified version, the
|
|
|
|
|
specified version will be used. If the input file has a higher
|
|
|
|
|
version, the input file's original version will be used. It is seldom
|
|
|
|
|
necessary to use this option since qpdf will automatically increase
|
|
|
|
|
the version as needed when adding features that require newer PDF
|
|
|
|
|
readers.
|
|
|
|
|
|
|
|
|
|
The version number may be expressed in the form
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`{major.minor.extension-level}`, in
|
2021-12-11 21:53:08 +00:00
|
|
|
|
which case the version is interpreted as
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`{major.minor}` at extension level
|
|
|
|
|
:samp:`{extension-level}`. For example,
|
2021-12-11 21:53:08 +00:00
|
|
|
|
version ``1.7.8`` represents version 1.7 at extension level 8. Note
|
|
|
|
|
that minimal syntax checking is done on the command line.
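
The version syntax and the "at least" rule can be sketched as follows. Both helper names are hypothetical, and the parsing here is stricter than the minimal checking qpdf is described as doing.

```python
def parse_pdf_version(text):
    """Split 'major.minor[.extension-level]' into a tuple of three ints.

    A missing extension level is treated as 0 (an assumption made for
    this sketch).
    """
    parts = [int(piece) for piece in text.split(".")]
    if len(parts) == 2:
        parts.append(0)
    if len(parts) != 3:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(parts)


def output_version(input_version, min_version):
    """Return the higher of the input file's version and the requested minimum."""
    return max(parse_pdf_version(input_version), parse_pdf_version(min_version))
```

Tuples compare element by element, so ``(1, 7, 8)`` sorts above ``(1, 7, 3)``, which in turn sorts above ``(1, 6, 0)``.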
|
|
|
|
|
|
2021-12-12 21:18:03 +00:00
|
|
|
|
:samp:`--force-version={version}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
This option forces the PDF version to be the exact version specified
|
|
|
|
|
*even when the file may have content that is not supported in that
|
|
|
|
|
version*. The version number is interpreted in the same way as with
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--min-version` so that extension levels can be
|
2021-12-11 21:53:08 +00:00
|
|
|
|
set. In some cases, forcing the output file's PDF version to be lower
|
|
|
|
|
than that of the input file will cause qpdf to disable certain
|
|
|
|
|
features of the document. Specifically, 256-bit keys are disabled if
|
|
|
|
|
the version is less than 1.7 with extension level 8 (except R5 is
|
|
|
|
|
disabled if less than 1.7 with extension level 3), AES encryption is
|
|
|
|
|
disabled if the version is less than 1.6, cleartext metadata and
|
|
|
|
|
object streams are disabled if less than 1.5, 128-bit encryption keys
|
|
|
|
|
are disabled if less than 1.4, and all encryption is disabled if less
|
|
|
|
|
than 1.3. Even with these precautions, qpdf won't be able to do
|
|
|
|
|
things like eliminate use of newer image compression schemes,
|
|
|
|
|
transparency groups, or other features that may have been added in
|
|
|
|
|
more recent versions of PDF.
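
The thresholds listed above can be summarized as a small lookup. This is only a restatement of the preceding paragraph in code, with hypothetical names; it is not taken from qpdf's source.

```python
# (minimum version, feature that is disabled below that version)
FEATURE_THRESHOLDS = [
    ((1, 7, 8), "256-bit keys (other than R5)"),
    ((1, 7, 3), "256-bit keys (R5)"),
    ((1, 6, 0), "AES encryption"),
    ((1, 5, 0), "cleartext metadata"),
    ((1, 5, 0), "object streams"),
    ((1, 4, 0), "128-bit encryption keys"),
    ((1, 3, 0), "all encryption"),
]


def disabled_features(major, minor, extension_level=0):
    """List the features that would be disabled when forcing this version."""
    version = (major, minor, extension_level)
    return [name for threshold, name in FEATURE_THRESHOLDS if version < threshold]
```

For example, forcing version 1.4 keeps 128-bit keys available but disables AES, cleartext metadata, object streams, and 256-bit keys.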
|
|
|
|
|
|
|
|
|
|
As a general rule, with the exception of big structural things like
|
|
|
|
|
the use of object streams or AES encryption, PDF viewers are supposed
|
|
|
|
|
to ignore features in files that they don't support from newer
|
|
|
|
|
versions. This means that forcing the version to a lower version may
|
|
|
|
|
make it possible to open your PDF file with an older viewer, though
|
|
|
|
|
bear in mind that some of the original document's functionality may
|
|
|
|
|
be lost.
|
|
|
|
|
|
|
|
|
|
By default, when a stream is encoded using non-lossy filters that qpdf
understands and is not already compressed using a good compression
scheme, qpdf will uncompress and recompress streams. Assuming proper
filter implementations, this is safe and generally results in smaller files.
This behavior may also be explicitly requested with
:samp:`--stream-data=compress`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
When :samp:`--normalize-content=y` is specified, qpdf
will attempt to normalize whitespace and newlines in page content
streams. This is generally safe but could, in some cases, cause damage
to the content streams. This option is intended for people who wish to
study PDF content streams or to debug PDF content. You should not use
this for "production" PDF files.
|
|
|
|
|
|
|
|
|
|
When normalizing content, if qpdf runs into any lexical errors, it will
print a warning indicating that content may be damaged. The only
situation in which qpdf is known to cause damage during content
normalization is when a page's contents are split across multiple
streams and streams are split in the middle of a lexical token such as a
string, name, or inline image. Note that files that do this are invalid
since the PDF specification states that content streams are not to be
split in the middle of a token. If you want to inspect the original
content streams in an uncompressed format, you can always run with
:samp:`--qdf --normalize-content=n` for a QDF file
without content normalization, or alternatively
:samp:`--stream-data=uncompress` for a regular non-QDF
mode file with uncompressed streams. These will both uncompress all the
streams but will not attempt to normalize content. Please note that if
you are using content normalization or QDF mode for the purpose of
manually inspecting files, you don't have to care about this.
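
The two ways of getting uncompressed streams without content normalization can be written out as argument lists. The helper name and file names below are hypothetical; the flags are those named in the paragraph above.

```python
def inspection_command(infile, outfile, qdf=True):
    """Build a qpdf command that uncompresses streams without normalizing them.

    qdf=True  -> QDF file with content normalization turned off
    qdf=False -> regular (non-QDF) file with uncompressed streams
    """
    if qdf:
        options = ["--qdf", "--normalize-content=n"]
    else:
        options = ["--stream-data=uncompress"]
    return ["qpdf", *options, infile, outfile]
```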
|
|
|
|
|
|
|
|
|
|
Object streams, also known as compressed objects, were introduced into
the PDF specification at version 1.5, corresponding to Acrobat 6. Some
older PDF viewers may not support files with object streams. qpdf can be
used to transform files with object streams to files without object
streams or vice versa. As mentioned above, there are three object stream
modes: :samp:`preserve`,
:samp:`disable`, and :samp:`generate`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
In :samp:`preserve` mode, the relationship between objects
and the streams that contain them is preserved from the original file.
In :samp:`disable` mode, all objects are written as
regular, uncompressed objects. The resulting file should be readable by
older PDF viewers. (Of course, the content of the files may include
features not supported by older viewers, but at least the structure will
be supported.) In :samp:`generate` mode, qpdf will
create its own object streams. This will usually result in more compact
PDF files, though they may not be readable by older viewers. In this
mode, qpdf will also make sure the PDF version number in the header is
at least 1.5.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
The :samp:`--qdf` flag turns on QDF mode, which changes
some of the defaults described above. Specifically, in QDF mode, by
default, stream data is uncompressed, content streams are normalized,
and encryption is removed. These defaults can still be overridden by
specifying the appropriate options as described above. Additionally, in
QDF mode, stream lengths are stored as indirect objects, objects are
laid out in a less efficient but more readable fashion, and the
documents are interspersed with comments that make it easier for the
user to find things and also make it possible for
:command:`fix-qdf` to work properly. QDF mode is intended
for people, mostly developers, who wish to inspect or modify PDF files
in a text editor. For details, please see :ref:`ref.qdf`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
.. _ref.testing-options:
|
|
|
|
|
|
|
|
|
|
Testing, Inspection, and Debugging Options
------------------------------------------
|
|
|
|
|
|
|
|
|
|
These options can be useful for digging into PDF files or for use in
automated test suites for software that uses the qpdf library. When any
of the options in this section are specified, no output file should be
given. The following options are available:
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--deterministic-id`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Causes generation of a deterministic value for /ID. This prevents use
|
|
|
|
|
of timestamp and output file name information in the /ID generation.
|
|
|
|
|
Instead, at some slight additional runtime cost, the /ID field is
|
|
|
|
|
generated to include a digest of the significant parts of the content
|
|
|
|
|
of the output PDF file. This means that a given qpdf operation should
|
|
|
|
|
generate the same /ID each time it is run, which can be useful when
|
|
|
|
|
caching results or for generation of some test data. Use of this flag
|
|
|
|
|
is not compatible with creation of encrypted files.
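
The difference between the two kinds of ``/ID`` can be shown with a conceptual sketch. The hash function and the exact inputs below are assumptions chosen for illustration and do not reproduce qpdf's actual ``/ID`` computation.

```python
import hashlib


def timestamp_based_id(output_name, timestamp):
    """ID derived from the output file name and a timestamp.

    Changes from run to run because the timestamp changes.
    """
    seed = f"{timestamp} {output_name}".encode("utf-8")
    return hashlib.sha256(seed).hexdigest()


def content_based_id(content):
    """ID derived only from a digest of the output content (bytes).

    Identical content always yields the same ID.
    """
    return hashlib.sha256(content).hexdigest()
```

Because the second function depends only on the bytes it is given, repeated runs that produce the same content give the same value, which is what makes caching and test comparisons practical.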
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--static-id`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Causes generation of a fixed value for /ID. This is intended for
|
|
|
|
|
testing only. Never use it for production files. If you are trying to
|
|
|
|
|
get the same /ID each time for a given file and you are not
|
|
|
|
|
generating encrypted files, consider using the
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--deterministic-id` option.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--static-aes-iv`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Causes use of a static initialization vector for AES-CBC. This is
|
|
|
|
|
intended for testing only so that output files can be reproducible.
|
|
|
|
|
Never use it for production files. This option in particular is not
|
|
|
|
|
secure since it significantly weakens the encryption.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--no-original-object-ids`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Suppresses inclusion of original object ID comments in QDF files.
|
|
|
|
|
This can be useful when generating QDF files for test purposes,
|
|
|
|
|
particularly when comparing them to determine whether two PDF files
|
|
|
|
|
have identical content.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--show-encryption`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Shows document encryption parameters. Also shows the document's user
|
|
|
|
|
password if the owner password is given.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--show-encryption-key`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
When encryption information is being displayed, as when
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--check` or
|
|
|
|
|
:samp:`--show-encryption` is given, display the
|
2021-12-11 21:53:08 +00:00
|
|
|
|
computed or retrieved encryption key as a hexadecimal string. This
|
|
|
|
|
value is not ordinarily useful to users, but it can be used as the
|
2021-12-12 00:11:56 +00:00
|
|
|
|
argument to :samp:`--password` if the
|
|
|
|
|
:samp:`--password-is-hex-key` is specified. Note
|
2021-12-11 21:53:08 +00:00
|
|
|
|
that, when PDF files are encrypted, passwords and other metadata are
|
|
|
|
|
used only to compute an encryption key, and the encryption key is
|
|
|
|
|
what is actually used for encryption. This enables retrieval of that
|
|
|
|
|
key.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--check-linearization`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Checks file integrity and linearization status.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--show-linearization`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Checks and displays all data in the linearization hint tables.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--show-xref`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Shows the contents of the cross-reference table in a human-readable
|
|
|
|
|
form. This is especially useful for files with cross-reference
|
|
|
|
|
streams which are stored in a binary format.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--show-object=trailer|obj[,gen]`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Show the contents of the given object. This is especially useful for
|
|
|
|
|
inspecting objects that are inside of object streams (also known as
|
|
|
|
|
"compressed objects").
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--raw-stream-data`
|
|
|
|
|
When used along with the :samp:`--show-object`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
option, if the object is a stream, shows the raw stream data instead
|
|
|
|
|
of the object's contents.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--filtered-stream-data`
|
|
|
|
|
When used along with the :samp:`--show-object`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
option, if the object is a stream, shows the filtered stream data
|
|
|
|
|
instead of the object's contents. If the stream is filtered using filters
|
|
|
|
|
that qpdf does not support, an error will be issued.
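
A minimal sketch of how ``--show-object`` combines with the two stream-data options follows. The helper name and its parameters are hypothetical; note that no output file is given, in line with the rule for the options in this section.

```python
def show_object_command(infile, obj, gen=0, stream_data=None):
    """Build a qpdf command that shows one object.

    stream_data may be None (show the object itself), "raw", or
    "filtered". Always passing an explicit generation number is a
    choice made for this sketch.
    """
    args = ["qpdf", f"--show-object={obj},{gen}"]
    if stream_data == "raw":
        args.append("--raw-stream-data")
    elif stream_data == "filtered":
        args.append("--filtered-stream-data")
    elif stream_data is not None:
        raise ValueError("stream_data must be None, 'raw', or 'filtered'")
    args.append(infile)
    return args
```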
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--show-npages`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Prints the number of pages in the input file on a line by itself.
|
|
|
|
|
Since the number of pages appears by itself on a line, this option
|
|
|
|
|
can be useful for scripting if you need to know the number of pages
|
|
|
|
|
in a file.
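
For scripting, the single-line output can be converted directly to an integer. The sketch below assumes qpdf is installed and on the ``PATH``; the function names are hypothetical.

```python
import subprocess


def parse_npages(stdout_text):
    """Convert the output of --show-npages (a number on its own line) to int."""
    return int(stdout_text.strip())


def page_count(pdf_path, qpdf="qpdf"):
    """Run qpdf --show-npages on a file and return the page count."""
    result = subprocess.run(
        [qpdf, "--show-npages", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_npages(result.stdout)
```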
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--show-pages`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Shows the object and generation number for each page dictionary
|
|
|
|
|
object and for each content stream associated with the page. Having
|
|
|
|
|
this information makes it more convenient to inspect objects from a
|
|
|
|
|
particular page.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--with-images`
|
|
|
|
|
When used along with :samp:`--show-pages`, also shows
|
2021-12-11 21:53:08 +00:00
|
|
|
|
the object and generation numbers for the image objects on each page.
|
|
|
|
|
(At present, information about images in shared resource dictionaries
|
|
|
|
|
is not output by this command. This is discussed in a comment in the
|
|
|
|
|
source code.)
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--json`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Generate a JSON representation of the file. This is described in
|
2021-12-12 00:31:19 +00:00
|
|
|
|
depth in :ref:`ref.json`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--json-help`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Describe the format of the JSON output.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--json-key={key}`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
This option is repeatable. If specified, only top-level keys
|
|
|
|
|
specified will be included in the JSON output. If not specified, all
|
|
|
|
|
keys will be shown.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--json-object=trailer|obj[,gen]`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
This option is repeatable. If specified, only specified objects will
|
|
|
|
|
be shown in the "``objects``" key of the JSON output. If absent, all
|
|
|
|
|
objects will be shown.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--check`
|
2021-12-11 21:53:08 +00:00
|
|
|
|
Checks file structure as well as encryption, linearization, and
encoding of stream data. A file for which
:samp:`--check` reports no errors may still have
errors in stream data content but should otherwise be structurally
sound. If :samp:`--check` reports any errors, qpdf will exit
with a status of 2. There are some recoverable conditions that
:samp:`--check` detects. These are issued as warnings
instead of errors. If qpdf finds no errors but finds warnings, it
will exit with a status of 3 (as of version 2.0.4). When
:samp:`--check` is combined with other options,
checks are always performed before any other options are processed.
For erroneous files, :samp:`--check` will cause qpdf
to attempt to recover, after which other options are effectively
operating on the recovered file. Combining
:samp:`--check` with other options in this way can be
useful for manually recovering severely damaged files. Note that
:samp:`--check` produces no output to standard output
when everything is valid, so if you are using this to
programmatically validate files in bulk, it is safe to run without
output redirected to :file:`/dev/null` and just
check for a 0 exit code.
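
When validating files in bulk, the exit statuses described above can be mapped to outcomes. This sketch assumes qpdf is installed and on the ``PATH``; the names are hypothetical, and any status other than 0, 2, or 3 is reported as unexpected.

```python
import subprocess

# Exit statuses described for --check in this section.
CHECK_STATUS = {
    0: "no errors or warnings",
    2: "errors found",
    3: "warnings only",
}


def describe_check_status(code):
    """Translate a qpdf --check exit status into a short description."""
    return CHECK_STATUS.get(code, f"unexpected exit status {code}")


def check_file(pdf_path, qpdf="qpdf"):
    """Run qpdf --check on one file and return (status, description)."""
    result = subprocess.run(
        [qpdf, "--check", pdf_path],
        capture_output=True,
        text=True,
    )
    return result.returncode, describe_check_status(result.returncode)
```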
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
The :samp:`--raw-stream-data` and
:samp:`--filtered-stream-data` options are ignored
unless :samp:`--show-object` is given. Either of these
options will cause the stream data to be written to standard output. In
order to avoid commingling of stream data with other output, it is
recommended that these options not be combined with other test/inspection
options.
|
|
|
|
|
|
2021-12-12 00:11:56 +00:00
|
|
|
|
If :samp:`--filtered-stream-data` is given and
:samp:`--normalize-content=y` is also given, qpdf will
attempt to normalize the stream data as if it is a page content stream.
This attempt will be made even if it is not a page content stream, in
which case it will produce unusable results.
|
|
|
|
|
|
|
|
|
|
.. _ref.unicode-passwords:
|
|
|
|
|
|
|
|
|
|
Unicode Passwords
-----------------
|
|
|
|
|
|
|
|
|
|
At the library API level, all methods that perform encryption and
decryption interpret passwords as strings of bytes. It is up to the
caller to ensure that they are appropriately encoded. Starting with qpdf
version 8.4.0, qpdf will attempt to make this easier for you when you
interact with qpdf via its command-line interface. The PDF specification
requires passwords used to encrypt files with 40-bit or 128-bit
encryption to be encoded with PDF Doc encoding. This encoding is a
single-byte encoding that supports ISO-Latin-1 and a handful of other
commonly used characters. It has a large overlap with Windows ANSI but
is not exactly the same. There is generally not a way to provide PDF Doc
encoded strings on the command line. As such, qpdf versions prior to
8.4.0 would often create PDF files that couldn't be opened with other
software when given a password with non-ASCII characters to encrypt a
file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf
recognizes the encoding of the parameter and transcodes it as needed.
The rest of this section provides the details about exactly how qpdf
behaves. Most users will not need to know this information, but it might
be useful if you have been working around qpdf's old behavior or if you
are using qpdf to generate encrypted files for testing other PDF
software.
|
|
|
|
|
|
|
|
|
|
A note about Windows: when qpdf builds, it attempts to determine what it
has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain``
function is an alternative entry point that receives all arguments as
UTF-16-encoded strings. When qpdf starts up this way, it converts all
the strings to UTF-8 encoding and then invokes the regular main. This
means that, as far as qpdf is concerned, it receives its command-line
arguments with UTF-8 encoding, just as it would in any modern Linux or
UNIX environment.

If a file is being encrypted with 40-bit or 128-bit encryption and the
supplied password is not a valid UTF-8 string, qpdf will fall back to
the behavior of interpreting the password as a string of bytes. If you
have old scripts that encrypt files by passing the output of
:command:`iconv` to qpdf, you no longer need to do that,
but if you do, qpdf should still work. The only exception would be for
the extremely unlikely case of a password that is encoded with a
single-byte encoding but also happens to be valid UTF-8. Such a password
would contain strings of even numbers of characters that alternate
between accented letters and symbols. In the extremely unlikely event
that you are intentionally using such passwords and qpdf is thwarting
you by interpreting them as UTF-8, you can use
:samp:`--password-mode=bytes` to suppress qpdf's
automatic behavior.

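The fallback and the ambiguous case described above can be illustrated
with a short Python sketch. This is not qpdf's implementation: the
function name ``interpret_password`` is hypothetical, and Latin-1 merely
stands in for whatever single-byte encoding an older script might have
produced with :command:`iconv`.

```python
def interpret_password(raw: bytes):
    """Classify a supplied password roughly as the text describes.

    Returns ("unicode", text) when the bytes are valid UTF-8 and
    ("bytes", raw) otherwise. Hypothetical helper, not part of qpdf.
    """
    try:
        return ("unicode", raw.decode("utf-8"))
    except UnicodeDecodeError:
        return ("bytes", raw)


# A lone accented letter in Latin-1 is not valid UTF-8, so the sketch
# falls back to treating the password as a string of bytes.
legacy = "é".encode("latin-1")       # b"\xe9"

# The unlikely case: an accented letter followed by a symbol in Latin-1
# happens to form a valid two-byte UTF-8 sequence.
ambiguous = "Ã©".encode("latin-1")   # b"\xc3\xa9"

print(interpret_password(legacy))
print(interpret_password(ambiguous))
```

In the second case the bytes would be read as UTF-8, which is the
situation in which :samp:`--password-mode=bytes` becomes relevant.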
The :samp:`--password-mode` option, as described earlier
in this chapter, can be used to change qpdf's interpretation of supplied
passwords. There are very few reasons to use this option. One would be
the unlikely case described in the previous paragraph in which the
supplied password happens to be valid UTF-8 but isn't supposed to be
UTF-8. Your best bet would be just to provide the password as a valid
UTF-8 string, but you could also use
:samp:`--password-mode=bytes`. Another reason to use
:samp:`--password-mode=bytes` would be to intentionally
generate PDF files encrypted with passwords that are not properly
encoded. The qpdf test suite does this to generate invalid files for the
purpose of testing its password recovery capability. If you were trying
to create intentionally incorrect files for similar purposes, the
:samp:`bytes` password mode can enable you to do this.

When qpdf attempts to decrypt a file with a password that contains
non-ASCII characters, it will generate a list of alternative passwords
by attempting to interpret the password as each of a handful of
different coding systems and then transcode them to the required format.
This helps to compensate for the supplied password being given in the
wrong coding system, such as would happen if you used the
:command:`iconv` workaround that was previously needed.
It also generates passwords by doing the reverse operation: translating
from correct to incorrect encoding of the password. This would enable
qpdf to decrypt files using passwords that were improperly encoded by
whatever software encrypted the files, including older versions of qpdf
invoked without properly encoded passwords. The combination of these two
recovery methods should make qpdf transparently open most encrypted
files with the password supplied correctly but in the wrong coding
system. There are no real downsides to this behavior, but if you don't
want qpdf to do this, you can use the
:samp:`--suppress-password-recovery` option. One reason
to do that is to ensure that you know the exact password that was used
to encrypt the file.

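The general idea of generating alternative passwords by transcoding can
be sketched in Python as follows. This only illustrates the technique
and is not qpdf's actual algorithm: the list of encodings is an
assumption, and ``cp1252`` and ``latin-1`` stand in for PDF Doc
encoding, for which Python has no built-in codec.

```python
# Assumed set of coding systems to try; qpdf's real list may differ.
GUESSES = ("utf-8", "cp1252", "latin-1")


def candidate_passwords(supplied: bytes) -> list:
    """Return the supplied password plus transcoded alternatives.

    Each guess interprets the bytes under one encoding and re-encodes
    the resulting text under every other encoding. This covers both
    directions: repairing a mis-encoded password and reproducing a
    mis-encoding that some other software may have applied.
    """
    candidates = [supplied]
    for source in GUESSES:
        try:
            text = supplied.decode(source)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding
        for target in GUESSES:
            try:
                encoded = text.encode(target)
            except UnicodeEncodeError:
                continue  # the text cannot be represented in the target
            if encoded not in candidates:
                candidates.append(encoded)
    return candidates


utf8_password = "é".encode("utf-8")
for candidate in candidate_passwords(utf8_password):
    print(candidate)
```

A pure ASCII password yields no alternatives, which matches the text's
statement that recovery applies to passwords with non-ASCII characters.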
With these changes, qpdf now generates compliant passwords in most
cases. There are still some exceptions. In particular, the PDF
specification directs compliant writers to normalize Unicode passwords
and to perform certain transformations on passwords with bidirectional
text. Implementing this functionality requires using a real Unicode
library like ICU. If a client application that uses qpdf wants to do
this, the qpdf library will accept the resulting passwords, but qpdf
will not perform these transformations itself. It is possible that this
will be addressed in a future version of qpdf. The ``QPDFWriter``
methods that enable encryption on the output file accept passwords as
strings of bytes.

Please note that the :samp:`--password-is-hex-key`
option is unrelated to all this. This flag bypasses the normal process
of going from password to encryption string entirely, allowing the raw
encryption key to be specified directly. This is useful for forensic
purposes or for brute-force recovery of files with unknown passwords.

.. _ref.qdf:

QDF Mode
========

In QDF mode, qpdf creates PDF files in what we call *QDF
form*. A PDF file in QDF form, sometimes called a QDF
file, is a completely valid PDF file that has ``%QDF-1.0`` as its third
line (after the PDF header and binary characters) and has certain other
characteristics. The purpose of QDF form is to make it possible to edit
PDF files, with some restrictions, in an ordinary text editor. This can
be very useful for experimenting with different PDF constructs or for
making one-off edits to PDF files (though there are other reasons why
this may not always work). Note that QDF mode does not support
linearized files. If you enable linearization, QDF mode is automatically
disabled.

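Because the ``%QDF-1.0`` marker is always the third line, a script can
recognize a QDF file cheaply. The following Python sketch shows one way
to do so; the function name ``looks_like_qdf`` is hypothetical, and the
sample bytes are made up for illustration.

```python
def looks_like_qdf(data: bytes) -> bool:
    """Return True if the third line of the file is the QDF marker."""
    # Split off at most the first three lines; the rest stays unparsed.
    lines = data.split(b"\n", 3)
    return len(lines) >= 3 and lines[2].strip() == b"%QDF-1.0"


# Header line, binary comment line, then the QDF marker.
sample = b"%PDF-1.3\n%\xbf\xf7\xa2\xfe\n%QDF-1.0\n\n1 0 obj\n"
print(looks_like_qdf(sample))
```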
It is ordinarily very difficult to edit PDF files in a text editor for
two reasons: most meaningful data in PDF files is compressed, and PDF
files are full of offset and length information that makes it hard to
add or remove data. A QDF file is organized in a manner such that, if
edits are kept within certain constraints, the
:command:`fix-qdf` program, distributed with qpdf, is
able to restore edited files to a correct state. The
:command:`fix-qdf` program takes no command-line
arguments. It reads a possibly edited QDF file from standard input and
writes a repaired file to standard output.

The following attributes characterize a QDF file:

- All objects appear in numerical order in the PDF file, including when
  objects appear in object streams.

- Objects are printed in an easy-to-read format, and all line endings
  are normalized to UNIX line endings.

- Unless specifically overridden, streams appear uncompressed (when
  qpdf supports the filters and they are compressed with a non-lossy
  compression scheme), and most content streams are normalized (line
  endings are converted to just UNIX-style linefeeds).

- All stream lengths are represented as indirect objects, and the
  stream length object is always the next object after the stream. If
  the stream data does not end with a newline, an extra newline is
  inserted, and a special comment appears after the stream indicating
  that this has been done.

- If the PDF file contains object streams and object stream *n*
  contains *k* objects, those objects are numbered from *n+1* through
  *n+k*, and the object number/offset pairs appear on a separate line
  for each object. Additionally, each object in the object stream is
  preceded by a comment indicating its object number and index. This
  makes it very easy to find objects in object streams.

- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens,
  and ``endobj`` tokens appear on lines by themselves. A blank line
  follows every ``endobj`` token.

- If there is a cross-reference stream, it is unfiltered.

- Page dictionaries and page content streams are marked with special
  comments that make them easy to find.

- Comments precede each object indicating the object number of the
  corresponding object in the original file.

When editing a QDF file, any edits can be made as long as the above
constraints are maintained. This means that you can freely edit a page's
content without worrying about messing up the QDF file. It is also
possible to add new objects so long as those objects are added after the
last object in the file or subsequent objects are renumbered. If a QDF
file has object streams in it, you can always add the new objects before
the xref stream and then change the number of the xref stream, since
nothing generally ever references it by number.

It is not generally practical to remove objects from QDF files without
messing up object numbering, but if you remove all references to an
object, you can run qpdf on the file (after running
:command:`fix-qdf`), and qpdf will omit the now-orphaned
object.

When :command:`fix-qdf` is run, it goes through the file
and recomputes the following parts of the file:

- the ``/N``, ``/W``, and ``/First`` keys of all object stream
  dictionaries

- the pairs of numbers representing object numbers and offsets of
  objects in object streams

- all stream lengths

- the cross-reference table or cross-reference stream

- the offset to the cross-reference table or cross-reference stream
  following the ``startxref`` token

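To give a sense of what recomputing a cross-reference table involves,
the Python sketch below finds the byte offset of each object in a
QDF-style file and formats a classic cross-reference entry for it. It
relies on the QDF property that object beginnings appear on lines by
themselves. It is an illustration only and does not reproduce how
:command:`fix-qdf` is implemented; the function names are hypothetical,
and generation 0 is assumed for every object.

```python
import re

# In QDF form every "N G obj" token starts its own line.
OBJ_START = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)


def object_offsets(pdf_bytes: bytes) -> dict:
    """Map object number to the byte offset where the object begins."""
    return {
        int(match.group(1)): match.start()
        for match in OBJ_START.finditer(pdf_bytes)
    }


def xref_entry(offset: int) -> str:
    """Format an in-use cross-reference table entry (without its EOL).

    The entry is a 10-digit offset, a 5-digit generation number, and
    the keyword "n"; generation 0 is assumed here.
    """
    return f"{offset:010d} 00000 n "


sample = b"%PDF-1.3\n1 0 obj\n<< >>\nendobj\n\n2 0 obj\n42\nendobj\n\n"
offsets = object_offsets(sample)
for number in sorted(offsets):
    print(number, xref_entry(offsets[number]))
```

After an edit that changes the length of object 1, rerunning
``object_offsets`` would yield a new offset for object 2, which is the
kind of bookkeeping that makes hand-editing ordinary PDF files
impractical.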
.. _ref.using-library:

Using the QPDF Library
======================

.. _ref.using.from-cxx:

Using QPDF from C++
-------------------

The source tree for the qpdf package has an
:file:`examples` directory that contains a few
example programs. The :file:`qpdf/qpdf.cc` source
file also serves as a useful example since it exercises almost all of
the qpdf library's public interface. The best source of documentation on
the library itself is reading comments in
:file:`include/qpdf/QPDF.hh`,
:file:`include/qpdf/QPDFWriter.hh`, and
:file:`include/qpdf/QPDFObjectHandle.hh`.

All header files are installed in the
:file:`include/qpdf` directory. It is recommended that
you use ``#include <qpdf/QPDF.hh>`` rather than adding
:file:`include/qpdf` to your include path.

When linking against the qpdf static library, you may also need to
specify ``-lz -ljpeg`` on your link command. If your system understands
how to read libtool :file:`.la` files, this may not
be necessary.

The qpdf library is safe to use in a multithreaded program, but no
individual ``QPDF`` object instance (including ``QPDF``,
``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one
thread at a time. Multiple threads may simultaneously work with
different instances of these and all other QPDF objects.

.. _ref.using.other-languages:

Using QPDF from other languages
-------------------------------

The qpdf library is implemented in C++, which makes it hard to use
directly in other languages. There are a few things that can help.

"C"
   The qpdf library includes a "C" language interface that provides a
   subset of the overall capabilities. The header file
   :file:`qpdf/qpdf-c.h` includes information about
   its use. As long as you use a C++ linker, you can link C programs
   with qpdf and use the C API. For languages that can directly load
   methods from a shared library, the C API can also be useful. People
   have reported success using the C API from other languages on Windows
   by directly calling functions in the DLL.

Python
   A Python module called
   `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and
   highly functional set of Python bindings to the qpdf library. Using
   pikepdf, you can work with PDF files in a natural way and combine
   qpdf's capabilities with other functionality provided by Python's
   rich standard library and available modules.

Other Languages
   Starting with version 8.3.0, the :command:`qpdf`
   command-line tool can produce a JSON representation of the PDF file's
   non-content data. This can facilitate interacting programmatically
   with PDF files through qpdf's command line interface. For more
   information, please see :ref:`ref.json`.

.. _ref.unicode-files:

A Note About Unicode File Names
-------------------------------

When strings are passed to qpdf library routines either as ``char*`` or
as ``std::string``, they are treated as byte arrays except where
otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless
otherwise noted in comments in header files. In modern UNIX/Linux
environments, this generally does the right thing. In Windows, it's a
bit more complicated. Starting in qpdf 8.4.0, passwords that contain
Unicode characters are handled much better, and starting in qpdf 8.4.1,
the library attempts to properly handle Unicode characters in filenames.
In particular, in Windows, if a UTF-8 encoded string is used as a
filename in either ``QPDF`` or ``QPDFWriter``, it is internally
converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As
such, qpdf will generally operate properly on files with non-ASCII
characters in their names as long as the filenames are UTF-8 encoded for
passing into the qpdf library API, but there are still some rough edges,
such as the encoding of the filenames in error messages or CLI output
messages. Patches or bug reports are welcome for any continuing issues
with Unicode file names in Windows.

.. _ref.weak-crypto:

Weak Cryptography
=================

Starting with version 10.4, qpdf is taking steps to reduce the likelihood
of a user *accidentally* creating PDF files with insecure cryptography
but will continue to allow creation of such files indefinitely with
explicit acknowledgment.

The PDF file format makes use of RC4, which is known to be a weak
cryptography algorithm, and MD5, which is a weak hashing algorithm. In
version 10.4, qpdf generates warnings for some (but not all) cases of
writing files with weak cryptography when invoked from the command-line.
These warnings can be suppressed using the
:samp:`--allow-weak-crypto` option.

It is planned for qpdf version 11 to be stricter, making it an error to
write files with insecure cryptography from the command-line tool in
most cases without specifying the
:samp:`--allow-weak-crypto` flag and also to require
explicit steps when using the C++ library to enable use of insecure
cryptography.

Note that qpdf must always retain support for weak cryptographic
algorithms since this is required for reading older PDF files that use
them. Additionally, qpdf will always retain the ability to create files
using weak cryptographic algorithms since, as a development tool, qpdf
explicitly supports creating older or deprecated types of PDF files
since these are sometimes needed to test or work with older versions of
software. Even if other cryptography libraries drop support for RC4 or
MD5, qpdf can always fall back to its internal implementations of those
algorithms, so they are not going to disappear from qpdf.

.. _ref.json:

QPDF JSON
=========

.. _ref.json-overview:

Overview
--------

Beginning with qpdf version 8.3.0, the :command:`qpdf`
command-line program can produce a JSON representation of the
non-content data in a PDF file. It includes a dump in JSON format of all
objects in the PDF file excluding the content of streams. This JSON
representation makes it very easy to look in detail at the structure of
a given PDF file, and it also provides a great way to work with PDF
files programmatically from the command-line in languages that can't
call or link with the qpdf library directly. Note that stream data can
be extracted from PDF files using other qpdf command-line options.

.. _ref.json-guarantees:

JSON Guarantees
---------------

The qpdf JSON representation includes a JSON serialization of the raw
objects in the PDF file as well as some computed information in a more
easily extracted format. QPDF provides some guarantees about its JSON
format. These guarantees are designed to simplify the experience of a
developer working with the JSON format.

Compatibility
   The top-level JSON object output is a dictionary. The JSON output
   contains various nested dictionaries and arrays. With the exception
   of dictionaries that are populated by the fields of objects from the
   file, all instances of a dictionary are guaranteed to have exactly
   the same keys. Future versions of qpdf are free to add additional
   keys but not to remove keys or change the type of object that a key
   points to. The qpdf program validates this guarantee, and in the
   unlikely event that a bug in qpdf should cause it to generate data
   that doesn't conform to this rule, it will ask you to file a bug
   report.

   The top-level JSON structure contains a "``version``" key whose value
   is a simple integer. The value of the ``version`` key will be
   incremented if a non-compatible change is made. A non-compatible
   change would be any change that involves removal of a key, a change
   to the format of data pointed to by a key, or a semantic change that
   requires a different interpretation of a previously existing key. A
   strong effort will be made to avoid breaking compatibility.

Documentation
   The :command:`qpdf` command can be invoked with the
   :samp:`--json-help` option. This will output a JSON
   structure that has the same structure as the JSON output that qpdf
   generates, except that each field in the help output is a description
   of the corresponding field in the JSON output. The specific
   guarantees are as follows:

   - A dictionary in the help output means that the corresponding
     location in the actual JSON output is also a dictionary with
     exactly the same keys; that is, no keys present in help are absent
     in the real output, and no keys will be present in the real output
     that are not in help. As a special case, if the dictionary has a
     single key whose name starts with ``<`` and ends with ``>``, it
     means that the JSON output is a dictionary that can have any keys,
     each of which conforms to the value of the special key. This is
     used for cases in which the keys of the dictionary are things like
     object IDs.

   - A string in the help output is a description of the item that
     appears in the corresponding location of the actual output. The
     corresponding output can have any format.

   - An array in the help output always contains a single element. It
     indicates that the corresponding location in the actual output is
     also an array, and that each element of the array has whatever
     format is implied by the single element of the help output's
     array.

   For example, the help output includes a "``pagelabels``"
   key whose value is an array of one element. That element is a
   dictionary with keys "``index``" and "``label``". In addition to
   describing the meaning of those keys, this tells you that the actual
   JSON output will contain a ``pagelabels`` array, each of whose
   elements is a dictionary that contains an ``index`` key, a ``label``
   key, and no other keys.

Directness and Simplicity
   The JSON output contains the value of every object in the file, but
   it also contains some processed data. This is analogous to how qpdf's
   library interface works. The processed data is similar to the helper
   functions in that it allows you to look at certain aspects of the PDF
   file without having to understand all the nuances of the PDF
   specification, while the raw objects allow you to mine the PDF for
   anything that the higher-level interfaces are lacking.

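The compatibility and documentation guarantees above lend themselves to
a mechanical check. The Python sketch below tests a parsed JSON document
against a help structure using the three rules for dictionaries, strings,
and arrays, and it also checks the ``version`` key. It is only a sketch:
the function names, the supported version number, and the sample data
(including the shape of the ``label`` value) are assumptions made for
illustration, and the check does not model output that has been reduced
with :samp:`--json-keys` or fields that are null.

```python
import json

SUPPORTED_VERSION = 1  # assumption: the version this client understands


def check_version(document, supported=SUPPORTED_VERSION):
    """Raise if the document's "version" is not the one we support."""
    version = document.get("version")
    if version != supported:
        raise ValueError(f"unsupported qpdf JSON version: {version!r}")
    return version


def conforms(help_node, actual):
    """Return True if `actual` has the shape described by `help_node`."""
    if isinstance(help_node, dict):
        if not isinstance(actual, dict):
            return False
        keys = list(help_node)
        # Special case: a single "<...>" key allows arbitrary keys, each
        # of whose values must match the value of the special key.
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            pattern = help_node[keys[0]]
            return all(conforms(pattern, value) for value in actual.values())
        if set(help_node) != set(actual):
            return False
        return all(conforms(help_node[key], actual[key]) for key in help_node)
    if isinstance(help_node, list):
        if len(help_node) != 1:
            raise ValueError("help arrays are expected to have one element")
        return isinstance(actual, list) and all(
            conforms(help_node[0], item) for item in actual
        )
    # A string in the help is a description; the output can be anything.
    return True


# Made-up fragments standing in for --json-help and --json output.
HELP = {
    "version": "description",
    "pagelabels": [{"index": "description", "label": "description"}],
}
OUTPUT = json.loads(
    '{"version": 1, "pagelabels": [{"index": 0, "label": {"/S": "/D"}}]}'
)

print(check_version(OUTPUT), conforms(HELP, OUTPUT))
```

A client that runs a check like this at startup would notice early if it
were handed output from an incompatible version rather than failing later
on a missing key.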
.. _json.limitations:

Limitations of JSON Representation
----------------------------------

There are a few limitations to be aware of with the JSON structure:

- Strings, names, and indirect object references in the original PDF
  file are all converted to strings in the JSON representation. In the
  case of a "normal" PDF file, you can tell the difference because a
  name starts with a slash (``/``), and an indirect object reference
  looks like ``n n R``, but if there were to be a string that looked
  like a name or indirect object reference, there would be no way to
  tell this from the JSON output. Note that there are certain cases
  where you know for sure what something is, such as knowing that
  dictionary keys in objects are always names and that certain things
  in the higher-level computed data are known to contain indirect
  object references.

- The JSON format doesn't support binary data very well. Mostly the
  details are not important, but they are presented here for
  information. When qpdf outputs a string in the JSON representation,
  it converts the string to UTF-8, assuming usual PDF string semantics.
  Specifically, if the original string is UTF-16, it is converted to
  UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
  converted to UTF-8 with that assumption. This causes strange things
  to happen to binary strings. For example, if you had the binary
  string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
  because ``03`` is not a printable character and ``80`` is the bullet
  character in PDF doc encoding and is mapped to the Unicode value
  ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
  convert back from here to a binary string, you would have to recognize
  Unicode values whose code points are higher than ``0xFF`` and map
  those back to their corresponding PDF doc encoding characters. There
  is no way to tell the difference between a Unicode string that was
  originally encoded as UTF-16 or one that was converted from PDF doc
  encoding. In other words, it's best if you don't try to use the JSON
  format to extract binary strings from the PDF file, but if you really
  had to, it could be done. Note that qpdf's
  :samp:`--show-object` option does not have this
  limitation and will reveal the string as encoded in the original
  file.

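The ``<038051>`` example above can be traced in Python. The sketch
below is not qpdf code. It uses a deliberately partial PDF doc encoding
table containing only the one mapping mentioned in the text (``80`` to
the bullet, U+2022) and treats every other byte as the code point of the
same value, which is an assumption that a complete converter could not
make. The function names are hypothetical.

```python
import json

# Partial table: only the mapping named in the text. A real converter
# would need the full PDF doc encoding table.
PDFDOC_TO_UNICODE = {0x80: "\u2022"}
UNICODE_TO_PDFDOC = {char: byte for byte, char in PDFDOC_TO_UNICODE.items()}


def pdfdoc_bytes_to_text(raw: bytes) -> str:
    """Decode bytes as (partial) PDF doc encoding."""
    return "".join(PDFDOC_TO_UNICODE.get(byte, chr(byte)) for byte in raw)


def text_to_pdfdoc_bytes(text: str) -> bytes:
    """Reverse the mapping, as the text says a consumer would have to."""
    out = bytearray()
    for char in text:
        if char in UNICODE_TO_PDFDOC:
            out.append(UNICODE_TO_PDFDOC[char])
        elif ord(char) <= 0xFF:
            out.append(ord(char))
        else:
            raise ValueError(f"no known byte for {char!r}")
    return bytes(out)


original = bytes.fromhex("038051")
as_text = pdfdoc_bytes_to_text(original)
as_json = json.dumps(as_text, ensure_ascii=False)
round_trip = text_to_pdfdoc_bytes(json.loads(as_json))
print(as_json, round_trip.hex())
```

The round trip only works here because the sketch assumes the string was
PDF doc encoded; as noted above, the JSON alone cannot tell you whether
the original was UTF-16.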
.. _json.considerations:

JSON: Special Considerations
----------------------------

For the most part, the built-in JSON help tells you everything you need
to know about the JSON format, but there are a few non-obvious things to
be aware of:

- While qpdf guarantees that keys present in the help will be present
  in the output, those fields may be null or empty if the information
  is not known or absent in the file. Also, if you specify
  :samp:`--json-keys`, the keys that are not listed
  will be excluded entirely except for those that
  :samp:`--json-help` says are always present.

- In a few places, there are keys with names containing
  ``pageposfrom1``. The values of these keys are null or an integer. If
  an integer, they point to a page index within the file numbering from
  1. Note that JSON indexes from 0, and you would also use 0-based
  indexing using the API. However, 1-based indexing is easier in this
  case because the command-line syntax for specifying page ranges is
  1-based. If you were going to write a program that looked through the
  JSON for information about specific pages and then use the
  command-line to extract those pages, 1-based indexing is easier.
  Besides, it's more convenient to subtract 1 in a program written in a
  real programming language than it is to add 1 in shell code.

- The image information included in the ``page`` section of the JSON
  output includes the key "``filterable``". Note that the value of this
  field may depend on the :samp:`--decode-level` that
  you invoke qpdf with. The JSON output includes a top-level key
  "``parameters``" that indicates the decode level used for computing
  whether a stream was filterable. For example, JPEG images will be
  shown as not filterable by default, but they will be shown as
  filterable if you run :command:`qpdf --json --decode-level=all`.

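The point about ``pageposfrom1`` values can be made concrete with a
small Python sketch. It takes a value read from such a key and either
converts it to a 0-based index or passes it unchanged into a
command-line page range. The function names and file names are
hypothetical, and the command shown is one plausible way to extract a
single page with the :command:`qpdf` page selection syntax.

```python
def zero_based_index(pageposfrom1):
    """Convert a 1-based page position (or None) to a 0-based index."""
    return None if pageposfrom1 is None else pageposfrom1 - 1


def extraction_command(infile, outfile, pageposfrom1):
    """Build an argument list that extracts one page by 1-based position.

    The value from the JSON can be used as is because command-line page
    ranges are also 1-based.
    """
    if pageposfrom1 is None:
        raise ValueError("no page position available")
    return [
        "qpdf", "--empty",
        "--pages", infile, str(pageposfrom1), "--",
        outfile,
    ]


print(zero_based_index(3))
print(extraction_command("in.pdf", "page3.pdf", 3))
```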
.. _ref.design:

Design and Library Notes
========================

.. _ref.design.intro:

Introduction
------------

This section was written prior to the implementation of the qpdf package
and was subsequently modified to reflect the implementation. In some
cases, for purposes of explanation, it may differ slightly from the
actual implementation. As always, the source code and test suite are
authoritative. Even if there are some errors, this document should serve
as a road map to understanding how this code works.

In general, one should adhere strictly to a specification when writing
but be liberal in reading. This way, the product of our software will be
accepted by the widest range of other programs, and we will accept the
widest range of input files. This library attempts to conform to that
philosophy whenever possible but also aims to provide strict checking
for people who want to validate PDF files. If you don't want to see
warnings and are trying to write something that is tolerant, you can
call ``setSuppressWarnings(true)``. If you want to fail on the first
error, you can call ``setAttemptRecovery(false)``. The default behavior
is to generate warnings for recoverable problems. Note that recovery
will not always produce the desired results even if it is able to get
through the file. Unlike most other PDF software, which produces generic
warnings such as "This file is damaged," qpdf generally issues a
detailed error message that would be most useful to a PDF developer.
This is by design as there seems to be a shortage of PDF validation
tools out there. This was, in fact, one of the major motivations behind
the initial creation of qpdf.

.. _ref.design-goals:

Design Goals
------------

The QPDF package includes support for reading and rewriting PDF files.
It aims to hide from the user details involving object locations,
modified (appended) PDF files, the directness/indirectness of objects,
and stream filters including encryption. It does not aim to hide
knowledge of the object hierarchy or content stream contents. Put
another way, a user of the qpdf library is expected to have knowledge
about how PDF files work, but is not expected to have to keep track of
bookkeeping details such as file positions.

A user of the library never has to care whether an object is direct or
indirect, though it is possible to determine whether an object is direct
or not if this information is needed. All access to objects deals with
this transparently. All memory management details are also handled by
the library.

The ``PointerHolder`` object is used internally by the library to deal
with memory management. This is basically a smart pointer object very
similar in spirit to C++11's ``std::shared_ptr`` object, but predating
it by several years. This library also makes use of a technique for
giving fine-grained access to methods in one class to other classes by
using public nested classes with friends and only private members that
in turn call private methods of the containing class. See
``QPDFObjectHandle::Factory`` as an example.
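
The access-control technique described above can be illustrated with a
minimal, self-contained sketch. The names ``Handle``, ``Document``,
``makeFor``, and ``create`` are hypothetical and chosen only for
illustration. This is not the code of ``QPDFObjectHandle::Factory``,
only a small model of the same idea.

```cpp
// Hypothetical stand-ins: Document plays the role of a class such as
// QPDF, and Handle plays the role of a class such as QPDFObjectHandle.
class Document;

class Handle
{
  public:
    // Public nested class with only private members. Only its friends
    // can call those members, so only Document can reach makeFor().
    class Factory
    {
        friend class Document;

      private:
        static Handle
        create(int id)
        {
            // A nested class may call private members of the
            // containing class.
            return Handle::makeFor(id);
        }
    };

    int
    getId() const
    {
        return id;
    }

  private:
    explicit Handle(int id) :
        id(id)
    {
    }

    // Private: not callable by arbitrary client code.
    static Handle
    makeFor(int id)
    {
        return Handle(id);
    }

    int id;
};

class Document
{
  public:
    Handle
    getObject(int id)
    {
        // Allowed because Document is a friend of Handle::Factory.
        return Handle::Factory::create(id);
    }
};
```

In this sketch, code outside ``Document`` cannot call
``Handle::Factory::create`` or ``Handle::makeFor``, so ``Handle`` grants
access to one specific private method without making ``Document`` a
friend of the whole class.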
|
|
|
|
|
|
|
|
|
|
The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF
file. The library provides methods for both accessing and mutating PDF
files.

The primary class for interacting with PDF objects is
``QPDFObjectHandle``. Instances of this class can be passed around by
value, copied, stored in containers, etc. with very low overhead.
Instances of ``QPDFObjectHandle`` created by reading from a file will
always contain a reference back to the ``QPDF`` object from which they
were created. A ``QPDFObjectHandle`` may be direct or indirect. If
indirect, the ``PointerHolder`` to the underlying ``QPDFObject`` is
initially null. In this case, the first attempt to access the
underlying ``QPDFObject`` will result in the ``QPDFObject`` being
resolved via a call to the referenced ``QPDF`` instance. This makes it
essentially impossible to make coding errors in which certain things
will work for some PDF files and not for others based on which objects
are direct and which objects are indirect.

Instances of ``QPDFObjectHandle`` can be directly created and modified
using static factory methods in the ``QPDFObjectHandle`` class. There
are factory methods for each type of object as well as a convenience
method ``QPDFObjectHandle::parse`` that creates an object from a string
representation of the object. Existing instances of ``QPDFObjectHandle``
can also be modified in several ways. See comments in
:file:`QPDFObjectHandle.hh` for details.

An instance of ``QPDF`` is constructed by using the class's default
constructor. If desired, the ``QPDF`` object may be configured with
various methods that change its default behavior. Then the
``QPDF::processFile()`` method is passed the name of a PDF file, which
permanently associates the file with that ``QPDF`` object. A password
may also be given for access to password-protected files. QPDF does not
enforce encryption parameters and will treat user and owner passwords
equivalently. Either password may be used to access an encrypted file.
``QPDF`` will allow recovery of a user password given an owner password.
The input PDF file must be seekable. (Output files written by
``QPDFWriter`` need not be seekable, even when creating linearized
files.) While processing the file, ``QPDF`` validates the PDF file's
header, and then reads the cross reference tables and trailer
dictionaries. The ``QPDF`` class keeps only the first trailer dictionary
though it does read all of them so it can check the ``/Prev`` key.
``QPDF`` class users may request the root object and the trailer
dictionary specifically. The cross reference table is kept private.
Objects may then be requested by number or by walking the object tree.

When a PDF file has a cross-reference stream instead of a
cross-reference table and trailer, requesting the document's trailer
dictionary returns the stream dictionary from the cross-reference stream
instead.

There are some convenience routines for very common operations such as
walking the page tree and returning a vector of all page objects. For
full details, please see the header files
:file:`QPDF.hh` and
:file:`QPDFObjectHandle.hh`. There are also some
additional helper classes that provide higher level API functions for
certain document constructions. These are discussed in :ref:`ref.helper-classes`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
.. _ref.helper-classes:

Helper Classes
--------------

QPDF version 8.1 introduced the concept of helper classes. Helper
classes are intended to contain higher level APIs that allow developers
to work with certain document constructs at an abstraction level above
that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of
not hiding document structure from the developer. As with qpdf in
general, the goal is to take away some of the more tedious bookkeeping
aspects of working with PDF files, not to remove the need for the
developer to understand how the PDF construction in question works. The
driving factor behind the creation of helper classes was to allow the
evolution of higher level interfaces in qpdf without polluting the
interfaces of the main top-level classes ``QPDF`` and
``QPDFObjectHandle``.

There are two kinds of helper classes: *document* helpers and *object*
helpers. Document helpers are constructed with a reference to a ``QPDF``
object and provide methods for working with structures that are at the
document level. Object helpers are constructed with an instance of a
``QPDFObjectHandle`` and provide methods for working with specific types
of objects.

Examples of document helpers include ``QPDFPageDocumentHelper``, which
contains methods for operating on the document's page trees, such as
enumerating all pages of a document and adding and removing pages; and
``QPDFAcroFormDocumentHelper``, which contains document-level methods
related to interactive forms, such as enumerating form fields and
creating mappings between form fields and annotations.

Examples of object helpers include ``QPDFPageObjectHelper`` for
performing operations on pages such as page rotation and some operations
on content streams, ``QPDFFormFieldObjectHelper`` for performing
operations related to interactive form fields, and
``QPDFAnnotationObjectHelper`` for working with annotations.

It is always possible to retrieve the underlying ``QPDF`` reference from
a document helper and the underlying ``QPDFObjectHandle`` reference from
an object helper. Helpers are designed to be helpers, not wrappers. The
intention is that, in general, it is safe to freely intermix operations
that use helpers with operations that use the underlying objects.
Document and object helpers do not attempt to provide a complete
interface for working with the things they are helping with, nor do they
attempt to encapsulate underlying structures. They just provide a few
methods to help with error-prone, repetitive, or complex tasks. In some
cases, a helper object may cache some information that is expensive to
gather. In such cases, the helper classes are implemented so that their
own methods keep the cache consistent, and the header file will provide
a method to invalidate the cache and a description of what kinds of
operations would make the cache invalid. If in doubt, you can always
discard a helper class and create a new one with the same underlying
objects, which will ensure that you have discarded any stale
information.
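
The caching behavior described above can be sketched with a small,
self-contained model. ``Doc`` and ``PageIndexHelper`` are hypothetical
names invented for this illustration and do not correspond to any class
in qpdf. The sketch only shows the general shape: the helper exposes the
underlying object, keeps its own cache consistent in its own methods,
and offers a way to invalidate the cache after outside changes.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a core document class; not qpdf's QPDF.
struct Doc
{
    std::vector<std::string> pages;
};

// Sketch of a document helper that caches a name-to-position index.
class PageIndexHelper
{
  public:
    explicit PageIndexHelper(Doc& doc) :
        doc(doc)
    {
    }

    // The underlying document is always retrievable.
    Doc&
    getDoc()
    {
        return doc;
    }

    // A helper method keeps the cache consistent on its own.
    void
    addPage(std::string const& name)
    {
        doc.pages.push_back(name);
        if (cache_valid) {
            index[name] = doc.pages.size() - 1;
        }
    }

    // Returns the position of a page, or -1 if it is not present.
    long
    find(std::string const& name)
    {
        if (!cache_valid) {
            rebuild();
        }
        auto it = index.find(name);
        return it == index.end() ? -1 : static_cast<long>(it->second);
    }

    // Call after modifying doc.pages directly, bypassing the helper.
    void
    invalidateCache()
    {
        cache_valid = false;
    }

  private:
    void
    rebuild()
    {
        index.clear();
        for (std::size_t i = 0; i < doc.pages.size(); ++i) {
            index[doc.pages[i]] = i;
        }
        cache_valid = true;
    }

    Doc& doc;
    std::map<std::string, std::size_t> index;
    bool cache_valid = false;
};
```

Code that changes ``doc.pages`` directly would call
``invalidateCache()`` afterward, or simply construct a fresh helper.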
|
|
|
|
|
|
|
|
|
|
By convention, document helpers are called
``QPDFSomethingDocumentHelper`` and are derived from
``QPDFDocumentHelper``, and object helpers are called
``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``.
For details on specific helpers, please see their header files. You can
find them by looking at
:file:`include/qpdf/QPDF*DocumentHelper.hh` and
:file:`include/qpdf/QPDF*ObjectHelper.hh`.

In order to avoid creation of circular dependencies, the following
general guidelines are followed with helper classes:

- Core class interfaces do not know about helper classes. For example,
  no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper
  classes in their interfaces.

- Object helpers will usually not use document helpers in their
  interfaces. This is because it is much more useful for document
  helpers to have methods that return object helpers. Most operations
  in PDF files start at the document level and go from there to the
  object level rather than the other way around. It can sometimes be
  useful to map back from object-level structures to document-level
  structures. If there is a desire to do this, it will generally be
  provided by a method in the document helper class.

- Most of the time, object helpers don't know about other object
  helpers. However, in some cases, one type of object may be a
  container for another type of object, in which case it may make sense
  for the outer object to know about the inner object. For example,
  there are methods in ``QPDFPageObjectHelper`` that know about
  ``QPDFAnnotationObjectHelper`` because references to annotations are
  contained in page dictionaries.

- Any helper or core library class may use helpers in its
  implementation.

Prior to qpdf version 8.1, higher level interfaces were added as
"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For
compatibility, older convenience functions for operating with pages will
remain in those classes even as alternatives are provided in helper
classes. Going forward, new higher level interfaces will be provided
using helper classes.
|
|
|
|
|
|
|
|
|
|
.. _ref.implementation-notes:

Implementation Notes
--------------------

This section contains a few notes about QPDF's internal implementation,
particularly around what it does when it first processes a file. This
section is a bit of a simplification of what it actually does, but it
could serve as a starting point to someone trying to understand the
implementation. There is nothing in this section that you need to know
to use the qpdf library.

``QPDFObject`` is the basic PDF object class. It is an abstract base
class from which classes for each type of PDF object are derived.
Clients do not interact with objects directly but instead interact with
``QPDFObjectHandle``.

When the ``QPDF`` class creates a new object, it dynamically allocates
the appropriate type of ``QPDFObject`` and immediately hands the pointer
to an instance of ``QPDFObjectHandle``. The parser reads a token from
the current file position. If the token is neither a dictionary opener
nor an array opener, an object is immediately constructed from the
single token and the parser returns. Otherwise, the parser iterates in a
special mode in which it accumulates objects until it finds a balancing
closer. During this process, the "``R``" keyword is recognized and an
indirect ``QPDFObjectHandle`` may be constructed.

The ``QPDF::resolve()`` method, which is used to resolve an indirect
object, may be invoked from the ``QPDFObjectHandle`` class. It first
checks a cache to see whether this object has already been read. If not,
it reads the object from the PDF file and caches it. It then returns the
resulting ``QPDFObjectHandle``. The calling object handle then replaces
its ``PointerHolder<QPDFObject>`` with the one from the newly returned
``QPDFObjectHandle``. In this way, only a single copy of any direct
object need exist and clients can access objects transparently without
knowing or caring whether they are direct or indirect objects.
Additionally, no object is ever read from the file more than once. That
means that only the portions of the PDF file that are actually needed
are ever read from the input file, thus allowing the qpdf package to
take advantage of this important design goal of PDF files.
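
The resolve-and-cache behavior described above can be modeled in a few
lines. This is a simplified sketch under stated assumptions: ``File``,
``Handle``, ``Object``, and the ``on_disk`` map are hypothetical
stand-ins for illustration, and ``std::shared_ptr`` is used in place of
``PointerHolder``. It is not qpdf's implementation.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed object.
struct Object
{
    std::string value;
};

// Hypothetical stand-in for the class that owns the file and the cache.
class File
{
  public:
    using ObjGen = std::pair<int, int>; // object ID, generation

    // Pretend file content, keyed by object ID and generation.
    std::map<ObjGen, std::string> on_disk;
    // Counts how many times an object was actually read.
    int reads = 0;

    // Return the cached object, reading it at most once.
    std::shared_ptr<Object>
    resolve(ObjGen og)
    {
        auto it = cache.find(og);
        if (it == cache.end()) {
            ++reads;
            auto obj = std::make_shared<Object>();
            obj->value = on_disk.at(og);
            it = cache.emplace(og, obj).first;
        }
        return it->second;
    }

  private:
    std::map<ObjGen, std::shared_ptr<Object>> cache;
};

class Handle
{
  public:
    // An indirect handle starts out with a null object pointer.
    Handle(File& owner, File::ObjGen og) :
        owner(&owner),
        og(og)
    {
    }

    std::string const&
    getValue()
    {
        if (!obj) {
            // First access: ask the owning file to resolve the object,
            // then keep the shared pointer it hands back.
            obj = owner->resolve(og);
        }
        return obj->value;
    }

  private:
    File* owner;
    File::ObjGen og;
    std::shared_ptr<Object> obj;
};
```

In this model, two handles that refer to the same object ID and
generation end up sharing one ``Object``, and the read counter never
exceeds one per object no matter how many handles access it.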
|
|
|
|
|
|
|
|
|
|
If the requested object is inside of an object stream, the object stream
itself is first read into memory. Then the tokenizer reads objects from
the memory stream based on the offset information stored in the stream.
Those individual objects are cached, after which the temporary buffer
holding the object stream contents is discarded. In this way, the first
time an object in an object stream is requested, all objects in the
stream are cached.

The following example should clarify how ``QPDF`` processes a simple
file.

- The client constructs a ``QPDF`` object ``pdf`` and calls
  ``pdf.processFile("a.pdf");``.

- The ``QPDF`` class checks the beginning of
  :file:`a.pdf` for a PDF header. It then reads the
  cross reference table mentioned at the end of the file, ensuring that
  it is looking before the last ``%%EOF``. After reaching the
  ``trailer`` keyword, it invokes the parser.

- The parser sees "``<<``", so it calls itself recursively in
  dictionary creation mode.

- In dictionary creation mode, the parser keeps accumulating objects
  until it encounters "``>>``". Each object that is read is pushed onto
  a stack. If "``R``" is read, the last two objects on the stack are
  inspected. If they are integers, they are popped off the stack and
  their values are used to construct an indirect object handle which is
  then pushed onto the stack. When "``>>``" is finally read, the stack
  is converted into a ``QPDF_Dictionary`` which is placed in a
  ``QPDFObjectHandle`` and returned.

- The resulting dictionary is saved as the trailer dictionary.

- The trailer dictionary is searched for the ``/Prev`` key. If present,
  ``QPDF`` seeks to that point and repeats except that the new trailer
  dictionary is not saved. If ``/Prev`` is not present, the initial
  parsing process is complete.

  If there is an encryption dictionary, the document's encryption
  parameters are initialized.

- The client requests the root object. The ``QPDF`` class gets the value
  of the root key from the trailer dictionary and returns it. It is an
  unresolved indirect ``QPDFObjectHandle``.

- The client requests the ``/Pages`` key from the root
  ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is
  indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the
  object cache for an object with the root dictionary's object ID and
  generation number. Upon not seeing it, it checks the cross reference
  table, gets the offset, and reads the object present at that offset.
  It stores the result in the object cache and returns the cached
  result. The calling ``QPDFObjectHandle`` replaces its object pointer
  with the one from the resolved ``QPDFObjectHandle``, verifies that it
  is a valid dictionary object, and returns the (unresolved indirect)
  ``QPDFObjectHandle`` to the top of the Pages hierarchy.

As the client continues to request objects, the same process is
followed for each new requested object.
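
The stack handling in the dictionary creation step above can be sketched
as follows. This is a deliberately reduced model and not qpdf's parser:
the ``Item`` type, the ``accumulate`` function, and the pre-split token
list are assumptions made for illustration, and nested containers,
strings, and error handling are omitted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parsed item: an integer, an indirect reference, or any
// other token kept as text.
struct Item
{
    enum Kind { Integer, Reference, Other } kind;
    int a;            // integer value, or object ID for a reference
    int b;            // generation for a reference, otherwise 0
    std::string text; // original token for Other
};

static bool
isInteger(std::string const& tok)
{
    if (tok.empty()) {
        return false;
    }
    for (char c : tok) {
        if (c < '0' || c > '9') {
            return false;
        }
    }
    return true;
}

// Accumulate the items that follow "<<" until ">>" is seen. Tokens
// after the closing ">>" are ignored.
std::vector<Item>
accumulate(std::vector<std::string> const& tokens)
{
    std::vector<Item> stack;
    for (auto const& tok : tokens) {
        if (tok == ">>") {
            break;
        }
        std::size_t n = stack.size();
        if (tok == "R" && n >= 2 && stack[n - 1].kind == Item::Integer &&
            stack[n - 2].kind == Item::Integer) {
            // Pop the two integers and push one indirect reference
            // built from the object ID and generation.
            Item ref{Item::Reference, stack[n - 2].a, stack[n - 1].a, ""};
            stack.pop_back();
            stack.pop_back();
            stack.push_back(ref);
        } else if (isInteger(tok)) {
            stack.push_back(Item{Item::Integer, std::stoi(tok), 0, ""});
        } else {
            stack.push_back(Item{Item::Other, 0, 0, tok});
        }
    }
    return stack;
}
```

For a trailer such as ``<< /Root 1 0 R /Size 7 >>``, the tokens after
``<<`` leave four items on the stack: the ``/Root`` name, a reference to
object 1 generation 0, the ``/Size`` name, and the integer 7. A real
parser would then pair these up into dictionary keys and values.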
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
.. _ref.casting:

Casting Policy
--------------

This section describes the casting policy followed by qpdf's
implementation. This is of no concern to qpdf's end users and largely of
no concern to people writing code that uses qpdf, but it could be of
interest to people who are porting qpdf to a new platform or who are
making modifications to the code.

The C++ code in qpdf is free of old-style casts except where unavoidable
(e.g. where the old-style cast is in a macro provided by a third-party
header file). When there is a need for a cast, it is handled, in order
of preference, by rewriting the code to avoid the need for a cast,
calling ``const_cast``, calling ``static_cast``, calling
``reinterpret_cast``, or calling some combination of the above. As a
last resort, a compiler-specific ``#pragma`` may be used to suppress a
warning that we don't want to fix. Examples may include suppressing
warnings about the use of old-style casts in code that is shared between
C and C++ code.

The ``QIntC`` namespace, provided by
:file:`include/qpdf/QIntC.hh`, implements safe
functions for converting between integer types. These functions do range
checking and throw a ``std::range_error``, which is a subclass of
``std::runtime_error``, if conversion from one integer type to another
results in loss of information. There are many cases in which we have to
move between different integer types because of incompatible integer
types used in interoperable interfaces. Some are unavoidable, such as
moving between sizes and offsets, and others are there because of old
code that is too entrenched to be fixable without breaking source
compatibility and causing pain for users. QPDF is compiled with extra
warnings to detect conversions with potential data loss, and all such
cases should be fixed by either using a function from ``QIntC`` or a
``static_cast``.
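
To make the idea of a range-checked conversion concrete, here is a
minimal sketch. The function name ``checked_convert`` is hypothetical,
and the sketch is restricted to signed integer types to keep the
comparison logic simple. It is not the implementation found in
:file:`include/qpdf/QIntC.hh`, which handles more cases.

```cpp
#include <limits>
#include <stdexcept>
#include <string>

// Simplified sketch: convert between signed integer types, throwing
// std::range_error if the value does not fit in the destination type.
template <typename To, typename From>
To
checked_convert(From value)
{
    static_assert(
        std::numeric_limits<From>::is_integer &&
            std::numeric_limits<From>::is_signed &&
            std::numeric_limits<To>::is_integer &&
            std::numeric_limits<To>::is_signed,
        "this sketch handles signed integer types only");
    // Widen to long long so that both bounds can be compared safely.
    long long v = value;
    if (v < static_cast<long long>(std::numeric_limits<To>::min()) ||
        v > static_cast<long long>(std::numeric_limits<To>::max())) {
        throw std::range_error(
            "integer out of range converting " + std::to_string(v));
    }
    return static_cast<To>(value);
}
```

A call such as ``checked_convert<short>(some_long_long)`` either returns
the same numerical value or throws, so silent truncation cannot occur.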
|
|
|
|
|
|
|
|
|
|
When the intention is just to switch the type because of exchanging data
between incompatible interfaces, use ``QIntC``. This is the usual case.
However, there are some cases in which we are explicitly intending to
use the exact same bit pattern with a different type. This is most
common when switching between signed and unsigned characters. A lot of
qpdf's code uses unsigned characters internally, but ``std::string``
uses ``char``, which is signed on many platforms. Using
``QIntC::to_char`` would be wrong for converting from unsigned to signed
characters because a negative ``char`` value and the corresponding
``unsigned char`` value greater than 127 *mean the same thing*. There
are also cases in which we use ``static_cast`` when working with bit
fields where we are not representing a numerical value but rather a
bunch of bits packed together in some integer type. Also note that
``size_t`` and ``long`` both typically differ between 32-bit and 64-bit
environments, so sometimes an explicit cast may not be needed to avoid
warnings on one platform but may be needed on another. A conversion with
``QIntC`` should always be used when the types are different even if the
underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit
platforms, and the test suite is very thorough, so it is hard to make
any of the potential errors here without being caught in build or test.
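
The signed and unsigned character case can be shown with a short
sketch. The function names ``bytesToString`` and ``byteAt`` are
hypothetical and exist only for this illustration. The point is that
``static_cast`` preserves the byte in both directions, whereas a
range-checked conversion would reject byte values above 127 on platforms
where ``char`` is signed.

```cpp
#include <cstddef>
#include <string>

// Append raw bytes held as unsigned char to a std::string. The intent
// is to keep the same bit pattern, so static_cast is used rather than
// a range-checked numerical conversion.
std::string
bytesToString(unsigned char const* data, std::size_t len)
{
    std::string result;
    result.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        result += static_cast<char>(data[i]);
    }
    return result;
}

// Recover the original byte value from a char stored in a string.
unsigned char
byteAt(std::string const& s, std::size_t i)
{
    return static_cast<unsigned char>(s.at(i));
}
```

Round-tripping a byte such as ``0xE2`` through ``bytesToString`` and
``byteAt`` yields ``0xE2`` again, even though the intermediate ``char``
may hold a negative value.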
|
|
|
|
|
|
|
|
|
|
Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The
pipeline interface has a ``write`` call that uses ``unsigned char*``
without a ``const`` qualifier. The main reason for this is
to support pipelines that make calls to third-party libraries, such as
zlib, that don't include ``const`` in their interfaces. Unfortunately,
there are many places in the code where it is desirable to have
``const char*`` with pipelines. None of the pipeline implementations
in qpdf
currently modify the data passed to ``write``, and doing so would be
counter to the intent of ``Pipeline``, but there is nothing in the code
to prevent this from being done. There are places in the code where
``const_cast`` is used to remove the const-ness of pointers going into
``Pipeline``\ s. This could theoretically be unsafe, but there is
adequate testing to assert that it is safe and will remain safe in
qpdf's code.
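
The situation described above can be illustrated with a small model. The
``Sink`` and ``StringSink`` classes and the ``writeString`` function are
hypothetical and are not part of qpdf. They only mimic the shape of an
interface whose ``write`` call takes a non-const ``unsigned char*``.

```cpp
#include <cstddef>
#include <string>

// Hypothetical interface shaped like the write call described above:
// the data pointer is not const-qualified.
class Sink
{
  public:
    virtual ~Sink() = default;
    virtual void write(unsigned char* data, std::size_t len) = 0;
};

// A well-behaved implementation that only reads from the buffer.
class StringSink: public Sink
{
  public:
    void
    write(unsigned char* data, std::size_t len) override
    {
        out.append(reinterpret_cast<char*>(data), len);
    }

    std::string out;
};

// A caller that only has const data. The const_cast is safe only as
// long as no Sink implementation writes through the pointer.
void
writeString(Sink& sink, std::string const& s)
{
    sink.write(
        const_cast<unsigned char*>(
            reinterpret_cast<unsigned char const*>(s.data())),
        s.size());
}
```

Nothing in the type system stops a different ``Sink`` from modifying the
buffer, which is why this kind of cast relies on convention and testing
rather than on the compiler.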
|
|
|
|
|
|
|
|
|
|
.. _ref.encryption:

Encryption
----------

Encryption is supported transparently by qpdf. When opening a PDF file,
if an encryption dictionary exists, the ``QPDF`` object processes this
dictionary using the password (if any) provided. The primary decryption
key is computed and cached. No further access is made to the encryption
dictionary after that time. When an object is read from a file, the
object ID and generation of the object in which it is contained are
always known. Using this information along with the stored encryption
key, all stream and string objects are transparently decrypted. Raw
encrypted objects are never stored in memory. This way, nothing in the
library ever has to know or care whether it is reading an encrypted
file.

An interface is also provided for writing encrypted streams and strings
given an encryption key. This is used by ``QPDFWriter`` when it rewrites
encrypted files.

When copying encrypted files, unless otherwise directed, qpdf will
preserve any encryption in force in the original file. qpdf can do this
with either the user or the owner password. There is no difference in
capability based on which password is used. When 40-bit or 128-bit
encryption keys are used, the user password can be recovered with the
owner password. With 256-bit keys, the user and owner passwords are used
independently to encrypt the actual encryption key, so while either can
be used, the owner password can no longer be used to recover the user
password.

Starting with version 4.0.0, qpdf can read files that are not encrypted
but that contain encrypted attachments, but it cannot write such files.
qpdf also requires the password to be specified in order to open the
file, not just to extract attachments, since once the file is open, all
decryption is handled transparently. When copying files like this while
preserving encryption, qpdf will apply the file's encryption to
everything in the file, not just to the attachments. When decrypting the
file, qpdf will decrypt the attachments. In general, when copying PDF
files with multiple encryption formats, qpdf will choose the newest
format. The only exception to this is that clear-text metadata will be
preserved as clear-text if it is that way in the original file.

One point of confusion some people have about encrypted PDF files is
that encryption is not the same as password protection. Password
protected files are always encrypted, but it is also possible to create
encrypted files that do not have passwords. Internally, such files use
the empty string as a password, and most readers try the empty string
first to see if it works and prompt for a password only if the empty
string doesn't work. Normally such files have an empty user password and
a non-empty owner password. In that way, if the file is opened by an
ordinary reader without specification of password, the restrictions
specified in the encryption dictionary can be enforced. Most users
wouldn't even realize such a file was encrypted. Since qpdf always
ignores the restrictions (except for the purpose of reporting what they
are), qpdf doesn't care which password you use. QPDF will allow you to
create PDF files with non-empty user passwords and empty owner
passwords. Some readers will require a password when you open these
files, and others will open the files without a password and not enforce
restrictions. Having a non-empty user password and an empty owner
password doesn't really make sense because it would mean that opening
the file with the user password would be more restrictive than not
supplying a password at all. QPDF also allows you to create PDF files
with the same password as both the user and owner password. Some readers
will not ever allow such files to be accessed without restrictions
because they never try the password as the owner password if it works as
the user password. Nonetheless, one of the powerful aspects of qpdf is
that it allows you to finely specify the way encrypted files are
created, even if the results are not useful to some readers. One use
case for this would be for testing a PDF reader to ensure that it
handles odd configurations of input files.
|
|
|
|
|
|
|
|
|
|
.. _ref.random-numbers:

Random Number Generation
------------------------

QPDF generates random numbers to support generation of encrypted data.
Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of
random numbers. Older versions used the OS-provided source of secure
random numbers or, if allowed at build time, insecure random numbers
from stdlib. Starting with version 5.1.0, you can disable use of
OS-provided secure random numbers at build time. This is especially
useful on Windows if you want to avoid a dependency on Microsoft's
cryptography API. You can also supply your own random data provider. For
details on how to do this, please refer to the top-level README.md file
in the source distribution and to comments in
:file:`QUtil.hh`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
.. _ref.adding-and-remove-pages:

Adding and Removing Pages
-------------------------

While qpdf's API has supported adding and modifying objects for some
time, version 3.0 introduced specific methods for adding and removing
pages. These are largely convenience routines that handle two tricky
issues: pushing inheritable resources from the ``/Pages`` tree down to
individual pages and manipulation of the ``/Pages`` tree itself. For
details, see ``addPage`` and surrounding methods in
:file:`QPDF.hh`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
.. _ref.reserved-objects:
|
|
|
|
|
|
|
|
|
|
Reserving Object Numbers
|
|
|
|
|
------------------------
|
|
|
|
|
|
|
|
|
|
Version 3.0 of qpdf introduced the concept of reserved objects. These
|
|
|
|
|
are seldom needed for ordinary operations, but there are cases in which
|
|
|
|
|
you may want to add a series of indirect objects with references to each
|
|
|
|
|
other to a ``QPDF`` object. This causes a problem because you can't
|
|
|
|
|
determine the object ID that a new indirect object will have until you
|
|
|
|
|
add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The
|
|
|
|
|
only way to add two mutually referential objects to a ``QPDF`` object
|
|
|
|
|
prior to version 3.0 would be to add the new objects first and then make
|
|
|
|
|
them refer to each other after adding them. Now it is possible to create
|
2021-12-12 00:24:35 +00:00
|
|
|
|
a *reserved object* using
|
2021-12-11 21:53:08 +00:00
|
|
|
|
``QPDFObjectHandle::newReserved``. This is an indirect object that stays
|
|
|
|
|
"unresolved" even if it is queried for its type. So now, if you want to
|
|
|
|
|
create a set of mutually referential objects, you can create
|
|
|
|
|
reservations for each one of them and use those reservations to
|
|
|
|
|
construct the references. When finished, you can call
|
|
|
|
|
``QPDF::replaceReserved`` to replace the reserved objects with the real
|
|
|
|
|
ones. This functionality will never be needed by most applications, but
|
|
|
|
|
it is used internally by QPDF when copying objects from other PDF files,
|
2021-12-12 00:31:19 +00:00
|
|
|
|
as discussed in :ref:`ref.foreign-objects`. For an example of how to use reserved
|
2021-12-11 21:53:08 +00:00
|
|
|
|
objects, search for ``newReserved`` in
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`test_driver.cc` in qpdf's sources.
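
The reservation idea can be illustrated without the qpdf library. The
following is a minimal sketch under stated assumptions: ``ToyTable``,
``ToyObject``, and ``makeMutualPair`` are hypothetical names invented for
this illustration and are not part of the qpdf API. With the real
library, the corresponding steps use ``QPDFObjectHandle::newReserved``
and ``QPDF::replaceReserved`` as described above.

.. code-block:: cpp

   #include <map>
   #include <optional>
   #include <stdexcept>
   #include <utility>
   #include <vector>

   // Toy stand-in for an indirect object: only the ids it refers to.
   struct ToyObject
   {
       std::vector<int> refs;
   };

   // Toy stand-in for an object table (hypothetical; not the qpdf API).
   class ToyTable
   {
     public:
       // Hand out an id now; the slot stays empty until it is replaced.
       int reserve()
       {
           int id = next_id_++;
           slots_[id] = std::nullopt;
           return id;
       }

       // Fill a pending reservation with the real object.
       void replaceReserved(int id, ToyObject obj)
       {
           auto it = slots_.find(id);
           if (it == slots_.end() || it->second.has_value()) {
               throw std::logic_error("id is not a pending reservation");
           }
           it->second = std::move(obj);
       }

       bool isReserved(int id) const
       {
           auto it = slots_.find(id);
           return it != slots_.end() && !it->second.has_value();
       }

       ToyObject const& get(int id) const
       {
           return slots_.at(id).value();
       }

     private:
       int next_id_ = 1;
       std::map<int, std::optional<ToyObject>> slots_;
   };

   // Create two objects that refer to each other. Both ids are known
   // before either object exists, so the references can be built first.
   std::pair<int, int> makeMutualPair(ToyTable& table)
   {
       int a = table.reserve();
       int b = table.reserve();
       table.replaceReserved(a, ToyObject{{b}});
       table.replaceReserved(b, ToyObject{{a}});
       return {a, b};
   }

The point of the sketch is the ordering: ids are handed out first,
references are built from those ids, and only then are the real objects
put in place.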
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
.. _ref.foreign-objects:
|
|
|
|
|
|
|
|
|
|
Copying Objects From Other PDF Files
------------------------------------
|
|
|
|
|
|
|
|
|
|
Version 3.0 of qpdf introduced the ability to copy objects into a
|
|
|
|
|
``QPDF`` object from a different ``QPDF`` object, which we refer to as
|
2021-12-12 00:24:35 +00:00
|
|
|
|
*foreign objects*. This allows arbitrary
|
2021-12-11 21:53:08 +00:00
|
|
|
|
merging of PDF files. The "from" ``QPDF`` object must remain valid after
|
|
|
|
|
the copy as discussed in the note below. The
|
2021-12-12 00:01:40 +00:00
|
|
|
|
:command:`qpdf` command-line tool provides limited
|
2021-12-11 21:53:08 +00:00
|
|
|
|
support for basic page selection, including merging in pages from other
|
|
|
|
|
files, but the library's API makes it possible to implement arbitrarily
|
|
|
|
|
complex merging operations. The main method for copying foreign objects
|
|
|
|
|
is ``QPDF::copyForeignObject``. This takes an indirect object from
|
|
|
|
|
another ``QPDF`` and copies it recursively into this object while
|
|
|
|
|
preserving all object structure, including circular references. This
|
|
|
|
|
means you can add a direct object that you create from scratch to a
|
|
|
|
|
``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an
|
|
|
|
|
indirect object from another file with ``QPDF::copyForeignObject``. The
|
|
|
|
|
fact that ``QPDF::makeIndirectObject`` does not automatically detect a
|
|
|
|
|
foreign object and copy it is an explicit design decision. Copying a
|
|
|
|
|
foreign object seems like a sufficiently significant thing to do that it
|
|
|
|
|
should be done explicitly.
|
|
|
|
|
|
|
|
|
|
The other way to copy foreign objects is by passing a page from one
|
|
|
|
|
``QPDF`` to another by calling ``QPDF::addPage``. In contrast to
|
|
|
|
|
``QPDF::makeIndirectObject``, this method automatically distinguishes
|
|
|
|
|
between indirect objects in the current file, foreign objects, and
|
|
|
|
|
direct objects.
|
|
|
|
|
|
|
|
|
|
Please note: when you copy objects from one ``QPDF`` to another, the
|
|
|
|
|
source ``QPDF`` object must remain valid until you have finished with
|
|
|
|
|
the destination object. This is because the original object is still
|
|
|
|
|
used to retrieve any referenced stream data from the copied object.
|
|
|
|
|
|
|
|
|
|
.. _ref.rewriting:
|
|
|
|
|
|
|
|
|
|
Writing PDF Files
-----------------
|
|
|
|
|
|
|
|
|
|
The qpdf library supports file writing of ``QPDF`` objects to PDF files
|
|
|
|
|
through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two
|
|
|
|
|
writing modes: one for non-linearized files, and one for linearized
|
2021-12-12 00:31:19 +00:00
|
|
|
|
files. See :ref:`ref.linearization` for a description of how
linearization is implemented. This section describes how we write
non-linearized files, including the creation of QDF files (see :ref:`ref.qdf`).
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
This outline was written prior to implementation and is not exactly
|
|
|
|
|
accurate, but it provides a correct "notional" idea of how writing
|
|
|
|
|
works. Look at the code in ``QPDFWriter`` for exact details.
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- Initialize state:

  - next object number = 1

  - object queue = empty

  - renumber table: old object id/generation to new id/0 = empty

  - xref table: new id -> offset = empty

- Create a QPDF object from a file.

- Write header for new PDF file.

- Request the trailer dictionary.

- For each value that is an indirect object, grab the next object
  number (via an operation that returns and increments the number). Map
  object to new number in renumber table. Push object onto queue.

- While there are more objects on the queue:

  - Pop queue.

  - Look up object's new number *n* in the renumbering table.

  - Store current offset into xref table.

  - Write :samp:`{n} 0 obj`.

  - If object is null, whether direct or indirect, write out null,
    thus eliminating unresolvable indirect object references.

  - If the object is a stream, write stream contents, piped
    through any filters as required, to a memory buffer. Use this
    buffer to determine the stream length.

  - If object is not a stream, array, or dictionary, write out its
    contents.

  - If object is an array or dictionary (including stream), traverse
    its elements (for array) or values (for dictionaries), handling
    recursive dictionaries and arrays, looking for indirect objects.
    When an indirect object is found, if it is not resolvable, ignore.
    (This case is handled when writing it out.) Otherwise, look it up
    in the renumbering table. If not found, grab the next available
    object number, assign to the referenced object in the renumbering
    table, and push the referenced object onto the queue. As a special
    case, when writing out a stream dictionary, replace length,
    filters, and decode parameters as required.

    Write out dictionary or array, replacing any unresolvable indirect
    object references with null (the PDF spec says a reference to a
    non-existent object is legal and resolves to null) and any
    resolvable ones with references to the renumbered objects.

  - If the object is a stream, write ``stream\n``, the stream contents
    (from the memory buffer), and ``\nendstream\n``.

  - When done, write ``endobj``.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Once we have finished the queue, all referenced objects will have been
|
|
|
|
|
written out and all deleted objects or unreferenced objects will have
|
|
|
|
|
been skipped. The new cross-reference table will contain an offset for
|
|
|
|
|
every new object number from 1 up to the number of objects written. This
|
|
|
|
|
can be used to write out a new xref table. Finally we can write out the
|
|
|
|
|
trailer dictionary with appropriately computed /ID (see spec, 8.3, File
|
|
|
|
|
Identifiers), the cross reference table offset, and ``%%EOF``.
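
The queue-and-renumber traversal in the outline can be sketched in a few
lines. This is a simplified model rather than the ``QPDFWriter``
implementation: ``ToyGraph`` and ``renumber`` are hypothetical names,
each object is reduced to the list of old object ids it references, and
nothing is actually written.

.. code-block:: cpp

   #include <deque>
   #include <map>
   #include <vector>

   // Toy object graph: old object id -> old ids it references.
   // (Hypothetical stand-in for the objects reachable in a file.)
   using ToyGraph = std::map<int, std::vector<int>>;

   // Return the renumber table (old id -> new id). Objects are numbered
   // in the order in which they are first reached from the trailer.
   std::map<int, int>
   renumber(ToyGraph const& graph, std::vector<int> const& trailer_refs)
   {
       int next_obj = 1;
       std::deque<int> queue;
       std::map<int, int> renumber_table;

       // Assign a number and enqueue unless already seen or unresolvable.
       auto enqueue = [&](int old_id) {
           if (graph.count(old_id) == 0) {
               return; // unresolvable: a writer would emit null instead
           }
           if (renumber_table.count(old_id) == 0) {
               renumber_table[old_id] = next_obj++;
               queue.push_back(old_id);
           }
       };

       for (int id : trailer_refs) {
           enqueue(id);
       }
       while (!queue.empty()) {
           int old_id = queue.front();
           queue.pop_front();
           // A real writer would record the offset and write the
           // object here; the sketch only follows its references.
           for (int ref : graph.at(old_id)) {
               enqueue(ref);
           }
       }
       return renumber_table;
   }

Objects that are never reached from the trailer receive no new number,
which is how deleted and unreferenced objects drop out of the output.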
|
|
|
|
|
|
|
|
|
|
.. _ref.filtered-streams:
|
|
|
|
|
|
|
|
|
|
Filtered Streams
----------------
|
|
|
|
|
|
|
|
|
|
Support for streams is implemented through the ``Pipeline`` interface
|
|
|
|
|
which was designed for this package.
|
|
|
|
|
|
|
|
|
|
When reading streams, create a series of ``Pipeline`` objects. The
|
|
|
|
|
``Pipeline`` abstract base class requires implementation of ``write()`` and
``finish()`` and provides an implementation of ``getNext()``. Each
|
|
|
|
|
pipeline object, upon receiving data, does whatever it is going to do
|
|
|
|
|
and then writes the data (possibly modified) to its successor.
|
|
|
|
|
Alternatively, a pipeline may be an end-of-the-line pipeline that does
|
|
|
|
|
something like store its output to a file or a memory buffer ignoring a
|
|
|
|
|
successor. For additional details, look at
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`Pipeline.hh`.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
``QPDF`` can read raw or filtered streams. When reading a filtered
|
|
|
|
|
stream, the ``QPDF`` class creates one ``Pipeline`` object for each
applicable filter and chains them together. The last filter
|
|
|
|
|
should write to whatever type of output is required. The ``QPDF`` class
|
|
|
|
|
has an interface to write raw or filtered stream contents to a given
|
|
|
|
|
pipeline.
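
The chaining idea can be shown with a miniature, self-contained model.
The classes below (``ToyPipeline``, ``ToyHexEncode``, ``ToyBuffer``) are
hypothetical and much simpler than the real interface in
:file:`Pipeline.hh`; they only illustrate a stage that transforms data
and forwards it, followed by an end-of-the-line stage that stores it.

.. code-block:: cpp

   #include <string>

   // Miniature pipeline base: subclasses implement write() and
   // finish(); the base provides getNext().
   class ToyPipeline
   {
     public:
       explicit ToyPipeline(ToyPipeline* next = nullptr) :
           next_(next)
       {
       }
       virtual ~ToyPipeline() = default;
       virtual void write(std::string const& data) = 0;
       virtual void finish() = 0;

     protected:
       ToyPipeline* getNext() const
       {
           return next_;
       }

     private:
       ToyPipeline* next_;
   };

   // Intermediate stage: hex-encodes each byte and passes it on.
   class ToyHexEncode: public ToyPipeline
   {
     public:
       explicit ToyHexEncode(ToyPipeline* next) :
           ToyPipeline(next)
       {
       }
       void write(std::string const& data) override
       {
           static char const digits[] = "0123456789abcdef";
           std::string out;
           for (unsigned char ch : data) {
               out += digits[ch >> 4];
               out += digits[ch & 0x0f];
           }
           getNext()->write(out);
       }
       void finish() override
       {
           getNext()->finish();
       }
   };

   // End-of-the-line stage: stores its input and has no successor.
   class ToyBuffer: public ToyPipeline
   {
     public:
       ToyBuffer() :
           ToyPipeline(nullptr)
       {
       }
       void write(std::string const& data) override
       {
           buf_ += data;
       }
       void finish() override
       {
           finished_ = true;
       }
       std::string const& contents() const
       {
           return buf_;
       }
       bool finished() const
       {
           return finished_;
       }

     private:
       std::string buf_;
       bool finished_ = false;
   };

A chain is built from the end backwards: construct the ``ToyBuffer``
first, pass its address to ``ToyHexEncode``, then write to the encoder
and call ``finish()`` on it.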
|
|
|
|
|
|
|
|
|
|
.. _ref.object-accessors:
|
|
|
|
|
|
|
|
|
|
Object Accessor Methods
-----------------------
|
|
|
|
|
|
2021-12-12 21:18:41 +00:00
|
|
|
|
..
   This section is referenced in QPDFObjectHandle.hh
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
For general information about how to access instances of
|
|
|
|
|
``QPDFObjectHandle``, please see the comments in
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`QPDFObjectHandle.hh`. Search for "Accessor
|
2021-12-11 21:53:08 +00:00
|
|
|
|
methods". This section provides a more in-depth discussion of the
|
|
|
|
|
behavior and the rationale for the behavior.
|
|
|
|
|
|
|
|
|
|
*Why were type errors made into warnings?* When type checks were
|
|
|
|
|
introduced into qpdf in the early days, it was expected that type errors
|
|
|
|
|
would only occur as a result of programmer error. However, in practice,
|
|
|
|
|
type errors would occur with malformed PDF files because of assumptions
|
|
|
|
|
made in code, including code within the qpdf library and code written by
|
|
|
|
|
library users. The most common case would be chaining calls to
|
|
|
|
|
``getKey()`` to access keys deep within a dictionary. In many cases,
|
|
|
|
|
qpdf would be able to recover from these situations, but the old
|
|
|
|
|
behavior often resulted in crashes rather than graceful recovery. For
|
|
|
|
|
this reason, the errors were changed to warnings.
|
|
|
|
|
|
|
|
|
|
*Why even warn about type errors when the user can't usually do anything
|
|
|
|
|
about them?* Type warnings are extremely valuable during development.
|
|
|
|
|
Since it's impossible to catch at compile time things like typos in
|
|
|
|
|
dictionary key names or logic errors around what the structure of a PDF
|
|
|
|
|
file might be, the presence of type warnings can save lots of developer
|
|
|
|
|
time. They have also proven useful in exposing issues in qpdf itself
|
|
|
|
|
that would have otherwise gone undetected.
|
|
|
|
|
|
|
|
|
|
*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if
``QPDFObjectHandle`` could be more strongly typed so that you'd have to
check that something was of a particular type before calling
type-specific accessor methods. However, implementing this at this stage
of the library's history would be quite difficult, and it would make
the common pattern of drilling into an object no longer work. While it
would be possible to have a parallel interface, it would create a lot of
extra code. If qpdf were written in a language like Rust, an interface
like this would make a lot of sense, but, for a variety of reasons, the
qpdf API is consistent with other APIs of its time, relying on exception
handling to catch errors. The underlying PDF objects are inherently not
type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would
ultimately cause a lot more code to have to be written and would likely
make software that uses qpdf more brittle, and even so, checks would
have to occur at runtime.
|
|
|
|
|
|
|
|
|
|
*Why do type errors sometimes raise exceptions?* The way warnings work
|
|
|
|
|
in qpdf requires a ``QPDF`` object to be associated with an object
|
|
|
|
|
handle for a warning to be issued. It would be nice if this could be
|
|
|
|
|
fixed, but it would require major changes to the API. Rather than
|
|
|
|
|
throwing away these conditions, we convert them to exceptions. It's not
|
|
|
|
|
that bad though. Since any object handle that was read from a file has
|
|
|
|
|
an associated ``QPDF`` object, it would only be type errors on objects
|
|
|
|
|
that were created explicitly that would cause exceptions, and in that
|
|
|
|
|
case, type errors are much more likely to be the result of a coding
|
|
|
|
|
error than invalid input.
|
|
|
|
|
|
|
|
|
|
*Why does the behavior of a type exception differ between the C and C++
|
|
|
|
|
API?* There is no way to throw and catch exceptions in C short of
|
|
|
|
|
something like ``setjmp`` and ``longjmp``, and that approach is not
|
|
|
|
|
portable across language barriers. Since the C API is often used from
|
|
|
|
|
other languages, it's important to keep things as simple as possible.
|
|
|
|
|
Starting in qpdf 10.5, exceptions that used to crash code using the C
|
|
|
|
|
API will be written to stderr by default, and it is possible to register
|
|
|
|
|
an error handler. There's no reason that the error handler can't
|
|
|
|
|
simulate exception handling in some way, such as by using ``setjmp`` and
|
|
|
|
|
``longjmp`` or by setting some variable that can be checked after
|
|
|
|
|
library calls are made. In retrospect, it might have been better if the
|
|
|
|
|
C API object handle methods returned error codes like the other methods
|
|
|
|
|
and set return values in passed-in pointers, but this would complicate
|
|
|
|
|
both the implementation and the use of the library for a case that is
|
|
|
|
|
actually quite rare and largely avoidable.
|
|
|
|
|
|
|
|
|
|
.. _ref.linearization:
|
|
|
|
|
|
|
|
|
|
Linearization
=============
|
|
|
|
|
|
|
|
|
|
This chapter describes how ``QPDF`` and ``QPDFWriter`` implement
|
|
|
|
|
creation and processing of linearized PDF files.
|
|
|
|
|
|
|
|
|
|
.. _ref.linearization-strategy:
|
|
|
|
|
|
|
|
|
|
Basic Strategy for Linearization
--------------------------------
|
|
|
|
|
|
|
|
|
|
To avoid the incestuous problem of having the qpdf library validate its
|
|
|
|
|
own linearized files, we have a special linearized file checking mode
|
2021-12-12 00:01:40 +00:00
|
|
|
|
which can be invoked via :command:`qpdf
|
|
|
|
|
--check-linearization` (or :command:`qpdf
|
|
|
|
|
--check`). This mode reads the linearization parameter
|
2021-12-11 21:53:08 +00:00
|
|
|
|
dictionary and the hint streams and validates that object ordering,
|
|
|
|
|
parameters, and hint stream contents are correct. The validation code
|
|
|
|
|
was first tested against linearized files created by external tools
|
|
|
|
|
(Acrobat and pdlin) and then used to validate files created by
|
|
|
|
|
``QPDFWriter`` itself.
|
|
|
|
|
|
|
|
|
|
.. _ref.linearized.preparation:
|
|
|
|
|
|
|
|
|
|
Preparing For Linearization
---------------------------
|
|
|
|
|
|
|
|
|
|
Before creating a linearized PDF file from any other PDF file, the PDF
|
|
|
|
|
file must be altered such that all page attributes are propagated down
|
|
|
|
|
to the page level (and not inherited from parents in the ``/Pages``
|
|
|
|
|
tree). We also have to know which objects refer to which other objects,
|
|
|
|
|
being concerned with page boundaries and a few other cases. We refer to
|
|
|
|
|
this part of preparing the PDF file as
|
2021-12-12 00:24:35 +00:00
|
|
|
|
*optimization*, discussed in
|
2021-12-12 00:31:19 +00:00
|
|
|
|
:ref:`ref.optimization`. Note that, in this context, the
|
2021-12-12 00:24:35 +00:00
|
|
|
|
term *optimization* is a qpdf term, and the
|
|
|
|
|
term *linearization* is a term from the PDF
|
2021-12-11 21:53:08 +00:00
|
|
|
|
specification. Do not be confused by the fact that many applications
|
|
|
|
|
refer to linearization as optimization or web optimization.
|
|
|
|
|
|
|
|
|
|
When creating linearized PDF files from optimized PDF files, there are
|
|
|
|
|
really only a few issues that need to be dealt with:
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- Creation of hints tables
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- Placing objects in the correct order
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- Filling in offsets and byte sizes
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
.. _ref.optimization:
|
|
|
|
|
|
|
|
|
|
Optimization
------------
|
|
|
|
|
|
|
|
|
|
In order to perform various operations such as linearization and
|
|
|
|
|
splitting files into pages, it is necessary to know which objects are
|
|
|
|
|
referenced by which pages, page thumbnails, and root and trailer
|
|
|
|
|
dictionary keys. It is also necessary to ensure that all page-level
|
|
|
|
|
attributes appear directly at the page level and are not inherited from
|
|
|
|
|
parents in the pages tree.
|
|
|
|
|
|
|
|
|
|
We refer to the process of enforcing these constraints as
|
2021-12-12 00:24:35 +00:00
|
|
|
|
*optimization*. As mentioned above, note
|
2021-12-11 21:53:08 +00:00
|
|
|
|
that some applications refer to linearization as optimization. Although
|
|
|
|
|
this optimization was initially motivated by the need to create
|
|
|
|
|
linearized files, we are using these terms separately.
|
|
|
|
|
|
|
|
|
|
PDF file optimization is implemented in the
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`QPDF_optimization.cc` source file. That file
|
2021-12-11 21:53:08 +00:00
|
|
|
|
is richly commented and serves as the primary reference for the
|
|
|
|
|
optimization process.
|
|
|
|
|
|
|
|
|
|
After optimization has been completed, the private member variables
|
|
|
|
|
``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have
|
|
|
|
|
been populated. Any object that has more than one value in the
|
|
|
|
|
``object_to_obj_users`` table is shared. Any object that has exactly one
|
|
|
|
|
value in the ``object_to_obj_users`` table is private. To find all the
|
|
|
|
|
private objects in a page or a trailer or root dictionary key, one
|
|
|
|
|
merely has to make this determination for each element in the
|
|
|
|
|
``obj_user_to_objects`` table for the given page or key.
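
The shared/private test can be sketched with two plain maps. This is an
illustration only: the type aliases and the functions ``invert`` and
``privateObjects`` are hypothetical, users are named by strings such as
``"page:0"`` instead of qpdf's internal object user type, and the real
tables are private members of ``QPDF``.

.. code-block:: cpp

   #include <map>
   #include <set>
   #include <string>

   using User = std::string;
   using UserToObjects = std::map<User, std::set<int>>;
   using ObjectToUsers = std::map<int, std::set<User>>;

   // Build the object -> users table from the user -> objects table.
   ObjectToUsers invert(UserToObjects const& u2o)
   {
       ObjectToUsers o2u;
       for (auto const& [user, objects] : u2o) {
           for (int obj : objects) {
               o2u[obj].insert(user);
           }
       }
       return o2u;
   }

   // An object is private to a user if that user is its only user.
   std::set<int> privateObjects(
       User const& user, UserToObjects const& u2o, ObjectToUsers const& o2u)
   {
       std::set<int> result;
       auto it = u2o.find(user);
       if (it == u2o.end()) {
           return result;
       }
       for (int obj : it->second) {
           auto users = o2u.find(obj);
           if (users != o2u.end() && users->second.size() == 1) {
               result.insert(obj);
           }
       }
       return result;
   }

In this model, an object used by two pages appears under both users in
the inverted table and is therefore excluded from the private set of
either page.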
|
|
|
|
|
|
|
|
|
|
Note that pages and thumbnails have different object user types, so the
|
|
|
|
|
above test on a page will not include objects that are referenced only by
the page's thumbnail dictionary.
|
|
|
|
|
|
|
|
|
|
.. _ref.linearization.writing:
|
|
|
|
|
|
|
|
|
|
Writing Linearized Files
------------------------
|
|
|
|
|
|
|
|
|
|
We will create files with only primary hint streams. We will never write
|
|
|
|
|
overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either,
|
|
|
|
|
and they are never necessary.) The hint streams contain offset
|
|
|
|
|
information to objects that point to where they would be if the hint
|
|
|
|
|
stream were not present. This means that we have to calculate all object
|
|
|
|
|
positions before we can generate and write the hint table. This means
|
|
|
|
|
that we have to generate the file in two passes. To make this reliable,
|
|
|
|
|
``QPDFWriter`` in linearization mode invokes exactly the same code twice
|
|
|
|
|
to write the file to a pipeline.
|
|
|
|
|
|
|
|
|
|
In the first pass, the target pipeline is a count pipeline chained to a
|
|
|
|
|
discard pipeline. The count pipeline simply passes its data through to
|
|
|
|
|
the next pipeline in the chain but can return the number of bytes passed
|
|
|
|
|
through it at any intermediate point. The discard pipeline is an end of
|
|
|
|
|
line pipeline that just throws its data away. The hint stream is not
|
|
|
|
|
written and dummy values with adequate padding are stored in the first
|
|
|
|
|
cross reference table, linearization parameter dictionary, and /Prev key
|
|
|
|
|
of the first trailer dictionary. All the offset, length, object
|
|
|
|
|
renumbering information, and anything else we need for the second pass
|
|
|
|
|
is stored.
|
|
|
|
|
|
|
|
|
|
At the end of the first pass, this information is passed to the ``QPDF``
|
|
|
|
|
class which constructs a compressed hint stream in a memory buffer and
|
|
|
|
|
returns it. ``QPDFWriter`` uses this information to write a complete
|
|
|
|
|
hint stream object into a memory buffer. At this point, the length of
|
|
|
|
|
the hint stream is known.
|
|
|
|
|
|
|
|
|
|
In the second pass, the end of the pipeline chain is a regular file
|
|
|
|
|
instead of a discard pipeline, and we have known values for all the
|
|
|
|
|
offsets and lengths that we didn't have in the first pass. We have to
|
|
|
|
|
adjust offsets that appear after the start of the hint stream by the
|
|
|
|
|
length of the hint stream, which is known. Anything that is of variable
|
|
|
|
|
length is padded, with the padding code surrounding any writing code
|
|
|
|
|
that differs in the two passes. This ensures that changes to the way
|
|
|
|
|
things are represented never result in offsets that were gathered
|
|
|
|
|
during the first pass becoming incorrect for the second pass.
|
|
|
|
|
|
|
|
|
|
Using this strategy, we can write linearized files to a non-seekable
|
|
|
|
|
output stream with only a single pass to disk or wherever the output is
|
|
|
|
|
going.
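
Two small pieces of the two-pass approach can be sketched in isolation:
fixed-width padding, so that a value occupies the same number of bytes in
both passes, and the shift applied to first-pass offsets once the hint
stream length is known. The helper names ``padded`` and
``adjustOffsets`` and the width of 10 digits are assumptions made for
this illustration, not details taken from ``QPDFWriter``. Treating an
offset equal to the hint stream's start as shifted is also an assumption.

.. code-block:: cpp

   #include <cstddef>
   #include <string>
   #include <vector>

   // Zero-pad a decimal value to a fixed width so that a dummy value
   // written in the first pass and the real value written in the
   // second pass take up the same number of bytes.
   std::string padded(std::size_t value, std::size_t width = 10)
   {
       std::string s = std::to_string(value);
       if (s.size() >= width) {
           return s;
       }
       return std::string(width - s.size(), '0') + s;
   }

   // Shift offsets gathered in the first pass (no hint stream written)
   // to where the objects land in the second pass. Offsets before the
   // hint stream are unchanged; the rest move by the hint length.
   std::vector<std::size_t> adjustOffsets(
       std::vector<std::size_t> const& first_pass,
       std::size_t hint_start,
       std::size_t hint_length)
   {
       std::vector<std::size_t> result;
       result.reserve(first_pass.size());
       for (std::size_t offset : first_pass) {
           result.push_back(
               offset >= hint_start ? offset + hint_length : offset);
       }
       return result;
   }

As noted above, the offsets stored inside the hint tables themselves are
the values the objects would have if the hint stream were absent, so
under this model they would use the unadjusted first-pass values.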
|
|
|
|
|
|
|
|
|
|
.. _ref.linearization-data:
|
|
|
|
|
|
|
|
|
|
Calculating Linearization Data
------------------------------
|
|
|
|
|
|
|
|
|
|
Once a file is optimized, we have information about which objects access
|
|
|
|
|
which other objects. We can then process these tables to decide which
|
|
|
|
|
part (as described in "Linearized PDF Document Structure" in the PDF
|
|
|
|
|
specification) each object is contained within. This tells us the exact
|
|
|
|
|
order in which objects are written. The ``QPDFWriter`` class asks for
|
|
|
|
|
this information and enqueues objects for writing in the proper order.
|
|
|
|
|
It also turns on a check that causes an exception to be thrown if an
|
|
|
|
|
object is encountered that has not already been queued. (This could
|
|
|
|
|
happen only if there were a bug in the traversal code used to calculate
|
|
|
|
|
the linearization data.)
|
|
|
|
|
|
|
|
|
|
.. _ref.linearization-issues:
|
|
|
|
|
|
|
|
|
|
Known Issues with Linearization
-------------------------------
|
|
|
|
|
|
|
|
|
|
There are a handful of known issues with this linearization code. These
|
|
|
|
|
issues do not appear to impact the behavior of linearized files which
|
|
|
|
|
still work as intended: it is possible for a web browser to begin to
|
|
|
|
|
display them before they are fully downloaded. In fact, it seems that
|
|
|
|
|
various other programs that create linearized files have many of these
|
|
|
|
|
same issues. These items make reference to terminology used in the
|
|
|
|
|
linearization appendix of the PDF specification.
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- Thread Dictionary information keys appear in part 4 with the rest of
  Threads instead of in part 9. Objects in part 9 are not grouped
  together functionally.

- We are not calculating numerators for shared object positions within
  content streams or interleaving them within content streams.

- We generate only page offset, shared object, and outline hint tables.
  It would be relatively easy to add some additional tables. We gather
  most of the information needed to create thumbnail hint tables. There
  are comments in the code about this.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
.. _ref.linearization-debugging:
|
|
|
|
|
|
|
|
|
|
Debugging Note
--------------
|
|
|
|
|
|
2021-12-12 00:01:40 +00:00
|
|
|
|
The :command:`qpdf --show-linearization` command can show
|
2021-12-11 21:53:08 +00:00
|
|
|
|
the complete contents of linearization hint streams. To look at the raw
|
|
|
|
|
data, you can extract the filtered contents of the linearization hint
|
2021-12-12 00:01:40 +00:00
|
|
|
|
tables using :command:`qpdf --show-object=n
|
|
|
|
|
--filtered-stream-data`. Then, to convert this into a bit
|
2021-12-11 21:53:08 +00:00
|
|
|
|
stream (since linearization tables are bit streams written without
|
|
|
|
|
regard to byte boundaries), you can pipe the resulting data through the
|
|
|
|
|
following perl code:
|
|
|
|
|
|
2021-12-12 19:05:59 +00:00
|
|
|
|
.. code-block:: perl

   use bytes;
   binmode STDIN;
   # Slurp all of standard input at once.
   undef $/;
   my $a = <STDIN>;
   # Split the data into individual bytes.
   my @ch = split(//, $a);
   # Print each byte as eight binary digits.
   map { printf("%08b", ord($_)) } @ch;
   print "\n";
|
|
|
|
|
|
|
|
|
|
.. _ref.object-and-xref-streams:
|
|
|
|
|
|
|
|
|
|
Object and Cross-Reference Streams
==================================
|
|
|
|
|
|
|
|
|
|
This chapter provides information about the implementation of object
|
|
|
|
|
stream and cross-reference stream support in qpdf.
|
|
|
|
|
|
|
|
|
|
.. _ref.object-streams:
|
|
|
|
|
|
|
|
|
|
Object Streams
--------------
|
|
|
|
|
|
|
|
|
|
Object streams can contain any regular object except the following:
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- stream objects
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- objects with generation > 0
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- the encryption dictionary
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- objects containing the /Length of another stream
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
In addition, Adobe Reader (at least as of version 8.0.0) appears to not
|
|
|
|
|
be able to handle having the document catalog appear in an object stream
|
|
|
|
|
if the file is encrypted, though this is not specifically disallowed by
|
|
|
|
|
the specification.
|
|
|
|
|
|
|
|
|
|
There are additional restrictions for linearized files. See
|
2021-12-12 00:31:19 +00:00
|
|
|
|
:ref:`ref.object-streams-linearization` for details.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
The PDF specification refers to objects in object streams as "compressed
|
|
|
|
|
objects" regardless of whether the object stream is compressed.
|
|
|
|
|
|
|
|
|
|
The generation number of every object in an object stream must be zero.
|
|
|
|
|
It is possible to delete and replace an object in an object stream with
|
|
|
|
|
a regular object.
|
|
|
|
|
|
|
|
|
|
The object stream dictionary has the following keys:
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``/N``: number of objects
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``/First``: byte offset of first object
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``/Extends``: indirect reference to stream that this extends
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
Stream collections are formed with ``/Extends``. They must form a
|
|
|
|
|
directed acyclic graph. These can be used for semantic information and
|
|
|
|
|
are not meaningful to the PDF document's syntactic structure. Although
|
|
|
|
|
qpdf preserves stream collections, it never generates them and doesn't
|
|
|
|
|
make use of this information in any way.
|
|
|
|
|
|
|
|
|
|
The specification recommends limiting the number of objects in an object
stream for efficiency in reading and decoding. Acrobat 6 uses no more
than 100 objects per object stream for linearized files and no more than 200
|
|
|
|
|
objects per stream for non-linearized files. ``QPDFWriter``, in object
|
|
|
|
|
stream generation mode, never puts more than 100 objects in an object
|
|
|
|
|
stream.
|
|
|
|
|
|
|
|
|
|
Object stream contents consist of *N* pairs of integers, each of which
|
|
|
|
|
is the object number and the byte offset of the object relative to the
|
|
|
|
|
first object in the stream, followed by the objects themselves,
|
|
|
|
|
concatenated.
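
The layout just described can be illustrated with a small parser for
already-decoded object stream data. This is a sketch under stated
assumptions, not qpdf's parser: ``ObjStmEntry`` and
``splitObjectStream`` are hypothetical names, the caller supplies ``/N``
and ``/First`` from the stream dictionary, and offsets are assumed to be
listed in increasing order.

.. code-block:: cpp

   #include <cstddef>
   #include <sstream>
   #include <stdexcept>
   #include <string>
   #include <utility>
   #include <vector>

   struct ObjStmEntry
   {
       int objid;        // object number from the header pair
       std::string body; // raw text of the object
   };

   // data: decoded stream contents; n: value of /N; first: value of
   // /First (byte offset of the first object within the stream data).
   std::vector<ObjStmEntry>
   splitObjectStream(std::string const& data, int n, std::size_t first)
   {
       if (first > data.size()) {
           throw std::runtime_error("/First is past the end of the data");
       }
       // The header is n pairs: object number, offset relative to /First.
       std::istringstream header(data.substr(0, first));
       std::vector<std::pair<int, std::size_t>> pairs;
       for (int i = 0; i < n; ++i) {
           int objid = 0;
           std::size_t offset = 0;
           if (!(header >> objid >> offset)) {
               throw std::runtime_error("object stream header is too short");
           }
           if (first + offset > data.size()) {
               throw std::runtime_error("object offset is out of range");
           }
           pairs.emplace_back(objid, offset);
       }
       // Each object runs from its offset to the next object's offset
       // or, for the last one, to the end of the data.
       std::vector<ObjStmEntry> result;
       for (std::size_t i = 0; i < pairs.size(); ++i) {
           std::size_t begin = first + pairs[i].second;
           std::size_t end = (i + 1 < pairs.size())
               ? first + pairs[i + 1].second
               : data.size();
           if (end < begin) {
               throw std::runtime_error("object offsets are not increasing");
           }
           result.push_back({pairs[i].first, data.substr(begin, end - begin)});
       }
       return result;
   }

For example, the data ``11 0 12 5 <<>>\n[1 2]`` with ``/N 2`` and
``/First 10`` describes object 11 at relative offset 0 and object 12 at
relative offset 5.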
|
|
|
|
|
|
|
|
|
|
.. _ref.xref-streams:
|
|
|
|
|
|
|
|
|
|
Cross-Reference Streams
-----------------------
|
|
|
|
|
|
|
|
|
|
For non-hybrid files, the value following ``startxref`` is the byte
|
|
|
|
|
offset to the xref stream rather than the word ``xref``.
|
|
|
|
|
|
|
|
|
|
For hybrid files (files containing both xref tables and cross-reference
|
|
|
|
|
streams), the xref table's trailer dictionary contains the key
|
|
|
|
|
``/XRefStm`` whose value is the byte offset to a cross-reference stream
|
|
|
|
|
that supplements the xref table. A PDF 1.5-compliant application should
|
|
|
|
|
read the xref table first. Then it should replace any object that it has
|
|
|
|
|
already seen with any defined in the xref stream. Then it should follow
|
|
|
|
|
any ``/Prev`` pointer in the original xref table's trailer dictionary.
|
|
|
|
|
The specification is not clear about what should be done, if anything,
|
|
|
|
|
with a ``/Prev`` pointer in the xref stream referenced by an xref table.
|
|
|
|
|
The ``QPDF`` class ignores it, which is probably reasonable since, if
|
|
|
|
|
this case were to appear for any sensible PDF file, the previous xref
|
|
|
|
|
table would probably have a corresponding ``/XRefStm`` pointer of its
|
|
|
|
|
own. For example, if a hybrid file were appended, the appended section
|
|
|
|
|
would have its own xref table and ``/XRefStm``. The appended xref table
|
|
|
|
|
would point to the previous xref table which would point to the
|
|
|
|
|
``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to
|
|
|
|
|
it.
|
|
|
|
|
|
|
|
|
|
Since xref streams must be read very early, they may not be encrypted,
|
|
|
|
|
and they may not contain indirect objects for keys required to read them,
|
|
|
|
|
which are these:
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- ``/Type``: value ``/XRef``

- ``/Size``: value *n+1*, where *n* is the highest object number (same as
  ``/Size`` in the trailer dictionary)

- ``/Index`` (optional): value :samp:`[{n count} ...]`, used to determine
  which objects' information is stored in this stream. The default is
  ``[0 /Size]``.

- ``/Prev``: value :samp:`{offset}`: byte offset of previous xref stream
  (same as ``/Prev`` in the trailer dictionary)

- ``/W [...]``: size in bytes of each field in an xref stream entry
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
The other fields in the xref stream, which may be indirect if desired,
|
|
|
|
|
are the union of those from the xref table's trailer dictionary.
|
|
|
|
|
|
|
|
|
|
.. _ref.xref-stream-data:
|
|
|
|
|
|
|
|
|
|
Cross-Reference Stream Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
|
|
|
|
The stream data is binary and encoded in big-endian byte order. Entries
|
|
|
|
|
are concatenated, and each entry has a length equal to the total of the
|
|
|
|
|
entries in ``/W`` above. Each entry consists of one or more fields, the
first of which is the type of the entry. The number of bytes for each
field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
is omitted and has the default value. The default value for the type
field is "``1``". All other default values are "``0``".
|
|
|
|
|
|
|
|
|
|
PDF 1.5 has three field types:
|
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- 0: for free objects. Format: ``0 obj next-generation``, same as the
|
|
|
|
|
free table in a traditional cross-reference table
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- 1: regular non-compressed object. Format: ``1 offset generation``
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- 2: for objects in object streams. Format: ``2 object-stream-number
|
|
|
|
|
index``, the number of object stream containing the object and the
|
|
|
|
|
index within the object stream of the object.
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
It seems standard to have the first entry in the table be ``0 0 0``
|
|
|
|
|
instead of ``0 0 ffff`` if there are no deleted objects.
|
|
|
|
|
|
|
|
|
|
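The entry layout described in this section can be sketched in Python as
follows. This is only an illustration under stated assumptions, not
qpdf's implementation: the function name ``parse_xref_stream`` is
hypothetical, the stream data is assumed to be already decoded
(unfiltered), and ``/Index`` is passed explicitly rather than defaulted
to ``[0 /Size]``.

.. code-block:: python

   def parse_xref_stream(data, w, index):
       """Decode xref stream data into {object number: (type, f2, f3)}.

       data  -- decoded stream bytes
       w     -- the three integers from /W
       index -- flat list of (first object, count) pairs from /Index
       """
       entry_len = sum(w)
       defaults = (1, 0, 0)  # an omitted type defaults to 1, others to 0
       result = {}
       pos = 0
       for first, count in zip(index[0::2], index[1::2]):
           for obj in range(first, first + count):
               fields = []
               offset = pos
               for width, default in zip(w, defaults):
                   if width == 0:
                       fields.append(default)
                   else:
                       chunk = data[offset:offset + width]
                       fields.append(int.from_bytes(chunk, "big"))
                   offset += width
               result[obj] = tuple(fields)
               pos += entry_len
       return result


   # /W [1 2 1] and /Index [0 3]: three entries of four bytes each.
   data = bytes([
       0, 0x00, 0x00, 0,   # object 0: free, next free 0, generation 0
       1, 0x00, 0x0f, 0,   # object 1: at byte offset 15, generation 0
       2, 0x00, 0x05, 1,   # object 2: in object stream 5, index 1
   ])
   table = parse_xref_stream(data, [1, 2, 1], [0, 3])

With ``/W [0 2 0]``, the same function reads two bytes per entry and
fills in type 1 and a third field of 0 from the defaults.
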
.. _ref.object-streams-linearization:

Implications for Linearized Files
---------------------------------

For linearized files, the linearization dictionary, document catalog,
and page objects may not be contained in object streams.

Objects stored within object streams are given the highest range of
object numbers within the main and first-page cross-reference sections.

It is okay to use cross-reference streams in place of regular xref
tables. There are no special considerations.

Hint data refers to object streams themselves, not the objects in the
streams. Shared object references should also be made to the object
streams. There are no references in any hint tables to the object numbers
of compressed objects (objects within object streams).

When numbering objects, all shared objects within both the first and
second halves of the linearized file must be numbered consecutively
after all normal uncompressed objects in that half.

.. _ref.object-stream-implementation:

Implementation Notes
--------------------

There are three modes for writing object streams:
:samp:`disable`, :samp:`preserve`, and :samp:`generate`. In disable
mode, we do not generate any object streams, and we also generate an
xref table rather than xref streams. This can be used to generate PDF
files that are viewable with older readers. In preserve mode, we write
object streams such that written object streams contain the same
objects and ``/Extends`` relationships as in the original file. This is
equivalent to disable mode if the file has no object streams. In
generate mode, we create object streams ourselves by grouping objects
that are allowed in object streams together in sets of no more than 100
objects. We also ensure that the PDF version is at least 1.5 in
generate mode, but we preserve the version header in the other modes.
The default is :samp:`preserve`.

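The three modes can be modeled roughly as follows. This Python sketch
is an assumption-laden illustration, not qpdf's code: the function
names are hypothetical, and slicing the eligible objects in order is
only one possible way to form sets of at most 100; how qpdf actually
decides which objects share a stream is not described here.

.. code-block:: python

   def plan_object_streams(mode, eligible, original_streams,
                           max_per_stream=100):
       """Return lists of object numbers, one list per stream to write.

       mode             -- "disable", "preserve", or "generate"
       eligible         -- object numbers allowed in object streams
       original_streams -- groupings found in the input file
       """
       if mode == "disable":
           return []
       if mode == "preserve":
           return [list(group) for group in original_streams]
       if mode == "generate":
           return [eligible[i:i + max_per_stream]
                   for i in range(0, len(eligible), max_per_stream)]
       raise ValueError("unknown object stream mode: " + mode)


   def output_version(mode, input_version):
       """Generate mode raises the version to at least 1.5."""
       if mode == "generate":
           return max(input_version, (1, 5))
       return input_version


   plan = plan_object_streams("generate", list(range(1, 251)), [])
   sizes = [len(group) for group in plan]

For 250 eligible objects, the sketch produces two full sets of 100 and
one set of 50. Preserve mode on an input with no object streams yields
no streams at all, matching the equivalence with disable mode noted
above.
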
We do not support creation of hybrid files. When we write files, even in
preserve mode, we will lose any xref tables and merge any appended
sections.

.. _ref.release-notes:

Release Notes
=============

For a detailed list of changes, please see the file
:file:`ChangeLog` in the source distribution.

10.5.0: XXX Month dd, YYYY
  - Library Enhancements

    - Since qpdf version 8, using object accessor methods on an
      instance of ``QPDFObjectHandle`` may create warnings if the
      object is not of the expected type. These warnings now have an
      error code of ``qpdf_e_object`` instead of
      ``qpdf_e_damaged_pdf``. Also, comments have been added to
      :file:`QPDFObjectHandle.hh` to explain in more detail what the
      behavior is. See :ref:`ref.object-accessors` for a more in-depth
      discussion.

    - Overhaul error handling for the object handle functions in the
      C API. See comments in the "Object handling" section of
      :file:`include/qpdf/qpdf-c.h` for details.
      In particular, exceptions thrown by the underlying C++ code
      when calling object accessors are caught and converted into
      errors. The errors can be trapped by registering an error
      handler with ``qpdf_register_oh_error_handler`` or will be
      written to stderr if no handler is registered.

    - Add ``qpdf_get_last_string_length`` to the C API to get the
      length of the last string that was returned. This is needed to
      handle strings that contain embedded null characters.

    - Add ``qpdf_oh_is_initialized`` and
      ``qpdf_oh_new_uninitialized`` to the C API to make it possible
      to work with uninitialized objects.

    - Add ``qpdf_oh_new_object`` to the C API. This allows you to
      clone an object handle.

    - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``,
      and ``qpdf_replace_object``, exposing the corresponding methods
      in ``QPDF`` and ``QPDFObjectHandle``.

10.4.0: November 16, 2021
  - Handling of Weak Cryptography Algorithms

    - From the qpdf CLI, the :samp:`--allow-weak-crypto` option is now
      required to suppress a warning when explicitly creating PDF files
      using RC4 encryption. While qpdf will always retain the ability
      to read and write such files, doing so will require explicit
      acknowledgment moving forward. For qpdf 10.4, this change only
      affects the command-line tool. Starting in qpdf 11, there will
      be small API changes to require explicit acknowledgment in
      those cases as well. For additional information, see
      :ref:`ref.weak-crypto`.

  - Bug Fixes

    - Fix potential bounds error when handling shell completion that
      could occur when given bogus input.

    - Properly handle overlay/underlay on completely empty pages
      (with no resource dictionary).

    - Fix crash that could occur under certain conditions when using
      :samp:`--pages` with files that had form fields.

  - Library Enhancements

    - Make ``QPDF::findPage`` functions public.

    - Add methods to ``Pl_Flate`` to be able to receive warnings on
      certain recoverable conditions.

    - Add an extra check to the library to detect when foreign
      objects are inserted directly (instead of using
      ``QPDF::copyForeignObject``) at the time of insertion rather
      than when the file is written. Catching the error sooner makes
      it much easier to locate the incorrect code.

  - CLI Enhancements

    - Improve diagnostics around parsing :samp:`--pages` command-line
      options

  - Packaging Changes

    - The Windows binary distribution is now built with crypto
      provided by OpenSSL 3.0.

10.3.2: May 8, 2021
  - Bug Fixes

    - When generating a file while preserving object streams,
      unreferenced objects are correctly removed unless
      :samp:`--preserve-unreferenced` is specified.

  - Library Enhancements

    - When adding a page that already exists, make a shallow copy
      instead of throwing an exception. This makes the library
      behavior consistent with the CLI behavior. See
      :file:`ChangeLog` for additional notes.

10.3.1: March 11, 2021
  - Bug Fixes

    - Form field copying failed on files where ``/DR`` was a direct
      object in the document-level form dictionary.

10.3.0: March 4, 2021
  - Bug Fixes

    - The code for handling form fields when copying pages from
      10.2.0 was not quite right and didn't work in a number of
      situations, such as when the same page was copied multiple
      times or when there were conflicting resource or field names
      across multiple copies. The 10.3.0 code has been much more
      thoroughly tested with more complex cases and with a multitude
      of readers and should be much closer to correct. The 10.2.0
      code worked well enough for page splitting or for copying pages
      with form fields into documents that didn't already have them
      but was still not quite correct in handling of field-level
      resources.

    - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is
      called, existing ``QPDFObjectHandle`` instances no longer point
      to the old objects. The next time they are accessed, they
      automatically notice the change to the underlying object and
      update themselves. This resolves a very longstanding source of
      confusion, albeit in a very rarely used method call.

    - Fix form field handling code to look for default appearances,
      quadding, and default resources in the right places. The code
      was not looking for things in the document-level interactive
      form dictionary that it was supposed to be finding there. This
      required adding a few new methods to
      ``QPDFFormFieldObjectHelper``.

  - Library Enhancements

    - Reworked the code that handles copying annotations and form
      fields during page operations. There were additional methods
      added to the public API from 10.2.0 and one deprecation of a
      method added in 10.2.0. The majority of the API changes are in
      methods most people would never call and that will hopefully be
      superseded by higher-level interfaces for handling page copies.
      Please see the :file:`ChangeLog` file for details.

    - The method ``QPDF::numWarnings`` was added so that you can tell
      whether any warnings happened during a specific block of code.

10.2.0: February 23, 2021
  - CLI Behavior Changes

    - Operations that work on combining pages are much better about
      protecting form fields. In particular, :samp:`--split-pages` and
      :samp:`--pages` now preserve interactive form functionality by
      copying the relevant form field information from the original
      files. Additionally, if you use :samp:`--pages` to select only
      some pages from the original input file, unused form fields are
      removed, which prevents lots of unused annotations from being
      retained.

    - By default, :command:`qpdf` no longer allows creation of
      encrypted PDF files whose user password is non-empty and owner
      password is empty when a 256-bit key is in use. The
      :samp:`--allow-insecure` option, specified inside the
      :samp:`--encrypt` options, allows creation of such files.
      Behavior changes in the CLI are avoided when possible, but an
      exception was made here because this is security-related. qpdf
      must always allow creation of weird files for testing purposes,
      but it should not default to letting users unknowingly create
      insecure files.

  - Library Behavior Changes

    - Note: the changes in this section cause differences in output
      in some cases. These differences change the syntax of the PDF
      but do not change the semantics (meaning). I make a strong
      effort to avoid gratuitous changes in qpdf's output so that
      qpdf changes don't break people's tests. In this case, the
      changes significantly improve the readability of the generated
      PDF and don't affect any output that's generated by simple
      transformation. If you are annoyed by having to update test
      files, please rest assured that changes like this have been and
      will continue to be rare events.

    - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of
      ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all
      the characters in the string. This reduces needless encoding in
      UTF-16 of strings that can be encoded in ASCII. This change may
      cause qpdf to generate different output than before when form
      field values are set using ``QPDFFormFieldObjectHelper`` but
      does not change the meaning of the output.

    - The code that places form XObjects and also the code that
      flattens rotations trim trailing zeroes from real numbers that
      they calculate. This causes slight (but semantically
      equivalent) differences in generated appearance streams and
      form XObject invocations in overlay/underlay code or in user
      code that calls the methods that place form XObjects on a page.

  - CLI Enhancements

    - Add new command line options for listing, saving, adding,
      removing, and copying file attachments. See
      :ref:`ref.attachments` for details.

    - Page splitting and merging operations, as well as
      :samp:`--flatten-rotation`, are better behaved
      with respect to annotations and interactive form fields. In
      most cases, interactive form field functionality and proper
      formatting and functionality of annotations are preserved by
      these operations. There are still some cases that aren't
      perfect, such as when functionality of annotations depends on
      document-level data that qpdf doesn't yet understand or when
      there are problems with referential integrity among form fields
      and annotations (e.g., when a single form field object or its
      associated annotations are shared across multiple pages, a case
      that is out of spec but that works in most viewers anyway).

    - The option :samp:`--password-file={filename}`
      can now be used to read the decryption password from a file.
      You can use ``-`` as the file name to read the password from
      standard input. This is an easier/more obvious way to read
      passwords from files or standard input than using
      :samp:`@file` for this purpose.

    - Add some information about attachments to the json output, and
      add ``attachments`` as an additional json key. The
      information included here is limited to the preferred name and
      content stream and a reference to the file spec object. This is
      enough detail for clients to avoid the hassle of navigating a
      name tree and provides what is needed for basic enumeration and
      extraction of attachments. More detailed information can be
      obtained by following the reference to the file spec object.

    - Add numeric option to :samp:`--collate`. If
      :samp:`--collate={n}` is given, take pages in groups of
      :samp:`{n}` from the given files.

    - It is now valid to provide :samp:`--rotate=0`
      to clear rotation from a page.

  - Library Enhancements

    - This release includes numerous additions to the API. Not all
      changes are listed here. Please see the
      :file:`ChangeLog` file in the source
      distribution for a comprehensive list. Highlights appear below.

    - Add ``QPDFObjectHandle::ditems()`` and
      ``QPDFObjectHandle::aitems()`` that enable C++-style iteration,
      including range-for iteration, over dictionary and array
      QPDFObjectHandles. See comments in
      :file:`include/qpdf/QPDFObjectHandle.hh` and
      :file:`examples/pdf-name-number-tree.cc` for details.

    - Add ``QPDFObjectHandle::copyStream`` for making a copy of a
      stream within the same ``QPDF`` instance.

    - Add new helper classes for supporting file attachments, also
      known as embedded files. New classes are
      ``QPDFEmbeddedFileDocumentHelper``,
      ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``.
      See their respective headers for details and
      :file:`examples/pdf-attach-file.cc` for an example.

    - Add a version of ``QPDFObjectHandle::parse`` that takes a
      ``QPDF`` pointer as context so that it can parse strings
      containing indirect object references. This is illustrated in
      :file:`examples/pdf-attach-file.cc`.

    - Re-implement ``QPDFNameTreeObjectHelper`` and
      ``QPDFNumberTreeObjectHelper`` to be more efficient, add an
      iterator-based API, give them the capability to repair broken
      trees, and create methods for modifying the trees. With this
      change, qpdf has a robust read/write implementation of name and
      number trees.

    - Add new versions of ``QPDFObjectHandle::replaceStreamData``
      that take ``std::function`` objects for cases when you need
      something between a static string and a full-fledged
      StreamDataProvider. Using this with ``QUtil::file_provider`` is
      a very easy way to create a stream from the contents of a file.

    - The ``QPDFMatrix`` class, formerly a private, internal class,
      has been added to the public API. See
      :file:`include/qpdf/QPDFMatrix.hh` for
      details. This class is for working with transformation
      matrices. Some methods in ``QPDFPageObjectHelper`` make use of
      this to make information about transformation matrices
      available. For an example, see
      :file:`examples/pdf-overlay-page.cc`.

    - Several new methods were added to
      ``QPDFAcroFormDocumentHelper`` for adding, removing, getting
      information about, and enumerating form fields.

    - Add method
      ``QPDFAcroFormDocumentHelper::transformAnnotations``, which
      applies a transformation to each annotation on a page.

    - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies
      annotations and, if applicable, associated form fields, from
      one page to another, possibly transforming the rectangles.

  - Build Changes

    - A C++-14 compiler is now required to build qpdf. There is no
      intention to require anything newer than that for a while.
      C++-14 includes modest enhancements to C++-11 and appears to be
      supported about as widely as C++-11.

  - Bug Fixes

    - The :samp:`--flatten-rotation` option applies
      transformations to any annotations that may be on the page.

    - If a form XObject lacks a resources dictionary, consider any
      names in that form XObject to be referenced from the containing
      page. This is compliant with older PDF versions. Also detect if
      any form XObjects have any unresolved names and, if so, don't
      remove unreferenced resources from them or from the page that
      contains them. Unfortunately this has the side effect of
      preventing removal of unreferenced resources in some cases
      where names appear that don't refer to resources, such as with
      tagged PDF. This is a bit of a corner case that is not likely
      to cause a significant problem in practice, but the only side
      effect would be lack of removal of shared resources. A future
      version of qpdf may be more sophisticated in its detection of
      names that refer to resources.

    - Properly handle strings if they appear in inline image
      dictionaries while externalizing inline images.

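The numeric form of :samp:`--collate` listed under the 10.2.0 CLI
enhancements can be modeled with a short Python sketch. This is an
illustration only: the function name ``collate`` is hypothetical, pages
are represented as plain strings, and the handling of files with
different page counts (shorter files simply run out) is an assumption
rather than something stated in these notes.

.. code-block:: python

   def collate(files, n=1):
       """Interleave pages, taking n at a time from each file in turn.

       files -- list of page lists, one per input file
       """
       result = []
       position = 0
       longest = max((len(pages) for pages in files), default=0)
       while position < longest:
           for pages in files:
               # Slicing past the end yields fewer (or no) pages.
               result.extend(pages[position:position + n])
           position += n
       return result


   a = ["a1", "a2", "a3", "a4"]
   b = ["b1", "b2", "b3", "b4"]
   one_at_a_time = collate([a, b])
   two_at_a_time = collate([a, b], 2)

In this model, taking one page at a time alternates ``a1 b1 a2 b2 ...``,
while groups of two give ``a1 a2 b1 b2 a3 a4 b3 b4``.
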
10.1.0: January 5, 2021
  - CLI Enhancements

    - Add :samp:`--flatten-rotation` command-line
      option, which causes all pages that are rotated using
      parameters in the page's dictionary to instead be identically
      rotated in the page's contents. The change is not user-visible
      for compliant PDF readers but can be used to work around broken
      PDF applications that don't properly handle page rotation.

  - Library Enhancements

    - Support for user-provided (pluggable, modular) stream filters.
      It is now possible to derive a class from ``QPDFStreamFilter``
      and register it with ``QPDF`` so that regular library methods,
      including those used by ``QPDFWriter``, can decode streams with
      filters not directly supported by the library. The example
      :file:`examples/pdf-custom-filter.cc`
      illustrates how to use this capability.

    - Add methods to ``QPDFPageObjectHelper`` to iterate through
      XObjects on a page or form XObjects, possibly recursing into
      nested form XObjects: ``forEachXObject``, ``forEachImage``,
      ``forEachFormXObject``.

    - Enhance several methods in ``QPDFPageObjectHelper`` to work
      with form XObjects as well as pages, as noted in comments. See
      :file:`ChangeLog` for a full list.

    - Rename some functions in ``QPDFPageObjectHelper``, while
      keeping old names for compatibility:

      - ``getPageImages`` to ``getImages``

      - ``filterPageContents`` to ``filterContents``

      - ``pipePageContents`` to ``pipeContents``

      - ``parsePageContents`` to ``parseContents``

    - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return
      a map of form XObjects directly on a page or form XObject

    - Add new helper methods to ``QPDFObjectHandle``:
      ``isFormXObject``, ``isImage``

    - Add the optional ``allow_streams`` parameter to
      ``QPDFObjectHandle::makeDirect``. When
      ``QPDFObjectHandle::makeDirect`` is called in this way, it
      preserves references to streams rather than throwing an
      exception.

    - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this
      on a stream prevents ``QPDFWriter`` from attempting to
      uncompress, recompress, or otherwise filter a stream even if it
      could. Developers can use this to protect streams that are
      already optimized or that should be protected from
      ``QPDFWriter``'s default behavior for any other reason.

    - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is
      useful to have for debugging.

    - Add method ``QPDFPageObjectHelper::flattenRotation``, which
      replaces a page's ``/Rotate`` keyword by rotating the page
      within the content stream and altering the page's bounding
      boxes so the rendering is the same. This can be used to work
      around buggy PDF readers that can't properly handle page
      rotation.

  - C API Enhancements

    - Add several new functions to the C API for working with
      objects. These are wrappers around many of the methods in
      ``QPDFObjectHandle``. Their inclusion adds considerable new
      capability to the C API.

    - Add ``qpdf_register_progress_reporter`` to the C API,
      corresponding to ``QPDFWriter::registerProgressReporter``.

  - Performance Enhancements

    - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object
      for writing, resulting in about an 8% improvement in write
      performance while allowing indirect objects to appear in
      ``/DecodeParms``.

    - When extracting pages, the :command:`qpdf` CLI
      only removes unreferenced resources from the pages that are
      being kept, resulting in a significant performance improvement
      when extracting small numbers of pages from large, complex
      documents.

  - Bug Fixes

    - ``QPDFPageObjectHelper::externalizeInlineImages`` was not
      externalizing images referenced from form XObjects that
      appeared on the page.

    - ``QPDFObjectHandle::filterPageContents`` was broken for pages
      with multiple content streams.

    - Tweak zsh completion code to behave a little better with
      respect to path completion.

10.0.4: November 21, 2020
  - Bug Fixes

    - Fix a handful of integer overflows. This includes cases found
      by fuzzing as well as having qpdf not do range checking on
      unused values in the xref stream.

10.0.3: October 31, 2020
  - Bug Fixes

    - The fix to the bug involving copying streams with indirect
      filters was incorrect and introduced a new, more serious bug.
      The original bug has been fixed correctly, as has the bug
      introduced in 10.0.2.

10.0.2: October 27, 2020
|
2021-12-11 23:49:31 +00:00
|
|
|
|
- Bug Fixes
|
|
|
|
|
|
|
|
|
|
- When concatenating content streams, as with
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--coalesce-contents`, there were cases
|
2021-12-11 23:49:31 +00:00
|
|
|
|
in which qpdf would merge two lexical tokens together, creating
|
|
|
|
|
invalid results. A newline is now inserted between merged
|
|
|
|
|
content streams if one is not already present.
|
|
|
|
|
|
|
|
|
|
- Fix an internal error that could occur when copying foreign
|
|
|
|
|
streams whose stream data had been replaced using a stream data
|
|
|
|
|
provider if those streams had indirect filters or decode
|
|
|
|
|
parameters. This is a rare corner case.
|
|
|
|
|
|
|
|
|
|
- Ensure that the caller's locale settings do not change the
|
|
|
|
|
results of numeric conversions performed internally by the qpdf
|
|
|
|
|
library. Note that the problem here could only be caused when
|
|
|
|
|
the qpdf library was used programmatically. Using the qpdf CLI
|
|
|
|
|
already ignored the user's locale for numeric conversion.
|
|
|
|
|
|
|
|
|
|
- Fix several instances in which warnings were not suppressed in
|
2021-12-12 00:11:56 +00:00
|
|
|
|
spite of :samp:`--no-warn` and/or errors or
|
2021-12-11 23:49:31 +00:00
|
|
|
|
warnings were written to standard output rather than standard
|
|
|
|
|
error.
|
|
|
|
|
|
|
|
|
|
- Fixed a memory leak that could occur under specific
|
|
|
|
|
circumstances when
|
2021-12-12 00:11:56 +00:00
|
|
|
|
:samp:`--object-streams=generate` was used.
|
2021-12-11 23:49:31 +00:00
|
|
|
|
|
|
|
|
|
- Fix various integer overflows and similar conditions found by
|
|
|
|
|
the OSS-Fuzz project.
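
The locale item in the 10.0.2 bug fixes matters to library callers whose
process locale uses, for example, a comma as the decimal separator. The
following is a minimal sketch of the general technique, not qpdf's actual
code: ``format_number`` and ``parse_number`` are hypothetical helpers that
pin their streams to the classic "C" locale.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper, not part of qpdf: format a double the same way
// regardless of the global locale chosen by the calling application.
std::string format_number(double value)
{
    std::ostringstream out;
    // Pin this stream to the classic "C" locale so the decimal
    // separator is always '.' and no digit grouping is applied.
    out.imbue(std::locale::classic());
    out << value;
    return out.str();
}

// Hypothetical helper: parse a number that uses '.' as the decimal
// separator, again independent of the global locale.
double parse_number(std::string const& text)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    double value = 0.0;
    in >> value;
    return value;
}
```

Imbuing each stream keeps the fix local to the library and leaves the
caller's global locale untouched.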

  - Enhancements

    - New option :samp:`--warning-exit-0` causes qpdf
      to exit with a status of ``0`` rather than ``3`` if there are
      warnings but no errors. Combine with
      :samp:`--no-warn` to completely ignore
      warnings.

    - Performance improvements have been made to
      ``QPDF::processMemoryFile``.

    - The OpenSSL crypto provider produces more detailed error
      messages.

  - Build Changes

    - The option :samp:`--disable-rpath` is now
      supported by qpdf's :command:`./configure`
      script. Some distributions' packaging standards recommend the
      use of this option.

    - Selection of a printf format string for ``long long`` has
      been moved from ``ifdefs`` to an autoconf
      test. If you are using your own build system, you will need to
      provide a value for ``LL_FMT`` in
      :file:`libqpdf/qpdf/qpdf-config.h`, which
      would typically be ``"%lld"`` or, for some Windows compilers,
      ``"%I64d"``.

    - Several improvements were made to build-time configuration of
      the OpenSSL crypto provider.

    - A nearly stand-alone Linux binary zip file is now included with
      the qpdf release. This is built on an older (but supported)
      Ubuntu LTS release and should work on most reasonably recent
      Linux distributions. It contains only the executables and
      required shared libraries that would not be present on a
      minimal system. It can be used for including qpdf in a minimal
      environment, such as a Docker container. The zip file is also
      known to work as a layer in AWS Lambda.

    - QPDF's automated build has been migrated from Azure Pipelines
      to GitHub Actions.
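
For readers supplying ``LL_FMT`` from their own build system, the sketch
below shows how such a macro is typically consumed. It is illustrative
only: the ``#ifndef`` fallback stands in for the value that would normally
come from :file:`qpdf-config.h`, and ``format_long_long`` is a hypothetical
helper, not a qpdf function.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in for the value a custom build system would define.
// "%lld" is the typical choice; some Windows compilers need "%I64d".
#ifndef LL_FMT
# define LL_FMT "%lld"
#endif

// Hypothetical helper: format a long long with the configured format.
std::string format_long_long(long long value)
{
    char buf[32]; // large enough for any 64-bit value plus sign and NUL
    std::snprintf(buf, sizeof(buf), LL_FMT, value);
    return std::string(buf);
}
```

Defining the format once in a configuration header keeps the
platform-specific choice out of every call site.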

  - Windows-specific Changes

    - The Windows executables distributed with qpdf releases now use
      the OpenSSL crypto provider by default. The native crypto
      provider is also compiled in and can be selected at runtime
      with the ``QPDF_CRYPTO_PROVIDER`` environment variable.

    - Improvements have been made to how a cryptographic provider is
      obtained in the native Windows crypto implementation. However,
      this is mostly shadowed by OpenSSL being used by default.

10.0.1: April 9, 2020
  - Bug Fixes

    - 10.0.0 introduced a bug in which calling
      ``QPDFObjectHandle::getStreamData`` on a stream that can't be
      filtered was returning the raw data instead of throwing an
      exception. This is now fixed.

    - Fix a bug that was preventing qpdf from linking with some
      versions of clang on some platforms.

  - Enhancements

    - Improve the :file:`pdf-invert-images`
      example to avoid having to load all the images into RAM at the
      same time.

10.0.0: April 6, 2020
  - Performance Enhancements

    - The qpdf library and executable should run much faster in this
      version than in the last several releases. Several internal
      library optimizations have been made, and there has been
      improved behavior on page splitting as well. This version of
      qpdf should outperform any of the 8.x or 9.x versions.

  - Incompatible API (source-level) Changes (minor)

    - The ``QUtil::srandom`` method was removed. It didn't do
      anything unless insecure random numbers were compiled in, and
      they have been off by default for a long time. If you were
      calling it, just remove the call since it wasn't doing anything
      anyway.

  - Build/Packaging Changes

    - Add an ``openssl`` crypto provider, which is implemented with
      OpenSSL and also works with BoringSSL. Thanks to Dean Scarff
      for this contribution. If you maintain qpdf for a distribution,
      pay special attention to make sure that you are including
      support for the crypto providers you want. Package maintainers
      will have to weigh the advantages of allowing users to pick a
      crypto provider at runtime against the disadvantages of adding
      more dependencies to qpdf.

    - Allow qpdf to be built on stripped down systems whose C/C++
      libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in
      qpdf's README.md for details. This should be very rare, but it
      is known to be helpful in some embedded environments.

  - CLI Enhancements

    - Add ``objectinfo`` key to the JSON output. This will be a place
      to put computed metadata or other information about PDF objects
      that is not immediately evident in other ways or that seems
      useful for some other reason. In this version, information is
      provided about each object indicating whether it is a stream
      and, if so, what its length and filters are. Without this, it
      was not possible to tell conclusively from the JSON output
      alone whether or not an object was a stream. Run
      :command:`qpdf --json-help` for details.

    - Add new option
      :samp:`--remove-unreferenced-resources` which
      takes ``auto``, ``yes``, or ``no`` as arguments. The new
      ``auto`` mode, which is the default, performs a fast heuristic
      over a PDF file when splitting pages to determine whether the
      expensive process of finding and removing unreferenced
      resources is likely to be of benefit. For most files, this new
      default will result in a significant performance improvement
      for splitting pages. See :ref:`ref.advanced-transformation` for a more detailed
      discussion.

    - The :samp:`--preserve-unreferenced-resources` option
      is now just a synonym for
      :samp:`--remove-unreferenced-resources=no`.

    - If the ``QPDF_EXECUTABLE`` environment variable is set when
      invoking :command:`qpdf --completion-bash` or
      :command:`qpdf --completion-zsh`, the completion
      command that it outputs will refer to qpdf using the value of
      that variable rather than what :command:`qpdf`
      determines its executable path to be. This can be useful when
      wrapping :command:`qpdf` with a script, working
      with a version in the source tree, using an AppImage, or in other
      situations where there is some indirection.
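
The ``QPDF_EXECUTABLE`` behavior described in the 10.0.0 CLI enhancements
is an instance of a common "environment override, else detected value"
pattern. The sketch below illustrates that pattern only; it is not qpdf's
implementation, the helper names are hypothetical, and treating an empty
variable as unset is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: pick the program name to embed in a generated
// completion command. A non-empty override wins; otherwise use the
// path the program detected for itself.
std::string choose_program_name(
    char const* override_path, std::string const& detected_path)
{
    if (override_path != nullptr && *override_path != '\0') {
        return override_path;
    }
    return detected_path;
}

// Thin wrapper that reads the override from the environment.
std::string completion_program_name(std::string const& detected_path)
{
    return choose_program_name(
        std::getenv("QPDF_EXECUTABLE"), detected_path);
}
```

Keeping the decision in a function that takes the override as a parameter
makes it testable without modifying the process environment.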

  - Library Enhancements

    - Random number generation is now delegated to the crypto
      provider. The old behavior is still used by the native crypto
      provider. It is still possible to provide your own random
      number generator.

    - Add a new version of
      ``QPDFObjectHandle::StreamDataProvider::provideStreamData``
      that accepts the ``suppress_warnings`` and ``will_retry``
      options and allows a success code to be returned. This makes it
      possible to implement a ``StreamDataProvider`` that calls
      ``pipeStreamData`` on another stream and to pass the response
      back to the caller, which enables better error handling on
      those proxied streams.

    - Update ``QPDFObjectHandle::pipeStreamData`` to return an
      overall success code that goes beyond whether or not filtered
      data was written successfully. This allows better error
      handling of cases that were not filtering errors. You have to
      call this explicitly. Methods in previously existing APIs have
      the same semantics as before.

    - The ``QPDFPageObjectHelper::placeFormXObject`` method now
      allows separate control over whether it should be willing to
      shrink or expand objects to fit them better into the
      destination rectangle. The previous behavior was that shrinking
      was allowed but expansion was not. The previous behavior is
      still the default.

    - When calling the C API, any non-zero value passed to a boolean
      parameter is treated as ``TRUE``. Previously only the value
      ``1`` was accepted. This makes the C API behave more like most
      C interfaces and is known to improve compatibility with some
      Windows environments that dynamically load the DLL and call
      functions from it.

    - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only
      top-level dictionary keys or array items. This is unsafe
      because it creates a situation in which changing a lower-level
      item in one object may also change it in another object, but
      for cases in which you *know* you are only inserting or
      replacing top-level items, it is much faster than
      ``QPDFObjectHandle::shallowCopy``.

    - Add ``QPDFObjectHandle::filterAsContents``, which filters a
      stream's data as a content stream. This is useful for parsing
      the contents for form XObjects in the same way as parsing page
      content streams.
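
The hazard described for ``unsafeShallowCopy`` in the 10.0.0 library
enhancements can be shown with a toy data structure. This sketch does not
use qpdf's object model: ``Dict``, ``shallow_copy``, and
``nested_change_leaks`` are hypothetical names, and the shared pointers
merely stand in for lower-level items that remain shared after a
top-level copy.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a PDF dictionary. Values are shared pointers, so
// copying the map copies only the top-level keys.
struct Dict
{
    std::map<std::string, std::shared_ptr<Dict>> children;
    int value = 0;
};

// Copy top-level keys only; nested objects stay shared with the source.
Dict shallow_copy(Dict const& source)
{
    return source; // copies the map of pointers, not the pointees
}

// Returns true when a lower-level change made through the copy shows
// up in the original while a top-level insertion does not.
bool nested_change_leaks()
{
    Dict original;
    original.children["/Resources"] = std::make_shared<Dict>();

    Dict copy = shallow_copy(original);
    copy.children["/Resources"]->value = 7;             // shared: leaks
    copy.children["/Extra"] = std::make_shared<Dict>(); // private to copy

    return original.children["/Resources"]->value == 7 &&
        original.children.count("/Extra") == 0;
}
```

This is why the release note restricts the fast copy to callers that only
insert or replace top-level items.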

  - Bug Fixes

    - When detecting and removing unreferenced resources during page
      splitting, traverse into form XObjects and handle their
      resources dictionaries as well.

    - The same error recovery is applied to streams in other than the
      primary input file when merging or splitting pages.

9.1.1: January 26, 2020
  - Build/Packaging Changes

    - The fix-qdf program was converted from perl to C++. As such,
      qpdf no longer has a runtime dependency on perl.

  - Library Enhancements

    - Added new helper routine ``QUtil::call_main_from_wmain`` which
      converts ``wchar_t`` arguments to UTF-8 encoded strings. This
      is useful for qpdf because library methods expect file names to
      be UTF-8 encoded, even on Windows.

    - Added new ``QUtil::read_lines_from_file`` methods that take
      ``FILE*`` arguments and that allow preservation of end-of-line
      characters. This also fixes a bug where
      ``QUtil::read_lines_from_file`` wouldn't work properly with
      Unicode filenames.
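
To make "preservation of end-of-line characters" concrete, here is a
hedged sketch of a line reader with an optional keep-EOL mode. It is not
qpdf's implementation: it reads from a ``std::istream`` rather than a
``FILE*`` so that it stays self-contained, ``read_lines`` is a
hypothetical name, and stripping a ``\r`` before ``\n`` in the
non-preserving mode is an assumption.

```cpp
#include <cassert>
#include <istream>
#include <list>
#include <sstream>
#include <string>

// Hypothetical sketch: split a stream into lines, optionally keeping
// each line's terminating "\n" or "\r\n".
std::list<std::string> read_lines(std::istream& in, bool preserve_eol)
{
    std::list<std::string> lines;
    std::string current;
    char c;
    while (in.get(c)) {
        current += c;
        if (c == '\n') {
            if (!preserve_eol) {
                // Drop "\n" and a preceding "\r", if any.
                current.erase(current.size() - 1);
                if (!current.empty() && current.back() == '\r') {
                    current.erase(current.size() - 1);
                }
            }
            lines.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        lines.push_back(current); // final line without a terminator
    }
    return lines;
}
```

Preserving terminators lets a caller write the lines back out and
reproduce the input byte for byte, including mixed line endings.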

  - CLI Enhancements

    - Added options :samp:`--is-encrypted` and
      :samp:`--requires-password` for testing whether
      a file is encrypted or requires a password other than the
      supplied (or empty) password. These communicate via exit
      status, making them useful for shell scripts. They also work on
      encrypted files with unknown passwords.

    - Added ``encrypt`` key to JSON options. With the exception of
      the reconstructed user password for older encryption formats,
      this provides the same information as
      :samp:`--show-encryption` but in a consistent,
      parseable format. See output of
      :command:`qpdf --json-help` for details.

  - Bug Fixes

    - In QDF mode, be sure not to write more than one XRef stream to
      a file, even when
      :samp:`--preserve-unreferenced` is used.
      :command:`fix-qdf` assumes that there is only
      one XRef stream, and that it appears at the end of the file.

    - When externalizing inline images, properly handle images whose
      color space is a reference to an object in the page's resource
      dictionary.

    - Windows-specific fix for acquiring crypt context with a new
      keyset.

9.1.0: November 17, 2019
  - Build Changes

    - A C++11 compiler is now required to build qpdf.

    - A new crypto provider that uses gnutls for crypto functions is
      now available and can be enabled at build time. See :ref:`ref.crypto` for more information about crypto
      providers and :ref:`ref.crypto.build` for specific information about
      the build.

  - Library Enhancements

    - Incorporate contribution from Masamichi Hosoda to properly
      handle signature dictionaries by not including them in object
      streams, formatting the ``/Contents`` key as a hexadecimal
      string, and excluding the ``/Contents`` key from encryption and
      decryption.

    - Incorporate contribution from Masamichi Hosoda to provide new
      API calls for getting file-level information about input and
      output files, enabling certain operations on the files at the
      file level rather than the object level. New methods include
      ``QPDF::getXRefTable()``,
      ``QPDFObjectHandle::getParsedOffset()``,
      ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and
      ``QPDFWriter::getWrittenXRefTable()``.

    - Support build-time and runtime selectable crypto providers.
      This includes the addition of new classes
      ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the
      recognition of the ``QPDF_CRYPTO_PROVIDER`` environment
      variable. Crypto providers are described in depth in :ref:`ref.crypto`.

  - CLI Enhancements

    - Addition of the :samp:`--show-crypto` option in
      support of selectable crypto providers, as described in :ref:`ref.crypto`.

    - Allow ``:even`` or ``:odd`` to be appended to numeric ranges
      for specification of the even or odd pages from among the pages
      specified in the range.

    - Fix shell wildcard expansion behavior (``*`` and ``?``) of the
      :command:`qpdf.exe` as built by MSVC.
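
The ``:even`` and ``:odd`` modifiers from the 9.1.0 CLI enhancements
select "from among the pages specified in the range". The sketch below
illustrates one reading of that rule and is not qpdf's parser:
``filter_parity`` is a hypothetical helper, and it assumes parity is
counted by position within the selected range (starting at 1) rather than
by the original page number.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: given the pages already selected by a numeric
// range, in order, keep those in odd or even positions.
std::vector<int> filter_parity(
    std::vector<int> const& pages, std::string const& modifier)
{
    if (modifier.empty()) {
        return pages; // no modifier: keep the whole range
    }
    if (modifier != ":odd" && modifier != ":even") {
        throw std::invalid_argument("unknown range modifier");
    }
    bool want_odd = (modifier == ":odd");
    std::vector<int> result;
    for (std::size_t i = 0; i < pages.size(); ++i) {
        // Positions are 1-based, so index 0 is the first (odd) position.
        bool position_is_odd = ((i + 1) % 2 == 1);
        if (position_is_odd == want_odd) {
            result.push_back(pages[i]);
        }
    }
    return result;
}
```

Under this assumption, a range that selects pages 2 through 6 yields
pages 2, 4, and 6 with ``:odd`` and pages 3 and 5 with ``:even``.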

9.0.2: October 12, 2019
  - Bug Fix

    - Fix the name of the temporary file used by
      :samp:`--replace-input` so that it doesn't
      require path splitting and works with paths that include
      directories.

9.0.1: September 20, 2019
  - Bug Fixes/Enhancements

    - Fix some build and test issues on big-endian systems and
      compilers with characters that are unsigned by default. The
      problems were in build and test only. There were no actual bugs
      in the qpdf library itself relating to endianness or unsigned
      characters.

    - When a dictionary has a duplicated key, report this with a
      warning. The behavior of the library in this case is unchanged,
      but the error condition is no longer silently ignored.

    - When a form field's display rectangle is erroneously specified
      with inverted coordinates, detect and correct this situation.
      This prevents some form fields from being flipped when flattening
      annotations on files with this condition.
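
Correcting a rectangle with inverted coordinates, as described in the
last 9.0.1 item, amounts to putting each pair of coordinates in ascending
order. A minimal sketch, assuming a plain four-number rectangle; ``Rect``
and ``normalize`` are hypothetical names and not qpdf types.

```cpp
#include <cassert>
#include <utility>

// Toy rectangle in PDF order: lower-left x/y, then upper-right x/y.
struct Rect
{
    double llx;
    double lly;
    double urx;
    double ury;
};

// Hypothetical sketch: swap any coordinate pair given in inverted order
// so that (llx, lly) really is the lower-left corner.
Rect normalize(Rect r)
{
    if (r.llx > r.urx) {
        std::swap(r.llx, r.urx);
    }
    if (r.lly > r.ury) {
        std::swap(r.lly, r.ury);
    }
    return r;
}
```

A rectangle that is already well formed passes through unchanged, so the
correction is safe to apply unconditionally.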

9.0.0: August 31, 2019
  - Incompatible API (source-level) Changes (minor)

    - The method ``QUtil::strcasecmp`` has been renamed to
      ``QUtil::str_compare_nocase``. This incompatible change is
      necessary to enable qpdf to build on platforms that define
      ``strcasecmp`` as a macro.

    - The ``QPDF::copyForeignObject`` method had an overloaded
      version that took a boolean parameter that was not used. If you
      were using this version, just omit the extra parameter.

    - There was a version of ``QPDFTokenizer::expectInlineImage`` that
      took no arguments. This version has been removed since it
      caused the tokenizer to return incorrect inline images. A new
      version was added some time ago that produces correct output.
      This is a very low level method that doesn't make sense to call
      outside of qpdf's lexical engine. There are higher level
      methods for tokenizing content streams.

    - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and
      ``QPDFOutlineObjectHelper::getKids`` to return a
      ``std::vector`` instead of a ``std::list`` of
      ``QPDFOutlineObjectHelper`` objects.

    - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This
      function would allow creation of name tokens whose value would
      change when unparsed, which is never the correct behavior.

  - CLI Enhancements

    - The :samp:`--replace-input` option may be given
      in place of an output file name. This causes qpdf to overwrite
      the input file with the output. See the description of
      :samp:`--replace-input` in :ref:`ref.basic-options` for more details.

    - The :samp:`--recompress-flate` option instructs
      :command:`qpdf` to recompress streams that are
      already compressed with ``/FlateDecode``. Useful with
      :samp:`--compression-level`.

    - The
      :samp:`--compression-level={level}` option
      sets the zlib compression level used for any streams compressed
      by ``/FlateDecode``. Most effective when combined with
      :samp:`--recompress-flate`.

  - Library Enhancements

    - A new namespace ``QIntC``, provided by
      :file:`qpdf/QIntC.hh`, provides safe
      conversion methods between different integer types. These
      conversion methods do range checking to ensure that the cast
      can be performed with no loss of information. Every use of
      ``static_cast`` in the library was inspected to see if it could
      use one of these safe converters instead. See :ref:`ref.casting` for additional details.

    - Method ``QPDF::anyWarnings`` tells whether there have been any
      warnings without clearing the list of warnings.

    - Method ``QPDF::closeInputSource`` closes or otherwise releases
      the input source. This enables the input file to be deleted or
      renamed.

    - New methods have been added to ``QUtil`` for converting back
      and forth between strings and unsigned integers:
      ``uint_to_string``, ``uint_to_string_base``,
      ``string_to_uint``, and ``string_to_ull``.

    - New methods have been added to ``QPDFObjectHandle`` that return
      the value of ``Integer`` objects as ``int`` or ``unsigned int``
      with range checking and sensible fallback values, and a new
      method was added to return an unsigned value. This makes it
      easier to write code that is safe from unintentional data loss.
      Functions: ``getUIntValue``, ``getIntValueAsInt``,
      ``getUIntValueAsUInt``.

    - When parsing content streams with
      ``QPDFObjectHandle::ParserCallbacks``, in place of the method
      ``handleObject(QPDFObjectHandle)``, the developer may override
      ``handleObject(QPDFObjectHandle, size_t offset, size_t
      length)``. If this method is defined, it will
      be invoked with the object along with its offset and length
      within the overall contents being parsed. Intervening spaces
      and comments are not included in offset and length.
      Additionally, a new method ``contentSize(size_t)`` may be
      implemented. If present, it will be called prior to the first
      call to ``handleObject`` with the total size in bytes of the
      combined contents.

    - New methods ``QPDF::userPasswordMatched`` and
      ``QPDF::ownerPasswordMatched`` have been added to enable a
      caller to determine whether the supplied password was the user
      password, the owner password, or both. This information is also
      displayed by :command:`qpdf --show-encryption`
      and :command:`qpdf --check`.

    - Static method ``Pl_Flate::setCompressionLevel`` can be called
      to set the zlib compression level globally used by all
      instances of ``Pl_Flate`` in deflate mode.

    - The method ``QPDFWriter::setRecompressFlate`` can be called to
      tell ``QPDFWriter`` to uncompress and recompress streams
      already compressed with ``/FlateDecode``.

    - The underlying implementation of QPDF arrays has been enhanced
      to be much more memory efficient when dealing with arrays with
      lots of nulls. This enables qpdf to use drastically less memory
      for certain types of files.

    - When traversing the pages tree, if nodes are encountered with
      invalid types, the types are fixed, and a warning is issued.

    - A new helper method ``QUtil::read_file_into_memory`` was added.

    - All conditions previously reported by
      ``QPDF::checkLinearization()`` as errors are now presented as
      warnings.

    - Name tokens containing the ``#`` character not preceded by two
      hexadecimal digits, which is invalid in PDF 1.2 and above, are
      properly handled by the library: a warning is generated, and
      the name token is properly preserved, even if invalid, in the
      output. See :file:`ChangeLog` for a more
      complete description of this change.
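
The range-checked conversions described for ``QIntC`` in the 9.0.0
library enhancements can be illustrated with a generic checked cast. This
is a sketch of the technique only, not the contents of
:file:`qpdf/QIntC.hh`: ``checked_cast`` and ``fits`` are hypothetical
names, and throwing ``std::range_error`` on failure is an assumption made
for the example.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical sketch: convert between integer types, throwing if the
// value cannot be represented in the destination type.
template <typename To, typename From>
To checked_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_cast works on integer types only");
    To result = static_cast<To>(value);
    // The value must survive a round trip unchanged, and it must keep
    // its sign; the sign test catches signed/unsigned wraparound.
    if (static_cast<From>(result) != value ||
        ((value < From(0)) != (result < To(0)))) {
        throw std::range_error("integer conversion out of range");
    }
    return result;
}

// Convenience wrapper: true if the conversion would succeed.
template <typename To, typename From>
bool fits(From value)
{
    try {
        checked_cast<To>(value);
        return true;
    } catch (std::range_error const&) {
        return false;
    }
}
```

Compared with a bare ``static_cast``, the checked form turns silent
truncation or sign changes into an error the caller can handle.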

  - Bug Fixes

    - A small handful of memory issues, assertion failures, and
      unhandled exceptions that could occur on badly mangled input
      files have been fixed. Most of these problems were found by
      Google's OSS-Fuzz project.

    - When :command:`qpdf --check` or
      :command:`qpdf --check-linearization` encounters
      a file with linearization warnings but not errors, it now
      properly exits with exit code 3 instead of 2.

    - The :samp:`--completion-bash` and
      :samp:`--completion-zsh` options now work
      properly when qpdf is invoked as an AppImage.

    - Calling ``QPDFWriter::set*EncryptionParameters`` on a
      ``QPDFWriter`` object whose output filename has not yet been
      set no longer produces a segmentation fault.

    - When reading encrypted files, follow the spec more closely
      regarding encryption key length. This allows qpdf to open
      encrypted files in most cases when they have invalid or missing
      ``/Length`` keys in the encryption dictionary.

  - Build Changes

    - On platforms that support it, qpdf now builds with
      :samp:`-fvisibility=hidden`. If you build qpdf
      with your own build system, this is now safe to use. This
      prevents methods that are not part of the public API from being
      exported by the shared library, and makes qpdf's ELF shared
      libraries (used on Linux, MacOS, and most other UNIX flavors)
      behave more like the Windows DLL. Since the DLL already behaves
      in much this way, it is unlikely that there are any methods
      that were accidentally not exported. However, with ELF shared
      libraries, typeinfo for some classes has to be explicitly
      exported. If there are problems in dynamically linked code
      catching exceptions or subclassing, this could be the reason.
      If you see this, please report a bug at
      https://github.com/qpdf/qpdf/issues/.

    - QPDF is now compiled with integer conversion and sign
      conversion warnings enabled. Numerous changes were made to the
      library to make this safe.

    - QPDF's :command:`make install` target explicitly
      specifies the mode to use when installing files instead of
      relying on the user's umask. It was previously doing this for some
      files but not others.

    - If :command:`pkg-config` is available, use it to
      locate :file:`libjpeg` and
      :file:`zlib` dependencies, falling back on
      old behavior if unsuccessful.
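
The remark about explicitly exporting typeinfo under
:samp:`-fvisibility=hidden` can be illustrated with an exception class.
The sketch below shows the general export-macro pattern and is not taken
from qpdf's headers: ``SKETCH_DLL``, ``SketchError``, and
``describe_failure`` are names made up for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Made-up export macro. With hidden default visibility, only symbols
// marked this way are exported from a shared library.
#if defined(_WIN32)
# define SKETCH_DLL __declspec(dllexport)
#elif defined(__GNUC__)
# define SKETCH_DLL __attribute__((visibility("default")))
#else
# define SKETCH_DLL
#endif

// Marking the class itself exports its typeinfo, which code outside
// the library needs in order to catch or subclass it.
class SKETCH_DLL SketchError: public std::runtime_error
{
  public:
    explicit SketchError(std::string const& message) :
        std::runtime_error(message)
    {
    }
};

// Exported function that throws and catches the exported exception.
SKETCH_DLL std::string describe_failure()
{
    try {
        throw SketchError("bad input");
    } catch (std::runtime_error const& e) {
        return e.what();
    }
}
```

Within a single program the catch always works. The visibility marking
matters once the throw and the catch live in different shared objects.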

  - Other Notes

    - QPDF has been fully integrated into `Google's OSS-Fuzz
      project <https://github.com/google/oss-fuzz>`__. This project
      exercises code with randomly mutated inputs and is great for
      discovering hidden crashes and security issues.
      Several bugs found by oss-fuzz have already been fixed in qpdf.

8.4.2: May 18, 2019
  This release has just one change: correction of a buffer overrun in
  the Windows code used to open files. Windows users should take this
  update. There are no code changes that affect non-Windows releases.

8.4.1: April 27, 2019
  - Enhancements

    - When :command:`qpdf --version` is run, it will
      detect if the qpdf CLI was built with a different version of
      qpdf than the library, which may indicate a problem with the
      installation.

    - New option :samp:`--remove-page-labels` will
      remove page labels before generating output. This used to
      happen if you ran :command:`qpdf --empty --pages ..
      --`, but the behavior changed in qpdf 8.3.0. This
      option enables people who were relying on the old behavior to
      get it again.

    - New option
      :samp:`--keep-files-open-threshold={count}`
      can be used to override the number of files that qpdf will use to
      trigger the behavior of not keeping all files open when merging
      files. This may be necessary if your system allows fewer than
      the default value of 200 files to be open at the same time.

  - Bug Fixes

    - Handle Unicode characters in filenames on Windows. The changes
      to support Unicode on the CLI in Windows broke Unicode
      filenames for Windows.

    - Slightly tighten logic that determines whether an object is a
      page. This should resolve problems in some rare files where
      some non-page objects were passing qpdf's test for whether
      something was a page, thus causing them to be erroneously lost
      during page splitting operations.

    - Revert change that included preservation of outlines
      (bookmarks) in :samp:`--split-pages`. The way
      it was implemented in 8.3.0 and 8.4.0 caused a very significant
      degradation of performance for splitting certain files. A
      future release of qpdf may re-introduce the behavior in a more
      performant and also more correct fashion.

    - In JSON mode, add missing leading 0 to decimal values between
      -1 and 1 even if not present in the input. The JSON
      specification requires the leading 0. The PDF specification
      does not.
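
The JSON leading-zero fix in the 8.4.1 bug fixes can be sketched as a
small string transformation. This is an illustration of the rule stated
in the release note, not qpdf's code; ``add_leading_zero`` is a
hypothetical helper that handles only the two cases named there, a bare
leading ``.`` and a leading ``-.``.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: turn a PDF-style real such as ".5" or "-.5"
// into a valid JSON number by inserting the leading zero.
std::string add_leading_zero(std::string const& number)
{
    if (!number.empty() && number[0] == '.') {
        return "0" + number; // ".5" becomes "0.5"
    }
    if (number.size() >= 2 && number[0] == '-' && number[1] == '.') {
        return "-0" + number.substr(1); // "-.5" becomes "-0.5"
    }
    return number; // already has a digit before the decimal point
}
```

Values that already start with a digit are returned unchanged, so the
function can be applied to every numeric token on output.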
|
2021-12-11 21:53:08 +00:00
|
|
|
|
|
|
|
|
|
8.4.0: February 1, 2019
  - Command-line Enhancements

    - *Non-compatible CLI change:* The qpdf command-line tool
      interprets passwords given on the command line differently from
      previous releases when the passwords contain non-ASCII
      characters. For a discussion of the current behavior, please see
      :ref:`ref.unicode-passwords`. The
      incompatibilities are as follows:

      - On Windows, qpdf now receives all command-line options as
        Unicode strings if it can figure out the appropriate
        compile/link options. This is enabled at least for MSVC and
        mingw builds. That means that if non-ASCII strings are
        passed to the qpdf CLI in Windows, qpdf will now correctly
        receive them. In the past, they would have either been
        encoded as Windows code page 1252 (also known as "Windows
        ANSI") or as something unintelligible. In almost all cases,
        qpdf is able to properly interpret Unicode arguments now,
        whereas in the past, it would almost never interpret them
        properly. The result is that non-ASCII passwords given to
        the qpdf CLI on Windows now have a much greater chance of
        creating PDF files that can be opened by a variety of
        readers. In the past, usually files encrypted from the
        Windows CLI using non-ASCII passwords would not be readable
        by most viewers. Note that the current version of qpdf is
        able to decrypt files that it previously created using the
        previously supplied password.

      - The PDF specification requires passwords to be encoded as
        UTF-8 for 256-bit encryption and with PDF Doc encoding for
        40-bit or 128-bit encryption. Older versions of qpdf left it
        up to the user to provide passwords with the correct
        encoding. The qpdf CLI now detects when a password is given
        with UTF-8 encoding and automatically transcodes it to what
        the PDF spec requires. While this is almost always the
        correct behavior, it is possible to override the behavior if
        there is some reason to do so. This is discussed in more
        depth in :ref:`ref.unicode-passwords`.

    - New options
      :samp:`--externalize-inline-images`,
      :samp:`--ii-min-bytes`, and
      :samp:`--keep-inline-images` control qpdf's
      handling of inline images and possible conversion of them to
      regular images. By default,
      :samp:`--optimize-images` now also applies to
      inline images. These options are discussed in :ref:`ref.advanced-transformation`.

    - Add options :samp:`--overlay` and
      :samp:`--underlay` for overlaying or
      underlaying pages of other files onto output pages. See
      :ref:`ref.overlay-underlay` for
      details.

    - When opening an encrypted file with a password, if the
      specified password doesn't work and the password contains any
      non-ASCII characters, qpdf will try a number of alternative
      passwords to try to compensate for possible character encoding
      errors. This behavior can be suppressed with the
      :samp:`--suppress-password-recovery` option.
      See :ref:`ref.unicode-passwords` for a full
      discussion.

    - Add the :samp:`--password-mode` option to
      fine-tune how qpdf interprets password arguments, especially
      when they contain non-ASCII characters. See :ref:`ref.unicode-passwords` for more information.

    - In the :samp:`--pages` option, it is now
      possible to copy the same page more than once from the same
      file without using the previous workaround of specifying two
      different paths to the same file.

    - In the :samp:`--pages` option, allow use of "."
      as a shortcut for the primary input file. That way, you can do
      :command:`qpdf in.pdf --pages . 1-2 -- out.pdf`
      instead of having to repeat :file:`in.pdf`
      in the command.

    - When encrypting with 128-bit and 256-bit encryption, new
      encryption options :samp:`--assemble`,
      :samp:`--annotate`,
      :samp:`--form`, and
      :samp:`--modify-other` allow finer-grained
      configuration of permissions. Before, the
      :samp:`--modify` option only configured certain
      predefined groups of permissions.

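The password handling described above depends on recognizing when an argument is UTF-8 rather than a single-byte encoding. The following is a rough sketch of one way such a check could look, not qpdf's actual code: the function names are hypothetical, and the validity test is only structural (it does not reject overlong encodings or surrogate code points).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if any byte has its high bit set, meaning the
// string is not plain ASCII.
bool has_high_bit(std::string const& s)
{
    for (unsigned char ch : s) {
        if (ch & 0x80) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: true if every lead byte is followed by the right
// number of continuation bytes. This is a structural check only.
bool is_structurally_valid_utf8(std::string const& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (lead < 0x80) {
            extra = 0;
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (extra > s.size() - i - 1) {
            return false; // sequence is cut off at the end of the string
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(s[i + k]);
            if ((cont & 0xC0) != 0x80) {
                return false;
            }
        }
        i += extra + 1;
    }
    return true;
}

// A password "looks like UTF-8" if it is non-ASCII and structurally valid.
bool looks_like_utf8_password(std::string const& s)
{
    return has_high_bit(s) && is_structurally_valid_utf8(s);
}
```

A password that passes this check would be a candidate for transcoding to the encoding the PDF specification requires; a password that fails it is either plain ASCII, which needs no transcoding, or already in some single-byte encoding.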
  - Bug Fixes and Enhancements

    - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and
      8.3.0 had a bug that could cause page splitting and merging
      operations to drop some font or image resources if the PDF
      file's internal structure shared these resource lists across
      pages and if some but not all of the pages in the output did
      not reference all the fonts and images. Using the
      :samp:`--preserve-unreferenced-resources`
      option would work around the incorrect behavior. This bug was
      the result of a typo in the code and a deficiency in the test
      suite. The case that triggered the error was known, just not
      handled properly. This case is now exercised in qpdf's test
      suite and properly handled.

    - When optimizing images, detect and refuse to optimize images
      that can't be converted to JPEG because of bit depth or color
      space.

    - Linearization and page manipulation APIs now detect and recover
      from files that have duplicate Page objects in the pages tree.

    - Previously, when the older option
      :samp:`--stream-data=compress` was used with object
      streams, object streams and xref streams were not compressed.

    - When the tokenizer returns inline image tokens, delimiters
      following ``ID`` and ``EI`` operators are no longer excluded.
      This makes it possible to reliably extract the actual image
      data.

  - Library Enhancements

    - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to
      convert inline images to regular images.

    - Add method ``QUtil::possible_repaired_encodings()`` to generate
      a list of strings that represent other ways the given string
      could have been encoded. This is the method the QPDF CLI uses
      to generate the strings it tries when recovering incorrectly
      encoded Unicode passwords.

    - Add new versions of
      ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow
      more granular setting of permissions bits. See
      :file:`QPDFWriter.hh` for details.

    - Add new versions of the transcoders from UTF-8 to single-byte
      coding systems in ``QUtil`` that report success or failure
      rather than just substituting a specified unknown character.

    - Add method ``QUtil::analyze_encoding()`` to determine whether a
      string has high-bit characters and whether it appears to be
      UTF-16 or valid UTF-8 encoding.

    - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to
      create a new page that is a "shallow copy" of a page. The
      resulting object is an indirect object ready to be passed to
      ``QPDFPageDocumentHelper::addPage()`` for either the original
      ``QPDF`` object or a different one. This is what the
      :command:`qpdf` command-line tool uses to copy
      the same page multiple times from the same file during
      splitting and merging operations.

    - Add method ``QPDF::getUniqueId()``, which returns a unique
      identifier for the given QPDF object. The identifier will be
      unique across the life of the application. The returned value
      can be safely used as a map key.

    - Add method ``QPDF::setImmediateCopyFrom``. This further
      enhances qpdf's ability to allow a ``QPDF`` object from which
      objects are being copied to go out of scope before the
      destination object is written. If you call this method on a
      ``QPDF`` instance, objects copied *from* this instance will be
      copied immediately instead of lazily. This option uses more
      memory but allows the source object to go out of scope before
      the destination object is written in all cases. See comments in
      :file:`QPDF.hh` for details.

    - Add method ``QPDFPageObjectHelper::getAttribute`` for
      retrieving an attribute from the page dictionary taking
      inheritance into consideration, and optionally making a copy if
      your intention is to modify the attribute.

    - Fix long-standing limitation of
      ``QPDFPageObjectHelper::getPageImages`` so that it now properly
      reports images from inherited resources dictionaries,
      eliminating the need to call
      ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in
      this case.

    - Add method ``QPDFObjectHandle::getUniqueResourceName`` for
      finding an unused name in a resource dictionary.

    - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for
      generating a form XObject equivalent to a page. The resulting
      object can be used in the same file or copied to another file
      with ``copyForeignObject``. This can be useful for implementing
      underlay, overlay, n-up, thumbnails, or any other functionality
      requiring replication of pages in other contexts.

    - Add method ``QPDFPageObjectHelper::placeFormXObject`` for
      generating content stream text that places a given form XObject
      on a page, centered and fit within a specified rectangle. This
      method takes care of computing the proper transformation matrix
      and may optionally compensate for rotation or scaling of the
      destination page.

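To make the "centered and fit within a specified rectangle" computation concrete, here is a small sketch of the underlying arithmetic. It is an illustration under simplifying assumptions (no rotation, uniform scaling only); the types ``Rect`` and ``Matrix`` and the function ``fit_and_center`` are hypothetical and are not qpdf's classes or its implementation of ``placeFormXObject``.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical rectangle: lower-left and upper-right corners.
struct Rect
{
    double llx, lly, urx, ury;
};

// Six-number PDF transformation matrix [a b c d e f], where a point maps
// as x' = a*x + c*y + e and y' = b*x + d*y + f.
struct Matrix
{
    double a, b, c, d, e, f;
};

// Compute a matrix that scales src uniformly so that it fits inside dst
// and translates it so that the two centers coincide.
Matrix fit_and_center(Rect const& src, Rect const& dst)
{
    double sw = src.urx - src.llx;
    double sh = src.ury - src.lly;
    double dw = dst.urx - dst.llx;
    double dh = dst.ury - dst.lly;
    // Use the smaller ratio so that neither dimension overflows dst.
    double scale = std::min(dw / sw, dh / sh);
    // Center of the scaled source box and center of the destination box.
    double scx = scale * (src.llx + src.urx) / 2.0;
    double scy = scale * (src.lly + src.ury) / 2.0;
    double dcx = (dst.llx + dst.urx) / 2.0;
    double dcy = (dst.lly + dst.ury) / 2.0;
    return Matrix{scale, 0.0, 0.0, scale, dcx - scx, dcy - scy};
}
```

For a 200 by 100 source box placed into a 100 by 100 destination at the origin, this yields a scale of 0.5 and a vertical offset of 25. The corresponding content stream fragment would have the form ``q 0.5 0 0 0.5 0 25 cm /Fx1 Do Q``, where ``/Fx1`` is a made-up resource name.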
  - Build Improvements

    - Add new configure option
      :samp:`--enable-avoid-windows-handle`, which
      causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be
      defined. When defined, qpdf will avoid referencing the Windows
      ``HANDLE`` type, which is disallowed with certain versions of
      the Windows SDK.

    - For Windows builds, attempt to determine what options, if any,
      have to be passed to the compiler and linker to enable use of
      ``wmain``. This causes the preprocessor symbol
      ``WINDOWS_WMAIN`` to be defined. If you do your own builds with
      other compilers, you can define this symbol to cause ``wmain``
      to be used. This is needed to allow the Windows
      :command:`qpdf` command to receive Unicode
      command-line options.

8.3.0: January 7, 2019
  - Command-line Enhancements

    - Shell completion: you can now use eval :command:`$(qpdf
      --completion-bash)` and eval :command:`$(qpdf
      --completion-zsh)` to enable shell completion for
      bash and zsh.

    - Page numbers (also known as page labels) are now preserved when
      merging and splitting files with the
      :samp:`--pages` and
      :samp:`--split-pages` options.

    - Bookmarks are partially preserved when splitting pages with the
      :samp:`--split-pages` option. Specifically, the
      outlines dictionary and some supporting metadata are copied
      into the split files. The result is that all bookmarks from the
      original file appear, those that point to pages that are
      preserved work, and those that point to pages that are not
      preserved don't do anything. This is an interim step toward
      proper support for bookmarks in splitting and merging
      operations.

    - Page collation: add new option
      :samp:`--collate`. When specified, the
      semantics of :samp:`--pages` change from
      concatenation to collation. See :ref:`ref.page-selection` for examples and discussion.

    - Generation of information in JSON format, primarily to
      facilitate use of qpdf from languages other than C++. Add new
      options :samp:`--json`,
      :samp:`--json-key`, and
      :samp:`--json-object` to generate a JSON
      representation of the PDF file. Run :command:`qpdf
      --json-help` to get a description of the JSON
      format. For more information, see :ref:`ref.json`.

    - The :samp:`--generate-appearances` flag will
      cause qpdf to generate appearances for form fields if the PDF
      file indicates that form field appearances are out of date.
      This can happen when PDF forms are filled in by a program that
      doesn't know how to regenerate the appearances of the filled-in
      fields.

    - The :samp:`--flatten-annotations` flag can be
      used to *flatten* annotations, including form fields.
      Ordinarily, annotations are drawn separately from the page.
      Flattening annotations is the process of combining their
      appearances into the page's contents. You might want to do this
      if you are going to rotate or combine pages using a tool that
      doesn't understand annotations. You may also want to use
      :samp:`--generate-appearances` when using this
      flag since annotations for outdated form fields are not
      flattened as that would cause loss of information.

    - The :samp:`--optimize-images` flag tells qpdf
      to recompress every image using DCT (JPEG) compression as
      long as the image is not already compressed with lossy
      compression and recompressing the image reduces its size. The
      additional options :samp:`--oi-min-width`,
      :samp:`--oi-min-height`, and
      :samp:`--oi-min-area` prevent recompression of
      images whose width, height, or pixel area (width × height) is
      below a specified threshold.

    - The :samp:`--show-object` option can now be
      given as :samp:`--show-object=trailer` to show
      the trailer dictionary.

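The difference between concatenation and collation in the :samp:`--collate` item can be sketched in a few lines. This is a simplified model and not qpdf's code: ``PageRef``, ``concatenate``, and ``collate`` are hypothetical names, and each input is modeled as a list of (file label, page number) pairs that has already been selected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical page reference: (input file label, page number).
using PageRef = std::pair<std::string, int>;

// Concatenation: all selected pages of the first input, then all of the
// second, and so on.
std::vector<PageRef> concatenate(std::vector<std::vector<PageRef>> const& inputs)
{
    std::vector<PageRef> result;
    for (auto const& pages : inputs) {
        result.insert(result.end(), pages.begin(), pages.end());
    }
    return result;
}

// Collation: the first page of each input, then the second page of each
// input, and so on. Inputs that run out of pages are simply skipped.
std::vector<PageRef> collate(std::vector<std::vector<PageRef>> const& inputs)
{
    std::size_t longest = 0;
    for (auto const& pages : inputs) {
        longest = std::max(longest, pages.size());
    }
    std::vector<PageRef> result;
    for (std::size_t i = 0; i < longest; ++i) {
        for (auto const& pages : inputs) {
            if (i < pages.size()) {
                result.push_back(pages[i]);
            }
        }
    }
    return result;
}
```

With a three-page input ``a`` and a two-page input ``b``, concatenation gives a1, a2, a3, b1, b2, while collation gives a1, b1, a2, b2, a3. Collation is useful, for example, for interleaving separately scanned front and back sides.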
  - Bug Fixes and Enhancements

    - QPDF now automatically detects and recovers from dangling
      references. If a PDF file contained an indirect reference to a
      non-existent object, which is valid, when adding a new object
      to the file, it was possible for the new object to take the
      object ID of the dangling reference, thereby causing the
      dangling reference to point to the new object. This case is now
      prevented.

    - Fixes to form field setting code: strings are always written in
      UTF-16 format, and checkboxes and radio buttons are handled
      properly with respect to synchronization of values and
      appearance states.

    - ``QPDF::checkLinearization()`` no longer causes the program
      to crash when it detects problems with linearization data.
      Instead, it issues a normal warning or error.

    - Ordinarily qpdf treats an argument of the form
      :samp:`@file` to mean that command-line options
      should be read from :file:`file`. Now, if
      :file:`file` does not exist but
      :file:`@file` does, qpdf will treat
      :file:`@file` as a regular option. This
      makes it possible to work more easily with PDF files whose
      names happen to start with the ``@`` character.

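The ``@file`` rule in the last item above is a small decision procedure, sketched here under stated assumptions. ``ArgKind`` and ``classify_argument`` are hypothetical names, not qpdf functions, and the file-existence check is passed in as a callable so the logic can be exercised without touching the filesystem.

```cpp
#include <cassert>
#include <functional>
#include <string>

enum class ArgKind { regular, options_file };

// Decide how to treat one command-line argument. "exists" is any callable
// that reports whether a path names an existing file.
ArgKind classify_argument(
    std::string const& arg,
    std::function<bool(std::string const&)> const& exists)
{
    if (arg.size() < 2 || arg[0] != '@') {
        return ArgKind::regular;
    }
    std::string name = arg.substr(1);
    if (!exists(name) && exists(arg)) {
        // "file" is missing but "@file" is present: the argument is the
        // literal name of a file that happens to start with '@'.
        return ArgKind::regular;
    }
    return ArgKind::options_file;
}
```

In a real program, the ``exists`` parameter could be bound to a filesystem check; injecting it keeps the rule itself easy to test.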
  - Library Enhancements

    - Remove the restriction in most cases that the source QPDF
      object used in a ``QPDF::copyForeignObject`` call has to stick
      around until the destination QPDF is written. The exceptional
      case is when the source stream gets its data using a
      ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth
      discussion, see comments around ``copyForeignObject`` in
      :file:`QPDF.hh`.

    - Add new method ``QPDFWriter::getFinalVersion()``, which returns
      the PDF version that will ultimately be written to the final
      file. See comments in :file:`QPDFWriter.hh`
      for some restrictions on its use.

    - Add several methods for transcoding strings to some of the
      character sets used in PDF files: ``QUtil::utf8_to_ascii``,
      ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and
      ``QUtil::utf8_to_utf16``. For the single-byte encodings that
      support only a limited character set, these methods replace
      unsupported characters with a specified substitute.

    - Add new methods to ``QPDFAnnotationObjectHelper`` and
      ``QPDFFormFieldObjectHelper`` for querying flags and
      interpretation of different field types. Define constants in
      :file:`qpdf/Constants.h` to help with
      interpretation of flag values.

    - Add new methods
      ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and
      ``QPDFFormFieldObjectHelper::generateAppearance`` for
      generating appearance streams. See discussion in
      :file:`QPDFFormFieldObjectHelper.hh` for
      limitations.

    - Add two new helper functions for dealing with resource
      dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns
      a list of all second-level keys, which correspond to the names
      of resources, and ``QPDFObjectHandle::mergeResources()`` merges
      two resources dictionaries as long as they have non-conflicting
      keys. These methods are useful for certain types of objects
      that resolve resources from multiple places, such as form
      fields.

    - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()``
      and
      ``QPDFAnnotationObjectHelper::getPageContentForAppearance()``
      for handling low-level details of annotation flattening.

    - Add new helper classes: ``QPDFOutlineDocumentHelper``,
      ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``,
      ``QPDFNameTreeObjectHelper``, and
      ``QPDFNumberTreeObjectHelper``.

    - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON
      representation of the object. Call ``serialize()`` on the
      result to convert it to a string.

    - Add a simple JSON serializer. This is not a complete or
      general-purpose JSON library. It allows assembly and
      serialization of JSON structures with some restrictions, which
      are described in the header file. This is the serializer used
      by qpdf's new JSON representation.

    - Add new ``QPDFObjectHandle::Matrix`` class along with a few
      convenience methods for dealing with six-element numerical
      arrays as matrices.

    - Add new method ``QPDFObjectHandle::wrapInArray``, which returns
      the object itself if it is an array, or an array containing the
      object otherwise. This is a common construct in PDF. This
      method prevents you from having to explicitly test whether
      something is a single element or an array.

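The "single element or array" construct that ``wrapInArray`` addresses can be modeled with a toy object type. The sketch below is only an illustration of the idea: ``Obj``, ``make_scalar``, ``make_array``, and ``wrap_in_array`` are hypothetical stand-ins and do not reflect how ``QPDFObjectHandle`` is implemented.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy stand-in for a PDF object: either a scalar or an array of objects.
struct Obj
{
    bool is_array = false;
    int scalar = 0;
    std::vector<Obj> items;
};

Obj make_scalar(int value)
{
    Obj o;
    o.scalar = value;
    return o;
}

Obj make_array(std::vector<Obj> items)
{
    Obj o;
    o.is_array = true;
    o.items = std::move(items);
    return o;
}

// Return the object itself if it is already an array; otherwise return a
// one-element array containing it.
Obj wrap_in_array(Obj const& o)
{
    if (o.is_array) {
        return o;
    }
    return make_array({o});
}
```

After wrapping, calling code can always iterate over ``items`` without first testing which form it received. A page's ``/Contents`` entry, which may be a single stream or an array of streams, is one place in PDF where this pattern appears.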
  - Build Improvements

    - It is no longer necessary to run
      :command:`autogen.sh` to build from a pristine
      checkout. Automatically generated files are now committed so
      that it is possible to build on platforms without autoconf
      directly from a clean checkout of the repository. The
      :command:`configure` script detects if the files
      are out of date when it also determines that the tools are
      present to regenerate them.

    - Pull requests and the master branch are now built automatically
      in `Azure
      Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is
      free for open source projects. The build includes Linux, macOS,
      Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage
      build. Official qpdf releases are now built with Azure
      Pipelines.

  - Notes for Packagers

    - A new section has been added to the documentation with notes
      for packagers. Please see :ref:`ref.packaging`.

    - The qpdf build detects out-of-date automatically generated
      files. If your packaging system automatically refreshes libtool
      or autoconf files, it could cause this check to fail. To avoid
      this problem, pass
      :samp:`--disable-check-autofiles` to
      :command:`configure`.

    - If you would like to have qpdf completion enabled
      automatically, you can install completion files in the
      distribution's default location. You can find sample completion
      files to install in the :file:`completions`
      directory.

8.2.1: August 18, 2018
  - Command-line Enhancements

    - Add
      :samp:`--keep-files-open={[yn]}`
      to override the default determination of whether to keep files
      open when merging. Please see the discussion of
      :samp:`--keep-files-open` in :ref:`ref.basic-options` for additional details.

8.2.0: August 16, 2018
  - Command-line Enhancements

    - Add :samp:`--no-warn` option to suppress
      issuing warning messages. If there are any conditions that
      would have caused warnings to be issued, the exit status is
      still 3.

  - Bug Fixes and Optimizations

    - Performance fix: optimize page merging operation to avoid
      unnecessary open/close calls on files being merged. This solves
      a dramatic slow-down that was observed when merging certain
      types of files.

    - Optimize how memory was used for the TIFF predictor,
      drastically improving performance and memory usage for files
      containing high-resolution images compressed with Flate using
      the TIFF predictor.

    - Bug fix: end of line characters were not properly handled
      inside strings in some cases.

    - Bug fix: using :samp:`--progress` on very small
      files could cause an infinite loop.

  - API enhancements

    - Add new class ``QPDFSystemError``, derived from
      ``std::runtime_error``, which is now thrown by
      ``QUtil::throw_system_error``. This enables the triggering
      ``errno`` value to be retrieved.

    - Add ``ClosedFileInputSource::stayOpen`` method, enabling a
      ``ClosedFileInputSource`` to stay open during manually
      indicated periods of high activity, thus reducing the overhead
      of frequent open/close operations.

  - Build Changes

    - For the mingw builds, change the name of the DLL import library
      from :file:`libqpdf.a` to
      :file:`libqpdf.dll.a` to more accurately
      reflect that it is an import library rather than a static
      library. This potentially clears the way for supporting a
      static library in the future, though presently, the qpdf
      Windows build only builds the DLL and executables.

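The idea behind ``QPDFSystemError`` in the 8.2.0 API enhancements, an exception derived from ``std::runtime_error`` that remembers the triggering ``errno``, can be sketched as follows. The class ``SystemError`` and the free function ``throw_system_error`` here are hypothetical stand-ins written for illustration; consult the qpdf headers for the real interface.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in: a runtime_error that also stores the errno value
// that caused it, so callers can branch on the specific failure.
class SystemError : public std::runtime_error
{
  public:
    SystemError(std::string const& description, int system_errno) :
        std::runtime_error(description + ": " + std::strerror(system_errno)),
        system_errno_(system_errno)
    {
    }

    int getErrno() const
    {
        return system_errno_;
    }

  private:
    int system_errno_;
};

// Throw a SystemError that captures the current value of errno.
[[noreturn]] void throw_system_error(std::string const& description)
{
    throw SystemError(description, errno);
}
```

Because the class derives from ``std::runtime_error``, existing handlers that catch ``std::runtime_error`` or ``std::exception`` keep working, while newer code can catch the more specific type and inspect the stored error number.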
8.1.0: June 23, 2018
  - Usability Improvements

    - When splitting files, qpdf detects fonts and images that the
      document metadata claims are referenced from a page but are not
      actually referenced and omits them from the output file. This
      change can cause a significant reduction in the size of split
      PDF files for files created by some software packages. In some
      cases, it can also make page splitting slower. Prior versions
      of qpdf would believe the document metadata and sometimes
      include all the images from all the other pages even though the
      pages were no longer present. In the unlikely event that the
      old behavior should be desired, or if you have a case where
      page splitting is very slow, the old behavior (and speed) can
      be enabled by specifying
      :samp:`--preserve-unreferenced-resources`. For
      additional details, please see :ref:`ref.advanced-transformation`.

    - When merging multiple PDF files, qpdf no longer leaves all the
      files open. This makes it possible to merge numbers of files
      that may exceed the operating system's limit for the maximum
      number of open files.

    - The :samp:`--rotate` option's syntax has been
      extended to make the page range optional. If you specify
      :samp:`--rotate={angle}`
      without specifying a page range, the rotation will be applied
      to all pages. This can be especially useful for adjusting a PDF
      created from a multi-page document that was scanned upside
      down.

    - When merging multiple files, the
      :samp:`--verbose` option now prints information
      about each file as it operates on that file.

    - When the :samp:`--progress` option is
      specified, qpdf will print a running indicator of its best
      guess at how far through the writing process it is. Note that,
      as with all progress meters, it's an approximation. This option
      is implemented in a way that makes it useful for software that
      uses the qpdf library; see API Enhancements below.

  - Bug Fixes

    - Properly decrypt files that use revision 3 of the standard
      security handler but use 40-bit keys (even though revision 3
      supports 128-bit keys).

    - Limit depth of nested data structures to prevent crashes from
      certain types of malformed (malicious) PDFs.

    - In "newline before endstream" mode, insert the required extra
      newline before the ``endstream`` at the end of object streams.
      This one case was previously omitted.

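The depth limit mentioned in the 8.1.0 bug fixes guards a recursive parser against input that nests so deeply it exhausts the stack. The sketch below shows the general technique on a toy grammar of nested ``[`` ... ``]`` arrays. The function names and the way the limit is passed are assumptions made for illustration and are not taken from qpdf.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Parse one array starting at input[pos] == '['. "depth" is the nesting
// level of this array; parsing is abandoned once it exceeds max_depth.
void parse_array(
    std::string const& input, std::size_t& pos, int depth, int max_depth)
{
    if (depth > max_depth) {
        throw std::runtime_error("nesting too deep");
    }
    ++pos; // consume '['
    while (pos < input.size() && input[pos] != ']') {
        if (input[pos] == '[') {
            parse_array(input, pos, depth + 1, max_depth);
        } else {
            ++pos; // skip any other character
        }
    }
    if (pos >= input.size()) {
        throw std::runtime_error("unterminated array");
    }
    ++pos; // consume ']'
}

// Returns true if the whole input parses without exceeding the limit.
bool parses_within_limit(std::string const& input, int max_depth)
{
    try {
        std::size_t pos = 0;
        while (pos < input.size()) {
            if (input[pos] == '[') {
                parse_array(input, pos, 1, max_depth);
            } else {
                ++pos;
            }
        }
        return true;
    } catch (std::runtime_error const&) {
        return false;
    }
}
```

Because the check happens before each recursive descent, a hostile input consisting of many thousands of opening brackets is rejected after at most ``max_depth + 1`` nested calls instead of recursing once per bracket.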
- API Enhancements
|
|
|
|
|
|
|
|
|
|
- The first round of higher level "helper" interfaces has been
|
|
|
|
|
introduced. These are designed to provide a more convenient way
|
|
|
|
|
of interacting with certain document features than using
|
|
|
|
|
``QPDFObjectHandle`` directly. For details on helpers, see
|
2021-12-12 00:31:19 +00:00
|
|
|
|
:ref:`ref.helper-classes`. Specific additional
|
2021-12-11 23:49:31 +00:00
|
|
|
|
interfaces are described below.
|
|
|
|
|
|
|
|
|
|
- Add two new document helper classes: ``QPDFPageDocumentHelper``
|
|
|
|
|
for working with pages, and ``QPDFAcroFormDocumentHelper`` for
|
|
|
|
|
working with interactive forms. No old methods have been
|
|
|
|
|
removed, but ``QPDFPageDocumentHelper`` is now the preferred
|
|
|
|
|
way to perform operations on pages rather than calling the old
|
|
|
|
|
methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments
|
|
|
|
|
in the header files direct you to the new interfaces. Please
|
2021-12-12 00:02:42 +00:00
|
|
|
|
see the header files and :file:`ChangeLog`
|
2021-12-11 23:49:31 +00:00
|
|
|
|
for additional details.
|
|
|
|
|
|
|
|
|
|
- Add three new object helper class: ``QPDFPageObjectHelper`` for
|
|
|
|
|
pages, ``QPDFFormFieldObjectHelper`` for interactive form
|
|
|
|
|
fields, and ``QPDFAnnotationObjectHelper`` for annotations. All
|
|
|
|
|
three classes are fairly sparse at the moment, but they have
|
|
|
|
|
some useful, basic functionality.
|
|
|
|
|
|
|
|
|
|
- A new example program
|
2021-12-12 00:02:42 +00:00
|
|
|
|
:file:`examples/pdf-set-form-values.cc` has
|
2021-12-11 23:49:31 +00:00
|
|
|
|
been added that illustrates use of the new document and object
|
|
|
|
|
helpers.
|
|
|
|
|
|
|
|
|
|
- The method ``QPDFWriter::registerProgressReporter`` has been
|
|
|
|
|
added. This method allows you to register a function that is
|
|
|
|
|
called by ``QPDFWriter`` to update your idea of the percentage
|
|
|
|
|
it thinks it is through writing its output. Client programs can
|
|
|
|
|
use this to implement reasonably accurate progress meters. The
|
2021-12-12 00:01:40 +00:00
|
|
|
|
:command:`qpdf` command line tool uses this to
|
2021-12-12 00:11:56 +00:00
|
|
|
|
implement its :samp:`--progress` option.
|
2021-12-11 23:49:31 +00:00
|
|
|
|
|
|
|
|
|
- New methods ``QPDFObjectHandle::newUnicodeString`` and
|
|
|
|
|
``QPDFObject::unparseBinary`` have been added to allow for more
|
|
|
|
|
convenient creation of strings that are explicitly encoded
|
|
|
|
|
using big-endian UTF-16. This is useful for creating strings
|
|
|
|
|
that appear outside of content streams, such as labels, form
|
|
|
|
|
fields, outlines, document metadata, etc.
|
|
|
|
|
|
|
|
|
|
- A new class ``QPDFObjectHandle::Rectangle`` has been added to
|
|
|
|
|
ease working with PDF rectangles, which are just arrays of four
|
|
|
|
|
numeric values.
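
The 8.1.0 notes above mention strings that are explicitly encoded using
big-endian UTF-16. As a rough illustration of what that encoding looks
like at the byte level, here is a minimal standalone sketch. The function
``encode_utf16be`` is a hypothetical helper written for this example; it
is not part of the qpdf API and does not show how
``QPDFObjectHandle::newUnicodeString`` is implemented. It assumes the
input is already a list of Unicode code points.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of qpdf): encode Unicode code points as
// big-endian UTF-16, prefixed with the byte order mark 0xFE 0xFF that
// marks a PDF text string as UTF-16.
std::string encode_utf16be(const std::vector<uint32_t>& code_points)
{
    std::string out = "\xfe\xff";
    // Append one 16-bit unit, high byte first.
    auto put16 = [&out](uint32_t u) {
        out += static_cast<char>((u >> 8) & 0xff);
        out += static_cast<char>(u & 0xff);
    };
    for (uint32_t cp : code_points) {
        if (cp >= 0x10000) {
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            put16(0xd800 + (cp >> 10));
            put16(0xdc00 + (cp & 0x3ff));
        } else {
            put16(cp);
        }
    }
    return out;
}
```

The sketch does no validation of its input (for example, it does not
reject lone surrogate values), which a real implementation would need.
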

8.0.2: March 6, 2018
  - When a loop is detected while following cross reference streams or
    tables, treat this as damage instead of silently ignoring the
    previous table. This prevents loss of otherwise recoverable data
    in some damaged files.

  - Properly handle pages with no contents.

8.0.1: March 4, 2018
  - Disregard data check errors when uncompressing ``/FlateDecode``
    streams. This is consistent with most other PDF readers and allows
    qpdf to recover data from another class of malformed PDF files.

  - On the command line when specifying page ranges, support preceding
    a page number by "r" to indicate that it should be counted from
    the end. For example, the range ``r3-r1`` would indicate the last
    three pages of a document.
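
To make the "r" page syntax from the 8.0.1 notes concrete, the following
standalone sketch resolves tokens such as ``r3`` against a page count.
Both ``resolve_page`` and ``expand_range`` are hypothetical names
invented for this illustration; this is not qpdf's actual range parser,
and it handles only a single ``first-last`` pair or a single page.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not qpdf's parser): turn one page token such as
// "3" or "r3" into a 1-based page number for a document of npages pages.
int resolve_page(const std::string& token, int npages)
{
    bool from_end = !token.empty() && token[0] == 'r';
    int n = std::stoi(token.substr(from_end ? 1 : 0));
    // "r1" is the last page, "r2" the one before it, and so on.
    int page = from_end ? npages + 1 - n : n;
    if (page < 1 || page > npages) {
        throw std::out_of_range("page out of range: " + token);
    }
    return page;
}

// Expand "first-last" (or a single page) into the list of page numbers.
// A range whose first page is after its last page is walked backwards.
std::vector<int> expand_range(const std::string& range, int npages)
{
    auto dash = range.find('-');
    int first = resolve_page(range.substr(0, dash), npages);
    int last = (dash == std::string::npos)
        ? first
        : resolve_page(range.substr(dash + 1), npages);
    std::vector<int> pages;
    int step = (first <= last) ? 1 : -1;
    for (int p = first; p != last + step; p += step) {
        pages.push_back(p);
    }
    return pages;
}
```

With a ten-page document, ``expand_range("r3-r1", 10)`` yields pages 8,
9, and 10, matching the "last three pages" reading given above.
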

8.0.0: February 25, 2018
  - Packaging and Distribution Changes

    - QPDF is now distributed as an
      `AppImage <https://appimage.org/>`__ in addition to all the
      other ways it is distributed. The AppImage can be found in the
      download area with the other packages. Thanks to Kurt Pfeifle
      and Simon Peter for their contributions.

  - Bug Fixes

    - ``QPDFObjectHandle::getUTF8Val`` now properly treats
      non-Unicode strings as encoded with PDF Doc Encoding.

    - Improvements to handling of objects in PDF files that are not
      of the expected type. In most cases, qpdf will be able to warn
      for such cases rather than fail with an exception. Previous
      versions of qpdf would sometimes fail with errors such as
      "operation for dictionary object attempted on object of wrong
      type". This situation should be mostly or entirely eliminated
      now.

  - Enhancements to the :command:`qpdf` Command-line
    Tool. All new options listed here are documented in more detail in
    :ref:`ref.using`.

    - The option
      :samp:`--linearize-pass1={file}`
      has been added for debugging qpdf's linearization code.

    - The option :samp:`--coalesce-contents` can be
      used to combine content streams of a page whose contents are an
      array of streams into a single stream.

  - API Enhancements. All new API calls are documented in their
    respective classes' header files. There are no non-compatible
    changes to the API.

    - Add function ``qpdf_check_pdf`` to the C API. This function
      does basic checking that is a subset of what :command:`qpdf
      --check` performs.

    - Major enhancements to the lexical layer of qpdf. For a complete
      list of enhancements, please refer to the
      :file:`ChangeLog` file. Most of the changes
      result in improvements to qpdf's ability to handle erroneous
      files. It is also possible for programs to handle whitespace,
      comments, and inline images as tokens.

    - New API for working with PDF content streams at a lexical
      level. The new class ``QPDFObjectHandle::TokenFilter`` allows
      the developer to provide token handlers. Token filters can be
      used with several different methods in ``QPDFObjectHandle`` as
      well as with a lower-level interface. See comments in
      :file:`QPDFObjectHandle.hh` as well as the
      new examples
      :file:`examples/pdf-filter-tokens.cc` and
      :file:`examples/pdf-count-strings.cc` for
      details.

7.1.1: February 4, 2018
  - Bug fix: files whose /ID fields were other than 16 bytes long can
    now be properly linearized.

  - A few compile and link issues have been corrected for some
    platforms.

7.1.0: January 14, 2018
  - PDF files contain streams that may be compressed with various
    compression algorithms which, in some cases, may be enhanced by
    various predictor functions. Previously only the PNG up predictor
    was supported. In this version, all the PNG predictors as well as
    the TIFF predictor are supported. This increases the range of
    files that qpdf is able to handle.

  - QPDF now allows a raw encryption key to be specified in place of a
    password when opening encrypted files, and will optionally display
    the encryption key used by a file. This is a non-standard
    operation, but it can be useful in certain situations. Please see
    the discussion of :samp:`--password-is-hex-key` in
    :ref:`ref.basic-options` or the comments around
    ``QPDF::setPasswordIsHexKey`` in
    :file:`QPDF.hh` for additional details.

  - Bug fix: numbers ending with a trailing decimal point are now
    properly recognized as numbers.

  - Bug fix: when building qpdf from source on some platforms
    (especially MacOS), the build could get confused by older versions
    of qpdf installed on the system. This has been corrected.
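
For readers unfamiliar with the predictor functions mentioned in the
7.1.0 notes, the sketch below reverses the PNG "Up" filter, the one
predictor that was supported before this release. It is an independent
illustration, not qpdf's code. The name ``undo_png_up`` is made up for
this example, and the sketch assumes each row consists of one filter-type
byte (2 for "Up") followed by ``row_bytes`` data bytes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of qpdf): undo the PNG "Up" filter.
// Each encoded row is a filter-type byte followed by row_bytes bytes,
// and each data byte stores the difference from the byte directly above
// it in the previous decoded row (modulo 256).
std::vector<unsigned char>
undo_png_up(const std::vector<unsigned char>& data, std::size_t row_bytes)
{
    if (row_bytes == 0 || data.size() % (row_bytes + 1) != 0) {
        throw std::invalid_argument("data is not a whole number of rows");
    }
    std::vector<unsigned char> out;
    // The row "above" the first row is treated as all zeros.
    std::vector<unsigned char> prior(row_bytes, 0);
    for (std::size_t pos = 0; pos < data.size(); pos += row_bytes + 1) {
        if (data[pos] != 2) {
            throw std::invalid_argument("only the Up filter is handled here");
        }
        for (std::size_t i = 0; i < row_bytes; ++i) {
            // Unsigned char arithmetic wraps, giving the modulo-256 sum.
            prior[i] = static_cast<unsigned char>(data[pos + 1 + i] + prior[i]);
            out.push_back(prior[i]);
        }
    }
    return out;
}
```

The other PNG filter types and the TIFF predictor follow the same
row-by-row pattern with different reference bytes; they are omitted to
keep the sketch short.
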

7.0.0: September 15, 2017
  - Packaging and Distribution Changes

    - QPDF's primary license is now `version 2.0 of the Apache
      License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather
      than version 2.0 of the Artistic License. You may still, at
      your option, consider qpdf to be licensed with version 2.0 of
      the Artistic license.

    - QPDF no longer has a dependency on the PCRE (Perl-Compatible
      Regular Expression) library. QPDF now has an added dependency
      on the JPEG library.

  - Bug Fixes

    - This release contains many bug fixes for various infinite
      loops, memory leaks, and other memory errors that could be
      encountered with specially crafted or otherwise erroneous PDF
      files.

  - New Features

    - QPDF now supports reading and writing streams encoded with JPEG
      or RunLength encoding. Library API enhancements and
      command-line options have been added to control this behavior.
      See command-line options
      :samp:`--compress-streams` and
      :samp:`--decode-level` and methods
      ``QPDFWriter::setCompressStreams`` and
      ``QPDFWriter::setDecodeLevel``.

    - QPDF is much better at recovering from broken files. In most
      cases, qpdf will skip invalid objects and will preserve broken
      stream data by not attempting to filter broken streams. QPDF is
      now able to recover or at least not crash on dozens of broken
      test files I have received over the past few years.

    - Page rotation is now supported and accessible from both the
      library and the command line.

    - ``QPDFWriter`` supports writing files in a way that preserves
      PCLm compliance in support of driverless printing. This is very
      specialized and is only useful to applications that already
      know how to create PCLm files.

  - Enhancements to the :command:`qpdf` Command-line
    Tool. All new options listed here are documented in more detail in
    :ref:`ref.using`.

    - Command-line arguments can now be read from files or standard
      input using ``@file`` or ``@-`` syntax. Please see :ref:`ref.invocation`.

    - :samp:`--rotate`: request page rotation

    - :samp:`--newline-before-endstream`: ensure that
      a newline appears before every ``endstream`` keyword in the
      file; used to prevent qpdf from breaking PDF/A compliance on
      already compliant files.

    - :samp:`--preserve-unreferenced`: preserve
      unreferenced objects in the input PDF

    - :samp:`--split-pages`: break output into chunks
      with fixed numbers of pages

    - :samp:`--verbose`: print the name of each
      output file that is created

    - :samp:`--compress-streams` and
      :samp:`--decode-level` replace
      :samp:`--stream-data` for improving granularity
      of controlling compression and decompression of stream data.
      The :samp:`--stream-data` option will remain
      available.

    - When running :command:`qpdf --check` with other
      options, checks are always run first. This enables qpdf to
      perform its full recovery logic before outputting other
      information. This can be especially useful when manually
      recovering broken files, looking at qpdf's regenerated cross
      reference table, or other similar operations.

    - Process :command:`--pages` earlier so that other
      options like :samp:`--show-pages` or
      :samp:`--split-pages` can operate on the file
      after page splitting/merging has occurred.

  - API Changes. All new API calls are documented in their respective
    classes' header files.

    - ``QPDFObjectHandle::rotatePage``: apply rotation to a page
      object

    - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to
      appear before ``endstream``

    - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve
      unreferenced objects that appear in the input PDF. The default
      behavior is to discard them.

    - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are
      available for developers who wish to produce or consume
      RunLength or DCT stream data directly. The
      :file:`examples/pdf-create.cc` example
      illustrates their use.

    - ``QPDFWriter::setCompressStreams`` and
      ``QPDFWriter::setDecodeLevel`` methods control handling of
      different types of stream compression.

    - Add new C API functions ``qpdf_set_compress_streams``,
      ``qpdf_set_decode_level``,
      ``qpdf_set_preserve_unreferenced_objects``, and
      ``qpdf_set_newline_before_endstream`` corresponding to the new
      ``QPDFWriter`` methods.
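
The RunLength encoding that 7.0.0 added support for is simple enough to
sketch in a few lines. The decoder below follows the PDF
``RunLengthDecode`` rules as I understand them: a length byte from 0 to
127 means "copy the next length + 1 bytes", a length byte from 129 to 255
means "repeat the next byte 257 - length times", and 128 marks end of
data. The function ``run_length_decode`` is a hypothetical standalone
helper, not the ``Pl_RunLength`` pipeline itself, and it simply stops on
truncated input rather than reporting an error.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of qpdf): decode PDF RunLength data.
std::string run_length_decode(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char len = static_cast<unsigned char>(in[i++]);
        if (len == 128) {
            break; // end-of-data marker
        }
        if (len < 128) {
            // Literal run: copy the next len + 1 bytes unchanged.
            // append() clamps the count if the input is truncated.
            std::size_t n = len + 1u;
            out.append(in, i, n);
            i += n;
        } else {
            // Repeated run: the next byte appears 257 - len times.
            if (i >= in.size()) {
                break; // truncated input
            }
            out.append(257u - len, in[i++]);
        }
    }
    return out;
}
```
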

6.0.0: November 10, 2015
  - Implement :samp:`--deterministic-id` command-line
    option and ``QPDFWriter::setDeterministicID`` as well as C API
    function ``qpdf_set_deterministic_ID`` for generating a
    deterministic ID for non-encrypted files. When this option is
    selected, the ID of the file depends on the contents of the output
    file, and not on transient items such as the timestamp or output
    file name.

  - Make qpdf more tolerant of files whose xref table entries are not
    the correct length.

5.1.3: May 24, 2015
  - Bug fix: fix-qdf was not properly handling files that contained
    object streams with more than 255 objects in them.

  - Bug fix: qpdf was not properly initializing Microsoft's secure
    crypto provider on fresh Windows installations that had not had
    any keys created yet.

  - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of
    the Google Security Team. Please see the ChangeLog for details.

  - Properly handle pages that have no contents at all. There were
    many cases in which qpdf handled this fine, but a few methods
    blindly obtained page contents without handling the possibility that
    there were no contents.

  - Make qpdf more robust for a few more kinds of problems that may
    occur in invalid PDF files.

5.1.2: June 7, 2014
  - Bug fix: linearizing files could create a corrupted output file
    under extremely unlikely file size circumstances. See ChangeLog
    for details. The odds of getting hit by this are very low, though
    one person did.

  - Bug fix: qpdf would fail to write files that had streams with
    decode parameters referencing other streams.

  - New example program: :command:`pdf-split-pages`:
    efficiently split PDF files into individual pages. The example
    program does this more efficiently than using :command:`qpdf
    --pages` to do it.

  - Packaging fix: Visual C++ binaries did not support Windows XP.
    This has been rectified by updating the compilers used to generate
    the release binaries.

5.1.1: January 14, 2014
  - Performance fix: copying foreign objects could be very slow with
    certain types of files. This was most likely to be visible during
    page splitting and was due to traversing the same objects multiple
    times in some cases.

5.1.0: December 17, 2013
  - Added runtime option (``QUtil::setRandomDataProvider``) to supply
    your own random data provider. You can use this if you want to
    avoid using the OS-provided secure random number generation
    facility or stdlib's less secure version. See comments in
    include/qpdf/QUtil.hh for details.

  - Fixed image comparison tests to not create 12-bit-per-pixel images
    since some versions of tiffcmp have bugs in comparing them in some
    cases. This increases the disk space required by the image
    comparison tests, which are off by default anyway.

  - Introduce a number of small fixes for compilation on the latest
    clang in MacOS and the latest Visual C++ in Windows.

  - Be able to handle broken files that end the xref table header with
    a space instead of a newline.

5.0.1: October 18, 2013
  - Thanks to a detailed review by Florian Weimer and the Red Hat
    Product Security Team, this release includes a number of
    non-user-visible security hardening changes. Please see the
    ChangeLog file in the source distribution for the complete list.

  - When available, operating system-specific secure random number
    generation is used for generating initialization vectors and other
    random values used during encryption or file creation. For the
    Windows build, this results in an added dependency on Microsoft's
    cryptography API. To disable the OS-specific cryptography and use
    the old version, pass the
    :samp:`--enable-insecure-random` option to
    :command:`./configure`.

  - The :command:`qpdf` command-line tool now issues a
    warning when :samp:`--accessibility=n` is specified
    for newer encryption versions stating that the option is ignored.
    qpdf, per the spec, has always ignored this flag, but it
    previously did so silently. This warning is issued only by the
    command-line tool, not by the library. The library's handling of
    this flag is unchanged.

5.0.0: July 10, 2013
  - Bug fix: previous versions of qpdf would lose objects with
    generation != 0 when generating object streams. Fixing this
    required changes to the public API.

  - Removed methods from public API that were only supposed to be
    called by QPDFWriter and couldn't realistically be called anywhere
    else. See ChangeLog for details.

  - New ``QPDFObjGen`` class added to represent an object
    ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now
    preferred over ``QPDFObjectHandle::getObjectID()`` and
    ``QPDFObjectHandle::getGeneration()`` as it makes it less likely
    for people to accidentally write code that ignores the generation
    number. See :file:`QPDF.hh` and
    :file:`QPDFObjectHandle.hh` for additional
    notes.

  - Add :samp:`--show-npages` command-line option to
    the :command:`qpdf` command to show the number of
    pages in a file.

  - Allow omission of the page range within
    :samp:`--pages` for the
    :command:`qpdf` command. When omitted, the page
    range is implicitly taken to be all the pages in the file.

  - Various enhancements were made to support different types of
    broken files or broken readers. Details can be found in
    :file:`ChangeLog`.
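
The 5.0.0 notes explain that keying on the object ID alone makes it easy
to ignore the generation number. The sketch below shows the failure mode
with a stand-in type. ``ObjGen`` here is a hypothetical struct written
for this example, not qpdf's ``QPDFObjGen`` class, and the two counting
functions are likewise invented for illustration.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Stand-in for an object ID/generation pair (NOT qpdf's QPDFObjGen).
struct ObjGen
{
    int obj;
    int gen;
    // Order by ID first, then by generation, so both fields matter.
    bool operator<(const ObjGen& rhs) const
    {
        return std::pair<int, int>(obj, gen) <
            std::pair<int, int>(rhs.obj, rhs.gen);
    }
};

// Counting by ID only silently merges objects that differ in generation.
std::size_t distinct_by_id(const std::vector<ObjGen>& objects)
{
    std::set<int> seen;
    for (const ObjGen& og : objects) {
        seen.insert(og.obj);
    }
    return seen.size();
}

// Counting by the full pair keeps them apart.
std::size_t distinct_by_objgen(const std::vector<ObjGen>& objects)
{
    std::set<ObjGen> seen(objects.begin(), objects.end());
    return seen.size();
}
```

Given objects (5, 0), (5, 1), and (6, 0), the ID-only count reports two
objects while the pair-based count reports three, which is the kind of
loss the bug fix above addressed.
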

4.1.0: April 14, 2013
  - Note to people including qpdf in distributions: the
    :file:`.la` files generated by libtool are now
    installed by qpdf's :command:`make install` target.
    Before, they were not installed. This means that if your
    distribution does not want to include
    :file:`.la` files, you must remove them as
    part of your packaging process.

  - Major enhancement: API enhancements have been made to support
    parsing of content streams. This enhancement includes the
    following changes:

    - ``QPDFObjectHandle::parseContentStream`` method parses objects
      in a content stream and calls handlers in a callback class. The
      example
      :file:`examples/pdf-parse-content.cc`
      illustrates how this may be used.

    - ``QPDFObjectHandle`` can now represent operators and inline
      images, object types that may only appear in content streams.

    - Method ``QPDFObjectHandle::getTypeCode()`` returns an
      enumerated type value representing the underlying object type.
      Method ``QPDFObjectHandle::getTypeName()`` returns a text
      string describing the name of the type of a
      ``QPDFObjectHandle`` object. These methods can be used for more
      efficient parsing and debugging/diagnostic messages.

  - :command:`qpdf --check` now parses all pages'
    content streams in addition to doing other checks. While there are
    still many types of errors that cannot be detected, syntactic
    errors in content streams will now be reported.

  - Minor compilation enhancements have been made to facilitate
    support for a broader range of compilers and compiler
    versions.

    - Warning flags have been moved into a separate variable in
      :file:`autoconf.mk`.

    - The configure flag :samp:`--enable-werror` works
      for Microsoft compilers.

    - All MSVC CRT security warnings have been resolved.

    - All C-style casts in C++ Code have been replaced by C++ casts,
      and many casts that had been included to suppress higher
      warning levels for some compilers have been removed, primarily
      for clarity. Places where integer type coercion occurs have
      been scrutinized. A new casting policy has been documented in
      the manual. This is of concern mainly to people porting qpdf to
      new platforms or compilers. It is not visible to programmers
      writing code that uses the library.

    - Some internal limits have been removed in code that converts
      numbers to strings. This is largely invisible to users, but it
      does trigger a bug in some older versions of mingw-w64's C++
      library. See :file:`README-windows.md` in
      the source distribution if you think this may affect you. The
      copy of the DLL distributed with qpdf's binary distribution is
      not affected by this problem.

  - The RPM spec file previously included with qpdf has been removed.
    This is because virtually all Linux distributions include qpdf now
    that it is a dependency of CUPS filters.

  - A few bug fixes are included:

    - Overridden compressed objects are properly handled. Before,
      there were certain constructs that could cause qpdf to see old
      versions of some objects. The most usual manifestation of this
      was loss of filled in form values for certain files.

    - Installation no longer uses GNU/Linux-specific versions of some
      commands, so :command:`make install` works on
      Solaris with native tools.

    - The 64-bit mingw Windows binary package no longer includes a
      32-bit DLL.

4.0.1: January 17, 2013
  - Fix detection of binary attachments in test suite to avoid false
    test failures on some platforms.

  - Add clarifying comment in :file:`QPDF.hh` to
    methods that return the user password explaining that it is no
    longer possible with newer encryption formats to recover the user
    password knowing the owner password. In earlier encryption
    formats, the user password was encrypted in the file using the
    owner password. In newer encryption formats, a separate encryption
    key is used on the file, and that key is independently encrypted
    using both the user password and the owner password.

4.0.0: December 31, 2012
  - Major enhancement: support has been added for newer encryption
    schemes supported by version X of Adobe Acrobat. This includes use
    of 127-character passwords, 256-bit encryption keys, and the
    encryption scheme specified in ISO 32000-2, the PDF 2.0
    specification. This scheme can be chosen from the command line by
    specifying use of 256-bit keys. qpdf also supports the deprecated
    encryption method used by Acrobat IX. This encryption style has
    known security weaknesses and should not be used in practice.
    However, such files exist "in the wild," so support for this
    scheme is still useful. New methods
    ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme)
    and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated
    scheme) have been added to enable these new encryption schemes.
    Corresponding functions have been added to the C API as well.

  - Full support for Adobe extension levels in PDF version
    information. Starting with PDF version 1.7, corresponding to ISO
    32000, Adobe adds new functionality by increasing the extension
    level rather than increasing the version. This support includes
    addition of the ``QPDF::getExtensionLevel`` method for retrieving
    the document's extension level, addition of versions of
    ``QPDFWriter::setMinimumPDFVersion`` and
    ``QPDFWriter::forcePDFVersion`` that accept an extension level,
    and extended syntax for specifying forced and minimum versions on
    the command line as described in
    :ref:`ref.advanced-transformation`. Corresponding functions
    have been added to the C API as well.

  - Minor fixes to prevent qpdf from referencing objects in the file
    that are not referenced in the file's overall structure. Most
    files don't have any such objects, but some files contain
    unreferenced objects with errors, so these fixes prevent qpdf from
    needlessly rejecting or complaining about such objects.

  - Add new generalized methods for reading and writing files from/to
    programmer-defined sources. The method
    ``QPDF::processInputSource`` allows the programmer to use any
    input source for the input file, and
    ``QPDFWriter::setOutputPipeline`` allows the programmer to write
    the output file through any pipeline. These methods would make it
    possible to perform any number of specialized operations, such as
    accessing external storage systems, creating bindings for qpdf in
    other programming languages that have their own I/O systems, etc.

  - Add new method ``QPDF::getEncryptionKey`` for retrieving the
    underlying encryption key used in the file.

  - This release includes a small handful of non-compatible API
    changes. While effort is made to avoid such changes, all the
    non-compatible API changes in this version were to parts of the
    API that would likely never be used outside the library itself. In
    all cases, the altered methods or structures were parts of the
    ``QPDF`` class that either were public to enable them to be called
    from ``QPDFWriter`` or were part of validation code that was
    over-zealous in reporting problems in parts of the file that would
    not ordinarily be referenced. In no case did any of the removed
    methods do anything worse than falsely report error conditions in
    files that were broken in ways that didn't matter. The following
    public parts of the ``QPDF`` class were changed in a
    non-compatible way:

    - Updated nested ``QPDF::EncryptionData`` class to add fields
      needed by the newer encryption formats, member variables
      changed to private so that future changes will not require
      breaking backward compatibility.

    - Added additional parameters to ``compute_data_key``, which is
      used by ``QPDFWriter`` to compute the encryption key used to
      encrypt a specific object.

    - Removed the method ``flattenScalarReferences``. This method was
      previously used prior to writing a new PDF file, but it has the
      undesired side effect of causing qpdf to read objects in the
      file that were not referenced. Some otherwise valid files have
      unreferenced objects with errors in them, so this could cause
      qpdf to reject files that would be accepted by virtually all
      other PDF readers. In fact, qpdf relied on only a very small
      part of what flattenScalarReferences did, so only this part has
      been preserved, and it is now done directly inside
      ``QPDFWriter``.

    - Removed the method ``decodeStreams``. This method was used by
      the :samp:`--check` option of the
      :command:`qpdf` command-line tool to force all
      streams in the file to be decoded, but it also suffered from
      the problem of opening otherwise unreferenced streams and thus
      could report false positives. The
      :samp:`--check` option now causes qpdf to go
      through all the motions of writing a new file based on the
      original one, so it will always reference and check exactly
      those parts of a file that any ordinary viewer would check.

    - Removed the method ``trimTrailerForWrite``. This method was
      used by ``QPDFWriter`` to modify the original QPDF object by
      removing fields from the trailer dictionary that wouldn't apply
      to the newly written file. This functionality, though generally
      harmless, was a poor implementation and has been replaced by
      having QPDFWriter filter these out when copying the trailer
      rather than modifying the original QPDF object. (Note that qpdf
      never modifies the original file itself.)

  - Allow the PDF header to appear anywhere in the first 1024 bytes of
|
|
|
|
|
the file. This is consistent with what other readers do.
|
|
|
|
|
|
2021-12-12 00:01:40 +00:00
|
|
|
|
- Fix the :command:`pkg-config` files to list zlib
|
2021-12-11 23:49:31 +00:00
|
|
|
|
and pcre in ``Requires.private`` to better support static linking
|
2021-12-12 00:01:40 +00:00
|
|
|
|
using :command:`pkg-config`.
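
As a rough illustration of the relaxed header check mentioned in the list
above, the following C++ sketch looks for the ``%PDF-`` marker within the
first 1024 bytes of an in-memory buffer. The helper ``find_pdf_header`` is
hypothetical and is not qpdf's implementation. It also assumes that the
whole marker must lie inside the window, which the release note does not
spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of qpdf: return the offset of the "%PDF-"
// marker if it lies entirely within the first `window` bytes of `data`,
// or std::string::npos otherwise.
std::size_t find_pdf_header(const std::string& data, std::size_t window = 1024)
{
    // substr clamps the count, so inputs shorter than the window are safe.
    return data.substr(0, window).find("%PDF-");
}

// Small self-check showing the intended behavior.
void check_find_pdf_header()
{
    // Header at the very start of the file.
    assert(find_pdf_header("%PDF-1.4\n") == 0);
    // Leading junk is tolerated as long as the marker starts early enough.
    assert(find_pdf_header(std::string(100, '\n') + "%PDF-1.7\n") == 100);
    // A marker beyond the window is not treated as a header.
    assert(find_pdf_header(std::string(2000, 'x') + "%PDF-1.7\n") == std::string::npos);
}
```

A caller would typically read only the first part of the file into the
buffer before running such a check.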

3.0.2: September 6, 2012
  - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not
    used with ``QPDFWriter::setStaticID``, which made it pretty much
    useless. This has been fixed.

  - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional
    text near the header of the PDF file. The intended use case is to
    insert comments that may be consumed by a downstream application,
    though other use cases may exist.

3.0.1: August 11, 2012
  - Version 3.0.0 included addition of files for
    :command:`pkg-config`, but this was not mentioned in the release
    notes. The release notes for 3.0.0 were updated to mention this.

  - Bug fix: if an object stream ended with a scalar object not
    followed by space, qpdf would incorrectly report that it
    encountered a premature EOF. This bug has been in qpdf since
    version 2.0.

3.0.0: August 2, 2012
  - Acknowledgment: I would like to express gratitude for the
    contributions of Tobias Hoffmann toward the release of qpdf
    version 3.0. He is responsible for most of the implementation and
    design of the new API for manipulating pages, and contributed code
    and ideas for many of the improvements made in version 3.0.
    Without his work, this release would certainly not have happened
    as soon as it did, if at all.

  - *Non-compatible API change:* The version of
    ``QPDFObjectHandle::replaceStreamData`` that uses a
    ``StreamDataProvider`` no longer requires (or accepts) a
    ``length`` parameter. See :ref:`ref.upgrading-to-3.0` for an
    explanation. While care is taken to avoid non-compatible API
    changes in general, an exception was made this time because the
    new interface offers an opportunity to significantly simplify
    calling code.

  - Support has been added for large files. The test suite verifies
    support for files larger than 4 gigabytes, and manual testing has
    verified support for files larger than 10 gigabytes. Large file
    support is available for both 32-bit and 64-bit platforms as long
    as the compiler and underlying platforms support it.

  - Support for page selection (splitting and merging PDF files) has
    been added to the :command:`qpdf` command-line tool. See
    :ref:`ref.page-selection`.

  - Options have been added to the :command:`qpdf` command-line tool
    for copying encryption parameters from another file. See
    :ref:`ref.basic-options`.

  - New methods have been added to the ``QPDF`` object for adding and
    removing pages. See :ref:`ref.adding-and-remove-pages`.

  - New methods have been added to the ``QPDF`` object for copying
    objects from other PDF files. See :ref:`ref.foreign-objects`.

  - A new method ``QPDFObjectHandle::parse`` has been added for
    constructing ``QPDFObjectHandle`` objects from a string
    description.

  - Methods have been added to ``QPDFWriter`` to allow writing to an
    already open stdio ``FILE*`` in addition to writing to standard
    output or a named file. Methods have been added to ``QPDF`` to be
    able to process a file from an already open stdio ``FILE*``. This
    makes it possible to read and write PDF from secure temporary
    files that have been unlinked prior to being fully read or
    written.

  - The ``QPDF::emptyPDF`` method allows creation of PDF files from
    scratch. The example :file:`examples/pdf-create.cc` illustrates
    how it can be used.

  - Several methods that take ``PointerHolder<Buffer>`` can now also
    accept ``std::string`` arguments.

  - Many new convenience methods have been added to the library, most
    in ``QPDFObjectHandle``. See :file:`ChangeLog` for a full list.

  - When building on a platform that supports ELF shared libraries
    (such as Linux), symbol versions are enabled by default. They can
    be disabled by passing :samp:`--disable-ld-version-script` to
    :command:`./configure`.

  - The file :file:`libqpdf.pc` is now installed to support
    :command:`pkg-config`.

  - Image comparison tests are off by default now since they are not
    needed to verify a correct build or port of qpdf. They are needed
    only when changing the actual PDF output generated by qpdf. You
    should enable them if you are making deep changes to qpdf itself.
    See :file:`README.md` for details.

  - Large file tests are off by default but can be turned on with
    :command:`./configure` or by setting an environment variable
    before running the test suite. See :file:`README.md` for details.

  - When qpdf's test suite fails, failures are not printed to the
    terminal anymore by default. Instead, find them in
    :file:`build/qtest.log`. Packagers who are building with an
    autobuilder can add the
    :samp:`--enable-show-failed-test-output` option to
    :command:`./configure` to restore the old behavior.
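
The point about already open stdio ``FILE*`` handles and unlinked temporary
files can be illustrated without any qpdf calls. The sketch below uses only
the C standard I/O functions: ``std::tmpfile`` creates a temporary file that
is removed automatically when closed, and the same handle is then used for
both writing and reading. The function name ``round_trip_through_tmpfile``
is made up for this example. The release note does not name the qpdf
methods that accept a ``FILE*``, so none are shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Write `data` to an anonymous temporary file and read it back through the
// same FILE*. Returns an empty string if the temporary file could not be
// created.
std::string round_trip_through_tmpfile(const std::string& data)
{
    std::FILE* f = std::tmpfile();
    if (f == nullptr) {
        return std::string();
    }
    std::fwrite(data.data(), 1, data.size(), f);
    std::rewind(f);  // reposition so the same handle can be read

    std::string result;
    char buf[4096];
    std::size_t n = 0;
    while ((n = std::fread(buf, 1, sizeof(buf), f)) > 0) {
        result.append(buf, n);
    }
    std::fclose(f);  // the temporary file is gone once the handle is closed
    return result;
}

void check_round_trip()
{
    const std::string data = "%PDF-1.3\nexample bytes\n";
    const std::string copy = round_trip_through_tmpfile(data);
    // An empty result means the temporary file could not be created.
    assert(copy.empty() || copy == data);
}
```

Because the data is reachable only through the open handle, an API that
accepts a ``FILE*`` is what makes this pattern usable.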

2.3.1: December 28, 2011
  - Fix thread-safety problem resulting from non-thread-safe use of
    the PCRE library.

  - Made a few minor documentation fixes.

  - Add workaround to the test suite for a bug that appears in some
    versions of ghostscript.

  - Fix minor build issue for Visual C++ 2010.

2.3.0: August 11, 2011
  - Bug fix: when preserving existing encryption on encrypted files
    with cleartext metadata, older qpdf versions would generate
    password-protected files with no valid password. This operation
    now works. This bug only affected files created by copying
    existing encryption parameters; explicit encryption with
    specification of cleartext metadata worked before and continues to
    work.

  - Enhance ``QPDFWriter`` with a new constructor that allows you to
    delay the specification of the output file. When using this
    constructor, you may now call ``QPDFWriter::setOutputFilename`` to
    specify the output file, or you may use
    ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write
    the resulting PDF file to a memory buffer. You may then use
    ``QPDFWriter::getBuffer`` to retrieve the memory buffer.

  - Add new API call ``QPDF::replaceObject`` for replacing objects by
    object ID.

  - Add new API call ``QPDF::swapObjects`` for swapping two objects by
    object ID.

  - Add ``QPDFObjectHandle::getDictAsMap`` and
    ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of
    dictionary objects as maps and array objects as vectors.

  - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to
    the C API for manipulating string fields of the document's
    ``/Info`` dictionary.

  - Add functions ``qpdf_init_write_memory``,
    ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API
    for writing PDF files to a memory buffer instead of a file.

2.2.4: June 25, 2011
  - Fix installation and compilation issues; no functionality changes.

2.2.3: April 30, 2011
  - Handle some damaged streams with incorrect characters following
    the stream keyword.

  - Improve handling of inline images when normalizing content
    streams.

  - Enhance error recovery to properly handle files that use object 0
    as a regular object, which is specifically disallowed by the spec.

2.2.2: October 4, 2010
  - Add new function ``qpdf_read_memory`` to the C API to call
    ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1.

2.2.1: October 1, 2010
  - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout``
    and ``std::cerr`` with other streams for generation of diagnostic
    messages and error messages. This can be useful for GUIs or other
    applications that want to capture any output generated by the
    library to present to the user in some other way. Note that QPDF
    does not write to ``std::cout`` (or the specified output stream)
    except where explicitly mentioned in :file:`QPDF.hh`, and that the
    only use of the error stream is for warnings. Note also that
    output of warnings is suppressed when
    ``setSuppressWarnings(true)`` is called.

  - Add new method ``QPDF::processMemoryFile`` for operating on PDF
    files that are loaded into memory rather than in a file on disk.

  - Give a warning but otherwise ignore empty PDF objects by treating
    them as null. Empty objects are not permitted by the PDF
    specification but have been known to appear in some actual PDF
    files.

  - Handle inline image filter abbreviations when they appear as
    stream filter abbreviations. The PDF specification does not allow
    use of stream filter abbreviations in this way, but Adobe Reader
    and some other PDF readers accept them since they sometimes appear
    incorrectly in actual PDF files.

  - Implement miscellaneous enhancements to ``PointerHolder`` and
    ``Buffer`` to support other changes.
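
The stream redirection described for ``QPDF::setOutputStreams`` follows a
common pattern that can be sketched in plain C++. The ``Reporter`` class
below is a hypothetical stand-in written for this example, not qpdf code.
Treating a null pointer as "restore the default stream" is an assumption of
the sketch rather than documented qpdf behavior.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a library object that emits diagnostics.
class Reporter
{
  public:
    // Replace the streams used for normal output and for warnings.
    // In this sketch, a null pointer restores the corresponding default.
    void setOutputStreams(std::ostream* out, std::ostream* err)
    {
        out_ = out ? out : &std::cout;
        err_ = err ? err : &std::cerr;
    }
    void setSuppressWarnings(bool suppress) { suppress_ = suppress; }
    void info(const std::string& msg) { *out_ << msg << "\n"; }
    void warn(const std::string& msg)
    {
        // Warnings are the only use of the error stream.
        if (!suppress_) {
            *err_ << "WARNING: " << msg << "\n";
        }
    }

  private:
    std::ostream* out_ = &std::cout;
    std::ostream* err_ = &std::cerr;
    bool suppress_ = false;
};

void check_reporter()
{
    std::ostringstream out;
    std::ostringstream err;
    Reporter r;
    r.setOutputStreams(&out, &err);
    r.info("checked 3 pages");
    r.warn("damaged xref table");
    r.setSuppressWarnings(true);
    r.warn("this one is dropped");
    assert(out.str() == "checked 3 pages\n");
    assert(err.str() == "WARNING: damaged xref table\n");
}
```

A GUI application could pass string streams like these and display their
contents in a dialog instead of letting messages go to the terminal.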

2.2.0: August 14, 2010
  - Add new methods to ``QPDFObjectHandle`` (``newStream`` and
    ``replaceStreamData``) for creating new streams and replacing
    stream data. This makes it possible to perform a wide range of
    operations that were not previously possible.

  - Add new helper method in ``QPDFObjectHandle``
    (``addPageContents``) for appending or prepending new content
    streams to a page. This method makes it possible to manipulate
    content streams without having to be concerned whether a page's
    contents are a single stream or an array of streams.

  - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``,
    which replaces a dictionary key with a given value unless the
    value is null, in which case it removes the key instead.

  - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``,
    which returns the raw (unfiltered) stream data into a buffer. This
    complements the ``getStreamData`` method, which returns the
    filtered (uncompressed) stream data and can only be used when the
    stream's data is filterable.

  - Provide two new examples: :command:`pdf-double-page-size` and
    :command:`pdf-invert-images` that illustrate the newly added
    interfaces.

  - Fix a memory leak that would cause loss of a few bytes for every
    object involved in a cycle of object references. Thanks to Jian Ma
    for calling my attention to the leak.
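
The behavior described for ``replaceOrRemoveKey`` can be sketched with a
toy dictionary. Everything in the block below is an illustration written
for this example: ``Dict`` is a plain ``std::map``, the free function
``replace_or_remove_key`` is hypothetical, and a null pointer stands in for
a PDF null object.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a PDF dictionary: keys map to string values.
using Dict = std::map<std::string, std::string>;

// A null value (here a null pointer) removes the key; any other value
// replaces whatever was stored under the key.
void replace_or_remove_key(Dict& dict, const std::string& key, const std::string* value)
{
    if (value == nullptr) {
        dict.erase(key);
    } else {
        dict[key] = *value;
    }
}

void check_replace_or_remove_key()
{
    Dict page;
    page["/Rotate"] = "90";
    const std::string rotated = "180";
    replace_or_remove_key(page, "/Rotate", &rotated);
    assert(page.at("/Rotate") == "180");
    replace_or_remove_key(page, "/Rotate", nullptr);
    assert(page.count("/Rotate") == 0);
}
```

Folding both cases into one call spares the caller a null check before
every dictionary update.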

2.1.5: April 25, 2010
  - Remove restriction of file identifier strings to 16 bytes. This
    unnecessary restriction was preventing qpdf from being able to
    encrypt or decrypt files with identifier strings that were not
    exactly 16 bytes long. The specification imposes no such
    restriction.

2.1.4: April 18, 2010
  - Apply the same padding calculation fix from version 2.1.2 to the
    main cross reference stream as well.

  - Since :command:`qpdf --check` only performs limited checks,
    clarify the output to make it clear that there still may be errors
    that qpdf can't check. This should make it less surprising to
    people when another PDF reader is unable to read a file that qpdf
    thinks is okay.

2.1.3: March 27, 2010
  - Fix bug that could cause a failure when rewriting PDF files that
    contain object streams with unreferenced objects that in turn
    reference indirect scalars.

  - Don't complain about (invalid) AES streams that aren't a multiple
    of 16 bytes. Instead, pad them before decrypting.
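
The padding step mentioned for AES streams amounts to rounding the data
length up to the next multiple of the 16-byte block size. The sketch below
shows that arithmetic. The function names are invented for this example,
and filling with zero bytes is an assumption, since the release note does
not say which padding bytes qpdf uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t aes_block_size = 16;

// Round `n` up to the next multiple of the AES block size.
std::size_t padded_length(std::size_t n)
{
    return (n + aes_block_size - 1) / aes_block_size * aes_block_size;
}

// Extend `data` so its size is a multiple of 16 bytes. Zero fill is an
// assumption of this sketch.
void pad_to_block_size(std::vector<unsigned char>& data)
{
    data.resize(padded_length(data.size()), 0);
    assert(data.size() % aes_block_size == 0);
}
```

Data whose length is already a multiple of 16 is left unchanged, so the
step is harmless for well-formed streams.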

2.1.2: January 24, 2010
  - Fix bug in padding around first half cross reference stream in
    linearized files. The bug could cause an assertion failure when
    linearizing certain unlucky files.

2.1.1: December 14, 2009
  - No changes in functionality; insert missing include in an internal
    library header file to support gcc 4.4, and update test suite to
    ignore broken Adobe Reader installations.

2.1: October 30, 2009
  - This is the first version of qpdf to include Windows support. On
    Windows, it is possible to build a DLL. Additionally, a partial
    C-language API has been introduced, which makes it possible to
    call qpdf functions from non-C++ environments. I am very grateful
    to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing
    numerous pre-release versions of this DLL and providing many
    excellent suggestions on improving the interface.

    For programming to the C interface, please see the header file
    :file:`qpdf/qpdf-c.h` and the example
    :file:`examples/pdf-linearize.c`.

  - Žarko Gajić has written a Delphi wrapper for qpdf, which can be
    downloaded from qpdf's download site. Žarko's Delphi wrapper is
    released with the same licensing terms as qpdf itself and comes
    with this disclaimer: "Delphi wrapper unit :file:`qpdf.pas`
    created by Žarko Gajić (http://zarko-gajic.iz.hr/). Use at your
    own risk and for whatever purpose you want. No support is
    provided. Sample code is provided."

  - Support has been added for AES encryption and crypt filters.
    Although qpdf does not presently support files that use PKI-based
    encryption, with the addition of AES and crypt filters, qpdf is
    now able to open most encrypted files created with newer versions
    of Acrobat or other PDF creation software. Note that I have not
    been able to get very many files encrypted in this way, so it's
    possible there could still be some cases that qpdf can't handle.
    Please report them if you find them.

  - Many error messages have been improved to include more information
    in hopes of making qpdf a more useful tool for PDF experts to use
    in manually recovering damaged PDF files.

  - Attempt to avoid compressing metadata streams if possible. This is
    consistent with other PDF creation applications.

  - Provide new command-line options for AES encryption, cleartext
    metadata, and setting the minimum and forced PDF versions of
    output files.

  - Add additional methods to the ``QPDF`` object for querying the
    document's permissions. Although qpdf does not enforce these
    permissions, it does make them available so that applications that
    use qpdf can enforce permissions.

  - The :samp:`--check` option to :command:`qpdf` has been extended to
    include some additional information.

  - There have been a handful of non-compatible API changes. For
    details, see :ref:`ref.upgrading-to-2.1`.

2.0.6: May 3, 2009
  - Do not attempt to uncompress streams that have decode parameters
    we don't recognize. Earlier versions of qpdf would have rejected
    files with such streams.

2.0.5: March 10, 2009
  - Improve error handling in the LZW decoder, and fix a small error
    introduced in the previous version with regard to handling full
    tables. The LZW decoder has been more strongly verified in this
    release.

2.0.4: February 21, 2009
  - Include proper support for LZW streams encoded without the "early
    code change" flag. Special thanks to Atom Smasher who reported the
    problem and provided an input file compressed in this way, which I
    did not previously have.

  - Implement some improvements to file recovery logic.

2.0.3: February 15, 2009
  - Compile cleanly with gcc 4.4.

  - Handle strings encoded as UTF-16BE properly.

2.0.2: June 30, 2008
  - Update test suite to work properly with a
    non-:command:`bash` :file:`/bin/sh` and with Perl 5.10. No changes
    were made to the actual qpdf source code itself for this release.

2.0.1: May 6, 2008
  - No changes in functionality or interface. This release includes
    fixes to the source code so that qpdf compiles properly and passes
    its test suite on a broader range of platforms. See
    :file:`ChangeLog` in the source distribution for details.

2.0: April 29, 2008
  - First public release.

.. _ref.upgrading-to-2.1:

Upgrading from 2.0 to 2.1
=========================

Although, as a general rule, we like to avoid introducing source-level
incompatibilities in qpdf's interface, there were a few non-compatible
changes made in this version. A considerable amount of source code that
uses qpdf will probably compile without any changes, but in some cases,
you may have to update your code. The changes are enumerated here. There
are also some new interfaces; for those, please refer to the header
files.

- QPDF's exception handling mechanism now uses ``std::logic_error`` for
  internal errors and ``std::runtime_error`` for runtime errors in
  favor of the now removed ``QEXC`` classes used in previous versions.
  The ``QEXC`` exception classes predated the addition of the
  :file:`<stdexcept>` header file to the C++ standard library. Most of
  the exceptions thrown by the qpdf library itself are still of type
  ``QPDFExc`` which is now derived from ``std::runtime_error``.
  Programs that caught an instance of ``std::exception`` and displayed
  it by calling the ``what()`` method will not need to be changed.

- The ``QPDFExc`` class now internally represents various fields of the
  error condition and provides interfaces for querying them. Among the
  fields is a numeric error code that can help applications act
  differently on (a small number of) different error conditions. See
  :file:`QPDFExc.hh` for details.

- Warnings can be retrieved from qpdf as instances of ``QPDFExc``
  instead of strings.

- The nested ``QPDF::EncryptionData`` class's constructor takes an
  additional argument. This class is primarily intended to be used by
  ``QPDFWriter``. There's not really anything useful an end-user
  application could do with it. It probably shouldn't really be part of
  the public interface to begin with. Likewise, some of the methods for
  computing internal encryption dictionary parameters have changed to
  support ``/R=4`` encryption.

- The method ``QPDF::getUserPassword`` has been removed since it didn't
  do what people would think it did. There are now two new methods:
  ``QPDF::getPaddedUserPassword`` and ``QPDF::getTrimmedUserPassword``.
  The first one does what the old ``QPDF::getUserPassword`` method used
  to do, which is to return the password with possible binary padding
  as specified by the PDF specification. The second one returns a
  human-readable password string.

- The enumerated types that used to be nested in ``QPDFWriter`` have
  moved to top-level enumerated types and are now defined in the file
  :file:`qpdf/Constants.h`. This enables them to be shared by both the
  C and C++ interfaces.
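
The exception changes in the list above can be illustrated with a small,
self-contained sketch. ``DemoPdfError`` is a hypothetical class modeled on
the description of ``QPDFExc`` (derived from ``std::runtime_error`` and
carrying a numeric code); its name, its ``getErrorCode`` accessor, and the
code value ``2`` are assumptions made for this example and are not taken
from :file:`QPDFExc.hh`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical error type: a runtime error that also carries a code.
class DemoPdfError : public std::runtime_error
{
  public:
    DemoPdfError(int code, const std::string& message) :
        std::runtime_error(message),
        code_(code)
    {
    }
    int getErrorCode() const { return code_; }

  private:
    int code_;
};

// Simulate a failing library call and report it the way older calling
// code would: catch std::exception and use what().
std::string describe_failure(bool internal)
{
    std::string text;
    try {
        if (internal) {
            throw std::logic_error("internal error: unexpected state");
        }
        throw DemoPdfError(2, "damaged PDF: xref not found");
    } catch (const std::exception& e) {
        // Both std::logic_error and std::runtime_error arrive here.
        text = e.what();
    }
    return text;
}

// Newer calling code can catch the specific type to inspect the code.
int error_code_of_damaged_file()
{
    int code = 0;
    try {
        throw DemoPdfError(2, "damaged PDF: xref not found");
    } catch (const DemoPdfError& e) {
        code = e.getErrorCode();
    }
    return code;
}

void check_exception_handling()
{
    assert(describe_failure(false) == "damaged PDF: xref not found");
    assert(describe_failure(true) == "internal error: unexpected state");
    assert(error_code_of_damaged_file() == 2);
}
```

The first function is the case the text says needs no changes, while the
second shows the kind of finer-grained handling that an error code makes
possible.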

.. _ref.upgrading-to-3.0:

Upgrading to 3.0
================

For the most part, the API for qpdf version 3.0 is backward compatible
with versions 2.1 and later. There are two exceptions:

- The method ``QPDFObjectHandle::replaceStreamData`` that uses a
  ``StreamDataProvider`` to provide the stream data no longer takes a
  ``length`` parameter. While it would have been easy enough to keep
  the parameter for backward compatibility, in this case, the parameter
  was removed since this provides the user an opportunity to simplify
  the calling code. This method was introduced in version 2.2. At the
  time, the ``length`` parameter was required in order to ensure that
  calls to the stream data provider returned the same length for a
  specific stream every time they were invoked. In particular, the
  linearization code depends on this. Instead, qpdf 3.0 and newer check
  for that constraint explicitly. The first time the stream data
  provider is called for a specific stream, the actual length is saved,
  and subsequent calls are required to return the same number of bytes.
  This means the calling code no longer has to compute the length in
  advance, which can be a significant simplification. If your code
  fails to compile because of the extra argument and you don't want to
  make other changes to your code, just omit the argument.

- Many methods take ``long long`` instead of other integer types. Most
  if not all existing code should compile fine with this change since
  such parameters had always previously been smaller types. This change
  was required to support files larger than two gigabytes in size.
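
The length check described for stream data providers (record the length on
the first call, require the same length afterwards) can be sketched as
follows. ``LengthCheckedProvider`` is a hypothetical wrapper written for
this example and does not reproduce qpdf's ``StreamDataProvider`` interface
or its internal implementation. Reporting a mismatch with
``std::logic_error`` is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Wraps a data-producing callback and enforces a stable length.
class LengthCheckedProvider
{
  public:
    explicit LengthCheckedProvider(std::function<std::string()> produce) :
        produce_(produce)
    {
    }

    std::string provide()
    {
        std::string data = produce_();
        if (!have_length_) {
            // First call: remember the actual length.
            length_ = data.size();
            have_length_ = true;
        } else if (data.size() != length_) {
            // Later calls must return the same number of bytes.
            throw std::logic_error("stream data provider returned a different length");
        }
        return data;
    }

  private:
    std::function<std::string()> produce_;
    std::size_t length_ = 0;
    bool have_length_ = false;
};

void check_length_checked_provider()
{
    LengthCheckedProvider stable([]() -> std::string { return "stream data"; });
    assert(stable.provide() == stable.provide());

    int calls = 0;
    LengthCheckedProvider unstable([&calls]() -> std::string {
        ++calls;
        return std::string(static_cast<std::size_t>(calls), 'x');
    });
    unstable.provide();  // first call: length 1 is recorded
    bool rejected = false;
    try {
        unstable.provide();  // second call: length 2 does not match
    } catch (const std::logic_error&) {
        rejected = true;
    }
    assert(rejected);
}
```

With a check of this kind in place, the caller never has to compute the
stream length ahead of time, which is the simplification the text refers
to.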

.. _ref.upgrading-to-4.0:

Upgrading to 4.0
================

While version 4.0 includes a few non-compatible API changes, it is very
unlikely that anyone's code would have used any of those parts of the
API since they generally required information that would only be
available inside the library. In the unlikely event that you should run
into trouble, please see the ChangeLog. See also
:ref:`ref.release-notes` for a complete list of the non-compatible API
changes made in this version.