mirror of https://github.com/qpdf/qpdf.git
synced 2024-09-28 21:19:06 +00:00

Clean up the Design and Library Notes chapter of the manual

parent a6c4b293b1
commit 910a373a79
```diff
@@ -8,50 +8,53 @@ Design and Library Notes
 Introduction
 ------------
 
-This section was written prior to the implementation of the qpdf package
-and was subsequently modified to reflect the implementation. In some
-cases, for purposes of explanation, it may differ slightly from the
-actual implementation. As always, the source code and test suite are
-authoritative. Even if there are some errors, this document should serve
-as a road map to understanding how this code works.
+This section was written prior to the implementation of the qpdf
+library and was subsequently modified to reflect the implementation.
+In some cases, for purposes of explanation, it may differ slightly
+from the actual implementation. As always, the source code and test
+suite are authoritative. Even if there are some errors, this document
+should serve as a road map to understanding how this code works.
 
 In general, one should adhere strictly to a specification when writing
-but be liberal in reading. This way, the product of our software will be
-accepted by the widest range of other programs, and we will accept the
-widest range of input files. This library attempts to conform to that
-philosophy whenever possible but also aims to provide strict checking
-for people who want to validate PDF files. If you don't want to see
-warnings and are trying to write something that is tolerant, you can
-call ``setSuppressWarnings(true)``. If you want to fail on the first
-error, you can call ``setAttemptRecovery(false)``. The default behavior
-is to generating warnings for recoverable problems. Note that recovery
-will not always produce the desired results even if it is able to get
-through the file. Unlike most other PDF files that produce generic
-warnings such as "This file is damaged,", qpdf generally issues a
-detailed error message that would be most useful to a PDF developer.
+but be liberal in reading. This way, the product of our software will
+be accepted by the widest range of other programs, and we will accept
+the widest range of input files. This library attempts to conform to
+that philosophy whenever possible but also aims to provide strict
+checking for people who want to validate PDF files. If you don't want
+to see warnings and are trying to write something that is tolerant,
+you can call ``setSuppressWarnings(true)``. If you want to fail on the
+first error, you can call ``setAttemptRecovery(false)``. The default
+behavior is to generating warnings for recoverable problems. Note that
+recovery will not always produce the desired results even if it is
+able to get through the file. Unlike most other PDF files that produce
+generic warnings such as "This file is damaged," qpdf generally issues
+a detailed error message that would be most useful to a PDF developer.
 This is by design as there seems to be a shortage of PDF validation
-tools out there. This was, in fact, one of the major motivations behind
-the initial creation of qpdf.
+tools out there. This was, in fact, one of the major motivations
+behind the initial creation of qpdf. That said, qpdf is not a strict
+PDF checker. There are many ways in which a PDF file can be out of
+conformance to the spec that qpdf doesn't notice or report.
 
 .. _design-goals:
 
 Design Goals
 ------------
 
-The QPDF package includes support for reading and rewriting PDF files.
+The qpdf library includes support for reading and rewriting PDF files.
 It aims to hide from the user details involving object locations,
-modified (appended) PDF files, the directness/indirectness of objects,
-and stream filters including encryption. It does not aim to hide
-knowledge of the object hierarchy or content stream contents. Put
-another way, a user of the qpdf library is expected to have knowledge
-about how PDF files work, but is not expected to have to keep track of
-bookkeeping details such as file positions.
+modified (appended) PDF files, use of object streams, and stream
+filters including encryption. It does not aim to hide knowledge of the
+object hierarchy or content stream contents. Put another way, a user
+of the qpdf library is expected to have knowledge about how PDF files
+work, but is not expected to have to keep track of bookkeeping details
+such as file positions.
 
-A user of the library never has to care whether an object is direct or
-indirect, though it is possible to determine whether an object is direct
-or not if this information is needed. All access to objects deals with
-this transparently. All memory management details are also handled by
-the library.
+When accessing objects, a user of the library never has to care
+whether an object is direct or indirect as all access to objects deals
+with this transparently. All memory management details are also
+handled by the library. When modifying objects, it is possible to
+determine whether an object is indirect and to make copies of the
+object if needed.
 
 Memory is managed mostly with ``std::shared_ptr`` object to minimize
 explicit memory handling. This library also makes use of a technique
```
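The policy described in this hunk (report recoverable problems by default, optionally silence them, or optionally stop at the first error) can be modelled in a few lines of standard C++. This is a toy sketch, not qpdf code: ``ToyReader`` and its "ok"/"damaged" records are hypothetical, and only the two setter names are borrowed from the text.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy reader that models only the warning/recovery policy
// described in the text. It is not qpdf's QPDF class.
class ToyReader {
  public:
    void setSuppressWarnings(bool v) { suppress_warnings_ = v; }
    void setAttemptRecovery(bool v) { attempt_recovery_ = v; }

    // Each record is either "ok" or something else, which this toy treats
    // as a recoverable defect. Returns the number of records accepted.
    int read(std::vector<std::string> const& records) {
        int accepted = 0;
        for (std::size_t i = 0; i < records.size(); ++i) {
            if (records[i] == "ok") {
                ++accepted;
                continue;
            }
            std::string msg = "record " + std::to_string(i) + " is damaged";
            if (!attempt_recovery_) {
                throw std::runtime_error(msg);  // strict: fail on first error
            }
            if (!suppress_warnings_) {
                warnings_.push_back(msg);  // tolerant: skip it, but report it
            }
        }
        return accepted;
    }

    std::vector<std::string> const& warnings() const { return warnings_; }

  private:
    bool suppress_warnings_ = false;  // default: report recoverable problems
    bool attempt_recovery_ = true;    // default: keep going past them
    std::vector<std::string> warnings_;
};
```

With the defaults, reading ``{"ok", "bad", "ok"}`` accepts two records and records one warning; calling ``setAttemptRecovery(false)`` first turns the same kind of input into an exception.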
```diff
@@ -85,29 +88,32 @@ objects to indirect objects and vice versa.
 Instances of ``QPDFObjectHandle`` can be directly created and modified
 using static factory methods in the ``QPDFObjectHandle`` class. There
 are factory methods for each type of object as well as a convenience
-method ``QPDFObjectHandle::parse`` that creates an object from a string
-representation of the object. Existing instances of ``QPDFObjectHandle``
-can also be modified in several ways. See comments in
-:file:`QPDFObjectHandle.hh` for details.
+method ``QPDFObjectHandle::parse`` that creates an object from a
+string representation of the object. The ``_qpdf`` user-defined string
+literal is also available, making it possible to create instances of
+``QPDFObjectHandle`` with ``"(pdf-syntax)"_qpdf``. Existing instances
+of ``QPDFObjectHandle`` can also be modified in several ways. See
+comments in :file:`QPDFObjectHandle.hh` for details.
 
 An instance of ``QPDF`` is constructed by using the class's default
-constructor. If desired, the ``QPDF`` object may be configured with
-various methods that change its default behavior. Then the
-``QPDF::processFile()`` method is passed the name of a PDF file, which
-permanently associates the file with that QPDF object. A password may
-also be given for access to password-protected files. QPDF does not
-enforce encryption parameters and will treat user and owner passwords
-equivalently. Either password may be used to access an encrypted file.
-``QPDF`` will allow recovery of a user password given an owner password.
-The input PDF file must be seekable. (Output files written by
-``QPDFWriter`` need not be seekable, even when creating linearized
-files.) During construction, ``QPDF`` validates the PDF file's header,
-and then reads the cross reference tables and trailer dictionaries. The
-``QPDF`` class keeps only the first trailer dictionary though it does
-read all of them so it can check the ``/Prev`` key. ``QPDF`` class users
-may request the root object and the trailer dictionary specifically. The
-cross reference table is kept private. Objects may then be requested by
-number or by walking the object tree.
+constructor or with ``QPDF::create()``. If desired, the ``QPDF``
+object may be configured with various methods that change its default
+behavior. Then the ``QPDF::processFile`` method is passed the name of
+a PDF file, which permanently associates the file with that ``QPDF``
+object. A password may also be given for access to password-protected
+files. ``QPDF`` does not enforce encryption parameters and will treat
+user and owner passwords equivalently. Either password may be used to
+access an encrypted file. ``QPDF`` will allow recovery of a user
+password given an owner password. The input PDF file must be seekable.
+Output files written by ``QPDFWriter`` need not be seekable, even when
+creating linearized files. During construction, ``QPDF`` validates the
+PDF file's header, and then reads the cross reference tables and
+trailer dictionaries. The ``QPDF`` class keeps only the first trailer
+dictionary though it does read all of them so it can check the
+``/Prev`` key. ``QPDF`` class users may request the root object and
+the trailer dictionary specifically. The cross reference table is kept
+private. Objects may then be requested by number or by walking the
+object tree.
 
 When a PDF file has a cross-reference stream instead of a
 cross-reference table and trailer, requesting the document's trailer
```
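The new text mentions the ``_qpdf`` user-defined string literal. The sketch below shows only the C++ language mechanism behind such a suffix, using a hypothetical ``ToyObject`` type and ``_toy`` suffix whose "parsing" is a crude first-character check; it says nothing about how qpdf's own literal handles PDF syntax.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy object: keeps the PDF syntax and a coarse type guess.
struct ToyObject {
    enum class Kind { Dictionary, Array, String, Name, Number, Other };
    Kind kind;
    std::string text;
};

// Crude classification by leading characters only; a real PDF parser
// tokenizes and validates the whole string.
inline ToyObject makeToyObject(std::string s) {
    ToyObject::Kind k = ToyObject::Kind::Other;
    char c = s.empty() ? '\0' : s[0];
    if (s.compare(0, 2, "<<") == 0) {
        k = ToyObject::Kind::Dictionary;
    } else if (c == '[') {
        k = ToyObject::Kind::Array;
    } else if (c == '(') {
        k = ToyObject::Kind::String;
    } else if (c == '/') {
        k = ToyObject::Kind::Name;
    } else if ((c >= '0' && c <= '9') || c == '-' || c == '+' || c == '.') {
        k = ToyObject::Kind::Number;
    }
    return ToyObject{k, std::move(s)};
}

// User-defined string literal: "..."_toy passes the characters and their
// length to this function, so the syntax arrives intact.
inline ToyObject operator""_toy(char const* chars, std::size_t len) {
    return makeToyObject(std::string(chars, len));
}
```

Here ``"(pdf-syntax)"_toy`` yields a ``ToyObject`` whose ``kind`` is ``String``, while ``"<< /Type /Catalog >>"_toy`` is classified as a ``Dictionary``.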
```diff
@@ -240,13 +246,14 @@ the ``QPDFObjectHandle`` type to hold onto objects and to abstract
 away in most cases whether the object is direct or indirect.
 
 Internally, ``QPDFObjectHandle`` holds onto a shared pointer to the
-underlying object value. When a direct object is created, the
-``QPDFObjectHandle`` that holds it is not associated with a ``QPDF``
-object. When an indirect object reference is created, it starts off in
-an *unresolved* state and must be associated with a ``QPDF`` object,
-which is considered its *owner*. To access the actual value of the
-object, the object must be *resolved*. This happens automatically when
-the the object is accessed in any way.
+underlying object value. When a direct object is created
+programmatically by client code (rather than being read from the
+file), the ``QPDFObjectHandle`` that holds it is not associated with a
+``QPDF`` object. When an indirect object reference is created, it
+starts off in an *unresolved* state and must be associated with a
+``QPDF`` object, which is considered its *owner*. To access the actual
+value of the object, the object must be *resolved*. This happens
+automatically when the the object is accessed in any way.
 
 To resolve an object, qpdf checks its object cache. If not found in
 the cache, it attempts to read the object from the input source
```
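This hunk describes handles that share a pointer to one underlying value, with indirect objects starting out unresolved and being read on first access. The minimal model below assumes a toy document whose "file" is a ``std::map`` from object number to text; ``Slot``, ``Handle``, and ``ToyDoc`` are hypothetical and omit generation numbers, owner lifetime tracking, and parsing.

```cpp
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Shared cell for one indirect object; every handle to it points here.
struct Slot {
    std::optional<std::string> value;  // empty means "unresolved"
};

class ToyDoc;

// Hypothetical handle: a shared pointer to the slot plus its owning document.
class Handle {
  public:
    Handle(std::shared_ptr<Slot> slot, int id, ToyDoc* owner)
        : slot_(std::move(slot)), id_(id), owner_(owner) {}
    std::string const& get();  // resolves on first access
    void set(std::string v) { slot_->value = std::move(v); }

  private:
    std::shared_ptr<Slot> slot_;
    int id_;
    ToyDoc* owner_;  // raw pointer for brevity: the doc must outlive handles
};

// Hypothetical document with an object cache and an in-memory "file".
class ToyDoc {
  public:
    explicit ToyDoc(std::map<int, std::string> file) : file_(std::move(file)) {}

    // Returns a handle backed by the cache entry, creating an unresolved
    // entry the first time an object number is requested.
    Handle getObject(int id) {
        std::shared_ptr<Slot>& slot = cache_[id];
        if (!slot) {
            slot = std::make_shared<Slot>();
        }
        return Handle(slot, id, this);
    }

    // Reads the object "from the file" into the shared slot.
    void resolve(int id, Slot& slot) {
        ++reads_;
        slot.value = file_.at(id);
    }

    int reads() const { return reads_; }

  private:
    std::map<int, std::string> file_;             // stand-in for the input source
    std::map<int, std::shared_ptr<Slot>> cache_;  // the object cache
    int reads_ = 0;
};

inline std::string const& Handle::get() {
    if (!slot_->value) {
        owner_->resolve(id_, *slot_);
    }
    return *slot_->value;
}
```

Two handles obtained with ``getObject(1)`` share one ``Slot``: the first ``get()`` triggers a single read, the second reuses the cached value, and a ``set()`` through either handle is visible through the other.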
```diff
@@ -286,18 +293,20 @@ file.
   it is looking before the last ``%%EOF``. After getting to ``trailer``
   keyword, it invokes the parser.
 
-- The parser sees ``<<``, so it calls itself recursively in
-  dictionary creation mode.
+- The parser sees ``<<``, so it changes state and starts accumulating
+  the keys and values of the dictionary.
 
 - In dictionary creation mode, the parser keeps accumulating objects
   until it encounters ``>>``. Each object that is read is pushed onto
   a stack. If ``R`` is read, the last two objects on the stack are
   inspected. If they are integers, they are popped off the stack and
-  their values are used to construct an indirect object handle which
-  is then pushed onto the stack. When ``>>`` is finally read, the
-  stack is converted into a ``QPDF_Dictionary`` (not directly
-  accessible through the API) which is placed in a
-  ``QPDFObjectHandle`` and returned.
+  their values are used to obtain an indirect object handle from the
+  ``QPDF`` class. The ``QPDF`` class consults its cache, and if
+  necessary, inserts a new unresolved object, and returns an object
+  handle pointing to the cache entry, which is then pushed onto the
+  stack. When ``>>`` is finally read, the stack is converted into a
+  ``QPDF_Dictionary`` (not directly accessible through the API) which
+  is placed in a ``QPDFObjectHandle`` and returned.
 
 - The resulting dictionary is saved as the trailer dictionary.
 
```
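The parser steps in this hunk (accumulate items on a stack, fold ``num gen R`` into an indirect reference, and build the dictionary when ``>>`` arrives) can be traced on a flat dictionary. In this sketch ``ToyParser``, ``Item``, and ``Ref`` are hypothetical; tokens must be whitespace-separated, nesting and strings are not supported, and a ``std::set`` stands in for the ``QPDF`` object cache.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct Ref {
    int num;
    int gen;
};

// A parsed value in this toy: integer, name, or indirect reference.
using Item = std::variant<long, std::string, Ref>;

class ToyParser {
  public:
    // Stand-in for the object cache: references seen but not yet read.
    std::set<std::pair<int, int>> unresolved;

    // Parses a flat dictionary such as "<< /Size 6 /Root 1 0 R >>".
    std::map<std::string, Item> parseDict(std::string const& src) {
        std::istringstream in(src);
        std::string tok;
        if (!(in >> tok) || tok != "<<") {
            throw std::runtime_error("expected <<");
        }
        std::vector<Item> stack;
        while (in >> tok) {
            if (tok == ">>") {
                return toDict(stack);
            }
            if (tok == "R") {
                // The two items below "R" must be integers: number, generation.
                if (stack.size() < 2 || !isInt(stack[stack.size() - 1]) ||
                    !isInt(stack[stack.size() - 2])) {
                    throw std::runtime_error("R without two integers");
                }
                int gen = static_cast<int>(std::get<long>(stack.back()));
                stack.pop_back();
                int num = static_cast<int>(std::get<long>(stack.back()));
                stack.pop_back();
                unresolved.insert({num, gen});  // "insert an unresolved entry"
                stack.push_back(Ref{num, gen});
            } else if (tok[0] == '/') {
                stack.push_back(tok);  // a name
            } else {
                stack.push_back(std::stol(tok));  // an integer
            }
        }
        throw std::runtime_error("missing >>");
    }

  private:
    static bool isInt(Item const& item) {
        return std::holds_alternative<long>(item);
    }

    // Pairs up the accumulated items as key/value; keys must be names.
    static std::map<std::string, Item> toDict(std::vector<Item> const& stack) {
        if (stack.size() % 2 != 0) {
            throw std::runtime_error("odd number of items in dictionary");
        }
        std::map<std::string, Item> dict;
        for (std::size_t i = 0; i < stack.size(); i += 2) {
            dict[std::get<std::string>(stack[i])] = stack[i + 1];
        }
        return dict;
    }
};
```

Parsing ``<< /Size 6 /Root 1 0 R >>`` yields ``/Size`` mapped to ``6`` and ``/Root`` mapped to ``Ref{1, 0}``, and ``(1, 0)`` is recorded as an unresolved entry.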
```diff
@@ -309,23 +318,21 @@ file.
 - If there is an encryption dictionary, the document's encryption
   parameters are initialized.
 
-- The client requests root object. The ``QPDF`` class gets the value of
-  root key from trailer dictionary and returns it. It is an unresolved
-  indirect ``QPDFObjectHandle``.
+- The client requests the root object by getting the value of the
+  ``/Root`` key from trailer dictionary and returns it. It is an
+  unresolved indirect ``QPDFObjectHandle``.
 
 - The client requests the ``/Pages`` key from root
-  ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is
-  indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the
-  object cache for an object with the root dictionary's object ID and
-  generation number. Upon not seeing it, it checks the cross reference
-  table, gets the offset, and reads the object present at that offset.
-  It stores the result in the object cache. The cache entry's value is
-  replaced by the actual value, which causes any previously unresolved
-  ``QPDFObjectHandle`` objects that that pointed there to now have a
-  shared copy of the actual object. Modifications through any such
-  ``QPDFObjectHandle`` will be reflected in all of them. As the client
-  continues to request objects, the same process is followed for each
-  new requested object.
+  ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is an
+  unresolved indirect object, so it asks ``QPDF`` to resolve it.
+  ``QPDF`` checks the cross reference table, gets the offset, and
+  reads the object present at that offset. The object cache entry's
+  ``unresolved`` value is replaced by the actual value, which causes
+  any previously unresolved ``QPDFObjectHandle`` objects that pointed
+  there to now have a shared copy of the actual object. Modifications
+  through any such ``QPDFObjectHandle`` will be reflected in all of
+  them. As the client continues to request objects, the same process
+  is followed for each new requested object.
 
 .. _object_internals:
 
```
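The walk-through resolves ``/Pages`` by consulting the cross reference table, seeking to the recorded offset, and reading the object found there. A toy version of that seek-and-read step over an in-memory string follows; ``Xref`` and ``readObjectAt`` are hypothetical, generation numbers are assumed to be 0, and streams, object streams, and real tokenizing are left out.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Toy cross-reference table: object number -> byte offset of "N 0 obj".
// Generation numbers are assumed to be 0 throughout this sketch.
using Xref = std::map<int, std::size_t>;

// Seeks to the offset recorded for `num` and returns the text between
// "N 0 obj" and "endobj", trimmed of surrounding whitespace.
inline std::string readObjectAt(std::string const& file, Xref const& xref, int num) {
    std::size_t off = xref.at(num);  // throws std::out_of_range if unknown
    std::string const header = std::to_string(num) + " 0 obj";
    if (off > file.size() || file.compare(off, header.size(), header) != 0) {
        throw std::runtime_error("xref entry does not point at object " +
                                 std::to_string(num));
    }
    std::size_t start = off + header.size();
    std::size_t end = file.find("endobj", start);
    if (end == std::string::npos) {
        throw std::runtime_error("missing endobj");
    }
    std::size_t first = file.find_first_not_of(" \r\n", start);
    if (first == std::string::npos || first >= end) {
        return "";  // empty object body
    }
    std::size_t last = file.find_last_not_of(" \r\n", end - 1);
    return file.substr(first, last - first + 1);
}
```

Because the lookup goes through the offset rather than scanning the file, a wrong offset is detected immediately by the header check instead of silently returning some other object.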
```diff
@@ -339,11 +346,12 @@ Object Internals
 ~~~~~~~~~~~~~~~~
 
 The ``QPDF`` object has an object cache which contains a shared
-pointer to each object that was read from the file. Changes can be
-made to any of those objects through ``QPDFObjectHandle`` methods. Any
-such changes are visible to all ``QPDFObjectHandle`` instances that
-point to the same object. When a ``QPDF`` object is written by
-``QPDFWriter`` or serialized to JSON, any changes are reflected.
+pointer to each object that was read from the file or added as an
+indirect object. Changes can be made to any of those objects through
+``QPDFObjectHandle`` methods. Any such changes are visible to all
+``QPDFObjectHandle`` instances that point to the same object. When a
+``QPDF`` object is written by ``QPDFWriter`` or serialized to JSON,
+any changes are reflected.
 
 Objects in qpdf 11 and Newer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
```diff
@@ -356,30 +364,32 @@ reference to that object has a copy of that shared pointer. Each
 is an implementation for each of the basic object types (array,
 dictionary, null, boolean, string, number, etc.) as well as a few
 special ones including ``uninitialized``, ``unresolved``,
-``reserved``, and ``destroyed``. When an object is first referenced,
+``reserved``, and ``destroyed``. When an object is first created,
 its underlying ``QPDFValue`` has type ``unresolved``. When the object
-is first resolved, the ``QPDFObject`` in the cache has its internal
+is first accessed, the ``QPDFObject`` in the cache has its internal
 ``QPDFValue`` replaced with the object as read from the file. Since it
 is the ``QPDFObject`` object that is shared by all referencing
 ``QPDFObjectHandle`` objects as well as by the owning ``QPDF`` object,
 this ensures that any future changes to the object, including
-replacing the object with a completely different one, will be
+replacing the object with a completely different one by calling
+``QPDF::replaceObject`` or ``QPDF::swapObjects``, will be
 reflected across all ``QPDFObjectHandle`` objects that reference it.
 
 A ``QPDFValue`` that originated from a PDF input source maintains a
 pointer to the ``QPDF`` object that read it (its *owner*). When that
-``QPDF`` object is destroyed, it disconnects all reachable from it by
-clearing their owner. For indirect objects (all objects in the object
-cache), it also replaces the object's value with an object of type
-``destroyed``. This means that, if there are still any referencing
-``QPDFObjectHandle`` objects floating around, requesting their owning
-``QPDF`` will return a null pointer rather than a pointer to a
-``QPDF`` object that is either invalid or points to something else,
-and any attempt to access an indirect object that is associated with a
-destroyed ``QPDF`` object will throw an exception. This operation also
-has the effect of breaking any circular references (which are common
-and, in some cases, required by the PDF specification), thus
-preventing memory leaks when ``QPDF`` objects are destroyed.
+``QPDF`` object is destroyed, it disconnects all objects reachable
+from it by clearing their owner. For indirect objects (all objects in
+the object cache), it also replaces the object's value with an object
+of type ``destroyed``. This means that, if there are still any
+referencing ``QPDFObjectHandle`` objects floating around, requesting
+their owning ``QPDF`` will return a null pointer rather than a pointer
+to a ``QPDF`` object that is either invalid or points to something
+else, and any attempt to access an indirect object that is associated
+with a destroyed ``QPDF`` object will throw an exception. This
+operation also has the effect of breaking any circular references
+(which are common and, in some cases, required by the PDF
+specification), thus preventing memory leaks when ``QPDF`` objects are
+destroyed.
 
 Objects prior to qpdf 11
 ~~~~~~~~~~~~~~~~~~~~~~~~
```
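The qpdf 11 text explains that destroying a ``QPDF`` replaces each cached object's value with a ``destroyed`` one, which both makes stale handles fail loudly and breaks reference cycles. The standard-C++ model below illustrates that effect; ``Value``, ``Obj``, and ``ToyOwner`` are hypothetical analogues, not qpdf's ``QPDFValue`` and ``QPDFObject`` classes.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Obj;

// Replaceable payload, loosely analogous to QPDFValue in this toy.
struct Value {
    std::string type;                        // e.g. "dictionary" or "destroyed"
    std::vector<std::shared_ptr<Obj>> kids;  // references to other objects
};

// Shared wrapper, loosely analogous to QPDFObject: every holder shares the
// same Obj, so replacing `value` is seen by all of them.
struct Obj {
    std::unique_ptr<Value> value;

    std::string const& type() const {
        if (value->type == "destroyed") {
            throw std::logic_error("owning document was destroyed");
        }
        return value->type;
    }
};

// Hypothetical owner holding the object cache.
class ToyOwner {
  public:
    std::shared_ptr<Obj> make(std::string type) {
        auto obj = std::make_shared<Obj>();
        obj->value = std::make_unique<Value>(Value{std::move(type), {}});
        cache_.push_back(obj);
        return obj;
    }

    ~ToyOwner() {
        // Swap every cached value for a "destroyed" one. Dropping the old
        // values releases their `kids`, which breaks shared_ptr cycles.
        for (auto& obj : cache_) {
            obj->value = std::make_unique<Value>(Value{"destroyed", {}});
        }
    }

  private:
    std::vector<std::shared_ptr<Obj>> cache_;
};
```

If two objects reference each other and the owner goes out of scope, an object no longer held by any outside handle is freed, because replacing the values dropped the cycle; a handle that is still held survives, but calling ``type()`` on it throws. Without the replacement in ``~ToyOwner``, the two objects' shared pointers would keep each other alive indefinitely.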
```diff
@@ -478,22 +488,6 @@ and 64-bit platforms, and the test suite is very thorough, so it is
 hard to make any of the potential errors here without being caught in
 build or test.
 
-Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The
-pipeline interface has a ``write`` call that uses ``unsigned char*``
-without a ``const`` qualifier. The main reason for this is
-to support pipelines that make calls to third-party libraries, such as
-zlib, that don't include ``const`` in their interfaces. Unfortunately,
-there are many places in the code where it is desirable to have
-``const char*`` with pipelines. None of the pipeline implementations
-in qpdf
-currently modify the data passed to write, and doing so would be counter
-to the intent of ``Pipeline``, but there is nothing in the code to
-prevent this from being done. There are places in the code where
-``const_cast`` is used to remove the const-ness of pointers going into
-``Pipeline``\ s. This could theoretically be unsafe, but there is
-adequate testing to assert that it is safe and will remain safe in
-qpdf's code.
-
 .. _encryption:
 
 Encryption
```
```diff
@@ -516,14 +510,14 @@ given an encryption key. This is used by ``QPDFWriter`` when it rewrites
 encrypted files.
 
 When copying encrypted files, unless otherwise directed, qpdf will
-preserve any encryption in force in the original file. qpdf can do this
-with either the user or the owner password. There is no difference in
-capability based on which password is used. When 40 or 128 bit
-encryption keys are used, the user password can be recovered with the
-owner password. With 256 keys, the user and owner passwords are used
-independently to encrypt the actual encryption key, so while either can
-be used, the owner password can no longer be used to recover the user
-password.
+preserve any encryption in effect in the original file. qpdf can do
+this with either the user or the owner password. There is no
+difference in capability based on which password is used. When 40 or
+128 bit encryption keys are used, the user password can be recovered
+with the owner password. With 256 keys, the user and owner passwords
+are used independently to encrypt the actual encryption key, so while
+either can be used, the owner password can no longer be used to
+recover the user password.
 
 Starting with version 4.0.0, qpdf can read files that are not encrypted
 but that contain encrypted attachments, but it cannot write such files.
```
```diff
@@ -538,33 +532,37 @@ format. The only exception to this is that clear-text metadata will be
 preserved as clear-text if it is that way in the original file.
 
 One point of confusion some people have about encrypted PDF files is
-that encryption is not the same as password protection. Password
-protected files are always encrypted, but it is also possible to create
-encrypted files that do not have passwords. Internally, such files use
-the empty string as a password, and most readers try the empty string
-first to see if it works and prompt for a password only if the empty
-string doesn't work. Normally such files have an empty user password and
-a non-empty owner password. In that way, if the file is opened by an
-ordinary reader without specification of password, the restrictions
-specified in the encryption dictionary can be enforced. Most users
-wouldn't even realize such a file was encrypted. Since qpdf always
-ignores the restrictions (except for the purpose of reporting what they
-are), qpdf doesn't care which password you use. QPDF will allow you to
-create PDF files with non-empty user passwords and empty owner
-passwords. Some readers will require a password when you open these
-files, and others will open the files without a password and not enforce
-restrictions. Having a non-empty user password and an empty owner
-password doesn't really make sense because it would mean that opening
-the file with the user password would be more restrictive than not
-supplying a password at all. QPDF also allows you to create PDF files
-with the same password as both the user and owner password. Some readers
-will not ever allow such files to be accessed without restrictions
-because they never try the password as the owner password if it works as
-the user password. Nonetheless, one of the powerful aspects of qpdf is
-that it allows you to finely specify the way encrypted files are
-created, even if the results are not useful to some readers. One use
-case for this would be for testing a PDF reader to ensure that it
-handles odd configurations of input files.
+that encryption is not the same as password protection.
+Password-protected files are always encrypted, but it is also possible
+to create encrypted files that do not have passwords. Internally, such
+files use the empty string as a password, and most readers try the
+empty string first to see if it works and prompt for a password only
+if the empty string doesn't work. Normally such files have an empty
+user password and a non-empty owner password. In that way, if the file
+is opened by an ordinary reader without specification of password, the
+restrictions specified in the encryption dictionary can be enforced.
+Most users wouldn't even realize such a file was encrypted. Since qpdf
+always ignores the restrictions (except for the purpose of reporting
+what they are), qpdf doesn't care which password you use. QPDF will
+allow you to create PDF files with non-empty user passwords and empty
+owner passwords. Some readers will require a password when you open
+these files, and others will open the files without a password and not
+enforce restrictions. Having a non-empty user password and an empty
+owner password doesn't really make sense because it would mean that
+opening the file with the user password would be more restrictive than
+not supplying a password at all. QPDF also allows you to create PDF
+files with the same password as both the user and owner password. Some
+readers will not ever allow such files to be accessed without
+restrictions because they never try the password as the owner password
+if it works as the user password. Nonetheless, one of the powerful
+aspects of qpdf is that it allows you to finely specify the way
+encrypted files are created, even if the results are not useful to
+some readers. One use case for this would be for testing a PDF reader
+to ensure that it handles odd configurations of input files. If you
+attempt to create an encrypted file that is not secure, qpdf will warn
+you and require you to explicitly state your intention to create an
+insecure file. So while qpdf can create insecure files, it won't let
+you do it by mistake.
 
 .. _random-numbers:
 
```
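The reasoning about empty, non-empty, and identical user and owner passwords depends on the order in which a reader tries a password. The toy below encodes the reader behavior this passage describes (try the supplied or empty password as the user password first, then as the owner password) and is only a model: ``Access``, ``ToyEncryptedFile``, and ``toyOpen`` are hypothetical, real readers verify passwords cryptographically rather than by string comparison, and, as the passage says, readers differ.

```cpp
#include <string>

enum class Access { Denied, Restricted, Full };

// Hypothetical stand-in: real files store derived/encrypted password data,
// not the passwords themselves.
struct ToyEncryptedFile {
    std::string user_password;
    std::string owner_password;
};

// Models one reader behavior from the text: try the supplied password
// (the empty string if none) as the user password first, and as the owner
// password only if that fails.
inline Access toyOpen(ToyEncryptedFile const& file, std::string const& supplied = "") {
    if (supplied == file.user_password) {
        return Access::Restricted;  // restrictions are enforced
    }
    if (supplied == file.owner_password) {
        return Access::Full;
    }
    return Access::Denied;  // a real reader would prompt for a password
}
```

For a file with user password ``pw`` and an empty owner password, ``toyOpen(file)`` returns ``Access::Full`` while ``toyOpen(file, "pw")`` returns ``Access::Restricted``, which is the oddity the passage points out; when both passwords are ``pw``, this reader can never reach ``Access::Full``.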
@ -630,23 +628,21 @@ Copying Objects From Other PDF Files
|
|||||||
|
|
||||||
Version 3.0 of qpdf introduced the ability to copy objects into a
|
Version 3.0 of qpdf introduced the ability to copy objects into a
|
||||||
``QPDF`` object from a different ``QPDF`` object, which we refer to as
|
``QPDF`` object from a different ``QPDF`` object, which we refer to as
*foreign objects*. This allows arbitrary merging of PDF files. The
:command:`qpdf` command-line tool provides limited support for basic
page selection, including merging in pages from other files, but the
library's API makes it possible to implement arbitrarily complex
merging operations. The main method for copying foreign objects is
``QPDF::copyForeignObject``. This takes an indirect object from
another ``QPDF`` and copies it recursively into this object while
preserving all object structure, including circular references. This
means you can add a direct object that you create from scratch to a
``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an
indirect object from another file with ``QPDF::copyForeignObject``.
The fact that ``QPDF::makeIndirectObject`` does not automatically
detect a foreign object and copy it is an explicit design decision.
Copying a foreign object seems like a sufficiently significant thing
to do that it should be done explicitly.
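
The bookkeeping that lets a recursive copy preserve circular references
can be sketched with a toy model in standard C++. This is not qpdf
code: ``ToyObj``, ``ToyDoc``, and ``copyForeign`` are hypothetical
names, and a toy object is reduced to a label plus a list of references
to other objects by ID. The essential step is the map from source IDs
to destination IDs. It is filled in *before* recursing, so a reference
cycle resolves to an ID that has already been reserved and the
recursion does not loop forever.

.. code-block:: c++

   #include <cassert>
   #include <map>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical toy model, not qpdf's API: an indirect object is a
   // label plus references (by object ID) to other objects.
   struct ToyObj {
       std::string label;
       std::vector<int> refs;
   };

   struct ToyDoc {
       std::map<int, ToyObj> objects;
       int next_id = 1;

       // Add an object under a fresh ID and return that ID.
       int add(ToyObj obj) {
           int id = next_id++;
           objects[id] = std::move(obj);
           return id;
       }
   };

   // Copy object `id` from `src` into `dst`, recursively copying everything
   // it references. `seen` maps source IDs to IDs already allocated in
   // `dst`, so shared and circular references are copied exactly once.
   int copyForeign(ToyDoc& dst, ToyDoc const& src, int id,
                   std::map<int, int>& seen) {
       assert(src.objects.count(id) == 1);  // id must exist in src
       auto found = seen.find(id);
       if (found != seen.end()) {
           return found->second;  // already copied or in progress
       }
       // Reserve the destination ID before recursing so cycles terminate.
       int new_id = dst.add(ToyObj{src.objects.at(id).label, {}});
       seen[id] = new_id;
       for (int ref : src.objects.at(id).refs) {
           int new_ref = copyForeign(dst, src, ref, seen);
           dst.objects[new_id].refs.push_back(new_ref);
       }
       return new_id;
   }

For example, suppose objects 1 and 2 in the source refer to each other.
Copying object 1 into a destination creates exactly two new objects,
and their references to each other still form a cycle.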

The other way to copy foreign objects is by passing a page from one
``QPDF`` to another by calling ``QPDF::addPage``. In contrast to
@ -654,26 +650,30 @@ The other way to copy foreign objects is by passing a page from one
between indirect objects in the current file, foreign objects, and
direct objects.
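
The three cases distinguished here (indirect objects in the current
file, foreign objects, and direct objects) amount to a small dispatch.
The sketch below is a hypothetical toy. ``ToyFile``, ``ToyHandle``,
``ToyAction``, and ``classify`` are made-up names, and the sketch shows
only the decision, not how ``QPDF::addPage`` is actually written. A
direct object has no owning file and needs an object ID. An indirect
object owned by the destination can be used as it is. An indirect
object owned by another file has to be copied as a foreign object.

.. code-block:: c++

   // Hypothetical stand-in for a QPDF object in this toy model.
   struct ToyFile {};

   // Hypothetical toy handle: a direct object has no owning file.
   struct ToyHandle {
       ToyFile const* owner = nullptr;
       bool isIndirect() const { return owner != nullptr; }
   };

   enum class ToyAction { MakeIndirect, UseAsIs, CopyForeign };

   // Decide how to bring `h` into `dest`, one branch per case.
   ToyAction classify(ToyHandle const& h, ToyFile const& dest) {
       if (!h.isIndirect()) {
           return ToyAction::MakeIndirect;  // direct: give it an object ID
       }
       if (h.owner == &dest) {
           return ToyAction::UseAsIs;  // already indirect in this file
       }
       return ToyAction::CopyForeign;  // indirect object from another file
   }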

When you copy objects from one ``QPDF`` to another, the input source
of the original file must remain valid until you have finished with
the destination object. This is because the input source is still
used to retrieve any referenced stream data from the copied object.
If needed, there are methods to force the data to be copied. See the
comments near the declaration of ``copyForeignObject`` in
:file:`include/qpdf/QPDF.hh` for details.
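
A toy model shows why the source has to stay alive. In it, a copied
stream either refers back to the original file's bytes or owns a
private copy of them. All names here (``InputSource``,
``ToyCopiedStream``, ``lazy``, ``forced``) are hypothetical, and this
is not how qpdf stores stream data. The toy uses a ``std::weak_ptr``
only so that it can detect a vanished source instead of reading freed
memory.

.. code-block:: c++

   #include <memory>
   #include <optional>
   #include <string>

   // Hypothetical toy: the "input source" of a file is a shared byte buffer.
   using InputSource = std::shared_ptr<std::string const>;

   // A copied stream either refers back to the source's bytes (lazy) or
   // owns a private copy of them (forced).
   class ToyCopiedStream {
     public:
       static ToyCopiedStream lazy(InputSource const& src) {
           ToyCopiedStream s;
           s.source_ = src;  // non-owning: does not keep the source alive
           return s;
       }
       static ToyCopiedStream forced(InputSource const& src) {
           ToyCopiedStream s;
           s.owned_ = *src;  // copy the bytes now
           return s;
       }
       // Returns the stream data, or std::nullopt if the source is gone.
       std::optional<std::string> data() const {
           if (owned_) {
               return owned_;
           }
           if (auto src = source_.lock()) {
               return *src;
           }
           return std::nullopt;
       }

     private:
       std::weak_ptr<std::string const> source_;
       std::optional<std::string> owned_;
   };

The lazy variant is usable only while the source is alive. The forced
variant trades memory for independence from the source.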

.. _rewriting:

Writing PDF Files
-----------------

The qpdf library supports writing ``QPDF`` objects to PDF files
through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two
writing modes: one for non-linearized files, and one for linearized
files. See :ref:`linearization` for a description of how
linearization is implemented. This section describes how we write
non-linearized files, including the creation of QDF files (see
:ref:`qdf`).
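
At its core, writing a non-linearized file means emitting objects one
after another while recording where each one starts. This is the
"xref table: new id -> offset" state in the outline below. A
cross-reference table is then written from those offsets. The
hypothetical ``writeToyPdf`` function below sketches only that
bookkeeping. It takes object bodies that are already serialized and
leaves out the trailer, ``startxref``, streams, and everything else
``QPDFWriter`` handles, so its output is not a complete PDF file.

.. code-block:: c++

   #include <cstddef>
   #include <cstdio>
   #include <map>
   #include <string>
   #include <vector>

   // Hypothetical sketch of the offset bookkeeping only; not QPDFWriter.
   struct ToyWriteResult {
       std::string bytes;
       std::map<int, std::size_t> offsets;  // new object ID -> byte offset
   };

   // `bodies[i]` is the already-serialized body of object i + 1.
   ToyWriteResult writeToyPdf(std::vector<std::string> const& bodies) {
       ToyWriteResult out;
       out.bytes = "%PDF-1.3\n";  // header
       int id = 0;
       for (auto const& body : bodies) {
           ++id;
           out.offsets[id] = out.bytes.size();  // where this object starts
           out.bytes += std::to_string(id) + " 0 obj\n" + body + "\nendobj\n";
       }
       // Cross-reference table: entry 0 heads the free list, and every
       // entry is exactly 20 bytes ("nnnnnnnnnn ggggg n" plus 2-byte EOL).
       out.bytes += "xref\n0 " + std::to_string(bodies.size() + 1) + "\n";
       out.bytes += "0000000000 65535 f \n";
       for (auto const& kv : out.offsets) {  // map iterates in ID order
           char line[21];
           std::snprintf(line, sizeof(line), "%010zu 00000 n \n", kv.second);
           out.bytes += line;
       }
       return out;
   }

Because the table records final byte offsets, it can only be written
once every object has been emitted. Each entry has the same width, so
a reader can locate the entry for any object number directly.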

This outline was written prior to implementation and is not exactly
accurate, but it portrays the essence of how writing works. Look at
the code in ``QPDFWriter`` for exact details.

- Initialize state:

@ -685,7 +685,7 @@ works. Look at the code in ``QPDFWriter`` for exact details.

- xref table: new id -> offset = empty

- Create a ``QPDF`` object from a file.

- Write header for new PDF file.
@ -750,7 +750,7 @@ Filtered Streams
----------------

Support for streams is implemented through the ``Pipeline`` interface,
which was designed for this library.
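
The idea behind a pipeline is a chain of filters. Each stage receives
bytes through ``write()``, transforms them, and passes the result to
the next stage, and ``finish()`` flushes everything down the chain.
The sketch below uses made-up ``ToyPipeline``, ``ToyUpper``, and
``ToyBuffer`` classes rather than qpdf's actual ``Pipeline`` header.
Its signatures are assumptions for illustration only.

.. code-block:: c++

   #include <cctype>
   #include <cstddef>
   #include <string>

   // Hypothetical toy modeled loosely on the pipeline idea: each stage
   // optionally forwards its output to a `next` stage.
   class ToyPipeline {
     public:
       explicit ToyPipeline(ToyPipeline* next) : next_(next) {}
       virtual ~ToyPipeline() = default;
       virtual void write(unsigned char const* data, std::size_t len) = 0;
       virtual void finish() = 0;

     protected:
       ToyPipeline* next_;
   };

   // Filter stage: upper-cases letters (in the "C" locale) and forwards them.
   class ToyUpper : public ToyPipeline {
     public:
       using ToyPipeline::ToyPipeline;
       void write(unsigned char const* data, std::size_t len) override {
           std::string buf(len, '\0');
           for (std::size_t i = 0; i < len; ++i) {
               buf[i] = static_cast<char>(std::toupper(data[i]));
           }
           next_->write(reinterpret_cast<unsigned char const*>(buf.data()), len);
       }
       void finish() override { next_->finish(); }
   };

   // Terminal stage: collects everything it receives.
   class ToyBuffer : public ToyPipeline {
     public:
       ToyBuffer() : ToyPipeline(nullptr) {}
       void write(unsigned char const* data, std::size_t len) override {
           contents.append(reinterpret_cast<char const*>(data), len);
       }
       void finish() override { finished = true; }
       std::string contents;
       bool finished = false;
   };

Splitting the input across several ``write()`` calls gives the same
output as a single call. That property is what lets a chain of filters
process stream data incrementally.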

When reading streams, create a series of ``Pipeline`` objects. The
``Pipeline`` abstract base requires implementation of ``write()`` and
@ -802,32 +802,20 @@ file might be, the presence of type warnings can save lots of developer
time. They have also proven useful in exposing issues in qpdf itself
that would otherwise have gone undetected.

*Can there be a type-safe* ``QPDFObjectHandle``? At the time of the
release of qpdf 11, work is underway to create a way of working with
PDF objects that is more type-safe and closer in feel to the current
C++ standard library. It is hoped that this work will make it easier
to write bindings to qpdf in modern languages like
`Rust <https://www.rust-lang.org/>`__. If this happens, it will likely
be by providing an alternative to ``QPDFObjectHandle`` that provides a
separate path to the underlying object. Details are still being worked
out. Fundamentally, PDF objects are not strongly typed. They are
similar to ``JSON`` objects or to objects in dynamic languages like
`Python <https://python.org/>`__: there are certain things you can
only do to objects of a given type, but you can replace an object of
one type with an object of another. Because of this, some checks will
always have to happen at runtime.
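
The comparison with dynamically typed values can be illustrated with
``std::variant``. This hypothetical toy (``ToyValue``, ``ToyDict``, and
``getInt`` are made-up names, not qpdf's interface) stores values in
dictionary slots whose type can change at any time. A typed accessor
can therefore only check at runtime, and it must report a mismatch
somehow. Here it does so by returning ``std::nullopt``.

.. code-block:: c++

   #include <map>
   #include <optional>
   #include <string>
   #include <variant>

   // Hypothetical toy: a PDF-like value is one of a few types, chosen at
   // runtime.
   using ToyValue = std::variant<std::monostate, bool, long long, std::string>;
   using ToyDict = std::map<std::string, ToyValue>;

   // Typed accessor: the check can only happen at runtime, because any key
   // may hold a value of any type, and that type can change later.
   std::optional<long long> getInt(ToyDict const& dict, std::string const& key) {
       auto it = dict.find(key);
       if (it == dict.end()) {
           return std::nullopt;  // missing key
       }
       if (auto const* value = std::get_if<long long>(&it->second)) {
           return *value;
       }
       return std::nullopt;  // present, but not an integer
   }

Nothing in the data structure prevents replacing an integer with a
string, so ``getInt`` has to inspect the stored type on every call. The
checks below use an explicit ``std::string`` because, before C++20,
assigning a string literal to such a variant selects ``bool``.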

*Why does the behavior of a type exception differ between the C and C++
API?* There is no way to throw and catch exceptions in C short of