Add documentation for features since 8.3.0

Jay Berkenbilt 2019-01-19 15:58:43 -05:00
parent 0a3057dc0a
commit e1271361c5
2 changed files with 460 additions and 18 deletions


@ -343,6 +343,16 @@ class QPDFWriter
// setting R4 parameters pushes the version to at least 1.5, or if
// AES is used, 1.6, and setting R5 or R6 parameters pushes the
// version to at least 1.7 with extension level 3.
//
// Note about Unicode passwords: the PDF specification requires
// passwords to be encoded with PDF Doc encoding for R <= 4 and
// UTF-8 for R >= 5. In all cases, these methods take strings of
// bytes as passwords. It is up to the caller to ensure that
// passwords are properly encoded. The qpdf command-line tool
// tries to do this, as discussed in the manual. If you are doing
// this from your own application, QUtil contains many transcoding
// functions that could be useful to you, most notably
// utf8_to_pdf_doc.
QPDF_DLL
void setR3EncryptionParameters(
char const* user_password, char const* owner_password,


@ -534,6 +534,83 @@ make
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--suppress-password-recovery</option></term>
<listitem>
<para>
Ordinarily, qpdf attempts to automatically compensate for
passwords specified in the wrong character encoding. This
option suppresses that behavior. Under normal conditions,
there is no reason to use this option. See <xref
linkend="ref.unicode-passwords"/> for a discussion.
</listitem>
</varlistentry>
<varlistentry>
<term><option>--password-mode=<replaceable>mode</replaceable></option></term>
<listitem>
<para>
This option can be used to fine-tune how qpdf interprets
Unicode (non-ASCII) password strings passed on the command
line. With the exception of the <option>hex-bytes</option>
mode, these only apply to passwords provided when encrypting
files. The <option>hex-bytes</option> mode also applies to
passwords specified for reading files. For additional
discussion of the supported password modes and when you might
want to use them, see <xref linkend="ref.unicode-passwords"/>.
The following modes are supported:
<itemizedlist>
<listitem>
<para>
<option>auto</option>: Automatically determine whether the
specified password is a properly encoded Unicode (UTF-8)
string, and transcode it as required by the PDF spec based
on the type of encryption being applied. On Windows starting
with version 8.4.0, and on almost all other modern
platforms, incoming passwords will be properly encoded in
UTF-8, so this is almost always what you want.
</para>
</listitem>
<listitem>
<para>
<option>unicode</option>: Tells qpdf that the incoming
password is UTF-8, overriding whatever its automatic
detection determines. The only difference between this mode
and <option>auto</option> is that qpdf will fail with an
error message if the password is not valid UTF-8 instead of
falling back to <option>bytes</option> mode with a warning.
</para>
</listitem>
<listitem>
<para>
<option>bytes</option>: Interpret the password as a literal
byte string. For non-Windows platforms, this is what
versions of qpdf prior to 8.4.0 did. For Windows platforms,
there is no way to specify strings of binary data on the
command line directly, but you can use the
<option>@filename</option> option to do it, in which case
this option forces qpdf to respect the string of bytes as
provided. This option will allow you to encrypt PDF files
with passwords that will not be usable by other readers.
</para>
</listitem>
<listitem>
<para>
<option>hex-bytes</option>: Interpret the password as a
hex-encoded string. This provides a way to pass binary data
as a password on all platforms including Windows. As with
<option>bytes</option>, this option may allow creation of
files that can't be opened by other readers. This mode
affects qpdf's interpretation of passwords specified for
decrypting files as well as for encrypting them. It makes
it possible to specify strings that are encoded in some
manner other than the system's default encoding.
</para>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--rotate=[+|-]angle[:page-range]</option></term>
<listitem>
@ -699,22 +776,17 @@ make
producers.
</para>
<para>
In all cases where qpdf allows specification of a password, care
must be taken if the password contains characters that fall
outside of the 7-bit US-ASCII character range to ensure that the
exact correct byte sequence is provided. It is possible that a
future version of qpdf may handle this more gracefully. For
example, if a password was encrypted using a password that was
encoded in ISO-8859-1 and your terminal is configured to use
UTF-8, the password you supply may not work properly. There are
various approaches to handling this. For example, if you are
using Linux and have the iconv executable installed, you could
pass <option>--password=`echo <replaceable>password</replaceable>
| iconv -t iso-8859-1`</option> to qpdf where
<replaceable>password</replaceable> is a password specified in
your terminal's locale. A detailed discussion of this is out of
scope for this manual, but just be aware of this issue if you have
trouble with a password that contains 8-bit characters.
Prior to 8.4.0, in the case of passwords that contain characters
that fall outside of 7-bit US-ASCII, qpdf left the burden of
supplying properly encoded encryption and decryption passwords to
the user. Starting in qpdf 8.4.0, qpdf does this automatically in
most cases. For an in-depth discussion, please see <xref
linkend="ref.unicode-passwords"/>. Previous versions of this
manual described workarounds using the <command>iconv</command>
command. Such workarounds are no longer required or recommended
with qpdf 8.4.0. However, for backward compatibility, qpdf
attempts to detect those workarounds and do the right thing in
most cases.
</para>
</sect1>
<sect1 id="ref.encryption-options">
@ -2024,6 +2096,121 @@ outfile.pdf</option>
content stream, in which case it will produce unusable results.
</para>
</sect1>
<sect1 id="ref.unicode-passwords">
<title>Unicode Passwords</title>
<para>
At the library API level, all methods that perform encryption and
decryption interpret passwords as strings of bytes. It is up to
the caller to ensure that they are appropriately encoded. Starting
with qpdf version 8.4.0, qpdf will attempt to make this easier for
you when you interact with qpdf via its command-line interface. The
PDF specification requires passwords used to encrypt files with
40-bit or 128-bit encryption to be encoded with PDF Doc encoding.
This encoding is a single-byte encoding that supports ISO-Latin-1
and a handful of other commonly used characters. It has a large
overlap with Windows ANSI but is not exactly the same. There is
generally not a way to provide PDF Doc encoded strings on the
command line. As such, qpdf versions prior to 8.4.0 would often
create PDF files that couldn't be opened with other software when
given a password with non-ASCII characters to encrypt a file with
40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf
recognizes the encoding of the parameter and transcodes it as
needed. The rest of this section provides the details about
exactly how qpdf behaves. Most users will not need to know this
information, but it might be useful if you have been working
around qpdf's old behavior or if you are using qpdf to generate
encrypted files for testing other PDF software.
</para>
<para>
A note about Windows: when qpdf builds, it attempts to determine
what it has to do to use <function>wmain</function> instead of
<function>main</function> on Windows. The
<function>wmain</function> function is an alternative entry point
that receives all arguments as UTF-16-encoded strings. When qpdf
starts up this way, it converts all the strings to UTF-8 encoding
and then invokes the regular main. This means that, as far as qpdf
is concerned, it receives its command-line arguments with UTF-8
encoding, just as it would in any modern Linux or UNIX
environment.
</para>
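<para>
The conversion step can be sketched in isolation. The following is a
minimal, portable illustration of turning UTF-16 code units into
UTF-8. It is not qpdf's actual implementation: the function names
<function>append_utf8</function> and
<function>utf16_to_utf8</function> are hypothetical, and replacing
unpaired surrogates with U+FFFD is an assumption made for this
example.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to out as UTF-8.
// (Hypothetical helper for illustration only.)
void append_utf8(std::string& out, unsigned long cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a sequence of UTF-16 code units to a UTF-8 byte string.
// Surrogate pairs are combined; an unpaired surrogate becomes U+FFFD
// (an assumption of this sketch, not documented qpdf behavior).
std::string utf16_to_utf8(std::u16string const& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```
]]></programlisting>
<para>
On Windows, a <function>wmain</function> entry point could apply a
function of this kind to each argument before handing the converted
array to the regular <function>main</function>.
</para>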
<para>
If a file is being encrypted with 40-bit or 128-bit encryption and
the supplied password is not a valid UTF-8 string, qpdf will fall
back to the behavior of interpreting the password as a string of
bytes. If you have old scripts that encrypt files by passing the
output of <command>iconv</command> to qpdf, you no longer need to
do that, but if you do, qpdf should still work. The only exception
would be for the extremely unlikely case of a password that is
encoded with a single-byte encoding but also happens to be valid
UTF-8. Such a password would contain strings of even numbers of
characters that alternate between accented letters and symbols. In
the extremely unlikely event that you are intentionally using such
passwords and qpdf is thwarting you by interpreting them as UTF-8,
you can use <option>--password-mode=bytes</option> to suppress
qpdf's automatic behavior.
</para>
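<para>
The fallback decision described above can be sketched as follows.
This is an illustrative approximation rather than code taken from
qpdf: <function>is_valid_utf8</function>,
<classname>PasswordTreatment</classname>, and
<function>choose_treatment</function> are hypothetical names, and
qpdf's own detection may differ in detail.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if s is well-formed UTF-8. Overlong encodings,
// surrogates, and code points above U+10FFFF are rejected.
bool is_valid_utf8(std::string const& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // continuation bytes expected
        unsigned long cp = 0;   // code point being assembled
        unsigned long min = 0;  // smallest code point for this length
        if (c < 0x80) {
            ++i;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;       // stray continuation or invalid lead byte
        }
        if (s.size() - i <= extra) {
            return false;       // sequence is truncated
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) {
                return false;
            }
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        i += extra + 1;
    }
    assert(i == s.size());
    return true;
}

// Hypothetical labels for the two outcomes described in the text.
enum class PasswordTreatment { transcode_from_utf8, raw_bytes };

// Valid UTF-8 is transcoded as the PDF spec requires; anything else
// is passed through as a literal string of bytes.
PasswordTreatment choose_treatment(std::string const& password)
{
    return is_valid_utf8(password)
        ? PasswordTreatment::transcode_from_utf8
        : PasswordTreatment::raw_bytes;
}
```
]]></programlisting>
<para>
Note that the two-byte sequence 0xC3 0xA9 passes this check even if
the caller intended it as two ISO-8859-1 characters. That is the
ambiguous case for which
<option>--password-mode=bytes</option> exists.
</para>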
<para>
The <option>--password-mode</option> option, as described earlier
in this chapter, can be used to change qpdf's interpretation of
supplied passwords. There are very few reasons to use this option.
One would be the unlikely case described in the previous paragraph
in which the supplied password happens to be valid UTF-8 but isn't
supposed to be UTF-8. Your best bet would be just to provide the
password as a valid UTF-8 string, but you could also use
<option>--password-mode=bytes</option>. Another reason to use
<option>--password-mode=bytes</option> would be to intentionally
generate PDF files encrypted with passwords that are not properly
encoded. The qpdf test suite does this to generate invalid files
for the purpose of testing its password recovery capability. If
you were trying to create intentionally incorrect files for
similar purposes, the <option>bytes</option> password mode can
enable you to do this.
</para>
<para>
When qpdf attempts to decrypt a file with a password that contains
non-ASCII characters, it will generate a list of alternative
passwords by attempting to interpret the password as each of a
handful of different coding systems and then transcode them to the
required format. This helps to compensate for the supplied
password being given in the wrong coding system, such as would
happen if you used the <command>iconv</command> workaround that
was previously needed. It also generates passwords by doing the
reverse operation: translating from the correct to an incorrect encoding
of the password. This would enable qpdf to decrypt files using
passwords that were improperly encoded by whatever software
encrypted the files, including older versions of qpdf invoked
without properly encoded passwords. The combination of these two
recovery methods should make qpdf transparently open most
encrypted files with the password supplied correctly but in the
wrong coding system. There are no real downsides to this behavior,
but if you don't want qpdf to do this, you can use the
<option>--suppress-password-recovery</option> option. One reason
to do that is to ensure that you know the exact password that was
used to encrypt the file.
</para>
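<para>
A much-simplified sketch of the candidate-generation idea follows.
It considers only ISO-8859-1, whereas qpdf tries several coding
systems, and every function name in it is hypothetical. Within the
library, the method that does this work is
<function>QUtil::possible_repaired_encodings()</function>.
</para>
<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Treat each byte as an ISO-8859-1 character and encode it as UTF-8.
std::string latin1_to_utf8(std::string const& in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Convert UTF-8 to ISO-8859-1. Returns false if the input is not
// valid UTF-8 or contains a character above U+00FF.
bool utf8_to_latin1(std::string const& in, std::string& out)
{
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
            // Lead bytes 0xC2 and 0xC3 cover exactly U+0080..U+00FF.
            unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
            if ((c2 & 0xC0) != 0x80) {
                return false;
            }
            out += static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F));
            ++i;
        } else {
            return false;
        }
    }
    return true;
}

// Build the list of passwords to try: the supplied one first, then
// the same text re-encoded in each direction, skipping duplicates.
std::vector<std::string> candidate_passwords(std::string const& supplied)
{
    std::vector<std::string> result{supplied};
    auto add = [&result](std::string const& s) {
        if (std::find(result.begin(), result.end(), s) == result.end()) {
            result.push_back(s);
        }
    };
    std::string narrowed;
    if (utf8_to_latin1(supplied, narrowed)) {
        add(narrowed);                  // UTF-8 given, single-byte needed
    }
    add(latin1_to_utf8(supplied));      // single-byte given, UTF-8 needed
    assert(!result.empty());
    return result;
}
```
]]></programlisting>
<para>
For a pure ASCII password both conversions reproduce the input, so
the list contains only the supplied password. This is consistent with
recovery being attempted only for passwords that contain non-ASCII
characters.
</para>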
<para>
With these changes, qpdf now generates compliant passwords in most
cases. There are still some exceptions. In particular, the PDF
specification directs compliant writers to normalize Unicode
passwords and to perform certain transformations on passwords with
bidirectional text. Implementing this functionality requires using
a real Unicode library like ICU. If a client application that uses
qpdf wants to do this, the qpdf library will accept the resulting
passwords, but qpdf will not perform these transformations itself.
It is possible that this will be addressed in a future version of
qpdf. The <classname>QPDFWriter</classname> methods that enable
encryption on the output file accept passwords as strings of
bytes.
</para>
<para>
Please note that the <option>--password-is-hex-key</option> option
is unrelated to all this. This flag bypasses the normal process of
going from password to encryption string entirely, allowing the
raw encryption key to be specified directly. This is useful for
forensic purposes or for brute-force recovery of files with
unknown passwords.
</para>
</sect1>
</chapter>
<chapter id="ref.qdf">
<title>QDF Mode</title>
@ -3974,6 +4161,253 @@ print "\n";
<filename>ChangeLog</filename> in the source distribution.
</para>
<variablelist>
<varlistentry>
<term>8.4.0: XXX, 2019</term>
<listitem>
<itemizedlist>
<listitem>
<para>
Command-line Enhancements
</para>
<itemizedlist>
<listitem>
<para>
<emphasis>Non-compatible CLI change:</emphasis> The qpdf
command-line tool interprets passwords given on the
command line differently from previous releases when the
passwords contain non-ASCII characters. For a discussion of
the current behavior, please see <xref
linkend="ref.unicode-passwords"/>. The incompatibilities are
as follows:
<itemizedlist>
<listitem>
<para>
On Windows, qpdf now receives all command-line options as
Unicode strings if it can figure out the appropriate
compile/link options. This is enabled at least for MSVC
and mingw builds. That means that if non-ASCII strings
are passed to the qpdf CLI in Windows, qpdf will now
correctly receive them. In the past, they would have
either been encoded as Windows code page 1252 (also known
as &ldquo;Windows ANSI&rdquo;) or as something
unintelligible. In almost all cases, qpdf is able to
properly interpret Unicode arguments now, whereas in the
past, it would almost never interpret them properly. The
result is that non-ASCII passwords given to the qpdf CLI
on Windows now have a much greater chance of creating PDF
files that can be opened by a variety of readers. In the
past, files encrypted from the Windows CLI using
non-ASCII passwords were usually not readable by most
viewers. Note that the current version of qpdf can still
decrypt files created by earlier versions when given the
same password that was originally supplied.
</para>
</listitem>
<listitem>
<para>
The PDF specification requires passwords to be encoded as
UTF-8 for 256-bit encryption and with PDF Doc encoding
for 40-bit or 128-bit encryption. Older versions of qpdf
left it up to the user to provide passwords with the
correct encoding. The qpdf CLI now detects when a
password is given with UTF-8 encoding and automatically
transcodes it to what the PDF spec requires. While this
is almost always the correct behavior, it is possible to
override the behavior if there is some reason to do so.
This is discussed in more depth in <xref
linkend="ref.unicode-passwords"/>.
</para>
</listitem>
</itemizedlist>
</para>
</listitem>
<listitem>
<para>
When opening an encrypted file with a password, if the
specified password doesn't work and the password contains
any non-ASCII characters, qpdf will try a number of
alternative passwords to try to compensate for possible
character encoding errors. This behavior can be suppressed
with the <option>--suppress-password-recovery</option>
option. See <xref linkend="ref.unicode-passwords"/> for a
full discussion.
</para>
</listitem>
<listitem>
<para>
Add the <option>--password-mode</option> option to fine-tune
how qpdf interprets password arguments, especially when they
contain non-ASCII characters. See <xref
linkend="ref.unicode-passwords"/> for more information.
</para>
</listitem>
<listitem>
<para>
In the <option>--pages</option> option, it is now possible
to copy the same page more than once from the same file
without using the previous workaround of specifying two
different paths to the same file.
</para>
</listitem>
<listitem>
<para>
In the <option>--pages</option> option, allow use of
&ldquo;.&rdquo; as a shortcut for the primary input file.
That way, you can do <command>qpdf in.pdf --pages . 1-2 --
out.pdf</command> instead of having to repeat
<filename>in.pdf</filename> in the command.
</para>
</listitem>
<listitem>
<para>
When encrypting with 128-bit and 256-bit encryption, new
encryption options <option>--assemble</option>,
<option>--annotate</option>, <option>--form</option>, and
<option>--modify-other</option> allow finer-grained
control over permissions. Before, the
<option>--modify</option> option only configured certain
predefined groups of permissions.
</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
Bug Fixes and Enhancements
</para>
<itemizedlist>
<listitem>
<para>
<emphasis>Potential data-loss bug:</emphasis> Versions of
qpdf between 8.1.0 and 8.3.0 had a bug that could cause page
splitting and merging operations to drop some font or image
resources if the PDF file's internal structure shared these
resource lists across pages and if some, but not all, of the
pages in the output referenced only a subset of those fonts and
images. Using the
<option>--preserve-unreferenced-resources</option> option
would work around the incorrect behavior. This bug was the
result of a typo in the code and a deficiency in the test
suite. The case that triggered the error was known, just not
handled properly. This case is now exercised in qpdf's test
suite and properly handled.
</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
Library Enhancements
</para>
<itemizedlist>
<listitem>
<para>
Add method
<function>QUtil::possible_repaired_encodings()</function> to
generate a list of strings that represent other ways the
given string could have been encoded. This is the method the
qpdf CLI uses to generate the strings it tries when
recovering incorrectly encoded Unicode passwords.
</para>
</listitem>
<listitem>
<para>
Add new versions of
<function>QPDFWriter::setR{3,4,5,6}EncryptionParameters</function>
that allow more granular setting of permissions bits. See
<filename>QPDFWriter.hh</filename> for details.
</para>
</listitem>
<listitem>
<para>
Add new versions of the transcoders from UTF-8 to
single-byte coding systems in <classname>QUtil</classname>
that report success or failure rather than just substituting
a specified unknown character.
</para>
</listitem>
<listitem>
<para>
Add method <function>QUtil::analyze_encoding()</function> to
determine whether a string has high-bit characters and
whether it appears to be UTF-16 or valid UTF-8.
</para>
</listitem>
<listitem>
<para>
Add new method
<function>QPDFPageObjectHelper::shallowCopyPage()</function>
to copy a new page that is a &ldquo;shallow copy&rdquo; of a
page. The resulting object is an indirect object ready to be
passed to
<function>QPDFPageDocumentHelper::addPage()</function> for
either the original <classname>QPDF</classname> object or a
different one. This is what the <command>qpdf</command>
command-line tool uses to copy the same page multiple times
from the same file during splitting and merging operations.
</para>
</listitem>
<listitem>
<para>
Add method <function>QPDF::getUniqueId()</function>, which
returns a unique identifier for the given QPDF object. The
identifier will be unique across the life of the
application. The returned value can be safely used as a map
key.
</para>
</listitem>
<listitem>
<para>
Add method <function>QPDF::setImmediateCopyFrom</function>.
This further enhances qpdf's ability to allow a
<classname>QPDF</classname> object from which objects are
being copied to go out of scope before the destination
object is written. If you call this method on a
<classname>QPDF</classname> instance, objects copied
<emphasis>from</emphasis> this instance will be copied
immediately instead of lazily. This option uses more memory
but allows the source object to go out of scope before the
destination object is written in all cases. See comments in
<filename>QPDF.hh</filename> for details.
</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
Build Improvements
</para>
<itemizedlist>
<listitem>
<para>
Add new configure option
<option>--enable-avoid-windows-handle</option>, which causes
the preprocessor symbol
<literal>AVOID_WINDOWS_HANDLE</literal> to be defined. When
defined, qpdf will avoid referencing the Windows
<classname>HANDLE</classname> type, which is disallowed with
certain versions of the Windows SDK.
</para>
</listitem>
<listitem>
<para>
For Windows builds, attempt to determine what options, if
any, have to be passed to the compiler and linker to enable
use of <function>wmain</function>. This causes the
preprocessor symbol <literal>WINDOWS_WMAIN</literal> to be
defined. If you do your own builds with other compilers, you
can define this symbol to cause <function>wmain</function>
to be used. This is needed to allow the Windows
<command>qpdf</command> command to receive Unicode
command-line options.
</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>8.3.0: January 7, 2019</term>
<listitem>
@ -5079,8 +5513,6 @@ print "\n";
</itemizedlist>
</listitem>
</varlistentry>
</variablelist>
<variablelist>
<varlistentry>
<term>6.0.0: November 10, 2015</term>
<listitem>