diff --git a/include/qpdf/QPDFWriter.hh b/include/qpdf/QPDFWriter.hh
index 885a3630..1aa4e8a8 100644
--- a/include/qpdf/QPDFWriter.hh
+++ b/include/qpdf/QPDFWriter.hh
@@ -343,6 +343,16 @@ class QPDFWriter
// setting R4 parameters pushes the version to at least 1.5, or if
// AES is used, 1.6, and setting R5 or R6 parameters pushes the
// version to at least 1.7 with extension level 3.
+ //
+ // Note about Unicode passwords: the PDF specification requires
+ // passwords to be encoded with PDF Doc encoding for R <= 4 and
+ // UTF-8 for R >= 5. In all cases, these methods take strings of
+ // bytes as passwords. It is up to the caller to ensure that
+ // passwords are properly encoded. The qpdf command-line tool
+ // tries to do this, as discussed in the manual. If you are doing
+ // this from your own application, QUtil contains many transcoding
+ // functions that could be useful to you, most notably
+ // utf8_to_pdf_doc.
QPDF_DLL
void setR3EncryptionParameters(
char const* user_password, char const* owner_password,
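The note above leaves password encoding to the caller. As a rough illustration of what that involves for R <= 4, the following standalone sketch transcodes a UTF-8 password to a single-byte string and reports failure rather than substituting a placeholder character. It is not qpdf code: utf8_to_single_byte_subset is a hypothetical name, and the sketch assumes that only printable ASCII and U+00A1 through U+00FF (excluding U+00AD) keep the same byte values in PDF Doc encoding. An application linked against qpdf would more likely use the QUtil transcoding functions mentioned in the comment.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of qpdf. Transcodes UTF-8 to a single-byte
// string, accepting only printable ASCII and U+00A1..U+00FF except U+00AD
// (the range this sketch ASSUMES is shared with PDF Doc encoding).
// Returns false if the input is malformed or contains any other character.
bool utf8_to_single_byte_subset(std::string const& utf8, std::string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c >= 0x20 && c <= 0x7e) {
            // Printable ASCII passes through unchanged.
            out += static_cast<char>(c);
            ++i;
        } else if ((c == 0xc2 || c == 0xc3) && i + 1 < utf8.size()) {
            // Two-byte sequences with these lead bytes cover U+0080..U+00FF.
            unsigned char c2 = static_cast<unsigned char>(utf8[i + 1]);
            if ((c2 & 0xc0) != 0x80) {
                return false; // not a continuation byte
            }
            unsigned int cp = ((c & 0x1fu) << 6) | (c2 & 0x3fu);
            if (cp < 0xa1 || cp == 0xad) {
                return false; // outside the range this sketch handles
            }
            out += static_cast<char>(cp);
            i += 2;
        } else {
            return false; // anything else is not representable here
        }
    }
    return true;
}
```

A caller could pass the resulting bytes as user_password or owner_password and treat a false return as a password that needs to be rejected or handled some other way.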
diff --git a/manual/qpdf-manual.xml b/manual/qpdf-manual.xml
index c3d62814..0b2ec813 100644
--- a/manual/qpdf-manual.xml
+++ b/manual/qpdf-manual.xml
@@ -534,6 +534,83 @@ make
+
+
+
+
+ Ordinarily, qpdf attempts to automatically compensate for
+ passwords specified in the wrong character encoding. This
+ option suppresses that behavior. Under normal conditions,
+ there are no reasons to use this option. See for a discussion
+
+
+
+
+
+
+
+ This option can be used to fine-tune how qpdf interprets
+ Unicode (non-ASCII) password strings passed on the command
+ line. With the exception of the
+ mode, these only apply to passwords provided when encrypting
+ files. The mode also applies to
+ passwords specified for reading files. For additional
+ discussion of the supported password modes and when you might
+ want to use them, see .
+ The following modes are supported:
+
+
+
+ : Automatically determine whether the
+ specified password is a properly encoded Unicode (UTF-8)
+ string, and transcode it as required by the PDF spec based
+ on the type of encryption being applied. On Windows starting
+ with version 8.4.0, and on almost all other modern
+ platforms, incoming passwords will be properly encoded in
+ UTF-8, so this is almost always what you want.
+
+
+
+
+ : Tells qpdf that the incoming
+ password is UTF-8, overriding whatever its automatic
+ detection determines. The only difference between this mode
+ and is that qpdf will fail with an
+ error message if the password is not valid UTF-8 instead of
+ falling back to mode with a warning.
+
+
+
+
+ : Interpret the password as a literal
+ byte string. For non-Windows platforms, this is what
+ versions of qpdf prior to 8.4.0 did. For Windows platforms,
+ there is no way to specify strings of binary data on the
+ command line directly, but you can use the
+ option to do it, in which case
+ this option forces qpdf to respect the string of bytes as
+ provided. This option will allow you to encrypt PDF files
+ with passwords that will not be usable by other readers.
+
+
+
+
+ : Interpret the password as a
+ hex-encoded string. This provides a way to pass binary data
+ as a password on all platforms including Windows. As with
+ , this option may allow creation of
+ files that can't be opened by other readers. This mode
+ affects qpdf's interpretation of passwords specified for
+ decrypting files as well as for encrypting them. It makes
+ it possible to specify strings that are encoded in some
+ manner other than the system's default encoding.
+
+
+
+
+
+
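The hex-encoded mode described above amounts to turning pairs of hexadecimal digits into raw bytes before the password is used. The following standalone sketch shows one way to do that. It is not qpdf's implementation: hex_to_bytes and hex_digit_value are hypothetical names, and rejecting odd-length or non-hex input is an assumption of this sketch rather than documented qpdf behavior.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of qpdf: value of one hex digit, or -1.
static int hex_digit_value(char ch)
{
    if (ch >= '0' && ch <= '9') {
        return ch - '0';
    }
    if (ch >= 'a' && ch <= 'f') {
        return ch - 'a' + 10;
    }
    if (ch >= 'A' && ch <= 'F') {
        return ch - 'A' + 10;
    }
    return -1;
}

// Hypothetical helper, not part of qpdf: decode a hex string into bytes.
// Returns false on odd length or on any non-hex character (an assumption
// made for this sketch).
bool hex_to_bytes(std::string const& hex, std::string& out)
{
    out.clear();
    if (hex.size() % 2 != 0) {
        return false;
    }
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = hex_digit_value(hex[i]);
        int lo = hex_digit_value(hex[i + 1]);
        if (hi < 0 || lo < 0) {
            return false;
        }
        out += static_cast<char>((hi << 4) | lo);
    }
    return true;
}
```

For instance, the input 636166e9 decodes to the four bytes 0x63 0x61 0x66 0xe9, which is "café" in ISO-Latin-1 but is not valid UTF-8.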
@@ -699,22 +776,17 @@ make
producers.
- In all cases where qpdf allows specification of a password, care
- must be taken if the password contains characters that fall
- outside of the 7-bit US-ASCII character range to ensure that the
- exact correct byte sequence is provided. It is possible that a
- future version of qpdf may handle this more gracefully. For
- example, if a password was encrypted using a password that was
- encoded in ISO-8859-1 and your terminal is configured to use
- UTF-8, the password you supply may not work properly. There are
- various approaches to handling this. For example, if you are
- using Linux and have the iconv executable installed, you could
- pass to qpdf where
- password is a password specified in
- your terminal's locale. A detailed discussion of this is out of
- scope for this manual, but just be aware of this issue if you have
- trouble with a password that contains 8-bit characters.
+ Prior to 8.4.0, in the case of passwords that contain characters
+ that fall outside of 7-bit US-ASCII, qpdf left the burden of
+ supplying properly encoded encryption and decryption passwords to
+ the user. Starting in qpdf 8.4.0, qpdf does this automatically in
+ most cases. For an in-depth discussion, please see . Previous versions of this
+ manual described workarounds using the iconv
+ command. Such workarounds are no longer required or recommended
+ with qpdf 8.4.0. However, for backward compatibility, qpdf
+ attempts to detect those workarounds and do the right thing in
+ most cases.
@@ -2024,6 +2096,121 @@ outfile.pdf
content stream, in which case it will produce unusable results.
+
+ Unicode Passwords
+
+ At the library API level, all methods that perform encryption and
+ decryption interpret passwords as strings of bytes. It is up to
+ the caller to ensure that they are appropriately encoded. Starting
+ with qpdf version 8.4.0, qpdf will attempt to make this easier for
+ you when you interact with qpdf via its command-line interface. The
+ PDF specification requires passwords used to encrypt files with
+ 40-bit or 128-bit encryption to be encoded with PDF Doc encoding.
+ This encoding is a single-byte encoding that supports ISO-Latin-1
+ and a handful of other commonly used characters. It has a large
+ overlap with Windows ANSI but is not exactly the same. There is
+ generally not a way to provide PDF Doc encoded strings on the
+ command line. As such, qpdf versions prior to 8.4.0 would often
+ create PDF files that couldn't be opened with other software when
+ given a password with non-ASCII characters to encrypt a file with
+ 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf
+ recognizes the encoding of the parameter and transcodes it as
+ needed. The rest of this section provides the details about
+ exactly how qpdf behaves. Most users will not need to know this
+ information, but it might be useful if you have been working
+ around qpdf's old behavior or if you are using qpdf to generate
+ encrypted files for testing other PDF software.
+
+
+ A note about Windows: when qpdf builds, it attempts to determine
+ what it has to do to use wmain instead of
+ main on Windows. The
+ wmain function is an alternative entry point
+ that receives all arguments as UTF-16-encoded strings. When qpdf
+ starts up this way, it converts all the strings to UTF-8 encoding
+ and then invokes the regular main. This means that, as far as qpdf
+ is concerned, it receives its command-line arguments with UTF-8
+ encoding, just as it would in any modern Linux or UNIX
+ environment.
+
+
+ If a file is being encrypted with 40-bit or 128-bit encryption and
+ the supplied password is not a valid UTF-8 string, qpdf will fall
+ back to the behavior of interpreting the password as a string of
+ bytes. If you have old scripts that encrypt files by passing the
+ output of iconv to qpdf, you no longer need to
+ do that, but if you do, qpdf should still work. The only exception
+ would be for the extremely unlikely case of a password that is
+ encoded with a single-byte encoding but also happens to be valid
+ UTF-8. Such a password would contain strings of even numbers of
+ characters that alternate between accented letters and symbols. In
+ the extremely unlikely event that you are intentionally using such
+ passwords and qpdf is thwarting you by interpreting them as UTF-8,
+ you can use to suppress
+ qpdf's automatic behavior.
+
+
+ The option, as described earlier
+ in this chapter, can be used to change qpdf's interpretation of
+ supplied passwords. There are very few reasons to use this option.
+ One would be the unlikely case described in the previous paragraph
+ in which the supplied password happens to be valid UTF-8 but isn't
+ supposed to be UTF-8. Your best bet would be just to provide the
+ password as a valid UTF-8 string, but you could also use
+ . Another reason to use
+ would be to intentionally
+ generate PDF files encrypted with passwords that are not properly
+ encoded. The qpdf test suite does this to generate invalid files
+ for the purpose of testing its password recovery capability. If
+ you were trying to create intentionally incorrect files for
+ similar purposes, the password mode can
+ enable you to do this.
+
+
+ When qpdf attempts to decrypt a file with a password that contains
+ non-ASCII characters, it will generate a list of alternative
+ passwords by attempting to interpret the password as each of a
+ handful of different coding systems and then transcode them to the
+ required format. This helps to compensate for the supplied
+ password being given in the wrong coding system, such as would
+ happen if you used the iconv workaround that
+ was previously needed. It also generates passwords by doing the
+ reverse operation: translating from correct to incorrect encoding
+ of the password. This would enable qpdf to decrypt files using
+ passwords that were improperly encoded by whatever software
+ encrypted the files, including older versions of qpdf invoked
+ without properly encoded passwords. The combination of these two
+ recovery methods should make qpdf transparently open most
+ encrypted files with the password supplied correctly but in the
+ wrong coding system. There are no real downsides to this behavior,
+ but if you don't want qpdf to do this, you can use the
+ option. One reason
+ to do that is to ensure that you know the exact password that was
+ used to encrypt the file.
+
+
+ With these changes, qpdf now generates compliant passwords in most
+ cases. There are still some exceptions. In particular, the PDF
+ specification directs compliant writers to normalize Unicode
+ passwords and to perform certain transformations on passwords with
+ bidirectional text. Implementing this functionality requires using
+ a real Unicode library like ICU. If a client application that uses
+ qpdf wants to do this, the qpdf library will accept the resulting
+ passwords, but qpdf will not perform these transformations itself.
+ It is possible that this will be addressed in a future version of
+ qpdf. The QPDFWriter methods that enable
+ encryption on the output file accept passwords as strings of
+ bytes.
+
+
+ Please note that the option
+ is unrelated to all this. This flag bypasses the normal process of
+ going from password to encryption string entirely, allowing the
+ raw encryption key to be specified directly. This is useful for
+ forensic purposes or for brute-force recovery of files with
+ unknown passwords.
+
+ QDF Mode
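The fallback described in this section depends on deciding whether a supplied password is valid UTF-8. The following standalone sketch shows one way to make that decision. It is not qpdf's implementation, and is_valid_utf8 is a hypothetical name. It also makes the ambiguous case concrete: the single byte 0xE9 ("é" in ISO-Latin-1) is not valid UTF-8, so it would be treated as a byte string, while the two bytes 0xC3 0xA9 are valid UTF-8 for "é" and could also be read in a single-byte encoding as an accented letter followed by a symbol.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of qpdf: returns true if s is well-formed
// UTF-8. Overlong forms, surrogates, and values above U+10FFFF are rejected.
bool is_valid_utf8(std::string const& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) {
            ++i; // plain ASCII
            continue;
        }
        std::size_t extra = 0;     // number of continuation bytes expected
        unsigned long cp = 0;      // decoded code point
        unsigned long min_cp = 0;  // smallest value allowed for this length
        if ((c & 0xe0) == 0xc0) {
            extra = 1; cp = c & 0x1fu; min_cp = 0x80;
        } else if ((c & 0xf0) == 0xe0) {
            extra = 2; cp = c & 0x0fu; min_cp = 0x800;
        } else if ((c & 0xf8) == 0xf0) {
            extra = 3; cp = c & 0x07u; min_cp = 0x10000;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (i + extra >= s.size()) {
            return false; // sequence is truncated
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xc0) != 0x80) {
                return false; // expected a continuation byte
            }
            cp = (cp << 6) | (cc & 0x3fu);
        }
        if (cp < min_cp || cp > 0x10fffful || (cp >= 0xd800 && cp <= 0xdffful)) {
            return false; // overlong, out of range, or surrogate
        }
        i += extra + 1;
    }
    return true;
}
```

A caller that wanted to mimic the behavior described above could transcode the password when this check succeeds and otherwise pass the bytes through unchanged.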
@@ -3974,6 +4161,253 @@ print "\n";
ChangeLog in the source distribution.
+
+ 8.4.0: XXX, 2019
+
+
+
+
+ Command-line Enhancements
+
+
+
+
+ Non-compatible CLI change: The qpdf
+ command-line tool interprets passwords given on the
+ command line differently from previous releases when
+ the passwords contain non-ASCII characters.
+ For a discussion of
+ the current behavior, please see . The incompatibilities are
+ as follows:
+
+
+
+ On Windows, qpdf now receives all command-line options as
+ Unicode strings if it can figure out the appropriate
+ compile/link options. This is enabled at least for MSVC
+ and mingw builds. That means that if non-ASCII strings
+ are passed to the qpdf CLI in Windows, qpdf will now
+ correctly receive them. In the past, they would have
+ either been encoded as Windows code page 1252 (also known
+ as “Windows ANSI”) or as something
+ unintelligible. In almost all cases, qpdf is able to
+ properly interpret Unicode arguments now, whereas in the
+ past, it would almost never interpret them properly. The
+ result is that non-ASCII passwords given to the qpdf CLI
+ on Windows now have a much greater chance of creating PDF
+ files that can be opened by a variety of readers. In the
+ past, usually files encrypted from the Windows CLI using
+ non-ASCII passwords would not be readable by most
+ viewers. Note that the current version of qpdf is able to
+ decrypt files that it previously created using the
+ previously supplied password.
+
+
+
+
+ The PDF specification requires passwords to be encoded as
+ UTF-8 for 256-bit encryption and with PDF Doc encoding
+ for 40-bit or 128-bit encryption. Older versions of qpdf
+ left it up to the user to provide passwords with the
+ correct encoding. The qpdf CLI now detects when a
+ password is given with UTF-8 encoding and automatically
+ transcodes it to what the PDF spec requires. While this
+ is almost always the correct behavior, it is possible to
+ override the behavior if there is some reason to do so.
+ This is discussed in more depth in .
+
+
+
+
+
+
+
+ When opening an encrypted file with a password, if the
+ specified password doesn't work and the password contains
+ any non-ASCII characters, qpdf will try a number of
+ alternative passwords to try to compensate for possible
+ character encoding errors. This behavior can be suppressed
+ with the
+ option. See for a
+ full discussion.
+
+
+
+
+ Add the option to fine-tune
+ how qpdf interprets password arguments, especially when they
+ contain non-ASCII characters. See for more information.
+
+
+
+
+ In the option, it is now possible
+ to copy the same page more than once from the same file
+ without using the previous workaround of specifying two
+ different paths to the same file.
+
+
+
+
+ In the option, allow use of
+ “.” as a shortcut for the primary input file.
+ That way, you can do qpdf in.pdf --pages . 1-2 --
+ out.pdf instead of having to repeat
+ in.pdf in the command.
+
+
+
+
+ When encrypting with 128-bit and 256-bit encryption, new
+ encryption options ,
+ , , and
+ allow finer
+ granularity in configuring options. Before, the
+ option only configured certain
+ predefined groups of permissions.
+
+
+
+
+
+
+ Bug Fixes and Enhancements
+
+
+
+
+ Potential data-loss bug: Versions of
+ qpdf between 8.1.0 and 8.3.0 had a bug that could cause page
+ splitting and merging operations to drop some font or image
+ resources if the PDF file's internal structure shared these
+ resource lists across pages and if some but not all of the
+ pages in the output did not reference all the fonts and
+ images. Using the
+ option
+ would work around the incorrect behavior. This bug was the
+ result of a typo in the code and a deficiency in the test
+ suite. The case that triggered the error was known, just not
+ handled properly. This case is now exercised in qpdf's test
+ suite and properly handled.
+
+
+
+
+
+
+ Library Enhancements
+
+
+
+
+ Add method
+ QUtil::possible_repaired_encodings() to
+ generate a list of strings that represent other ways the
+ given string could have been encoded. This is the method the
+ QPDF CLI uses to generate the strings it tries when
+ recovering incorrectly encoded Unicode passwords.
+
+
+
+
+ Add new versions of
+ QPDFWriter::setR{3,4,5,6}EncryptionParameters
+ that allow more granular setting of permissions bits. See
+ QPDFWriter.hh for details.
+
+
+
+
+ Add new versions of the transcoders from UTF-8 to
+ single-byte coding systems in QUtil
+ that report success or failure rather than just substituting
+ a specified unknown character.
+
+
+
+
+ Add method QUtil::analyze_encoding() to
+ determine whether a string has high-bit characters and whether
+ it appears to be UTF-16 or valid UTF-8.
+
+
+
+
+ Add new method
+ QPDFPageObjectHelper::shallowCopyPage()
+ to copy a new page that is a “shallow copy” of a
+ page. The resulting object is an indirect object ready to be
+ passed to
+ QPDFPageDocumentHelper::addPage() for
+ either the original QPDF object or a
+ different one. This is what the qpdf
+ command-line tool uses to copy the same page multiple times
+ from the same file during splitting and merging operations.
+
+
+
+
+ Add method QPDF::getUniqueId(), which
+ returns a unique identifier for the given QPDF object. The
+ identifier will be unique across the life of the
+ application. The returned value can be safely used as a map
+ key.
+
+
+
+
+ Add method QPDF::setImmediateCopyFrom.
+ This further enhances qpdf's ability to allow a
+ QPDF object from which objects are
+ being copied to go out of scope before the destination
+ object is written. If you call this method on a
+ QPDF instance, objects copied
+ from this instance will be copied
+ immediately instead of lazily. This option uses more memory
+ but allows the source object to go out of scope before the
+ destination object is written in all cases. See comments in
+ QPDF.hh for details.
+
+
+
+
+
+
+ Build Improvements
+
+
+
+
+ Add new configure option
+ , which causes
+ the preprocessor symbol
+ AVOID_WINDOWS_HANDLE to be defined. When
+ defined, qpdf will avoid referencing the Windows
+ HANDLE type, which is disallowed with
+ certain versions of the Windows SDK.
+
+
+
+
+ For Windows builds, attempt to determine what options, if
+ any, have to be passed to the compiler and linker to enable
+ use of wmain. This causes the
+ preprocessor symbol WINDOWS_WMAIN to be
+ defined. If you do your own builds with other compilers, you
+ can define this symbol to cause wmain
+ to be used. This is needed to allow the Windows
+ qpdf command to receive Unicode
+ command-line options.
+
+
+
+
+
+
+ 8.3.0: January 7, 2019
@@ -5079,8 +5513,6 @@ print "\n";
-
- 6.0.0: November 10, 2015