2024-01-15  Jay Berkenbilt  <ejb@ql.org>

        * Add JSON::getDictItem (from m-holger)

2024-01-10  Jay Berkenbilt  <ejb@ql.org>

        * Allow --overlay and --underlay to be repeated. They may appear multiple times on the command-line and will be stacked in the order in which they appear. In QPDFJob JSON, the overlay and underlay keys may contain arrays. For compatibility, they may also contain a single dictionary.

2024-01-09  Jay Berkenbilt  <ejb@ql.org>

        * Add new command-line arguments --file and --range which can be used within --pages in place of positional arguments. Allow --file to be used inside of --overlay and --underlay as well. These new options can be freely intermixed with positional arguments. Also add file(), range(), and password() to QPDFJob::PagesConfig as an alternative to pageSpec.

2024-01-08  Jay Berkenbilt  <ejb@ql.org>

        * 11.8.0: release

2024-01-07  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix: treat references to older generations of objects as null.

2024-01-06  Jay Berkenbilt  <ejb@ql.org>

        * When recovering a file's xref table, attempt to find xref streams if a traditional trailer dictionary is not found. Fixes #1103.

2024-01-05  Jay Berkenbilt  <ejb@ql.org>

        * Add --set-page-labels command-line argument and supporting API. Fixes #939.
          - QPDFJob::Config::setPageLabels
          - pdf_page_label_e enumerated type
          - QPDFPageLabelDocumentHelper::pageLabelDict

2024-01-01  Jay Berkenbilt  <ejb@ql.org>

        * Support comma-separated numeric values with --collate to select different group sizes from different files. Fixes #505.

        * Support "x" before a group in a numeric range to exclude a group from the previous group. Details are in the manual. Fixes #564, #790.

2023-12-29  Jay Berkenbilt  <ejb@ql.org>

        * When flattening annotations, preserve annotations without any appearance information at all, such as types /Link, /Popup, and /Projection. Fixes #1039.
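As a rough illustration of the 2024-01-01 --collate entry, the sketch below interleaves pages from several inputs using a different group size per input. It is a standalone model, not qpdf code: the function name collate_sketch, the use of strings as page stand-ins, and the assumption that groups are taken round-robin until every input is exhausted are all illustrative choices. Consult the manual for the authoritative behavior.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for collation: on each round, take group_sizes[i]
// pages from input i, and keep going until no input has pages left.
// Assumes group_sizes.size() == inputs.size().
std::vector<std::string> collate_sketch(
    std::vector<std::vector<std::string>> const& inputs,
    std::vector<std::size_t> const& group_sizes)
{
    std::vector<std::string> result;
    std::vector<std::size_t> next(inputs.size(), 0); // next unread page per input
    bool took_any = true;
    while (took_any) {
        took_any = false;
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            for (std::size_t k = 0; k < group_sizes[i] && next[i] < inputs[i].size(); ++k) {
                result.push_back(inputs[i][next[i]++]);
                took_any = true;
            }
        }
    }
    return result;
}
```

With a four-page input A, a two-page input B, and group sizes 2 and 1, this model yields A1, A2, B1, A3, A4, B2.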

2023-12-25  Jay Berkenbilt  <ejb@ql.org>

        * Detect overlong UTF-8 in the UTF-8 decoder, and fix detection of 8-bit characters in erroneous UTF-8 strings.

2023-12-24  Jay Berkenbilt  <ejb@ql.org>

        * 11.7.0: release

2023-12-23  Jay Berkenbilt  <ejb@ql.org>

        * Define CPACK_NSIS_MODIFY_PATH for the Windows builds so the official installers will offer to modify PATH when installing qpdf. Fixes #1054.

        * Add QPDFAcroFormDocumentHelper::disableDigitalSignatures, which disables any digital signature fields, leaving their visual representations intact. The --remove-restrictions command-line argument now calls this. Fixes #1015.

2023-12-22  Jay Berkenbilt  <ejb@ql.org>

        * Generate a more complete qpdf "man page" from the same source as qpdf --help. Fixes #1064.

        * Allow the syntax "--encrypt --user-password=user-password --owner-password=owner-password --bits={40,128,256}" when encrypting PDF files. This is an alternative to the syntax "--encrypt user-password owner-password {40,128,256}", which will continue to be supported. The new syntax works better with shell completion and allows creation of passwords that start with "-". Fixes #874.

        * When setting a check box value, allow any value other than /Off to mean checked. This is permitted by the spec. Previously, any value other than /Yes or /Off was rejected. Fixes #1056.

2023-12-21  Jay Berkenbilt  <ejb@ql.org>

        * Fix to QPDF JSON: a floating point number that appears in scientific notation will be converted to fixed-point notation, rounded to six digits after the decimal point. Fixes #1079.

        * Fix to QPDF JSON: the syntax "n:/pdf-syntax" is now accepted as an alternative way to represent names. This can be used for any name (e.g. "n:/text#2fplain"), but it is necessary when the name contains binary characters. For example, /one#a0two must be represented as "n:/one#a0two" since the single byte a0 is not valid in JSON. Fixes #1072.

        * From M. Holger: Refactor QPDFParser for performance. See #1059 for a discussion.
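The 2023-12-21 entry on scientific notation can be pictured with a small standalone sketch. This is not qpdf's implementation: to_fixed_point is a hypothetical helper built only on the standard library, and whether qpdf also trims trailing zeros is not stated in the entry, so this sketch leaves them in place.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse a number that may be written in scientific
// notation and render it in fixed-point form with six digits after the
// decimal point. Trailing zeros are kept (an assumption of this sketch).
std::string to_fixed_point(std::string const& number)
{
    double value = std::stod(number);
    std::ostringstream out;
    out.imbue(std::locale::classic()); // always use "." as the decimal point
    out << std::fixed << std::setprecision(6) << value;
    return out.str();
}
```

For example, under these assumptions "1.5e-3" becomes "0.001500" and "2e2" becomes "200.000000".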

2023-12-20  Jay Berkenbilt  <ejb@ql.org>

        * Update code and tests so that qpdf's test suite no longer depends on the output of any specific zlib implementation. This makes it possible to get a fully passing test suite with any API-compatible zlib library. CI tests with the default zlib as well as zlib-ng (including verifying that zlib-ng is not the default), but any zlib implementation should work. Fixes #774.

        * Bug fix: with --compress-streams=n, don't compress object, XRef, or linearization hint streams.

2023-12-16  Jay Berkenbilt  <ejb@ql.org>

        * Add new C++ functions "qpdf_c_get_qpdf" and "qpdf_c_wrap" to qpdf-c.h that make it possible to write your own extern "C" functions in C++ that interoperate with the C API. See examples/extend-c-api for more information.

        * Bug fix from M. Holger: the default for /Columns in PNG filter is 1, but libqpdf was acting like it was 0.

        * Enhancement from M. Holger: add methods to Buffer to work more easily with std::string.

2023-12-10  Jay Berkenbilt  <ejb@ql.org>

        * 11.6.4: release

2023-12-09  Jay Berkenbilt  <ejb@ql.org>

        * Install fix: include cmake files with the dev component.

2023-11-20  Jay Berkenbilt  <ejb@ql.org>

        * Build AppImage with an older Linux distribution to support AWS Lambda. Fixes #1086.

2023-10-15  Jay Berkenbilt  <ejb@ql.org>

        * 11.6.3: release

2023-10-14  Jay Berkenbilt  <ejb@ql.org>

        * Tweak linearization code to better handle files between 2 GB and 4 GB in size. Fixes #1023.

        * Fix data loss bug: qpdf could discard the character after an escaped octal string consisting of fewer than three digits. For content, this would only happen with QDF or when normalizing content. Outside of content, it could have happened in any binary string, such as /ID, if the encoding software used octal escape strings with fewer than three digits. This bug was introduced between 10.6.3 and 11.0.0. Fixes #1050.
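To make the 2023-10-14 octal-escape bug concrete, here is a minimal standalone sketch of decoding one-, two-, and three-digit octal escapes without swallowing the character that follows. It is not qpdf's tokenizer: decode_octal_escapes is a hypothetical helper that handles only backslash-octal sequences and copies every other byte (including other backslash escapes) through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode \d, \dd, and \ddd octal escapes in the body
// of a PDF literal string. Other escapes are passed through untouched.
std::string decode_octal_escapes(std::string const& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        bool starts_octal =
            in[i] == '\\' && i + 1 < in.size() && in[i + 1] >= '0' && in[i + 1] <= '7';
        if (!starts_octal) {
            out.push_back(in[i++]);
            continue;
        }
        ++i; // skip the backslash
        unsigned int value = 0;
        std::size_t digits = 0;
        // Consume at most three octal digits. Stop at the first non-octal
        // character WITHOUT consuming it, so it is copied on the next pass.
        while (digits < 3 && i < in.size() && in[i] >= '0' && in[i] <= '7') {
            value = value * 8 + static_cast<unsigned int>(in[i] - '0');
            ++i;
            ++digits;
        }
        out.push_back(static_cast<char>(value & 0xff));
    }
    return out;
}
```

The key point is the inner loop's exit condition: a short escape such as \1 followed by "a" must produce two bytes, not one.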

2023-10-07  Jay Berkenbilt  <ejb@ql.org>

        * 11.6.2: release

        * Bug fix: when piping stream data, don't call finish on failure if the failure was caused by a previous call to finish. Fixes #1042.

        * Push .idea directory with the beginning of a sharable JetBrains CLion configuration.

2023-09-05  Jay Berkenbilt  <ejb@ql.org>

        * 11.6.1: release

        * Fix a logic error introduced in 11.6.0 in the fix to copyForeignObject. The bug could result in some pages not being copied.

2023-09-03  Jay Berkenbilt  <ejb@ql.org>

        * 11.6.0: release

        * ascii85 parser: ignore spaces everywhere including between ~ and >. Fixes #973.

        * Bug fix: with --pages, if one of the external files had warnings but the main file did not, the warning was previously not taken into consideration when determining the exit status.

        * Put quotation marks around the command in completion output to better handle spaces in paths. It is not a perfect fix (ideally, full shell-compatible quoting should be used), but it handles more cases than the old code and should handle all reasonable cases of qpdf being in a directory with a space in its name, which is common in Windows. Fixes #1021.

        * Move check for random number device to runtime instead of compile time. Since, by default, the crypto provider provides random numbers, runtime determination of a random number device is usually not needed. Fixes #1022.

2023-09-02  Jay Berkenbilt  <ejb@ql.org>

        * Maintain links to foreign pages when copying foreign objects. This allows hyperlinks in imported files to work. Fixes #1003.

        * Bug fix: Return a null object if an attempt is made to copy a foreign /Pages object with copyForeignObject. This corrects a possible crash. Fixes #1010.

        * Bug fix: Return a null object if an attempt is made to copy a foreign /Pages object with copyForeignObject. Fixes #1003.

        * Add /MediaBox to a page if absent. Thanks M. Holger.

        * Use std::vector internally for Pl_Buffer to avoid incompatibility with C++20. Thanks to Zoe Clifford. Fixes #1024.

2023-07-09  Jay Berkenbilt  <ejb@ql.org>

        * 11.5.0: release

        * This release consists entirely of changes made by M. Holger. Mostly these are changes to the private API, performance enhancements, code cleanup, and reformatting to 100 columns instead of 80. For qpdf development, we are starting to use JetBrains CLion, so a lot of the changes are moving us toward a cleaner development experience in that environment.

2023-06-15  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix: when the same page is copied multiple times, copy the annotations rather than having multiple pages share an annotation object. Thanks to M. Holger for the fix. Fixes #600.

2023-06-14  Jay Berkenbilt  <ejb@ql.org>

        * Add "FUTURE" build option for enabling experimental APIs. Do not package qpdf built with the FUTURE option as there are no binary compatibility or even source compatibility guarantees. The option is intended for developers who want to ensure that future potentially breaking changes are compatible with their code or provide feedback on upcoming changes. At present, the only feature enabled by FUTURE is a move constructor for QPDFObjectHandle. While this shouldn't break any code, it would change details about how many copies of a specific QPDFObjectHandle were in existence, so it could potentially break code that was relying on internal shared pointer reference counts. Thanks to M. Holger for the idea and contribution.

2023-05-25  Jay Berkenbilt  <ejb@ql.org>

        * Add new method Buffer::copy and deprecate Buffer copy constructor and assignment operator. Buffer copy operations are expensive as they always involve copying the buffer content. Use "buffer2 = buffer1.copy();" or "Buffer buffer2{buffer1.copy()};" to make it explicit that copying is intended. This change was contributed by M. Holger.

2023-05-21  Jay Berkenbilt  <ejb@ql.org>

        * 11.4.0: release

2023-05-20  Jay Berkenbilt  <ejb@ql.org>

        * From M. Holger: add QPDF::newReserved as a better alternative to QPDFObjectHandle::newReserved.
        The operation of creating a new reserved object fits better in the QPDF API. The old call just delegates to the new one.

2023-05-13  Jay Berkenbilt  <ejb@ql.org>

        * When an annotation dictionary's appearance dictionary (`/AP`) has a key that is a stream, disregard `/AS` (which is supposed to point to a subkey). This enables qpdf to not ignore annotations that have incorrect values for `/AS` when the appearance stream is directly in the `/AP` dictionary instead of in a subkey. Fixes #949.

2023-04-02  Jay Berkenbilt  <ejb@ql.org>

        * Allow QPDFJob's workflow to be split into a reading phase and a writing phase to allow the caller to operate on the QPDF object before it is written. This adds methods QPDFJob::createQPDF and QPDFJob::writeQPDF and corresponding C API functions qpdfjob_create_qpdf and qpdfjob_write_qpdf. Thanks to M. Holger for the contribution.

2023-04-01  Jay Berkenbilt  <ejb@ql.org>

        * From M. Holger: throw a logic error if an uninitialized or foreign QPDFObjectHandle is added to an array.

2023-03-18  Jay Berkenbilt  <ejb@ql.org>

        * Enhance --optimize-images to support images nested inside of form XObjects. Thanks to Connor Osborne (github user cdosborn) for the contribution. Fixes #923.

2023-02-25  Jay Berkenbilt  <ejb@ql.org>

        * 11.3.0: release

        * When performing overlay or underlay operations, convert the original page to a form XObject instead of simply isolating its contents with q/Q operators. This prevents unbalanced q/Q operators in any of the original pages from messing up the graphics state of anything that is overlaid on top of it. Fixes #904.

2023-02-18  Jay Berkenbilt  <ejb@ql.org>

        * Treat all linearization errors and warnings as warnings, and issue them through the normal warning system using the new error code qpdf_e_linearization. That means that --no-warn will suppress them, and the file name is included in the warning. Fixes #851.

2023-01-28  Jay Berkenbilt  <ejb@ql.org>

        * New option --remove-restrictions removes security restrictions from digitally signed files. Fixes #833.

2023-01-09  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix: flatten annotations should handle a page with no /Resources key. Fixes #827.

2022-11-20  Jay Berkenbilt  <ejb@ql.org>

        * 11.2.0: release

        * Add a few convenience methods to QPDFTokenizer::Token for checking token types. Thanks to M. Holger for the contribution.

        * Add stream creation methods to the QPDF class as a better alternative to the ones in the QPDFObjectHandle class. Thanks to M. Holger for the contribution.

2022-11-19  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix: handle special case of an earlier xref stream object's object number being reused by an update made by appending the file. Fixes #809.

2022-10-08  Jay Berkenbilt  <ejb@ql.org>

        * Fix major performance bug with the openssl crypto provider when using OpenSSL 3. The legacy loader and rc4 algorithm were being loaded with every call to the crypto provider instead of once in the life of the program. Fixes #798.

        * performance_check: add --test option to limit which tests are run.

2022-10-06  Jay Berkenbilt  <ejb@ql.org>

        * Change minimum required C++ version from C++-14 to C++-17.

        * Fix another symbol export issue with the MinGW build.

2022-10-01  Jay Berkenbilt  <ejb@ql.org>

        * 11.1.1: release

2022-09-27  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix: avoid having the AppImage discard the first argument when renamed to one of the embedded executables. Fixes #789.

        * Add AppImage-specific tests to CI. These test different invocation styles and loading of the proper shared library.

2022-09-26  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix: avoid using PDF Doc encoding for strings whose PDF Doc encoding representation starts with UTF-16 or UTF-8 markers. Fixes #778.

2022-09-27  Jay Berkenbilt  <ejb@ql.org>

        * Add tests to CI for char being unsigned by default.

2022-09-14  Jay Berkenbilt  <ejb@ql.org>

        * 11.1.0: release

        * Add notes to documentation clarifying that installing the dev component usually requires the lib component to also be installed.

        * Set CMAKE_INCLUDE_DIRECTORIES_PROJECT_BEFORE ON in cmake to (hopefully) solve the problem of older installed qpdf headers interfering with building qpdf from source. Fixes #763.

2022-09-12  Jay Berkenbilt  <ejb@ql.org>

        * Add some missing DLL exports that only affect the Windows build.

        * Remove compile-time test for LL_FMT. It's unlikely that any compiler new enough to build qpdf still doesn't support %lld.

2022-09-10  Jay Berkenbilt  <ejb@ql.org>

        * 11.0.0: release

2022-09-09  Jay Berkenbilt  <ejb@ql.org>

        * Add QPDFObjectHandle::isSameObjectAs to test whether two QPDFObjectHandle objects point to the same underlying object.

        * Expose ability to create custom loggers and to get and set the logger for QPDF and QPDFJob through the C API.

2022-09-08  Jay Berkenbilt  <ejb@ql.org>

        * Added new functions to the C API to support qpdf JSON: qpdf_create_from_json_file, qpdf_create_from_json_data, qpdf_update_from_json_file, qpdf_update_from_json_data, and qpdf_write_json. Examples can be found in qpdf-ctest.c (in the source tree), tests 42 through 47.

        * Add QPDFObjectHandle::isDestroyed() to test whether an indirect object was from a QPDF that has been destroyed.

2022-09-07  Jay Berkenbilt  <ejb@ql.org>

        * Add QPDFObjectHandle::getQPDF(), which returns a reference, as an alternative to QPDFObjectHandle::getOwningQPDF().

2022-09-06  Jay Berkenbilt  <ejb@ql.org>

        * For all bounding box methods in QPDFPageObjectHelper other than MediaBox, add a parameter `copy_if_fallback`, and add comments explaining in depth exactly what copy_if_shared and copy_if_fallback mean. Fixes #664.

        * Add new methods getArtBox and getBleedBox to QPDFPageObjectHelper, completing the set of bounding box methods.

        * The --show-encryption option now works even if a correct password is not supplied.
        If you were using --show-encryption to test whether you have the right password, use --requires-password instead. Fixes #598.

2022-09-05  Jay Berkenbilt  <ejb@ql.org>

        * Add a move constructor to Buffer, making it possible to move rather than copy the internal buffer. Thanks to jbarlow83 for the contribution.

2022-09-02  Jay Berkenbilt  <ejb@ql.org>

        * Add new QPDF::create() factory method that returns std::shared_ptr<QPDF>.

        * Prevent copying/assigning to QPDF objects in the API. It has never been safe to do this, but the API wasn't preventing it.

2022-09-01  Jay Berkenbilt  <ejb@ql.org>

        * Remove QPDFObject.hh from include/qpdf. The only reason to include it was to get QPDFObject::object_type_e. Instead, include qpdf/Constants.h, and change `QPDFObject::ot_` to `::ot_`.

        * More optimizations and cleanup from m-holger (#726, #730) including major refactor of QPDF's internal representations of objects. In addition to a large performance improvement, this also made it possible for QPDFObjectHandle::getOwningQPDF() to return a null pointer if the owning QPDF had been destroyed. (A more complete solution to this problem will be introduced for qpdf 12.) This work also paves the way for a future alternative to QPDFObjectHandle that is more idiomatic C++ and has greater type safety.

2022-08-31  Jay Berkenbilt  <ejb@ql.org>

        * From m-holger (#729): refactor QPDF's parser into a new QPDFParser class, cleaning the code, significantly improving performance.

2022-08-27  Jay Berkenbilt  <ejb@ql.org>

        * From m-holger: major refactoring of QPDFTokenizer to improve readability and to optimize performance. This also included some optimizations to some InputSource classes. Thanks for this excellent contribution. Fixes #749, #442.

2022-08-07  Jay Berkenbilt  <ejb@ql.org>

        * Add new build configuration option ENABLE_QTC, which is off by default when not running in MAINTAINER_MODE. When this is off, QTC coverage calls sprinkled throughout the qpdf source code are compiled out for increased performance.
        See "Build Options" in the manual for a discussion. Fixes #714.

2022-08-06  Jay Berkenbilt  <ejb@ql.org>

        * Added by m-holger: QPDF::getObject() method as a simpler form of getObjectByID or getObjectByObjGen. The older methods are being retained for compatibility and are not deprecated.

2022-07-24  Jay Berkenbilt  <ejb@ql.org>

        * include/qpdf/JSON.hh: Schema validation: allow a single item to appear anywhere that the schema has an array of a single item. This makes it possible to change an element of the schema from an item to an array to allow the data to accept an array where a single value was previously required. This change is needed to allow QPDFJob JSON to start accepting multiple items where a single item used to be expected without breaking backward compatibility. Without this change, the earlier fix to removeAttachment would be a breaking change. Also allow the schema to contain a multi-element array, which means that the output has to have an array of the same length in the corresponding location, and each element is validated against the corresponding schema element.

        * QPDFObjectHandle: for the methods insertItem, appendItem, eraseItem, replaceKey, and removeKey, add corresponding "AndGetNew" and/or "AndGetOld" methods. The ones that end with "AndGetNew" return the newly added item. The ones that end with "AndGetOld" return the old value. The AndGetNew methods make it possible to create a new object, add it to an array or dictionary, and get a handle to it all in one line. The AndGetOld methods make it easier to retrieve an old value when removing or replacing it.

        * Thanks to m-holger for doing significant cleanup of private APIs and internals around QPDFObjGen and for significantly improving the performance of QPDFObjGen -- See #731.
        This includes a few user-visible changes:
          - Addition of QPDFObjectHandle::StreamDataProvider::provideStreamData overloads that take QPDFObjGen
          - Addition of an optional argument to QPDFObjGen::unparse allowing specification of a separator character, with the default resulting in the old behavior
        Examples have been updated to use improved APIs. The old provideStreamData overloads will continue to be supported, so updating older code to use the new interfaces is entirely at the programmer's discretion.

2022-06-25  Jay Berkenbilt  <ejb@ql.org>

        * Add tracking methods QPDF::everCalledGetAllPages() and QPDF::everPushedInheritedAttributesToPages(). Since those methods may have the side effect of creating new objects and replacing objects in various places in the pages tree, it's useful to be able to find out whether they've ever been called.

2022-06-18  Jay Berkenbilt  <ejb@ql.org>

        * Add QPDFJob::registerProgressReporter, making it possible to override the progress reporter that is used when --progress (or the equivalent) is configured with QPDFJob. This is qpdfjob_register_progress_reporter in the C API.

        * Add examples that show how to capture QPDFJob's output by configuring the default logger (qpdfjob-save-attachment.cc, qpdfjob-c-save-attachment.c). Fixes #691.

        * Add C API for QPDFLogger -- see qpdflogger-c.h

        * Add additional qpdfjob C API functions that take a handle.

        * Add qpdf_exit_code_e to Constants.h so that exit codes from QPDFJob are accessible to the C API.

        * When --progress or --verbose is combined with writing to standard output, progress reporting and verbose messages go to standard error. Previously it was disabled in this case.

2022-06-05  Jay Berkenbilt  <ejb@ql.org>

        * QPDFJob: API breaking change: QPDFJob::doIfVerbose passes a Pipeline& rather than a std::ostream& to the callback function.

        * Add integer types to pipeline's operator<<: short, int, long, long long, unsigned short, unsigned int, unsigned long, unsigned long long.

2022-05-30  Jay Berkenbilt  <ejb@ql.org>

        * qpdf JSON is now at version 2. New command-line arguments: --json-output, --json-input, --update-from-json. New methods QPDF::writeJSON, QPDF::createFromJSON, QPDF::updateFromJSON. For details, see the "qpdf JSON" chapter of the manual.

        * When showing encryption data in json output, when the user password was recovered with the owner password and the specified password does not match the user password, reveal the user password. This is not possible with 256-bit keys.

        * Include additional information in --list-attachments --verbose and in --json --json-key=attachments.

        * Add QUtil::qpdf_time_to_iso8601 and QUtil::pdf_time_to_iso8601 for converting PDF/qpdf timestamps to ISO-8601 date format.

2022-05-18  Jay Berkenbilt  <ejb@ql.org>

        * Add QUtil::FileCloser to the public API. This is a simple inline class to help with automatic file closing.

2022-05-17  Jay Berkenbilt  <ejb@ql.org>

        * Allow passing *uninitialized* (not null) objects to replaceStreamData as filter and/or decode_parms to leave any existing values for /Filter and /DecodeParms untouched.

2022-05-15  Jay Berkenbilt  <ejb@ql.org>

        * Add QUtil::is_long_long to test whether a string can be converted to a long long and back without loss of information.

2022-05-04  Jay Berkenbilt  <ejb@ql.org>

        * JSON: add a new "blob" type that takes a function to write data into. The blob is serialized as a base64-encoded representation of whatever is written to the function.

        * FileInputSource has new constructors that eliminate the need to call setFilename or setFile in most cases.

        * Enhance JSON by adding a write method that takes a Pipeline* and depth, and add several helper methods to make it easier to write large amounts of JSON incrementally without having to have the whole thing in memory.

        * json v1 output: make "pages" and "objects" consistent.
        Previously, "objects" always reflected the objects exactly as they appeared in the original file, while "pages" reflected objects after repair of the pages tree. This could be misleading. Now, if "pages" is specified, "objects" shows the effects of repairing the page tree, and if not, it doesn't. This makes no difference for correct PDF files that don't have problems in the pages tree. JSON v2 will behave in a similar way.

2022-05-03  Jay Berkenbilt  <ejb@ql.org>

        * Add new Pipeline class Pl_String which appends to a std::string& passed to it at construction.

        * Add new Pipeline class Pl_OStream, similar to Pl_StdioFile but takes a std::ostream instead of a FILE*.

        * Add new convenience methods to Pipeline: writeCStr and writeString. Also add a limited << operator that takes C strings and std::strings. Also add an overloaded version of write that takes "char const*".

        * API change: Pipeline::write now takes "unsigned char const *" instead of "unsigned char*". Callers shouldn't have to change anything, though they can stop using writable strings or QUtil::unsigned_char_pointer. If you have implemented your own pipelines, you should change your write method to take a const pointer.

2022-05-01  Jay Berkenbilt  <ejb@ql.org>

        * JSON: add reactors to the JSON parser, making it possible to react to JSON parsing events as they occur and to block the results from being stored. This makes it possible to incrementally parse arbitrarily large JSON inputs.

2022-04-30  Jay Berkenbilt  <ejb@ql.org>

        * QPDFWriter: change encryption API calls
          - Remove deprecated versions of setR*EncryptionParameters methods from before qpdf 8.4.0
          - Replace setR2EncryptionParameters with setR2EncryptionParametersInsecure
          - Replace setR3EncryptionParameters with setR3EncryptionParametersInsecure
          - Replace setR4EncryptionParameters with setR4EncryptionParametersInsecure

        * C API: change encryption API calls to match C++ interface
          - Remove pre-8.4.0 functions:
            - qpdf_set_r3_encryption_parameters
            - qpdf_set_r4_encryption_parameters
            - qpdf_set_r5_encryption_parameters
            - qpdf_set_r6_encryption_parameters
          - Add "_insecure" to insecure encryption triggers:
            - Replace void qpdf_set_r2_encryption_parameters with qpdf_set_r2_encryption_parameters_insecure
            - Replace void qpdf_set_r3_encryption_parameters2 with qpdf_set_r3_encryption_parameters_insecure
            - Replace void qpdf_set_r4_encryption_parameters2 with qpdf_set_r4_encryption_parameters_insecure

        * Make attempting to write encrypted files that use RC4 (40-bit or 128-bit without AES) an error rather than a warning when --allow-weak-crypto is not specified. Fixes #576.

2022-04-24  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix: "removeAttachment" in the job JSON now takes an array of strings instead of a string. It should have taken an array of strings since the corresponding command-line argument, --remove-attachment, is repeatable. Fixes #693.

        * Deprecate QPDFObjectHandle::replaceOrRemoveKey -- it does and always has done the same thing as replaceKey.

2022-04-23  Jay Berkenbilt  <ejb@ql.org>

        * Add a new QPDF::warn method that takes the parameters of QPDFExc's constructor except for the filename, which is taken from the QPDF object. This is a shorter way to issue warnings on behalf of a QPDF object.

        * Add new method QUtil::is_explicit_utf8 that tests whether a string is explicitly marked as being UTF-8 encoded, as allowed by the PDF 2.0 spec.
        Such a string starts with the bytes 0xEF 0xBB 0xBF, which is the UTF-8 encoding of U+FEFF.

        * Add new method QUtil::get_next_utf8_codepoint as a low-level helper for iterating through the UTF-8 characters in a byte string.

2022-04-16  Jay Berkenbilt  <ejb@ql.org>

        * Breaking CLI change: the default value for --json is now "latest" rather than "1". At this moment, "1" is the latest version, but version "2" will be added before the release of qpdf 11.

        * Perform code cleanup including some source-compatible but not binary compatible changes to function signatures, use of anonymous namespaces, and use of "= default" and "= delete" in declarations.

2022-04-09  Jay Berkenbilt  <ejb@ql.org>

        * Replace PointerHolder with std::shared_ptr throughout the QPDF API. A backward-compatible interface is provided and enabled by default with a warning that can be turned off. See "Smart Pointers" in the "Design and Library Notes" section of the manual for information including a detailed migration process to assist with migrating code that uses the qpdf library.

2022-04-03  Jay Berkenbilt  <ejb@ql.org>

        * Add automatic code formatting with clang-format. See "Code Formatting" in the "Contributing to qpdf" chapter of the manual.

2022-03-19  Jay Berkenbilt  <ejb@ql.org>

        * 10.6.3.0cmake1: unofficial release

        * Conversion of build system to cmake. This change doesn't include any user-visible functional changes to the library API or CLI but completely replaces the build system. Details can be found in the manual in the "Building and Installing QPDF" and "Notes for Packagers" sections, especially "Converting From autoconf to cmake" in "Building and Installing QPDF". Highlights of the changes can be found in the release notes.

2022-03-08  Jay Berkenbilt  <ejb@ql.org>

        * 10.6.3: release

        * Use Windows 2022 github runners and therefore Visual Studio 2022 to create Windows distributions

        * Fix DLL export issue with mingw (Windows)

2022-03-07  Jay Berkenbilt  <ejb@ql.org>

        * Minor internal changes to assist with building in other environments: rename internal bits.icc to qpdf/bits_functions.hh (not part of public API), enforce reordering of header files to prevent jpeglib.h from interfering with other headers, remove an unused header that was accidentally added in 10.6.0 but never referenced by any code.

        * Make the build and tests work when NDEBUG is defined. This involved a few changes to some test files but no changes to any library code.

2022-02-25  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix in JSON parser: accept \/ in a string as valid input per JSON spec even though we don't translate / to \/ on output.

2022-02-22  Jay Berkenbilt  <ejb@ql.org>

        * Recognize PDF strings explicitly marked as UTF-8 as allowed by the PDF 2.0 spec. Fixes #654.

2022-02-18  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix: when generating appearance streams, the font size was substituted incorrectly from /DA if Tf was absent or the number preceding Tf was out of range. Fixes #655.

2022-02-16  Jay Berkenbilt  <ejb@ql.org>

        * 10.6.2: release

2022-02-15  Jay Berkenbilt  <ejb@ql.org>

        * Fix asymmetrical logic between QPDFObjectHandle::newUnicodeString() and QPDFObjectHandle::getUTF8Val(). The asymmetrical logic didn't matter before fixing the PDF Doc transcoding bugs.

        * When analyzing PDF strings, recognize UTF-16LE as UTF-16. The PDF spec only allows UTF-16BE, but most readers seem to allow both. Fixes #649.

        * Bug fix: 10.6.0 inadvertently removed an unknown/undocumented CLI parsing feature, which has been restored in 10.6.2. Fixes #652.

        * Don't map 0x18 through 0x1f, 0x7f, 0x9f, or 0xad as fixed points when transcoding UTF-8 to PDFDoc. These code points have different meanings in those two encoding systems. Fixes #650.
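The string-marker entries above (explicit UTF-8 on 2022-02-22, UTF-16LE on 2022-02-15) amount to checking the leading bytes of a string. The following standalone sketch shows one way to classify those markers. It is illustrative only: StringMarker and classify_pdf_string are hypothetical names, not part of the qpdf API, and the byte values used are the UTF-16 byte order marks (FE FF, FF FE) and the UTF-8 encoding of U+FEFF (EF BB BF).

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of the marker at the start of a PDF string.
enum class StringMarker { utf16_be, utf16_le, explicit_utf8, unmarked };

StringMarker classify_pdf_string(std::string const& s)
{
    // Compare as unsigned bytes so the result does not depend on whether
    // plain char is signed or unsigned on the platform.
    auto byte = [&s](std::size_t i) { return static_cast<unsigned char>(s[i]); };
    if (s.size() >= 2 && byte(0) == 0xFE && byte(1) == 0xFF) {
        return StringMarker::utf16_be; // the only UTF-16 form the PDF spec allows
    }
    if (s.size() >= 2 && byte(0) == 0xFF && byte(1) == 0xFE) {
        return StringMarker::utf16_le; // tolerated because many readers accept it
    }
    if (s.size() >= 3 && byte(0) == 0xEF && byte(1) == 0xBB && byte(2) == 0xBF) {
        return StringMarker::explicit_utf8; // PDF 2.0 explicit UTF-8 marker
    }
    return StringMarker::unmarked; // would be treated as PDF Doc encoding
}
```

Strings without any marker fall through to the last case, which is where the PDF Doc transcoding fixes in these entries apply.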

2022-02-11  Jay Berkenbilt  <ejb@ql.org>

        * 10.6.1: release

        * Fix some compilation issues from use of abs without including proper headers.

2022-02-09  Jay Berkenbilt  <ejb@ql.org>

        * 10.6.0: release

        * Fix one more PDF doc encoding omission: 0xAD is also undefined. Fixes #637.

2022-02-08  Jay Berkenbilt  <ejb@ql.org>

        * Bug fix: when splitting pages with --split-pages or selecting pages with --pages, set the output PDF version to the maximum of all the input PDF versions. This is a fix to QPDFJob. If you are creating output PDF files yourself from multiple inputs, you will need to code the same thing. The new PDFVersion object, its updateIfGreater() method, and the new QPDF and QPDFWriter methods described below make this very easy to do. Fixes #610.

        * Add new class PDFVersion for more convenient comparison of PDF version numbers from the %PDF header.

        * Add QPDF::getVersionAsPDFVersion() to return the PDF version and extension together as a PDFVersion object instead of a string.

        * Add a QPDFWriter::setMinimumPDFVersion() that takes a PDFVersion object.

2022-02-06  Jay Berkenbilt  <ejb@ql.org>

        * Pl_Buffer and QPDFWriter: add getBufferSharedPointer(), which returns a PointerHolder<Buffer> but will return a std::shared_ptr<Buffer> in qpdf 11.

        * From m-holger: add getKeyIfDict(), which calls getKey for dictionaries and returns null if called on null. This is for easier access to optional, lower-level dictionaries.

2022-02-05  Jay Berkenbilt  <ejb@ql.org>

        * Add several new accessors to QPDFObjectHandle: the bool getValueAsX(X&) accessors allow an alternative way to retrieve values from QPDFObjectHandle objects and can result in more concise code in many situations. Thanks to m-holger for the contribution.

        * Add qpdf_oh_new_binary_unicode_string and qpdf_oh_get_binary_utf8_value to the C API. This makes it possible to handle UTF-8-encoded strings with embedded NUL characters. Thanks to m-holger for the contribution.
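The 2022-02-08 entry asks callers who merge several inputs to set the output version to the maximum input version. The sketch below models that idea without depending on qpdf. VersionSketch, update_if_greater, and max_input_version are hypothetical stand-ins (the real PDFVersion class and its updateIfGreater() method may differ in detail), and the assumption here is that versions compare by major number, then minor number, then extension level.

```cpp
#include <tuple>
#include <vector>

// Hypothetical stand-in for a PDF version plus extension level.
struct VersionSketch
{
    int major_version;
    int minor_version;
    int extension_level;
};

// Replace `current` with `candidate` when the candidate is strictly newer,
// comparing major, then minor, then extension level.
void update_if_greater(VersionSketch& current, VersionSketch const& candidate)
{
    auto key = [](VersionSketch const& v) {
        return std::make_tuple(v.major_version, v.minor_version, v.extension_level);
    };
    if (key(current) < key(candidate)) {
        current = candidate;
    }
}

// Fold the update over every input to get the version the output needs.
VersionSketch max_input_version(std::vector<VersionSketch> const& inputs)
{
    VersionSketch result{0, 0, 0};
    for (auto const& v : inputs) {
        update_if_greater(result, v);
    }
    return result;
}
```

Given inputs at 1.4, 1.7 with extension level 3, and 1.7 with no extension, this model selects 1.7 with extension level 3.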

        * Add a global user-defined string literal "_qpdf" as a shorthand for QPDFObjectHandle::parse, allowing you to create QPDFObjectHandle objects with QPDFObjectHandle oh = "<</Some (PDF)>>"_qpdf;

        * Expose QPDF::emptyPDF to the C API as qpdf_empty_pdf()

        * Add comments letting people know that the version string returned by QPDF::QPDFVersion and qpdf_get_qpdf_version is static.

        * Add QUtil::make_unique_cstr to return a std::unique_ptr<char[]> as an alternative to QUtil::copy_string and QUtil::make_shared_cstr.

2022-02-04  Jay Berkenbilt  <ejb@ql.org>

        * New preprocessor symbols QPDF_MAJOR_VERSION, QPDF_MINOR_VERSION, QPDF_PATCH_VERSION as numbers and QPDF_VERSION as a string. These can be used for feature testing in code. These are in qpdf/DLL.h, which is included by every header that adds to the public API. Since these constants are introduced in version 10.6, it's important for them to be in a header that everyone already includes so you don't have to try to include a header that won't be there.

        * PointerHolder: add a get() method and a use_count() method for compatibility with std::shared_ptr. In qpdf 11, qpdf's APIs will switch to using std::shared_ptr instead of PointerHolder, though there will be a PointerHolder class with a backward-compatible API. To ease the transition, we are adding get() now with the same semantics as std::shared_ptr's get. Note that there is a difference in behavior: const PointerHolder has always behaved incorrectly. const PointerHolder objects only returned const pointers. This is wrong. If you want a const pointer, use PointerHolder<T const>. A const PointerHolder just shouldn't allow its pointer to be reassigned. The new get() method behaves correctly in that calling get() on a const PointerHolder to a non-const pointer returns a non-const pointer. This is the way regular pointers behave.

2022-02-01  Jay Berkenbilt  <ejb@ql.org>

        * Major refactor: all functionality from the qpdf CLI is now available for library users using the QPDFJob class.
See comments in include/qpdf/QPDFJob.hh and a new chapter about QPDFJob in the manual. QPDFJob provides fluent interfaces for setting options that exactly map to command-line arguments. There are also methods for initializing QPDFJob from an argv array and from a JSON object. * A light C API around basic QPDFJob functionality is in include/qpdf/qpdfjob-c.h. * Add a new version of QUtil::call_main_from_wmain that takes a constant argv array. 2022-01-31 Jay Berkenbilt <ejb@ql.org> * Have --json-help just output the JSON object, leaving a description to --help and the manual. * The --json flag now takes a version number as an optional parameter. The default will remain version 1 for compatibility until the release of qpdf 11, after which it will become "latest". At this time, there's only version 1, but a version 2 may appear in a future qpdf. 2022-01-28 Jay Berkenbilt <ejb@ql.org> * Add QPDFUsage exception, which is thrown by QPDFJob to indicate command-line usage or job configuration errors. 2022-01-22 Jay Berkenbilt <ejb@ql.org> * Add QUtil::make_shared_cstr to return a std::shared_ptr<char> instead of a char* like QUtil::copy_string * JSON: for (qpdf-specific, not official) "schema" checking, add the ability to treat missing fields as optional. Also ensure that values in the schema are dictionary, array, or string. * Add convenience methods isNameAndEquals and isDictionaryOfType to QPDFObjectHandle with corresponding functions added to the C API. Thanks to m-holger for the contribution. 2022-01-17 Jay Berkenbilt <ejb@ql.org> * Add JSON::parse. Now qpdf's JSON class implements a general-purpose JSON parser and serializer, but there are better options for general use. This is really designed for qpdf's internal use and is set up to be compatible with qpdf's existing API and to hook into a planned JSON-based API to the QPDFJob class.
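The 2022-02-04 PointerHolder entry above turns on a point about const smart pointers that is easy to misread. The sketch below illustrates it with std::shared_ptr only, since the entry states that the new get() has the same semantics as std::shared_ptr's get. It does not use PointerHolder itself, and Widget and both function names are hypothetical.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical payload type, used only for illustration.
struct Widget
{
    int value = 0;
};

// A const smart pointer cannot be reseated, but get() still yields a
// non-const pointer when the pointee type is non-const.
inline int bump_through_const_holder(std::shared_ptr<Widget> const& p)
{
    static_assert(
        std::is_same<decltype(p.get()), Widget*>::value,
        "get() on a const shared_ptr<Widget> returns Widget*");
    p.get()->value += 1; // allowed: only the holder is const
    return p->value;
}

// To forbid modification of the pointee, put const on the pointee type.
inline int read_only(std::shared_ptr<Widget const> const& p)
{
    static_assert(
        std::is_same<decltype(p.get()), Widget const*>::value,
        "get() on a shared_ptr<Widget const> returns Widget const*");
    return p->value;
}
```

This is the same split the entry describes: a const holder protects the holder, while PointerHolder<T const> (here, shared_ptr<Widget const>) protects the pointee.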
* Add isDictionary and isArray to JSON 2022-01-11 Jay Berkenbilt <ejb@ql.org> * Major overhaul of documentation and help for the qpdf command-line tool. qpdf --help is now broken into topics rather than being one great wall of text, and the command-line arguments are indexed in the manual. The entire text of the "Running qpdf" chapter has been reviewed thoroughly. Many thanks once again to M. Holger for a detailed review and editorial assistance with the manual. * Bug fix: add missing characters from PDF doc encoding. Fixes #606. 2021-12-29 Jay Berkenbilt <ejb@ql.org> * Add method QUtil::file_can_be_opened 2021-12-21 Jay Berkenbilt <ejb@ql.org> * 10.5.0: release * Add documentation link to top-level README * Discontinue inclusion of the pre-built documentation in the source distribution. Consult the packaging documentation in the manual for details. The file README-doc.txt is installed in the doc directory by default and contains information that users will need to know to find the documentation. 2021-12-19 Jay Berkenbilt <ejb@ql.org> * C API: clarify documentation around string lengths. Add two new methods: qpdf_oh_get_binary_string_value and qpdf_oh_new_binary_string to make the need to handle the length and data separately more explicit in cases in which the string data may contain embedded null characters. 2021-12-17 Jay Berkenbilt <ejb@ql.org> * C API: simplify error handling for uncaught errors (never in a released version) and clarify documentation in qpdf-c.h around error handling. See qpdf-c.h for details, including how to check for errors and the new function qpdf_silence_errors. * C API: expose getTypeCode and getTypeName from QPDFObjectHandle. Fixes #597. * C API: add functions for working with stream data. Search for "STREAM FUNCTIONS" in qpdf-c.h. Fixes #596. * QPDFObjectHandle object types have been moved from QPDFObject::object_type_e to qpdf_object_type_e (defined in Constants.h). Old values are available for backward compatibility.
* Add Pl_Buffer::getMallocBuffer() to initialize a buffer with malloc in support of the C API 2021-12-16 Jay Berkenbilt <ejb@ql.org> * Add several functions to the C API for working with pages. C wrappers around several of the "Legacy" page operations from QPDFObjectHandle.hh have been added. See "PAGE FUNCTIONS" in qpdf-c.h for details. Fixes #594. 2021-12-12 Jay Berkenbilt <ejb@ql.org> * Convert documentation from docbook to reStructuredText/Sphinx. 2021-12-10 Jay Berkenbilt <ejb@ql.org> * Handle bitstream overflow errors more gracefully. Fixes #581. * C API: add qpdf_get_object_by_id, qpdf_make_indirect_object, and qpdf_replace_object, exposing the corresponding methods in QPDF and QPDFObjectHandle. Fixes #588. * Add missing QPDF_DLL to QPDFObjectHandle::addTokenFilter so that it is actually accessible as part of the public interface as intended. Fixes #580. * C API: Overhaul how errors are handled in the C API's object handle interfaces. Clarify documentation regarding object accessors and how type errors and warnings are handled. Many cases that used to crash code that used the C API can now be trapped and will be written to stderr if not trapped. See qpdf-c.h for details. * C API: Add qpdf_oh_new_uninitialized to explicitly create uninitialized object handles. * Add new error code qpdf_e_object that is used for exceptions (including warnings) that are caused by using QPDFObjectHandle methods on object handles of the wrong type. 2021-12-02 Jay Berkenbilt <ejb@ql.org> * C API: Add qpdf_oh_is_initialized. * C API: Add qpdf_get_last_string_length to return the length of the last string returned. This is necessary in order to fully retrieve values of strings that may contain embedded null characters. * C API: Add qpdf_oh_new_object to clone an object handle. Change implemented by m-holger in #587.
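Several of the C API entries above (qpdf_get_last_string_length, and the binary string functions added on 2021-12-19) exist because a PDF string may contain embedded null characters. The following sketch shows only the underlying problem, using the C++ standard library. It makes no qpdf calls, and both helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds the 5-byte string 'a' 'b' '\0' 'c' 'd'. The explicit length is
// required; constructing from the literal alone would stop at the null.
inline std::string make_binary_sample()
{
    return std::string("ab\0cd", 5);
}

// Length as seen by code that treats the data as a null-terminated
// C string: everything after the first null byte is silently lost.
inline std::size_t c_string_length(std::string const& s)
{
    return std::strlen(s.c_str());
}
```

A caller that receives only a char const* sees 2 bytes here, while the real value has 5, which is why the length has to be carried separately from the data.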
2021-11-16 Jay Berkenbilt <ejb@ql.org> * 10.4.0: release 2021-11-10 Jay Berkenbilt <ejb@ql.org> * Add --allow-weak-crypto option to suppress warnings about use of weak cryptographic algorithms. Update documentation around this issue. Fixes #358. 2021-11-07 Jay Berkenbilt <ejb@ql.org> * Relax xref recovery logic a bit so that files whose objects are either missing endobj or have endobj at other than the beginning of a line can still be recovered. Fixes #573. 2021-11-04 Jay Berkenbilt <ejb@ql.org> * Add support for OpenSSL 3. Fixes #568. The OpenSSL version is detected at compile-time. If you want to build with OpenSSL 3 on a system that has OpenSSL 1 installed, you can run configure like this (or similar to this depending on how you installed openssl3):

    pc_openssl_CFLAGS=-I/path/to/openssl3/include \
    pc_openssl_LIBS='-L/path/to/openssl3/lib64 -lssl -lcrypto' \
    ./configure

where /path/to/openssl3 is wherever your OpenSSL 3 distribution is installed. You may also need to set the LD_LIBRARY_PATH environment variable if it's not installed in a standard location. * Add range check in QPDFNumberTreeObjectHelper (fuzz issue 37740). * Add QIntC::range_check_subtract to do range checking on subtraction, which has different boundary conditions from addition. * Bug fix: fix crash that could occur under certain conditions when using --pages with files that had form fields. Fixes #548. * Add an extra check to the library to detect when foreign objects are inserted directly (instead of using QPDF::copyForeignObject) at the time of insertion rather than when the file is written. Catching the error sooner makes it much easier to locate the incorrect code. 2021-11-03 Jay Berkenbilt <ejb@ql.org> * Bug fix: make overlay/underlay work on a page with no resource dictionary. Fixes #527. 2021-11-02 Jay Berkenbilt <ejb@ql.org> * Add QPDF::findPage to the public API. This is primarily to help improve the efficiency of code that wraps the qpdf library, such as pikepdf.
Fixes #516. * zlib-flate: warn and exit with code 3 when there is corrupted input data even when decompression is possible. We do this in the zlib-flate CLI so that it can be more reliably used to test the validity of zlib streams, but we don't warn by default in qpdf itself because PDF files in the wild exist with this problem and other readers appear to tolerate it. There is a PDF in the qpdf test suite (form-filled-by-acrobat.pdf) that was written by a version of Adobe Acrobat that exhibits this problem. Fixes #562. * Add Pl_Flate::setWarnCallback to make it possible to be notified of data errors that are recoverable but still indicate invalid data. * Improve error reporting when someone forgets the -- after --pages. Fixes #555. 2021-05-12 Jay Berkenbilt <ejb@ql.org> * Bug fix: ensure we don't overflow any string bounds while handling completion, even when we are given bogus input values. Fixes #441. 2021-05-09 Jay Berkenbilt <ejb@ql.org> * Improve performance of preservation of object streams by avoiding unnecessary traversal of objects when there are no object streams. 2021-05-08 Jay Berkenbilt <ejb@ql.org> * 10.3.2: release * Fix problem that prevented the generated manual from being included in the Windows distributions. Fixes #521. * Fix 11-year-old bug of leaving unreferenced objects in preserved object streams. Fixes #520. 2021-04-17 Jay Berkenbilt <ejb@ql.org> * Portability fix: use tm_gmtoff rather than global timezone variable if available to get timezone offset. This fixes compilation on BSD and also results in a daylight saving time-aware offset for Linux or other GNU systems. Fixes #515. 2021-04-05 Jay Berkenbilt <ejb@ql.org> * When adding a page, if the page already exists, make a shallow copy of the page instead of throwing an exception. This makes the behavior of adding a page from the library consistent with what the CLI does and also with what the library does if it starts with a file that already has a duplicated page.
Note that this means that, in some cases, the page you pass to addPage or addPageAt (either in QPDF or QPDFPageDocumentHelper) will not be the same object that actually gets added. (This has actually always been the case.) That means that, if you are going to do subsequent modification on the page, you should retrieve it again. 2021-03-11 Jay Berkenbilt <ejb@ql.org> * 10.3.1: release * Bug fix: allow /DR to be direct in /AcroForm 2021-03-04 Jay Berkenbilt <ejb@ql.org> * 10.3.0: release * The last several changes are in support of fixing more complex cases of keeping form fields working properly through page copying operations. Fixes #509. * Deprecated QPDFAcroFormDocumentHelper::copyFieldsFromForeignPage -- use QPDFAcroFormDocumentHelper::fixCopiedAnnotations instead. The API for dealing with annotations and form fields around copying pages is extremely complex and very hard to get right. It is planned for a future version of qpdf to have a higher level interface for dealing with copying pages around and preserving document-level constructs. * Add QPDFAcroFormDocumentHelper::getFieldsWithQualifiedName for returning a list of fields by name. * Add QPDFAcroFormDocumentHelper::addAndRenameFormFields to add a collection of fields while ensuring that, within the collection, fields with the same name continue to have the same name, but that they don't conflict with existing fields in the document. * Add QPDFAcroFormDocumentHelper::setFormFieldName for changing the name of a form field in a manner that preserves QPDFAcroFormDocumentHelper's cache. 2021-03-03 Jay Berkenbilt <ejb@ql.org> * Handle /DR properly when copying form fields. This is a significant rework of the form field copying from 10.2.0. It ensures that when copying fields from different files, we resolve any conflicting names in resources.
* Add QPDFMatrix::operator== * Add QPDFObjectHandle::makeResourcesIndirect 2021-03-02 Jay Berkenbilt <ejb@ql.org> * Add an optional resource_names argument to getUniqueResourceName for added efficiency. * Add conflict detection to QPDFObjectHandle::mergeResources. 2021-03-01 Jay Berkenbilt <ejb@ql.org> * Improve code that finds unreferenced resources to ignore names in the content stream that are not fonts or XObjects. This should reduce the number of cases when qpdf needlessly decides not to remove unreferenced resources. Hopefully it doesn't create any new bugs where it removes unreferenced resources that it isn't supposed to. * Add QPDF::numWarnings() -- useful to tell whether any warnings were issued by a specific bit of code. 2021-02-26 Jay Berkenbilt <ejb@ql.org> * Bug fix: QPDFFormFieldObjectHelper was mis-handling /DA, /Q, and /DR in ways that usually didn't matter but were still wrong. /DA and /Q were being found in the field hierarchy, but if not found, the default values in the /AcroForm dictionary were not being used. /DR was being treated as an inherited field in the field dictionary, which is wrong. It is actually supposed to come from the /AcroForm dictionary. We were getting away with this since many popular form writers seem to copy it to the field as well, even though the spec makes no mention of doing this. To support this, QPDFFormFieldObjectHelper::getDefaultResources was added. 2021-02-25 Jay Berkenbilt <ejb@ql.org> * Update StreamDataProvider examples to use copyStream() when they want to get to the original stream data from the provider. Prior to 10.2.0, we had to copy the stream to another QPDF, but now we can just use copyStream(). * Bug fix/behavior change: when QPDF::replaceObject or QPDF::swapObjects is called, existing QPDFObjectHandle instances will now notice the change. This removes a long-standing source of bugs and confusing behavior.
2021-02-23 Jay Berkenbilt <ejb@ql.org> * 10.2.0: release * The test for the input and output files being the same wasn't implemented correctly for --split-pages since the specified output file is really a pattern, not the actual output file. 2021-02-22 Jay Berkenbilt <ejb@ql.org> * From qpdf CLI, --pages and --split-pages will properly preserve interactive form functionality. Fixes #340. * Add QPDFAcroFormDocumentHelper::copyFieldsFromForeignPage to copy form fields from a foreign page into the current file. (This method didn't work and was deprecated in 10.3.0.) * Add QPDFFormFieldObjectHelper::getTopLevelField to get the top-level field for a given form field. * Update pdf-overlay-page example to include copying of annotations. * Add a new version of QPDFPageObjectHelper::placeFormXObject that initializes the transformation matrix that was used so you don't have to call both placeFormXObject and getMatrixForFormXObjectPlacement. 2021-02-21 Jay Berkenbilt <ejb@ql.org> * From qpdf CLI, --overlay and --underlay will copy annotations and form fields from overlay/underlay file. Fixes #395. * Add QPDFPageObjectHelper::copyAnnotations, which copies annotations and, if applicable, associated form fields, from one page to another, possibly transforming the rectangles. * Bug fix: --flatten-rotation now applies the required transformation to annotations on the page. * Add QPDFAcroFormDocumentHelper::transformAnnotations to apply a transformation to a group of annotations. * Add QPDFObjGen::unparse() * Add QPDFObjectHandle::copyStream() for making a copy of a stream within the same QPDF instance. * Allow QPDFObjectHandle::newArray and QPDFObjectHandle::newFromMatrix to take QPDFMatrix as well as QPDFObjectHandle::Matrix * Make member variables a--f of QPDFMatrix public 2021-02-20 Jay Berkenbilt <ejb@ql.org> * Allow --rotate=0 to clear rotation from a page.
2021-02-18 Jay Berkenbilt <ejb@ql.org> * Add QPDFAcroFormDocumentHelper::addFormField, which adds a new form field, initializing the AcroForm dictionary if needed. * Add QPDFPageObjectHelper::getMatrixForFormXObjectPlacement, which returns the transformation matrix required to map from a form field's coordinate system into a specific rectangle within the page. * Add QUtil::path_basename to get last element of a path. * Add examples/pdf-attach-file.cc to illustrate new file attachment method and also new parse that takes indirect objects. 2021-02-17 Jay Berkenbilt <ejb@ql.org> * Allow optional numeric argument to --collate. If --collate=n is given, pull n pages from the first file, n pages from the second file, etc., until we run out of pages. 2021-02-15 Jay Berkenbilt <ejb@ql.org> * Add a version of QPDFObjectHandle::parse that takes a QPDF* as context so that it can parse strings containing indirect object references. 2021-02-14 Jay Berkenbilt <ejb@ql.org> * Add new versions of QPDFObjectHandle::replaceStreamData that take std::function objects for cases when you need something between a static string and a full-fledged StreamDataProvider. Using this with QUtil::file_provider is a very easy way to create a stream from the contents of a file. 2021-02-12 Jay Berkenbilt <ejb@ql.org> * Move formerly internal QPDFMatrix class to the public API. This class provides convenience methods for working with transformation matrices. * QUtil::double_to_string: trim trailing zeroes by default, and add option to not trim trailing zeroes. This causes a syntactic but semantically preserving change in output when doubles are converted to strings. The library uses double_to_string in only a few places. In practice, output will be different (trailing zeroes removed) in code that creates form XObjects (mostly generation of appearance streams for form fields as well as overlay and underlay) and in the flatten rotation code that was added in qpdf 10.1. 
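The QUtil::double_to_string entry above describes output that differs syntactically but not semantically. The sketch below shows what trimming trailing zeroes means in practice. It is a standalone illustration, not qpdf's implementation; the function name, its parameters, and the default of six decimal places are assumptions made for the example.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for a double-to-string helper with optional
// trimming of trailing zeroes after the decimal point.
inline std::string format_double(
    double value, int decimal_places = 6, bool trim_trailing_zeroes = true)
{
    std::ostringstream buf;
    // Use the classic "C" locale so '.' is always the decimal separator.
    buf.imbue(std::locale::classic());
    buf << std::fixed << std::setprecision(decimal_places) << value;
    std::string result = buf.str();
    if (trim_trailing_zeroes && result.find('.') != std::string::npos) {
        // Remove zeroes after the decimal point, then a dangling '.'.
        while (result.back() == '0') {
            result.pop_back();
        }
        if (result.back() == '.') {
            result.pop_back();
        }
    }
    return result;
}
```

With trimming enabled, 1.5 is written as 1.5 rather than 1.500000, and 2.0 as 2. Both forms parse to the same number, which is why the change is semantically preserving.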
2021-02-10 Jay Berkenbilt <ejb@ql.org> * Require a C++-14 compiler. * Detect loops when reading the outlines dictionary upon initialization of QPDFOutlineDocumentHelper (fuzz issue 30507). * Add "attachments" as an additional json key, and add some information about attachments to the json output. * Add new command-line arguments for operating on attachments: --list-attachments, --add-attachment, --remove-attachment, --copy-attachments-from. See --help and manual for details. 2021-02-09 Jay Berkenbilt <ejb@ql.org> * Add methods to QUtil for working with PDF timestamp strings: pdf_time_to_qpdf_time, qpdf_time_to_pdf_time, get_current_qpdf_time. 2021-02-08 Jay Berkenbilt <ejb@ql.org> * Add helper classes for file attachments: QPDFEmbeddedFileDocumentHelper, QPDFFileSpecObjectHelper, QPDFEFStreamObjectHelper. See their header files for details. 2021-02-07 Jay Berkenbilt <ejb@ql.org> * Add new functions QUtil::pipe_file and QUtil::file_provider for sending the contents of a file through a pipeline as binary data. 2021-02-04 Jay Berkenbilt <ejb@ql.org> * Add new option --password-file=file for reading the decryption password from a file. file may be "-" to read from standard input. Fixes #499. * By default, give an error if a user attempts to encrypt a file with a 256-bit key, a non-empty user password, and an empty owner password. Such files are insecure since they can be opened with no password. To allow explicit creation of files like this, pass the new --allow-insecure option. Thanks to github user RobK88 for a detailed analysis and for reporting this issue. Fixes #501. 2021-02-02 Jay Berkenbilt <ejb@ql.org> * Bug fix: if a form XObject lacks a resources dictionary, consider any names in that form XObject to be referenced from the containing page. This is compliant with older PDF versions. Also detect if any form XObjects have any unresolved names and, if so, don't remove unreferenced resources from them or from the page that contains them. Fixes #494.
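The 2021-02-09 entry above adds QUtil helpers for PDF timestamp strings. For readers unfamiliar with the format, here is a simplified standalone parser for the common full-length form D:YYYYMMDDHHmmSS followed by Z or a +HH'mm' / -HH'mm' offset. It is a sketch under stated assumptions: PdfTime and parse_pdf_time are hypothetical names, the struct is not qpdf's, shortened timestamps are not handled, and field ranges are not validated.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type; tz_minutes_east is minutes east of UTC.
struct PdfTime
{
    int year, month, day, hour, minute, second;
    int tz_minutes_east;
};

inline bool parse_pdf_time(std::string const& s, PdfTime& out)
{
    auto is_digit = [&s](std::size_t i) {
        return std::isdigit(static_cast<unsigned char>(s[i])) != 0;
    };
    if (s.size() < 16 || s.compare(0, 2, "D:") != 0) {
        return false;
    }
    for (std::size_t i = 2; i < 16; ++i) {
        if (!is_digit(i)) {
            return false;
        }
    }
    auto num = [&s](std::size_t pos, std::size_t len) {
        return std::stoi(s.substr(pos, len));
    };
    out = PdfTime{num(2, 4), num(6, 2), num(8, 2),
                  num(10, 2), num(12, 2), num(14, 2), 0};
    if (s.size() == 16 || s[16] == 'Z') {
        return true; // no zone given, or UTC
    }
    // Expect a sign, two hour digits, an apostrophe, two minute digits.
    if ((s[16] != '+' && s[16] != '-') || s.size() < 22 || s[19] != '\'' ||
        !is_digit(17) || !is_digit(18) || !is_digit(20) || !is_digit(21)) {
        return false;
    }
    int offset = num(17, 2) * 60 + num(20, 2);
    out.tz_minutes_east = (s[16] == '-') ? -offset : offset;
    return true;
}
```

For example, D:20210209103000-05'00' parses to 10:30:00 on 2021-02-09 with an offset of -300 minutes.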
2021-01-31 Jay Berkenbilt <ejb@ql.org> * Bug fix: properly handle strings if they appear in inline image dictionaries while externalizing inline images. 2021-01-30 Jay Berkenbilt <ejb@ql.org> * Add examples/pdf-name-number-tree.cc to illustrate new name/number tree API and new array/dictionary iterator API. 2021-01-29 Jay Berkenbilt <ejb@ql.org> * Add methods to QPDFObjectHandle that provide a C++ iterator API, including C++11 range-for iteration, over arrays and dictionaries. With this, you can do

    for (auto i: dict_oh.ditems()) {
        // i.first is a string, i.second is a QPDFObjectHandle
    }
    for (auto i: array_oh.aitems()) {
        // i is a QPDFObjectHandle
    }

* QPDFObjectHandle::is* methods to check type now return false on uninitialized objects rather than crashing or throwing a logic error. 2021-01-24 Jay Berkenbilt <ejb@ql.org> * Implement remove for name and number trees as well as exposing remove and insertAfter methods for iterators. With this addition, qpdf now has robust read/write support for name and number trees. 2021-01-23 Jay Berkenbilt <ejb@ql.org> * Add an insert method to QPDFNameTreeObjectHelper and QPDFNumberTreeObjectHelper. * QPDFNameTreeObjectHelper and QPDFNumberTreeObjectHelper will automatically repair broken name and number trees by default. This behavior can be turned off. * Change behavior of QPDFObjectHandle::newUnicodeString so that it encodes ASCII or PDFDocEncoding if those encodings will support all the characters in the string, resorting to UTF-16 only if the other encodings are insufficient. This is a cleaner implementation of the intention of encoding strings for use outside of contents and results in fewer instances of ASCII strings being needlessly encoded as UTF-16. This change may cause qpdf to generate different output from the same input when form field values are set using methods from QPDFFormFieldObjectHelper.
2021-01-16 Jay Berkenbilt <ejb@ql.org> * Add new constructors for QPDFNameTreeObjectHelper and QPDFNumberTreeObjectHelper that take a QPDF object so they can create indirect objects and issue warnings. The old constructors are deprecated and will be removed in qpdf 11. Just pass in the owning QPDF of the object handle used to initialize the helpers. * Re-implement QPDFNameTreeObjectHelper and QPDFNumberTreeObjectHelper to be much more efficient and to have an iterator-based API in addition to the existing one. This makes it possible to use "range-for" loops over these helpers and to iterate through name and number trees without creating a map containing all the keys and values, which is slow and potentially consumes a lot of memory. * Add warn() to QPDF's public API. 2021-01-11 Jay Berkenbilt <ejb@ql.org> * Fix very old error in code that was finding attachment streams. Probably this error never mattered, but the code was still not exactly right. 2021-01-06 Jay Berkenbilt <ejb@ql.org> * Give warnings instead of segfaulting if a QPDF operation is attempted after calling closeInputSource(). Fixes #495. 2021-01-05 Jay Berkenbilt <ejb@ql.org> * 10.1.0: release 2021-01-04 Jay Berkenbilt <ejb@ql.org> * When qpdf CLI extracts pages, it now only attempts to remove unreferenced resources from the pages that it is keeping. This change dramatically reduces the time it takes to extract a small number of pages from a large, complex file. * Move getNext()->write() calls in some pipelines to ensure that state gates properly reset even if the next pipeline's write throws an exception (fuzz issue 28262). 2021-01-03 Jay Berkenbilt <ejb@ql.org> * Don't include -o nospace with zsh completion setup so file completion works normally. Fixes #473. 2021-01-02 Jay Berkenbilt <ejb@ql.org> * Make QPDFPageObjectHelper methods pipeContents, parseContents, and addContentTokenFilter work with form XObjects. 
* Rename some QPDFPageObjectHelper methods and make them support form XObjects as well as pages. The old names will be preserved for compatibility. - pipePageContents -> pipeContents - parsePageContents -> parseContents * Add QPDFObjectHandle::parseAsContents to apply ParserCallbacks to a form XObject. * QPDFPageObjectHelper::externalizeInlineImages can be called with form XObjects as well as pages. * Bug fix: QPDFPageObjectHelper::externalizeInlineImages was not descending into form XObjects on a page. It now does this by default. In the extremely unlikely event that anyone was actually depending on the old behavior, it is available by passing shallow=true to the externalizeInlineImages call. * Bug fix: QPDFObjectHandle::filterPageContents was broken for pages with an array of content streams. This caused externalize-inline-images to also be broken for this case. 2021-01-01 Jay Berkenbilt <ejb@ql.org> * Add methods to QPDFPageObjectHelper: forEachXObject, forEachImage, forEachFormXObject to call a function on each XObject (or image or form XObject) in a page or form XObject, possibly recursing into nested form XObjects. * Add method QPDFPageObjectHelper::getFormXObjects to return a map of keys to form XObjects (non-recursively) from a page or form XObject. * Add method QPDFObjectHandle::isImage to test whether an object is an image. 2020-12-31 Jay Berkenbilt <ejb@ql.org> * QPDFPageObjectHelper::removeUnreferencedResources can now be called with a QPDFPageObjectHelper created from a form XObject. The method already recursed into form XObjects. * Rename some QPDFPageObjectHelper methods and make them support form XObjects as well as pages. The old names will be preserved for compatibility. - getPageImages -> getImages - filterPageContents -> filterContents * Add QPDFObjectHandle::isFormXObject to test whether an object is a form XObject.
2020-12-30 Jay Berkenbilt <ejb@ql.org> * Add QPDFPageObjectHelper::flattenRotation and --flatten-rotation option to the qpdf CLI. The flattenRotation method removes any /Rotate key from a page dictionary and implements the same rotation by modifying the page's contents such that the various page boxes are altered and the page renders identically. This can be used to work around buggy PDF applications that don't properly handle page rotation. The --flatten-rotation option to the qpdf CLI calls flattenRotation for every page. 2020-12-26 Jay Berkenbilt <ejb@ql.org> * Add QPDFObjectHandle::setFilterOnWrite, which can be used to tell QPDFWriter not to filter a stream on output even if it can. You can use this to prevent QPDFWriter from touching a stream (either uncompressing or compressing) that you have optimized or otherwise ensured looks exactly the way you want it, even if decode level or stream compression would otherwise cause QPDFWriter to modify the stream. * Add ostream << for QPDFObjGen. (Don't ask why it took 7.5 years for me to decide to do this.) 2020-12-25 Jay Berkenbilt <ejb@ql.org> * Refactor write code to eliminate an extra full traversal of objects in the file and to remove assumptions that preclude stream references from appearing in /DecodeParms of filterable streams. This results in an approximately 8% performance reduction in write times. 2020-12-23 Jay Berkenbilt <ejb@ql.org> * Allow library users to provide their own decoders for stream filters by deriving classes from QPDFStreamFilter and registering them using QPDF::registerStreamFilter. Registered stream filters provide code to validate and interpret /DecodeParms for a specific /Filter and also to provide a pipeline that will decode. Note that it is possible to encode to a filter type that is not supported even without this feature. See examples/pdf-custom-filter.cc for an example of using custom stream filters. 
2020-12-22 Jay Berkenbilt <ejb@ql.org> * Add QPDFObjectHandle::makeDirect(bool allow_streams) -- if allow_streams is true, preserve indirect references to streams rather than throwing an exception. This allows the object to be made as direct as possible while preserving stream references. 2020-12-20 Jay Berkenbilt <ejb@ql.org> * Add qpdf_register_progress_reporter method to C API, corresponding to QPDFWriter::registerProgressReporter. Fixes #487. 2020-11-28 Jay Berkenbilt <ejb@ql.org> * Add new functions to the C API for manipulating QPDFObjectHandles. The new functions allow creation and modification of objects, which brings a lot of additional power to the C API. See include/qpdf/qpdf-c.h for details and examples/pdf-c-objects.c for a simple example. 2020-11-21 Jay Berkenbilt <ejb@ql.org> * 10.0.4: release * Fix QIntC::range_check to handle negative numbers properly (fuzz issue 26994). 2020-11-11 Jay Berkenbilt <ejb@ql.org> * Treat a direct page object as a runtime error rather than a logic error since it is actually possible to create a file that has this (fuzz issue 27393). 2020-11-09 Jay Berkenbilt <ejb@ql.org> * Handle "." appearing in --pages not preceded by a numeric range as a special case in command-line parsing code. 2020-11-04 Jay Berkenbilt <ejb@ql.org> * Ignore the value of the offset/generation field in an xref entry for a deleted object. Also attempt file recovery on lower-level exceptions thrown while reading the xref table. Fixes #482. 2020-10-31 Jay Berkenbilt <ejb@ql.org> * 10.0.3: release * Don't enter extension initialization in QPDFWriter on a direct object. Fixes stack overflow in pathological case of /Root being a direct object (fuzz issue 26761). * My previous fix to #449 (handling foreign streams with indirect objects in /Filter and/or /DecodeParms) was incorrect and caused other problems. There is now a correct fix to the original problem. Fixes #478.
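The QIntC::range_check fix in 10.0.4 above concerns negative operands, and a later entry (2021-11-04) notes that subtraction has different boundary conditions from addition. The sketch below illustrates both points with standalone predicates. It is not QIntC's code: add_in_range and subtract_in_range are hypothetical names, and they return a bool rather than reporting an error.

```cpp
#include <limits>

// True if a + b can be computed in T without overflow or underflow.
template <typename T>
bool add_in_range(T a, T b)
{
    if (b > 0) {
        return a <= std::numeric_limits<T>::max() - b;
    }
    if (b < 0) {
        // Negative operands need their own test against min().
        return a >= std::numeric_limits<T>::min() - b;
    }
    return true;
}

// True if a - b can be computed in T without overflow or underflow.
template <typename T>
bool subtract_in_range(T a, T b)
{
    if (b > 0) {
        return a >= std::numeric_limits<T>::min() + b;
    }
    if (b < 0) {
        return a <= std::numeric_limits<T>::max() + b;
    }
    return true;
}
```

The asymmetry shows up at the most negative value: for int, 0 + INT_MIN is representable, but 0 - INT_MIN is not, so a subtraction check cannot be written as an addition check on the negated operand.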
2020-10-27 Jay Berkenbilt <ejb@ql.org> * 10.0.2: release 2020-10-25 Jay Berkenbilt <ejb@ql.org> * When signing distribution files, generate sha256 checksums instead of md5, sha1, and sha512. sha256 seems to be more widely used, and there's no reason to use md5 or sha1 anymore. * Official Windows releases are now built using the openssl crypto provider. The native provider is still available for selection at runtime using the QPDF_CRYPTO_PROVIDER environment variable. * Bug fix: --no-warn was not suppressing some warnings that might be generated by --split-pages. 2020-10-23 Jay Berkenbilt <ejb@ql.org> * Bug fix: when concatenating content streams, insert a newline if needed to prevent the last token from the old stream from being merged with the first token of the new stream. Qpdf was mistakenly concatenating the streams without regard to the specification that content streams are to be broken on token boundaries. Fixes #444. * fix-qdf: handle empty streams better with ignore newline by treating them as empty even though, technically, a blank line would be required inside the Stream. This just makes it easier to add place-holder empty streams while editing qdf files by hand. 2020-10-22 Jay Berkenbilt <ejb@ql.org> * Fix memory leak that could occur if objects in object streams were resolved more than once and the objects within the object streams contained circular references. This leak could be triggered when qpdf was run with --object-streams=generate on files that already had object streams containing circular references (fuzz issue 23642). * Add QIntC::range_check for checking to see whether adding two numbers together will cause an overflow. * Fix loop detection problem when traversing page thumbnails during optimization (fuzz issue 23172). 2020-10-21 Jay Berkenbilt <ejb@ql.org> * Bug fix: properly handle copying foreign streams that have indirect /Filter or /DecodeParms keys when stream data has been replaced. 
The circumstances leading to this bug are very unusual but would cause qpdf to either generate an internal error or some other kind of warning situation if it would occur. Fixes #449. * Qpdf's build and CI has been migrated from Azure Pipelines (Azure DevOps) to GitHub Actions. * Remove some fuzz files that triggered Mal/PDFEx-H with some virus scanners. There's plenty of coverage in the fuzz corpus without these files, and it's a nuisance to have virus checkers remove them. Fixes #460. * Ensure that numeric conversion is not affected by the user's global locale setting. Fixes #459. * Add qpdf-<version>-linux-x86_64.zip to the list of built distributions. This is a simple zip file that contains just the qpdf executables and the dependent shared libraries that would not ordinarily be present on a base system. This minimal binary distribution works as is when used as a Lambda layer in AWS and could be suitable for inclusion in a docker image or other standalone Linux/x86_64 environment where you want minimal support for running the qpdf executable. Fixes #352. 2020-10-20 Jay Berkenbilt <ejb@ql.org> * Add --warning-exit-0 option to the qpdf command line. When specified, qpdf will exit with a status of 0 rather than 3 when there are warnings without errors. Combine with --no-warn to completely ignore warnings. * Bug fix: fix further cases in which errors were written to stdout. Fixes #438. * Build option: add --disable-rpath option to ./configure, which disables passing -rpath to the linker when building shared libraries with libtool. Fixes #422. 2020-10-16 Jay Berkenbilt <ejb@ql.org> * Accept pull request that improves how the Windows native crypto provider is obtained. * Accept pull request that improves performance in processing files in memory. * Accept pull requests that improve openssl configuration and error reporting. * Build using GitHub Actions. The intention is that this will replace Azure Pipelines as the official CI for qpdf for the next release. 
2020-10-15 Jay Berkenbilt <ejb@ql.org> * Make many minor improvements to the build process and code health, including fixing an lgtm warning and compiler warnings from newer versions of the gcc and MSVC toolchains. Add several cosmetic improvements to build output in CI. * Added LL_FMT to config.h.in. This is populated automatically by autoconf, but if you build with your own build system, you may need to define it as whatever the format string needed by printf for long long is. Usually this is "%lld", but it can be "%I64d" for some older Windows-based compilers. 2020-04-29 Jay Berkenbilt <ejb@ql.org> * Bug fix: qpdf --check was writing errors and warnings reported by checkLinearization to stdout instead of stderr. Fixes #438. 2020-04-09 Jay Berkenbilt <ejb@ql.org> * 10.0.1: release 2020-04-08 Jay Berkenbilt <ejb@ql.org> * Bug fix: qpdf 10.0.0 introduced a bug in which QPDFObjectHandle::getStreamData would return the raw data when called on an unfilterable stream instead of throwing an exception like it's supposed to. Fixes #425. 2020-04-07 Jay Berkenbilt <ejb@ql.org> * Improve pdf-invert-images example to show a pattern of copying streams into another QPDF object to enable a stream data provider to access the original stream data. * Fix error that caused a compilation error with clang. Fixes #424. 2020-04-06 Jay Berkenbilt <ejb@ql.org> * 10.0.0: release * Move random number generation into the crypto providers. The old os-based secure random number generation with fallback to insecure random number generation (only if allowed at build time) has moved into the native crypto provider. If using other providers (currently gnutls or openssl), random number generation will use those libraries. The old interfaces for supplying your own random number generator are still in place. Fixes #418. * Source-level incompatibility: remove QUtil::srandom.
There was no reason to ever call this, and it didn't do anything unless insecure random number generation was compiled in, which it is not by default. If you were calling this, just remove the call because it wasn't doing anything anyway. * Add openssl crypto provider, contributed by Dean Scarff. This provider is implemented using OpenSSL and also works with BoringSSL. 2020-04-04 Jay Berkenbilt <ejb@ql.org> * Add a new provideStreamData method for StreamDataProvider that allows a success code to be returned and that accepts the suppress_warnings and will_retry parameters. This makes it possible to have a StreamDataProvider call pipeStreamData and propagate its results back. This change allows better error handling and recovery when objects are copied from other files and when "immediate copy from" is enabled. * When copying foreign streams, the same type of recovery from streams with filtering errors is performed as when dealing with streams in the original input. This could happen, for example, if you are using the --pages option to take pages from another file and that file has errors in it. * Add a new version of QPDFObjectHandle::pipeStreamData whose return value indicates overall success or failure rather than whether or not filtering was attempted. It should have always been this way. This change was done in a backward-compatible fashion. Previously existing pipeStreamData methods' return values mean the same as always. * Add "objectinfo" section to json output. In this release, information about whether each object is a stream or not is provided. There's otherwise no way to tell conclusively from the json output. Over time, other computed information about objects may be added here. * Add new option --remove-unreferenced-resources that takes auto, yes, or no as options. This tells qpdf whether to attempt to remove unreferenced resources from pages when doing page splitting operations.
Prior to this change, the default was to attempt to remove unreferenced resources, but this operation was very slow, especially for large and complex files. The new default is "auto", which tells qpdf to analyze the file for shared resources. This is a relatively quick test. If no shared resources are found, then we don't attempt to remove unreferenced resources, because unreferenced resources never occur in files without shared resources. To force qpdf to look for and remove unreferenced resources, use --remove-unreferenced-resources=yes. The option --preserve-unreferenced-resources is now a synonym for --remove-unreferenced-resources=no. * Use std::atomic for unique ID generation internally within the library. This eliminates the already extremely low chance of a collision, improves thread safety, and removes a dependency on a random number generator. Thanks to Dean Scarff for the contribution. 2020-04-03 Jay Berkenbilt <ejb@ql.org> * Allow qpdf to be built on systems without wchar_t. All "normal" systems have wchar_t because it is part of the C++ standard, but there are some stripped down environments that don't have it. See README.md (search for wchar_t) for instructions and a discussion. Fixes #406. * Add two extra optional arguments to QPDFPageObjectHelper::placeFormXObject to control whether the placed item is allowed to be shrunk or expanded to fit within or maximally fill the destination rectangle. Prior to this change, placeFormXObject might shrink it but would never expand it. * When calling the C API, accept any non-zero value as TRUE rather than just 1. This appears to resolve issues on Windows when calling some versions of the DLL directly from other languages. 2020-04-02 Jay Berkenbilt <ejb@ql.org> * Add method QPDFObjectHandle::unsafeShallowCopy for copying only top-level dictionary keys or array items. See comments in QPDFObjectHandle.hh for when this should be used. * Remove Members class indirection for QPDFObjectHandle. 
Those are copied and assigned too often, and that change caused a very substantial performance hit. 2020-03-31 Jay Berkenbilt <ejb@ql.org> * When detecting unreferenced images during page splitting, if any XObjects are form XObjects, recursively descend into them and remove any unreferenced objects from them too. Fixes #373. * Add QPDFObjectHandle::filterAsContents, which filters a stream's data as if it were page contents. This can be useful to filter form XObjects the same way we would filter page contents. * If QPDF_EXECUTABLE is set, use it as the path to qpdf for purposes of completion. This variable is only read during the execution of `qpdf --completion-zsh` and `qpdf --completion-bash`. It is not used during the actual evaluation of completions. 2020-02-22 Jay Berkenbilt <ejb@ql.org> * Update pdf-set-form-values.cc to use and mention generateAppearance, which hadn't been added when the example was originally created. * Detect, warn, and correct the case of /Pages in the document catalog incorrectly pointing to a page or intermediate node instead of the root of the pages tree. Fixes #398. 2020-01-26 Jay Berkenbilt <ejb@ql.org> * 9.1.1: release * Bug fix: in qdf mode, do not write out any XRef streams that may have appeared in the original file. These are usually unreferenced, but with --preserve-unreferenced, they could be written out, which breaks fix-qdf's assumption that there is at most one XRef stream and that it appears at the end of the file. Fixes #386. * Bug fix: when externalizing inline images, a colorspace value that was a lookup key in the page's /Resource -> /ColorSpace dictionary was not properly handled. Fixes #392. * Add "encrypt" key to the json output. This contains largely the same information as given by --show-encryption but in a consistent, parseable format. * Add options --is-encrypted and --requires-password. 
These can be used with files, including encrypted files with unknown passwords, to determine whether or not a file is encrypted and whether a password is required to open the file. The --requires-password option can also be used to determine whether a supplied password is correct. Information is supplied through exit codes, making these options particularly useful for shell scripts. Fixes #390. 2020-01-14 Jay Berkenbilt <ejb@ql.org> * Fix for Windows being unable to acquire crypt context with a new keyset. Thanks to Cloudmersive for the fix. Fixes #387. * Rewrite fix-qdf in C++. This means fix-qdf is a proper executable now, and there is no longer a runtime requirement on perl. * Add QUtil::call_main_from_wmain, a helper function that can be called in the body of wmain to convert UTF-16 arguments to UTF-8 arguments and then call another main function. 2020-01-13 Jay Berkenbilt <ejb@ql.org> * QUtil::read_lines_from_file: add new versions that use FILE*, use FILE* instead of std::ifstream internally to support correct handling of Unicode filenames in Windows, and add the option to preserve line endings. 2019-11-17 Jay Berkenbilt <ejb@ql.org> * 9.1.0: release * This is the first version of qpdf that requires C++-11. 2019-11-09 Jay Berkenbilt <ejb@ql.org> * 9.1.rc1: release * Improve behavior of wildcard expansion for msvc executable when run from the Windows cmd.exe shell. Unlike in UNIX environments, Windows leaves it up to the executable to expand its own wildcards. Fixes #224. * Allow :even or :odd to be appended to numeric ranges for --pages, --rotate, and other options that take page ranges. * When reading /P from the encryption dictionary, use static_cast instead of QIntC to convert the value to a signed integer. The value of /P is a bit field, and PDF files have been found in the wild where /P is represented as an unsigned integer even though the spec states that it is a signed 32-bit value.
By using static_cast, we allow qpdf to compensate for writers that incorrectly represent the correct bit field as an unsigned value. Fixes #382. 2019-11-05 Jay Berkenbilt <ejb@ql.org> * Add support for pluggable crypto providers, enabling multiple implementations of the cryptographic functions needed by qpdf. This feature was added by request of Red Hat, which recognized the use of qpdf's native crypto implementations as a potential security liability, preferring instead to get all crypto functionality from a third-party library that receives a lot of scrutiny. However, it was also important to me to not impose any unnecessary third party dependencies on my users or packagers, some of which build qpdf for lots of environments, some of which may not easily support gnutls. Starting in qpdf 9.1.0, it is possible to build qpdf with both the native and gnutls crypto providers or with either in isolation. In support of this feature, new classes QPDFCryptoProvider and QPDFCryptoImpl have been added to the public interface. See QPDFCryptoImpl.hh for details about adding your own crypto provider and QPDFCryptoProvider.hh for details about choosing which one is used. Note that selection of crypto providers is invisible to anyone who doesn't explicitly care. Neither end users nor developers have to be concerned about it. * The environment variable QPDF_CRYPTO_PROVIDER can be used to override qpdf's default choice of crypto provider. The --show-crypto flag to the qpdf CLI can be used to present a list of supported crypto providers with the default provider always listed first. * Add gnutls crypto provider. Thanks to Zdenek Dohnal for contributing the code that I ultimately used in the gnutls crypto provider and for engaging in an extended discussion about this feature. Fixes #218.
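To illustrate the /P handling described in the 9.1.rc1 entry above (#382), the following sketch shows how a value written as an unsigned 32-bit number can be reinterpreted as the signed 32-bit bit field the spec calls for. The helper name p_as_signed_32 is hypothetical and the code is not taken from qpdf; it only demonstrates the static_cast idea.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not qpdf's code): take a /P value that was parsed
// as a 64-bit integer and return the 32-bit signed bit field. Values that
// a writer stored as unsigned (e.g. 4294967292) map onto the same bit
// pattern as the signed form (-4).
std::int32_t p_as_signed_32(long long p)
{
    // Assumption: the value fits in 32 bits, either signed or unsigned.
    assert(p >= std::numeric_limits<std::int32_t>::min() &&
           p <= std::numeric_limits<std::uint32_t>::max());
    // Conversion to unsigned is modular, so this keeps the low 32 bits.
    auto bits = static_cast<std::uint32_t>(p);
    // Reinterpret the bit pattern as signed. This is modular as of C++20
    // and implementation-defined (two's complement in practice) before.
    return static_cast<std::int32_t>(bits);
}
```

A range-checking converter would reject 4294967292 as out of range for a signed 32-bit integer, which is why a plain cast is the more forgiving choice for a bit field.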
2019-10-22 Jay Berkenbilt <ejb@ql.org> * Incorporate changes from Masamichi Hosoda <trueroad@trueroad.jp> to properly handle signature in the following ways: - Always represent /Contents in a signature dictionary as a hex string - Do not compress signature dictionaries when generating object streams - Do not encrypt/decrypt the /Contents field of the signature dictionary when creating or reading encrypted files * Incorporate changes from Masamichi Hosoda <trueroad@trueroad.jp> to add additional methods for making it possible to gain deeper insight into cross reference tables and object renumbering. These new API calls make it possible for applications to go into PDF files created by qpdf and make changes to them that go beyond working with the PDF at the object level. The specific use case for these changes was to write an external tool to perform digital signature, but there could be other uses as well. New methods include the following, all of which are described in their respective headers: - QPDF::getXRefTable() - QPDFObjectHandle::getParsedOffset() - QPDFWriter::getRenumberedObjGen(QPDFObjGen) - QPDFWriter::getWrittenXRefTable() 2019-10-12 Jay Berkenbilt <ejb@ql.org> * 9.0.2: release * Change the name of the temporary file used by --replace-input to work with arbitrary absolute or relative paths without requiring path splitting logic. Fixes #365. 2019-09-20 Jay Berkenbilt <ejb@ql.org> * 9.0.1: release 2019-09-19 Jay Berkenbilt <ejb@ql.org> * When converting an array to a Rectangle, ensure that llx <= urx and lly <= ury. This prevents flatten-annotations from flipping fields whose coordinates are messed up in the input. Fixes #363. * Warn when duplicated dictionary keys are found during parsing. The behavior remains as before: later keys override earlier ones. However, this generates a warning now rather than being silently ignored. Fixes #345. 2019-09-17 Jay Berkenbilt <ejb@ql.org> * Fix a few integer warnings for big-endian systems. 
* QIntC tests: don't assume char is signed. Fixes #361. 2019-08-31 Jay Berkenbilt <ejb@ql.org> * 9.0.0: release * Add QPDF::anyWarnings() method to find out whether there have been any warnings without resetting the list. * Add QPDF::closeInputSource() method to release the input source so the input file can be deleted or renamed. * Add methods rename_file and remove_file to QUtil. 2019-08-24 Jay Berkenbilt <ejb@ql.org> * Add QPDF::userPasswordMatched() and QPDF::ownerPasswordMatched() methods so it can be determined separately whether the supplied password matched the user password, the owner password, or both. Fixes #159. 2019-08-23 Jay Berkenbilt <ejb@ql.org> * Add --recompress-flate option to qpdf and QPDFWriter::setRecompressFlate to cause QPDFWriter to recompress streams that are already compressed with /FlateDecode. * Add option Pl_Flate::setCompressionLevel to globally set the zlib compression level used by all Pl_Flate pipelines. * Add --compression-level flag to qpdf to set the zlib compression level. When combined with --recompress-flate, this will cause most of qpdf's streams to use the maximum compression level. This results in only a very small amount of savings in size that comes at a fairly significant performance cost, but it could be useful for archival files or other cases where every byte counts and creation time doesn't matter so much. Note that using --object-streams=generate in combination with these options gives you the biggest advantage. Fixes #113. 2019-08-22 Jay Berkenbilt <ejb@ql.org> * In QPDFObjectHandle::ParserCallbacks, in addition to handleObject(QPDFObjectHandle), allow developers to override handleObject(QPDFObjectHandle, size_t offset, size_t length). If this method appears instead, it is called with the offset of the object in the content stream (which may be concatenated from an array of streams) and the length of the object. Intervening whitespace and comments are not included in offset and length.
* Add method QPDFObjectHandle::ParserCallbacks::contentSize(size_t). If defined, it is called by the content stream parser before the first call to handleObject, and the argument is the total size in bytes of the content streams. * Add QPDFObjectHandle::isDirectNull() -- a const method that allows determining whether an object is a literal null without attempting to resolve it. * Stop replacing indirect references to null with literal null in arrays when writing output with QPDFWriter. 2019-08-19 Jay Berkenbilt <ejb@ql.org> * Accept (and warn for) extraneous whitespace preceding the xref table. Fixes #341. * Accept (and warn for) extraneous whitespace between the stream keyword and newline. Fixes #329. * Properly handle name tokens containing # not preceding two hexadecimal digits. Such names are invalid in PDF >= 1.2 but valid in PDF 1.0 and 1.1. Prior to this fix, qpdf's behavior was to treat such tokens as an error for PDF >= 1.2, but for older PDF tokens, the name was silently accepted, and when the name token was written out, the # was changed to #23, which is the correct way to represent a # character. This behavior was problematic for several reasons: one is that, ordinarily, content streams are not parsed, so this would cause things like image references whose names contained # to break. Also, even if the input file was 1.0 or 1.1, there's no guarantee that the output file wouldn't be written at a new version, resulting in invalid name tokens. The new behavior is to issue a warning upon encountering such a token but to accept it, regardless of the PDF version. Such tokens are written out properly as well. Additionally, the warning message indicates that the tokens are invalid for PDF >= 1.2. Fixes #332. * Non-compatible API change: remove QPDFTokenizer::allowPoundAnywhereInName(). There were a lot of problems with this. When it was used, any name tokens read would always be modified on output, which is never the correct behavior. 
This method used to signal QPDFTokenizer to not treat # specially in name tokens, which resulted in the incorrect behavior whose fix is described in the preceding item. 2019-08-18 Jay Berkenbilt <ejb@ql.org> * When traversing the pages tree, if an invalid /Type key is encountered, fix it. This is not done for all operations, but it will be done for any case in which getAllPages is called. This includes all page-based CLI operations. (Hopefully) Fixes #349. 2019-08-17 Jay Berkenbilt <ejb@ql.org> * Change internal implementation of QPDF arrays to use sparse arrays, which results in using much less memory for arrays with large numbers of nulls. Various files have been encountered in the wild that contain thousands of arrays with millions of nulls. Fixes #305, #311. 2019-07-03 Jay Berkenbilt <ejb@ql.org> * Non-compatible API change: change QPDFOutlineDocumentHelper::getTopLevelOutlines and QPDFOutlineObjectHelper::getKids to return a std::vector instead of a std::list of QPDFOutlineObjectHelper objects. This is to work around bugs with some compilers' STL implementations that are choking with list here. There's no deep reason for these to be lists instead of vectors. Fixes #297. 2019-06-22 Jay Berkenbilt <ejb@ql.org> * Handle encrypted files with missing or invalid /Length entries in the encryption dictionary. * QPDFWriter: allow calling set*EncryptionParameters before calling setFilename. Fixes #336. * It now works to run --completion-bash and --completion-zsh when qpdf is started from an AppImage. * Provided a more useful error message when Windows can't get security context. Thanks to user zdenop for supplying some code. Fixes #286. * Favor PointerHolder over manual memory allocation in shippable code where possible. Fixes #235. * If pkg-config is available, use it to locate libjpeg and zlib. If not, fall back to old behavior. Fixes #324. * The "make install" target explicitly sets a mode rather than relying on the user's umask. Fixes #326.
* When a file has linearization warnings but no errors, qpdf --check and --check-linearization now exit with code 3 instead of 2. Fixes #50. * Add new function QUtil::read_file_into_memory. 2019-06-21 Jay Berkenbilt <ejb@ql.org> * When supported, qpdf builds with -fvisibility=hidden, which removes non-exported symbols from the shared library in a manner similar to how Windows DLLs work. This is better for performance and also better for safety and protection of private interfaces. See https://gcc.gnu.org/wiki/Visibility. *NOTE*: If you are getting linker errors trying to catch exceptions or derive things from a base class in the qpdf library, it's possible that a QPDF_DLL_CLASS declaration is missing somewhere. Please report this as a bug at https://github.com/qpdf/qpdf/issues. * Source-level incompatibility: remove the version of QPDF::copyForeignObject with an unused boolean parameter. If you were, for some reason, calling this, just take the parameter away. * Source-level incompatibility: remove the version of QPDFTokenizer::expectInlineImage with no arguments. It didn't produce correct inline images. This is a very low-level routine. There is little reason to call it outside of qpdf's lexical engine. * Source-level incompatibility: rename QUtil::strcasecmp to QUtil::str_compare_nocase. This is a non-compatible change, but QUtil::strcasecmp is hardly the most important part of qpdf's API. The reason for this change is that strcasecmp is a macro on some systems, and that was causing problems when QUtil.hh was included in certain circumstances. Fixes #242. 2019-06-20 Jay Berkenbilt <ejb@ql.org> * Enable compilation with additional warnings for integer conversion and sign (-Wsign-conversion, -Wconversion for gcc and similar; -W3 for msvc) if supported. These warnings are on by default and can be turned off by passing --disable-int-warnings. * Fix all integer sign and conversion warnings.
This makes all integer type conversions that have potential data loss explicit with calls that do range checks and raise an exception. * Change out_bufsize argument to Pl_Flate's constructor from int to unsigned int for compatibility with underlying zlib implementation. * Change QPDFObjectHandle::pipeStreamData's encode_flags argument from unsigned long to int since int is the underlying type of the enumerated type values that are passed to it. This change should be invisible to virtually all code unless you are compiling with strict warning flags and explicitly casting to unsigned long. * Add methods to QPDFObjectHandle to return the value of Integer objects as int and unsigned int with range checking and fallback behavior to avoid silent underflow/overflow conditions. * Add functions to QUtil to convert unsigned integers to strings, avoiding implicit conversion between unsigned and signed integer types. * Add QIntC.hh, containing integer type converters that do range checking. 2019-06-18 Jay Berkenbilt <ejb@ql.org> * Remove previously submitted qpdf_read_memory_fuzzer as it is a small subset of qpdf_fuzzer. 2019-06-15 Jay Berkenbilt <ejb@ql.org> * Update CI (Azure Pipelines) to run tests with some sanitizers. * Do "ideal integration" with oss-fuzz. This includes adding a better fuzzer with a seed corpus and adding automated tests of the fuzzer with the test data. * When parsing files, while reading an object, if there are too many consecutive errors without enough intervening successes, give up on the specific object. This reduces cases in which very badly damaged files send qpdf into a tailspin reading one character at a time and reporting warnings. 2019-06-13 Jay Berkenbilt <ejb@ql.org> * Perform initial integration of Google's oss-fuzz project by copying the fuzzer someone from Google already did into the qpdf repository and adding build support. This shift in control is in preparation for an ideal integration with oss-fuzz.
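The 2019-06-20 entries above describe replacing silent integer conversions with range-checked ones that raise an exception. A minimal sketch of that idea follows. The function names to_int_checked and demo_to_int_checked are hypothetical; the real converters live in QIntC.hh and their interface is not reproduced here.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical sketch of a range-checked narrowing conversion: convert a
// long long to int, throwing instead of silently truncating.
int to_int_checked(long long value)
{
    if (value < std::numeric_limits<int>::min() ||
        value > std::numeric_limits<int>::max()) {
        throw std::range_error(
            "integer out of range converting " + std::to_string(value) +
            " to int");
    }
    return static_cast<int>(value);
}

// Usage sketch: in-range values pass through unchanged; on platforms where
// long long is wider than int, an out-of-range value throws and this
// function returns true.
inline bool demo_to_int_checked()
{
    assert(to_int_checked(42) == 42);
    try {
        to_int_checked(std::numeric_limits<long long>::max());
    } catch (std::range_error const&) {
        return true;
    }
    return false;
}
```

Making the conversion a named call keeps each potentially lossy narrowing visible at the call site, which is the property the warning flags mentioned above are meant to enforce.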
2019-06-09 Jay Berkenbilt <ejb@ql.org> * When /DecodeParms is an empty list, ignore it on read and delete it on write. Fixes #331. 2019-05-18 Jay Berkenbilt <ejb@ql.org> * 8.4.2: release 2019-05-16 Jay Berkenbilt <ejb@ql.org> * Fix memory error in Windows-only code from typo. Fixes #330. 2019-04-27 Jay Berkenbilt <ejb@ql.org> * 8.4.1: release 2019-04-20 Jay Berkenbilt <ejb@ql.org> * When qpdf --version is run, it will detect if the qpdf CLI was built with a different version of qpdf than the library. This usually indicates that multiple versions of qpdf are installed and that the library path is not set up properly. This situation sometimes causes confusing behavior for users who are not actually running the version of qpdf they think they are running. * Add parameter --remove-page-labels to remove page labels from output. In qpdf 8.3.0, the behavior changed so that page labels were preserved when merging and splitting files. Some users were relying on the fact that if you ran qpdf --empty --pages ... all page labels were dropped. This option makes it possible to get that behavior if it is explicitly desired. Fixes #317. * Add parameter --keep-files-open-threshold to override the maximum number of files that qpdf will allow to be kept open at once. Fixes #288. * Handle Unicode characters in filenames properly on Windows. The changes to support Unicode on the CLI in Windows broke Unicode filenames on that platform. Fixes #298. * Slightly tighten logic that determines whether an object is a page. The previous logic was sometimes failing to preserve annotations because they were passing the overly loose test for whether something was a page. This fix has a slight risk of causing some extraneous objects to be copied during page splitting and merging for erroneous PDF files whose page objects contain invalid types or are missing the /Type key entirely, both of which would be invalid according to the PDF specification. 
* Revert change that included preservation of outlines (bookmarks) in --split-pages. The way it was implemented caused a very significant performance penalty when splitting pages with outlines. We need a better solution that only copies the relevant items, not the whole tree. 2019-03-11 Jay Berkenbilt <ejb@ql.org> * JSON serialization: add missing leading 0 to decimal values between -1 and 1. Fixes #308. 2019-02-01 Jay Berkenbilt <ejb@ql.org> * 8.4.0: release 2019-01-31 Jay Berkenbilt <ejb@ql.org> * Bug fix: do better pre-checks on images before optimizing; refuse to optimize images that can't be converted to JPEG because of colorspace or depth. * Add new options --externalize-inline-images, which converts inline images larger than a specified size to regular images, and --ii-min-bytes, which tweaks that size. * When optimizing images, inline images are now included in the optimization, first being converted to regular images. Use --keep-inline-images to exclude them from optimization. Fixes #278. * Add method QPDFPageObjectHelper::externalizeInlineImages, which converts inline images whose size is at least a specified amount to regular images. * Remove traces of acroread, which hasn't been available in Linux for a long time. 2019-01-30 Jay Berkenbilt <ejb@ql.org> * Do not include space after ID operator in inline image data. The token now correctly contains the image data, the EI operator, and the delimiter that precedes the EI operator. * Improve locating of an inline image's EI operator to correctly handle the case of EI appearing inside the image data. * Very low-level QPDFTokenizer API now includes an expectInlineImage method that takes an input stream, enabling it to locate an inline image's EI operator better. When this method is called, the inline image token returned will not contain the EI operator and will contain correct image data. This is called automatically everywhere within the qpdf library. 
Most user code will never have to use the low-level tokenizer API. If you use Pl_QPDFTokenizer, this will be done automatically for you. If you use the low-level API and call expectInlineImage, you should call the new version. 2019-01-29 Jay Berkenbilt <ejb@ql.org> * Bug fix: when returning an inline image token, the tokenizer no longer includes the delimiter that follows EI. The QPDFObjectHandle created from the token was correct. * Handle files with direct page objects, which is not allowed by the PDF spec but has been seen in the wild. Fixes #164. 2019-01-28 Jay Berkenbilt <ejb@ql.org> * Bug fix: when using --stream-data=compress, object streams and xref streams were not compressed. They were compressed if no --stream-data option was specified. Fixes #271. * When linearizing or getting the list of all pages in a file, replace duplicated page objects with a shallow copy of the page object. Linearization and all page manipulation APIs require page objects to be unique. Pages that were originally duplicated will still share contents and any other indirect resources. Fixes #268. 2019-01-26 Jay Berkenbilt <ejb@ql.org> * Add --overlay and --underlay options. Fixes #207. * Create examples/pdf-overlay-page.cc to demonstrate use of page/form XObject interaction * Add new methods QPDFPageObjectHelper::getFormXObjectForPage, which creates a form XObject equivalent to a page, and QPDFPageObjectHelper::placeFormXObject, which generates content stream code to place a form XObject on a page. 2019-01-25 Jay Berkenbilt <ejb@ql.org> * Add new method QPDFObjectHandle::getUniqueResourceName() to return an unused key available to be used in a resource dictionary. * Add new method QPDFPageObjectHelper::getAttribute() that properly handles inherited attributes and allows for creation of a copy of shared attributes. This is very useful if you are getting an attribute of a page dictionary with the intent to modify it privately for that page.
* Fix QPDFPageObjectHelper::getPageImages (and the legacy QPDFObjectHandle::getPageImages()) to properly handle images in inherited resources dictionaries. 2019-01-20 Jay Berkenbilt <ejb@ql.org> * Tweak the content code generated for variable text fields to better handle font sizes and multi-line text. * When generating appearance streams for variable text annotations, properly handle the cases of there being no appearance dictionary, no appearance stream, or an appearance stream with no BMC..EMC marker. * When flattening annotations, remove annotations from the file that don't have appearance streams. These were previously being preserved, but since they are invisible, there is no reason to preserve them when flattening annotations. 2019-01-19 Jay Berkenbilt <ejb@ql.org> * NOTE: qpdf CLI: some non-compatible changes were made to how qpdf interprets password arguments that contain Unicode characters that fall outside of ASCII. On Windows, the non-compatibility was unavoidable, as explained in the release notes. On all platforms, it is possible to get the old behavior if desired, though the old behavior would almost always result in files that other applications were unable to open. As it stands, qpdf should now be able to open files encrypted with a wide range of passwords that some other viewers might not handle, though even now, qpdf's Unicode password handling is not 100% complete. * Add --password-mode option, which allows fine-grained control of how password arguments are treated. This is discussed fully in the manual. Fixes #215. * Add option --suppress-password-recovery to disable the behavior of searching for a correct password by re-encoding the provided password. This option can be useful if you want to ensure you know exactly what password is being used.
2019-01-17 Jay Berkenbilt <ejb@ql.org> * When attempting to open an encrypted file with a password, if the password doesn't work, try alternative passwords created by re-interpreting the supplied password with different string encodings. This makes qpdf able to recover passwords with non-ASCII characters when either the decryption or encryption operation was performed with an incorrectly encoded password. * Fix data loss bug: qpdf was discarding referenced resources in the case in which a page's resource dictionary contained an indirect reference for either /Font or /XObject that contained fonts or XObjects not referenced on all pages that shared the resource. This was a "typo" in the code. The comment explained the correct behavior, and the code was clearly intended to handle this issue, but the implementation had an error in it. This is fixed by a single-line change, which can be found in git commit 4bc434000c42a7191e705c8a38216ca6743ad9ff. That commit can be used as a patch that applies cleanly against qpdf 8.1.0 and forward. The bug was introduced in version 8.1.0. For the record, this is the first bug in qpdf's history that could result in silent loss of data when processing a correct input file. Fixes #276. 2019-01-15 Jay Berkenbilt <ejb@ql.org> * Add QUtil::possible_repaired_encodings which, given a string, generates other strings that represent re-interpretation of the bytes in a different coding system. This is used to help recover passwords if the password string was improperly encoded on a different system due to user error or a software bug. 2019-01-14 Jay Berkenbilt <ejb@ql.org> * Add new CLI flags to 128-bit and 256-bit encryption: --assemble, --annotate, --form, and --modify-other to control encryption permissions with more granularity than was allowed with the --modify flag. Fixes #214. * Add new versions of QPDFWriter::setR{3,4,5,6}EncryptionParameters that allow individual setting of the various permission bits. 
The old interfaces are retained for backward compatibility. In the "C" API, add qpdf_set_r{3,4,5,6}_encryption_parameters2. The new interfaces use separate booleans for various permissions instead of the qpdf_r3_modify_e enumerated type, which set permission bits in predefined groups. * Add versions of utf8 to single-byte character transcoders that return a success code. 2019-01-13 Jay Berkenbilt <ejb@ql.org> * Add several more string transcoding and analysis methods to QUtil for bidirectional conversion between PDF Doc, Win Ansi, Mac Roman, UTF-8, and UTF-16 along with detection of valid UTF-8 and UTF-16. 2019-01-12 Jay Berkenbilt <ejb@ql.org> * In the --pages option, allow the same page to be specified more than once. You can now do "--pages A.pdf 1,1 --" or "--pages A.pdf 1 A.pdf 1" instead of having to use two different paths to specify A.pdf. Fixes #272. * Add QPDFPageObjectHelper::shallowCopyPage(). This method creates a new page object that is a "shallow copy" of the given page as described in the comments in QPDFPageObjectHelper. The resulting object has not been added anywhere but is ready to be passed to QPDFPageDocumentHelper::addPage of its own QPDF or another QPDF object. * Add QPDF::getUniqueId() method to return an identifier that is intended to be unique within the scope of all QPDF objects created by the calling application in a single run. * In --pages, allow "." as a replacement for the current input file, making it possible to say "qpdf A.pdf --pages . 1-3 --" instead of having to repeat the input filename. 2019-01-10 Jay Berkenbilt <ejb@ql.org> * Add new configure option --enable-avoid-windows-handle, which causes the symbol AVOID_WINDOWS_HANDLE to be defined. If set, we avoid using Windows I/O HANDLE, which is disallowed in some versions of the Windows SDK, such as for Windows phones. QUtil::same_file will always return false in this case. Only applies to Windows builds. * Add new method QPDF::setImmediateCopyFrom.
When called on a source QPDF object, streams can be copied FROM that object to other ones without having to keep the source QPDF or its input source around. The cost is copying the streams into RAM. See comments in QPDF.hh for setImmediateCopyFrom for a detailed explanation. 2019-01-07 Jay Berkenbilt <ejb@ql.org> * 8.3.0: release * Add sample completion files in completions. These can be used by packagers to install on the system wherever bash and zsh keep their vendor-supplied completions. * Add configure flag --enable-check-autofiles, which is on by default. Packagers whose packaging systems automatically refresh autoconf or libtool files should pass --disable-check-autofiles to ./configure to suppress warnings about automatically generated files being outdated. 2019-01-06 Jay Berkenbilt <ejb@ql.org> * Remove the restriction in most cases that the source QPDF used in a copyForeignObject call has to stick around until the destination QPDF is written. The exceptional case is when the source stream gets its data using a QPDFObjectHandle::StreamDataProvider. For a more in-depth discussion, see comments around copyForeignObject in QPDF.hh. Fixes #219. 2019-01-05 Jay Berkenbilt <ejb@ql.org> * When generating appearances, if the font uses one of the standard, built-in encodings, restrict the character set to that rather than just to ASCII. This will allow most appearances to contain characters from the ISO-Latin-1 range plus a few additional characters. * Add methods QUtil::utf8_to_win_ansi and QUtil::utf8_to_mac_roman. * Add method QUtil::utf8_to_utf16. 2019-01-04 Jay Berkenbilt <ejb@ql.org> * Add new option --optimize-images, which recompresses every image using DCT (JPEG) compression as long as the image is not already compressed with lossy compression and recompressing the image reduces its size.
The additional options --oi-min-width, --oi-min-height, and --oi-min-area prevent recompression of images whose width, height, or pixel area (width * height) are below a specified threshold. * Add new option --collate. When specified, the semantics of --pages change from concatenation to collation. See the manual for a more detailed discussion. Fixes #259. * Add new method QPDFWriter::getFinalVersion, which returns the PDF version that will ultimately be written to the final file. See comments in QPDFWriter.hh for some restrictions on its use. Fixes #266. * When unexpected errors are found while checking linearization data, print an error message instead of calling assert, which caused the program to crash. Fixes #209, #231. * Detect and recover from dangling references. If a PDF file contained an indirect reference to a non-existent object (which is valid), when adding a new object to the file, it was possible for the new object to take the object ID of the dangling reference, thereby causing the dangling reference to point to the new object. This case is now prevented. Fixes #240. 2019-01-03 Jay Berkenbilt <ejb@ql.org> * Add --generate-appearances flag to the qpdf command-line tool to trigger generation of appearance streams. * Fix behavior of form field value setting to handle the following cases: - Strings are always written as UTF-16 - Check boxes and radio buttons are handled properly with synchronization of values and appearance states * Define constants in qpdf/Constants.h for interpretation of annotation and form field flags * Add QPDFAnnotationObjectHelper::getFlags * Add many new methods to QPDFFormFieldObjectHelper for querying flags and field types * Add new methods for appearance stream generation. See comments in QPDFFormFieldObjectHelper.hh for generateAppearance() for a description of limitations.
- QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded - QPDFFormFieldObjectHelper::generateAppearance * Bug fix: when writing form field values, always write string values encoded as UTF-16. * Add method QUtil::utf8_to_ascii, which returns an ASCII string for a UTF-8 string, replacing out-of-range characters with a specified substitute. 2019-01-02 Jay Berkenbilt <ejb@ql.org> * Add method QPDFObjectHandle::getResourceNames that returns a set of strings representing all second-level keys in a dictionary (i.e. all keys of all direct dictionary members). 2018-12-31 Jay Berkenbilt <ejb@ql.org> * Add --flatten-annotations flag to the qpdf command-line tool for annotation flattening. * Add methods for flattening form fields and annotations: - QPDFPageDocumentHelper::flattenAnnotations - integrate annotation appearance streams into page contents with special handling for form fields: if appearance streams are up to date (/NeedAppearances is false in /AcroForm), the /AcroForm key of the document catalog is removed. Otherwise, a warning is issued, and form fields are ignored. Non-form-field annotations are always flattened if an appearance stream can be found. - QPDFAnnotationObjectHelper::getPageContentForAppearance - generate the content stream fragment to render an appearance stream in a page's content stream as a form xobject. Called by flattenAnnotations. * Add method QPDFObjectHandle::mergeResources(), which merges resource dictionaries. See detailed description in QPDFObjectHandle.hh. * Add QPDFObjectHandle::Matrix, similar to QPDFObjectHandle::Rectangle, as a convenience class for six-element arrays that are used as matrices. 2018-12-23 Jay Berkenbilt <ejb@ql.org> * When specifying @arg on the command line, if the file "arg" does not exist, just treat this as a normal argument. This makes it easier to deal with files whose names start with the @ character. Fixes #265. * Tweak completion so it works with zsh as well using bashcompinit.
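The @arg rule from the 2018-12-23 entry above can be sketched in Python as follows. This is only a model of the described behavior, not qpdf's code: the function name expand_at_args is hypothetical, and the one-argument-per-line file format is an assumption of the sketch.

```python
import os


def expand_at_args(argv):
    """Expand @file arguments; keep @name unchanged when no such file exists.

    Hypothetical model: assumes an argument file holds one argument per line.
    """
    expanded = []
    for arg in argv:
        path = arg[1:]
        if arg.startswith("@") and os.path.isfile(path):
            # The file exists, so its lines stand in for the @file argument.
            with open(path, encoding="utf-8") as handle:
                expanded.extend(line.rstrip("\r\n") for line in handle)
        else:
            # No such file: pass the argument through untouched, so a PDF
            # whose name starts with "@" can still be named directly.
            expanded.append(arg)
    return expanded
```

With this rule, an argument such as "@scan.pdf" is passed through as an ordinary file name whenever no file called "scan.pdf" exists.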
2018-12-22 Jay Berkenbilt <ejb@ql.org> * Add new options --json, --json-key, and --json-object to generate a json representation of the PDF file. This is described in more depth in the manual. You can also run qpdf --json-help to get a description of the json format. 2018-12-21 Jay Berkenbilt <ejb@ql.org> * Allow --show-object=trailer for showing the document trailer. * You can now use eval $(qpdf --completion-bash) to enable bash completion for qpdf. It's not perfect, but it works pretty well. 2018-12-19 Jay Berkenbilt <ejb@ql.org> * When splitting pages using --split-pages, the outlines dictionary and some supporting metadata are copied into the split files. The result is that all bookmarks from the original file appear, and those that point to pages that are preserved work while those that point to pages that are not preserved don't do anything. This is an interim step toward proper support for bookmark preservation in split files. * Add QPDFOutlineDocumentHelper and QPDFOutlineObjectHelper for handling outlines (bookmarks) including bidirectionally mapping between bookmarks and pages. Initially there is no support for modifying the outlines hierarchy. 2018-12-18 Jay Berkenbilt <ejb@ql.org> * New method QPDFObjectHandle::getJSON() returns a JSON object with a partial representation of the object. See QPDFObjectHandle.hh for a detailed description. * Add a simple JSON serializer. This is not a complete or general-purpose JSON library. It allows assembly and serialization of JSON structures with some restrictions, which are described in the header file. * Add QPDFNameTreeObjectHelper class. This class provides useful methods for dealing with name trees, which are discussed in section 7.9.6 of the PDF spec (ISO-32000). * Preserve page labels when merging and splitting files. Prior versions of qpdf simply preserved the page label information from the first file, which usually wouldn't make any sense in the merged file. 
Now any page that had a page number in any original file will have the same page number after merging or splitting. * Add QPDFPageLabelDocumentHelper class. This is a document helper class that provides useful methods for dealing with page labels. It abstracts the fact that they are stored as number trees and deals with interpolating intermediate values that are not in the tree. It also has helper functions used by the qpdf command line tool to preserve page labels when merging and splitting files. * Add QPDFNumberTreeObjectHelper class. This class provides useful methods for dealing with number trees, which are discussed in section 7.9.7 of the PDF spec (ISO-32000). Page label dictionaries are represented as number trees. * New method QPDFObjectHandle::wrapInArray returns the object itself if it is an array. Otherwise, it returns an array containing the object. This is useful for dealing with PDF data that is sometimes expressed as a single element and sometimes expressed as an array, which is a somewhat common PDF idiom. 2018-10-11 Jay Berkenbilt <ejb@ql.org> * Files generated by autogen.sh are now committed so that it is possible to build on platforms without autoconf directly from a clean checkout of the repository. The configure script detects if the files are out of date when it also determines that the tools are present to regenerate them. * Add build in Azure Pipelines, now that it is free for open source projects. 2018-08-18 Jay Berkenbilt <ejb@ql.org> * 8.2.1: release * Add new option --keep-files-open=[yn] to control whether qpdf keeps files open when merging. Prior to version 8.1.0, qpdf always kept all files open, but this meant that the number of files that could be merged was limited by the operating system's open file limit. Version 8.1.0 opened files as they were referenced, but this caused a major performance impact. 
Version 8.2.0 optimized the performance but did so in a way that, for local file systems, there was a small but unavoidable performance hit, but for networked file systems, the performance impact could be very high. Starting with version 8.2.1, the default behavior is that files are kept open if no more than 200 files are specified, but that the behavior can be explicitly overridden with the --keep-files-open flag. If you are merging more than 200 files but less than the operating system's max open files limit, you may want to use --keep-files-open=y. If you are using a local file system where the overhead is low and you might sometimes merge more than the OS limit's number of files, you may want to specify --keep-files-open=n. Fixes #237. 2018-08-16 Jay Berkenbilt <ejb@ql.org> * 8.2.0: release 2018-08-14 Jay Berkenbilt <ejb@ql.org> * For the mingw builds, change the name of the DLL import library from libqpdf.a to libqpdf.dll.a to avoid confusing it with a static library. This potentially clears the way for supporting a static library in the future, though presently, the qpdf Windows build only builds the DLL and executables. Fixes #225. 2018-08-13 Jay Berkenbilt <ejb@ql.org> * Add new class QPDFSystemError, derived from std::runtime_error, which is now thrown by QUtil::throw_system_error. This enables the triggering errno value to be retrieved. Fixes #221. 2018-08-12 Jay Berkenbilt <ejb@ql.org> * qpdf command line: add --no-warn option to suppress issuing warning messages. If there are any conditions that would have caused warnings to be issued, the exit status is still 3. * Rewrite the internals of Pl_Buffer to be much more efficient in use of memory at a very slight performance cost. The old implementation could cause memory usage to go out of control for files with large images compressed using the TIFF predictor. Fixes #228. 2018-08-05 Jay Berkenbilt <ejb@ql.org> * Bug fix: end of line characters were not properly handled inside strings in some cases. 
Fixes #226. * Bug fix: infinite loop on progress reporting for very small files. Fixes #230. 2018-08-04 Jay Berkenbilt <ejb@ql.org> * Performance fix: optimize page merging operation to avoid unnecessary open/close calls on files being merged. Fixes #217. * Add ClosedFileInputSource::stayOpen method, enabling a ClosedFileInputSource to stay open during manually indicated periods of high activity, thus reducing the overhead of frequent open/close operations. 2018-06-23 Jay Berkenbilt <ejb@ql.org> * 8.1.0: release 2018-06-22 Jay Berkenbilt <ejb@ql.org> * Bug fix: properly decrypt files with 40-bit keys that use revision 3 of the security handler. Prior to this, qpdf was reporting "invalid password" in this case. Fixes #212. * With --verbose, print information about each input file when merging files. * Add progress reporting to QPDFWriter. Programmatically, you can register a progress reporter with registerProgressReporter(). From the command line, passing --progress will give progress indicators in increments of no less than 1% as output files are written. Fixes #200. * Add new method QPDF::getObjectCount(). This gives an approximate (upper bound) account of objects in the QPDF object. * Don't leave files open when merging. This makes it possible to merge more files at once than the operating system's open file limit. Fixes #154. * Add ClosedFileInputSource class, an input source that keeps its input file closed when not reading it. At the expense of some performance, this allows you to operate on many files without opening too many files at the operating system level. * Add new option --preserve-unreferenced-resources, which suppresses removal of unreferenced objects from page resource dictionaries during page splitting operations. 2018-06-21 Jay Berkenbilt <ejb@ql.org> * Add method QPDFPageObjectHelper::removeUnreferencedResources and also QPDFPageDocumentHelper::removeUnreferencedResources that calls the former on every page.
This method removes any XObject or Font references from the page's resource dictionary if they are not referenced anywhere in any of the content streams. This significantly reduces the size of split files whose pages internally share resource dictionaries. Fixes #203. * The --rotate option to qpdf no longer requires an explicit page range. You can now rotate all pages of a document with qpdf --rotate=angle in.pdf out.pdf. Fixes #211. * Create examples/pdf-set-form-values.cc to illustrate use of interactive form helpers. * Added methods QPDFAcroFormDocumentHelper::setNeedAppearances and added methods to QPDFFormFieldObjectHelper to set a field's value, optionally updating the document to indicate that appearance streams need to be regenerated. * Added QPDFObjectHandle::newUnicodeString and QPDFObjectHandle::unparseBinary to allow for more convenient creation of strings that are explicitly encoded in UTF-16 BE. This is useful for creating Unicode strings that appear outside of content streams, such as in page labels, outlines, form field values, etc. 2018-06-20 Jay Berkenbilt <ejb@ql.org> * Added new classes QPDFAcroFormDocumentHelper, QPDFFormFieldObjectHelper, and QPDFAnnotationObjectHelper to assist with working with interactive forms in PDF files. At present, API methods for reading forms, form fields, and widget annotations have been added. It is likely that some additional methods for modifying forms will be added in the future. Note that qpdf remains a library whose function is primarily focused around document structure and metadata rather than content. As such, it is not expected that qpdf will have higher level APIs for generating form contents, but qpdf will hopefully gain the capability to deal with the bookkeeping aspects of wiring up all the objects, which could make it a useful library for other software that works with PDF interactive forms. PDF forms are complex, and the terminology around them is confusing.
Please see comments at the top of QPDFAcroFormDocumentHelper.hh for additional discussion. * Added new classes QPDFPageDocumentHelper and QPDFPageObjectHelper for page-level API functions. These classes introduce a new API pattern of document helpers and object helpers in qpdf. The helper classes provide a higher level API for working with certain types of structural features of PDF while still staying true to qpdf's philosophy of not isolating the user from the underlying structure. Please see the chapter in the documentation entitled "Design and Library Notes" for additional discussion. The examples have also been updated to use QPDFPageDocumentHelper and QPDFPageObjectHelper when performing page-level operations. 2018-06-19 Jay Berkenbilt <ejb@ql.org> * New QPDFObjectHandle::Rectangle class will convert to and from arrays of four numerical values. Rectangles are used in various places within the PDF file format and are called out as a specific data type in the PDF specification. 2018-05-12 Jay Berkenbilt <ejb@ql.org> * In newline before endstream mode, an extra newline was not inserted prior to the endstream that ends object streams. Fixes #205. 2018-04-15 Jay Berkenbilt <ejb@ql.org> * Arbitrarily limit the depth of data structures represented by direct objects. This is CVE-2018-9918. Fixes #202. 2018-03-06 Jay Berkenbilt <ejb@ql.org> * 8.0.2: release * Properly handle pages with no contents. Fixes #194. 2018-03-05 Jay Berkenbilt <ejb@ql.org> * Improve handling of loops while following cross reference tables. Fixes #192. 2018-03-04 Jay Berkenbilt <ejb@ql.org> * 8.0.1: release * On the command line when specifying page ranges, support preceding a page number by "r" to indicate that it should be counted from the end. For example, the range r3-r1 would indicate the last three pages of a document. 2018-03-03 Jay Berkenbilt <ejb@ql.org> * Ignore zlib data check errors while uncompressing streams.
This is consistent with behaviors of other readers and enables handling of some incorrectly written zlib streams. Fixes #191. 2018-02-25 Jay Berkenbilt <ejb@ql.org> * 8.0.0: release 2018-02-17 Jay Berkenbilt <ejb@ql.org> * Fix QPDFObjectHandle::getUTF8Val() to properly handle strings that are encoded with PDF Doc Encoding. Fixes #179. * Add qpdf_check_pdf to the "C" API. This method just attempts to read the entire file and produce no output, making it possible to assess whether the file has any errors that qpdf can detect. * Major enhancements to handling of type errors within the qpdf library. This fix is intended to eliminate those annoying cases where qpdf would exit with a message like "operation for dictionary object attempted on object of wrong type" without providing any context. Now qpdf keeps enough context to be able to issue a proper warning and to handle such conditions in a sensible way. This should greatly increase the number of bad files that qpdf can recover, and it should make it much easier to figure out what's broken when a file contains errors. * Error message fix: replace "file position" with "offset" in error messages that report lexical or parsing errors. Sometimes it's an offset in an object stream or a content stream rather than a file position, so this makes the error message less confusing in those cases. It still requires some knowledge to find the exact position of the error, since when it's not a file offset, it's probably an offset into a stream after uncompressing it. * Error message fix: correct some cases in which the object that contained a lexical error was omitted from the error message. * Error message fix: improve file name in the error message when there is a parser error inside an object stream. 2018-02-11 Jay Berkenbilt <ejb@ql.org> * Add QPDFObjectHandle::filterPageContents method to provide a different interface for applying token filters to page contents without modifying the ultimate output.
2018-02-04 Jay Berkenbilt <ejb@ql.org> * Changes listed on today's date are numerous and reflect significant enhancements to qpdf's lexical layer. While many nuances are discussed and a handful of small bugs were fixed, it should be emphasized that none of these issues have any impact on any output or behavior of qpdf under "normal" operation. There are some changes that have an effect on content stream normalization as with qdf mode or on code that interacts with PDF files lexically using QPDFTokenizer. There are no incompatible changes for normal operation. There are a few changes that will affect the exact error messages issued on certain bad files, and there is a small non-compatible enhancement regarding the behavior of manually constructed QPDFTokenizer::Token objects. Users of the qpdf command line tool will see no changes other than the addition of a new command-line flag and possibly some improved error messages. * Significant lexer (tokenizer) enhancements. These are changes to the QPDFTokenizer class. These changes are of concern only to people who are operating with PDF files at the lexical layer using qpdf. They have little or no impact on most high-level interfaces or the command-line tool. New token types tt_space and tt_comment to recognize whitespace and comments. This makes it possible to tokenize a PDF file or stream and preserve everything about it. For backward compatibility, space and comment tokens are not returned by the tokenizer unless QPDFTokenizer::includeIgnorable() is called. Better handling of null bytes. These are now included in space tokens rather than being their own "tt_word" tokens. This should have no impact on any correct PDF file and has no impact on output, but it may change offsets in some error messages when trying to parse contents of bad files. Under default operation, qpdf does not attempt to parse content streams, so this change is mostly invisible. Bug fix to handling of bad tokens at ends of streams.
Now, when allowEOF() has been called, these are treated as bad tokens (tt_bad or an exception, depending on invocation), and a separate tt_eof token is returned. Before, the bad token contents were returned as the value of a tt_eof token. tt_eof tokens are always empty now. Fix a bug that would, on rare occasions, report the offset in an error message in the wrong space because of spaces or comments adjacent to a bad token. Clarify in comments exactly where the input source is positioned surrounding calls to readToken and getToken. * Add a new token type for inline images. This token type is only returned by QPDFTokenizer immediately following a call to expectInlineImage(). This change includes internal refactoring of a handful of places that all separately handled inline images. The logic of detecting inline images in content streams is now handled in one place in the code. Also, we are more flexible about what characters may surround the EI operator that marks the end of an inline image. * New method QPDFObjectHandle::parsePageContents() to improve upon QPDFObjectHandle::parseContentStream(). The parseContentStream method used to operate on a single content stream, but was fixed to properly handle pages with contents split across multiple streams in an earlier release. The new method parsePageContents() can be called on the page object rather than the value of the page dictionary's /Contents key. This removes a few lines of boiler-plate code from any code that uses parseContentStream, and it also enables creation of more helpful error messages if problems are encountered as the error messages can include information about which page the streams come from. * Update content stream parsing example (examples/pdf-parse-content.cc) to use new QPDFObjectHandle::parsePageContents() method in favor of the older QPDFObjectHandle::parseContentStream() method.
* Bug fix: change where the trailing newline is added to a stream in QDF mode when content normalization is enabled (the default for QDF mode). Before, the content normalizer ensured that the output ended with a trailing newline, but this had the undesired side effect of including the newline in the stream data for purposes of length computation. QPDFWriter already appends a newline without counting in length for better readability. Ordinarily this makes no difference, but in the rare case of a page's contents being split in the middle of a token, the old behavior could cause the extra newline to be interpreted as part of the token. This bug could only be triggered in qdf mode, which is a mode intended for manual inspection of PDF files' contents, so it is very unlikely to have caused any actual problems for people using qpdf for production use. Even if it did, it would be very unusual for a PDF file to actually be adversely affected by this issue. * Add support for coalescing a page's contents into a single stream if they are represented as an array of streams. This can be performed from the command line using the --coalesce-contents option. Coalescing content streams can simplify things for software that wants to operate on a page's content streams without having to handle weird edge cases like content streams split in the middle of tokens. Note that QPDFObjectHandle::parsePageContents and QPDFObjectHandle::parseContentStream already handled split content streams. This is mainly to set the stage for new methods of operating on page contents. The new method QPDFObjectHandle::pipeContentStreams will pipe all of a page's content streams through a single pipeline. The new method QPDFObjectHandle::coalesceContentStreams, when called on a page object, will do nothing if the page's contents are a single stream, but if they are an array of streams, it will replace the page's contents with a single stream whose contents are the concatenation of the original streams.
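The coalescing behavior described above can be modeled in a few lines of Python. This is a sketch, not qpdf's implementation: coalesce_contents is a hypothetical name, a page's contents are modeled as either one bytes object or a list of bytes objects, and the streams are joined with no separator because the entry describes plain concatenation.

```python
def coalesce_contents(contents):
    """Return one stream for page contents given as bytes or a list of bytes.

    Hypothetical model of the entry above; qpdf itself works on
    QPDFObjectHandle stream objects rather than raw bytes.
    """
    if isinstance(contents, (bytes, bytearray)):
        # Already a single stream: nothing to do.
        return bytes(contents)
    # An array of streams: concatenate them in order.
    return b"".join(contents)


# The "Tf" operator is split across two streams; after coalescing it is a
# single token again, so a consumer no longer has to handle the split.
parts = [b"BT /F1 12 T", b"f (hi) Tj ET"]
merged = coalesce_contents(parts)
```

Software that tokenizes each stream on its own would see "T" and "f" as separate words in the split form, which is the edge case coalescing removes.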
* A few library routines throw exceptions if called on non-page objects. These constraints have been relaxed somewhat to make qpdf more tolerant of files whose page dictionaries are not properly marked as such. Mostly exceptions about page operations being called on non-page objects will only be thrown in cases where the operation had no chance of succeeding anyway. This change has no impact on any default mode operations, but it could allow applications that use page-level APIs in QPDFObjectHandle to be more tolerant of certain types of damaged files. * Add QPDFObjectHandle::TokenFilter class and methods to use it to perform lexical filtering on content streams. You can call QPDFObjectHandle::addTokenFilter on a stream object, or you can call the higher level QPDFObjectHandle::addContentTokenFilter on a page object to cause the stream's contents to be passed through a token filter while being retrieved by QPDFWriter or any other consumer. For details on using TokenFilter, please see comments in QPDFObjectHandle.hh. * Enhance the string, type QPDFTokenizer::Token constructor to initialize a raw value in addition to a value. Tokens have a value, which is a canonical representation, and a raw value. For all tokens except strings and names, the raw value and the value are the same. For strings, the value excludes the outer delimiters and has non-printing characters normalized. For names, the value resolves non-printing characters. In order to better facilitate token filters that mostly preserve contents and to enable developers to be mostly unconcerned about the nuances of token values and raw values, creating string and name tokens now properly handles this subtlety of values and raw values. When constructing string tokens, take care to avoid passing in the outer delimiters. This has always been the case, but it is now clarified in comments in QPDFObjectHandle.hh::TokenFilter.
This has no impact on any existing code unless there's some code somewhere that was relying on Token::getRawValue() returning an empty string for a manually constructed token. The token class's operator== method still only looks at type and value, not raw value. For example, string tokens for <41> and (A) would still be equal because both are representations of the string "A". * Add QPDFObjectHandle::isDataModified method. This method just returns true if addTokenFilter has been called on the stream. It enables a caller to determine whether it is safe to optimize away piping of stream data in cases where the input and output are expected to be the same. QPDFWriter uses this internally to skip the optimization of not re-compressing already compressed streams if addTokenFilter has been called. Most developers will not have to worry about this as it is used internally in the library in the places that need it. If you are manually retrieving stream data with QPDFObjectHandle::getStreamData or QPDFObjectHandle::pipeStreamData, you don't need to worry about this at all. * Provide heavily annotated examples/pdf-filter-tokens.cc example that illustrates use of some simple token filters. * When normalizing content streams, as in qdf mode, issue warning about bad tokens. Content streams are only normalized when this is explicitly requested, so this has no impact on normal operation. However, in qdf mode, if qpdf detects a bad token, it means that either there's a bug in qpdf's lexer, that the file is damaged, or that the page's contents are split in a weird way. In any of those cases, qpdf could potentially damage the stream's contents by replacing carriage returns with newlines or otherwise messing with spaces. The most likely case of this would be an inline image's compressed data being divided across two streams and having the compressed data in the second stream contain a carriage return as part of its binary data.
If you are using qdf mode just to look at PDF files in text editors, this usually doesn't matter. In cases of contents split across multiple streams, coalescing streams would eliminate the problem, so the warning mentions this. Prior to this enhancement, the chances of qdf mode writing incorrect data were already very low. This change should make it nearly impossible for qdf mode to unknowingly write invalid data. 2018-02-04 Jay Berkenbilt <ejb@ql.org> * Add QPDFWriter::setLinearizationPass1Filename method and --linearize-pass1 command line option to allow specification of a file into which QPDFWriter will write its intermediate linearization pass 1 file. This is useful only for debugging qpdf. qpdf creates linearized files by computing the output in two passes. Ordinarily the first pass is discarded and not written anywhere. This option allows it to be inspected. 2018-02-04 Jay Berkenbilt <ejb@ql.org> * 7.1.1: release * Bug fix: properly linearize files whose /ID has a length of other than 16 bytes. * Rename some test files to avoid files with three dots in their names. Fixes #173. * Fix various build and compilation issues on some platforms and compilers. Fixes #176, #172, #177 * Fix a few typos and clarify a few comments in header files. 2018-01-14 Jay Berkenbilt <ejb@ql.org> * 7.1.0: release * Allow raw encryption key to be specified in library and command line with the QPDF::setPasswordIsHexKey method and --password-is-hex-key option. Allow encryption key to be displayed with --show-encryption-key option. Thanks to Didier Stevens <didier.stevens@gmail.com> for the idea and contribution of one implementation of this idea. See his blog post at https://blog.didierstevens.com/2017/12/28/cracking-encrypted-pdfs-part-3/ for a discussion of using this for cracking encrypted PDFs. I hope that a future release of qpdf will include some additional recovery options that may also make use of this capability. 
2018-01-13 Jay Berkenbilt <ejb@ql.org> * Fix lexical error: the PDF specification allows floating point numbers to end with ".". Fixes #165. * Fix link order in the build to avoid conflicts when building from source while an older version of qpdf is installed. Fixes #158. * Add support for TIFF predictor for LZW and Flate streams. Now all predictor functions are supported. Fixes #171. 2017-12-25 Jay Berkenbilt <ejb@ql.org> * Clarify documentation around options that control parsing but not output creation. Two options: --suppress-recovery and --ignore-xref-streams, were documented in the "Advanced Transformation Options" section of the manual and --help output even though they are not related to output. These are now described in a separate section called "Advanced Parsing Options." * Implement remaining PNG filters for decode. Prior versions could decode only the "up" filter. Now all PNG filters (sub, up, average, Paeth, optimal) are supported for decoding. Thanks to Tobias Hoffmann for providing a test PDF file that has images with all PNG filters along with different numbers of bits per sample and samples per pixel, and thanks to Casey Rojas for providing implementations of the remaining PNG filters. The implementation of the remaining PNG filters changed the interface to the private Pl_PNGFilter class, but this class's header file is not in the installation, and there is no public interface to the class. Within the library, the class is never allocated on the stack; it is only ever dynamically allocated. As such, this does not actually break binary compatibility of the library. 2017-09-15 Jay Berkenbilt <ejb@ql.org> * 7.0.0: release 2017-09-12 Jay Berkenbilt <ejb@ql.org> * Relicense qpdf under version 2.0 of the Apache License rather than version 2.0 of the Artistic License. Both are fine, but the Apache License is in more widespread use, and I like it a little better than Artistic-2.0. 
It is my intention that there be no change in what you can or can't do with qpdf. Versions of qpdf prior to version 7 were released under the terms of version 2.0 of the Artistic License. At your option, you may continue to consider qpdf to be licensed under those terms. Please see the manual for additional information. * Improve the error message that is issued when QPDFWriter encounters a stream that can't be decoded. In particular, mention that the stream will be copied without filtering to avoid data loss. * Add new methods to the C API to correspond to new additions to QPDFWriter: - qpdf_set_compress_streams - qpdf_set_decode_level - qpdf_set_preserve_unreferenced_objects - qpdf_set_newline_before_endstream 2017-08-25 Jay Berkenbilt <ejb@ql.org> * Re-implement parser iteratively to avoid stack overflow on very deeply nested arrays and dictionaries. Fixes #146. * Detect infinite loop while finding additional xref tables. Fixes #149. 2017-08-22 Jay Berkenbilt <ejb@ql.org> * 7.0.b1: release * Convert all README files to markdown. Names changed as follows: - README --> README.md - README.hardening --> README-hardening.md - README.maintainer --> README-maintainer.md - README-what-to-download.txt --> README-what-to-download.md - README-windows.txt --> README-windows.md The file README-windows-install.txt remains a text file. 2017-08-21 Jay Berkenbilt <ejb@ql.org> * Add support for writing PCLm files. Most of the work was done by Sahil Arora <sahilarora.535@gmail.com> as part of a Google Summer of Code project in 2017. PCLm support is useful only for clients that specifically know how to create PCLm files. Support in qpdf is just for ensuring that objects are written in the correct order and for including some additional material in the output that is required by the PCLm standard. 2017-08-19 Jay Berkenbilt <ejb@ql.org> * Remove --precheck-streams. This is enabled by default now without any efficiency cost. This feature was never released. 
* Update pdf-create example to illustrate use of additional image compression filters. * Add support for /RunLengthDecode and /DCTDecode: - New pipeline types Pl_RunLength and Pl_DCT - New command-line flags --compress-streams and --decode-level to replace/enhance --stream-data - New QPDFWriter::setCompressStreams and QPDFWriter::setDecodeLevel methods Please see documentation, header files, and help messages for details on these new features. 2017-08-12 Jay Berkenbilt <ejb@ql.org> * Add QPDFObjectHandle::rotatePage to apply rotation to a page object. Add --rotate option to qpdf to specify page rotation from the command line. * Provide --verbose option that causes qpdf to print an indication of what files it is writing. * Change --single-pages to --split-pages and make it take an optional argument specifying the number of pages per file. 2017-08-11 Jay Berkenbilt <ejb@ql.org> * Fix --newline-before-endstream to always add a newline before endstream even if the last character was already a newline. This is actually what's required by PDF/A. Fixes #133. * Handle encrypted files whose encryption parameters are too short. Fixes #96. 2017-08-10 Jay Berkenbilt <ejb@ql.org> * Remove dependency on libpcre. * Be more forgiving of certain types of errors in the xref table that don't interfere with interpreting the table. * Remove unused "tracing" parameter from PointerHolder's (T*, bool) constructor. This change breaks source code compatibility, but since this argument to PointerHolder has not been used for a long time and the presence of a boolean parameter in the primary constructor makes it too easy to use that by mistake when trying to use PointerHolder for arrays, it seems like it's finally time to take it out. If you have a compile error because of this change, please check to see whether you intended to use the (bool, T*) version of the constructor instead. If not, just remove the second parameter. 
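The PointerHolder note above is easier to follow with a concrete picture of the pitfall. The following is a minimal sketch using a hypothetical Holder class, not qpdf's PointerHolder; the member names and behavior are assumptions made only to show how a trailing bool in a (T*, bool) constructor can be mistaken for the array flag of a (bool, T*) constructor.

```cpp
// Hypothetical stand-in for illustration only; this is not qpdf's
// PointerHolder and does no reference counting.
template <typename T>
class Holder
{
  public:
    // Old-style primary constructor: the trailing bool means "tracing".
    Holder(T* p, bool trace) :
        ptr(p),
        array(false),
        tracing(trace)
    {
    }

    // Array-aware constructor: the bool comes first and means
    // "delete with delete[]".
    Holder(bool is_array, T* p) :
        ptr(p),
        array(is_array),
        tracing(false)
    {
    }

    ~Holder()
    {
        if (array) {
            delete[] ptr;
        } else {
            delete ptr;
        }
    }

    Holder(Holder const&) = delete;
    Holder& operator=(Holder const&) = delete;

    bool isArray() const { return array; }
    bool isTracing() const { return tracing; }

  private:
    T* ptr;
    bool array;
    bool tracing;
};
```

In this sketch, Holder<int>(new int(7), true) compiles without complaint but sets the tracing flag, not the array flag; a caller who wrote that with an array allocation would get delete instead of delete[]. Only Holder<int>(true, new int[4]) marks the pointer as an array. Once the (T*, bool) overload is removed, the mistaken call fails to compile, which matches the compile error the entry above tells callers to expect.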
2017-08-09 Jay Berkenbilt <ejb@ql.org> * When recovering stream length, find endobj without endstream as well as just looking for endstream. Be a little more lax about where we allow it to be found. 2017-08-05 Jay Berkenbilt <ejb@ql.org> * Add --single-pages option to cause output to be written to a separate file for each page rather than one big file. * Process --pages options earlier so that certain inspection options, like --show-pages, can show the state after the merging operations. 2017-08-02 Jay Berkenbilt <ejb@ql.org> * Fix off-by-one error in parsing pages options. Fixes #129. 2017-07-29 Jay Berkenbilt <ejb@ql.org> * Support @filename and @- in the qpdf command-line tool to read command-line arguments, one per line, from the named file. @- reads from standard input. Fixes #16. * Detect when input file and output file are the same and exit to avoid overwriting and losing input file. Fixes #29. * When passing multiple inspection arguments, run --check first, and defer exit until after all the checks have been run. This makes it possible to force operations such as --show-xref to be delayed until after recovery attempts have been made. For example, if you have a file with a syntactically valid xref table that has some offsets that are incorrect, running qpdf --check --show-xref on that file will first recover the xref and then dump the recovered xref, while just running qpdf --show-xref will show the xref table as present in the file. Fixes #42. * When recovering stream length, indicate the recovered length. Fixes #44. * Add --newline-before-endstream command-line option and setNewlineBeforeEndstream method to QPDFWriter. This forces qpdf to always add a newline before the endstream keyword. It is a necessary but not sufficient condition for PDF/A compliance. Fixes #103. * Handle zlib data errors when decoding streams. Fixes #106. * Improve handling of files where the "stream" keyword is not followed by proper line terminators. Fixes #104. 
* Fix content stream parsing to handle cases of structures within the stream split across stream boundaries. Fixes #73. 2017-07-28 Jay Berkenbilt <ejb@ql.org> * Add --preserve-unreferenced command-line option and setPreserveUnreferencedObjects method to QPDFWriter. This option causes QPDFWriter to write all objects from the input file to the output file regardless of whether the objects are referenced. Objects are written to the output file in numerical order from the input file. This option has no effect for linearized files. 2017-07-27 Jay Berkenbilt <ejb@ql.org> * Add --precheck-streams command-line option and setStreamPrecheck method to QPDFWriter to tell QPDFWriter to attempt decoding a stream fully before deciding whether to filter it or not. * Recover gracefully from streams that aren't filterable because the filter parameters are invalid in the stream dictionary or the dictionary itself is invalid. * Significantly improve recoverability from invalid qpdf objects. Most conditions in basic object parsing that used to cause qpdf to exit are now warnings. There are still many more opportunities for improvements of this sort beyond just object parsing. 2017-07-26 Jay Berkenbilt <ejb@ql.org> * Fixes to infinite loops below also fix problems reported in other issues and cover CVE-2017-11624, CVE-2017-11625, CVE-2017-11626, and CVE-2017-11627. * Don't attempt to interpret syntactic keywords (like R and endobj) found while parsing content streams. * Detect infinite loops while resolving objects. This could happen if something inside an object that had to be resolved during parsing, such as a stream length, recursively referenced the object being resolved. * CVE-2017-9208: Handle references to and appearance of object 0 as a special case. Object 0 is not allowed, and qpdf was using it internally to represent direct objects. 
* CVE-2017-9209: Fix infinite loop caused by attempting to reconstruct the xref table while already in the process of reconstructing the xref table. * CVE-2017-9210: Fix infinite loop caused by attempting to unparse an object for inclusion in the text of an exception. 2015-11-10 Jay Berkenbilt <ejb@ql.org> * 6.0.0: release * No changes from 5.2.0. The 5.2.0 release broke binary compatibility and was withdrawn. 2015-10-31 Jay Berkenbilt <ejb@ql.org> * 5.2.0: release * libqpdf/QPDF.cc (read_xrefTable): Be tolerant of some malformed xref tables that don't have the required trailing space after each line. 2015-10-29 Jay Berkenbilt <ejb@ql.org> * Implement QPDFWriter::setDeterministicID and --deterministic-id command-line flag to qpdf to request generation of a deterministic /ID for non-encrypted files. 2015-05-24 Jay Berkenbilt <ejb@ql.org> * 5.1.3: release * Bug fix: fix-qdf was not handling object streams with more than 255 objects in them. * Handle Microsoft crypt provider initialization properly for case where no keys have been previously created, such as in a fresh Windows installation. * Include time.h in QUtil.hh for time_t 2015-02-21 Jay Berkenbilt <ejb@ql.org> * Detect loops in Pages structure. Thanks to Gynvael Coldwind and Mateusz Jurczyk of the Google Security Team for providing a sample file with this problem. * Prevent buffer overrun when converting a password to an encryption key. Thanks to Gynvael Coldwind and Mateusz Jurczyk of the Google Security Team for providing a sample file with this problem. * Ensure that arguments to "R" when parsing the file are direct objects before trying to resolve them. This prevents specially crafted files from causing qpdf to crash with a stack overflow. Thanks to Gynvael Coldwind and Mateusz Jurczyk of the Google Security Team for providing a sample file with this problem. 2014-12-01 Jay Berkenbilt <ejb@ql.org> * Some broken PDF files lack the required /Type key for /Page and /Pages nodes in the page dictionary. 
QPDF now uses other methods to figure out what kind of node it is looking at so that it can handle those files. Originally reported at https://bugs.launchpad.net/ubuntu/+source/qpdf/+bug/1397413 2014-11-14 Jay Berkenbilt <ejb@ql.org> * Bug fix: QPDFObjectHandle::getPageContents() no longer throws an exception when called on a page that has no /Contents key in its dictionary. This is allowed by the spec, and some software packages generate files like this for pages that are blank in the original. 2014-06-07 Jay Berkenbilt <ejb@ql.org> * 5.1.2: release * MS Visual C++ build: explicitly target Windows 5.0.1 (XP) * New example program: pdf-split-pages: efficiently split PDF files into individual pages. * Bug fix: don't fail on files that contain streams where /Filter or /DecodeParms references a stream. Before, qpdf would try to convert these to direct objects, which would fail because of the stream. 2014-02-22 Jay Berkenbilt <ejb@ql.org> * Bug fix: if the last object in the first part of a linearized file had an offset that was below 65536 by less than the size of the hint stream, the xref stream was invalid and the resulting file was not usable. This is now fixed. 2014-01-14 Jay Berkenbilt <ejb@ql.org> * 5.1.1: release 2013-12-26 Jay Berkenbilt <ejb@ql.org> * Bug fix: when copying foreign objects (which occurs during page splitting among other cases), avoid traversing the same object more than once if it appears more than once in the same direct object. This bug is performance-only and does not affect the actual output. 2013-12-17 Jay Berkenbilt <ejb@ql.org> * 5.1.0: release 2013-12-16 Jay Berkenbilt <ejb@ql.org> * Document and make explicit that passing null to QUtil::setRandomDataProvider() resets the random data provider. * Provide QUtil::getRandomDataProvider(). 2013-12-14 Jay Berkenbilt <ejb@ql.org> * Allow any space rather than just newline to follow xref header. This allows qpdf to read a wider range of damaged files. 
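The 2013-12-14 entry above relaxes what may follow the xref keyword. A minimal sketch of such a relaxed check is shown below. The function names is_pdf_space and looks_like_xref_header are hypothetical and are not taken from qpdf's source, and the sketch assumes that "any space" means any of the six white-space characters defined by the PDF specification.

```cpp
#include <string>

// The six white-space characters defined by the PDF specification:
// NUL, tab, line feed, form feed, carriage return, and space.
bool is_pdf_space(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Hypothetical check (not qpdf's code): true if the buffer starts with
// the keyword "xref" followed by at least one white-space character of
// any kind, not only a newline.
bool looks_like_xref_header(std::string const& buf)
{
    static std::string const keyword = "xref";
    if (buf.compare(0, keyword.size(), keyword) != 0) {
        return false;
    }
    return buf.size() > keyword.size() && is_pdf_space(buf[keyword.size()]);
}
```

Under this sketch both "xref" followed by a newline and "xref" followed by a space are accepted, while a longer word such as "xrefs" is not.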
2013-11-30 Jay Berkenbilt <ejb@ql.org> * Allow user-supplied random data provider to be used in place of OS-provided or insecure random number generation. See documentation for 5.1.0 for details. * Add configure option --enable-os-secure-random (enabled by default). Pass --disable-os-secure-random or define SKIP_OS_SECURE_RANDOM to avoid attempts to use the operating system-provided secure random number generation. This can be especially useful on Windows if you wish to avoid any dependency on Microsoft's cryptography system. 2013-11-29 Jay Berkenbilt <ejb@ql.org> * If NO_GET_ENVIRONMENT is #defined, for Windows only, QUtil::get_env will always return false. This was added to support a user who needs to avoid calling GetEnvironmentVariable from the Windows API. QUtil::get_env is not used for any functionality in qpdf and exists only to support the test suite including test coverage support with QTC (part of qtest). * Add /FS to msvc builds to allow parallel builds to work with Visual C++ 2013. * Add missing #include <algorithm> in some files that use std::min and std::max. 2013-11-21 Jay Berkenbilt <ejb@ql.org> * Change image comparison tests, which are disabled by default, to use tiff files with 8 bits per sample rather than 4. This works around a bug in tiffcmp but also increases time and disk space for image comparison tests. 2013-10-28 Jay Berkenbilt <ejb@ql.org> * Fix MacOS compilation errors by adding a missing #include <string> in a header file. 2013-10-18 Jay Berkenbilt <ejb@ql.org> * 5.0.1: release * Warn when -accessibility=n is specified with a modern encryption format (R > 3). Also, accept this flag (and ignore with warning) with 256-bit encryption. qpdf has always ignored the accessibility setting with R > 3, but it previously did so silently. 2013-10-05 Jay Berkenbilt <ejb@ql.org> * Replace operator[] in std::string and std::vector with "at" in order to get bounds checking. 
This reduces the chances that incorrect code will result in data exposure or buffer overruns. See README.hardening for additional notes. * Use cryptographically secure random number generation when available. See additional notes in README. * Replace some assert() calls with std::logic_error exceptions. Ideally there shouldn't be assert() calls outside of testing. This change may make a few more potential code errors in handling invalid data recoverable. * Security fix: In places where std::vector<T>(size_t) was used, either validate that the size parameter is sane or refactor code to avoid the need to pre-allocate the vector. This reduces the likelihood of allocating a lot of memory in response to invalid data in linearization hint streams. * Security fix: sanitize /W array in cross reference stream to avoid a potential integer overflow in a multiplication. It is unlikely that any exploits were possible from this bug as additional checks were also performed. * Security fix: avoid buffer overrun that could be caused by bogus data in linearization hint streams. The incorrect code could only be triggered when checking linearization data, which must be invoked explicitly. qpdf does not check linearization data when reading or writing linearized files, but the qpdf --check command does check linearization data. * Security fix: properly handle empty strings in QPDF_Name::normalizeName. The empty string is not a valid name and would never be parsed as a name, so there were no known conditions where this method could be called with an empty string. * Security fix: perform additional argument sanity checks when reading bit streams. * Security fix: in QUtil::toUTF8, change bounds checking to avoid having a pointer point temporarily outside the bounds of an array. Some compiler optimizations could have made the original code unsafe. 
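Two of the hardening measures in the 2013-10-05 entry, bounds-checked element access and guarding a multiplication against integer overflow, can be illustrated in a few lines. This is a minimal sketch with hypothetical helper names (checked_get, checked_multiply); it is not the code qpdf uses and only demonstrates the general techniques.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: unlike operator[], std::vector::at throws
// std::out_of_range for a bad index instead of reading past the end.
bool checked_get(std::vector<int> const& v, std::size_t i, int& out)
{
    try {
        out = v.at(i);
        return true;
    } catch (std::out_of_range const&) {
        return false;
    }
}

// Hypothetical helper: multiply two sizes, reporting failure instead of
// wrapping around when the product does not fit in std::size_t.
bool checked_multiply(std::size_t a, std::size_t b, std::size_t& out)
{
    if (a != 0 && b > std::numeric_limits<std::size_t>::max() / a) {
        return false;
    }
    out = a * b;
    return true;
}
```

The same pattern applies when a size read from untrusted file data is about to be used to allocate memory or index a buffer: validate first, then use the value.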
2013-07-10 Jay Berkenbilt <ejb@ql.org> * 5.0.0: release * 4.2.0 turned out to be binary incompatible on some platforms even though there were no changes to the public API. Therefore the 4.2.0 release has been withdrawn, and is being replaced with a 5.0.0 release that acknowledges the ABI change and also removes some problematic methods from the public API. * Remove methods from public API that were only intended to be used by QPDFWriter and really didn't make sense to call from anywhere else as they required internal knowledge that only QPDFWriter had: - QPDF::getLinearizedParts - QPDF::generateHintStream - QPDF::getObjectStreamData - QPDF::getCompressibleObjGens - QPDF::getCompressibleObjects 2013-07-07 Jay Berkenbilt <ejb@ql.org> * 4.2.0: release [withdrawn] * Ignore error case of a stream's decode parameters having invalid length when there are no stream filters. * qpdf: add --show-npages command-line option, which causes the number of pages in the input file to be printed on a line by itself. * qpdf: allow omission of range in --pages. If range is omitted such that an argument that is supposed to be a range is an invalid range and a valid file name, the range of 1-z is assumed. This makes it possible to merge a bunch of files with something like qpdf --empty out.pdf --pages *.pdf -- 2013-06-15 Jay Berkenbilt <ejb@ql.org> * Handle some additional broken files with missing /ID in trailer for encrypted files and with space rather than newline after xref. 2013-06-14 Jay Berkenbilt <ejb@ql.org> * Detect and correct /Outlines dictionary being a direct object when linearizing files. This is not allowed by the spec but has been seen in the wild. Prior to this change, such a file would cause an internal error in the linearization code, which assumed /Outlines was indirect. * Add /Length key to crypt filter dictionary for encrypted files. This key is optional, but some versions of MacOS reportedly fail to open encrypted PDF files without this key. 
* Bug fix: properly handle object stream generation when the original file has some compressible objects with generation != 0. * Add QPDF::getCompressibleObjGens() and deprecate QPDF::getCompressibleObjects(), which had a flaw in its logic. * Add new QPDFObjectHandle::getObjGen() method and indicate in comments that its use is favored over getObjectID() and getGeneration() for most cases. * Add new QPDFObjGen object to represent an object ID/generation pair. 2013-04-14 Jay Berkenbilt <ejb@ql.org> * 4.1.0: release 2013-03-25 Jay Berkenbilt <ejb@ql.org> * manual/qpdf-manual.xml: Document the casting policy that is followed in qpdf's implementation. 2013-03-11 Jay Berkenbilt <ejb@ql.org> * When creating Windows binary distributions, make sure to only copy DLLs of the correct type. This ensures that the 32-bit distributions contain 32-bit DLLs and the 64-bit distributions contain 64-bit DLLs. 2013-03-07 Jay Berkenbilt <ejb@ql.org> * Use ./install-sh (already present) instead of "install -c" to install executables to fix portability problems against different UNIX variants. 2013-03-03 Jay Berkenbilt <ejb@ql.org> * Add protected terminateParsing method to QPDFObjectHandle::ParserCallbacks that implementor can call to terminate parsing of a content stream. 2013-02-28 Jay Berkenbilt <ejb@ql.org> * Favor fopen_s and strerror_s on MSVC to avoid CRT security warnings. This is useful for people who may want to use qpdf in an application that is Windows 8 certified. * New method QUtil::safe_fopen to wrap calls to fopen. This is less cumbersome than calling QUtil::fopen_wrapper. * Remove all calls to sprintf * New method QUtil::int_to_string_base to convert to octal or hexadecimal (or decimal) strings without using sprintf 2013-02-26 Jay Berkenbilt <ejb@ql.org> * Rewrite QUtil::int_to_string and QUtil::double_to_string to remove internal length limits but to remain backward compatible with the old versions for valid inputs. 
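The 2013-02-28 and 2013-02-26 entries above describe converting integers to octal, decimal, or hexadecimal strings without sprintf and without a fixed-size buffer. The following minimal sketch shows one way to do that. The function name to_string_base is hypothetical; this is not the implementation of QUtil::int_to_string_base, and details such as the handling of negative numbers and the case of hex digits are assumptions.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>

// Hypothetical helper (not qpdf's implementation): convert an integer to
// a string in base 8, 10, or 16 by repeated division. The std::string
// grows as needed, so there is no internal length limit.
std::string to_string_base(long long num, int base)
{
    if (base != 8 && base != 10 && base != 16) {
        throw std::logic_error("to_string_base: base must be 8, 10, or 16");
    }
    bool negative = num < 0;
    // Take the magnitude in unsigned arithmetic so that the most negative
    // value does not overflow when negated.
    unsigned long long magnitude = negative
        ? 0ULL - static_cast<unsigned long long>(num)
        : static_cast<unsigned long long>(num);
    unsigned long long const ubase = static_cast<unsigned long long>(base);
    std::string digits;
    do {
        // Digits are produced least significant first.
        digits += "0123456789abcdef"[magnitude % ubase];
        magnitude /= ubase;
    } while (magnitude != 0);
    if (negative) {
        digits += '-';
    }
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```

For example, under this sketch 255 in base 16 yields "ff" and 8 in base 8 yields "10".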
2013-02-23 Jay Berkenbilt <ejb@ql.org> * Bug fix: properly handle overridden compressed objects. When caching objects from an object stream, only cache objects that, based on the xref table, would actually be resolved into this stream. Prior to this fix, if an object stream A contained an object B that was overridden by an appended section of the file, qpdf would cache the old value of B if any non-overridden member of A was accessed before B. This commit fixes that bug. 2013-01-31 Jay Berkenbilt <ejb@ql.org> * Do not remove libtool's .la file during the make install step. Note to packagers: if your distribution wants you to remove the .la file, you will have to do that yourself now. 2013-01-25 Jay Berkenbilt <ejb@ql.org> * New method QUtil::hex_encode to encode binary data as a hexadecimal string * qpdf --check was exiting with status 0 in some rare cases even when errors were found. It now always exits with one of the document error codes (0 for success, 2 for errors, 3 for warnings). 2013-01-24 Jay Berkenbilt <ejb@ql.org> * Make --enable-werror work for MSVC, and generally handle warning options better for that compiler. Warning flags for that compiler were previously hard-coded into the build with /WX enabled unconditionally. * Split warning flags into WFLAGS in autoconf.mk to make them easier to override. Before they were repeated in CFLAGS and CXXFLAGS and were commingled with other compiler flags. * qpdf --check now does syntactic checks on all pages' content streams as well as checking overall document structure. Semantic errors are still not checked, and there are no plans to add semantic checks. 2013-01-22 Jay Berkenbilt <ejb@ql.org> * Add QPDFObjectHandle::getTypeCode(). This method returns a unique integer (enumerated type) value corresponding to the object type of the QPDFObjectHandle. 
It can be used as an alternative to the QPDFObjectHandle::is* methods for type testing, particularly where there is a desire to use a switch statement or optimize for performance when testing object types. * Add QPDFObjectHandle::getTypeName(). This method returns a string literal describing the object type. It is useful for testing and debugging. 2013-01-20 Jay Berkenbilt <ejb@ql.org> * Add QPDFObjectHandle::parseContentStream, which parses the objects in a content stream and calls handlers in a callback class. The example pdf-parse-content illustrates its use. * Add QPDF_Operator and QPDF_InlineImage types along with appropriate wrapper methods in QPDFObjectHandle. These new object types are to facilitate content stream parsing. 2013-01-17 Jay Berkenbilt <ejb@ql.org> * 4.0.1: release * Add clarifying comment in QPDF.hh for methods that return the user password to state that it is no longer possible with newer encryption formats to recover the user password knowing the owner password. * Fix detection of binary attachments in the test suite. This resolves false test failures on some platforms. No changes to the actual QPDF code were made. 2012-12-31 Jay Berkenbilt <ejb@ql.org> * 4.0.0: release * Add new methods qpdf_get_pdf_extension_level, qpdf_set_r5_encryption_parameters, qpdf_set_r6_encryption_parameters, qpdf_set_minimum_pdf_version_and_extension, and qpdf_force_pdf_version_and_extension to support new functionality from the C API. 2012-12-30 Jay Berkenbilt <ejb@ql.org> * Fix long-standing bug that could theoretically have resulted in possible misinterpretation of decode parameters in streams. As far as I can tell, it is extremely unlikely that files with the characteristics that would have triggered the bug actually exist in cases that qpdf versions prior to 4.0.0 could have read. Unencrypted files with encrypted attachments would have triggered this bug, but qpdf versions prior to 4.0.0 already refused to open such files. 
* Fix long-standing bug in which a stream that used a crypt filter and was otherwise not filterable by qpdf would be decrypted properly but would retain the crypt filter indication in the file. There are no known ways to create files like this, so it is unlikely that anyone ever hit this bug. 2012-12-29 Jay Berkenbilt <ejb@ql.org> * Add read/write support for both the deprecated Acrobat IX encryption format and the Acrobat X/PDF 2.0 encryption format using 256-bit AES keys. Using the Acrobat IX format (R=5) forces the version of the file to 1.7 with extension level 3. Using the PDF 2.0 format (R=6) forces it to 1.7 extension level 8. * Add new method QPDF::getEncryptionKey to return the actual encryption key used for encryption of data in the file. The key is returned as a std::string. * Non-compatible API change: change signature of QPDF::compute_data_key to take the R and V values from the encryption dictionary. There is no reason for any application code to call this method since handling of encryption is done automatically by the qpdf library. It is used internally by QPDFWriter. * Support reading and decryption of files whose main text is not encrypted but whose attachments are. More generally, support the case of files and streams encrypted differently with some limitations, described in the documentation. This was not previously supported due to lack of test files, but I created test files using a trial version of Acrobat XI to fully implement this case. * Incorporate sha2 code from sphlib 3.0. See README for licensing. Create private pipeline class for computing hashes with sha256, sha384, and sha512. * Allow specification of initialization vector when using AES filtering. This is required to compute the hash used in /R=6 (PDF 2.0) encryption. 2012-12-28 Jay Berkenbilt <ejb@ql.org> * Add random number generation functions to QUtil. 
* Fix old bug that could cause an infinite loop if user password recovery methods were called and a password contained the "(" character (which happens to be the first byte of padding used by older PDF encryption formats). This bug was noticed while reading code and would not happen under ordinary usage patterns even if the password contained that character. 2012-12-27 Jay Berkenbilt <ejb@ql.org> * Add awareness of extension level to PDF Version methods for both reading and writing. This includes adding method QPDF::getExtensionLevel and new versions of QPDFWriter::setMinimumPDFVersion and QPDFWriter::forcePDFVersion that support extension levels. The qpdf command-line tool interprets version numbers of the form x.y.z as version x.y at extension level z. * Update AES classes to support use of 256-bit keys. * Non-compatible API change: Removed public method QPDF::flattenScalarReferences. Instead, just flatten the scalar references we actually need to flatten. Flattening scalar references was a wrong decision years ago and has occasionally caused other problems, among which were that it caused qpdf to visit otherwise unreferenced and possibly erroneous objects in the file when it didn't have to. There's no reason that any non-internal code would have had to call this. * Non-compatible API change: Removed public method QPDF::decodeStreams which was previously used by qpdf --check but is no longer used. The decodeStreams method could generate false positives since it would attempt to access all objects in the file including those that were not referenced. There's no reason that any non-internal code would have had to call this. * Non-compatible API change: Removed public method QPDF::trimTrailerForWrite, which was only intended for use by QPDFWriter and which is no longer used. 
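The 2012-12-27 entry above says that the command-line tool reads a version of the form x.y.z as version x.y at extension level z. A minimal sketch of that interpretation follows. The PdfVersion struct and the read_digits and parse_version functions are hypothetical and are not part of qpdf; the sketch assumes that each component is a run of decimal digits and that a missing third component means extension level 0.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: "x.y" plus an extension level.
struct PdfVersion
{
    std::string version;
    int extension_level;
};

// Read one or more decimal digits starting at pos; advance pos past them.
static bool read_digits(std::string const& s, std::size_t& pos, std::string& out)
{
    std::size_t start = pos;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    if (pos == start) {
        return false;
    }
    out = s.substr(start, pos - start);
    return true;
}

// Hypothetical parser (not qpdf's code): accept "x.y" or "x.y.z".
// Very large extension levels would make std::stoi throw; a production
// parser would need to handle that case.
bool parse_version(std::string const& s, PdfVersion& result)
{
    std::size_t pos = 0;
    std::string major, minor, ext;
    if (!read_digits(s, pos, major)) {
        return false;
    }
    if (pos >= s.size() || s[pos] != '.') {
        return false;
    }
    ++pos;
    if (!read_digits(s, pos, minor)) {
        return false;
    }
    int level = 0;
    if (pos < s.size()) {
        if (s[pos] != '.') {
            return false;
        }
        ++pos;
        if (!read_digits(s, pos, ext) || pos != s.size()) {
            return false;
        }
        level = std::stoi(ext);
    }
    result.version = major + "." + minor;
    result.extension_level = level;
    return true;
}
```

Under this sketch, "1.7.3" is read as version 1.7 at extension level 3, and "1.5" as version 1.5 at extension level 0.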
2012-12-26 Jay Berkenbilt <ejb@ql.org> * Add new fields to QPDF::EncryptionData to support newer encryption formats (V=5, R=5 and R=6) * Non-compatible API change: Change public nested class QPDF::EncryptionData to make all member fields private and to add method calls. This is a non-compatible API change, but changing EncryptionData is necessary to support newer encryption formats, and making this change will prevent the need to make a non-compatible change in the future if new fields are added. A public nested class should never have had public members to begin with. 2012-12-25 Jay Berkenbilt <ejb@ql.org> * Allow PDF header to appear anywhere in the first 1024 bytes of the file as recommended in the implementation notes of the Adobe version of the PDF spec. 2012-11-20 Jay Berkenbilt <ejb@ql.org> * Add zlib and libpcre to Requires.private in the pkg-config file to support static linking. Thanks Tobias Hoffmann for pointing out the omission. * Ignore (with warning) non-freed objects in the xref table whose offset is 0. Some PDF producers (incorrectly) do this. See https://bugs.linuxfoundation.org/show_bug.cgi?id=1081. 2012-09-23 Jay Berkenbilt <ejb@ql.org> * Add public methods QPDF::processInputSource and QPDFWriter::setOutputPipeline to allow users to read from custom input sources and to write to custom pipelines. This allows the maximum flexibility in sources for reading and writing PDF files. 2012-09-06 Jay Berkenbilt <ejb@ql.org> * 3.0.2: release * Add new method QPDFWriter::setExtraHeaderText to add extra text, such as application-specific comments, to near the beginning of a PDF file. For linearized files, this appears after the linearization parameter dictionary. For non-linearized files, it appears right after the PDF header and non-ASCII comment. 
* Make it possible to write the same QPDF object with two different QPDFWriter objects that have both called setLinearization(true) by making private method QPDF::calculateLinearizationData() properly initialize its state. * Bug fix: Writing after calling QPDFWriter::setOutputMemory() would cause a segmentation fault because of an internal field not being initialized, rendering that method useless. This has been corrected. 2012-08-11 Jay Berkenbilt <ejb@ql.org> * 3.0.1: release * Bug fix: let EOF terminate a literal token as well as whitespace or comments. 2012-07-31 Jay Berkenbilt <ejb@ql.org> * 3.0.0: release 2012-07-29 Jay Berkenbilt <ejb@ql.org> * 3.0.rc1: release 2012-07-25 Jay Berkenbilt <ejb@ql.org> * From Tobias: add QPDFObjectHandle::replaceStreamData that takes a std::string analogous to the QPDFObjectHandle::newStream that takes a string that was added earlier. 2012-07-21 Jay Berkenbilt <ejb@ql.org> * Change configure to have image comparison tests disabled by default. Update README and README.maintainer with information about running them. * Add --pages command-line option to qpdf to enable page-based merging and splitting. * Add new method QPDFObjectHandle::replaceDict to replace a stream's dictionary. Use with caution; see comments in QPDFObjectHandle.hh. * Add new method QPDFObjectHandle::parse for creation of QPDFObjectHandle objects from string representations of the objects. Thanks to Tobias Hoffmann for the idea. 2012-07-15 Jay Berkenbilt <ejb@ql.org> * add new QPDF::isEncrypted method that returns some additional information beyond other versions. * libqpdf/QPDFWriter.cc: fix copyEncryptionParameters to fix the minimum PDF version based on other file's encryption needs. This is a fix to code added on 2012-07-14 and did not impact previously released code. * libqpdf/QPDFWriter.cc (copyEncryptionParameters): Bug fix: qpdf was not preserving whether or not AES encryption was being used when copying encryption parameters. 
The file would still have been properly encrypted, but a file that started off encrypted with AES could have become encrypted with RC4. 2012-07-14 Jay Berkenbilt <ejb@ql.org> * QPDFWriter: add public copyEncryptionParameters to allow copying encryption parameters from another file. * QPDFWriter: detect if the user has inserted an indirect object from another QPDF object and throw an exception directing the user to copyForeignObject. 2012-07-11 Jay Berkenbilt <ejb@ql.org> * Added new APIs to copy objects from one QPDF to another. This includes letting QPDF::addPage() (and QPDF::addPageAt()) accept a page object from another QPDF and adding QPDF::copyForeignObject(). See QPDF.hh for details. * Add method QPDFObjectHandle::getOwningQPDF() to return the QPDF object associated with an indirect QPDFObjectHandle. * Add convenience methods to QPDFObjectHandle: assertIndirect(), isPageObject(), isPagesObject() * Cache when QPDF::pushInheritedAttributesToPage() has been called to avoid traversing the pages trees multiple times. This state is cleared by QPDF::updateAllPagesCache() and ignored by QPDF::flattenPagesTree(). 2012-07-08 Jay Berkenbilt <ejb@ql.org> * Add QPDFObjectHandle::newReserved to create a reserved object and QPDF::replaceReserved to replace it with a real object. QPDFObjectHandle::newReserved reserves an object ID in a QPDF object and ensures that any references to it remain unresolved. When QPDF::replaceReserved is later called, previous references to the reserved object will properly resolve to the replaced object. 2012-07-07 Jay Berkenbilt <ejb@ql.org> * NOTE: BREAKING API CHANGE. Remove previously required length parameter from the version QPDFObjectHandle::replaceStreamData that uses a stream data provider. Prior to qpdf 3.0.0, you had to compute the stream length in advance so that qpdf could internally verify that the stream data had the same length every time the provider was invoked. 
Now this requirement is enforced a different way, and the length parameter is no longer required. Note that I take API-breaking changes very seriously and only did it in this case since the lack of need to know length in advance could significantly simplify people's code. If you were previously going to a lot of trouble to compute the length of the new stream data in advance, you now no longer have to do that. You can just drop the length parameter and remove any code that was previously computing the length. Thanks to Tobias Hoffmann for pointing out how annoying the original interface was. 2012-07-05 Jay Berkenbilt <ejb@ql.org> * Add QPDFWriter methods to write to an already open stdio FILE*. Implementation and idea are based on contributions from Tobias Hoffmann. 2012-07-04 Jay Berkenbilt <ejb@ql.org> * Accept changes from Tobias Hoffmann: add public method QPDF::pushInheritedAttributesToPage including warnings for non-inherited keys that may be discarded from /Pages by non-conformant PDF files when the /Pages tree is flattened. 2012-06-27 Jay Berkenbilt <ejb@ql.org> * Add Pl_Concatenate pipeline for stream concatenation also implemented by Tobias Hoffmann. Also added test code (libtests/concatenate.cc). * Add new methods implemented by Tobias Hoffmann: QPDFObjectHandle::newReal(double) and QPDFObjectHandle::newStream(QPDF*, std::string const&). 2012-06-26 Jay Berkenbilt <ejb@ql.org> * Minor changes so that support for PDF files larger than 4GB works well with 32-bit and 64-bit Linux and also with 32-bit and 64-bit Windows with both MSVC and mingw. * Rework internal methods for doing recovery of the cross reference tables for much greater efficiency both in terms of time and memory usage. 2012-06-24 Jay Berkenbilt <ejb@ql.org> * Support PDF files larger than 4 GB. This involved many changes to the ABI to increase the size of integer types used in various places as well as increasing the amount of padding used when creating linearized files. 
Automated tests for large files are disabled by default. Run ./configure --help for information on enabling them. Running the tests requires 11 GB of free disk space and takes several minutes. 2012-06-22 Jay Berkenbilt <ejb@ql.org> * examples/pdf-create.cc: Provide an example of creating a PDF from scratch. This simple PDF has a single page with some text and an image. * Add empty QPDFObjectHandle factories for array and dictionary. With PDF-from-scratch capability, it is useful to be able to create empty arrays and dictionaries and add keys to them. Updated pdf_from_scratch.cc to use these interfaces. 2012-06-21 Jay Berkenbilt <ejb@ql.org> * Add QPDF::emptyPDF() to create an empty QPDF object suitable for adding pages and other objects to. pdf_from_scratch.cc is test code that exercises it. * make/libtool.mk: Place user-specified CPPFLAGS and LDFLAGS later in the compilation so that if a user installs things in a non-standard place that they have to tell the build about, earlier versions of qpdf installed there won't break the build. Thanks to Macports for reporting this. (Fixes bug 3468860.) * Instead of using off_t in the public APIs, use qpdf_offset_t instead. This is defined as long long in qpdf/Types.h. If your system doesn't support long long, you can redefine it. * Add pkg-config files * QPDFObjectHandle: add shallowCopy() method * QPDF: add new APIs for adding and removing pages. This includes addPage(), addPageAt(), and removePage(). Also a method updateAllPagesCache() is now available to force update of the internal pages cache if you should modify the pages structure manually. * QPDF: new processFile method that takes an open FILE* instead of a filename. 2012-06-20 Jay Berkenbilt <ejb@ql.org> * Add new array mutation routines to QPDFObjectHandle. Implemented by Tobias Hoffmann. * Rework APIs that use size_t, off_t, and primitive integer types so that size_t is used for sizes of memory and off_t is used for file offsets. 
Also set _FILE_OFFSET_BITS so that large files can be supported on 32-bit UNIX/Linux platforms. The code assumes in places that sizeof(off_t) >= sizeof(size_t). This resulted in non-compatible ABI changes and hopefully clears the way for QPDF to work with files that are larger than 4 GiB in size. * Add support for versioned symbols on ELF platforms. * Various fixes for gcc 4.7 2011-04-06 Jay Berkenbilt <ejb@ql.org> * Fix PCRE to stop using deprecated (and now dropped) interfaces. 2011-12-28 Jay Berkenbilt <ejb@ql.org> * 2.3.1: release * include <stdint.h> if available to support MSVC 2010 * Since PCRE is not necessarily thread safe, don't declare any PCRE objects to be static. * Disregard stderr output from ghostscript when using it to compare images in the test suite; see comments in qpdf.test for details. * Fixed a few documentation errors. 2011-08-11 Jay Berkenbilt <ejb@ql.org> * 2.3.0: release * include/qpdf/qpdf-c.h ("C"): add new methods qpdf_init_write_memory, qpdf_get_buffer_length, and qpdf_get_buffer to support writing to memory from the C API. * include/qpdf/qpdf-c.h ("C"): add new methods qpdf_get_info_key and qpdf_set_info_key for manipulating text fields of the /Info dictionary. 2011-08-10 Jay Berkenbilt <ejb@ql.org> * libqpdf/QPDFWriter.cc (copyEncryptionParameters): preserve whether metadata is encrypted. This fixes part of bug 3173659: the password becomes invalid if qpdf copies an encrypted file with cleartext-metadata. * include/qpdf/QPDFWriter.hh: add a new constructor that takes only a QPDF reference and leaves specification of output for later. Add methods setOutputFilename() to set the output to a filename or stdout, and setOutputMemory() to indicate that output should go to a memory buffer. Add method getBuffer() to retrieve the buffer used if output was saved to a memory buffer. * include/qpdf/QPDF.hh: add methods replaceObject() and swapObjects() to allow replacement of an object and swapping of two objects by object ID. 
* include/qpdf/QPDFObjectHandle.hh: add new methods getDictAsMap() and getArrayAsVector() for returning the elements of a dictionary or an array as a map or vector. 2011-06-25 Jay Berkenbilt <ejb@ql.org> * 2.2.4: release 2011-06-23 Jay Berkenbilt <ejb@ql.org> * make/libtool.mk (install): Do not strip executables and shared libraries during installation. Leave that up to the packager. * configure.ac: disable -Werror by default. 2011-05-07 Jay Berkenbilt <ejb@ql.org> * libqpdf/QPDF_linearization.cc (isLinearized): remove unused offset variable, found by a gcc 4.6 warning. 2011-04-30 Jay Berkenbilt <ejb@ql.org> * 2.2.3: release * libqpdf/QPDF.cc (readObjectInternal): Accept the case of the stream keyword being followed by carriage return by itself. While this is not permitted by the specification, there are PDF files that do this, and other readers can read them. * libqpdf/Pl_QPDFTokenizer.cc (processChar): When an inline image is detected, suspend normalization only up to the end of the inline image rather than for the remainder of the content stream. (Fixes qpdf-Bugs 3152169.) 2011-01-31 Jay Berkenbilt <ejb@ql.org> * libqpdf/QPDF.cc (readObjectAtOffset): use -1 rather than 0 when reading an object at a given offset to indicate that no object number is expected. This allows xref recovery to proceed even if a file uses the invalid object number 0 as a regular object. * libqpdf/QPDF_linearization.cc (isLinearized): use -1 rather than 0 as a sentinel for not having found the first object in the file. Since -1 can never match the regular expression, this prevents an infinite loop when checking a file that starts with (erroneous) 0 0 obj. (Fixes qpdf-Bugs-3159950.) 2010-10-04 Jay Berkenbilt <ejb@ql.org> * 2.2.2: release * include/qpdf/qpdf-c.h: Add qpdf_read_memory to C API to call QPDF::processMemoryFile. 
2010-10-01 Jay Berkenbilt <ejb@ql.org> * 2.2.1: release * include/qpdf/QPDF.hh: Add setOutputStreams method to allow redirection of library-generated output/error to alternative streams. * include/qpdf/QPDF.hh: Add processMemoryFile method for processing a PDF file from a memory buffer instead of a file. 2010-09-24 Jay Berkenbilt <ejb@ql.org> * libqpdf/QPDF.cc: change private "file" method to be a PointerHolder<InputSource> to prepare qpdf for being able to work with PDF files loaded into memory in addition to working with files on disk. * include/qpdf/PointerHolder.hh: add operator* and operator-> methods so that PointerHolder objects can be used like pointers. This is consistent with the smart pointer objects in the next revision of C++. 2010-09-05 Jay Berkenbilt <ejb@ql.org> * libqpdf/QPDF.cc (readObjectInternal): Recognize empty objects and treat them as null. * libqpdf/QPDF_Stream.cc (filterable): Handle inline image filter abbreviations as stream filter abbreviations. Although this is not technically allowed by the PDF specification, table H.1 in the pre-ISO spec indicates that Adobe's readers accept them. Thanks to Jian Ma <stronghorse@tom.com> for bringing this to my attention. 2010-08-14 Jay Berkenbilt <ejb@ql.org> * 2.2.0: release * Rename README.windows to README-windows.txt and convert its line endings to Windows-style line endings. Also mention Jian Ma's VC6 port in the manual and README-windows.txt. 2010-08-09 Jay Berkenbilt <ejb@ql.org> * Add QPDFObjectHandle::getRawStreamData to return raw (unfiltered) stream data. 2010-08-08 Jay Berkenbilt <ejb@ql.org> * 2.2.rc1: release 2010-08-05 Jay Berkenbilt <ejb@ql.org> * Add QPDFObjectHandle::addPageContents, a convenience routine for appending or prepending new streams to a page's content streams. The "pdf-double-page-size" example illustrates its use. * Add new methods to QPDFObjectHandle: replaceStreamData and newStream. 
These methods allow users of the qpdf library to add new streams and to replace data of existing streams. The "pdf-double-page-size" and "pdf-invert-images" examples illustrate their use. 2010-06-06 Jay Berkenbilt <ejb@ql.org> * Fix memory leak for QPDF objects whose underlying PDF objects contain circular references. Thanks to Jian Ma <stronghorse@tom.com> for calling my attention to the memory leak. 2010-04-25 Jay Berkenbilt <ejb@ql.org> * 2.1.5: release * libqpdf/QPDF_encryption.cc (compute_encryption_key): remove restrictions on length of file identifier string. (Fixes qpdf-Bugs-2991412.) 2010-04-18 Jay Berkenbilt <ejb@ql.org> * 2.1.4: release * libqpdf/QPDFWriter.cc (writeLinearized): the padding calculation fix in 2.1.2 was applied in only one place but it was needed in two places since there are actually two cross reference streams in a linearized file. The new padding calculation is now used for both streams. Hopefully this should put an end to linearization padding problems. (Fixes qpdf-Bugs-2979219.) 2010-04-10 Jay Berkenbilt <ejb@ql.org> * qpdf/qpdf.cc (main): Since qpdf --check only checks syntax and stream encoding without doing any semantic checks, make the output clearer when no errors are found. This is inspired by qpdf-Bugs-2983225. 2010-03-27 Jay Berkenbilt <ejb@ql.org> * 2.1.3: release * libqpdf/QPDF_optimization.cc (flattenScalarReferences): Flatten scalar references for unreferenced objects as well as those seen during traversal of the file. This matters when preserving object streams that contain unreferenced objects with indirect scalars. (Fixes qpdf-Bugs-2974522.) Updated TODO with a description of a possibly better fix involving removal of flattenScalarReferences. * libqpdf/Pl_AES_PDF.cc (finish): Don't complain if an AES input buffer is not a multiple of 16 bytes. Instead, just pad with nulls and hope for the best. PDF files have been encountered "in the wild" that contain AES buffers that aren't a multiple of 16 bytes. 
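The null-padding behavior described in the Pl_AES_PDF.cc entry above can be illustrated with a minimal sketch. This is not the code in libqpdf; pad_to_block is a hypothetical helper name chosen for illustration, and the sketch shows only the idea of extending a buffer with NUL bytes to the next multiple of the 16-byte AES block size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not qpdf's actual implementation: zero-pad
// `data` so that its length becomes a multiple of `block_size`
// (16 bytes for AES). Buffers already aligned are returned unchanged.
std::string pad_to_block(std::string data, std::size_t block_size = 16)
{
    assert(block_size > 0);
    std::size_t remainder = data.size() % block_size;
    if (remainder != 0) {
        // Append NUL bytes to fill out the final partial block.
        data.append(block_size - remainder, '\0');
    }
    return data;
}
```

Under these assumptions, a 17-byte buffer grows to 32 bytes, while an empty or exactly 16-byte buffer is left alone.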
2010-01-24 Jay Berkenbilt <ejb@ql.org> * 2.1.2: release * libqpdf/QPDFWriter.cc: fix logic error in padding calculation. When writing linearized files with cross reference streams, the padding calculation failed to take differences in sizes of compressed data between pass 1 and pass 2 into consideration. 2009-12-14 Jay Berkenbilt <ejb@ql.org> * 2.1.1: release * qpdf/qtest/qpdf.test: improve test for acroread to make sure it actually works and is not just present in the path. 2009-12-13 Jay Berkenbilt <ejb@ql.org> * libqpdf/qpdf/Pl_AES_PDF.hh: include <stdint.h>, if available, so we have valid definitions of uint32_t. 2009-10-30 Jay Berkenbilt <ejb@ql.org> * 2.1: release * libqpdf/QPDF.cc: be more forgiving of extraneous whitespace in the xref table and while recovering from error conditions. 2009-10-26 Jay Berkenbilt <ejb@ql.org> * Work around failure of PCRE test case; this test case exercises an aspect of PCRE that qpdf does not use, and the test fails with the version of PCRE on Red Hat Enterprise Linux 5, so we ignore failure on this particular test case. * Fix RPM .spec file to include "C" examples 2009-10-24 Jay Berkenbilt <ejb@ql.org> * 2.1.rc1: release * Provide interfaces for getting qpdf's own version number 2009-10-19 Jay Berkenbilt <ejb@ql.org> * include/qpdf/QPDF.hh (QPDF): getWarnings now returns a list of QPDFExc rather than a list of strings. This way, warnings may be inspected in more detail. * Include information about the last object read in most error messages. Most of the time, this will provide a good hint as to which object contains the error, but it's possible that the last object read may not necessarily be the one that has the error if the erroneous object was previously read and cached. 2009-10-18 Jay Berkenbilt <ejb@ql.org> * If forcing version, disable object stream creation and/or encryption if previous specifications are incompatible with new version. 
It is still possible that PDF content, compression schemes, etc., may be incompatible with the new version, but this way, older viewers will at least have a chance. * libqpdf/QPDFWriter.cc (unparseObject): avoid compressing Metadata streams if possible. 2009-10-13 Jay Berkenbilt <ejb@ql.org> * Upgrade embedded qtest to version 1.4, which allows the test suite to be run in Windows with MSYS and ActiveState Perl rather than requiring Cygwin perl. 2009-10-04 Jay Berkenbilt <ejb@ql.org> * Implement support for AES encryption and crypt filters. Implementation is not fully tested due to lack of test data but has been tested for several cases. 2009-10-04 Jay Berkenbilt <ejb@ql.org> * Add methods to QPDFWriter and corresponding command line arguments to qpdf to set the minimum output PDF version and also to force the version to a particular value. * libqpdf/QPDF.cc (processXRefStream): warn and ignore extra xref stream entries when stream is larger than reported size. This used to be a fatal error. (Fixes qpdf-Bugs-2872265.) 2009-09-27 Jay Berkenbilt <ejb@ql.org> * Add several methods to query permissions controlled by the encryption dictionary. Note that qpdf does not enforce these permissions even though it allows the user to query them. * The function QPDF::getUserPassword returned the user password with the required padding as specified by the PDF specification. This is seldom useful to users. This function has been replaced by QPDF::getPaddedUserPassword. Call the new QPDF::getTrimmedUserPassword to retrieve the user password in a human-readable format. * qpdf/qpdf.cc (main): qpdf --check now prints the PDF version number in addition to its other output. 2009-09-26 Jay Berkenbilt <ejb@ql.org> * Removed all references to QEXC; now using std::runtime_error and std::logic_error and their subclasses for all exceptions. 2009-05-03 Jay Berkenbilt <ejb@ql.org> * 2.0.6: release * libqpdf/QPDF_Stream.cc (filterable): ignore /DecodeParms if it's not a type we recognize. 
(Fixes qpdf-Bugs-2779746.) 2009-03-10 Jay Berkenbilt <ejb@ql.org> * 2.0.5: release 2009-03-09 Jay Berkenbilt <ejb@ql.org> * libqpdf/Pl_LZWDecoder.cc: adjust LZWDecoder full table detection, now having been able to adequately test boundary conditions both with and without early code change. Also compared implementation with other LZW decoders. 2009-03-08 Jay Berkenbilt <ejb@ql.org> * qpdf/fix-qdf (write_ostream): Adjust offsets while writing object streams to account for changes in the length of the dictionary and offset tables. * qpdf/qpdf.cc (main): In check mode, in addition to checking structure of file, attempt to decode all stream data. * libqpdf/QPDFWriter.cc (QPDFWriter::writeObject): In QDF mode, write a comment to the QDF file before each object that indicates the object ID of the corresponding object from the original file. Add --no-original-object-ids flag to qpdf and setSuppressOriginalObjectIDs() method to QPDFWriter to turn this behavior off. * libqpdf/QPDF.cc (QPDF::pipeStreamData): Issue a warning instead of failing if there is a problem found while decoding stream. * qpdf/qpdf.cc: Exit with a status of 3 if warnings were found regardless of what mode we're in. 2009-02-21 Jay Berkenbilt <ejb@ql.org> * 2.0.4: release 2009-02-20 Jay Berkenbilt <ejb@ql.org> * Fix many typos in comments and strings. * qpdf/qpdf.cc: in --check mode, if there are warnings but no errors, exit with a status of 3. * libqpdf/QPDF.cc (QPDF::insertXrefEntry): when recovering the cross-reference table, have objects we encounter later in the file supersede those we found earlier. This improves the chances of being able to recover appended files with damaged cross-reference tables. 2009-02-19 Jay Berkenbilt <ejb@ql.org> * libqpdf/Pl_LZWDecoder.cc: correct logic error for previously untested case of running the LZW decoder without the "early code change" flag. Thanks to a bug report from "Atom Smasher", I finally was able to obtain an input stream compressed in this way. 
2009-02-15 Jay Berkenbilt <ejb@ql.org> * 2.0.3: release 2008-12-11 Jay Berkenbilt <ejb@ql.org> * qpdf/qpdf.cc (main): Accept -help and -version as well as --help and --version 2008-11-23 Jay Berkenbilt <ejb@ql.org> * Include stdio.h in a few files for proper compilation with (yet to be released) gcc 4.4 * updated embedded qtest to version 1.3 * libqpdf/QPDF_String.cc (QPDF_String::getUTF8Val): handle UTF-16BE properly rather than just treating the string as a string of 16-bit characters. 2008-06-30 Jay Berkenbilt <ejb@ql.org> * 2.0.2: release * updated embedded qtest to version 1.2 (includes previous changes) 2008-06-07 Jay Berkenbilt <ejb@ql.org> * qpdf/qtest/qpdf/diff-encrypted: change == to = so that the test suite passes when /bin/sh is not bash 2008-05-07 Jay Berkenbilt <ejb@ql.org> * qtest/bin/qtest-driver (run_test): increase timeout for qtest to be more tolerant of slow machines 2008-05-06 Jay Berkenbilt <ejb@ql.org> * 2.0.1: release * make/rules.mk: fix logic with .dep generation for .lo files so that dependencies work properly with libtool 2008-05-05 Jay Berkenbilt <ejb@ql.org> * libqpdf/qpdf/MD5.hh: fix header to be 64-bit clean * configure.ac: add tests for sized integer types 2008-05-04 Jay Berkenbilt <ejb@ql.org> * libqpdf/QPDF_encryption.cc: do not assume size_t is unsigned int * qpdf/qtest/qpdf.test: removed locale-specific tests. These were really to check bugs in perl 5.8.0 and are obsolete now. They also make the test suite fail in some environments that don't have all the locales fully configured. * various: updated several files for gcc 4.3 by adding missing includes (string.h, stdlib.h) 2008-04-26 Jay Berkenbilt <ejb@ql.org> * 2.0: initial public release