diff --git a/ChangeLog b/ChangeLog index 6a62bdec..f536493b 100644 --- a/ChangeLog +++ b/ChangeLog @@ -53,6 +53,9 @@ 2019-01-03 Jay Berkenbilt + * Add --generate-appearances flag to the qpdf command-line tool to + trigger generation of appearance streams. + * Fix behavior of form field value setting to handle the following cases: - Strings are always written as UTF-16 diff --git a/ispell-words b/ispell-words index 7aadbb7a..fdaafb69 100644 --- a/ispell-words +++ b/ispell-words @@ -5,11 +5,9 @@ abacc abc ABCD abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnom -abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnom abcde abcdefABCDEF abcdefghbcdefghicdefghijdefghijkefghijklfghijklmg -abcdefghbcdefghicdefghijdefghijkefghijklfghijklmg abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi ABI @@ -896,7 +894,6 @@ HGeneric hh HighPart hijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstn -hijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstn hlen Hoffmann HOi @@ -1860,6 +1857,7 @@ toupper toUTF tp transcode +transcoding traverseField travis TrimBox @@ -2004,6 +2002,7 @@ xc xcc xcf xD +xd xDC xe xeaa diff --git a/manual/qpdf-manual.xml b/manual/qpdf-manual.xml index 9fc7c1d5..0c921eb6 100644 --- a/manual/qpdf-manual.xml +++ b/manual/qpdf-manual.xml @@ -1843,7 +1843,7 @@ outfile.pdf - Generate a json representation of the file. This is described + Generate a JSON representation of the file. This is described in depth in @@ -1852,7 +1852,7 @@ outfile.pdf - Describe the format of the json output. + Describe the format of the JSON output. @@ -1861,7 +1861,7 @@ outfile.pdf This option is repeatable. If specified, only top-level keys - specified will be included in the json output. If not + specified will be included in the JSON output. If not specified, all keys will be shown. @@ -1872,7 +1872,7 @@ outfile.pdf This option is repeatable. If specified, only specified objects will be shown in the - “objects” key of the json + “objects” key of the JSON output. 
If absent, all objects will be shown. @@ -2150,7 +2150,7 @@ outfile.pdf Starting with version 8.3.0, the qpdf - command-line tool can produce a json representation of the PDF + command-line tool can produce a JSON representation of the PDF file's non-content data. This can facilitate interacting programmatically with PDF files through qpdf's command line interface. For more information, please see Overview Beginning with qpdf version 8.3.0, the qpdf - command-line program can produce a json representation of the - non-content data in a PDF file. It includes a dump in json format + command-line program can produce a JSON representation of the + non-content data in a PDF file. It includes a dump in JSON format of all objects in the PDF file excluding the content of streams. - This json representation makes it very easy to look in detail at + This JSON representation makes it very easy to look in detail at the structure of a given PDF file, and it also provides a great way to work with PDF files programmatically from the command-line in languages that can't call or link with the qpdf library directly. @@ -2181,17 +2181,17 @@ outfile.pdf JSON Guarantees - The qpdf json representation includes a json serialization of the + The qpdf JSON representation includes a JSON serialization of the raw objects in the PDF file as well as some computed information in a more easily extracted format. QPDF provides some guarantees about - its json format. These guarantees are designed to simplify the + its JSON format. These guarantees are designed to simplify the experience of a developer working with the JSON format. Compatibility - The top-level json object output is a dictionary. The json + The top-level JSON object output is a dictionary. The JSON output contains various nested dictionaries and arrays. With the exception of dictionaries that are populated by the fields of objects from the file, all instances of a dictionary are @@ -2204,7 +2204,7 @@ outfile.pdf report. 
- The top-level json structure contains a + The top-level JSON structure contains a “version” key whose value is simple integer. The value of the version key will be incremented if a non-compatible change is made. A @@ -2221,16 +2221,16 @@ outfile.pdf The qpdf command can be invoked with the - option. This will output a json - structure that has the same structure as the json output that + option. This will output a JSON + structure that has the same structure as the JSON output that qpdf generates, except that each field in the help output is a - description of the corresponding field in the json output. The + description of the corresponding field in the JSON output. The specific guarantees are as follows: A dictionary in the help output means that the corresponding - location in the actual json output is also a dictionary with + location in the actual JSON output is also a dictionary with exactly the same keys; that is, no keys present in help are absent in the real output, and no keys will be present in the real output that are not in help. @@ -2259,7 +2259,7 @@ outfile.pdf “index” and “label”. In addition to describing the meaning of those keys, this tells you that the - actual json output will contain a pagelabels + actual JSON output will contain a pagelabels array, each of whose elements is a dictionary that contains an index key, a label key, and no other keys. @@ -2270,7 +2270,7 @@ outfile.pdf Directness and Simplicity - The json output contains the value of every object in the file, + The JSON output contains the value of every object in the file, but it also contains some processed data. This is analogous to how qpdf's library interface works. 
The processed data is similar to the helper functions in that it allows you to look @@ -2287,18 +2287,18 @@ outfile.pdf Limitations of JSON Representation - There are a few limitations to be aware of with the json structure: + There are a few limitations to be aware of with the JSON structure: Strings, names, and indirect object references in the original - PDF file are all converted to strings in the json + PDF file are all converted to strings in the JSON representation. In the case of a “normal” PDF file, you can tell the difference because a name starts with a slash (/), and an indirect object reference looks like n n R, but if there were to be a string that looked like a name or indirect object reference, there - would be no way to tell this from the json output. Note that + would be no way to tell this from the JSON output. Note that there are certain cases where you know for sure what something is, such as knowing that dictionary keys in objects are always names and that certain things in the higher-level computed data @@ -2307,9 +2307,9 @@ outfile.pdf - The json format doesn't support binary data very well. Mostly + The JSON format doesn't support binary data very well. Mostly the details are not important, but they are presented here for - information. When qpdf outputs a string in the json + information. When qpdf outputs a string in the JSON representation, it converts the string to UTF-8, assuming usual PDF string semantics. Specifically, if the original string is UTF-16, it is converted to UTF-8. Otherwise, it is assumed to @@ -2317,7 +2317,7 @@ outfile.pdf assumption. This causes strange things to happen to binary strings. 
For example, if you had the binary string <038051>, this would be output to the - json as \u0003•Q because + JSON as \u0003•Q because 03 is not a printable character and 80 is the bullet character in PDF doc encoding and is mapped to the Unicode value @@ -2330,7 +2330,7 @@ outfile.pdf tell the difference between a Unicode string that was originally encoded as UTF-16 or one that was converted from PDF doc encoding. In other words, it's best if you don't try to use the - json format to extract binary strings from the PDF file, but if + JSON format to extract binary strings from the PDF file, but if you really had to, it could be done. Note that qpdf's option does not have this limitation and will reveal the string as encoded in the original @@ -2362,11 +2362,11 @@ outfile.pdf In a few places, there are keys with names containing pageposfrom1. The values of these keys are null or an integer. If an integer, they point to a page index - within the file numbering from 1. Note that json indexes from + within the file numbering from 1. Note that JSON indexes from 0, and you would also use 0-based indexing using the API. However, 1-based indexing is easier in this case because the command-line syntax for specifying page ranges is 1-based. If - you were going to write a program that looked through the json + you were going to write a program that looked through the JSON for information about specific pages and then use the command-line to extract those pages, 1-based indexing is easier. Besides, it's more convenient to subtract 1 from a @@ -2377,11 +2377,11 @@ outfile.pdf The image information included in the page - section of the json output includes the key + section of the JSON output includes the key “filterable”. Note that the value of this field may depend on the that you invoke qpdf with. The - json output includes a top-level key + JSON output includes a top-level key “parameters” that indicates the decode level used for computing whether a stream was filterable. 
For example, jpeg images will be shown as not @@ -3870,6 +3870,322 @@ print "\n"; ChangeLog in the source distribution. + + 8.3.0: January 7, 2019 + + + + + Command-line Enhancements + + + + + Shell completion: you can now use eval $(qpdf + --completion-bash) and eval $(qpdf + --completion-zsh) to enable shell completion for + bash and zsh. + + + + + Page numbers (also known as page labels) are now preserved + when merging and splitting files with the + and + options. + + + + + Bookmarks are partially preserved when splitting pages with + the option. Specifically, the + outlines dictionary and some supporting metadata are copied + into the split files. The result is that all bookmarks from + the original file appear, those that point to pages that are + preserved work, and those that point to pages that are not + preserved don't do anything. This is an interim step toward + proper support for bookmarks in splitting and merging + operations. + + + + + Page collation: add new option . + When specified, the semantics of + change from concatenation to collation. See for examples and discussion. + + + + + Generation of information in JSON format, primarily to + facilitate use of qpdf from languages other than C++. Add + new options , + , and + to generate a JSON + representation of the PDF file. Run qpdf + --json-help to get a description of the JSON + format. For more information, see . + + + + + The flag will cause + qpdf to generate appearances for form fields if the PDF file + indicates that form field appearances are out of date. This + can happen when PDF forms are filled in by a program that + doesn't know how to regenerate the appearances of the + filled-in fields. + + + + + The flag can be used + to flatten annotations, including form + fields. Ordinarily, annotations are drawn separately from + the page. Flattening annotations is the process of combining + their appearances into the page's contents. 
You might want + to do this if you are going to rotate or combine pages using + a tool that doesn't understand annotations. You may + also want to use + when using this flag since annotations for outdated form + fields are not flattened as that would cause loss of + information. + + + + + The flag tells qpdf to + recompress every image using DCT (JPEG) compression as + long as the image is not already compressed with lossy + compression and recompressing the image reduces its size. + The additional options , + , and + prevent recompression of + images whose width, height, or pixel area + (width × height) is below a specified + threshold. + + + + + The option can now be given + as to show the + trailer dictionary. + + + + + + + Bug Fixes and Enhancements + + + + + QPDF now automatically detects and recovers from dangling + references. If a PDF file contained an indirect reference to + a non-existent object, which is valid, when adding a new + object to the file, it was possible for the new object to + take the object ID of the dangling reference, thereby + causing the dangling reference to point to the new object. + This case is now prevented. + + + + + Fixes to form field setting code: strings are always written + in UTF-16 format, and checkboxes and radio buttons are + handled properly with respect to synchronization of values + and appearance states. + + + + + The QPDF::checkLinearization() method no + longer causes the program to crash when it detects problems + with linearization data. Instead, it issues a normal warning + or error. + + + + + Ordinarily qpdf treats an argument of the form + to mean that command-line options + should be read from file. Now, if + file does not exist but + @file does, qpdf will treat + @file as a regular option. This makes + it possible to work more easily with PDF files whose names + happen to start with the @ character. 
+ + + + + + + Library Enhancements + + + + + Remove the restriction in most cases that the source QPDF + object used in a + QPDF::copyForeignObject call has to + stick around until the destination QPDF is written. The + exceptional case is when the source stream gets its data + using a QPDFObjectHandle::StreamDataProvider. For a more + in-depth discussion, see comments around + copyForeignObject in + QPDF.hh. + + + + + Add new method + QPDFWriter::getFinalVersion(), which + returns the PDF version that will ultimately be written to + the final file. See comments in + QPDFWriter.hh for some restrictions on + its use. + + + + + Add several methods for transcoding strings to some of the + character sets used in PDF files: + QUtil::utf8_to_ascii, + QUtil::utf8_to_win_ansi, + QUtil::utf8_to_mac_roman, and + QUtil::utf8_to_utf16. For the + single-byte encodings that support only a limited character + set, these methods replace unsupported characters with a + specified substitute. + + + + + Add new methods to + QPDFAnnotationObjectHelper and + QPDFFormFieldObjectHelper for + querying flags and interpreting different field types. + Define constants in qpdf/Constants.h to + help with interpretation of flag values. + + + + + Add new methods + QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded + and + QPDFFormFieldObjectHelper::generateAppearance + for generating appearance streams. See discussion in + QPDFFormFieldObjectHelper.hh for + limitations. + + + + + Add two new helper functions for dealing with resource + dictionaries: + QPDFObjectHandle::getResourceNames() + returns a list of all second-level keys, which correspond to + the names of resources, and + QPDFObjectHandle::mergeResources() + merges two resource dictionaries as long as they have + non-conflicting keys. These methods are useful for certain + types of objects that resolve resources from multiple places, + such as form fields. 
+ + + + + Add methods + QPDFPageDocumentHelper::flattenAnnotations() + and + QPDFAnnotationObjectHelper::getPageContentForAppearance() + for handling low-level details of annotation flattening. + + + + + Add new helper classes: + QPDFOutlineDocumentHelper, + QPDFOutlineObjectHelper, + QPDFPageLabelDocumentHelper, + QPDFNameTreeObjectHelper, and + QPDFNumberTreeObjectHelper. + + + + + Add method QPDFObjectHandle::getJSON() + that returns a JSON representation of the object. Call + serialize() on the result to convert it + to a string. + + + + + Add a simple JSON serializer. This is not a complete or + general-purpose JSON library. It allows assembly and + serialization of JSON structures with some restrictions, + which are described in the header file. This is the + serializer used by qpdf's new JSON representation. + + + + + Add new QPDFObjectHandle::Matrix + class along with a few convenience methods for dealing with + six-element numerical arrays as matrices. + + + + + Add new method + QPDFObjectHandle::wrapInArray, which returns + the object itself if it is an array, or an array containing + the object otherwise. This is a common construct in PDF. + This method prevents you from having to explicitly test + whether something is a single element or an array. + + + + + + + Build Improvements + + + + + It is no longer necessary to run + autogen.sh to build from a pristine + checkout. Automatically generated files are now committed so + that it is possible to build on platforms without autoconf + directly from a clean checkout of the repository. The + configure script detects if the files are + out of date when it also determines that the tools are + present to regenerate them. + + + + + Pull requests and the master branch are now built + automatically in Azure + Pipelines, which is free for open source projects. + The build includes Linux, mac, Windows 32-bit and 64-bit + with mingw and MSVC, and an AppImage build. 
Official qpdf + releases are now built with Azure Pipelines. + + + + + + + 8.2.1: August 18, 2018