diff --git a/TODO b/TODO
index c7151f5a..73ba5788 100644
--- a/TODO
+++ b/TODO
@@ -1,30 +1,13 @@
-Next
+10.6
 ====
 
-* Add user-defined initializer `QPDFObjectHandle operator ""_qpdf` to
-  be like QPDFObjectHandle::parse: `auto oh = "<< /a (b) >>"_qpdf;`
+* Close issue #556.
+
 * Add QPDF_MAJOR_VERSION, QPDF_MINOR_VERSION to some header, possibly
   dll.h since this is everywhere that there's API
 
-* Take a fresh look at PointerHolder with a good plan for being able
-  to have developers phase it in using macros or something. Decide
-  about shared_ptr vs unique_ptr for each time make_shared_cstr is
-  called. For non-copiable classes, we can use unique_ptr instead of
-  shared_ptr as a replacement for PointerHolder. For performance
-  critical cases, we could potentially have a real pointer and a
-  shared pointer where the shared pointer's job is to clean up but we
-  use the real pointer for regular access.
-
-Consider in the context of #593, possibly with a different
-implementation
-
-* replace mode: --replace-object, --replace-stream-raw,
-  --replace-stream-filtered
-  * update first paragraph of QPDF JSON in the manual to mention this
-  * object numbers are not preserved by write, so object ID lookup
-    has to be done separately for each invocation
-  * you don't have to specify length for streams
-  * you only have to specify filtering for streams if providing raw data
+* Add user-defined initializer `QPDFObjectHandle operator ""_qpdf` to
+  be like QPDFObjectHandle::parse: `auto oh = "<< /a (b) >>"_qpdf;`
 
 * See if this has been done or is trivial with C++11 local static
   initializers: Secure random number generation could be made more
@@ -43,6 +26,168 @@ implementation
 
 * Completion: would be nice if --job-json-file= would complete files
 
+* Remember for release notes: starting in qpdf 11, the default value
+  for the --json keyword will be "latest". If you are depending on
+  version 1, change your code to specify --json=1, which works
+  starting with 10.6.0.
+
+* Try to put something in to ease future PointerHolder migration, such
+  as typedefs for containers of PointerHolders. Test to see whether
+  using auto or decltype in certain places may make containers of
+  pointerholders switch over cleanly. Clearly document the deprecation
+  stuff.
+
+
+Output JSON v2
+==============
+
+Output JSON v2 contains enough information to completely recreate a PDF
+file.
+
+This is not an ABI change as long as the default --json version is 1.
+
+If this is done, update --json option in cli.rst to mention v2. Also
+update QPDFJob::Config::json and of course other parts of the docs
+(json.rst).
+
+Fix the following problems:
+
+* Include the PDF version header somewhere.
+
+* Using "n n R" as a key in "objects" and "objectinfo" messes up
+  searching for things
+
+* Strings cannot be unambiguously encoded/decoded
+
+  * Can't tell string from name from indirect object
+
+  * Strings are treated as PDF doc encoding and output as UTF-8, which
+    doesn't work since multiple PDF doc code points are undefined
+
+* There is no representation of stream data
+
+* You can't tell a stream from a dictionary except by looking in both
+  "object" and "objectinfo". Fix this, and then remove "objectinfo".
+
+* There are differences between information shown in the json format
+  vs. information shown with options like --check, --list-attachments,
+  etc. The json format should be able to completely replace things
+  that write to stdout.
+
+* Consider using camelCase in multi-word key names to be consistent
+  with job JSON and with how JSON is often represented in languages
+  that use it more natively
+
+* Consider changing the contract to allow fields to be absent even
+  when present in the schema. It's reasonable for people to check for
+  presence of a key. Most languages make this easy to do.
+
+Most things that are informational can stay the same. We will have to
+go through every item to decide for sure.
+
+To address ambiguity, consider the following:
+
+Whenever a direct PDF object appears, disambiguate things represented
+in JSON as strings as follows:
+
+* "/Name" -- if it starts with /, it's a name
+* "n n R" -- if it is "n n R", it's an indirect object
+* "u:utf8-encoded" -- a utf8-encoded string
+* "b:<12ab34>" -- a binary string
+
+In "objects", the key is "obj:o,g", and the value is a dictionary with
+exactly one of "value" or "stream" as its single key.
+
+For non-streams, the value of "value" is as described above.
+
+{
+  "obj:o,g": {
+    "value": ...
+  }
+}
+
+For streams:
+
+{
+  "obj:o,g": {
+    "stream": {
+      "dict": { ... stream dictionary ... },
+      "filterable": bool,
+      "raw": "base64-encoded raw data",
+      "filtered": "base64-encoded filtered data"
+    }
+  }
+}
+
+Notes about stream data:
+
+* Always include "dict".
+
+* Always include "filterable" regardless of value of
+  --json-stream-data. The value of filterable is influenced by
+  --decode-level, which is already in parameters.
+
+* Add new flag --json-stream-data={raw,filtered,none}. At most one of
+  "raw" and "filtered" will appear for each stream.
+
+* Add to parameters: value of json-stream-data, default is none
+
+* If none, omit stream data entirely
+
+* If raw, include raw stream data as base64
+
+* If filtered, include the base64-encoded filtered stream data if we
+  can and should decode it based on decode-level. Otherwise, include
+  the base64-encoded raw data. See if we can honor
+  --normalize-content.
+
+Note that --json-stream-data=filtered is different from
+--filtered-stream-data in that --filtered-stream-data implies
+--decode-level=all while --json-stream-data=filtered does not. Make
+sure this is mentioned in the help for both options.
+
+QPDFJob
+=======
+
+Here are some ideas for QPDFJob that didn't make it into 10.6. Not all
+of these are necessarily good -- just things to consider.
+
+* replace mode: --replace-object, --replace-stream-raw,
+  --replace-stream-filtered
+  * update first paragraph of QPDF JSON in the manual to mention this
+  * object numbers are not preserved by write, so object ID lookup
+    has to be done separately for each invocation
+  * you don't have to specify length for streams
+  * you only have to specify filtering for streams if providing raw data
+
+* Allow users to supply a custom progress reporter for QPDFJob
+
+* Better interoperability with json output:
+
+  * Make sure all the things that print stuff to stdout have json
+    equivalents (check, showLinearizationData, etc.)
+  * There should be a way to get json output other than having it
+    print to stdout. It should be multi-language friendly and allow
+    for large amounts of data, such as providing a callback that qpdf
+    can write to (like a pipeline)
+  * See also JSON v2
+
+* How do we chain jobs? The idea would be that the input and/or output
+  of a QPDFJob could be a QPDF object rather than a file. For input,
+  it's pretty easy. For output, none of the output-specific options
+  (encrypt, compress-streams, object-streams, etc.) would have any
+  effect, so we would have to treat this like inspect for error
+  checking. The QPDF object in the state where it's ready to be sent
+  off to QPDFWriter would be used as the input to the next QPDFJob.
+  For the job json, I think we can have the output be an identifier
+  that can be used as the input for another QPDFJob. For a json file,
+  we could have the top level detect if it's an array with the convention
+  that exactly one has an output, or we could have a subkey with other
+  job definitions or something. Ideally, any input
+  (copy-attachments-from, pages, etc.) could use a QPDF object. It
+  wouldn't surprise me if this exposes bugs in qpdf around foreign
+  streams as this has been a relatively fragile area before.
+
 Documentation
 =============
 
@@ -210,6 +355,15 @@ This is a list of changes to make next time there is an ABI change.
 Comments appear in the code prefixed by "ABI"
 
 * Search for ABI to find items not listed here.
+* Switch default --json to latest
+* Take a fresh look at PointerHolder with a good plan for being able
+  to have developers phase it in using macros or something. Decide
+  about shared_ptr vs unique_ptr for each time make_shared_cstr is
+  called. For non-copiable classes, we can use unique_ptr instead of
+  shared_ptr as a replacement for PointerHolder. For performance
+  critical cases, we could potentially have a real pointer and a
+  shared pointer where the shared pointer's job is to clean up but we
+  use the real pointer for regular access.
 * See where anonymous namespaces can be used to keep things private to
   a source file. Search for `(class|struct)` in **/*.cc.
 * See if we can use constructor delegation instead of init() in
diff --git a/cSpell.json b/cSpell.json
index 688c9f1d..7332a2ca 100644
--- a/cSpell.json
+++ b/cSpell.json
@@ -411,6 +411,7 @@
         "struct",
         "stylesheet",
         "subclassing",
+        "subkey",
         "subkeys",
         "subramanyam",
         "swversion",
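
Note on the "Output JSON v2" section of the TODO: the string disambiguation it proposes ("/Name", "n n R", "u:...", "b:<...>") could be decoded roughly as follows. This is only a minimal sketch under stated assumptions, not qpdf code. The function name `classify_json_string` and the returned tags are hypothetical, and reading `b:<12ab34>` as hex digits inside angle brackets is an assumption drawn from the single example in the TODO.

```python
import re

# Hypothetical helper, not part of qpdf. It follows the four prefixes
# listed in the "Output JSON v2" notes of the TODO.
_INDIRECT_RE = re.compile(r"(\d+) (\d+) R")
# Assumption: a binary string is an even number of hex digits inside <>.
_BINARY_RE = re.compile(r"b:<((?:[0-9a-fA-F]{2})*)>")


def classify_json_string(s):
    """Return a (kind, value) pair for a JSON string standing in for a PDF object."""
    if s.startswith("/"):
        # "/Name": anything starting with a slash is a name
        return ("name", s)
    m = _INDIRECT_RE.fullmatch(s)
    if m:
        # "n n R": an indirect object reference (object number, generation)
        return ("indirect", (int(m.group(1)), int(m.group(2))))
    if s.startswith("u:"):
        # "u:...": a UTF-8 encoded text string; strip the prefix
        return ("string", s[2:])
    m = _BINARY_RE.fullmatch(s)
    if m:
        # "b:<...>": a binary string given as hex digits
        return ("binary", bytes.fromhex(m.group(1)))
    raise ValueError(f"unrecognized string form: {s!r}")
```

With prefixes like these, a text string whose content happens to look like a reference stays unambiguous: `classify_json_string("u:3 0 R")` is classified as a string, while `classify_json_string("3 0 R")` is classified as an indirect object.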