2
1
mirror of https://github.com/qpdf/qpdf.git synced 2025-01-08 17:24:06 +00:00

Brush up roadmap in TODO.md

This commit is contained in:
Jay Berkenbilt 2024-01-06 18:40:54 -05:00
parent 55b0024899
commit 37bface8a2

76
TODO.md
View File

@ -31,23 +31,18 @@ Next
* Spell check: Have the spell-check script synchronize cSpell.json with .idea/dictionaries/qpdf.xml, * Spell check: Have the spell-check script synchronize cSpell.json with .idea/dictionaries/qpdf.xml,
which should be set to the union of all the validated user dictionaries. which should be set to the union of all the validated user dictionaries.
* Fix #874 -- make args in --encrypt to match the json and make positional fill in the gaps * Maybe fix #553 -- use file times for attachments (trivial with C++-20)
* Maybe fix #553 -- use file times for attachments
* std::string_view transition -- work being done by m-holger * std::string_view transition -- work being done by m-holger
* Break ground on "Document-level work" -- TODO-pages.md lives on a separate branch. * Support incremental updates. See "incremental updates" in [General](#general). See also issue #22.
* Standard for CLI and Job JSON support for JSON-based command-line arguments. Come up with a
standard way of supporting command-line arguments that take JSON specifications of things so that
* there is a predictable way to indicate whether an argument is a file or a JSON blob
* with QPDFJob JSON, make sure it is possible to directly include the JSON rather than having to
stringify a JSON blob
* One option might be to prepend file:// to a filename or otherwise to take a JSON blob. We could
have that as a particular type of argument that would behave properly for both job JSON and CLI.
* Support digital signatures. This probably requires support for incremental updates. See
"incremental updates" in rejected ideas. That description is out of date but would need to be
cleaned up. See also issue #22.
* Make it possible to see incremental updates in qdf mode. * Make it possible to see incremental updates in qdf mode.
* Make it possible to add incremental updates. * Make it possible to add incremental updates.
* We may want a writing mode that preserves object IDs. See #339. * We may want a writing mode that preserves object IDs. See #339.
* Support digital signatures. This probably requires support for incremental updates. First, add
support for verifying digital signatures. Then we can consider adding support for signing
documents, though the ability to sign documents is less useful without an interactive process of
filling in a field. We may want to support only a subset of digital signature with invisible
signature fields or with existing fields.
* Support public key security handler (Section 7.6.5.)
Possible future JSON enhancements Possible future JSON enhancements
================================= =================================
@ -328,6 +323,36 @@ NOTE: Some items in this list refer to files in my personal home directory or th
publicly accessible. This includes things sent to me by email that are specifically not public. Even publicly accessible. This includes things sent to me by email that are specifically not public. Even
so, I find it useful to make reference to them in this list. so, I find it useful to make reference to them in this list.
* Provide support in QPDFWriter for writing incremental updates. Provide support in qpdf for
preserving incremental updates. The goal should be that QDF mode should be fully functional for
files with incremental updates including fix_qdf. This will work best if original object IDs are
preserved when a file is written. We will also have to preserve generations, which are, I believe,
completely ignored by QPDFWriter. If an update adds an object with a higher generation, any
reference to the object with a lower generation resolves to the null object. Increasing the
generation represents reusing an object number, while keeping the generation the same is updating
an object. I think qpdf must handle generations correctly, but make sure to test this carefully.
Note that there's nothing that says an indirect object in one update can't refer to an object that
doesn't appear until a later update. This means that QPDF has to hang onto indirect nulls,
including when they appear as dictionary values. In this case, QPDF_Dictionary::getKeys() ignores
all keys with null values, and hasKey() returns false for keys that have null values. We would
probably want to make QPDF_Dictionary able to handle the special case of keys that are indirect
nulls and basically never have it drop any keys that are indirect objects. We also have to make
sure that the testing for this handles non-trivial cases of the targets of indirect nulls being
replaced by real objects in an update. Such indirect nulls should appear in tests as dictionary
values and as array values. In the distant past, qpdf used to replace indirect nulls with direct
nulls, but I think there are no longer any remnants of that behavior.
I'm not sure how this plays with linearization, if at all. For cases where incremental updates are
not being preserved as incremental updates and where the data is being folded in (as is always the
case with qpdf now), none of this should make any difference in the actual semantics of the files.
One thought about how to implement this would be to have a QPDF object that is an incremental
update to an underlying QPDF object. Objects would be resolved from the underlying QPDF if not
found in the main one. When you write this type of QPDF, it can either flatten or it can write as
incremental updates. Perhaps, in incremental mode, QPDF reads each increment as a separate QPDF
with this kind of layering.
* Consider enabling code scanning on GitHub. * Consider enabling code scanning on GitHub.
* Add an option --ignore-encryption to ignore encryption information and treat encrypted files as if * Add an option --ignore-encryption to ignore encryption information and treat encrypted files as if
@ -567,31 +592,6 @@ Rejected Ideas
* Investigate whether there is a way to automate the memory checker tests for Windows. * Investigate whether there is a way to automate the memory checker tests for Windows.
* (This idea may be revised with alterations. Some of this is out of date.) Provide support in
QPDFWriter for writing incremental updates. Provide support in qpdf for preserving incremental
updates. The goal should be that QDF mode should be fully functional for files with incremental
updates including fix_qdf.
Note that there's nothing that says an indirect object in one update can't refer to an object that
doesn't appear until a later update. This means that QPDF has to treat indirect null objects
differently from how it does now. QPDF drops indirect null objects that appear as members of
arrays or dictionaries. For arrays, it's handled in QPDFWriter where we make indirect nulls
direct. This is in a single if block, and nothing else in the code cares about it. We could just
remove that if block and not break anything except a few test cases that exercise the current
behavior. For dictionaries, it's more complicated. In this case, QPDF_Dictionary::getKeys()
ignores all keys with null values, and hasKey() returns false for keys that have null values. We
would probably want to make QPDF_Dictionary able to handle the special case of keys that are
indirect nulls and basically never have it drop any keys that are indirect objects.
If we make a change to have qpdf preserve indirect references to null objects, we have to note
this in ChangeLog and in the release notes since this will change output files. We did this before
when we stopped flattening scalar references, so this is probably not a big deal. We also have to
make sure that the testing for this handles non-trivial cases of the targets of indirect nulls
being replaced by real objects in an update. I'm not sure how this plays with linearization, if at
all. For cases where incremental updates are not being preserved as incremental updates and where
the data is being folded in (as is always the case with qpdf now), none of this should make any
difference in the actual semantics of the files.
* The second xref stream for linearized files has to be padded only because we need file_size as * The second xref stream for linearized files has to be padded only because we need file_size as
computed in pass 1 to be accurate. If we were not allowing writing to a pipe, we could seek back computed in pass 1 to be accurate. If we were not allowing writing to a pipe, we could seek back
to the beginning and fill in the value of /L in the linearization dictionary as an optimization to to the beginning and fill in the value of /L in the linearization dictionary as an optimization to