Allow for the fact that when processing JSON input we cannot determine
whether references are dangling until the whole input has been processed.
We handle this by optimistically storing references in the object table.
When additional gens are encountered for the same object we store them
in the new map unconfirmed_objects. Once processing is complete we clean
up the object table and clear unconfirmed_objects.
New test cases are adapted from manual-qpdf-json.json etc.
Remove invalid objects introduced into the object table during xref table
parsing. At the moment, such invalid objects could override valid objects
if valid gen < invalid gen. Once the object table is indexed by object id
only, the invalid object will prevent valid objects with the same id
entering the object table.
The additional test case uses good10.pdf, with the dangling reference
in the trailer replaced with 3 1 R. Prior to this commit, this caused
the page object 3 0 to be masked and replaced with a null object.
Optimistically read subsection headers without reading individual object
entries, assuming that they are 20 bytes long as per the PDF spec. If
problems are encountered, fall back to calling bad_subsections.
Split reconstruction into two passes - scanning of input for objects and
insertion of objects into the xref table. This allows insertion to take
place in the usual reverse order and removes the need for a separate
insert_reconstructed method.
Create unresolved objects only for objects in the xref table (except during
parsing of the xref table). Do not add indirect nulls into the the object
cache as the result of a cache miss during a call to getObject except
during parsing or creation/updating from JSON. To support this behaviour,
add new private methods getObjectForParser and getObjectForJSON.
As a result of this change, dangling references are treated as direct nulls
rather than indirect nulls.
Code failed to allow for QPDF::getCompressibleObjSet deleting objects
from the object cache in case of multiple entries for the same object id.
Add fuzz test case 68668.
A parse error in stream data in which stream data contained a nested
object would cause a crash because qpdf was not correctly updating its
internal state. Rework the QPDF json reactor to not be sensitive to
parse errors in this way.
...since they have to be handled before other options. It was working
because, in both cases, `file` was alphabetically before the other
keys, but this implementation gives a stronger guarantee.