mirror of https://github.com/qpdf/qpdf.git synced 2025-02-03 12:28:27 +00:00

998 Commits

Author SHA1 Message Date
m-holger
8753ffe335 Generalise last commit to xref parsing and reconstruction
Tests use a modified dangling-bad-xref.pdf.
2024-10-18 16:48:30 +01:00
m-holger
405f3765c5 Refactor QPDF::Objects / JSONReactor interaction
Allow for the fact that when processing JSON input we cannot determine
whether references are dangling until the whole input has been processed.

We handle this by optimistically storing references in the object table.
When additional gens are encountered for the same object we store them
in the new map unconfirmed_objects. Once processing is complete we clean
up the object table and clear unconfirmed_objects.

New test cases are adapted from manual-qpdf-json.json etc.
2024-10-18 16:23:46 +01:00
m-holger
55bca1a117 Index QPDF::Objects::table by object id only 2024-10-17 14:30:10 +01:00
m-holger
7a20ba0c6a Add new method Xref_table::prepare_obj_table
Remove invalid objects introduced into the object table during xref table
parsing. At the moment, such invalid objects could override valid objects
if valid gen < invalid gen. Once the object table is indexed by object id
only, the invalid object will prevent valid objects with the same id
entering the object table.

The additional test case uses good10.pdf, with the dangling reference
in the trailer replaced with 3 1 R. Prior to this commit, this caused
the page object 3 0 to be masked and replaced with a null object.
2024-10-17 14:30:10 +01:00
m-holger
43a88e1d28 Tweak #1287 comments 2024-09-27 11:58:46 +01:00
m-holger
1796365713 Merge branch 'main' into mslichao/capifreebuf 2024-09-27 11:31:55 +01:00
m-holger
50d385c858 Merge pull request #1274 from m-holger/meta
Add new commands --remove-metadata and --remove-info
2024-09-27 11:26:34 +01:00
m-holger
21f176d374 Add sanity check on trailer /Size entry 2024-09-20 15:28:49 +01:00
Chao Li(VISION)
f6ae1ff16a Rename to qpdf_oh_free_buffer 2024-09-20 04:53:32 +00:00
Chao Li(VISION)
8c1cde4ec3 Add C API qpdf_free_buffer to release memory allocated by stream data functions 2024-09-19 12:21:49 +00:00
m-holger
28c13f5492 Refactor Xref_table::subsections
Optimistically read subsection headers without reading individual object
entries, assuming that they are 20 bytes long as per the PDF spec. If
problems are encountered, fall back to calling bad_subsections.
2024-09-18 10:25:38 +01:00
m-holger
c0020cb17d Change Xref_table::table to std::vector
Temporarily disable 3 specific-bugs tests. Remove 'xref size mismatch'
test.
2024-09-18 10:25:38 +01:00
m-holger
91822ae6a1 Refactor Xref_table::reconstruct
Split reconstruction into two passes - scanning of input for objects and
insertion of objects into the xref table. This allows insertion to take
place in the usual reverse order and removes the need for a separate
insert_reconstructed method.
2024-09-18 10:25:38 +01:00
m-holger
ed65619428 Add new methods Xref_table::subsections
Calculate all subsections before reading subsection entries.

Duplicates some warnings for the time being.
2024-09-18 10:25:38 +01:00
m-holger
3fbff84594 Move QPDF::reconstruct_xref to QPDF::Xref_table
Also, when recovering trailer from xref streams, pick the last valid
trailer encountered rather than the first.
2024-09-18 10:25:37 +01:00
m-holger
d42fda60d5 Amend "recover file with xref stream" test
Change first xref stream dictionary to point to an invalid root in order
to detect failure to recover the last valid trailer.
2024-09-18 10:25:37 +01:00
m-holger
0afaaea22a Deprecate QPDFObjectHandle::isInitialized and remove from library 2024-09-17 09:59:00 +01:00
m-holger
8cb9bce780 Add new commands --remove-metadata and --remove-info 2024-08-25 13:10:11 +01:00
m-holger
c02cb9a720 Fix QPDF::recoverStreamLength
Ensure that the recovered stream end is not part of a different object.

Test file is bad24.pdf with stream 4 'endstream' corrupted.
2024-08-20 15:14:01 +01:00
m-holger
6b7a05a379 Fix test_driver comment 2024-08-15 18:59:55 +01:00
m-holger
06001ed25b Refactor the creation of unresolved objects
Create unresolved objects only for objects in the xref table (except during
parsing of the xref table). Do not add indirect nulls into the object
cache as the result of a cache miss during a call to getObject except
during parsing or creation/updating from JSON. To support this behaviour,
add new private methods getObjectForParser and getObjectForJSON.

As a result of this change, dangling references are treated as direct nulls
rather than indirect nulls.
2024-08-06 12:22:09 +01:00
m-holger
f3cbaafcac Fix QPDFOutlineDocumentHelper::resolveNamedDest (fixes #1238)
Handle case where named destination is a dictionary with /D entry.

Test case is hand-edited outlines-with-old-root-dests.pdf with modified
object 107.
2024-07-14 12:15:45 +01:00
m-holger
e914bbbbbc Add further sanity check to QPDF::reconstruct_xref
If reconstruct_xref generates more than 1000 warnings give up because the
file is so severely damaged that there is very little point continuing.
2024-07-11 13:25:07 +01:00
m-holger
2e378d920d Add additional sanity check during xref reconstruction
Check that xref table is not empty after recovery. Empty xref tables
disable other sanity checks.
2024-07-09 17:01:44 +01:00
m-holger
f0ded6bca8 Add test case for self-referential object streams
Previous test case was lost in #1221. Test file was created from
object-stream.pdf by adding a reference to itself into object stream 1 0.
2024-07-04 20:40:47 +01:00
m-holger
6d640c569a Add additional object id sanity checks
Ensure objects with impossibly large ids are ignored.
2024-07-02 01:16:23 +01:00
m-holger
4a8c821e3e In QPDF::reconstruct_xref add sanity check for object ids 2024-06-25 15:46:47 +01:00
m-holger
e62973d277 In QPDF check for page tree after reading xref table
Also add new fuzz test case.
2024-06-25 15:18:54 +01:00
Jay Berkenbilt
5e121c9690 Handle null form field from annotation (fixes #1189)
A file that has Widget annotations that can't be mapped back to form
fields would crash qpdf json.
2024-06-18 08:51:15 -04:00
m-holger
02e89bbe47 Fix bug in QPDFWriter::preserveObjectStreams
Code failed to allow for QPDF::getCompressibleObjSet deleting objects
from the object cache in case of multiple entries for the same object id.

Add fuzz test case 68668.
2024-05-04 10:55:30 +01:00
m-holger
85107f39f2 Add bad xref table test 2024-03-06 15:26:14 +00:00
m-holger
862feed100 Add additional QPDFObjectHandle::Rectangle and Matrix tests 2024-02-20 14:53:18 +00:00
m-holger
36ee4ecc6e Add test for QPDFObjectHandle::isDirectNull 2024-02-20 13:02:16 +00:00
m-holger
a047d5497e Add test for QPDFObjectHandle::getStreamJSON 2024-02-20 00:49:41 +00:00
Jay Berkenbilt
7bc52c5728 set page labels: detect start page < 1 (fixes #939) 2024-02-17 16:13:42 -05:00
Jay Berkenbilt
e362bce8e8 Merge branch 'jw' from #1146 into work 2024-02-17 14:15:48 -05:00
m-holger
f0bc2f11ef Expose QPDFObjectHandle::writeJSON 2024-02-16 14:09:28 +00:00
m-holger
9379b76811 Add additional name token JSON tests
Also, test writing JSON v1 files and files with deeply nested containers.
2024-02-16 10:54:08 +00:00
m-holger
d28969bf37 Add additional sparse array JSON tests 2024-02-16 10:53:52 +00:00
Jay Berkenbilt
b1dad0de2a Fix previous fix to setting checkbox value (fixes #1056)
The code accepted values other than /Yes but still used /Yes as the
checked value instead of obeying the normal appearance dictionary.
2024-02-11 15:49:44 -05:00
Jay Berkenbilt
3490090fbc Detect JSON object whose value is an indirect object 2024-02-06 15:12:41 -05:00
Jay Berkenbilt
cb0f390cc1 Handle parse error stream data (fixes #1123)
A parse error in stream data in which stream data contained a nested
object would cause a crash because qpdf was not correctly updating its
internal state. Rework the QPDF json reactor to not be sensitive to
parse errors in this way.
2024-02-04 17:27:49 -05:00
m-holger
8ff20b0089 Allow "n:/pdf-syntax" JSON syntax for dictionary keys 2024-01-29 13:22:58 +00:00
m-holger
f0343565ed Tighten checks for invalid indirect references during xref reconstruction 2024-01-17 14:11:57 +00:00
m-holger
ed43691bf3 Tighten checks for invalid indirect references in QPDFParser 2024-01-17 13:15:13 +00:00
Jay Berkenbilt
ebb10f3256 Fix null pointer issue on array copy 2024-01-12 08:05:22 -05:00
Jay Berkenbilt
d339f8ad1a Add non-trivial multiple overlay/underlay tests 2024-01-11 06:13:57 -05:00
Jay Berkenbilt
90a97bf4ef Include filename in verbose output for overlay/underlay 2024-01-11 06:13:57 -05:00
Jay Berkenbilt
12f7a4461b Handle pages/under/overlay JSON file in begin
...since they have to be handled before other options. It was working
because, in both cases, `file` was alphabetically before the other
keys, but this implementation gives a stronger guarantee.
2024-01-10 16:45:14 -05:00
Jay Berkenbilt
9c723aeb56 Allow --file with --overlay and --underlay 2024-01-10 16:44:46 -05:00