m-holger
253d3aee8f
Move QPDF::read_xrefEntry to QPDF::Xref_table
2024-09-18 10:25:37 +01:00
m-holger
3fbff84594
Move QPDF::reconstruct_xref to QPDF::Xref_table
...
Also, when recovering trailer from xref streams, pick the last valid
trailer encountered rather than the first.
2024-09-18 10:25:37 +01:00
m-holger
1e072e223a
Move QPDF::insertXrefEntry etc to QPDF::Xref_table
2024-09-18 10:25:37 +01:00
m-holger
0ac37bc956
Add new class QPDF::Xref_table
2024-09-18 10:25:37 +01:00
m-holger
f8e6274a2e
Move QPDF inner class definitions to new QPDF_private.hh
2024-09-18 10:25:37 +01:00
m-holger
8f54319f7a
Merge pull request #1179 from m-holger/null
...
In FUTURE, treat uninitialized object handles as null
2024-09-18 10:22:04 +01:00
m-holger
dcf111a9bc
Apply fuzzer Pl_Flate memory limit only when inflating
...
Fixes fuzz issue 71689.
2024-09-18 00:12:44 +01:00
m-holger
266d479735
Refactor QPDF_Array::at
...
Change the return type to a std::pair<bool, QPDFObjectHandle> in order to
allow a default constructed object handle (which is currently returned to
indicate failure) to become a valid object.
2024-09-17 09:59:00 +01:00
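A minimal sketch of the pair-returning accessor described in the commit above. `Handle` and `Array` are hypothetical stand-ins, not qpdf's actual `QPDFObjectHandle` or `QPDF_Array`; the point is only that success is reported in the `bool`, so a default-constructed handle can be an ordinary null value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object handle. A default-constructed
// Handle models a valid null object rather than an error marker.
struct Handle
{
    std::string value; // an empty string models "null"
    bool isNull() const { return value.empty(); }
};

// Hypothetical array wrapper (not qpdf's QPDF_Array).
struct Array
{
    std::vector<Handle> items;

    // Returns {true, item} if n is in range, {false, Handle()} otherwise.
    // Failure is signalled by the bool, not by the handle's state.
    std::pair<bool, Handle> at(int n) const
    {
        if (n < 0 || static_cast<std::size_t>(n) >= items.size()) {
            return {false, Handle()};
        }
        return {true, items[static_cast<std::size_t>(n)]};
    }
};
```

With this shape, a caller can tell a stored null element apart from an out-of-range index by checking `first` before using `second`.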
m-holger
0afaaea22a
Deprecate QPDFObjectHandle::isInitialized and remove from library
2024-09-17 09:59:00 +01:00
m-holger
bcf56e5333
Merge pull request #1269 from m-holger/hybrid
...
Fix handling of hybrid reference files in QPDF::read_xrefTable
2024-09-17 09:50:59 +01:00
m-holger
61f7d97b20
Merge pull request #1283 from m-holger/fuzz
...
Fix #1242
2024-09-17 00:19:52 +01:00
m-holger
54ac92eb1d
Merge pull request #1271 from m-holger/rsl
...
Fix QPDF::recoverStreamLength
2024-09-16 20:38:14 +01:00
m-holger
75091093fe
Merge pull request #1280 from m-holger/streams
...
Tidy QPDF_Stream
2024-09-16 19:52:24 +01:00
m-holger
ddfa3a24f0
Merge pull request #1281 from m-holger/input
...
Replace some std::shared_ptr parameters with reference parameters
2024-09-16 17:59:53 +01:00
m-holger
9ba6e070a1
Fix #1242
...
Ensure QPDF m->all_pages and invalid_page_found are reset if
getAllPagesInternal throws an exception.
Fixes fuzz case 71624.
2024-09-16 16:04:43 +01:00
m-holger
5d25aac6c7
In QPDFParser constructor change input parameter to InputSource&
2024-09-05 15:30:32 +01:00
m-holger
258343fcc9
In QPDF::readToken change input parameter to InputSource&
2024-09-05 15:23:28 +01:00
m-holger
20edfb3f91
In QPDF::damagedPDF change input parameter to InputSource&
2024-09-05 15:13:30 +01:00
m-holger
83e0f8da88
Tidy QPDF_Stream
...
1. Make class final
2. Pass og parameter by value
3. Properly initialize qpdf and og
Also, tweak QPDF::replaceObject to allow stream replacement without
violating the requirement that streams must always be indirect objects.
Also, remove QPDF::reserveStream as it does not do what the name implies
and having this as a separate method does not aid code readability.
2024-09-04 16:00:57 +01:00
m-holger
7777ea84e7
Add new method ObjTable::emplace_back
2024-08-31 21:03:37 +01:00
m-holger
4badc78aea
Remove methods ObjTable::initialize
2024-08-31 15:01:45 +01:00
m-holger
0d08f65cb8
Add new method ObjTable::resize
2024-08-31 14:20:16 +01:00
m-holger
68ac2179bd
In ObjTable change maximum allowable object id to std::vector<T>::max_size()
...
Given that the PDF spec requires the xref table to contain entries for all
object ids <= the maximum id present in a PDF document, max_size is a
qpdf implementation limitation for legitimate object ids.
2024-08-31 12:55:53 +01:00
m-holger
64f9b7b242
Refactor QPDFObjectHandle::getTypeName
2024-08-27 10:39:33 +01:00
m-holger
8ed10d71ea
In qpdf_fuzzer and dct_fuzzer add a scan limit for Pl_DCT
2024-08-25 17:03:26 +01:00
m-holger
8cb9bce780
Add new commands --remove-metadata and --remove-info
2024-08-25 13:10:11 +01:00
m-holger
ef49291682
In QPDF::readObjectAtOffset fail early on 'expect n n obj'
2024-08-23 14:09:20 +01:00
m-holger
0b3debaf86
Merge pull request #1253 from m-holger/pl_t
...
Refactor Pl_QPDFTokenizer
2024-08-21 18:29:55 +01:00
m-holger
c02cb9a720
Fix QPDF::recoverStreamLength
...
Ensure that the recovered stream end is not part of a different object.
Test file is bad24.pdf with stream 4 'endstream' corrupted.
2024-08-20 15:14:01 +01:00
m-holger
42cd7a98ad
In QPDF::recoverStreamLength mark unreachable code
2024-08-20 12:52:33 +01:00
m-holger
f2228b1f88
Fix handling of hybrid reference files in QPDF::read_xrefTable
...
QPDF::read_xrefTable ignores type 0 entries for objects in a section if an
associated XRefStm has an entry for the same object.
The spec states:
When the conforming reader searches for an object, if an entry is not
found in any given standard cross-reference section, the search shall
proceed to a cross-reference stream specified by the XRefStm entry
before looking in the previous cross-reference section.
According to the standard, if a deleted entry is found in a section, the
XRefStm is not searched.
2024-08-16 15:58:55 +01:00
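The lookup order quoted from the spec in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the code in QPDF::read_xrefTable: `Entry`, `Section` and `lookup` are hypothetical names, and each section is modelled as two plain maps.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical model of one xref entry; free == true models a type 0
// (deleted) entry.
struct Entry
{
    bool free;
    long offset;
};

// Hypothetical model of one standard cross-reference section together
// with the entries of the stream named by its XRefStm key.
struct Section
{
    std::map<int, Entry> table;
    std::map<int, Entry> xref_stm;
};

// sections[0] is the most recent section; each later element is the
// section reached through the previous one's Prev pointer.
// Returns nullptr if no section has an entry for the object id.
Entry const* lookup(std::vector<Section> const& sections, int id)
{
    for (auto const& s : sections) {
        auto it = s.table.find(id);
        if (it != s.table.end()) {
            // Any entry in the standard section ends the search,
            // including a deleted one; the XRefStm is not consulted.
            return &it->second;
        }
        it = s.xref_stm.find(id);
        if (it != s.xref_stm.end()) {
            return &it->second;
        }
    }
    return nullptr;
}
```

In this model an object that is marked free in a standard section stays free even when the associated XRefStm lists it as in use.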
m-holger
0eb29c7357
If Pl_Flate memory limit is exceeded do not attempt 'finish' processing
2024-08-09 11:08:30 +01:00
m-holger
0663f1f8db
Guard against 0 byte writes in Pl_Buffer and Pl_String
2024-08-07 16:19:16 +01:00
m-holger
06001ed25b
Refactor the creation of unresolved objects
...
Create unresolved objects only for objects in the xref table (except during
parsing of the xref table). Do not add indirect nulls into the object
cache as the result of a cache miss during a call to getObject except
during parsing or creation/updating from JSON. To support this behaviour,
add new private methods getObjectForParser and getObjectForJSON.
As a result of this change, dangling references are treated as direct nulls
rather than indirect nulls.
2024-08-06 12:22:09 +01:00
m-holger
87ee8ad071
In QPDFParser constructor add parameter parse_pdf
...
Prepare for treating indirect references differently depending on whether
we are parsing a PDF file (in which case references to objects not in the
xref table are null even if they are in the object cache) or whether we are
parsing from user code (in which case an indirect reference can refer to a
user-created object).
2024-08-06 10:02:07 +01:00
m-holger
7a1ec75ee1
Fix writing reals with trailing '.' as JSON (fixes #1261)
2024-08-06 01:09:48 +01:00
m-holger
3bab4cf394
Refactor Pl_RunLength::decode
...
Buffer output locally.
Add qpdf_fuzzer test case.
2024-08-03 15:52:45 +01:00
m-holger
99f3a7b5a3
In QPDFWriter::writeLinearized remember whether streams are filtered
2024-08-02 21:05:17 +01:00
m-holger
634d924986
In QPDFWriter::willFilterStream remember unfilterable streams
2024-08-02 19:23:17 +01:00
m-holger
2bb9e06d1e
In qpdf_fuzzer add a memory limit for Pl_Flate
2024-07-28 19:54:38 +01:00
m-holger
aa4f288291
Refactor xref reconstruction
...
Avoid unnecessary rescanning of lines and repositioning of input file.
Limit max size of tokens.
2024-07-28 18:03:59 +01:00
m-holger
1536a76071
Refactor Pl_QPDFTokenizer::finish
...
Remove unnecessary use of shared pointers and avoid unnecessary string
creation.
2024-07-27 18:55:43 +01:00
m-holger
986a253cdd
Overload QPDFTokenizer::findEI to take an InputSource&
2024-07-27 18:27:49 +01:00
m-holger
4783b22312
In ContentNormalizer::handleToken refactor handling of space tokens
...
Avoid writing each space char individually.
2024-07-27 18:06:12 +01:00
m-holger
ffe462e67e
In ContentNormalizer::handleToken refactor handling of string and name tokens
2024-07-27 16:49:27 +01:00
m-holger
959ae4b4da
Avoid unnecessary string copies in ContentNormalizer::handleToken
2024-07-27 16:33:17 +01:00
m-holger
4f16961052
In MD5_native::transform disable sanitizer unsigned integer overflow checks
...
Wrap-around is intentional, so the checks generate false positives.
2024-07-22 13:11:07 +01:00
m-holger
9ce18e41f4
Merge pull request #979 from m-holger/const
...
In FUTURE make various QPDFObjectHandle methods const
2024-07-19 10:50:08 +01:00
m-holger
5be057caf0
Merge pull request #1247 from m-holger/fuzz
...
Adjust fuzzer warning and memory limits
2024-07-18 22:24:54 +01:00
m-holger
9ac506509b
Merge pull request #1240 from m-holger/i1238
...
Fix QPDFOutlineDocumentHelper::resolveNamedDest (fixes #1238)
2024-07-18 22:24:16 +01:00
m-holger
34729e37e0
Limit memory used by Pl_PNGFilter and Pl_TIFFPredictor during fuzzing
2024-07-18 16:50:30 +01:00
m-holger
fe1fffe8db
Change QPDF max_warnings into a hard limit
...
Throw damagedFile if max_warnings is exceeded. Change qpdf_fuzzer warnings
limit to 500.
2024-07-18 16:50:08 +01:00
m-holger
992b7911ce
Limit the number of warnings in json_fuzzer before giving up
2024-07-16 15:36:58 +01:00
m-holger
25e11a444a
Throw an exception if the root of the pages tree is missing the /Kids array
2024-07-16 14:44:47 +01:00
m-holger
7f2d76b78d
Remove non-dictionary objects from pages tree
2024-07-16 14:35:32 +01:00
m-holger
f3cbaafcac
Fix QPDFOutlineDocumentHelper::resolveNamedDest (fixes #1238)
...
Handle case where named destination is a dictionary with /D entry.
Test case is hand-edited outlines-with-old-root-dests.pdf with modified
object 107.
2024-07-14 12:15:45 +01:00
m-holger
186fca6d8d
Add further sanity checks to QPDF::reconstruct_xref
...
Run getAllPages as sanity check and throw an exception if too many
warnings are generated or no pages are found.
2024-07-13 14:51:14 +01:00
m-holger
963574f27f
Refactor QPDFOutlineDocumentHelper::resolveNamedDest
2024-07-13 11:34:02 +01:00
m-holger
722148de3d
Further limit size of uncompressed JPEG for fuzzing
...
Try a limit of 50MB. For very large limits, the processing time before
damage is encountered may exceed oss-fuzz limits.
Add further test cases.
2024-07-11 14:32:22 +01:00
m-holger
e914bbbbbc
Add further sanity check to QPDF::reconstruct_xref
...
If reconstruct_xref generates more than 1000 warnings give up because the
file is so severely damaged that there is very little point continuing.
2024-07-11 13:25:07 +01:00
m-holger
c2c1618e08
Add extra sanity check on pages tree
...
Reject non-dictionary Page and Pages objects.
Also add additional qpdf_fuzzer test cases.
2024-07-10 19:03:23 +01:00
m-holger
2b6500ea17
In Pl_DCT::decompress refactor handling of corrupt data
...
If throw_on_corrupt is set, use a custom implementation of libjpeg's
emit_message procedure to throw an exception when the first corrupt data
warning is encountered.
2024-07-09 20:55:51 +01:00
m-holger
2e378d920d
Add additional sanity check during xref reconstruction
...
Check that xref table is not empty after recovery. Empty xref tables
disable other sanity checks.
2024-07-09 17:01:44 +01:00
m-holger
7445e0ac1e
Fix QPDF::setSuppressWarnings
2024-07-09 16:38:02 +01:00
m-holger
43004e3399
Fix Pl_DCT memory limit
2024-07-08 13:31:02 +01:00
m-holger
c1cd3ec8a0
In QPDF::processXRefIndex check number of objects in subsection is > 0
...
Fixes oss-fuzz 70055
2024-07-06 16:09:50 +01:00
m-holger
f0ded6bca8
Add test case for self-referential object streams
...
Previous test case was lost in #1221. Test file was created from
object-stream.pdf by adding a reference to itself into object stream 1 0.
2024-07-04 20:40:47 +01:00
m-holger
edf3509b78
Treat corrupt JPEG streams as unfilterable
2024-07-04 17:06:42 +01:00
Jay Berkenbilt
598268f6ad
Add setMaxWarnings rather than using conditional compilation
2024-07-03 15:44:44 +01:00
Jay Berkenbilt
65bd8bc57d
Add DCT decompression config methods in favor of compile-time changes
...
As a rule, we should avoid conditional compilation as it always causes
code paths that are sometimes not even seen lexically by the compiler.
Also, we want the actual code being fuzzed to be as close as possible
to the real code. Conditional compilation is suitable to handle
underlying system differences.
Instead, favor configuration using callbacks or other methods that can
be triggered in the places where they need to be exercised.
2024-07-03 15:43:38 +01:00
m-holger
a367e56afc
In QPDF::resolveObjectsInStream avoid creating xref table entries
...
Invalid entries are created when objects in the stream do not have
an existing xref entry.
2024-07-02 01:16:23 +01:00
m-holger
6d640c569a
Add additional object id sanity checks
...
Ensure objects with impossibly large ids are ignored.
2024-07-02 01:16:23 +01:00
m-holger
42c511198b
Suppress excessive warnings while fuzzing
...
Add extra fuzz test case and amend memory limit for Pl_DCT.
2024-07-02 01:16:23 +01:00
m-holger
9081ac69cd
Merge pull request #1227 from m-holger/fuzz6
...
Refine #1225
2024-06-30 01:50:36 +01:00
m-holger
18c52640cc
Refine #1225
2024-06-29 14:47:03 +01:00
m-holger
0a081e1f09
In QPDFOutlineObjectHelper detect loops in direct children
...
Also, add diagnostic messages in qpdf_fuzzer and additional fuzz test case.
2024-06-29 12:38:07 +01:00
m-holger
c93b149b4d
Limit memory used for JPEG decompression during fuzzing
2024-06-28 21:15:45 +01:00
m-holger
6ed2880405
Merge pull request #1224 from m-holger/fuzz3
...
Fix #1170
2024-06-27 08:47:42 +01:00
m-holger
732aab8610
Merge pull request #1222 from m-holger/fuzz2
...
In Pl_DCT add option to limit the size of uncompressed corrupt data
2024-06-27 08:20:01 +01:00
m-holger
8ae3ef28ac
Fix #1170
...
In QPDF::read_xrefEntry add buffer overflow test for first eol character.
Overlong f1 or f2 entries consisting only of zeros could cause a buffer
overflow.
Add fuzz testcase 69913.
2024-06-27 08:17:58 +01:00
m-holger
3d569e2171
Merge pull request #1221 from m-holger/fuzz
...
Refine handling of severely damaged files
2024-06-27 01:18:37 +01:00
m-holger
d83cf43811
In Pl_DCT add option to limit the size of uncompressed corrupt data
...
Also, apply limit in dct_fuzzer
2024-06-26 11:57:29 +01:00
m-holger
4a8c821e3e
In QPDF::reconstruct_xref add sanity check for object ids
2024-06-25 15:46:47 +01:00
m-holger
e62973d277
In QPDF check for page tree after reading xref table
...
Also add new fuzz test case.
2024-06-25 15:18:54 +01:00
m-holger
295f62f041
Merge pull request #1170 from m-holger/readxref
...
Refactor QPDF::parse_xrefEntry
2024-06-19 20:08:44 +01:00
m-holger
6ad16cd1fd
In FUTURE make QPDFObjectHandle methods const and noexcept where possible
2024-06-19 10:34:01 +01:00
m-holger
9641626cae
Refactor resolving of objects
2024-06-19 10:34:01 +01:00
m-holger
ce5b864c53
Merge pull request #1201 from m-holger/xref_stream
...
QPDF::processXRefStream
2024-06-18 20:21:39 +01:00
Jay Berkenbilt
5e121c9690
Handle null form field from annotation (fixes #1189)
...
A file that has Widget annotations that can't be mapped back to form
fields would crash qpdf json.
2024-06-18 08:51:15 -04:00
Jay Berkenbilt
167057411e
Format code
2024-06-07 08:07:51 -04:00
Jay Berkenbilt
d17f11e721
Make QPDF::updateObjectMaps iterative
2024-06-06 15:22:14 -04:00
m-holger
2b0c2da720
Refactor QPDF::processXRefStream
...
Change the processed Index array to a vector of <first object, number of
entries> pairs.
2024-05-22 18:53:30 +01:00
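A small sketch of the representation change named in the commit above: a flat Index array of the form [first0 count0 first1 count1 ...] turned into a vector of (first object, number of entries) pairs. `to_subsections` is a hypothetical helper operating on plain ints, not the code in QPDF::processXRefStream, and its validation rules are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Convert a flat Index array [first0, count0, first1, count1, ...]
// into (first object, number of entries) pairs.
// Throws std::invalid_argument on an odd-length array or a negative value.
std::vector<std::pair<int, int>> to_subsections(std::vector<int> const& index)
{
    if (index.size() % 2 != 0) {
        throw std::invalid_argument("Index array has an odd number of elements");
    }
    std::vector<std::pair<int, int>> result;
    result.reserve(index.size() / 2);
    for (std::size_t i = 0; i < index.size(); i += 2) {
        if (index[i] < 0 || index[i + 1] < 0) {
            throw std::invalid_argument("Index array contains a negative value");
        }
        result.emplace_back(index[i], index[i + 1]);
    }
    return result;
}
```

Holding the subsections as pairs lets later code iterate over them directly instead of stepping through the flat array two elements at a time.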
m-holger
7477ea7828
Add new private method QPDF::processXRefSize
2024-05-22 17:07:42 +01:00
m-holger
f74b28f0d1
Add new private method QPDF::processXRefW
2024-05-22 17:07:37 +01:00
m-holger
0186d60dcf
Add new private method QPDF::processXRefIndex
2024-05-22 17:07:28 +01:00
m-holger
7aa5027bf8
Refactor QPDF::processXRefStream
...
Add closure damaged to create damagedPDF exceptions.
2024-05-22 17:07:16 +01:00
m-holger
1737902a5e
Refactor QPDF::processXRefStream
...
Tune processing of subsections.
2024-05-21 20:31:52 +01:00
m-holger
f1c774f13f
Refactor QPDF::processXRefStream
...
Tune pointer arithmetic.
2024-05-21 20:31:40 +01:00
m-holger
8cd50e0e3e
Fix QPDF::tableSize
...
Apply temporary fix to deal with fuzz case 68915.
(Error is an integer overflow which would immediately cause a runtime error
as a result of a call to QIntC::to_size.)
2024-05-21 12:50:19 +01:00
m-holger
6f09069f43
Further refactor QUtil::call_main_from_wmain
2024-05-17 10:31:50 +01:00