2
1
mirror of https://github.com/qpdf/qpdf.git synced 2024-06-26 07:12:45 +00:00
Commit Graph

247 Commits

Author SHA1 Message Date
Jay Berkenbilt
30f109e244 Read xref table without PCRE
Also accept more errors than before.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
ca5b1d267a Improve stream length recovery
Eliminate PCRE and find endobj not preceded by endstream. Be more lax
about placement of endstream and endobj.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
3082e4e606 Find xref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
90840be594 Find lindict without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
03aa9679ac Find starxref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
1765c6ec20 Find header without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
ef8ae5449d Allow QPDFTokenizer::readToken to return bad tokens
Sometimes we want to ignore bad tokens rather than having them throw
an exception. A coverage case is commented out here and added in a
later commit.
2017-08-10 19:01:41 -04:00
Jay Berkenbilt
c5dc6d8067 Remove unused PointerHolder interface
Also fix a bug resulting from incorrect use of PointerHolder because
of this unused parameter.
2017-08-10 19:01:38 -04:00
Jay Berkenbilt
ff6971fb1c Call PointerHolder constructor properly (fixes #135)
Passed arguments to the constructor in the wrong order.
2017-08-09 22:00:49 -04:00
Jay Berkenbilt
49825e5cb6 Add --split-pages option (fixes #30) 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
a60eb552d3 Split bug tests into separate chunk 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
1ec59c299d Refactor write_output 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
909daf9543 Move page spec processing earlier 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
24f28f0768 Split qpdf.cc's main into reasonably sized functions
main() had gotten absurdly long. Split it into reasonable chunks. This
refactoring is in preparation for handling splitting output into
single pages.
2017-08-05 08:24:05 -04:00
Jay Berkenbilt
c88eaae2f2 Fix off-by-one error in --pages argument parsing (fixes #129) 2017-08-02 21:08:43 -04:00
Jay Berkenbilt
2d5b854468 Allow reading command-line args from files (fixes #16) 2017-07-29 22:23:21 -04:00
Jay Berkenbilt
5993c3e83c Detect input file = output file (fixes #29) 2017-07-29 20:58:01 -04:00
Jay Berkenbilt
885b8781cc Allow --check to coexist with and precede other operations (fixes #42) 2017-07-29 19:56:21 -04:00
Jay Berkenbilt
b43a0ac237 When recover stream length, indicate the length (fixes #44) 2017-07-29 19:15:06 -04:00
Jay Berkenbilt
f37d399d82 Add newline-before-endstream option (fixes #103) 2017-07-29 12:21:38 -04:00
Jay Berkenbilt
6a7d53ad2b Handle zlib data errors better (fixes #106) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt
07d6f770b2 Better recovery of bad stream start (fixes #104) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt
b389268f16 Better handle split content streams (fixes #73)
When parsing content streams, allow content to be split arbitrarily
across stream boundaries.
2017-07-29 12:19:04 -04:00
Jay Berkenbilt
3a1ff5ded9 Add option to preserve unreferenced objects 2017-07-28 19:19:11 -04:00
Jay Berkenbilt
a94a729fee Explicitly check root dictionary type
Very badly corrupted files may not have a retrievable root dictionary.
Handle that as a special case so that a more helpful error message can
be provided.
2017-07-28 18:03:30 -04:00
Jay Berkenbilt
7f8892525f Add precheck streams capability
When requested, QPDFWriter will do more aggress prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
2017-07-27 23:42:27 -04:00
Jay Berkenbilt
428d96dfe1 Convert many more errors to warnings 2017-07-27 22:57:55 -04:00
Jay Berkenbilt
a4fd4b91c6 Convert stream filtering errors to warnings 2017-07-27 18:43:07 -04:00
Jay Berkenbilt
40f00122b8 Convert object parsing errors to warnings
QPDFObjectHandle::parseInternal now issues warnings instead of
throwing exceptions for all error conditions that it finds (except
internal logic errors) and has stronger recovery for things like
invalid tokens and malformed dictionaries. This should improve qpdf's
ability to recover from a wide range of broken files that currently
cause it to fail.
2017-07-27 18:20:31 -04:00
Jay Berkenbilt
ac3c81a8ed Include tests for other infinite loop bugs
fixes #117
fixes #118
fixes #119
fixes #120

Several other infinite loop bugs were fixed by previous changes.
Include their test files in the test suite.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
701b518d5c Detect recursion loops resolving objects (fixes #51)
During parsing of an object, sometimes parts of the object have to be
resolved. An example is stream lengths. If such an object directly or
indirectly points to the object being parsed, it can cause an infinite
loop. Guard against all cases of re-entrant resolution of objects.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
afe0242b26 Handle object ID 0 (fixes #99)
This is CVE-2017-9208.

The QPDF library uses object ID 0 internally as a sentinel to
represent a direct object, but prior to this fix, was not blocking
handling of 0 0 obj or 0 0 R as a special case. Creating an object in
the file with 0 0 obj could cause various infinite loops. The PDF spec
doesn't allow for object 0. Having qpdf handle object 0 might be a
better fix, but changing all the places in the code that assumes objid
== 0 means direct would be risky.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
315092dd98 Avoid xref reconstruction infinite loop (fixes #100)
This is CVE-2017-9209.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
603f222365 Fix infinite loop while reporting an error (fixes #101)
This is CVE-2017-9210.

The description string for an error message included unparsing an
object, which is too complex of a thing to try to do while throwing an
exception. There was only one example of this in the entire codebase,
so it is not a pervasive problem. Fixing this eliminated one class of
infinite loop errors.
2017-07-26 06:24:07 -04:00
Thorsten Schöning
e80b6e3341 Support paths with spaces 2016-01-24 11:52:09 -05:00
Thorsten Schöning
eff935ab60 Use absolute paths for large file tests
Working with absolute paths makes debugging easier, but some called
scripts always need / as dir separator or won't work.
2016-01-24 11:52:09 -05:00
Thorsten Schöning
adbaa54ad4 Fix non-portable use of /dev/null
/dev/null is not portable, so use File::Spec instead, which provides
portable "paths" and especially "nul" on Windows. I changed all places
with hard coded /dev/null to be sure, while I think it only is a
problem in direct system calls, because the other executed commands go
to sh.exe from MSYS which itself should port /dev/null to NUL. The
test still pass, so shouldn't have made any harm...
2016-01-24 11:52:09 -05:00
Thorsten Schöning
951dbc3b7f Fix expr syntax, support spaces in paths
expr needs ARG + ARG
quote paths to support support spaces
2016-01-24 11:52:09 -05:00
Thorsten Schöning
3c1555a622 Explicitly invoke shell scripts with sh
Shebang doesn't work well on Windows.
2016-01-24 11:52:09 -05:00
Jay Berkenbilt
b62cbe2508 Tolerate some mangled xref tables
If xref table entries lack the spec-required trailing whitespace or
contain a small amount of extra space, handle them anyway.
2015-10-31 18:56:43 -04:00
Jay Berkenbilt
b8bdef0ad1 Implement deterministic ID
For non-encrypted files, determinstic ID generation uses file contents
instead of timestamp and file name. At a small runtime cost, this
enables generation of the same /ID if the same inputs are converted in
the same way multiple times.
2015-10-31 18:56:42 -04:00
Jay Berkenbilt
f77acbdbba Copyright 2015 2015-05-24 17:26:49 -04:00
Jay Berkenbilt
b356b9dfa2 fix-qdf: handle object streams with > 255 objects
fix-qdf was previously hard-coding the number of bytes for the f2
field of the xref stream entry. This addresses issue #37. Thanks
aluebcke for reporting.
2015-05-24 16:52:42 -04:00
Jay Berkenbilt
a11549a566 Detect loops in /Pages structure
Pushing inherited objects to pages and getting all pages were both
prone to stack overflow infinite loops if there were loops in the
Pages dictionary. There is a general weakness in the code in that any
part of the code that traverses the Pages structure would be prone to
this and would have to implement its own loop detection. A more robust
fix may provide some general method for handling the Pages structure,
but it's probably not worth doing.

Note: addition of *Internal2 private functions was done rather than
changing signatures of existing methods to avoid breaking
compatibility.
2015-02-21 19:47:11 -05:00
Jay Berkenbilt
c729e07d55 Avoid resolving arguments to R
When checking two objects preceding R while parsing, ensure that the
objects are direct. This avoids stuff like 1 0 obj containing 1 0 R 0 R
from causing an infinite loop in object resolution.
2015-02-21 17:51:08 -05:00
Jay Berkenbilt
d8900c2255 Handle page tree node with no /Type
Original reported here:
https://bugs.launchpad.net/ubuntu/+source/qpdf/+bug/1397413

The PDF specification says that the /Type key for nodes in the pages
dictionary (both /Page and /Pages) is required, but some PDF files
omit them. Use the presence of other keys to determine the type of
pages tree node this is if the type key is not found.
2014-12-29 10:17:21 -05:00
Jay Berkenbilt
caab1b0e16 Handle pages with no /Contents from getPageContents()
The spec allows /Contents to be omitted for pages that are blank, but
QPDFObjectHandle::getPageContents() was throwing an exception in this
case.
2014-11-14 13:43:34 -05:00
Jay Berkenbilt
9f8aba1db7 Handle indirect stream filter/decode parameters
QPDFWriter was trying to make /Filter and /DecodeParms direct in all
cases, but there are some cases where /DecodeParms may refer to a
stream, which can't be direct. QPDFWriter doesn't actually need
/DecodeParms to be direct in that case because it won't be able to
filter the stream. Until we can handle this type of stream, just don't
make /Filter and /DecodeParms direct if we can't filter the stream
anyway.

Fixes #34
2014-06-07 16:31:03 -04:00
Jay Berkenbilt
225b018290 Update Copyright to 2014 2014-01-14 15:40:02 -05:00
Jay Berkenbilt
c9a9fe9c2f Avoid traversing same object twice when copying objects
This is a performance fix.  The output is unchanged.

Fixes #28.
2013-12-26 11:51:50 -05:00