mirror of https://github.com/qpdf/qpdf.git synced 2024-06-16 08:52:21 +00:00
Commit Graph

104 Commits

Author SHA1 Message Date
Jay Berkenbilt
28453a4908 Add --keep-files-open flag (fixes #237) 2018-08-18 10:56:01 -04:00
Jay Berkenbilt
0b05111db8 Implement helper class for interactive forms 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
b4d6cf6836 Limit depth of nesting in direct objects (fixes #202)
This fixes CVE-2018-9918.
2018-04-15 16:11:22 -04:00
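The idea behind this fix is to stop descending once nesting becomes unreasonably deep, instead of recursing without bound. The sketch below shows that guard. It is not qpdf's parser. `checkNesting` is a hypothetical helper, the limit of 500 is an illustrative assumption, and the scan only looks at array and dictionary delimiters, so it ignores strings and comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the nesting depth of arrays ("[" ... "]")
// and dictionaries ("<<" ... ">>") in `text` never exceeds `max_depth`.
// Simplification: strings, hex strings, and comments are not recognized.
bool checkNesting(const std::string& text, std::size_t max_depth = 500)
{
    std::size_t depth = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        bool open = false;
        bool close = false;
        if (text[i] == '[') {
            open = true;
        } else if (text[i] == ']') {
            close = true;
        } else if (text.compare(i, 2, "<<") == 0) {
            open = true;
            ++i;  // consume both characters of "<<"
        } else if (text.compare(i, 2, ">>") == 0) {
            close = true;
            ++i;  // consume both characters of ">>"
        }
        if (open && ++depth > max_depth) {
            return false;  // too deep: the caller should warn and stop descending
        }
        if (close && depth > 0) {
            --depth;
        }
    }
    return true;
}
```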
Jay Berkenbilt
ee44aef8d0 Treat loop in xref tables as damage (fixes #192)
Prior to this fix, if there was a loop detected in following /Prev
pointers in xref streams/tables, it would cause qpdf to lose data.
Note that this condition causes many PDF readers to hang or fail.
2018-03-05 14:26:58 -05:00
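To detect the loop, the reader can remember every xref offset it has already visited while following /Prev. The sketch below assumes a simplified model in which each section is identified by its file offset and a /Prev value of 0 means "no previous section". The `walkXrefChain` and `XrefWalk` names are hypothetical. The sketch only reports the loop. What the caller does next, such as falling back to recovery, is outside its scope.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical result of following the /Prev chain of xref sections.
struct XrefWalk {
    std::vector<long long> order;  // section offsets in the order they were read
    bool damaged = false;          // true if a /Prev loop was detected
};

// `prev_of` maps a section's offset to its /Prev offset (0 = no /Prev).
XrefWalk walkXrefChain(const std::map<long long, long long>& prev_of, long long start)
{
    XrefWalk result;
    std::set<long long> visited;
    long long offset = start;
    while (offset != 0) {
        if (!visited.insert(offset).second) {
            // Seen before: flag the file as damaged rather than silently
            // stopping, so the caller can recover instead of losing objects.
            result.damaged = true;
            break;
        }
        result.order.push_back(offset);
        auto it = prev_of.find(offset);
        offset = (it == prev_of.end()) ? 0 : it->second;
    }
    return result;
}
```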
Jay Berkenbilt
2780a1871d Add C API for checking PDF files 2018-02-18 21:06:27 -05:00
Jay Berkenbilt
d0e99f195a More robust handling of type errors
Give objects descriptions and context so it is possible to issue
warnings instead of fatal errors for attempts to access objects of the
wrong type.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
5136238f2a Detect and report bad tokens in content normalization 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
b8723e97f4 Add coalesce contents capability 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fcd611b61e Refactor parseContentStream 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
ec538792fa Use inline image token type in tokenizer filter 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fefe25030e Inline image token type 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
d97474868d Lexer enhancements: EOF, comment, space
Significant enhancements to the lexer to improve EOF handling and to
support comments and spaces as tokens. Various other minor issues were
fixed as well.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt
13d9756a45 Minor fixes to tokenizer 2018-01-28 18:34:43 -05:00
Jay Berkenbilt
ec0087e3ce Support TIFF Predictor (fixes #171) 2018-01-13 19:49:42 -05:00
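The TIFF predictor (/Predictor 2) stores each sample as a difference from the same sample one pixel to the left. Decoding therefore adds the previously decoded value back. Here is a minimal sketch, assuming 8 bits per component only. `undoTiffPredictor` is a hypothetical name and does not correspond to qpdf's pipeline class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Undo TIFF Predictor 2 (horizontal differencing) for 8-bit components.
// Each row has `columns` pixels of `colors` samples. Every sample after the
// first pixel of a row is replaced by itself plus the already-decoded sample
// `colors` bytes to its left, with wraparound modulo 256.
std::vector<std::uint8_t> undoTiffPredictor(std::vector<std::uint8_t> data,
                                            std::size_t columns, std::size_t colors)
{
    const std::size_t row_len = columns * colors;
    if (row_len == 0) {
        return data;  // nothing to do; also avoids an endless loop below
    }
    for (std::size_t row = 0; row + row_len <= data.size(); row += row_len) {
        for (std::size_t i = colors; i < row_len; ++i) {
            data[row + i] = static_cast<std::uint8_t>(data[row + i] + data[row + i - colors]);
        }
    }
    return data;
}
```

For example, the grayscale row `{10, 1, 1, 1}` decodes to `{10, 11, 12, 13}`.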
Jay Berkenbilt
eaacf94005 Update C API with new QPDFWriter methods 2017-09-12 14:30:39 -04:00
Jay Berkenbilt
fabff0f3ec Limit token length during xref recovery
While scanning the file looking for objects, limit the length of
tokens we allow. This prevents us from getting caught up in reading a
file character by character while digging through large streams.
2017-08-22 14:13:10 -04:00
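A length cap means the scanner stops collecting a token once the token becomes implausibly long, and it skips the rest of that run. The sketch below is hypothetical. `scanTokens` splits on whitespace only, which is far simpler than a real PDF tokenizer, and it drops overlong tokens instead of storing them.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scanner: splits `text` on whitespace, but once a token grows
// past `max_len` it stops accumulating characters and discards the token.
std::vector<std::string> scanTokens(const std::string& text, std::size_t max_len)
{
    std::vector<std::string> tokens;
    std::string current;
    bool too_long = false;
    auto flush = [&]() {
        if (!current.empty() && !too_long) {
            tokens.push_back(current);
        }
        current.clear();
        too_long = false;
    };
    for (char c : text) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            flush();
        } else if (!too_long) {
            current.push_back(c);
            if (current.size() > max_len) {
                too_long = true;  // give up on this token; skip to the next space
                current.clear();
            }
        }
    }
    flush();
    return tokens;
}
```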
Jay Berkenbilt
ddc6cf0cf6 Precheck streams by default
There is no need for a --precheck-streams option. We can do the
precheck without imposing any penalty, only re-encoding the stream if
it fails the first time.
2017-08-21 17:44:22 -04:00
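The fallback this commit describes can be sketched as "try to decode, and keep the raw bytes if decoding fails". The `Decoder` type, `WrittenStream`, and `writeStream` below are hypothetical and do not reflect QPDFWriter's actual interfaces. The sketch requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical decoder: returns decoded bytes, or std::nullopt on failure.
using Decoder = std::function<std::optional<std::string>(const std::string&)>;

struct WrittenStream {
    std::string data;
    bool filtered;  // true if decoded data is written, false if raw bytes are kept
};

// Commit to the decoded form only if decoding succeeds. Otherwise copy the
// raw, still-encoded data unchanged so nothing is lost.
WrittenStream writeStream(const std::string& raw, const Decoder& decode)
{
    if (std::optional<std::string> decoded = decode(raw)) {
        return {*decoded, true};
    }
    return {raw, false};
}
```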
Jay Berkenbilt
9744414c66 Enable finer grained control of stream decoding
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
cfa2eb97fb Add page rotation (fixes #132) 2017-08-12 22:57:38 -04:00
Jay Berkenbilt
df33c368b4 Change --single-pages to --split-pages
This is in preparation for implementing page groups.
2017-08-12 11:49:04 -04:00
Jay Berkenbilt
8249a26d69 Fix infinite loop in QPDFWriter (fixes #143) 2017-08-12 08:36:36 -04:00
Jay Berkenbilt
8fe0b06cd8 Pad encryption parameters that are too short (fixes #96) 2017-08-11 19:53:56 -04:00
Jay Berkenbilt
30f109e244 Read xref table without PCRE
Also accept more errors than before.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
90840be594 Find lindict without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
03aa9679ac Find starxref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
49825e5cb6 Add --split-pages option (fixes #30) 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
2d5b854468 Allow reading command-line args from files (fixes #16) 2017-07-29 22:23:21 -04:00
Jay Berkenbilt
5993c3e83c Detect input file = output file (fixes #29) 2017-07-29 20:58:01 -04:00
Jay Berkenbilt
07d6f770b2 Better recovery of bad stream start (fixes #104) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt
b389268f16 Better handle split content streams (fixes #73)
When parsing content streams, allow content to be split arbitrarily
across stream boundaries.
2017-07-29 12:19:04 -04:00
Jay Berkenbilt
3a1ff5ded9 Add option to preserve unreferenced objects 2017-07-28 19:19:11 -04:00
Jay Berkenbilt
7f8892525f Add precheck streams capability
When requested, QPDFWriter will do more aggressive prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
2017-07-27 23:42:27 -04:00
Jay Berkenbilt
428d96dfe1 Convert many more errors to warnings 2017-07-27 22:57:55 -04:00
Jay Berkenbilt
a4fd4b91c6 Convert stream filtering errors to warnings 2017-07-27 18:43:07 -04:00
Jay Berkenbilt
40f00122b8 Convert object parsing errors to warnings
QPDFObjectHandle::parseInternal now issues warnings instead of
throwing exceptions for all error conditions that it finds (except
internal logic errors) and has stronger recovery for things like
invalid tokens and malformed dictionaries. This should improve qpdf's
ability to recover from a wide range of broken files that currently
cause it to fail.
2017-07-27 18:20:31 -04:00
Jay Berkenbilt
701b518d5c Detect recursion loops resolving objects (fixes #51)
During parsing of an object, sometimes parts of the object have to be
resolved. An example is stream lengths. If such an object directly or
indirectly points to the object being parsed, it can cause an infinite
loop. Guard against all cases of re-entrant resolution of objects.
2017-07-26 06:24:07 -04:00
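A common way to guard against re-entrant resolution is to record which objects are currently being resolved and to fail on a second entry. The sketch below does this with an RAII guard. `Resolver` and `Guard` are hypothetical names. Throwing is one possible reaction, and qpdf's actual handling may differ.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <utility>

using ObjGen = std::pair<int, int>;  // (object number, generation)

// Hypothetical resolver state: the set of objects currently being resolved.
class Resolver {
  public:
    // RAII guard: marks `og` as in progress for its lifetime and throws if
    // `og` is already in progress (a direct or indirect self-reference).
    class Guard {
      public:
        Guard(Resolver& r, ObjGen og) : r_(r), og_(og)
        {
            if (!r_.in_progress_.insert(og_).second) {
                throw std::runtime_error("loop detected resolving object");
            }
        }
        ~Guard() { r_.in_progress_.erase(og_); }
        Guard(const Guard&) = delete;
        Guard& operator=(const Guard&) = delete;

      private:
        Resolver& r_;
        ObjGen og_;
    };

  private:
    std::set<ObjGen> in_progress_;
};
```

Because the destructor removes the entry, the same object can be resolved again later. Only nested, re-entrant resolution is rejected.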
Jay Berkenbilt
afe0242b26 Handle object ID 0 (fixes #99)
This is CVE-2017-9208.

The QPDF library uses object ID 0 internally as a sentinel to
represent a direct object, but prior to this fix, was not blocking
handling of 0 0 obj or 0 0 R as a special case. Creating an object in
the file with 0 0 obj could cause various infinite loops. The PDF spec
doesn't allow for object 0. Having qpdf handle object 0 might be a
better fix, but changing all the places in the code that assume objid
== 0 means direct would be risky.
2017-07-26 06:24:07 -04:00
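The fix amounts to refusing object number 0 when parsing `N G obj` headers and `N G R` references, because 0 is the internal marker for direct objects. The sketch below is hypothetical. `acceptReference` parses a whole reference from a string with `std::istringstream`, which a real tokenizer would not do.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical check for an "N G R" reference or "N G obj" header. Object
// number 0 is rejected because it is reserved internally to mean "direct
// object", so it must never be treated as a real indirect object.
bool acceptReference(const std::string& text)
{
    std::istringstream in(text);
    long objid = -1;
    long gen = -1;
    std::string keyword;
    if (!(in >> objid >> gen >> keyword)) {
        return false;
    }
    if (keyword != "R" && keyword != "obj") {
        return false;
    }
    return objid > 0 && gen >= 0;
}
```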
Jay Berkenbilt
b8bdef0ad1 Implement deterministic ID
For non-encrypted files, deterministic ID generation uses file contents
instead of timestamp and file name. At a small runtime cost, this
enables generation of the same /ID if the same inputs are converted in
the same way multiple times.
2015-10-31 18:56:42 -04:00
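A content-derived ID depends only on the bytes being hashed. Identical inputs therefore give identical IDs, whereas a timestamp changes on every run. In the sketch below, `deterministicId` is hypothetical. FNV-1a serves as a simple stand-in for a cryptographic digest such as MD5 and is not collision resistant.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Derive an ID purely from `content` (no timestamp, no file name), so the
// same content always produces the same 16-hex-digit ID. FNV-1a 64-bit is
// used here only as a simple stand-in for a real digest.
std::string deterministicId(const std::string& content)
{
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : content) {
        h ^= c;
        h *= 1099511628211ULL;                   // FNV-1a 64-bit prime
    }
    char buf[17];
    std::snprintf(buf, sizeof(buf), "%016llx", static_cast<unsigned long long>(h));
    return buf;
}
```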
Jay Berkenbilt
c9a9fe9c2f Avoid traversing same object twice when copying objects
This is a performance fix.  The output is unchanged.

Fixes #28.
2013-12-26 11:51:50 -05:00
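Avoiding repeated traversal usually means memoizing the old-to-new mapping during the copy. The sketch below models objects as a hypothetical `Graph` of object numbers and their references. `copyObject` is not qpdf's API. The map is filled in before the function recurses, so shared objects and cycles are each copied exactly once.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical object graph: object number -> object numbers it references.
using Graph = std::map<int, std::vector<int>>;

// Copy everything reachable from `root` into `dest`, numbering new objects
// from `next_id`. `old_to_new` doubles as the visited set.
int copyObject(const Graph& src, int root, Graph& dest,
               std::map<int, int>& old_to_new, int& next_id)
{
    auto found = old_to_new.find(root);
    if (found != old_to_new.end()) {
        return found->second;  // already copied (or being copied): reuse it
    }
    int new_id = next_id++;
    old_to_new[root] = new_id;  // record before recursing to break cycles
    std::vector<int> refs;
    auto it = src.find(root);
    if (it != src.end()) {
        for (int child : it->second) {
            refs.push_back(copyObject(src, child, dest, old_to_new, next_id));
        }
    }
    dest[new_id] = refs;
    return new_id;
}
```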
Jay Berkenbilt
91367239fd Add --show-npages option to qpdf 2013-07-07 19:43:16 -04:00
Jay Berkenbilt
adccedc02f Allow numeric range to be omitted in qpdf --pages
Detect a missing page range and assume 1-z.
2013-07-07 19:43:16 -04:00
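When the range is missing, it defaults to "1-z", where "z" stands for the last page. The sketch below is a hypothetical `parsePageRange` that handles a single range of the form `N`, `N-M`, or `N-z`. It does not cover comma-separated lists or any other part of qpdf's range syntax.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical parser for one range such as "3-7", "5", or "2-z", where "z"
// means the last page. An omitted (empty) range is treated as "1-z".
std::pair<int, int> parsePageRange(std::string spec, int npages)
{
    if (spec.empty()) {
        spec = "1-z";
    }
    auto page = [npages](const std::string& s) {
        return s == "z" ? npages : std::stoi(s);
    };
    auto dash = spec.find('-');
    if (dash == std::string::npos) {
        int p = page(spec);
        return {p, p};
    }
    return {page(spec.substr(0, dash)), page(spec.substr(dash + 1))};
}
```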
Jay Berkenbilt
a85007cb0d Handle more broken files
Handle a space rather than a newline after xref, and a missing /ID in
the trailer of an encrypted file.  This enables qpdf to handle some
files that xpdf can handle, though Adobe Reader can't necessarily
handle them.
2013-06-15 12:40:01 -04:00
Jay Berkenbilt
16051788ed Handle /Outlines dictionary being a direct object
Even though this case is not valid according to the spec, it has been
seen, and caused an internal error.
2013-06-14 21:36:04 -04:00
Jay Berkenbilt
a3576a7359 Bug fix: handle generation > 0 when generating object streams
Rework QPDFWriter to always track old object IDs as QPDFObjGen
instead of int, thus not discarding the generation number.  Switch to
QPDF::getCompressibleObjGen() to properly handle the case of an old
object eligible for compression that has a generation of other than
zero.
2013-06-14 14:58:09 -04:00
Jay Berkenbilt
6c7bf114dc Bug fix: properly handle overridden compressed objects
When caching objects in an object stream, only cache objects that
still resolve to that stream.  See the ChangeLog entry added in this
commit for details.
2013-02-23 17:51:17 -05:00
Jay Berkenbilt
f81152311e Add QPDFObjectHandle::parseContentStream method
This method allows parsing of the PDF objects in a content stream or
array of content streams.
2013-01-20 15:35:39 -05:00
Jay Berkenbilt
f8306913ba Update "C" API with functions for new features 2012-12-31 10:32:32 -05:00
Jay Berkenbilt
9a23c3dcb6 Remove /Crypt from stream filters unconditionally
When writing a new stream, always remove /Crypt even if we are not
otherwise able to filter the stream.
2012-12-31 10:32:32 -05:00
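Removing /Crypt unconditionally means dropping that filter, together with its decode parameters, from the filter list before the stream is written, whether or not the other filters can be applied. In the sketch below, `FilterEntry` and `removeCryptFilters` are hypothetical. Decode parameters are kept as opaque strings for simplicity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical filter pipeline entry: the /Filter name and its /DecodeParms.
struct FilterEntry {
    std::string name;
    std::string decode_parms;
};

// Drop every /Crypt entry (with its parameters); other filters keep their order.
std::vector<FilterEntry> removeCryptFilters(const std::vector<FilterEntry>& in)
{
    std::vector<FilterEntry> out;
    for (const FilterEntry& f : in) {
        if (f.name != "/Crypt") {
            out.push_back(f);
        }
    }
    return out;
}
```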
Jay Berkenbilt
4237a29c94 Refactor Dictionary writing code
Original code was written before we could shallow copy objects, so all
the filtering was done by suppressing the output of certain keys and
replacing them with other keys.  Now we can simplify the code greatly
by modifying shallow copies of dictionaries in place.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
e57c25814e Support for encryption with /V=5 and /R=5 and /R=6
Read and write support is implemented for /V=5 with /R=5 as well as
/R=6.  /R=5 is the deprecated encryption method used by Acrobat IX.
/R=6 is the encryption method used by PDF 2.0 from ISO 32000-2.
2012-12-31 10:32:32 -05:00