If the stream isn't filterable but we call getStreamData, throw a
regular exception instead of a logic error so that normal error
handling and reporting mechanisms will be used.
Avoid calling jpeg_mem_src and jpeg_mem_dest. The custom destination
manager writes to the pipeline in smaller chunks to avoid having the
whole image in memory at once. The source manager works directly with
the Buffer object. Using customer managers avoids use of memory source
and destination managers, which are not present in older versions of
libjpeg still in use by some Linux distributions.
While scanning the file looking for objects, limit the length of
tokens we allow. This prevents us from getting caught up in reading a
file character by character while digging through large streams.
* Add support for PCLm using setPCLm() and writePCLm() methods in
QPDFWriter.hh and QPDFWriter.cc
* Add a function writePCLmHeader() for PCLm header in QPDFWriter
There is no need for a --precheck-streams option. We can do the
precheck without imposing any penalty, only re-encoding the stream if
it fails the first time.
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
Very badly corrupted files may not have a retrievable root dictionary.
Handle that as a special case so that a more helpful error message can
be provided.
When requested, QPDFWriter will do more aggress prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
QPDFObjectHandle::parseInternal now issues warnings instead of
throwing exceptions for all error conditions that it finds (except
internal logic errors) and has stronger recovery for things like
invalid tokens and malformed dictionaries. This should improve qpdf's
ability to recover from a wide range of broken files that currently
cause it to fail.
During parsing of an object, sometimes parts of the object have to be
resolved. An example is stream lengths. If such an object directly or
indirectly points to the object being parsed, it can cause an infinite
loop. Guard against all cases of re-entrant resolution of objects.
This is CVE-2017-9208.
The QPDF library uses object ID 0 internally as a sentinel to
represent a direct object, but prior to this fix, was not blocking
handling of 0 0 obj or 0 0 R as a special case. Creating an object in
the file with 0 0 obj could cause various infinite loops. The PDF spec
doesn't allow for object 0. Having qpdf handle object 0 might be a
better fix, but changing all the places in the code that assumes objid
== 0 means direct would be risky.
This is CVE-2017-9210.
The description string for an error message included unparsing an
object, which is too complex of a thing to try to do while throwing an
exception. There was only one example of this in the entire codebase,
so it is not a pervasive problem. Fixing this eliminated one class of
infinite loop errors.
The 64 Bit file functions are supported by C++-Builder as well and
need to be used, else fseek will error out on larger files than 4 GB
like used in the large file test.
For non-encrypted files, determinstic ID generation uses file contents
instead of timestamp and file name. At a small runtime cost, this
enables generation of the same /ID if the same inputs are converted in
the same way multiple times.
As reported in issue #40, a call to CryptAcquireContext in
SecureRandomDataProvider fails in a fresh windows install prior to any
user keys being created in AppData\Roaming\Microsoft\Crypto\RSA.
Thanks michalrames.
Pushing inherited objects to pages and getting all pages were both
prone to stack overflow infinite loops if there were loops in the
Pages dictionary. There is a general weakness in the code in that any
part of the code that traverses the Pages structure would be prone to
this and would have to implement its own loop detection. A more robust
fix may provide some general method for handling the Pages structure,
but it's probably not worth doing.
Note: addition of *Internal2 private functions was done rather than
changing signatures of existing methods to avoid breaking
compatibility.
Converting a password to an encryption key is supposed to copy up to a
certain number of bytes from a digest. Make sure never to copy more
than the size of the digest.
When checking two objects preceding R while parsing, ensure that the
objects are direct. This avoids stuff like 1 0 obj containing 1 0 R 0 R
from causing an infinite loop in object resolution.