Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
Remove a redundant method that was equal to another one with
additional arguments. This breaks binary compatibility, but there are
other ABI breaking changes in the upcoming release, so now is the time
to do it.
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
When requested, QPDFWriter will do more aggress prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
QPDFObjectHandle::parseInternal now issues warnings instead of
throwing exceptions for all error conditions that it finds (except
internal logic errors) and has stronger recovery for things like
invalid tokens and malformed dictionaries. This should improve qpdf's
ability to recover from a wide range of broken files that currently
cause it to fail.
This is CVE-2017-9208.
The QPDF library uses object ID 0 internally as a sentinel to
represent a direct object, but prior to this fix, was not blocking
handling of 0 0 obj or 0 0 R as a special case. Creating an object in
the file with 0 0 obj could cause various infinite loops. The PDF spec
doesn't allow for object 0. Having qpdf handle object 0 might be a
better fix, but changing all the places in the code that assumes objid
== 0 means direct would be risky.
This is CVE-2017-9210.
The description string for an error message included unparsing an
object, which is too complex of a thing to try to do while throwing an
exception. There was only one example of this in the entire codebase,
so it is not a pervasive problem. Fixing this eliminated one class of
infinite loop errors.
When checking two objects preceding R while parsing, ensure that the
objects are direct. This avoids stuff like 1 0 obj containing 1 0 R 0 R
from causing an infinite loop in object resolution.
Remove const qualifier from getTypeCode and get getTypeName methods of
QPDFObjectHandle, make them work properly for indirect objects, and
exercise them much better in the test suite.
Fix exit status for case of errors without warnings, continue after
errors when possible, add test case for parsing a file with content
stream errors on some but not all pages.
Change object type Keyword to Operator, and place the order of the
object types in object_type_e in the same order as they are mentioned
in the PDF specification.
Note that this change only breaks backward compatibility with code
that has not yet been released.
Move object parsing code from QPDF to QPDFObjectHandle and
parameterize the parts of it that are specific to a QPDF object.
Provide a version that can't handle indirect objects and that can be
called on an arbitrary string.
A side effect of this change is that the offset used when reporting
invalid stream length has changed, but since the new value seems like
a better value than the old one, the test suite has been updated
rather than making the code backward compatible. This only effects
the offset reported for invalid streams that lack /Length or have an
invalid /Length key.
Updated some test code and exmaples to use QPDFObjectHandle::parse.
Supporting changes include adding a BufferInputSource constructor that
takes a string.
QPDFObjectHandle::{new,is,assert}Reserved, QPDF::replaceReserved
provide a mechanism to add objects to a PDF file when there are
circular references. This is a prerequisite to copying objects from
one PDF to another.
Breaking API change: length parameter has disappeared from the
StreamDataProvider version of QPDFObjectHandle::replaceStreamData
since it is no longer necessary to compute it in advance. This
breaking change is justified by the fact that removing the length
parameter provides the caller an opportunity to simplify the calling
code.
Also updated pdf_from_scratch test driver to use the new factories,
and made some cosmetic improvements and documentation updates for the
emptyPDF() method.