There is no need for a --precheck-streams option. We can do the
precheck without imposing any penalty, only re-encoding the stream if
it fails the first time.
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
When requested, QPDFWriter will do more aggress prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
QPDFObjectHandle::parseInternal now issues warnings instead of
throwing exceptions for all error conditions that it finds (except
internal logic errors) and has stronger recovery for things like
invalid tokens and malformed dictionaries. This should improve qpdf's
ability to recover from a wide range of broken files that currently
cause it to fail.
fixes #117
fixes #118
fixes #119
fixes #120
Several other infinite loop bugs were fixed by previous changes.
Include their test files in the test suite.
During parsing of an object, sometimes parts of the object have to be
resolved. An example is stream lengths. If such an object directly or
indirectly points to the object being parsed, it can cause an infinite
loop. Guard against all cases of re-entrant resolution of objects.
This is CVE-2017-9208.
The QPDF library uses object ID 0 internally as a sentinel to
represent a direct object, but prior to this fix, was not blocking
handling of 0 0 obj or 0 0 R as a special case. Creating an object in
the file with 0 0 obj could cause various infinite loops. The PDF spec
doesn't allow for object 0. Having qpdf handle object 0 might be a
better fix, but changing all the places in the code that assumes objid
== 0 means direct would be risky.
This is CVE-2017-9210.
The description string for an error message included unparsing an
object, which is too complex of a thing to try to do while throwing an
exception. There was only one example of this in the entire codebase,
so it is not a pervasive problem. Fixing this eliminated one class of
infinite loop errors.
/dev/null is not portable, so use File::Spec instead, which provides
portable "paths" and especially "nul" on Windows. I changed all places
with hard coded /dev/null to be sure, while I think it only is a
problem in direct system calls, because the other executed commands go
to sh.exe from MSYS which itself should port /dev/null to NUL. The
test still pass, so shouldn't have made any harm...
For non-encrypted files, determinstic ID generation uses file contents
instead of timestamp and file name. At a small runtime cost, this
enables generation of the same /ID if the same inputs are converted in
the same way multiple times.
fix-qdf was previously hard-coding the number of bytes for the f2
field of the xref stream entry. This addresses issue #37. Thanks
aluebcke for reporting.
Pushing inherited objects to pages and getting all pages were both
prone to stack overflow infinite loops if there were loops in the
Pages dictionary. There is a general weakness in the code in that any
part of the code that traverses the Pages structure would be prone to
this and would have to implement its own loop detection. A more robust
fix may provide some general method for handling the Pages structure,
but it's probably not worth doing.
Note: addition of *Internal2 private functions was done rather than
changing signatures of existing methods to avoid breaking
compatibility.
When checking two objects preceding R while parsing, ensure that the
objects are direct. This avoids stuff like 1 0 obj containing 1 0 R 0 R
from causing an infinite loop in object resolution.
Original reported here:
https://bugs.launchpad.net/ubuntu/+source/qpdf/+bug/1397413
The PDF specification says that the /Type key for nodes in the pages
dictionary (both /Page and /Pages) is required, but some PDF files
omit them. Use the presence of other keys to determine the type of
pages tree node this is if the type key is not found.
QPDFWriter was trying to make /Filter and /DecodeParms direct in all
cases, but there are some cases where /DecodeParms may refer to a
stream, which can't be direct. QPDFWriter doesn't actually need
/DecodeParms to be direct in that case because it won't be able to
filter the stream. Until we can handle this type of stream, just don't
make /Filter and /DecodeParms direct if we can't filter the stream
anyway.
Fixes #34
In compare image tests, use the gs device tiff24nc instead of tiff12nc
since the 4 bit per sample images created by tiff12nc could sometimes
trigger a bug in tiffcmp. Fixes #20.
In places where std::vector<T>(size_t) was used, either validate that
the size parameter is sane or refactor code to avoid the need to
pre-allocate the vector.
Space rather than newline after xref, missing /ID in trailer for
encrypted file. This enables qpdf to handle some files that xpdf can
handle. Adobe reader can't necessarily handle them.
Rework QPDFWriter to always track old object IDs and QPDFObjGen
instead of int, thus not discarding the generation number. Switch to
QPDF::getCompressibleObjGen() to properly handle the case of an old
object eligible for compression that has a generation of other than
zero.
Fix exit status for case of errors without warnings, continue after
errors when possible, add test case for parsing a file with content
stream errors on some but not all pages.
Original code was written before we could shallow copy objects, so all
the filtering was done by suppressing the output of certain keys and
replacing them with other keys. Now we can simplify the code greatly
by modifying shallow copies of dictionaries in place.
Read and write support is implemented for /V=5 with /R=5 as well as
/R=6. /R=5 is the deprecated encryption method used by Acrobat IX.
/R=6 is the encryption method used by PDF 2.0 from ISO 32000-2.
For linearization tests where we are actually comparing the exact
output of the test with a known file, uncompress stream data so we can
see what's there. This makes looking at future changes a little
easier.
Allowing users to subclass InputSource and Pipeline to read and write
from/to arbitrary sources provides the maximum flexibility for users
who want to read and write from other than files or memory.
Move object parsing code from QPDF to QPDFObjectHandle and
parameterize the parts of it that are specific to a QPDF object.
Provide a version that can't handle indirect objects and that can be
called on an arbitrary string.
A side effect of this change is that the offset used when reporting
invalid stream length has changed, but since the new value seems like
a better value than the old one, the test suite has been updated
rather than making the code backward compatible. This only effects
the offset reported for invalid streams that lack /Length or have an
invalid /Length key.
Updated some test code and exmaples to use QPDFObjectHandle::parse.
Supporting changes include adding a BufferInputSource constructor that
takes a string.
Add --copy-encryption and --encryption-file-password options to qpdf.
Also strengthen test suite for copying encryption. The strengthened
test suite would have caught the failure to preserve AES and the
failure to update the file version, which was invalidating the
encrypted data.
QPDFObjectHandle::{new,is,assert}Reserved, QPDF::replaceReserved
provide a mechanism to add objects to a PDF file when there are
circular references. This is a prerequisite to copying objects from
one PDF to another.
This works around cases where gs falsely complains about things in the
PDF files or gives warnings that don't harm the output. See comments
in qpdf.test for details.
- Add a way to set the minimum PDF version
- Add a way to force the PDF version
- Have isEncrypted return true if an /Encrypt dictionary exists even
when we can't read the file
- Allow qpdf_init_write to be called multiple times
- Update some comments in headers
git-svn-id: svn+q:///qpdf/trunk@748 71b93d88-0707-0410-a8cf-f5a4172ac649