For some reason, qpdf from the beginning was replacing indirect
references to null with literal null in arrays even after removing the
old behavior of flattening scalar references. This seems like a bad
idea.
Use PointerHolder in several places where manually memory allocation
and deallocation were being used. This helps to protect against memory
leaks when exceptions are thrown in surprising places.
In a small number of cases, it makes sense to replace an overloaded
function with a function that takes a default argument. We can do this
now because we've already broken binary compatibility since the last
release.
* Several assertions in linearization were not always true; change
them to run time errors
* Handle a few cases of uninitialized objects
* Handle pages with no contents when doing form operations
* Handle invalid page tree nodes when traversing pages
This makes all integer type conversions that have potential data loss
explicit with calls that do range checks and raise an exception. After
this commit, qpdf builds with no warnings when -Wsign-conversion
-Wconversion is used with gcc or clang or when -W3 -Wd4800 is used
with MSVC. This significantly reduces the likelihood of potential
crashes from bogus integer values.
There are some parts of the code that take int when they should take
size_t or an offset. Such places would make qpdf not support files
with more than 2^31 of something that usually wouldn't be so large. In
the event that such a file shows up and is valid, at least qpdf would
raise an error in the right spot so the issue could be legitimately
addressed rather than failing in some weird way because of a silent
overflow condition.
On read, ignore /DecodeParms when empty list; on write, delete it.
Some files have been found that include an empty list for
/DecodeParms, but this is not technically compliant with the spec, and
the only sensible interpretation is to treat it as if there are no
decode parameters.
Setting encryption permissions for R >= 3 set permission bits in
groups corresponding to menu options in Acrobat 5. The new API allows
the bits to be set individually.
On certain operations, such as iterating through all objects and
adding new indirect objects, walk through the entire object structure
and explicitly resolve any indirect references to non-existent
objects. That prevents new objects from springing into existence and
causing the previously dangling references to point to them.
There were a few places in the code that were checking that a pointer
wasn't null before deleting it, even though C++ has always allowed
delete 0. Most of the code did not perform these checks.
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
* Add support for PCLm using setPCLm() and writePCLm() methods in
QPDFWriter.hh and QPDFWriter.cc
* Add a function writePCLmHeader() for PCLm header in QPDFWriter
There is no need for a --precheck-streams option. We can do the
precheck without imposing any penalty, only re-encoding the stream if
it fails the first time.
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
When requested, QPDFWriter will do more aggress prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
For non-encrypted files, determinstic ID generation uses file contents
instead of timestamp and file name. At a small runtime cost, this
enables generation of the same /ID if the same inputs are converted in
the same way multiple times.
QPDFWriter was trying to make /Filter and /DecodeParms direct in all
cases, but there are some cases where /DecodeParms may refer to a
stream, which can't be direct. QPDFWriter doesn't actually need
/DecodeParms to be direct in that case because it won't be able to
filter the stream. Until we can handle this type of stream, just don't
make /Filter and /DecodeParms direct if we can't filter the stream
anyway.
Fixes #34
Fix problem: if the last object in the first part of a linearized file
had an offset that was below 65536 by less than the size of the hint
stream, the xref stream was invalid and the resulting file is not
usable.
Ideally, the library should never call assert outside of test code,
but it does in several places. For some cases where the assertion
might conceivably fail because of a problem with the input data,
replace assertions with exceptions so that they can be trapped by the
calling application. This commit surely misses some cases and
replaced some cases unnecessarily, but it should still be an
improvement.
4.2.0 was binary incompatible in spite of there being no deletions or
changes to any public methods. As such, we have to bump the ABI and
are fixing some API breakage while we're at it.
Previous 4.3.0 target is now 5.1.0.
Rework QPDFWriter to always track old object IDs and QPDFObjGen
instead of int, thus not discarding the generation number. Switch to
QPDF::getCompressibleObjGen() to properly handle the case of an old
object eligible for compression that has a generation of other than
zero.
Put a specific comment marker next to every piece of code that MSVC
gives warning 4996 for. This warning is generated for calls to
functions that Microsoft considers insecure or deprecated. This
change is in preparation for fixing all these cases even though none
of them are actually incorrect or insecure as used in qpdf. The
comment marker makes them easier to find so they can be fixed in
subsequent commits.
Original code was written before we could shallow copy objects, so all
the filtering was done by suppressing the output of certain keys and
replacing them with other keys. Now we can simplify the code greatly
by modifying shallow copies of dictionaries in place.
Read and write support is implemented for /V=5 with /R=5 as well as
/R=6. /R=5 is the deprecated encryption method used by Acrobat IX.
/R=6 is the encryption method used by PDF 2.0 from ISO 32000-2.
Allowing users to subclass InputSource and Pipeline to read and write
from/to arbitrary sources provides the maximum flexibility for users
who want to read and write from other than files or memory.
This fixes were to code added yesterday; the problems would not have
impacted any previously released code. These are all changes related
to the possibility that copyEncryptionParameters may be called on
behalf a different QPDF than the one being written.
With QPDF allowing integers to contain 64-bit quantities, this change
is necessary to be able to linearize files whose sizes might be larger
than 10 digits.
Significantly improve the code's use of off_t for file offsets, size_t
for memory sizes, and integer types in cases where there has to be
compatibility with external interfaces. Rework sections of the code
that would have prevented qpdf from working on files larger than 2 (or
maybe 4) GB in size.