2
1
mirror of https://github.com/qpdf/qpdf.git synced 2024-09-28 04:59:05 +00:00
Commit Graph

454 Commits

Author SHA1 Message Date
Jay Berkenbilt
b72a38bf5f Reorganize some test cases
Too many test cases were "miscellaneous".
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
d0e99f195a More robust handling of type errors
Give objects descriptions and context so it is possible to issue
warnings instead of fatal errors for attempts to access objects of the
wrong type.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
c2e16827b6 Replace "file position" with "offset" in error messages
Sometimes it's an offset in an object stream or a content stream, so
file position is confusing in some cases.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
52e024f701 Include omitted object description in error message 2018-02-18 21:06:27 -05:00
Jay Berkenbilt
cb3b705cf9 Include filename in object stream parse error 2018-02-18 21:06:27 -05:00
Jay Berkenbilt
e410b0fe0d Simplify TokenFilter interface
Expose Pl_QPDFTokenizer, and have it do more of the work of managing
the token filter's pipeline.
2018-02-18 21:05:47 -05:00
Jay Berkenbilt
5136238f2a Detect and report bad tokens in content normalization 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
9910104442 Implement TokenFilter and refactor Pl_QPDFTokenizer
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
b8723e97f4 Add coalesce contents capability 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
25988e8d10 Bug fix: content normalizer should not add trailing newline
Adding a trailing newline in content normalization damages files whose
contents are split across streams in the middle of tokens. Let
QPDFWriter add the newline with the indicator to ignore the newline,
which it already does. This changes the way some qdf files look.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
cc108a7f1b Use pipePageContents in tokenizer test 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
6afe83978f Switch from parseContentStream to parsePageContents 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fcd611b61e Refactor parseContentStream 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
ec538792fa Use inline image token type in tokenizer filter 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fefe25030e Inline image token type 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
d97474868d Lexer enhancements: EOF, comment, space
Significant enhancements to the lexer to improve EOF handling and to
support comments and spaces as tokens. Various other minor issues were
fixed as well.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt
bb9e91adbd Create isolated tokenizer tests
This tokenizes outer parts of the file, page content streams, and
object streams. It is for exercising the tokenizer in isolation and is
being introduced before reworking the lexical layer of qpdf.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt
ebd5ed63de Add option to save pass 1 of lineariziation
This is useful only for debugging the linearization code.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt
e3167c1a60 Fix linearization for files with nonstandard ID length 2018-02-04 18:16:23 -05:00
Jay Berkenbilt
cffb6fd64a Test stream that ends with name token and no newline 2018-01-28 18:34:43 -05:00
Jay Berkenbilt
13d9756a45 Minor fixes to tokenizer 2018-01-28 18:34:43 -05:00
Jay Berkenbilt
569d74d36b Allow raw encryption key to be specified
Add options to enable the raw encryption key to be directly shown or
specified. Thanks to Didier Stevens <didier.stevens@gmail.com> for the
idea and contribution of one implementation of this idea.
2018-01-14 10:21:05 -05:00
Jay Berkenbilt
68572df2bf Update copyright to 2018 2018-01-13 20:25:58 -05:00
Jay Berkenbilt
791e0db762 Allow trailing . in numeric token (fixes #165) 2018-01-13 20:05:40 -05:00
Jay Berkenbilt
6299c64cf3 Use correct link directory order (fixes #158)
Make sure to link from the source tree before linking from the system.
In many environments, this is necessary to allow a newly built qpdf to
link properly instead of trying to link or resolve libraries from an
older installed version.
2018-01-13 19:53:52 -05:00
Jay Berkenbilt
ec0087e3ce Support TIFF Predictor (fixes #171) 2018-01-13 19:49:42 -05:00
Jay Berkenbilt
be27d47bdc Use better error for getStreamData failure
If the stream isn't filterable but we call getStreamData, throw a
regular exception instead of a logic error so that normal error
handling and reporting mechanisms will be used.
2018-01-13 19:49:42 -05:00
Jay Berkenbilt
48864b8d6e Clarify documentation of advanced parsing options 2017-12-25 18:42:33 -05:00
Jay Berkenbilt
4edfe1f41d Add tests for new PNG filters 2017-12-25 18:20:52 -05:00
Jay Berkenbilt
07c8bb2843 Additionally license under Apache License version 2.0
The Apache License version 2.0 is now the primary license for qpdf.
However, users may, at their option, continue to use Artistic version
2.0.
2017-09-14 12:59:25 -04:00
Jay Berkenbilt
d31a7b76e7 Improve message for stream decoding error
Tweak the message so that we inform the user that we are mitigating
data loss.
2017-09-12 16:03:48 -04:00
Jay Berkenbilt
eaacf94005 Update C API with new QPDFWriter methods 2017-09-12 14:30:39 -04:00
Jay Berkenbilt
cbb2614975 Fix command-line parsing for --rotate 2017-09-07 22:58:37 -04:00
Jay Berkenbilt
ec7d74a386 Add test case for overflow in PNG filter (fixes #150) 2017-08-29 12:33:01 -04:00
Jay Berkenbilt
1868a10f8b Replace all atoi calls with QUtil::string_to_int
The latter catches underflow/overflow.
2017-08-29 12:28:32 -04:00
Jay Berkenbilt
abb3191c32 Add tests for previous memory issues
Now that the test suite runs clean with address sanitizer, add some
test cases that previously were used to expose memory errors.
2017-08-28 22:28:12 -04:00
Jay Berkenbilt
4f8c734d8e Missing free in some test code
There was a missing free causing a memory leak in some test code. The
memory leak was not in library code.
2017-08-26 22:04:49 -04:00
Jay Berkenbilt
ad527a64f9 Parse iteratively to avoid stack overflow (fixes #146) 2017-08-25 21:56:45 -04:00
Jay Berkenbilt
85f05cc57f Detect xref pointer infinite loop (fixes #149) 2017-08-25 19:58:31 -04:00
Jay Berkenbilt
e452d9dca6 Spell check 2017-08-22 14:22:20 -04:00
Jay Berkenbilt
fabff0f3ec Limit token length during xref recovery
While scanning the file looking for objects, limit the length of
tokens we allow. This prevents us from getting caught up in reading a
file character by character while digging through large streams.
2017-08-22 14:13:10 -04:00
Jay Berkenbilt
6884ad2ead Fix logic error in recovery
A stray semicolon caused a condition to be incorrectly applied during
stream length recovery.
2017-08-22 07:19:41 -04:00
Jay Berkenbilt
8288a4eb3a Update copyright to 2017 2017-08-21 21:18:47 -04:00
Jay Berkenbilt
f08ce00e62 Add tests for PCLm
Files written in PCLm mode have to be created in a very specific way.
qpdf doesn't know how to create PCLm files from scratch. All it knows
how to do is to write an already valid file in a suitable way.
Therefore there is no command-line support for PCLm.
2017-08-21 21:05:47 -04:00
Jay Berkenbilt
ddc6cf0cf6 Precheck streams by default
There is no need for a --precheck-streams option. We can do the
precheck without imposing any penalty, only re-encoding the stream if
it fails the first time.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
9744414c66 Enable finer grained control of stream decoding
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
e0d1cd1f4b Fix test case
There was an unintended recoverable error in a test file. It wasn't
hurting anything, but it was obscuring the actual intent of the test.
2017-08-19 14:50:55 -04:00
Jay Berkenbilt
cfa2eb97fb Add page rotation (fixes #132) 2017-08-12 22:57:38 -04:00
Jay Berkenbilt
d926d78059 Add --verbose flag 2017-08-12 12:30:18 -04:00
Jay Berkenbilt
2c6fe1805a Support groups of pages in --split-pages (fixes #30) 2017-08-12 12:08:23 -04:00
Jay Berkenbilt
df33c368b4 Change --single-pages to --split-pages
This is in preparation for implementing page groups.
2017-08-12 11:49:04 -04:00
Jay Berkenbilt
ad82706003 Note about veraPDF 2017-08-12 11:35:02 -04:00
Jay Berkenbilt
8249a26d69 Fix infinite loop in QPDFWriter (fixes #143) 2017-08-12 08:36:36 -04:00
Jay Berkenbilt
36b3fe5af7 Fix --newline-before-endstream option (fixes #133)
Add a newline unconditionally before endstream even if a newline was
already written as part of the stream data.
2017-08-11 20:57:05 -04:00
Jay Berkenbilt
46611f0710 Prevent a division by zero error (fixes #141)
Bad /W in an xref stream could cause a division by zero error. Now
this is handled as a special case.
2017-08-11 20:11:19 -04:00
Jay Berkenbilt
8fe0b06cd8 Pad encryption parameters that are too short (fixes #96) 2017-08-11 19:53:56 -04:00
Jay Berkenbilt
0c99cf874b Sanitize test suite
Remove problematic test files
2017-08-11 07:41:11 -04:00
Jay Berkenbilt
30f109e244 Read xref table without PCRE
Also accept more errors than before.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
ca5b1d267a Improve stream length recovery
Eliminate PCRE and find endobj not preceded by endstream. Be more lax
about placement of endstream and endobj.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
3082e4e606 Find xref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
90840be594 Find lindict without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
03aa9679ac Find starxref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
1765c6ec20 Find header without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
ef8ae5449d Allow QPDFTokenizer::readToken to return bad tokens
Sometimes we want to ignore bad tokens rather than having them throw
an exception. A coverage case is commented out here and added in a
later commit.
2017-08-10 19:01:41 -04:00
Jay Berkenbilt
c5dc6d8067 Remove unused PointerHolder interface
Also fix a bug resulting from incorrect use of PointerHolder because
of this unused parameter.
2017-08-10 19:01:38 -04:00
Jay Berkenbilt
ff6971fb1c Call PointerHolder constructor properly (fixes #135)
Passed arguments to the constructor in the wrong order.
2017-08-09 22:00:49 -04:00
Jay Berkenbilt
49825e5cb6 Add --split-pages option (fixes #30) 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
a60eb552d3 Split bug tests into separate chunk 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
1ec59c299d Refactor write_output 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
909daf9543 Move page spec processing earlier 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
24f28f0768 Split qpdf.cc's main into reasonably sized functions
main() had gotten absurdly long. Split it into reasonable chunks. This
refactoring is in preparation for handling splitting output into
single pages.
2017-08-05 08:24:05 -04:00
Jay Berkenbilt
c88eaae2f2 Fix off-by-one error in --pages argument parsing (fixes #129) 2017-08-02 21:08:43 -04:00
Jay Berkenbilt
2d5b854468 Allow reading command-line args from files (fixes #16) 2017-07-29 22:23:21 -04:00
Jay Berkenbilt
5993c3e83c Detect input file = output file (fixes #29) 2017-07-29 20:58:01 -04:00
Jay Berkenbilt
885b8781cc Allow --check to coexist with and precede other operations (fixes #42) 2017-07-29 19:56:21 -04:00
Jay Berkenbilt
b43a0ac237 When recover stream length, indicate the length (fixes #44) 2017-07-29 19:15:06 -04:00
Jay Berkenbilt
f37d399d82 Add newline-before-endstream option (fixes #103) 2017-07-29 12:21:38 -04:00
Jay Berkenbilt
6a7d53ad2b Handle zlib data errors better (fixes #106) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt
07d6f770b2 Better recovery of bad stream start (fixes #104) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt
b389268f16 Better handle split content streams (fixes #73)
When parsing content streams, allow content to be split arbitrarily
across stream boundaries.
2017-07-29 12:19:04 -04:00
Jay Berkenbilt
3a1ff5ded9 Add option to preserve unreferenced objects 2017-07-28 19:19:11 -04:00
Jay Berkenbilt
a94a729fee Explicitly check root dictionary type
Very badly corrupted files may not have a retrievable root dictionary.
Handle that as a special case so that a more helpful error message can
be provided.
2017-07-28 18:03:30 -04:00
Jay Berkenbilt
7f8892525f Add precheck streams capability
When requested, QPDFWriter will do more aggress prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
2017-07-27 23:42:27 -04:00
Jay Berkenbilt
428d96dfe1 Convert many more errors to warnings 2017-07-27 22:57:55 -04:00
Jay Berkenbilt
a4fd4b91c6 Convert stream filtering errors to warnings 2017-07-27 18:43:07 -04:00
Jay Berkenbilt
40f00122b8 Convert object parsing errors to warnings
QPDFObjectHandle::parseInternal now issues warnings instead of
throwing exceptions for all error conditions that it finds (except
internal logic errors) and has stronger recovery for things like
invalid tokens and malformed dictionaries. This should improve qpdf's
ability to recover from a wide range of broken files that currently
cause it to fail.
2017-07-27 18:20:31 -04:00
Jay Berkenbilt
ac3c81a8ed Include tests for other infinite loop bugs
fixes #117
fixes #118
fixes #119
fixes #120

Several other infinite loop bugs were fixed by previous changes.
Include their test files in the test suite.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
701b518d5c Detect recursion loops resolving objects (fixes #51)
During parsing of an object, sometimes parts of the object have to be
resolved. An example is stream lengths. If such an object directly or
indirectly points to the object being parsed, it can cause an infinite
loop. Guard against all cases of re-entrant resolution of objects.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
afe0242b26 Handle object ID 0 (fixes #99)
This is CVE-2017-9208.

The QPDF library uses object ID 0 internally as a sentinel to
represent a direct object, but prior to this fix, was not blocking
handling of 0 0 obj or 0 0 R as a special case. Creating an object in
the file with 0 0 obj could cause various infinite loops. The PDF spec
doesn't allow for object 0. Having qpdf handle object 0 might be a
better fix, but changing all the places in the code that assumes objid
== 0 means direct would be risky.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
315092dd98 Avoid xref reconstruction infinite loop (fixes #100)
This is CVE-2017-9209.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
603f222365 Fix infinite loop while reporting an error (fixes #101)
This is CVE-2017-9210.

The description string for an error message included unparsing an
object, which is too complex of a thing to try to do while throwing an
exception. There was only one example of this in the entire codebase,
so it is not a pervasive problem. Fixing this eliminated one class of
infinite loop errors.
2017-07-26 06:24:07 -04:00
Thorsten Schöning
e80b6e3341 Support paths with spaces 2016-01-24 11:52:09 -05:00
Thorsten Schöning
eff935ab60 Use absolute paths for large file tests
Working with absolute paths makes debugging easier, but some called
scripts always need / as dir separator or won't work.
2016-01-24 11:52:09 -05:00
Thorsten Schöning
adbaa54ad4 Fix non-portable use of /dev/null
/dev/null is not portable, so use File::Spec instead, which provides
portable "paths" and especially "nul" on Windows. I changed all places
with hard coded /dev/null to be sure, while I think it only is a
problem in direct system calls, because the other executed commands go
to sh.exe from MSYS which itself should port /dev/null to NUL. The
test still pass, so shouldn't have made any harm...
2016-01-24 11:52:09 -05:00
Thorsten Schöning
951dbc3b7f Fix expr syntax, support spaces in paths
expr needs ARG + ARG
quote paths to support support spaces
2016-01-24 11:52:09 -05:00
Thorsten Schöning
3c1555a622 Explicitly invoke shell scripts with sh
Shebang doesn't work well on Windows.
2016-01-24 11:52:09 -05:00
Jay Berkenbilt
b62cbe2508 Tolerate some mangled xref tables
If xref table entries lack the spec-required trailing whitespace or
contain a small amount of extra space, handle them anyway.
2015-10-31 18:56:43 -04:00
Jay Berkenbilt
b8bdef0ad1 Implement deterministic ID
For non-encrypted files, determinstic ID generation uses file contents
instead of timestamp and file name. At a small runtime cost, this
enables generation of the same /ID if the same inputs are converted in
the same way multiple times.
2015-10-31 18:56:42 -04:00
Jay Berkenbilt
f77acbdbba Copyright 2015 2015-05-24 17:26:49 -04:00
Jay Berkenbilt
b356b9dfa2 fix-qdf: handle object streams with > 255 objects
fix-qdf was previously hard-coding the number of bytes for the f2
field of the xref stream entry. This addresses issue #37. Thanks
aluebcke for reporting.
2015-05-24 16:52:42 -04:00
Jay Berkenbilt
a11549a566 Detect loops in /Pages structure
Pushing inherited objects to pages and getting all pages were both
prone to stack overflow infinite loops if there were loops in the
Pages dictionary. There is a general weakness in the code in that any
part of the code that traverses the Pages structure would be prone to
this and would have to implement its own loop detection. A more robust
fix may provide some general method for handling the Pages structure,
but it's probably not worth doing.

Note: addition of *Internal2 private functions was done rather than
changing signatures of existing methods to avoid breaking
compatibility.
2015-02-21 19:47:11 -05:00
Jay Berkenbilt
c729e07d55 Avoid resolving arguments to R
When checking two objects preceding R while parsing, ensure that the
objects are direct. This avoids stuff like 1 0 obj containing 1 0 R 0 R
from causing an infinite loop in object resolution.
2015-02-21 17:51:08 -05:00
Jay Berkenbilt
d8900c2255 Handle page tree node with no /Type
Original reported here:
https://bugs.launchpad.net/ubuntu/+source/qpdf/+bug/1397413

The PDF specification says that the /Type key for nodes in the pages
dictionary (both /Page and /Pages) is required, but some PDF files
omit them. Use the presence of other keys to determine the type of
pages tree node this is if the type key is not found.
2014-12-29 10:17:21 -05:00
Jay Berkenbilt
caab1b0e16 Handle pages with no /Contents from getPageContents()
The spec allows /Contents to be omitted for pages that are blank, but
QPDFObjectHandle::getPageContents() was throwing an exception in this
case.
2014-11-14 13:43:34 -05:00
Jay Berkenbilt
9f8aba1db7 Handle indirect stream filter/decode parameters
QPDFWriter was trying to make /Filter and /DecodeParms direct in all
cases, but there are some cases where /DecodeParms may refer to a
stream, which can't be direct. QPDFWriter doesn't actually need
/DecodeParms to be direct in that case because it won't be able to
filter the stream. Until we can handle this type of stream, just don't
make /Filter and /DecodeParms direct if we can't filter the stream
anyway.

Fixes #34
2014-06-07 16:31:03 -04:00
Jay Berkenbilt
225b018290 Update Copyright to 2014 2014-01-14 15:40:02 -05:00
Jay Berkenbilt
c9a9fe9c2f Avoid traversing same object twice when copying objects
This is a performance fix.  The output is unchanged.

Fixes #28.
2013-12-26 11:51:50 -05:00
Jay Berkenbilt
e9a319fb95 Allow arbitrary whitespace, not just newline, after xref
Fixes #27.
2013-12-14 15:17:23 -05:00
Jay Berkenbilt
dc9df97466 Include <algorithm> for std::min, std::max 2013-11-29 10:48:16 -05:00
Jay Berkenbilt
157c936b97 Use 8 bit per sample images in tests
In compare image tests, use the gs device tiff24nc instead of tiff12nc
since the 4 bit per sample images created by tiff12nc could sometimes
trigger a bug in tiffcmp.  Fixes #20.
2013-11-21 13:41:37 -05:00
Jay Berkenbilt
a237e92445 Warn when -accessibility=n will be ignored
Also accept -accessibility=n with 256 bit keys even though it will be
ignored.
2013-10-18 10:45:15 -04:00
Jay Berkenbilt
ac9c1f0d56 Security: replace operator[] with at
For std::string and std::vector, replace operator[] with at.  This was
done using an automated process.  See README.hardening for details.
2013-10-18 10:45:14 -04:00
Jay Berkenbilt
0bfe902489 Security: avoid pre-allocating vectors based on file data
In places where std::vector<T>(size_t) was used, either validate that
the size parameter is sane or refactor code to avoid the need to
pre-allocate the vector.
2013-10-09 20:57:14 -04:00
Jay Berkenbilt
3eb4b066ab Security: better bounds checks for linearization data
The faulty code was only used during explicit checks of linearization
data.  Those checks are not part of normal reading or writing of PDF
files.
2013-10-09 19:50:09 -04:00
Jay Berkenbilt
b84f57e56d Ignore broken DecodeParms for stream with no filters 2013-07-07 19:43:16 -04:00
Jay Berkenbilt
91367239fd Add --show-npages option to qpdf 2013-07-07 19:43:16 -04:00
Jay Berkenbilt
adccedc02f Allow numeric range to be omitted qpdf --pages
Detect a missing page range and assume 1-z.
2013-07-07 19:43:16 -04:00
Jay Berkenbilt
a85007cb0d Handle more broken files
Space rather than newline after xref, missing /ID in trailer for
encrypted file.  This enables qpdf to handle some files that xpdf can
handle.  Adobe reader can't necessarily handle them.
2013-06-15 12:40:01 -04:00
Jay Berkenbilt
16051788ed Handle /Outlines dictionary being a direct object
Even though this case is not valid according to the spec, it has been
seen, and caused an internal error.
2013-06-14 21:36:04 -04:00
Jay Berkenbilt
eae8370cd9 Add optional /Length key in crypt filter dictionary 2013-06-14 20:42:39 -04:00
Jay Berkenbilt
a3576a7359 Bug fix: handle generation > 0 when generating object streams
Rework QPDFWriter to always track old object IDs and QPDFObjGen
instead of int, thus not discarding the generation number.  Switch to
QPDF::getCompressibleObjGen() to properly handle the case of an old
object eligible for compression that has a generation of other than
zero.
2013-06-14 14:58:09 -04:00
Jay Berkenbilt
96eb965115 Use QPDFObjectHandle::getObjGen() where appropriate
In internal code and examples, replace calls to getObjectID() and
getGeneration() with calls to getObjGen() where possible.
2013-06-14 14:58:09 -04:00
Jay Berkenbilt
29f5830325 Fix getTypeCode and getTypeName work for indirect objects
Remove const qualifier from getTypeCode and get getTypeName methods of
QPDFObjectHandle, make them work properly for indirect objects, and
exercise them much better in the test suite.
2013-03-05 13:35:46 -05:00
Jay Berkenbilt
119f2a4b68 Add method to terminate content stream parsing 2013-03-05 13:35:46 -05:00
Jay Berkenbilt
7be97b3e80 Fix long long format string for WIN32 2013-03-05 13:35:46 -05:00
Jay Berkenbilt
53bfa86084 Fix inadvertent pointer to integer cast 2013-03-05 13:35:46 -05:00
Jay Berkenbilt
fd64959398 Favor strerror_s and fopen_s on MSVC
Make remaining calls to fopen and strerror use strerror_s and fopen_s
on MSVC.
2013-03-05 13:35:46 -05:00
Jay Berkenbilt
ac4deac187 Call QUtil::safe_fopen in place of fopen
fopen was previuosly called wrapped by QUtil::fopen_wrapper, but
QUtil::safe_fopen does this itself, which is less cumbersome.
2013-03-05 13:35:46 -05:00
Jay Berkenbilt
6b9297882e Mark secure CRT warnings with comment
Put a specific comment marker next to every piece of code that MSVC
gives warning 4996 for.  This warning is generated for calls to
functions that Microsoft considers insecure or deprecated.  This
change is in preparation for fixing all these cases even though none
of them are actually incorrect or insecure as used in qpdf.  The
comment marker makes them easier to find so they can be fixed in
subsequent commits.
2013-03-05 13:33:32 -05:00
Jay Berkenbilt
30027481f7 Remove all old-style casts from C++ code 2013-03-04 16:45:16 -05:00
Jay Berkenbilt
32b62035ce Replace many calls to sprintf with QUtil::hex_encode
Add QUtil::hex_encode to encode binary data has a hexadecimal string,
and use it in place of sprintf where possible.
2013-03-04 16:45:15 -05:00
Jay Berkenbilt
9f1594656c Work around gcc 4.8.0 issue on ppc64
Change iteration to use size_t instead of int.  The code should be
equivalent in all reasonable cases, but the original way this was
coded was causing a test failure with gcc 4.8.0 on ppc64.  See
https://bugzilla.redhat.com/show_bug.cgi?id=915321 for additional
information.
2013-03-04 16:43:29 -05:00
Jay Berkenbilt
6c7bf114dc Bug fix: properly handle overridden compressed objects
When caching objects in an object stream, only cache objects that
still resolve to that stream.  See Changelog mod from this commit for
details.
2013-02-23 17:51:17 -05:00
Jay Berkenbilt
a5d8783f67 Improve qpdf --check
Fix exit status for case of errors without warnings, continue after
errors when possible, add test case for parsing a file with content
stream errors on some but not all pages.
2013-01-25 11:08:50 -05:00
Jay Berkenbilt
a7e8b8c789 Have qpdf --check parse content streams
Also move writing to null and parsing of content streams out of the
wrong if block.
2013-01-24 11:47:36 -05:00
Jay Berkenbilt
bfda717749 Cosmetic changes to be closer to Adobe terminology
Change object type Keyword to Operator, and place the order of the
object types in object_type_e in the same order as they are mentioned
in the PDF specification.

Note that this change only breaks backward compatibility with code
that has not yet been released.
2013-01-23 09:38:05 -05:00
Jay Berkenbilt
913eb5ac35 Add getTypeCode() and getTypeName()
Add virtual methods to QPDFObject, wrappers to QPDFObjectHandle, and
implementations to all the QPDF_Object types.
2013-01-22 10:01:45 -05:00
Jay Berkenbilt
f81152311e Add QPDFObjectHandle::parseContentStream method
This method allows parsing of the PDF objects in a content stream or
array of content streams.
2013-01-20 15:35:39 -05:00
Jay Berkenbilt
9261f3b922 Detect binary attachments better
This fix eliminates a false test failure on some platforms and makes
the binary test work properly whether characters with the high bit
set, when treated as integers, are negative or not.
2013-01-03 16:44:04 -05:00
Jay Berkenbilt
f8306913ba Update "C" API with functions for new features 2012-12-31 10:32:32 -05:00
Jay Berkenbilt
8843e499b8 Update copyright year to 2013
Also add copyright notice to a few public headers that were missing
one.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
9a23c3dcb6 Remove /Crypt from stream filters unconditionally
When writing a new stream, always remove /Crypt even if we are not
otherwise able to filter the stream.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
4237a29c94 Refactor Dictionary writing code
Original code was written before we could shallow copy objects, so all
the filtering was done by suppressing the output of certain keys and
replacing them with other keys.  Now we can simplify the code greatly
by modifying shallow copies of dictionaries in place.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
e57c25814e Support for encryption with /V=5 and /R=5 and /R=6
Read and write support is implemented for /V=5 with /R=5 as well as
/R=6.  /R=5 is the deprecated encryption method used by Acrobat IX.
/R=6 is the encryption method used by PDF 2.0 from ISO 32000-2.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
93ac1695a4 Support files with only attachments encrypted
Test cases added in a future commit since they depend on /R=6 support.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
eff2c9a679 Cosmetic change to test_driver source
Change variable name for better clarity.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
4fe6f61def Add missing test case from long ago
I noticed a test output file that was not accessed in the test suite
and added a test case for it.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
16a23368e7 Fix infinite loop trimming passwords with ( in them 2012-12-31 10:32:31 -05:00
Jay Berkenbilt
774584163f Add ExtensionLevel support to version handling
All version operations are now fully aware of extension levels.
2012-12-31 05:36:50 -05:00
Jay Berkenbilt
04c203ae06 Eliminate flattenScalarReferences 2012-12-31 05:36:48 -05:00
Jay Berkenbilt
b4b8b28ed2 Reference object with zero offset
This file used to exercise a zero offset test case when qpdf would
visit every object in the file.  After the next commit, qpdf no longer
touches unreferenced objects, so a reference had to be added to
continue to have this file exercise the zero offset case.
2012-12-27 11:36:48 -05:00
Jay Berkenbilt
35873031a7 Uncompress stream data for some linearization tests
For linearization tests where we are actually comparing the exact
output of the test with a known file, uncompress stream data so we can
see what's there.  This makes looking at future changes a little
easier.
2012-12-27 11:26:06 -05:00
Jay Berkenbilt
7f84239cad Find PDF header anywhere in the first 1024 bytes 2012-12-25 14:43:37 -05:00
Jay Berkenbilt
f256670eba Ignore objects with offset 0 2012-11-20 13:57:37 -05:00
Jay Berkenbilt
041397fdab Allow reading from InputSource and writing to Pipeline
Allowing users to subclass InputSource and Pipeline to read and write
from/to arbitrary sources provides the maximum flexibility for users
who want to read and write from other than files or memory.
2012-09-23 17:42:26 -04:00
Jay Berkenbilt
c1627d0438 Add QPDFWriter::setExtraHeaderText 2012-09-06 15:31:12 -04:00
Jay Berkenbilt
8d2b29ef98 Fix segmentation fault with use of QPDFWriter::setOutputMemory 2012-09-06 14:39:06 -04:00
Jay Berkenbilt
3c4110184c Add specially crafted test cases for EOF error
This replaces a PDF from the wild that I didn't want to include in the
test suite but used to verify the original fix.
2012-08-11 12:36:57 -04:00
Jay Berkenbilt
29e9c34fe3 Bug fix: let EOF resolve literal token
Previously only whitespace and comments did it.  This fix is needed
for object streams whose last object is a literal (name, integer,
real, string) not terminated by space or newline.
2012-08-11 09:29:04 -04:00
Jay Berkenbilt
32051283b9 Fix spelling errors 2012-07-29 14:44:12 -04:00
Jay Berkenbilt
bde98044f4 Improve password handling
Use --encryption-file-password, if given, in addition to --password as
a source for passwords for files specified in --pages.
2012-07-29 13:22:37 -04:00
Jay Berkenbilt
f83bddf882 Update copyright to 2012 2012-07-28 22:03:36 -04:00
Jay Berkenbilt
2280c4f6d1 Update documentation and version numbers
3.0.rc1
2012-07-28 22:03:36 -04:00
Jay Berkenbilt
5878d17f0d Add QPDF_ to some variables used by the test suite
LARGE_FILE_TEST_PATH -> QPDF_LARGE_FILE_TEST_PATH
SKIP_TEST_COMPARE_IMAGES -> QPDF_SKIP_TEST_COMPARE_IMAGES
2012-07-28 19:07:37 -04:00
Jay Berkenbilt
d2e8bae6a4 Mention page selection in basic options 2012-07-28 15:02:20 -04:00
Jay Berkenbilt
f689324214 Restore coverage case
Previous commit lost coverage case for buffer-based replaceStreamData.
2012-07-25 22:32:14 -04:00
Jay Berkenbilt
316328704b Windows compilation fixes 2012-07-21 20:51:56 -04:00
Jay Berkenbilt
5a02471bb1 Command-line page merging and splitting
Implement --pages ... -- option for qpdf.  Update TODO with remaining
things to document.
2012-07-21 20:33:33 -04:00
Jay Berkenbilt
6bbea4baa0 Implement QPDFObjectHandle::parse
Move object parsing code from QPDF to QPDFObjectHandle and
parameterize the parts of it that are specific to a QPDF object.
Provide a version that can't handle indirect objects and that can be
called on an arbitrary string.

A side effect of this change is that the offset used when reporting
invalid stream length has changed, but since the new value seems like
a better value than the old one, the test suite has been updated
rather than making the code backward compatible.  This only effects
the offset reported for invalid streams that lack /Length or have an
invalid /Length key.

Updated some test code and exmaples to use QPDFObjectHandle::parse.

Supporting changes include adding a BufferInputSource constructor that
takes a string.
2012-07-21 09:06:10 -04:00
Jay Berkenbilt
a101533e0a Add command line option to copy encryption from other file
Add --copy-encryption and --encryption-file-password options to qpdf.
Also strengthen test suite for copying encryption.  The strengthened
test suite would have caught the failure to preserve AES and the
failure to update the file version, which was invalidating the
encrypted data.
2012-07-15 21:15:24 -04:00
Jay Berkenbilt
db95960ac1 Bug fix: preserve AES when copying encryption parameters 2012-07-15 19:07:59 -04:00
Jay Berkenbilt
b501251291 qpdf: push inherited attributes to page when showing images
from qpdf command-line tool
2012-07-15 16:22:28 -04:00
Jay Berkenbilt
0575d77d77 Add public QPDFWriter::copyEncryptionParameters
Method to copy encryption parameters from another file.  Adapted from
existing code to copy encryption parameters from the original file.
2012-07-14 09:14:41 -04:00
Jay Berkenbilt
ee3682f106 test_driver: accept optional second file name
This way we don't have to hard-code the name of a second file in the
test driver for tests that require one.
2012-07-14 08:48:30 -04:00
Jay Berkenbilt
1c944e4c89 Have QPDFWriter detect foreign objects while writing
Throw an exception that directs the user to QPDF::copyForeignObject.
2012-07-14 08:07:23 -04:00
Jay Berkenbilt
e7b8f297ba Support copying objects from another QPDF object
This includes QPDF::copyForeignObject and supporting foreign objects
as arguments to addPage*.
2012-07-11 15:54:33 -04:00
Jay Berkenbilt
8a217eb3a2 Add concept of reserved objects
QPDFObjectHandle::{new,is,assert}Reserved, QPDF::replaceReserved
provide a mechanism to add objects to a PDF file when there are
circular references.  This is a prerequisite to copying objects from
one PDF to another.
2012-07-10 23:34:32 -04:00
Jay Berkenbilt
43d4f79352 Fix typo in variable name 2012-07-10 23:17:26 -04:00
Jay Berkenbilt
b9aded1a00 Favor string-based newStream method 2012-07-10 23:17:26 -04:00
Jay Berkenbilt
e2dedde4bd Don't require stream data provider to know length in advance
Breaking API change: length parameter has disappeared from the
StreamDataProvider version of QPDFObjectHandle::replaceStreamData
since it is no longer necessary to compute it in advance.  This
breaking change is justified by the fact that removing the length
parameter provides the caller an opportunity to simplify the calling
code.
2012-07-07 17:33:45 -04:00
Jay Berkenbilt
8705e2e8fc Add QPDFWriter method to output to FILE* 2012-07-05 21:24:04 -04:00
Jay Berkenbilt
3b5d72b946 Remove stray comment 2012-07-05 13:28:21 -04:00
Jay Berkenbilt
c227249ef1 Added test code for Tobias's changes 2012-07-04 23:19:32 -04:00
Tobias Hoffmann
abb53ac369 Limited inheritance to the attributes explicitly listed in the PDF spec
Previous versions of qpdf incorrectly passed arbitrary objects from
/Pages objects down to individual pages in direct contradition with
the PDF specification.  These are now left in /Pages.  When
intermediate /Pages nodes are being discarded as when the /Pages tree
is being flattened, a warning is issued when unknown keys are
encountered.
2012-07-04 23:04:55 -04:00
Jay Berkenbilt
5f59c32f87 Add a few minor enhancements to recent work
Test coverage case for new newStream method
Expose decimal_places argument for double-based newReal

All enhancements suggested by Tobias.
2012-06-27 10:43:27 -04:00
Tobias Hoffmann
43c404b45a Add QPDFObjectHandle::newStream(QPDF *, std::string const&)
This makes the code simpler than having to create a buffer of a fixed
size and copy the string to it.
2012-06-27 10:19:57 -04:00
Jay Berkenbilt
736bafbb9c Rename seek functions in QUtil 2012-06-26 23:10:10 -04:00
Jay Berkenbilt
df3c762600 Fix Windows compilation issue 2012-06-25 21:01:55 -04:00
Jay Berkenbilt
1a3e88ca09 Fix large file support for 32-bit Linux 2012-06-25 10:51:44 -04:00
Jay Berkenbilt
c16db4106c Increase padding in linearized files
With QPDF allowing integers to contain 64-bit quantities, this change
is necessary to be able to linearize files whose sizes might be larger
than 10 digits.
2012-06-24 15:56:59 -04:00
Jay Berkenbilt
8318d81ada Fix and test support for files >= 4 GB 2012-06-24 15:56:50 -04:00
Jay Berkenbilt
2a057ac0d4 Add test case for removing a page we don't have 2012-06-23 18:32:14 -04:00
Jay Berkenbilt
4f305488d8 Improve the FILE* version of QPDF::processFile 2012-06-23 18:23:06 -04:00
Jay Berkenbilt
6c0af0844c Switch some code to use empty newArray/newDictionary 2012-06-22 10:09:42 -04:00
Jay Berkenbilt
b6bdc0f595 Add factory methods for creating empty arrays and dictionaries.
Also updated pdf_from_scratch test driver to use the new factories,
and made some cosmetic improvements and documentation updates for the
emptyPDF() method.
2012-06-22 09:46:33 -04:00
Jay Berkenbilt
a0768e4190 Add QPDF::emptyPDF() and pdf_from_scratch test code 2012-06-21 23:09:05 -04:00
Jay Berkenbilt
d1ebe30ff6 Add QPDFObjectHandle::shallowCopy() 2012-06-21 16:15:09 -04:00
Jay Berkenbilt
1b364ad7b8 Add additional page API test cases 2012-06-21 15:27:32 -04:00
Jay Berkenbilt
3844aedd93 Add testing for page APIs 2012-06-21 15:01:02 -04:00
Jay Berkenbilt
eb802cfa8c Implement page manipulation APIs 2012-06-21 15:01:02 -04:00