mirror of https://github.com/qpdf/qpdf.git synced 2024-06-01 01:40:51 +00:00
189 Commits

Author SHA1 Message Date
Jay Berkenbilt
fabff0f3ec Limit token length during xref recovery
While scanning the file looking for objects, limit the length of
tokens we allow. This prevents us from getting caught up in reading a
file character by character while digging through large streams.
2017-08-22 14:13:10 -04:00
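The token-length limit described in fabff0f3ec can be illustrated with a minimal sketch. This is not qpdf's tokenizer: `read_bounded_token` and its `max_len` parameter are hypothetical names, and tokens are assumed to be delimited by whitespace only. The point is that the scanner gives up on an over-long token after `max_len` characters instead of walking an entire stream byte by byte.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not qpdf's tokenizer. Reads one whitespace-delimited
// token starting at pos. Returns false at end of input, or when the token
// would be longer than max_len, in which case scanning stops early.
bool read_bounded_token(std::string const& data, std::size_t& pos,
                        std::size_t max_len, std::string& token)
{
    // Skip leading whitespace.
    while (pos < data.size() &&
           std::isspace(static_cast<unsigned char>(data[pos]))) {
        ++pos;
    }
    std::size_t const start = pos;
    while (pos < data.size() &&
           !std::isspace(static_cast<unsigned char>(data[pos]))) {
        if (pos - start >= max_len) {
            return false; // too long to be a keyword or an object number
        }
        ++pos;
    }
    if (pos == start) {
        return false; // nothing left to read
    }
    token = data.substr(start, pos - start);
    return true;
}
```

A recovery scan would call this repeatedly and, on a `false` result before end of input, skip ahead rather than keep extending the token.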
Jay Berkenbilt
ddc6cf0cf6 Precheck streams by default
There is no need for a --precheck-streams option. We can do the
precheck without imposing any penalty, only re-encoding the stream if
it fails the first time.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
9744414c66 Enable finer grained control of stream decoding
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
cfa2eb97fb Add page rotation (fixes #132) 2017-08-12 22:57:38 -04:00
Jay Berkenbilt
df33c368b4 Change --single-pages to --split-pages
This is in preparation for implementing page groups.
2017-08-12 11:49:04 -04:00
Jay Berkenbilt
8249a26d69 Fix infinite loop in QPDFWriter (fixes #143) 2017-08-12 08:36:36 -04:00
Jay Berkenbilt
8fe0b06cd8 Pad encryption parameters that are too short (fixes #96) 2017-08-11 19:53:56 -04:00
Jay Berkenbilt
30f109e244 Read xref table without PCRE
Also accept more errors than before.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
90840be594 Find lindict without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
03aa9679ac Find startxref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
49825e5cb6 Add --split-pages option (fixes #30) 2017-08-05 10:22:33 -04:00
Jay Berkenbilt
2d5b854468 Allow reading command-line args from files (fixes #16) 2017-07-29 22:23:21 -04:00
Jay Berkenbilt
5993c3e83c Detect input file = output file (fixes #29) 2017-07-29 20:58:01 -04:00
Jay Berkenbilt
07d6f770b2 Better recovery of bad stream start (fixes #104) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt
b389268f16 Better handle split content streams (fixes #73)
When parsing content streams, allow content to be split arbitrarily
across stream boundaries.
2017-07-29 12:19:04 -04:00
Jay Berkenbilt
3a1ff5ded9 Add option to preserve unreferenced objects 2017-07-28 19:19:11 -04:00
Jay Berkenbilt
7f8892525f Add precheck streams capability
When requested, QPDFWriter will do more aggressive prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
2017-07-27 23:42:27 -04:00
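The precheck idea in 7f8892525f can be sketched roughly as follows. This is an illustration only, assuming a decoder that throws on corrupt input; `StreamOutput`, `Decoder`, and `prepare_stream` are hypothetical names and do not appear in QPDFWriter.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical result type: either the decoded data or the untouched raw data.
struct StreamOutput
{
    bool decoded;
    std::string data;
};

// Hypothetical decoder signature; it is assumed to throw on corrupt input.
typedef std::function<std::string(std::string const&)> Decoder;

// Try decoding into a buffer first. Only if that succeeds is the decoded
// form used; otherwise the raw bytes are preserved unchanged.
StreamOutput prepare_stream(std::string const& raw, Decoder const& decode)
{
    StreamOutput out;
    try {
        out.data = decode(raw);
        out.decoded = true;
    } catch (std::exception const&) {
        out.data = raw;
        out.decoded = false;
    }
    return out;
}
```

Because the raw data is returned on failure, a stream that is corrupt relative to its declared filters still survives the rewrite.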
Jay Berkenbilt
428d96dfe1 Convert many more errors to warnings 2017-07-27 22:57:55 -04:00
Jay Berkenbilt
a4fd4b91c6 Convert stream filtering errors to warnings 2017-07-27 18:43:07 -04:00
Jay Berkenbilt
40f00122b8 Convert object parsing errors to warnings
QPDFObjectHandle::parseInternal now issues warnings instead of
throwing exceptions for all error conditions that it finds (except
internal logic errors) and has stronger recovery for things like
invalid tokens and malformed dictionaries. This should improve qpdf's
ability to recover from a wide range of broken files that currently
cause it to fail.
2017-07-27 18:20:31 -04:00
Jay Berkenbilt
701b518d5c Detect recursion loops resolving objects (fixes #51)
During parsing of an object, sometimes parts of the object have to be
resolved. An example is stream lengths. If such an object directly or
indirectly points to the object being parsed, it can cause an infinite
loop. Guard against all cases of re-entrant resolution of objects.
2017-07-26 06:24:07 -04:00
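One common way to guard against re-entrant resolution, as 701b518d5c describes, is to track which objects are currently being resolved. The sketch below assumes that approach; `Resolver` and `ObjGen` are hypothetical names here, and this is not the code from the commit.

```cpp
#include <functional>
#include <set>
#include <stdexcept>
#include <utility>

// (object id, generation) identifies an indirect object in this sketch.
typedef std::pair<int, int> ObjGen;

// Hypothetical guard: remembers which objects are in the middle of being
// resolved and refuses to start resolving the same one again.
class Resolver
{
  public:
    void resolve(ObjGen og, std::function<void()> const& work)
    {
        if (!in_progress.insert(og).second) {
            throw std::runtime_error("loop detected resolving object");
        }
        try {
            work(); // may call resolve() on other objects
        } catch (...) {
            in_progress.erase(og);
            throw;
        }
        in_progress.erase(og);
    }

  private:
    std::set<ObjGen> in_progress;
};
```

A stream whose /Length points, directly or indirectly, back at the stream itself would hit the `throw` instead of recursing forever.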
Jay Berkenbilt
afe0242b26 Handle object ID 0 (fixes #99)
This is CVE-2017-9208.

The QPDF library uses object ID 0 internally as a sentinel to
represent a direct object, but prior to this fix, was not blocking
handling of 0 0 obj or 0 0 R as a special case. Creating an object in
the file with 0 0 obj could cause various infinite loops. The PDF spec
doesn't allow for object 0. Having qpdf handle object 0 might be a
better fix, but changing all the places in the code that assumes objid
== 0 means direct would be risky.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
b8bdef0ad1 Implement deterministic ID
For non-encrypted files, deterministic ID generation uses file contents
instead of timestamp and file name. At a small runtime cost, this
enables generation of the same /ID if the same inputs are converted in
the same way multiple times.
2015-10-31 18:56:42 -04:00
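The idea behind b8bdef0ad1 is that an ID derived only from file contents is reproducible, while one derived from a timestamp is not. The commit message does not say which digest is used, so the sketch below substitutes 64-bit FNV-1a purely as a stand-in; `content_based_id` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical sketch: derive an identifier from the bytes alone, so the
// same input always yields the same ID. FNV-1a is a stand-in digest here,
// chosen only because it fits in a few lines.
std::string content_based_id(std::string const& contents)
{
    std::uint64_t hash = 14695981039346656037ULL; // FNV-1a offset basis
    for (std::size_t i = 0; i < contents.size(); ++i) {
        hash ^= static_cast<unsigned char>(contents[i]);
        hash *= 1099511628211ULL; // FNV-1a prime
    }
    char buf[17];
    std::snprintf(buf, sizeof(buf), "%016llx",
                  static_cast<unsigned long long>(hash));
    return std::string(buf);
}
```

The runtime cost mentioned in the commit comes from having to read all of the content before the ID can be computed.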
Jay Berkenbilt
c9a9fe9c2f Avoid traversing same object twice when copying objects
This is a performance fix.  The output is unchanged.

Fixes #28.
2013-12-26 11:51:50 -05:00
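Avoiding repeated traversal, as in c9a9fe9c2f, usually comes down to keeping a visited set. The following sketch assumes objects are identified by integer IDs and that references form a graph that may contain cycles; `Graph` and `reachable` are hypothetical names, not qpdf functions.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical object graph: object id -> ids of the objects it references.
typedef std::map<int, std::vector<int> > Graph;

// Collect every object reachable from root, visiting each one exactly once
// even when the graph contains shared objects or cycles.
std::vector<int> reachable(Graph const& g, int root)
{
    std::vector<int> order;
    std::set<int> visited;
    std::vector<int> stack(1, root);
    while (!stack.empty()) {
        int id = stack.back();
        stack.pop_back();
        if (!visited.insert(id).second) {
            continue; // already handled: skip instead of re-traversing
        }
        order.push_back(id);
        Graph::const_iterator it = g.find(id);
        if (it != g.end()) {
            for (std::size_t i = 0; i < it->second.size(); ++i) {
                stack.push_back(it->second[i]);
            }
        }
    }
    return order;
}
```

The set of objects found is the same with or without the visited check on an acyclic graph; only the amount of work changes, which matches the commit's note that the output is unchanged.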
Jay Berkenbilt
91367239fd Add --show-npages option to qpdf 2013-07-07 19:43:16 -04:00
Jay Berkenbilt
adccedc02f Allow numeric range to be omitted in qpdf --pages
Detect a missing page range and assume 1-z.
2013-07-07 19:43:16 -04:00
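A rough sketch of the defaulting rule in adccedc02f follows. It assumes a simplified argument list of file names, each optionally followed by a range, and a simplified notion of what a range looks like (digits, `z`, `-`, and `,`). `PageSpec`, `looks_like_range`, and `parse_page_specs` are hypothetical names; password options and the real range grammar are left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified assumption: a range is a non-empty string made up only of
// digits, 'z', '-' and ','.
bool looks_like_range(std::string const& s)
{
    return !s.empty() &&
        s.find_first_not_of("0123456789z-,") == std::string::npos;
}

struct PageSpec
{
    std::string file;
    std::string range;
};

// Pair each file with the range that follows it, or with "1-z" (all pages)
// when the next argument does not look like a range.
std::vector<PageSpec> parse_page_specs(std::vector<std::string> const& args)
{
    std::vector<PageSpec> result;
    std::size_t i = 0;
    while (i < args.size()) {
        PageSpec spec;
        spec.file = args[i++];
        if (i < args.size() && looks_like_range(args[i])) {
            spec.range = args[i++];
        } else {
            spec.range = "1-z";
        }
        result.push_back(spec);
    }
    return result;
}
```

One limitation of this heuristic is that a file whose name happens to look like a range, such as `1-3`, would be misread as a range when it follows another file.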
Jay Berkenbilt
a85007cb0d Handle more broken files
Space rather than newline after xref, missing /ID in trailer for
encrypted file.  This enables qpdf to handle some files that xpdf can
handle.  Adobe reader can't necessarily handle them.
2013-06-15 12:40:01 -04:00
Jay Berkenbilt
16051788ed Handle /Outlines dictionary being a direct object
Even though this case is not valid according to the spec, it has been
seen, and caused an internal error.
2013-06-14 21:36:04 -04:00
Jay Berkenbilt
a3576a7359 Bug fix: handle generation > 0 when generating object streams
Rework QPDFWriter to always track old object IDs and QPDFObjGen
instead of int, thus not discarding the generation number.  Switch to
QPDF::getCompressibleObjGen() to properly handle the case of an old
object eligible for compression that has a generation of other than
zero.
2013-06-14 14:58:09 -04:00
Jay Berkenbilt
6c7bf114dc Bug fix: properly handle overridden compressed objects
When caching objects in an object stream, only cache objects that
still resolve to that stream.  See Changelog mod from this commit for
details.
2013-02-23 17:51:17 -05:00
Jay Berkenbilt
f81152311e Add QPDFObjectHandle::parseContentStream method
This method allows parsing of the PDF objects in a content stream or
array of content streams.
2013-01-20 15:35:39 -05:00
Jay Berkenbilt
f8306913ba Update "C" API with functions for new features 2012-12-31 10:32:32 -05:00
Jay Berkenbilt
9a23c3dcb6 Remove /Crypt from stream filters unconditionally
When writing a new stream, always remove /Crypt even if we are not
otherwise able to filter the stream.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
4237a29c94 Refactor Dictionary writing code
Original code was written before we could shallow copy objects, so all
the filtering was done by suppressing the output of certain keys and
replacing them with other keys.  Now we can simplify the code greatly
by modifying shallow copies of dictionaries in place.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
e57c25814e Support for encryption with /V=5 and /R=5 and /R=6
Read and write support is implemented for /V=5 with /R=5 as well as
/R=6.  /R=5 is the deprecated encryption method used by Acrobat IX.
/R=6 is the encryption method used by PDF 2.0 from ISO 32000-2.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
93ac1695a4 Support files with only attachments encrypted
Test cases added in a future commit since they depend on /R=6 support.
2012-12-31 10:32:32 -05:00
Jay Berkenbilt
16a23368e7 Fix infinite loop trimming passwords with ( in them 2012-12-31 10:32:31 -05:00
Jay Berkenbilt
774584163f Add ExtensionLevel support to version handling
All version operations are now fully aware of extension levels.
2012-12-31 05:36:50 -05:00
Jay Berkenbilt
04c203ae06 Eliminate flattenScalarReferences 2012-12-31 05:36:48 -05:00
Jay Berkenbilt
7f84239cad Find PDF header anywhere in the first 1024 bytes 2012-12-25 14:43:37 -05:00
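Locating the header as 7f84239cad describes might look like the sketch below. It assumes the `%PDF-` marker must lie entirely within the first 1024 bytes; `find_pdf_header` is a hypothetical name, not a qpdf function.

```cpp
#include <cstddef>
#include <string>

// Returns the offset of "%PDF-" if it occurs within the first 1024 bytes
// of data, or std::string::npos otherwise.
std::size_t find_pdf_header(std::string const& data)
{
    std::size_t const window = 1024;
    // Searching only the leading window keeps the cost constant for
    // large files that carry no header at all.
    return data.substr(0, window).find("%PDF-");
}
```

If the header is found at a nonzero offset, a reader would typically also need to account for that offset when interpreting positions stored in the file; the commit title does not say how qpdf handles this.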
Jay Berkenbilt
f256670eba Ignore objects with offset 0 2012-11-20 13:57:37 -05:00
Jay Berkenbilt
c1627d0438 Add QPDFWriter::setExtraHeaderText 2012-09-06 15:31:12 -04:00
Jay Berkenbilt
29e9c34fe3 Bug fix: let EOF resolve literal token
Previously only whitespace and comments did it.  This fix is needed
for object streams whose last object is a literal (name, integer,
real, string) not terminated by space or newline.
2012-08-11 09:29:04 -04:00
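The fix in 29e9c34fe3 concerns a token that is still pending when input runs out. A minimal whitespace-only tokenizer shows the shape of the problem; `tokenize` is a hypothetical function, far simpler than qpdf's tokenizer.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split data into whitespace-delimited tokens. The final block is the
// important part: end of input terminates a pending token just as
// whitespace would.
std::vector<std::string> tokenize(std::string const& data)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : data) {
        if (std::isspace(static_cast<unsigned char>(ch))) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += ch;
        }
    }
    if (!current.empty()) {
        tokens.push_back(current); // EOF resolves the last token
    }
    return tokens;
}
```

Without the final block, the `42` in an object stream ending in `/Name 42` with no trailing newline would be silently dropped.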
Jay Berkenbilt
bde98044f4 Improve password handling
Use --encryption-file-password, if given, in addition to --password as
a source for passwords for files specified in --pages.
2012-07-29 13:22:37 -04:00
Jay Berkenbilt
6bbea4baa0 Implement QPDFObjectHandle::parse
Move object parsing code from QPDF to QPDFObjectHandle and
parameterize the parts of it that are specific to a QPDF object.
Provide a version that can't handle indirect objects and that can be
called on an arbitrary string.

A side effect of this change is that the offset used when reporting
invalid stream length has changed, but since the new value seems like
a better value than the old one, the test suite has been updated
rather than making the code backward compatible.  This only affects
the offset reported for invalid streams that lack /Length or have an
invalid /Length key.

Updated some test code and examples to use QPDFObjectHandle::parse.

Supporting changes include adding a BufferInputSource constructor that
takes a string.
2012-07-21 09:06:10 -04:00
Jay Berkenbilt
db95960ac1 Bug fix: preserve AES when copying encryption parameters 2012-07-15 19:07:59 -04:00
Jay Berkenbilt
1c944e4c89 Have QPDFWriter detect foreign objects while writing
Throw an exception that directs the user to QPDF::copyForeignObject.
2012-07-14 08:07:23 -04:00
Jay Berkenbilt
e7b8f297ba Support copying objects from another QPDF object
This includes QPDF::copyForeignObject and supporting foreign objects
as arguments to addPage*.
2012-07-11 15:54:33 -04:00
Jay Berkenbilt
8a217eb3a2 Add concept of reserved objects
QPDFObjectHandle::{new,is,assert}Reserved, QPDF::replaceReserved
provide a mechanism to add objects to a PDF file when there are
circular references.  This is a prerequisite to copying objects from
one PDF to another.
2012-07-10 23:34:32 -04:00
Jay Berkenbilt
e2dedde4bd Don't require stream data provider to know length in advance
Breaking API change: length parameter has disappeared from the
StreamDataProvider version of QPDFObjectHandle::replaceStreamData
since it is no longer necessary to compute it in advance.  This
breaking change is justified by the fact that removing the length
parameter provides the caller an opportunity to simplify the calling
code.
2012-07-07 17:33:45 -04:00
Tobias Hoffmann
abb53ac369 Limited inheritance to the attributes explicitly listed in the PDF spec
Previous versions of qpdf incorrectly passed arbitrary objects from
/Pages objects down to individual pages in direct contradiction with
the PDF specification.  These are now left in /Pages.  When
intermediate /Pages nodes are being discarded as when the /Pages tree
is being flattened, a warning is issued when unknown keys are
encountered.
2012-07-04 23:04:55 -04:00
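The rule in abb53ac369 can be sketched as follows. The PDF specification lists /Resources, /MediaBox, /CropBox, and /Rotate as the inheritable page attributes. Everything else in the sketch is an assumption made for illustration: dictionaries are modeled as string maps, and `Dict`, `is_inheritable`, `is_structural`, and `push_down` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

// Simplified dictionary: key -> value kept as raw PDF text.
typedef std::map<std::string, std::string> Dict;

// The page attributes the PDF specification marks as inheritable.
bool is_inheritable(std::string const& key)
{
    return key == "/Resources" || key == "/MediaBox" ||
        key == "/CropBox" || key == "/Rotate";
}

// Keys that belong to the page tree itself and are never pushed down.
bool is_structural(std::string const& key)
{
    return key == "/Type" || key == "/Parent" ||
        key == "/Kids" || key == "/Count";
}

// Copy inheritable attributes from a /Pages node to a page unless the page
// already defines them. Returns the other, unknown keys so that the caller
// can warn about them instead of copying them.
std::vector<std::string> push_down(Dict const& pages_node, Dict& page)
{
    std::vector<std::string> unknown;
    for (Dict::const_iterator it = pages_node.begin();
         it != pages_node.end(); ++it) {
        if (is_inheritable(it->first)) {
            page.insert(*it); // no effect if the page already has the key
        } else if (!is_structural(it->first)) {
            unknown.push_back(it->first);
        }
    }
    return unknown;
}
```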
Jay Berkenbilt
5f59c32f87 Add a few minor enhancements to recent work
Test coverage case for new newStream method
Expose decimal_places argument for double-based newReal

All enhancements suggested by Tobias.
2012-06-27 10:43:27 -04:00
Jay Berkenbilt
d1ebe30ff6 Add QPDFObjectHandle::shallowCopy() 2012-06-21 16:15:09 -04:00
Jay Berkenbilt
3844aedd93 Add testing for page APIs 2012-06-21 15:01:02 -04:00
Jay Berkenbilt
eb802cfa8c Implement page manipulation APIs 2012-06-21 15:01:02 -04:00
Jay Berkenbilt
bc1c4bb578 Add QPDF::processFile that takes an open FILE* 2012-06-21 08:00:35 -04:00
Jay Berkenbilt
76b1659177 enhance PointerHolder so that it can explicitly be told to use delete [] instead of delete, thus making it useful to run valgrind over qpdf during its test suite 2011-08-11 11:57:37 -04:00
Jay Berkenbilt
14fe2e6de3 qpdf_set_info_key, qpdf_get_info_key 2011-08-11 10:48:37 -04:00
Jay Berkenbilt
a42a4068b5 preserve /EncryptMetadata when copying encryption parameters 2011-08-10 19:47:18 -04:00
Jay Berkenbilt
7dc197ef88 implement replace and swap 2011-08-10 12:42:48 -04:00
Jay Berkenbilt
aeb892f99b accept stream keyword with CR only
git-svn-id: svn+q:///qpdf/trunk@1052 71b93d88-0707-0410-a8cf-f5a4172ac649
2011-04-30 21:46:09 +00:00
Jay Berkenbilt
6405d3928f be less conservative when skipping over inline images in content normalization
git-svn-id: svn+q:///qpdf/trunk@1050 71b93d88-0707-0410-a8cf-f5a4172ac649
2011-04-30 18:20:35 +00:00
Jay Berkenbilt
b36f62a326 add qpdf_read_memory to C API
git-svn-id: svn+q:///qpdf/trunk@1044 71b93d88-0707-0410-a8cf-f5a4172ac649
2010-10-04 15:24:10 +00:00
Jay Berkenbilt
b1e0dcff16 handle stream filter abbreviations from table H.1
git-svn-id: svn+q:///qpdf/trunk@1025 71b93d88-0707-0410-a8cf-f5a4172ac649
2010-09-05 15:00:44 +00:00
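Table H.1 of the PDF Reference lists abbreviated names for the standard stream filters. Handling them, as b1e0dcff16 does, amounts to a lookup before the filter is chosen. The sketch below is illustrative; `expand_filter_name` is a hypothetical name, not qpdf's function.

```cpp
#include <map>
#include <string>

// Expand an abbreviated filter name to its full form. Names that are not
// abbreviations are returned unchanged.
std::string expand_filter_name(std::string const& name)
{
    static std::map<std::string, std::string> const abbreviations = {
        {"/AHx", "/ASCIIHexDecode"},
        {"/A85", "/ASCII85Decode"},
        {"/LZW", "/LZWDecode"},
        {"/Fl", "/FlateDecode"},
        {"/RL", "/RunLengthDecode"},
        {"/CCF", "/CCITTFaxDecode"},
        {"/DCT", "/DCTDecode"},
    };
    std::map<std::string, std::string>::const_iterator it =
        abbreviations.find(name);
    return it == abbreviations.end() ? name : it->second;
}
```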
Jay Berkenbilt
bd7261da9b getRawStreamData()
git-svn-id: svn+q:///qpdf/trunk@1010 71b93d88-0707-0410-a8cf-f5a4172ac649
2010-08-09 23:33:40 +00:00
Jay Berkenbilt
2dbc1006fb addPageContents
git-svn-id: svn+q:///qpdf/trunk@995 71b93d88-0707-0410-a8cf-f5a4172ac649
2010-08-05 21:06:49 +00:00
Jay Berkenbilt
6f2bd7eb3a newStream
git-svn-id: svn+q:///qpdf/trunk@991 71b93d88-0707-0410-a8cf-f5a4172ac649
2010-08-05 20:20:52 +00:00
Jay Berkenbilt
11df7809af add pipeline-based stream data replacement function
git-svn-id: svn+q:///qpdf/trunk@990 71b93d88-0707-0410-a8cf-f5a4172ac649
2010-08-05 19:04:22 +00:00
Jay Berkenbilt
998a6cbee9 remove stream_data_handler; it wouldn't work as designed. replacement data implemented but not tested
git-svn-id: svn+q:///qpdf/trunk@988 71b93d88-0707-0410-a8cf-f5a4172ac649
2010-08-02 22:40:52 +00:00
Jay Berkenbilt
a80d9d176d add C interface for getting software version
git-svn-id: svn+q:///qpdf/trunk@903 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-24 13:23:20 +00:00
Jay Berkenbilt
7f5d78c2d1 improve C error handling interface
git-svn-id: svn+q:///qpdf/trunk@884 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-23 15:27:30 +00:00
Jay Berkenbilt
398354b6f0 update C API for error retrieval
git-svn-id: svn+q:///qpdf/trunk@830 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-20 00:24:44 +00:00
Jay Berkenbilt
3f8c4c2736 categorize all error messages and include object information if available
git-svn-id: svn+q:///qpdf/trunk@829 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-19 23:09:19 +00:00
Jay Berkenbilt
734ac1e1d2 deal with stream-specific crypt filters
git-svn-id: svn+q:///qpdf/trunk@827 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-19 01:58:31 +00:00
Jay Berkenbilt
a8715c495b add C API for R4 encryption
git-svn-id: svn+q:///qpdf/trunk@825 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-19 00:36:51 +00:00
Jay Berkenbilt
09175e4578 more testing, bug fix for linearized aes encrypted files
git-svn-id: svn+q:///qpdf/trunk@824 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-19 00:17:11 +00:00
Jay Berkenbilt
94131116a9 more notes, testing of cleartext metadata, some crypt filter fixes
git-svn-id: svn+q:///qpdf/trunk@823 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-18 19:54:24 +00:00
Jay Berkenbilt
4ccc9330a8 only seed random number generator once for aes-cbc, try to avoid compressing Metadata streams
git-svn-id: svn+q:///qpdf/trunk@818 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-18 14:09:10 +00:00
Jay Berkenbilt
e25910b59a reading crypt filters is largely implemented but not fully tested
git-svn-id: svn+q:///qpdf/trunk@812 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-17 23:37:55 +00:00
Jay Berkenbilt
c2023db265 Implement changes suggested by Zarko and our subsequent conversations:
 - Add a way to set the minimum PDF version
 - Add a way to force the PDF version
 - Have isEncrypted return true if an /Encrypt dictionary exists even
   when we can't read the file
 - Allow qpdf_init_write to be called multiple times
 - Update some comments in headers

git-svn-id: svn+q:///qpdf/trunk@748 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-10-05 00:42:48 +00:00
Jay Berkenbilt
8d7bb3ff50 add methods for getting encryption data
git-svn-id: svn+q:///qpdf/trunk@733 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-09-27 20:05:38 +00:00
Jay Berkenbilt
fe6771e0e5 add many new tests to exercise C api
git-svn-id: svn+q:///qpdf/trunk@727 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-09-27 16:01:45 +00:00
Jay Berkenbilt
84ec83e925 basic implementation of C API
git-svn-id: svn+q:///qpdf/trunk@725 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-09-27 14:39:04 +00:00
Jay Berkenbilt
d6f50e98c3 remove extraneous coverage case (another coverage case was in the same block of code)

git-svn-id: svn+q:///qpdf/trunk@694 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-09-26 14:39:50 +00:00
Jay Berkenbilt
a1fbb4bd97 update test suite
git-svn-id: svn+q:///qpdf/trunk@675 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-05-03 20:03:21 +00:00
Jay Berkenbilt
599daddb47 decode streams on check, always exit abnormally when warnings are detected
git-svn-id: svn+q:///qpdf/trunk@660 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-03-08 19:00:19 +00:00
Jay Berkenbilt
35d72c822e better recovery for appended files with damaged cross-reference tables
git-svn-id: svn+q:///qpdf/trunk@648 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-02-21 02:31:08 +00:00
Jay Berkenbilt
337b900708 handle UTF-16BE fully
git-svn-id: svn+q:///qpdf/trunk@639 71b93d88-0707-0410-a8cf-f5a4172ac649
2008-11-23 18:49:13 +00:00
Jay Berkenbilt
9a0b88bf77 update release date to actual date
git-svn-id: svn+q:///qpdf/trunk@599 71b93d88-0707-0410-a8cf-f5a4172ac649
2008-04-29 12:55:25 +00:00