2
1
mirror of https://github.com/qpdf/qpdf.git synced 2024-11-14 00:34:03 +00:00
Commit Graph

276 Commits

Author SHA1 Message Date
Jay Berkenbilt
c76536dd9a Implement JSON v2 output 2022-05-08 13:45:20 -04:00
Jay Berkenbilt
1bc8abfdd3 Implement JSON v2 for Stream
Not fully exercised in this commit
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
16f4f94cd9 Prepare code for JSON v2
Update getJSON() methods and calls to them
2022-05-07 11:12:01 -04:00
Jay Berkenbilt
21d6e3231f Make use of the new Pipeline methods in some places 2022-05-03 18:31:23 -04:00
Jay Berkenbilt
59f3e09edf Make Pipeline::write take an unsigned char const* (API change) 2022-05-03 18:31:22 -04:00
Jay Berkenbilt
c365a26e9d Revert "Remove QPDFObjectHandle::replaceOrRemoveKey"
This reverts commit dc059560e7.

I changed my mind. There's no harm in leaving it deprecated for a
release cycle.
2022-04-30 14:15:07 -04:00
Jay Berkenbilt
dc059560e7 Remove QPDFObjectHandle::replaceOrRemoveKey
See ChangeLog for rationale for not deprecating it as originally
planned.
2022-04-30 13:39:45 -04:00
Jay Berkenbilt
4f24617e1e Code clean up: use range-style for loops wherever possible
Where not possible, use "auto" to get the iterator type.

Editorial note: I have avoid this change for a long time because of
not wanting to make gratuitous changes to version history, which can
obscure when certain changes were made, but with having recently
touched every single file to apply automatic code formatting and with
making several broad changes to the API, I decided it was time to take
the plunge and get rid of the older (pre-C++11) verbose iterator
syntax. The new code is just easier to read and understand, and in
many cases, it will be more effecient as fewer temporary copies are
being made.

m-holger, if you're reading, you can see that I've finally come
around. :-)
2022-04-30 13:27:18 -04:00
Jay Berkenbilt
7f023701dd Formatting: remove space in range-style for loops
Change .clang-format and commit automated changes from a fresh run of
format-code
2022-04-30 13:26:43 -04:00
Jay Berkenbilt
d8fdf632a9 Use replaceKeyAndGet in a few places in existing code 2022-04-29 20:28:02 -04:00
Jay Berkenbilt
e80fad86e9 Add new QPDFObjectHandle methods for more fluent programming 2022-04-29 20:09:10 -04:00
Jay Berkenbilt
4be2f36049 Deprecate replaceOrRemoveKey -- it's the same as replaceKey 2022-04-24 09:31:32 -04:00
Jay Berkenbilt
4925f0d18c Have dictionary/streams mutators take const& where possible 2022-04-24 09:05:50 -04:00
Jay Berkenbilt
68e721981a Add new QPDF::warn that takes most of QPDFExc's arguments 2022-04-23 18:25:43 -04:00
Jay Berkenbilt
75fe4f60c3 Use anonymous namespaces for file-private classes 2022-04-16 13:35:27 -04:00
Jay Berkenbilt
cdd0b4fb7d Use = default and = delete where possible in classes 2022-04-16 11:39:14 -04:00
Jay Berkenbilt
2a7d2b63c2 Make ABI-breaking changes that don't modify API at all
* Merge overloaded functions by adding default values
* Remove non-const methods that are identical to const methods
2022-04-16 10:41:46 -04:00
Jay Berkenbilt
a68703b07e Replace PointerHolder with std::shared_ptr in library sources only
(patrepl and cleanpatch are my own utilities)

patrepl s/PointerHolder/std::shared_ptr/g {include,libqpdf}/qpdf/*.hh
patrepl s/PointerHolder/std::shared_ptr/g libqpdf/*.cc
patrepl s/make_pointer_holder/std::make_shared/g libqpdf/*.cc
patrepl s/make_array_pointer_holder/QUtil::make_shared_array/g libqpdf/*.cc
patrepl s,qpdf/std::shared_ptr,qpdf/PointerHolder, **/*.cc **/*.hh
git restore include/qpdf/PointerHolder.hh
cleanpatch
./format-code
2022-04-09 17:33:29 -04:00
Jay Berkenbilt
77e889495f Update some code manually to get better formatting results
Add comments to force line breaks, parenthesize function arguments
that are contatenated strings, etc. -- these kinds of changes improve
clang-format's results and also cause emacs cc-mode to match
clang-format. After this type of change, most of the time, when
clang-format and emacs disagree, clang-format is better.
2022-04-05 14:56:19 -04:00
Jay Berkenbilt
12f1eb15ca Programmatically apply new formatting to code
Run this:

for i in  **/*.cc **/*.c **/*.h **/*.hh; do
  clang-format < $i >| $i.new && mv $i.new $i
done
2022-04-04 08:10:40 -04:00
m-holger
4ff837f099 Fix tests for Form XObjects
Remove test for type == /XObject in QPDFObjectHandle::isFormXObject
as type value is optional (as per spec 8.10.2).

Replace code to test for /Form in QPDFJob::shouldRemoveUnreferencedResources
with a call to isFormXObject.
2022-02-10 19:47:37 -05:00
Jay Berkenbilt
cb769c62e5 WHITESPACE ONLY -- expand tabs in source code
This comment expands all tabs using an 8-character tab-width. You
should ignore this commit when using git blame or use git blame -w.

In the early days, I used to use tabs where possible for indentation,
since emacs did this automatically. In recent years, I have switched
to only using spaces, which means qpdf source code has been a mixture
of spaces and tabs. I have avoided cleaning this up because of not
wanting gratuitous whitespaces change to cloud the output of git
blame, but I changed my mind after discussing with users who view qpdf
source code in editors/IDEs that have other tab widths by default and
in light of the fact that I am planning to start applying automatic
code formatting soon.
2022-02-08 11:51:15 -05:00
Jay Berkenbilt
c62e8e2b28 Update for clean compile with POINTERHOLDER_TRANSITION=2 2022-02-07 17:38:22 -05:00
m-holger
8371060340 Add method QPDFObjectHandle::getKeyIfDict 2022-02-06 11:21:15 -05:00
Jay Berkenbilt
7fb22740e1 Add operator ""_qpdf for creating QPDFObjectHandle literals 2022-02-05 11:29:25 -05:00
m-holger
e58b1174c7 Add new QPDFObjectHandle::getValueAs... accessors 2022-02-05 11:24:35 -05:00
Jay Berkenbilt
9044a24097 PointerHolder: deprecate getPointer() and getRefcount()
Use get() and use_count() instead. Add #define
NO_POINTERHOLDER_DEPRECATION to remove deprecation markers for these
only.

This commit also removes all deprecated PointerHolder API calls from
qpdf's code except in PointerHolder's test suite, which must continue
to test the deprecated APIs.
2022-02-04 13:12:37 -05:00
m-holger
8eca9d8fd9 Fix QPDFObjectHandle::isOrHasName
Ensure isOrHasName returns true if object is an array and the name is
present anywhere in the array.
2022-01-27 09:35:39 -06:00
m-holger
07db3200cb Remove some if statements and simplify some boolean expressions
Use QPDFObjectHandle::isNameAndEquals, isDictionaryOfType and
isStreamOfType.
2022-01-27 07:31:12 -06:00
m-holger
710d2e54f0 Allow testing for subtype without specifying type in isDictionaryOfType etc
Accept empty string as type parameter in
QPDFObjectHandle::isDictionaryOfType and isStreamOfType
to allow for dictionaries with optional type.
2022-01-27 07:31:12 -06:00
m-holger
8593b9fdf7 Add new convenience methods QPDFObjectHandle::isNameAndEquals, etc
Add methods isNameAndEquals, isDictionaryOfType, isStreamOfType
2022-01-22 08:10:28 -06:00
Jay Berkenbilt
3340dbe976 Use a specific error code for type warnings and clarify docs 2021-12-10 11:15:49 -05:00
Jay Berkenbilt
9b28933647 Check object ownership when adding
When adding a QPDFObjectHandle to an array or dictionary, if possible,
check if the new object belongs to the same QPDF. This makes it much
easier to find incorrect code than waiting for the situation to be
detected when the file is written.
2021-11-04 12:29:42 -04:00
Jay Berkenbilt
0b77f2cf26 Revert non-binary-compatible handleWarning change -- see TODO (ABI) 2021-03-04 15:59:46 -05:00
Jay Berkenbilt
f68e25c7f2 Don't use handleWarning, which is being reverted 2021-03-04 15:59:45 -05:00
Jay Berkenbilt
552303a94a Check for reserved after dereference 2021-03-04 15:08:36 -05:00
Jay Berkenbilt
d7ffdfa994 Add optional conflict detection to mergeResources
Also improve behavior around direct vs. indirect resources.
2021-03-04 15:08:36 -05:00
Jay Berkenbilt
a15ec6967d Enhancements to ParserCallbacks 2021-03-03 17:05:49 -05:00
Jay Berkenbilt
a4d6589ff2 Have QPDFObjectHandle notice when replaceObject was called
This results in a performance penalty of 1% to 2% when replaceObject
and swapObjects are never called and a somewhat larger penalty if they
are called, but it's worth it to avoid very confusing behavior as
discussed in depth in qpdf#507.
2021-02-25 07:32:46 -05:00
Jay Berkenbilt
ec6719fd25 Always call dereference() before querying obj pointer 2021-02-25 07:31:26 -05:00
Jay Berkenbilt
7b3cbacf5d Change from QPDF{Array,Dict}Items to aitems() and ditems() 2021-02-22 11:05:39 -05:00
Jay Berkenbilt
92fbc6fdf5 QPDFObjectHandle::copyStream 2021-02-21 06:36:30 -05:00
Jay Berkenbilt
901f1a788c Enhance QPDFMatrix API 2021-02-21 06:36:30 -05:00
Jay Berkenbilt
05eb5826d8 Fix isPagesObject and isPageObject
There are lots of things with /Kids that are not pages. Repair the
pages tree, then do a reliable check.
2021-02-20 19:42:41 -05:00
Jay Berkenbilt
35dd11f356 Allow --rotate=0 2021-02-20 16:29:34 -05:00
Jay Berkenbilt
a773f4c71d Add QPDFObjectHandle::parse for strings with context 2021-02-15 11:33:03 -05:00
Jay Berkenbilt
efbb21673c Add functional versions of QPDFObjectHandle::replaceStreamData
Also fix a bug in checking consistency of length for stream data
providers. Length should not be checked or recorded if the provider
says it failed to generate the data.
2021-02-14 14:42:24 -05:00
Jay Berkenbilt
07f40bd254 QUtil::double_to_string: trim trailing zeroes with option to disable 2021-02-13 02:30:00 -05:00
Jay Berkenbilt
4ae93a73c5 Improve memory safety of dict/array iterators 2021-01-31 07:16:03 -05:00
Jay Berkenbilt
de0b11fc47 Add C++ iterator API around array and dictionary objects 2021-01-30 15:15:23 -05:00
Jay Berkenbilt
35e7859bc7 Make QPDFObjectHandle::is* return false for uninitialized objects 2021-01-29 15:46:54 -05:00
Jay Berkenbilt
6226b69dba Add warn() to QPDF's public API 2021-01-16 18:41:53 -05:00
Jay Berkenbilt
3be58f49e5 Make more QPDFPageObjectHelper methods work with form XObject 2021-01-02 14:08:53 -05:00
Jay Berkenbilt
bedf35d6a5 Bug fix: avoid extraneous pipeline finish calls with multiple contents
Avoid calling finish() multiple times on the pipeline passed to
pipeContentStreams. This commit also fixes a bug in which qpdf was not
exiting with the proper exit status if warnings found while splitting
pages; this was exposed by a test case that changed.
2021-01-02 14:08:17 -05:00
Jay Berkenbilt
a139d2b36d Add several methods for working with form XObjects (fixes #436)
Make some more methods in QPDFPageObjectHelper work with form
XObjects, provide forEach methods to walk through nested form
XObjects, possibly recursively. This should make it easier to work
with form XObjects from user code.
2021-01-02 12:29:31 -05:00
Jay Berkenbilt
63ea46193d QPDFPageObjectHelper: getPageImages -> getImages 2021-01-02 11:33:36 -05:00
Jay Berkenbilt
e7a8554563 QPDFPageObjectHelper::getPageImages: support form XObjects 2021-01-02 11:33:36 -05:00
Jay Berkenbilt
1562d34c09 Add QPDFObjectHandle::isFormXObject 2021-01-01 07:36:10 -05:00
Jay Berkenbilt
12ecd2019a Add QPDFObjectHandle::setFilterOnWrite 2020-12-28 12:58:19 -05:00
Jay Berkenbilt
cc8895078a Add QPDFObjectHandle::makeDirect(bool allow_streams) 2020-12-26 08:48:18 -05:00
Jay Berkenbilt
bd79138c84 Treat direct page as runtime rather than logic error (fuzz issue 27393) 2020-11-11 09:50:43 -05:00
Jay Berkenbilt
b30deaeeab Avoid merging adjacent tokens when concatenating contents (fixes #444) 2020-10-23 08:00:04 -04:00
Jay Berkenbilt
92d3cbecd4 Fix warnings reported by -Wshadow=local (fixes #431) 2020-04-16 12:41:43 -04:00
Jay Berkenbilt
893d38b87e Allow propagation of errors and retry through StreamDataProvider
StreamDataProvider::provideStreamData now has a rich enough API for it
to effectively proxy to pipeStreamData.
2020-04-05 20:07:13 -04:00
Jay Berkenbilt
6a4117add9 Avoid potential segfault in warning methods 2020-04-03 21:39:20 -04:00
Jay Berkenbilt
38afdcea7b Add QPDFObjectHandle::unsafeShallowCopy 2020-04-03 12:16:24 -04:00
Jay Berkenbilt
89f19b7099 Performance: remove Members indirection for QPDFObjectHandle 2020-04-03 12:16:24 -04:00
Jay Berkenbilt
278710fbe8 Refactor QPDFPageObjectHelper::removeUnreferencedResources()
Refactor removeUnreferencedResources to prepare for filtering form
XObjects.
2020-03-31 17:39:20 -04:00
Masamichi Hosoda
5a842792b6 Parse Contents in signature dictionary without encryption
Various PDF digital signing tools do not encrypt /Contents value in
signature dictionary. Adobe Acrobat Reader DC can handle a PDF with
the /Contents value not encrypted.

Write Contents in signature dictionary without encryption

Tests ensure that string /Contents are not handled specially when not
found in sig dicts.
2019-10-22 16:20:21 -04:00
Masamichi Hosoda
cdc46d78f4 Add QPDFObject::getParsedOffset() 2019-10-22 16:19:06 -04:00
Jay Berkenbilt
685250d7d6 Correct reversed Rectangle coordinates (fixes #363) 2019-09-19 21:25:34 -04:00
Jay Berkenbilt
8b1e307741 Warn for duplicated dictionary keys (fixes #345) 2019-09-19 20:22:34 -04:00
Jay Berkenbilt
94e86e2528 Fix fuzz issue 16301 2019-08-25 22:52:25 -04:00
Jay Berkenbilt
3f1ab64066 Pass offset and length to ParserCallbacks::handleObject 2019-08-22 22:54:29 -04:00
Jay Berkenbilt
4b2e72c4cd Test for direct, rather than resolved nulls in parser
Just because we know an indirect reference is null, doesn't mean we
shouldn't keep it indirect.
2019-08-22 17:55:16 -04:00
Jay Berkenbilt
225cd9dac2 Protect against coding error of re-entrant parsing 2019-08-22 17:55:16 -04:00
Jay Berkenbilt
42d396f1dd Handle invalid name tokens symmetrically for PDF < 1.2 (fixes #332) 2019-08-19 19:48:27 -04:00
Jay Berkenbilt
5187a3ec85 Shallow copy arrays without removing sparseness 2019-08-17 23:02:41 -04:00
Jay Berkenbilt
bf7c6a8070 Use SparseOHArray in parsing 2019-08-17 23:02:41 -04:00
Jay Berkenbilt
a89d8a0677 Refactor QPDF_Array in preparation for using SparseOHArray 2019-08-17 23:02:41 -04:00
Jay Berkenbilt
e83f3308fb SparseOHArray 2019-08-17 23:02:41 -04:00
Jay Berkenbilt
4db1de97ce Convert some cases of logic_error to runtime_error
There were a few cases that could be caused by invalid input rather
than bugs in the code which were throwing logic_error instead of
runtime_error.
2019-06-25 12:43:06 -04:00
Jay Berkenbilt
848351f1fc Add missing #include <cstring> 2019-06-21 22:29:31 -04:00
Jay Berkenbilt
b07ad6794e Fix bugs found by fuzz tests
* Several assertions in linearization were not always true; change
  them to run time errors
* Handle a few cases of uninitialized objects
* Handle pages with no contents when doing form operations
* Handle invalid page tree nodes when traversing pages
2019-06-21 17:56:24 -04:00
Jay Berkenbilt
d71f05ca07 Fix sign and conversion warnings (major)
This makes all integer type conversions that have potential data loss
explicit with calls that do range checks and raise an exception. After
this commit, qpdf builds with no warnings when -Wsign-conversion
-Wconversion is used with gcc or clang or when -W3 -Wd4800 is used
with MSVC. This significantly reduces the likelihood of potential
crashes from bogus integer values.

There are some parts of the code that take int when they should take
size_t or an offset. Such places would make qpdf not support files
with more than 2^31 of something that usually wouldn't be so large. In
the event that such a file shows up and is valid, at least qpdf would
raise an error in the right spot so the issue could be legitimately
addressed rather than failing in some weird way because of a silent
overflow condition.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt
da30764bce Change QPDFObjectHandle::pipeStreamData's encode_flags type
Change from unsigned long to int since we pass enumerated type values
to this field.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt
3608afd5c5 Add new integer accessors to QPDFObjectHandle 2019-06-21 13:17:21 -04:00
Jay Berkenbilt
cf469d7890 Give up reading objects with too many consecutive errors 2019-06-15 08:52:19 -04:00
Jay Berkenbilt
4ccb29912a Tighten isPageObject (fixes #310) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt
eb49e07c0a Make inline image token exactly contain the image data
Do not include the trailing EI, and handle cases where EI is not
preceded by a delimiter. Such cases have been seen in the wild.
2019-01-31 20:28:44 -05:00
Jay Berkenbilt
ec9e310c9e Refactor QPDFTokenizer's inline image handling
Add a version of expectInlineImage that takes an input source and
searches for EI. This is in preparation for improving the way EI is
found. This commit just refactors the code without changing the
functionality and adds tests to make sure the old and new code behave
identically.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt
31372edce0 Inline image token value ends with EI, not delimiter
The inline image token erroneously included the delimiter that
followed EI. The ObjectHandle created from it was correct.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt
8cb245739c Add QPDFObjectHandle::getUniqueResourceName 2019-01-27 07:50:30 -05:00
Jay Berkenbilt
009767d97a Handle inheritable page attributes
Add getAttribute for handling inheritable page attributes, and fix
getPageImages and annotation flattening code to use it.
2019-01-25 22:30:05 -05:00
Jay Berkenbilt
f78ea057ca Switch annotation flattening to use the form xobjects
Instead of directly putting the contents of the annotation appearance
streams into the page's content stream, add commands to render the
form xobjects directly. This is a more robust way to do it than the
original solution as it works properly with patterns and avoids
problems with resource name clashes between the pages and the form
xobjects.
2019-01-02 21:49:47 -05:00
Jay Berkenbilt
95d6b17a89 Add QPDFObjectHandle::mergeDictionary() 2019-01-01 08:12:56 -05:00
Jay Berkenbilt
5059ec0d35 Add Matrix class under QPDFObjectHandle 2018-12-31 23:02:43 -05:00
Jay Berkenbilt
30a0c070e4 Add QPDFObjectHandle::getJSON() 2018-12-21 18:34:56 -05:00
Jay Berkenbilt
077d3d4512 Add QPDFObjectHandle::wrapInArray()
Wrap an object in an array if it is not already an array.
2018-12-18 16:45:48 -05:00
Jay Berkenbilt
38c9ed23c3 Treat content stream parsing errors as an error, not a warning
If parsing content streams is treated as a warning, there is no way
for a caller to know if a parsing operation has failed. This is very
dangerous and will likely result in data loss when token filters are
parser callbacks are in use.
2018-06-22 10:44:08 -04:00
Jay Berkenbilt
ddd78c1b7f Fix QPDFObjectHandle::shallowCopy
It's not really a shallow copy. It just doesn't cross indirect object
boundaries. The old implementation had a bug that would cause multiple
shallow copies of the same object to share memory, which was not the
intention.
2018-06-21 20:34:45 -04:00
Jay Berkenbilt
952a665a4e Better support for creating Unicode strings 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
4cded10821 Add QPDFObjectHandle::Rectangle type
Provide a convenient way of accessing rectangles.
2018-06-21 15:57:13 -04:00
Jay Berkenbilt
b4d6cf6836 Limit depth of nesting in direct objects (fixes #202)
This fixes CVE-2018-9918.
2018-04-15 16:11:22 -04:00
Jay Berkenbilt
e4e2e26d99 Properly handle pages with no contents (fixes #194)
Remove calls to assertPageObject(). All cases in the library that
called assertPageObject() work fine if you don't call
assertPageObject() because nothing assumes anything that was being
checked by that call. Removing the calls enables more files to be
successfully processed.
2018-03-06 11:34:07 -05:00
Jay Berkenbilt
d0e99f195a More robust handling of type errors
Give objects descriptions and context so it is possible to issue
warnings instead of fatal errors for attempts to access objects of the
wrong type.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
21b7481b0e Push members of QPDFObjectHandle into a Members object
As in other cases, this is to enable adding new member variables in
the future without breaking ABI compatibility.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
e410b0fe0d Simplify TokenFilter interface
Expose Pl_QPDFTokenizer, and have it do more of the work of managing
the token filter's pipeline.
2018-02-18 21:05:47 -05:00
Jay Berkenbilt
5708b5d0aa Add additional interface for filtering page contents 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
9910104442 Implement TokenFilter and refactor Pl_QPDFTokenizer
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
b8723e97f4 Add coalesce contents capability 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fcd611b61e Refactor parseContentStream 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
05ff619b09 Remove redundant method
Remove a redundant method that was equal to another one with
additional arguments. This breaks binary compatibility, but there are
other ABI breaking changes in the upcoming release, so now is the time
to do it.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
55ee55394c Use inline image token in content parser 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
d31a7b76e7 Improve message for stream decoding error
Tweak the message so that we inform the user that we are mitigating
data loss.
2017-09-12 16:03:48 -04:00
Jay Berkenbilt
728dc9e6d8 Fix error caught by clang 2017-08-26 21:51:17 -04:00
Jay Berkenbilt
ad527a64f9 Parse iteratively to avoid stack overflow (fixes #146) 2017-08-25 21:56:45 -04:00
Jay Berkenbilt
e452d9dca6 Spell check 2017-08-22 14:22:20 -04:00
Jay Berkenbilt
9744414c66 Enable finer grained control of stream decoding
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
cfa2eb97fb Add page rotation (fixes #132) 2017-08-12 22:57:38 -04:00
Jay Berkenbilt
b389268f16 Better handle split content streams (fixes #73)
When parsing content streams, allow content to be split arbitrarily
across stream boundaries.
2017-07-29 12:19:04 -04:00
Jay Berkenbilt
7f8892525f Add precheck streams capability
When requested, QPDFWriter will do more aggress prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
2017-07-27 23:42:27 -04:00
Jay Berkenbilt
40f00122b8 Convert object parsing errors to warnings
QPDFObjectHandle::parseInternal now issues warnings instead of
throwing exceptions for all error conditions that it finds (except
internal logic errors) and has stronger recovery for things like
invalid tokens and malformed dictionaries. This should improve qpdf's
ability to recover from a wide range of broken files that currently
cause it to fail.
2017-07-27 18:20:31 -04:00
Jay Berkenbilt
12db09898e Don't interpret word tokens in content streams (fixes #82) 2017-07-26 06:24:07 -04:00
Jay Berkenbilt
afe0242b26 Handle object ID 0 (fixes #99)
This is CVE-2017-9208.

The QPDF library uses object ID 0 internally as a sentinel to
represent a direct object, but prior to this fix, was not blocking
handling of 0 0 obj or 0 0 R as a special case. Creating an object in
the file with 0 0 obj could cause various infinite loops. The PDF spec
doesn't allow for object 0. Having qpdf handle object 0 might be a
better fix, but changing all the places in the code that assumes objid
== 0 means direct would be risky.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
603f222365 Fix infinite loop while reporting an error (fixes #101)
This is CVE-2017-9210.

The description string for an error message included unparsing an
object, which is too complex of a thing to try to do while throwing an
exception. There was only one example of this in the entire codebase,
so it is not a pervasive problem. Fixing this eliminated one class of
infinite loop errors.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt
c729e07d55 Avoid resolving arguments to R
When checking two objects preceding R while parsing, ensure that the
objects are direct. This avoids stuff like 1 0 obj containing 1 0 R 0 R
from causing an infinite loop in object resolution.
2015-02-21 17:51:08 -05:00
Jay Berkenbilt
caab1b0e16 Handle pages with no /Contents from getPageContents()
The spec allows /Contents to be omitted for pages that are blank, but
QPDFObjectHandle::getPageContents() was throwing an exception in this
case.
2014-11-14 13:43:34 -05:00
Jay Berkenbilt
ac9c1f0d56 Security: replace operator[] with at
For std::string and std::vector, replace operator[] with at.  This was
done using an automated process.  See README.hardening for details.
2013-10-18 10:45:14 -04:00
Jay Berkenbilt
212812d837 Fix errors reported by Coverity
Thanks to Jiri Popelka from Red Hat for sending the output of a
Coverity run over qpdf.
2013-07-07 15:36:51 -04:00
Jay Berkenbilt
5039da0b91 Add QPDFObjectHandle::getObjGen()
This is safer than getObjectID() and getGeneration() for many uses.
2013-06-14 14:58:09 -04:00
Jay Berkenbilt
2d02b3cc3d Add explicit int to double cast 2013-04-04 14:13:31 -04:00
Jay Berkenbilt
29f5830325 Fix getTypeCode and getTypeName work for indirect objects
Remove const qualifier from getTypeCode and get getTypeName methods of
QPDFObjectHandle, make them work properly for indirect objects, and
exercise them much better in the test suite.
2013-03-05 13:35:46 -05:00
Jay Berkenbilt
119f2a4b68 Add method to terminate content stream parsing 2013-03-05 13:35:46 -05:00
Jay Berkenbilt
30027481f7 Remove all old-style casts from C++ code 2013-03-04 16:45:16 -05:00
Jay Berkenbilt
a5d8783f67 Improve qpdf --check
Fix exit status for case of errors without warnings, continue after
errors when possible, add test case for parsing a file with content
stream errors on some but not all pages.
2013-01-25 11:08:50 -05:00
Jay Berkenbilt
bfda717749 Cosmetic changes to be closer to Adobe terminology
Change object type Keyword to Operator, and place the order of the
object types in object_type_e in the same order as they are mentioned
in the PDF specification.

Note that this change only breaks backward compatibility with code
that has not yet been released.
2013-01-23 09:38:05 -05:00
Jay Berkenbilt
913eb5ac35 Add getTypeCode() and getTypeName()
Add virtual methods to QPDFObject, wrappers to QPDFObjectHandle, and
implementations to all the QPDF_Object types.
2013-01-22 10:01:45 -05:00
Jay Berkenbilt
f81152311e Add QPDFObjectHandle::parseContentStream method
This method allows parsing of the PDF objects in a content stream or
array of content streams.
2013-01-20 15:35:39 -05:00
Jay Berkenbilt
1d88955fa6 Added new QPDFObjectHandle types Keyword and InlineImage
These object types are to facilitate content stream parsing.
2013-01-20 15:35:39 -05:00
Tobias Hoffmann
9c00874e77 added QPDFObjectHandle::replaceStreamData(std::string data). 2012-07-25 03:02:46 +02:00
Jay Berkenbilt
6bbea4baa0 Implement QPDFObjectHandle::parse
Move object parsing code from QPDF to QPDFObjectHandle and
parameterize the parts of it that are specific to a QPDF object.
Provide a version that can't handle indirect objects and that can be
called on an arbitrary string.

A side effect of this change is that the offset used when reporting
invalid stream length has changed, but since the new value seems like
a better value than the old one, the test suite has been updated
rather than making the code backward compatible.  This only effects
the offset reported for invalid streams that lack /Length or have an
invalid /Length key.

Updated some test code and exmaples to use QPDFObjectHandle::parse.

Supporting changes include adding a BufferInputSource constructor that
takes a string.
2012-07-21 09:06:10 -04:00
Jay Berkenbilt
b501251291 qpdf: push inherited attributes to page when showing images
from qpdf command-line tool
2012-07-15 16:22:28 -04:00
Jay Berkenbilt
e7b8f297ba Support copying objects from another QPDF object
This includes QPDF::copyForeignObject and supporting foreign objects
as arguments to addPage*.
2012-07-11 15:54:33 -04:00
Jay Berkenbilt
8a217eb3a2 Add concept of reserved objects
QPDFObjectHandle::{new,is,assert}Reserved, QPDF::replaceReserved
provide a mechanism to add objects to a PDF file when there are
circular references.  This is a prerequisite to copying objects from
one PDF to another.
2012-07-10 23:34:32 -04:00
Tobias Hoffmann
8720446b23 Added assertNumber and assertScalar to QPDFObjectHandle 2012-07-07 18:55:08 -04:00
Tobias Hoffmann
a8266ccb0e Added public assert{Type} methods to QPDFObjectHandle 2012-07-07 18:53:38 -04:00
Jay Berkenbilt
e2dedde4bd Don't require stream data provider to know length in advance
Breaking API change: length parameter has disappeared from the
StreamDataProvider version of QPDFObjectHandle::replaceStreamData
since it is no longer necessary to compute it in advance.  This
breaking change is justified by the fact that removing the length
parameter provides the caller an opportunity to simplify the calling
code.
2012-07-07 17:33:45 -04:00
Jay Berkenbilt
5f59c32f87 Add a few minor enhancements to recent work
Test coverage case for new newStream method
Expose decimal_places argument for double-based newReal

All enhancements suggested by Tobias.
2012-06-27 10:43:27 -04:00
Tobias Hoffmann
43c404b45a Add QPDFObjectHandle::newStream(QPDF *, std::string const&)
This makes the code simpler than having to create a buffer of a fixed
size and copy the string to it.
2012-06-27 10:19:57 -04:00