2
1
mirror of https://github.com/qpdf/qpdf.git synced 2024-11-15 09:14:03 +00:00
Commit Graph

1696 Commits

Author SHA1 Message Date
Jay Berkenbilt
3b4b9efd21 Fix autogeneration of job.sums 2022-02-22 08:10:05 -05:00
Jay Berkenbilt
31b45b0fd4 Fix logic error with Tf when generating appearances (fixes #655) 2022-02-18 13:46:35 -05:00
Jay Berkenbilt
3e2109ab37 Remove special case for 0xad for 10.6.2. 2022-02-16 06:52:05 -05:00
Jay Berkenbilt
e810fe678a Fix asymmetry between newUnicodeString and getUTF8Value 2022-02-15 19:22:35 -05:00
Jay Berkenbilt
a478cbb6dc Silently/transparently recognize UTF-16LE as UTF-16 (fixes #649)
The PDF spec only allows UTF-16BE, but most readers seem to accept
UTF-16LE as well, so now qpdf does too.
2022-02-15 16:13:12 -05:00
Jay Berkenbilt
fbd3e56da7 Ignore -- at the top level arg parser (fixes #652)
This was unintended behavior that was added back for backward
compatibility. It is intentionally undocumented.
2022-02-15 16:13:12 -05:00
Jay Berkenbilt
1065bbb016 Handle odd PDFDoc codepoints in UTF-8 during transcoding (fixes #650)
There are codepoints in PDFDoc that are not valid UTF-8 but map to
valid UTF-8. We were handling those correctly with bidirectional
mapping.

However, if those same code points appeared in UTF-8, where they have
no meaning, they were left as fixed points when converting to PDFDoc,
where they do have meaning. This change recognizes them as errors.
2022-02-15 08:32:38 -05:00
m-holger
4ff837f099 Fix tests for Form XObjects
Remove test for type == /XObject in QPDFObjectHandle::isFormXObject
as type value is optional (as per spec 8.10.2).

Replace code to test for /Form in QPDFJob::shouldRemoveUnreferencedResources
with a call to isFormXObject.
2022-02-10 19:47:37 -05:00
Jay Berkenbilt
235c89e037 Fix one more PDF doc encoding error for 10.6 release (fixes #637) 2022-02-09 05:47:58 -05:00
Jay Berkenbilt
d501e1c0d4 Only update output version from files used as input
If we're opening a PDF file to copy its encryption information or
attachments, its version doesn't need to influence the output version.
2022-02-08 13:49:22 -05:00
Jay Berkenbilt
f91b21c7d4 Preserve input PDF version on pages/split-pages (fixes #610) 2022-02-08 12:34:14 -05:00
Jay Berkenbilt
cfd5147d92 Add QPDF::getVersionAsPDFVersion 2022-02-08 12:34:14 -05:00
Jay Berkenbilt
8082af09be Add PDFVersion class 2022-02-08 12:34:14 -05:00
Jay Berkenbilt
cb769c62e5 WHITESPACE ONLY -- expand tabs in source code
This comment expands all tabs using an 8-character tab-width. You
should ignore this commit when using git blame or use git blame -w.

In the early days, I used to use tabs where possible for indentation,
since emacs did this automatically. In recent years, I have switched
to only using spaces, which means qpdf source code has been a mixture
of spaces and tabs. I have avoided cleaning this up because of not
wanting gratuitous whitespaces change to cloud the output of git
blame, but I changed my mind after discussing with users who view qpdf
source code in editors/IDEs that have other tab widths by default and
in light of the fact that I am planning to start applying automatic
code formatting soon.
2022-02-08 11:51:15 -05:00
Jay Berkenbilt
c62e8e2b28 Update for clean compile with POINTERHOLDER_TRANSITION=2 2022-02-07 17:38:22 -05:00
Jay Berkenbilt
3f22bea084 Use make_array_pointer_holder
This will be able to be replaced with QUtil::make_shared_array
2022-02-07 17:38:22 -05:00
Jay Berkenbilt
40f1946df8 Replace PointerHolder arrays with shared_ptr arrays where possible
Replace PointerHolder arrays wherever it can be done without breaking ABI.
2022-02-07 17:38:22 -05:00
Jay Berkenbilt
df2f5c6a36 Add QUtil::make_shared_array to help with PointerHolder transition 2022-02-07 14:08:46 -05:00
Jay Berkenbilt
cfaae47dc6 Add getBufferSharedPointer() to Pl_Buffer and QPDFWriter 2022-02-07 12:53:28 -05:00
m-holger
5901fcad4c C-API expose QPDFObjectHandle::getKeyIfDict 2022-02-06 11:21:15 -05:00
m-holger
8371060340 Add method QPDFObjectHandle::getKeyIfDict 2022-02-06 11:21:15 -05:00
m-holger
2ed5f49a79 C-API expose QPDFObjectHandle::getValueAs... accessors 2022-02-05 19:40:30 -05:00
Jay Berkenbilt
af3f74de8c Stop using std::iterator (fixes #618)
Create the typedefs directly in iterators rather than deriving from
the deprecated std::iterator class.
2022-02-05 11:29:25 -05:00
Jay Berkenbilt
7fb22740e1 Add operator ""_qpdf for creating QPDFObjectHandle literals 2022-02-05 11:29:25 -05:00
Jay Berkenbilt
b48a0ff0e8 Add qpdf_empty_pdf to C API 2022-02-05 11:29:25 -05:00
Jay Berkenbilt
8cf7f2bfb5 API contract: qpdf_get_qpdf_version() returns a static 2022-02-05 11:24:56 -05:00
Jay Berkenbilt
5f3f78822b Improve use of std::unique_ptr
* Use unique_ptr in place of shared_ptr in some cases
* unique_ptr for arrays does not require a custom deleter
* use std::make_unique (c++14) where possible
2022-02-05 11:24:56 -05:00
m-holger
e58b1174c7 Add new QPDFObjectHandle::getValueAs... accessors 2022-02-05 11:24:35 -05:00
Jay Berkenbilt
cfaa2de804 Update copyright for 2022 2022-02-04 16:36:22 -05:00
Jay Berkenbilt
2229e37e88 Add a blank line after the first header included in each source 2022-02-04 16:31:31 -05:00
Jay Berkenbilt
8eab616d62 Add qpdf version macros to qpdf/DLL.h 2022-02-04 13:41:01 -05:00
Jay Berkenbilt
abc300f05c Replace containers of PointerHolder with containers of std::shared_ptr
None of these are in the public API.
2022-02-04 13:12:37 -05:00
Jay Berkenbilt
f0c2e0ef1e JSON: use std::shared_ptr internally 2022-02-04 13:12:37 -05:00
Jay Berkenbilt
9044a24097 PointerHolder: deprecate getPointer() and getRefcount()
Use get() and use_count() instead. Add #define
NO_POINTERHOLDER_DEPRECATION to remove deprecation markers for these
only.

This commit also removes all deprecated PointerHolder API calls from
qpdf's code except in PointerHolder's test suite, which must continue
to test the deprecated APIs.
2022-02-04 13:12:37 -05:00
m-holger
95e7d36b7a C-API add two binary UTF8 funtions
add qpdf_oh_new_binary_unicode_string and qpdf_oh_get_binary_utf8_value
2022-02-04 13:10:51 -05:00
m-holger
1925ffd467 Fix --check-linearization of non-linearized files (fixes #615) 2022-02-04 06:52:38 -05:00
m-holger
4d507251fe Change QPDFExc type to unsupported for /Standard filter 2022-02-02 14:07:32 -06:00
Jay Berkenbilt
42bff9f458 QPDFJob: let initializeFromArgv just take argv, not argc
Let argv be a null-terminated array. There is already code that
assumes this, and it makes it easier to construct the arguments.
2022-02-01 13:50:58 -05:00
Jay Berkenbilt
b02d37bc0a Make QPDFArgParser accept const argv
This makes it much more convention to use the initializeFromArgv
functions since you can use string literals.
2022-02-01 13:50:58 -05:00
Jay Berkenbilt
bc4e2320e7 Add qpdfjob-c.h -- simple C API around parts of QPDFJob 2022-02-01 09:04:55 -05:00
Jay Berkenbilt
03e67a28fe Move QTC::TC for qpdf to QPDFJob
All the coverage cases that used to be in qpdf.cc are now in
QPDFJob*.cc. It doesn't really matter, but better to follow the
convention of starting with the class that includes the coverage call.
2022-02-01 09:04:55 -05:00
Jay Berkenbilt
b42f3e1d15 Move more code from qpdf.cc into QPDFJob 2022-02-01 09:04:55 -05:00
Jay Berkenbilt
cc5485dac1 QPDFJob: documentation 2022-02-01 09:04:55 -05:00
Jay Berkenbilt
5a7bb3474e generate_auto_job: generate overloaded config decls for optional
For optional parameter/choices, generate an overloaded config method
that takes no arguments. This makes it possible to convert from a bare
argument to one that takes an optional parameter without breaking
binary compatibility.
2022-02-01 09:04:55 -05:00
Jay Berkenbilt
5953116634 Clean up documentation and help around json options 2022-01-31 18:40:11 -05:00
Jay Berkenbilt
606420ab54 Tweak short text for job schema help 2022-01-31 18:26:03 -05:00
Jay Berkenbilt
21b9290785 QPDFJob json: make bare arguments expect the empty string
Changing from bool requiring true to string requiring the empty string
is more consistent with the CLI and makes it possible to add an
optional parameter or choices later without breaking compatibility.
2022-01-31 18:16:09 -05:00
Jay Berkenbilt
ea96330bb6 QPDFJob json: flatten json structure
Flatten everything to make it easier to map command-line flags to
json. The old structure was an illusion anyway because there was no
mechanism to enforce that things were in the right place. This also
helps with future flexibility.
2022-01-31 18:16:09 -05:00
Jay Berkenbilt
47f33cec25 QPDFJob: add test cases 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
e3506253f1 Add optional version to --json 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
b4fb9b4ec3 Remove outdated comments 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
caa00556cf Change filename or path to file in json and QPDFJob
Use "file" consistently for specifying a file path. We use "filename"
when adding attachments for a completely different purpose.
2022-01-31 15:57:45 -05:00
Jay Berkenbilt
1a3ed1ee85 job json: move deterministic-id into output options 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
81b6314cb5 QPDFJob: fix logic errors in handling arrays
The code was assuming everything was happening inside dictionaries.
Instead, make the dictionary key handler creatino explicit only when
iterating through dictionary keys.
2022-01-31 15:57:45 -05:00
Jay Berkenbilt
f99e0af49c QPDFJob: rename function that returns job schema 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
1355d95d08 QPDFJob: partial mode for initializeFromJson 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
cd30f626fe QPDFJob: remove from json a few things that only make sense from CLI 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
eeffc69d87 QPDFJob_json: implement handlers for pages 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
fa9676557e QDPFJob: incorporate change to JSONHandler for array start function 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
3b60224bae JSONHandler: pass JSON object to array start function 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
b74e7989c3 QPDFJob_json: implement handlers except pages 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
e01bbccb40 QPDFJob: incorporate change to JSONHandler for dict start function 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
ce3406e93f JSONHandler: pass JSON object to dict start function
If some keys depend on others, we have to check up front since there
is no control of what order key handlers will be called. Anyway, keys
are unordered in json, so we don't want to depend on ordering.
2022-01-31 15:57:45 -05:00
Jay Berkenbilt
11a86e444d QPDFJob: autogenerate json init and declarations
Now still have to go through and implement the handlers.
2022-01-31 15:57:45 -05:00
Jay Berkenbilt
842a9d928e QPDFJob_json: add code to register handlers 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
967a2b9f28 Fix typo in error message 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
a7b0aec2cf Fix false compiler warning in debug mode 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
28278e27ea Keep JSONHandler and QPDFArgParser private
Since the functionality of argument parsing has moved into QPDFJob,
these classes no longer need to be public. Their methods still have to
be in the library's binary interface so they can be tested in libtests.
2022-01-31 15:57:45 -05:00
Jay Berkenbilt
0f05cae66a QPDFJob: generate json decl and init file skeletons 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
8a9100f674 QPDFJob: add checkConfiguration to Config 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
0c8e9e5912 QPDFJob: prepare for automatically generated json handlers 2022-01-31 15:57:45 -05:00
Jay Berkenbilt
7eeaf58bb7 More doc tweaks 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
7097f29019 More editorial changes from m-holger + spell check 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
0e909bab8e Improve top-level help information 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
0364024781 Use QPDFUsage exception for cli, json, and QPDFJob errors 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
f3d68aa5a0 Incorporate editorial changes from m-holger 2022-01-30 13:11:03 -05:00
m-holger
7dd5f31230 Fix typos in manual
Fix typos in cli.rst
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
c62ab2ee9f QPDFJob: use pointers instead of references for Config
Why? The main methods that create them return smart pointers so that
users can initialize them when needed, which you can't do with
references. Returning pointers instead of references makes for a more
uniform interface.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
03f3369f35 QPDFJob: use manually named end functions for Config classes
Use named functions rather than just end() for clarity.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
9013b7ca91 QPDFJob: move placeholder json to a separate source file 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
edef2cd330 QPDFJob: make remaining members private 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
f2409f4fca Minor cleanup 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
01969c78a8 QPDFJob: move private members into Members 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
cf6c56a463 QPDFJob: use config API in place-holder json 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
2c7b583b3a QPDFJob: move input/output handling into config 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
1258054543 QPDFJob: eliminate most access to QPDFJob members from ArgParser
All that's left now is input and output handling.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
901e3e4fbf QPDFArgParser: remove unused copyFromOtherTable
This was used, but it no longer is, so let's not keep the extra
complexity around.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
700dfa40d3 QPDFJob: convert encryption handlers 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
b5d41b16b8 QPDFJob: convert under/overlay and rotate 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
1cc532dc91 QPDFJob: move some helpers from ArgParser to QPDFJob 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
95d127641c QPDFJob: move more top-level trivial handlers into config 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
41c5af8f26 QPDFJob: convert pages 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
9373881cca Add QPDFJob::ConfigError exception 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
0a354af02c QPDFJob: convert AddAttachment handlers 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
bf255ccc89 QPDFJob: convert password in two tables 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
21c897aad0 QPDFJob: convert a flag in other than the main table 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
f60526aff9 QPDFJob: start changing generation for trivial config handlers 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
b4b0df0df9 QPDFJob: convert trivial functions to config API 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
79187e585a QPDFJob: begin configuration API with verbose 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
160e869d1e Mark trivial arg functions 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
558f043d91 QPDFJob: TRUE -> true 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
fcdbc8a102 Move doFinalChecks to QPDFJob::checkConfiguration 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
c4e56fa5f4 QPDFJob: make createsOutput callable before run() 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
564dc03607 QPDFJob: start real API
Create QPDFJob_options.cc to hold API implementation functions.
Reorganize a little in preparation for moving public member variables
private and creating the real QPDFJob API that will be used by callers
as well as the argv/json initialization methods.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
1d099ab743 QPDFJob: placeholder for initializeFromJson 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
1c8d53465f Incorporate job schema generation into generate_auto_job 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
b9cd693a5b QPDFJob: allocate QPDFArgParser on stack
The previous commits have removed all references to memory from
QPDFArgParser from QPDFJob. This commit removes the constraint that
QPDFArgParser remain in scope. This is a prerequisite to allowing JSON
as an alternative way to initialize QPDFJob and to initialize it
directly using a public API.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
d526d4c17f QPDFJob: convert Under/Overlay to use shared pointers 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
88891a75a2 QPDFJob: convert Under/Overlay ranges to strings 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
e48bfce930 QPDFJob: convert PageSpec to used shared pointer 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
e4905983d2 QPDFJob: convert outfilename to shared pointer 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
e5edfc786f QPDFJob: convert infilename to shared pointer 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
ee7824cf28 QPDFJob: convert encryption_file args to shared pointers 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
021db6f226 QPDFJob: convert password to shared pointer 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
1a8c2eb93b QPDFJob: use std::shared_ptr over PointerHolder where possible
Also fix QPDFArgParser
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
76c4f78b5c Add QUtil::make_shared_cstr
Replace most of the calls to QUtil::copy_string with this instead.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
67f9d0b7d5 cli.rst: remove () from end of short help
This is used to generate a schema for the job json, which can't
contain `)"` because it breaks the R"(...)" syntax in C++. While C++
accepts R"anything(...)anything" to avoid this, as of this writing,
MSVC 2019 doesn't understand that. For now, just avoid it by removing
parentheses from the end of short help.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
8dea480c9f Allow optional fields in json "schema" checks 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
ec85e56c3f Add missing help topic for inspection 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
1db0a7ffce JSONHandler: rework dictionary and array handlers 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
acf8d18b6e Editorial changes to cli.rst 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
cf8405d91e Fix json schema for objects to include dictionary key 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
2e58541493 Use JSON::parse to initialize schema for json mode 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
37105710ee Implement JSONHandler for recursively processing JSON 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
a6df6fdaf7 CLI doc: use tables where helpful 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
e8e8f6f43c Add JSON::parse 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
b9af421ef7 Add missing \f support for JSON string encoder 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
aa0a379b37 Add JSON::isDictionary and JSON::isArray 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
5c5e5ca29b Document how to add a command-line argument 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
c8729398dd Generate help content from manual
This is a massive rewrite of the help text and cli.rst section of the
manual. All command-line flags now have their own help and are
specifically index. qpdf --help is completely redone.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
b4bd124be4 QPDFArgParser: support adding/printing help information 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
5303130cf9 Fix comment on duplicated top-level json keys 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
53ba65eb59 QPDFArgParser: handle optional choices including help
Handle optional choices in addition to required choices. Refactor the
way help options are added to completion to make it work with optional
help choices.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
a301cc5373 Minor code cleanup 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
3ab25d595b Fix doc typos caught by m-holger -- thanks 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
4577df4b5d QPDFJob increment: generate option table initialization 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
f1d805badc Add QPDFArgParser::copyFromOtherTable 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
c3e9b64e7f QPDFJob increment: generate handler declarations 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
6e70d99b58 QPDFJob increment: generate choices variables in init 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
cb684ec4d3 QPDFJob increment: generate table names 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
f8eee83515 Expose QPDFArgParser::usage 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
8dcf6da259 QPDFJob: remove non-check from doFinalChecks 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
c216854607 Add basic framework for QPDFJob code generation 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
bd89aac360 QPDFJob increment: move arg parsing into QPDFJob
Move ArgParser from qpdf.cc into QPDFJob.cc. It still works with
millions of public member variables, but now qpdf.cc is minimal and
just calls stable library functions.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
12396702af QPDFJob: reorder functions, no other changes 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
2394dd8519 QPDFJob increment: static functions to member functions
Convert remaining static functions that take QPDFJob& as a parameter
to member functions. Utility functions that don't take QPDFJob& remain
static functions and can probably just stay that way since the keep
extra complexity out of QPDFJob.hh.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
e2975b9ed0 QPDFJob: de-templatize do_process and do_process_once 2022-01-30 13:11:03 -05:00
Jay Berkenbilt
2f631997f2 QPDFJob increment: remove std::cout, std::cerr, whoami
Remove remaining temporary duplication of hard-coded values and direct
access to std::cout, std::cerr, and whoami in favor of parameters in
QPDFJob. This moves a few more static methods into QPDFJob member
functions.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
1ddf5b4b4b QPDFJob increment: get rid of exit, handle verbose
Remove all calls to exit() from QPDFJob. Handle code that runs in
verbose mode to enable it to make use of output streams and message
prefix (whoami) from QPDFJob. This removes temporarily duplicated exit
code logic and most access to whoami/std::cout outside of QPDFJob
proper.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
0910e767ad QPDFJob increment: basic QPDFJob structure
Move most of the methods called from qpdf.cc after argument parsing
into QPDFJob. In this increment, enough QPDFJob API has been added to
handle the branch of QPDFJob::run() that creates output with an
appropriate division between qpdf.cc and QPDFJob.

There are temporary bits of code to enable everything to compile and
pass the test suite, including some duplication and hard-coded values.
2022-01-30 13:11:03 -05:00
Jay Berkenbilt
52817f0a45 Implement QPDFArgParser based on ArgParser from qpdf.cc 2022-01-30 13:11:02 -05:00
m-holger
0f9086e509 Fix doc typos 2022-01-30 12:09:54 -06:00
m-holger
8eca9d8fd9 Fix QPDFObjectHandle::isOrHasName
Ensure isOrHasName returns true if object is an array and the name is
present anywhere in the array.
2022-01-27 09:35:39 -06:00
m-holger
07db3200cb Remove some if statements and simplify some boolean expressions
Use QPDFObjectHandle::isNameAndEquals, isDictionaryOfType and
isStreamOfType.
2022-01-27 07:31:12 -06:00
m-holger
710d2e54f0 Allow testing for subtype without specifying type in isDictionaryOfType etc
Accept empty string as type parameter in
QPDFObjectHandle::isDictionaryOfType and isStreamOfType
to allow for dictionaries with optional type.
2022-01-27 07:31:12 -06:00
m-holger
1b1b471ca9 Make a few whitespace fixes from last commit
Commit by ejb@ql.org using m-holger as author so git annotate gives
proper credit for changes.
2022-01-22 09:14:53 -05:00
m-holger
8593b9fdf7 Add new convenience methods QPDFObjectHandle::isNameAndEquals, etc
Add methods isNameAndEquals, isDictionaryOfType, isStreamOfType
2022-01-22 08:10:28 -06:00
Jay Berkenbilt
370710657a Add missing characters from PDF doc encoding (fixes #606) 2022-01-11 15:55:19 -05:00
Jay Berkenbilt
77c31305fe Fix signed/unsigned char warning (fixes #604) 2022-01-11 06:51:31 -05:00
Jay Berkenbilt
af91b5b584 Add QUtil::file_can_be_opened 2021-12-29 13:41:02 -05:00
Jay Berkenbilt
04745320d6 Prepare 10.5.0 release 2021-12-20 14:51:46 -05:00
Jay Berkenbilt
d866f48081 Change names of qpdf_object_type_e enumerations
They have to be ot_* rather than qpdf_ot_* for compatibility.

* Different enumerated types are not assignment-compatible in C++, at
  least with strict compiler settings
* While you can do `constexpr ot_xyz = ::qpdf_ot_xyz` in QPDFObject.hh to
  make QPDFObject::ot_xyz work, QPDFObject::object_type_e::ot_xyz will
  only work if the enumerated type names are the same.
2021-12-20 14:51:45 -05:00
Jay Berkenbilt
ea73bf72e0 Further improvements to handling binary strings 2021-12-19 14:30:45 -05:00
Jay Berkenbilt
ddbe59179e C API: simplify new error handling and improve documentation 2021-12-17 15:59:47 -05:00
m-holger
f6293bd94c C-API expose QPDFObjectHandle::getTypeCode and getTypeName (fixes #597) 2021-12-17 14:24:43 -05:00
Jay Berkenbilt
feafcc4e88 C API: add several stream functions (fixes #596) 2021-12-17 13:28:11 -05:00
Jay Berkenbilt
fee7489ee4 Add Pl_Buffer::getMallocBuffer 2021-12-17 12:38:52 -05:00
Jay Berkenbilt
9bb6f570ec C API: add functions for working with pages (fixes #594) 2021-12-16 15:07:48 -05:00
Jay Berkenbilt
245ca28066 Use value rather than reference captures where possible 2021-12-16 11:47:07 -05:00
Jay Berkenbilt
af2a71aa2c Handle bitstream overflow errors more gracefully (fixes #581)
* Make it a runtime error, not a logic error
* Include additional information
* Capture it properly in checkLinearization
2021-12-10 15:37:35 -05:00
Jay Berkenbilt
1c62c2a342 C API: expose functions for indirect objects (fixes #588) 2021-12-10 14:57:35 -05:00
Jay Berkenbilt
72c10d8617 C API: overhaul error handling
* Handle error conditions that occur when using the object handle
  interfaces. In the past, some exceptions were not correctly
  converted to errors or warnings.
* Add more detailed information to qpdf-c.h
* Make it possible to work more explicitly with uninitialized objects
2021-12-10 12:16:02 -05:00
Jay Berkenbilt
3340dbe976 Use a specific error code for type warnings and clarify docs 2021-12-10 11:15:49 -05:00
Jay Berkenbilt
b2b2a175c4 Add missing unit test for register progress reporter in C API
It was exercised in the pdf-linearize example but not in qpdf-ctest.
2021-12-10 09:11:56 -05:00
Jay Berkenbilt
1faa21502f Refactor trap_errors to use std::function 2021-12-09 10:33:31 -05:00
Jay Berkenbilt
e3cc171d02 C API: qpdf_oh_is_initialized 2021-12-09 10:33:31 -05:00
Jay Berkenbilt
bef2c2222a C API: qpdf_get_last_string_length 2021-12-09 10:33:31 -05:00
m-holger
b4fc9eb700 C-API expose new_object as qpdf_oh_new_object 2021-12-02 13:59:58 -05:00
Jay Berkenbilt
720ce9e8f3 Improve testing and error handling around operating before processing 2021-11-29 07:42:36 -05:00
Jay Berkenbilt
ac17308cf6 Initialize QPDF::Members::file (fixes #584) 2021-11-29 07:16:34 -05:00
m-holger
4630b8567c Ensure qpdf_oh handles returned by C-API functions are unique.
Return new qpdf_oh from qpdf_oh_wrap_in_array when input is already an array.
Update some doc comments in qpdf-c.h.
2021-11-19 13:31:59 +00:00
Jay Berkenbilt
ce7db05d22 Prepare 10.4.0 release 2021-11-16 15:44:09 -05:00
Jay Berkenbilt
750aca5b94 First increment of improving handling of weak crypto (fixes #358) 2021-11-11 12:24:15 -05:00
Jay Berkenbilt
f45dacf4cb Make recovery logic flexible about where objects end (fixes #573)
Don't assume endobj is at the beginning of the line. This means we are
looking at tokens for every line, but the odds of n n obj appearing in
the middle of the object are likely much lower than endobj not being
at the beginning of the line or missing entirely. This will probably
have a negative impact on recovery time for very large files.
Hopefully it will be worth it.
2021-11-07 15:27:22 -05:00
Jay Berkenbilt
3794f8e2ad Support OpenSSL 3 (fixes #568) 2021-11-04 18:24:54 -04:00
Jay Berkenbilt
a84a0b2487 Add range check in QPDFNumberTreeObjectHelper (fuzz issue 37740) 2021-11-04 14:03:24 -04:00
Jay Berkenbilt
4a648b9a00 Fix bug in merging resources /DR from foreign AcroForm (fixes #548)
When making resources indirect in from_dr, the code was using the
wrong owning QPDF, forgetting that from_dr had already been copied
using CopyForeignObject.
2021-11-04 12:29:42 -04:00
Jay Berkenbilt
9b28933647 Check object ownership when adding
When adding a QPDFObjectHandle to an array or dictionary, if possible,
check if the new object belongs to the same QPDF. This makes it much
easier to find incorrect code than waiting for the situation to be
detected when the file is written.
2021-11-04 12:29:42 -04:00
Jay Berkenbilt
33a47d5c3c Make QPDF::findPage public (fixes #516)
This was originally not public because I wanted to get rid fo the
pages cache, but I recently realized there were deep reasons not to do
that, and the author of pikepdf wanted this, so I decided to make it
public.
2021-11-03 09:43:17 -04:00
Jay Berkenbilt
532a4f3d60 Detect recoverable but invalid zlib data streams (fixes #562) 2021-11-03 09:43:17 -04:00
Fredrik Fornwall
e0775238b8 Fix QPDFEFStreamObjectHelper::{get,set}Subtype
The /Subtype entry that specifies the mime type of an embedded file is
inside the embedded file stream dictionary directly, not it in the
parameter dictionary.

See Table 45 and 46 in the PDF 1.7 specification:
https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=112
2021-09-10 10:02:24 -04:00
Jay Berkenbilt
3cacb27a90 Performance fix on preserveObjectStreams 2021-05-09 07:51:14 -04:00
Jay Berkenbilt
bddebdb0ea Prepare 10.3.2 release 2021-05-08 10:41:14 -04:00
Jay Berkenbilt
30ac51bc78 Exclude unreferenced objects in object streams (fixes #520) 2021-05-08 09:42:09 -04:00
Zdenek Dohnal
16c19e9424 libqpdf/Pl_AES_PDF.cc: remove duplicated if branch
Check for this->encrypt seems to be moved to plugged crypto
implementations, so it can be removed from Pl_AES_PDF.cc.
2021-04-29 09:42:38 -04:00
Jay Berkenbilt
36c7c20819 Fix timezone portability issue (fixes #515) 2021-04-17 18:12:55 -04:00
Jay Berkenbilt
8971443e46 QPDF::addPage*: handle duplicate pages more robustly 2021-04-05 10:58:10 -04:00
Jay Berkenbilt
ec48820c3c Fix loop detection in NNTree 2021-04-05 07:59:02 -04:00
Jay Berkenbilt
258675fc99 Move ABI comment to the right place 2021-04-03 11:43:08 -04:00
Jay Berkenbilt
a77f58142d Remove some assertions that are not necessarily true (fixes #514)
Operations that add the same object to multiple places in the pages
tree are throwing exceptions and then later causing assertion
failures. The assert calls shouldn't be there.
2021-03-21 19:35:23 -04:00
Jay Berkenbilt
3f05429cc5 Prepare 10.3.1 release 2021-03-11 12:59:41 -05:00
Jay Berkenbilt
85884c363c Allow /DR to be direct in /AcroForm
Also handle direct annotation, though this is much less likely.
2021-03-11 11:43:38 -05:00
Jay Berkenbilt
dc65b88457 Prepare 10.3.0 release 2021-03-05 06:15:48 -05:00
Jay Berkenbilt
cb6e53136f QPDFAcroFormDocumentHelper: add missing analyze calls 2021-03-04 18:11:44 -05:00
Jay Berkenbilt
0b77f2cf26 Revert non-binary-compatible handleWarning change -- see TODO (ABI) 2021-03-04 15:59:46 -05:00
Jay Berkenbilt
f68e25c7f2 Don't use handleWarning, which is being reverted 2021-03-04 15:59:45 -05:00
Jay Berkenbilt
9fb174b9e9 Major rework of handling form fields when copying pages (fixes #509) 2021-03-04 15:08:37 -05:00
Jay Berkenbilt
887f35efaa When resolving font from /DR, copy it into resources 2021-03-04 15:08:36 -05:00
Jay Berkenbilt
a2124f992c Add QPDFMatrix::operator== 2021-03-04 15:08:36 -05:00
Jay Berkenbilt
552303a94a Check for reserved after dereference 2021-03-04 15:08:36 -05:00
Jay Berkenbilt
d7ffdfa994 Add optional conflict detection to mergeResources
Also improve behavior around direct vs. indirect resources.
2021-03-04 15:08:36 -05:00
Jay Berkenbilt
e17585c2d2 Remove unreferenced: ignore names that are not Fonts or XObjects
Converted ResourceFinder to ParserCallbacks so we can better detect
the name that precedes various operators and use the operators to sort
the names into resource types. This enables us to be smarter about
detecting unreferenced resources in pages and also sets the stage for
reconciling differences in /DR across documents.
2021-03-03 17:05:49 -05:00
Jay Berkenbilt
a15ec6967d Enhancements to ParserCallbacks 2021-03-03 17:05:49 -05:00
Jay Berkenbilt
1bb209a9bf Add QPDF::numWarnings 2021-03-03 17:05:49 -05:00
Jay Berkenbilt
37fcc5ff71 Create ResourceFinder from NameWatcher in QPDFPageObjectHelper 2021-03-03 17:05:49 -05:00
Jay Berkenbilt
b444ab3352 Fix typos in coverage cases 2021-03-03 17:05:49 -05:00
Jay Berkenbilt
fa2516df71 Fix behavior for finding /Q, /DA, and /DR for form fields
If not found in the field hierarchy, /Q and /DA are supposed to be
looked up in the document-level form dictionary. /DR is supposed to
only come from the document dictionary.
2021-03-03 17:05:19 -05:00
Jay Berkenbilt
a4d6589ff2 Have QPDFObjectHandle notice when replaceObject was called
This results in a performance penalty of 1% to 2% when replaceObject
and swapObjects are never called and a somewhat larger penalty if they
are called, but it's worth it to avoid very confusing behavior as
discussed in depth in qpdf#507.
2021-02-25 07:32:46 -05:00
Jay Berkenbilt
ec6719fd25 Always call dereference() before querying obj pointer 2021-02-25 07:31:26 -05:00
Jay Berkenbilt
b5e937397c Prepare 10.2.0 release 2021-02-23 10:41:58 -05:00
Jay Berkenbilt
1886673d7e Spell check 2021-02-23 10:38:05 -05:00
Jay Berkenbilt
9e00be7ffa Remove warning that gives false positives in some normal cases 2021-02-23 08:26:21 -05:00
Jay Berkenbilt
be3a8c0e7a Keep only referenced form fields in --pages 2021-02-23 08:26:21 -05:00
Jay Berkenbilt
83216e640c Preserve form fields when splitting pages (fixes #340) 2021-02-22 18:42:06 -05:00
Jay Berkenbilt
1f35ec9988 Add methods for copying form fields 2021-02-22 18:42:06 -05:00
Jay Berkenbilt
8e8c0d8290 Add new placeFormXObject that takes a matrix reference 2021-02-22 18:42:06 -05:00
Jay Berkenbilt
61d41e2e88 Add copyAnnotations, use with overlay/underlay (fixes #395) 2021-02-22 18:42:06 -05:00
Jay Berkenbilt
7b3cbacf5d Change from QPDF{Array,Dict}Items to aitems() and ditems() 2021-02-22 11:05:39 -05:00
Jay Berkenbilt
a9ae8cadc6 Add transformAnnotations and fix flattenRotations to use it 2021-02-21 17:13:09 -05:00
Jay Berkenbilt
a76decd2d5 Add QPDFObjGen::unparse 2021-02-21 16:21:52 -05:00
Jay Berkenbilt
7540d2082a Explicitly override inherited rotate in flattenRotations 2021-02-21 14:58:45 -05:00
Jay Berkenbilt
e899926e0d Use QPDFMatrix inside flattenRotations 2021-02-21 14:58:45 -05:00
Jay Berkenbilt
92fbc6fdf5 QPDFObjectHandle::copyStream 2021-02-21 06:36:30 -05:00
Jay Berkenbilt
60afe4142e Refactor: separate copyStreamData from replaceForeignIndirectObjects 2021-02-21 06:36:30 -05:00
Jay Berkenbilt
15269f36d8 addFormField: update cache rather than invalidating 2021-02-21 06:36:30 -05:00
Jay Berkenbilt
901f1a788c Enhance QPDFMatrix API 2021-02-21 06:36:30 -05:00
Jay Berkenbilt
05eb5826d8 Fix isPagesObject and isPageObject
There are lots of things with /Kids that are not pages. Repair the
pages tree, then do a reliable check.
2021-02-20 19:42:41 -05:00
Jay Berkenbilt
35dd11f356 Allow --rotate=0 2021-02-20 16:29:34 -05:00
Jay Berkenbilt
71e8627285 Add const versions of QPDFMatrix::transform* 2021-02-19 18:35:19 -05:00
Jay Berkenbilt
de8929a41c Add QPDFAcroFormDocumentHelper::addFormField 2021-02-18 12:25:48 -05:00
Jay Berkenbilt
5cec6b4c3d Add QPDFPageObjectHelper::getMatrixForFormXObjectPlacement 2021-02-18 12:25:48 -05:00
Jay Berkenbilt
0765872295 Form field for non-widget just returns null 2021-02-18 10:25:07 -05:00
Jay Berkenbilt
0b1623d07d Add QUtil::path_basename 2021-02-18 09:59:03 -05:00
Jay Berkenbilt
a773f4c71d Add QPDFObjectHandle::parse for strings with context 2021-02-15 11:33:03 -05:00
Jay Berkenbilt
7eb903d9aa Use functional replaceStreamData 2021-02-14 14:42:24 -05:00
Jay Berkenbilt
efbb21673c Add functional versions of QPDFObjectHandle::replaceStreamData
Also fix a bug in checking consistency of length for stream data
providers. Length should not be checked or recorded if the provider
says it failed to generate the data.
2021-02-14 14:42:24 -05:00
Jay Berkenbilt
e2593e2efe Move QPDFMatrix into the public API 2021-02-13 02:30:00 -05:00
Jay Berkenbilt
07f40bd254 QUtil::double_to_string: trim trailing zeroes with option to disable 2021-02-13 02:30:00 -05:00
Jay Berkenbilt
8fbc8579f2 Allow zone information to be omitted from timestamp strings 2021-02-11 14:26:55 -05:00
Jay Berkenbilt
df067c9ab6 Add autoconf test for localtime_r 2021-02-11 14:26:55 -05:00
Jay Berkenbilt
1b3f84f967 Require C++14 instead of C++11 2021-02-10 16:27:58 -05:00
Jay Berkenbilt
9fcf61b2f6 Fix loop in QPDFOutlineDocumentHelper (fuzz issue 30507) 2021-02-10 16:27:44 -05:00
Jay Berkenbilt
4d1f2fdcac Update to new name/number tree API 2021-02-10 15:46:20 -05:00
Jay Berkenbilt
1f4771cd0d Minor clean up of Windows headers 2021-02-10 07:36:18 -05:00
Jay Berkenbilt
ad34b9c278 Implement helpers for file attachments 2021-02-10 06:57:37 -05:00
Jay Berkenbilt
bf0e6eb302 Add QUtil methods for dealing with PDF timestamp strings 2021-02-09 17:50:24 -05:00
Jay Berkenbilt
bfbeec5497 Make newly created name/number trees indirect objects 2021-02-08 06:49:56 -05:00
Jay Berkenbilt
553ac7f353 Add QUtil::pipe_file and QUtil::file_provider 2021-02-07 19:41:34 -05:00
Jay Berkenbilt
e076c9bf08 Remove erroneous handling of /EFF for stream decryption
I thought /EFF was supposed to be used as a default for decrypting
embedded file streams, but actually it's supposed to be advice to a
conforming writer about handling new ones. This makes sense since the
findAttachmentStreams code, which is not actually needed, was never
right.
2021-02-06 17:08:41 -05:00
Jay Berkenbilt
ac2b3b96e1 Make wrong object stream type a warning 2021-02-06 14:29:11 -05:00
Jay Berkenbilt
faa2e3ddfd Handle older PDFs whose form XObjects inherit resources (fixes #494)
When removing unreferenced resources, notice if a page (recursively)
contains a form XObject with unreferenced resources, and count any
such resources as referenced by the page.
2021-02-02 18:06:05 -05:00
Jay Berkenbilt
81025e4998 Refactor removal of unreferenced resources
Refactor in preparation for resolving unresolved resources in form
xobjects from page.
2021-02-02 18:06:05 -05:00
Jay Berkenbilt
9c9ce64eec Handle strings in inline image dictionaries
We need to use token.getRawValue, not token.getValue
2021-01-31 07:50:03 -05:00
Jay Berkenbilt
178f995fc2 Recover from exceptions during filtering for inline images 2021-01-31 07:49:08 -05:00
Jay Berkenbilt
4ae93a73c5 Improve memory safety of dict/array iterators 2021-01-31 07:16:03 -05:00
Jay Berkenbilt
de0b11fc47 Add C++ iterator API around array and dictionary objects 2021-01-30 15:15:23 -05:00
Jay Berkenbilt
35e7859bc7 Make QPDFObjectHandle::is* return false for uninitialized objects 2021-01-29 15:46:54 -05:00
Jay Berkenbilt
50decc9bb8 name/number tree: explicitly declare default destructors 2021-01-29 15:46:54 -05:00
Jay Berkenbilt
8ed3e8c79b NNTree: rework iterators to be more memory efficient
Keep a std::pair internal to the iterators so that operator* can
return a reference and operator-> can work, and each can work without
copying pairs of objects around.
2021-01-26 09:12:23 -05:00
Jay Berkenbilt
e7e20772ed name/number trees: remove 2021-01-26 09:12:23 -05:00
Jay Berkenbilt
5816fb44b8 name/number trees: insertAfter 2021-01-25 15:39:10 -05:00
Jay Berkenbilt
16a9bb3f6f name/number trees: newEmpty, increment/decrement end() 2021-01-25 15:39:10 -05:00
Jay Berkenbilt
b5614f611d Implement repair and insert for name/number trees 2021-01-24 19:31:45 -05:00
Jay Berkenbilt
04edfe9fad QPDFObjectHandle::newUnicodeString to uses UTF-16 only when needed
Use the first of ASCII, PDFDocEncoding, or UTF-16 that is capable of
encoding the string.
2021-01-24 03:27:28 -05:00
Jay Berkenbilt
63e5cb533d Use new QPDF{Name,Number}TreeObjectHelper API 2021-01-24 03:27:28 -05:00
Jay Berkenbilt
d61ffb65d0 Add new constructors for name/number tree helpers
Add constructors that take a QPDF object so we can issue warnings and
create new indirect objects.
2021-01-24 03:27:26 -05:00
Jay Berkenbilt
ba814703fb Use QPDFNameTreeObjectHelper's iterator directly 2021-01-24 03:25:11 -05:00
Jay Berkenbilt
5f0708418a Add iterators to name/number tree helpers 2021-01-24 03:22:59 -05:00
Jay Berkenbilt
4a1cce0a47 Reimplement name and number tree object helpers
Create a computationally and memory efficient implementation of name
and number trees that does binary searches as intended by the data
structure rather than loading into a map, which can use a great deal
of memory and can be very slow.
2021-01-24 03:22:51 -05:00
Jay Berkenbilt
6226b69dba Add warn() to QPDF's public API 2021-01-16 18:41:53 -05:00
Jay Berkenbilt
fc88837d4b Treat /EmbeddedFiles as a proper name tree
If we ever had an encrypted file with different filters for
attachments and either the /EmbeddedFiles name tree was deep or some
of the file specs didn't have /Type, we would have overlooked those as
attachment streams. The code now properly handles /EmbeddedFiles as a
name tree.
2021-01-11 10:50:44 -05:00
Jay Berkenbilt
6fe7b704c7 Warn rather than segv on access after closing input source (fixes #495) 2021-01-06 10:11:34 -05:00
Jay Berkenbilt
0fed040392 Prepare version 10.1.0 2021-01-04 16:59:55 -05:00
Jay Berkenbilt
18340b8835 Spell check 2021-01-04 16:26:58 -05:00
Jay Berkenbilt
dc92574c10 Fix some pipelines to be safe if downstream write fails (fuzz issue 28262) 2021-01-04 15:17:35 -05:00
Jay Berkenbilt
ba6b6aacf1 Fix outdated comment 2021-01-03 15:59:49 -05:00
Jay Berkenbilt
3be58f49e5 Make more QPDFPageObjectHelper methods work with form XObject 2021-01-02 14:08:53 -05:00
Jay Berkenbilt
98da4fd835 Externalize inline images now includes form XObjects 2021-01-02 14:08:17 -05:00
Jay Berkenbilt
bedf35d6a5 Bug fix: avoid extraneous pipeline finish calls with multiple contents
Avoid calling finish() multiple times on the pipeline passed to
pipeContentStreams. This commit also fixes a bug in which qpdf was not
exiting with the proper exit status if warnings found while splitting
pages; this was exposed by a test case that changed.
2021-01-02 14:08:17 -05:00
Jay Berkenbilt
a139d2b36d Add several methods for working with form XObjects (fixes #436)
Make some more methods in QPDFPageObjectHelper work with form
XObjects, provide forEach methods to walk through nested form
XObjects, possibly recursively. This should make it easier to work
with form XObjects from user code.
2021-01-02 12:29:31 -05:00
Jay Berkenbilt
6154221edb QPDFPageObjectHelper: filterPageContents -> filterContents + form XObject 2021-01-02 11:33:36 -05:00
Jay Berkenbilt
63ea46193d QPDFPageObjectHelper: getPageImages -> getImages 2021-01-02 11:33:36 -05:00
Jay Berkenbilt
e7a8554563 QPDFPageObjectHelper::getPageImages: support form XObjects 2021-01-02 11:33:36 -05:00
Jay Berkenbilt
1562d34c09 Add QPDFObjectHandle::isFormXObject 2021-01-01 07:36:10 -05:00
Jay Berkenbilt
c9271335fa Add QPDFPageObjectHelper::flattenRotation and --flatten-rotation 2020-12-30 13:03:55 -05:00
Jay Berkenbilt
12ecd2019a Add QPDFObjectHandle::setFilterOnWrite 2020-12-28 12:58:19 -05:00
Jay Berkenbilt
3f9191a344 Add ostream << for QPDFObjGen 2020-12-28 12:58:19 -05:00
Jay Berkenbilt
858c7b89bc Let optimize filter stream parameters instead of making them direct
Also removes preclusion of stream references in stream parameters of
filterable streams and reduces write times by about 8% by eliminating
an extra traversal of the objects.
2020-12-28 12:58:19 -05:00
Jay Berkenbilt
1a62cce940 Restructure optimize to allow skipping parameters of filtered streams 2020-12-28 12:58:19 -05:00
Jay Berkenbilt
09027344b9 Refactor: separate code that determines whether to filter a stream 2020-12-28 12:58:19 -05:00
Jay Berkenbilt
39bfa01307 Implement user-provided stream filters
Refactor QPDF_Stream to use stream filter classes to handle supported
stream filters as well.
2020-12-28 12:58:19 -05:00
Jay Berkenbilt
cc8895078a Add QPDFObjectHandle::makeDirect(bool allow_streams) 2020-12-26 08:48:18 -05:00
Jay Berkenbilt
573b6eb8b1 Provide qpdf write progress reporting from C API (fixes #487) 2020-12-20 14:43:24 -05:00
Jay Berkenbilt
2050977099 Add QPDFObjectHandle manipulation to C API 2020-11-28 19:48:07 -05:00
Jay Berkenbilt
78b9d6bfd4 Prepare 10.0.4 release 2020-11-21 13:50:02 -05:00
Jay Berkenbilt
bd79138c84 Treat direct page as runtime rather than logic error (fuzz issue 27393) 2020-11-11 09:50:43 -05:00
Jay Berkenbilt
47f4ebcdac Ignore unused field in xref entry, avoiding range error (fixes #482) 2020-11-04 07:46:46 -05:00
Jay Berkenbilt
fbe40b800d Prepare 10.0.3 release 2020-10-31 13:47:03 -04:00
Jay Berkenbilt
6971f78ff6 Fix stack overflow on direct root (fuzz issue 26761) 2020-10-31 13:10:39 -04:00
Jay Berkenbilt
ffe6af6f77 Add comments explaining the foreign object copying code
These are the comments I would have liked to have been able to read
while fixing #449 and #478.
2020-10-31 12:14:26 -04:00
Jay Berkenbilt
96767fb104 Fix foreign stream copying bug (fixes #478)
This reverts an incorrect fix to #449 and codes it properly. The real
problem was that we were looking at the local dictionaries rather than
the foreign dictionaries when saving the foreign stream data. In the
case of direct objects, these happened to be the same, but in the case
of indirect objects, the object references could be pointing anywhere
since object numbers don't match up between the old and new files.
2020-10-31 12:14:26 -04:00
Jay Berkenbilt
da7540794a Prepare 10.0.2 release 2020-10-27 11:57:48 -04:00
Jay Berkenbilt
09bd1fafb1 Improve efficiency of number to string conversion 2020-10-27 11:57:48 -04:00
Jay Berkenbilt
bcea54fcaa Revert removal of unreadCh change for performance
Turns out unreadCh is much more efficient than seek(-1, SEEK_CUR).
Update comments and code to reflect this.
2020-10-27 11:57:48 -04:00
Jay Berkenbilt
b30deaeeab Avoid merging adjacent tokens when concatenating contents (fixes #444) 2020-10-23 08:00:04 -04:00
Jay Berkenbilt
8a11feacc3 Avoid leak by resolving object streams more than once (fuzz issue 23642) 2020-10-22 15:39:36 -04:00
Jay Berkenbilt
30bb4c64ee Minor code cleanup
* Return rather than exiting from realmain in qpdf.cc
* Remove extraneous blank line
* Don't assign temporary to const reference
2020-10-22 15:39:36 -04:00
Jay Berkenbilt
232f5fc9f3 Handle jpeg library fuzz false positives
The jpeg library has some assembly code that is missed by the compiler
instrumentation used by memory sanitization. There is a runtime
environment variable that is used to work around this issue.
2020-10-22 06:31:52 -04:00
Jay Berkenbilt
c1684eae91 Check for overflow in page labels (fuzz issue 23599) 2020-10-22 05:49:24 -04:00
Jay Berkenbilt
7f4a4df919 Add range_check method to QIntC 2020-10-22 05:48:40 -04:00
Jay Berkenbilt
24196c08cb Fix loop detection error (fuzz issue 23172) 2020-10-22 05:48:35 -04:00
Jay Berkenbilt
956c8f6432 Obscure bug fix copying foreign streams in special cases (fixes #449)
Specifically, if a stream had its stream data replaced and had
indirect /Filter or /DecodeParms, it would result in non-silent loss
of data and/or internal error.
2020-10-21 19:23:23 -04:00
Jay Berkenbilt
98f6c00dad Protect numeric conversion against user's locale (fixes #459) 2020-10-21 16:42:51 -04:00
Jay Berkenbilt
bed165c9fc Stop using InputSource::unreadCh 2020-10-18 07:43:05 -04:00
Dean Scarff
153060a0c5 Check integer overflow in resolveObjectsInStream
Fixes a crash found by fuzzing.
2020-10-16 20:09:24 -04:00
Dean Scarff
9a3791c53b Properly detect OPENSSL_IS_BORINGSSL
OPENSSL_IS_BORINGSSL is not actually set by configure, so it will be
undefined until a BoringSSL header is included.  Hence the #ifdef logic
in QPDFCrypto_openssl.h would usually never apply.

This still worked because evp.h transitively included BoringSSL's
cipher.h and digest.h, but the latter are the correct (documented)
headers.

By re-ordering the includes, we can ensure the macro is defined when we
use it.

Also: fix case in the header guards.
2020-10-16 20:04:36 -04:00
Dean Scarff
2ff84aa2c9 Include detailed OpenSSL error messages
Fixes qpdf/qpdf#450
2020-10-16 19:58:11 -04:00
James R. Barlow
3fc7c99d02 Replace memchr with manual memory search
On large files with predominantly \n line endings, memchr(..'\r'..)
seems to waste a considerable amount of time searching for a line
ending candidate that we don't need.

On the Adobe PDF Reference Manual 1.7, this commit is 8x faster at
QPDF::processMemoryFile().
2020-10-16 19:57:29 -04:00
oltolm
3221022fc9 fix WindowsCryptProvider fixes #432 2020-10-16 19:56:33 -04:00
Jay Berkenbilt
ff65e272a8 Fix printf formatting for newer msvc
Use autoconf rather than ifdefs to determine what format string to use
for long long.
2020-10-16 07:02:23 -04:00
Jay Berkenbilt
88b8f8ec86 Remove redundant check found by lgtm.com 2020-10-15 14:47:43 -04:00
Jay Berkenbilt
26514ab731 Write linearization errors to stderr (fixes #438) 2020-04-29 17:33:34 -04:00
Jay Berkenbilt
92d3cbecd4 Fix warnings reported by -Wshadow=local (fixes #431) 2020-04-16 12:41:43 -04:00
Jay Berkenbilt
578c5ac66c Use more references when iterating
When possible, use `for (auto&` or `for (auto const&` when iterating
using C++-11 style iterators.
2020-04-10 13:30:33 -04:00
Jay Berkenbilt
821a701851 Prepare 10.0.1 release 2020-04-09 11:48:26 -04:00
Jay Berkenbilt
1a7d3700a6 Fix unnecessary copies in auto iter (fixes #426)
Also switch to colon-style iteration in some cases. Thanks to Dean
Scarff for drawing this to my attention after detecting some
unnecessary copies with
https://clang.llvm.org/extra/clang-tidy/checks/performance-for-range-copy.html
2020-04-08 20:45:26 -04:00
Jay Berkenbilt
4977a7efa5 Bug fix: getStreamData should on unfilterable stream (fixes #425) 2020-04-08 18:52:04 -04:00
Jay Berkenbilt
1e629c278a Prepare 10.0.0 release 2020-04-06 11:30:15 -04:00
Jay Berkenbilt
c996f4ac33 Don't include <cwchar> if not building with wchar 2020-04-06 11:23:02 -04:00
Jay Berkenbilt
77198d5310 Delegate random number generation to crypto provider (fixes #418) 2020-04-06 11:23:02 -04:00
Jay Berkenbilt
52749b85df Make random data provider code thread-safe
This uses C++-11 thread-safe static initializers now.
2020-04-06 10:00:43 -04:00
Jay Berkenbilt
619d294e9d Remove QUtil::srandom 2020-04-06 09:49:02 -04:00
Dean Scarff
0f2507234f Add OpenSSL/BoringSSL crypto provider
Fixes qpdf/qpdf#417
2020-04-06 09:01:55 -04:00
Jay Berkenbilt
893d38b87e Allow propagation of errors and retry through StreamDataProvider
StreamDataProvider::provideStreamData now has a rich enough API for it
to effectively proxy to pipeStreamData.
2020-04-05 20:07:13 -04:00
Jay Berkenbilt
7246404177 JSON: implement pattern keys in schema 2020-04-04 18:06:32 -04:00
Dean Scarff
c5c1a028cd Use deterministic assignments for unique_id
Fixes qpdf/qpdf#419
2020-04-04 08:29:28 -04:00
Jay Berkenbilt
2100b4ce15 Allow qpdf to be built on systems without wchar_t (fixes #406) 2020-04-03 21:39:44 -04:00
Jay Berkenbilt
6a4117add9 Avoid potential segfault in warning methods 2020-04-03 21:39:20 -04:00
Jay Berkenbilt
4f3b89991b placeFormXObject: allow control of shrink/expand (fixes #409) 2020-04-03 21:39:17 -04:00
Jay Berkenbilt
b76b73b229 C API: accept any non-zero value as TRUE 2020-04-03 17:33:44 -04:00
Jay Berkenbilt
54726930df Remove redundant methods in QUtil
This was being saved until we had to break ABI.
2020-04-03 12:17:57 -04:00
Jay Berkenbilt
5806e5c60c QPDFPageObjectHelper::placeFormXObject: use std::string const& (fixes #374) 2020-04-03 12:17:57 -04:00
Jay Berkenbilt
97de12343b Performance: remove Members indirection for Pipeline 2020-04-03 12:17:57 -04:00
Jay Berkenbilt
bfda941519 Use an unordered map for SparseOHArray for efficiency
This was added in C++11.
2020-04-03 12:16:24 -04:00
Jay Berkenbilt
ee271fd2f2 Use auto for iterating over sparse array 2020-04-03 12:16:24 -04:00
Jay Berkenbilt
70665cb381 Internally use unsafeShallowCopy where we can 2020-04-03 12:16:24 -04:00
Jay Berkenbilt
38afdcea7b Add QPDFObjectHandle::unsafeShallowCopy 2020-04-03 12:16:24 -04:00
Jay Berkenbilt
07afb668b1 Performance: remove indirection through Members for QPDFObject 2020-04-03 12:16:24 -04:00
Jay Berkenbilt
89f19b7099 Performance: remove Members indirection for QPDFObjectHandle 2020-04-03 12:16:24 -04:00
Jay Berkenbilt
dac65a21fb Look in form XObjects when removing unreferenced resources (fixes #373)
If a page contains a form XObject, also filter the form XObject and
remove its unreferenced resources.
2020-03-31 17:39:20 -04:00
Jay Berkenbilt
278710fbe8 Refactor QPDFPageObjectHelper::removeUnreferencedResources()
Refactor removeUnreferencedResources to prepare for filtering form
XObjects.
2020-03-31 17:39:20 -04:00
Jay Berkenbilt
bb6768b8f0 Include header for wcslen (fixes #405) 2020-02-29 08:43:33 -05:00
Jay Berkenbilt
bb3137296d Handle root /Pages pointing to other than page tree root (fixes #398) 2020-02-22 11:10:31 -05:00
Jay Berkenbilt
52a2e95dd5 Prepare 9.1.1 release 2020-01-26 18:49:04 -05:00
Jay Berkenbilt
57c01ef81f In qdf mode, don't write extra XRef streams (fixes #386)
fix-qdf assumes there is exactly one XRef stream and that it is at the
end of the file.
2020-01-26 16:50:57 -05:00
Jay Berkenbilt
bbc2f8ffae Bug fix: handle ColorSpace lookup for inline images (fixes #392)
If the value of /CS in the inline image dictionary was is key in the
page's /Resource -> /ColorSpace dictionary, properly resolve it by
referencing the proper colorspace, and not just the name, in the
external image dictionary.
2020-01-26 15:29:10 -05:00
Cloudmersive
a8b6ff5763 Fix for Windows unable to acquire crypt context with new keyset (fixes #387)
Fix is based on guidance
https://support.microsoft.com/en-us/help/238187/cryptacquirecontext-use-and-troubleshooting
and is the proper fix for #285/#286
2020-01-14 18:45:54 -05:00
Jay Berkenbilt
a44b5a34a0 Pull wmain -> main code from qpdf.cc into QUtil.cc 2020-01-14 11:40:51 -05:00
Jay Berkenbilt
ab4061f1ee Add error detection for read_lines_from_file(FILE*) 2020-01-14 11:07:09 -05:00
Jay Berkenbilt
211a7f57be QUtil::read_lines_from_file: optional EOL preservation 2020-01-13 11:26:18 -05:00
Jay Berkenbilt
9a398504ca Refactor QUtil::read_lines_from_file
This commit adds the preserve_eol flags but doesn't implement EOL
preservation yet.
2020-01-13 09:19:53 -05:00
Jay Berkenbilt
9b0c6022d7 Prepare 9.1.0 release 2019-11-16 22:29:54 -05:00
Jay Berkenbilt
5e6dfc938e Prepare 9.1.rc1 release 2019-11-09 22:00:53 -05:00
Jay Berkenbilt
c4478e5249 Allow odd/even modifiers in numeric range (fixes #364) 2019-11-09 13:23:12 -05:00
Jay Berkenbilt
5508f74603 Allow /P in encryption dictionary to be positive (fixes #382)
Even though this is disallowed by the spec, files like this have been
encountered in the wild.
2019-11-09 12:33:15 -05:00
Jay Berkenbilt
127a957aee Allow runtime inspection/override of crypto provider 2019-11-09 09:53:42 -05:00
Jay Berkenbilt
88bedb41fe Implement gnutls crypto provider (fixes #218)
Thanks to Zdenek Dohnal <zdohnal@redhat.com> for contributing the code
used for the gnutls crypto provider.
2019-11-09 09:53:38 -05:00
Jay Berkenbilt
cc14523440 Update autoconf to support crypto selection 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
d0a53cd3ea Fix typos in configure.ac 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
c03ced09c0 Isolate source files used for native crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
d1ffe46c04 AES_PDF: move CBC logic from pipeline to AES_PDF implementation 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
c8cda4f965 AES_PDF: switch to pluggable crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
bb427bd117 SHA2: switch to pluggable crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
eadc222ff9 Rename SHA2 implementation (non-bisectable) 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
4287fcc002 RC4: switch to pluggable crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
0cdcd10228 Rename RC4 implementation (non-bisectable) 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
ce8f9b6608 MD5: switch to pluggable crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
5c3e856e9f Rename MD5 implementation (non-bisectable)
Just rename MD5 -> MD5_native in place so that git annotate will show
the lines as having originated there.
2019-11-09 08:18:02 -05:00
Jay Berkenbilt
2de41856a0 QPDFCryptoProvider: initial implementation 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
700f5b961e Remove int type checks -- subsumed by C++-11 2019-11-09 08:18:02 -05:00
Jay Berkenbilt
653ce3550d Require C++-11
Includes updates to m4/ax_cxx_compile_stdcxx.m4 to make it work with
msvc, which supports C++-11 with no flags but doesn't set __cplusplus
to a recent value.
2019-11-09 08:18:02 -05:00
Jay Berkenbilt
9094fb1f8e Fix two additional fuzz test cases 2019-11-03 18:59:12 -05:00
Masamichi Hosoda
5a842792b6 Parse Contents in signature dictionary without encryption
Various PDF digital signing tools do not encrypt /Contents value in
signature dictionary. Adobe Acrobat Reader DC can handle a PDF with
the /Contents value not encrypted.

Write Contents in signature dictionary without encryption

Tests ensure that string /Contents are not handled specially when not
found in sig dicts.
2019-10-22 16:20:21 -04:00
Masamichi Hosoda
cdc46d78f4 Add QPDFObject::getParsedOffset() 2019-10-22 16:19:06 -04:00
Masamichi Hosoda
50b329ee9f Add QPDFWriter::getWrittenXRefTable() 2019-10-22 16:16:16 -04:00
Masamichi Hosoda
5cf4090aee Add QPDFWriter::getRenumberedObjGen() 2019-10-22 16:16:16 -04:00
Masamichi Hosoda
46ac3e21b3 Add QPDF::getXRefTable() 2019-10-22 16:16:16 -04:00
Masamichi Hosoda
06b818dcd3 Exclude signature dictionary from compressible objects
It seems better not to compress signature dictionaries. Various PDF
digital signing tools, including Adobe Acrobat Reader DC, do not
compress signature dictionaries.

Table 8.93 "Entries in a signature dictionary" in PDF 1.5 reference
describes that /ByteRange in the signature dictionary shall be used to
describe a digest that does not include the signature value
(/Contents) itself.

The byte ranges cannot be determined if the dictionary is compressed.
2019-10-22 16:16:16 -04:00
Masamichi Hosoda
5e0ba12687 Fix /Contents value representation in a signature dictionary
Table 8.93 "Entries in a signature dictionary" in PDF 1.5 reference
describes that the value of Contents entry is a hexadecimal string
representation when ByteRange is specified.

This commit makes QPDF always uses hexadecimal strings representation
instead of literal strings for it.
2019-10-22 16:16:16 -04:00
Jay Berkenbilt
3094955dee Prepare 9.0.2 release 2019-10-12 19:37:40 -04:00
Jay Berkenbilt
4ea940b03c Prepare 9.0.1 release 2019-09-20 07:38:18 -04:00
Jay Berkenbilt
685250d7d6 Correct reversed Rectangle coordinates (fixes #363) 2019-09-19 21:25:34 -04:00
Jay Berkenbilt
48b7de2cc3 Fix typo in comment 2019-09-19 21:04:32 -04:00
Jay Berkenbilt
8b1e307741 Warn for duplicated dictionary keys (fixes #345) 2019-09-19 20:22:34 -04:00
Jay Berkenbilt
bb83e65193 Fix fuzz issue 16953 (overflow checking in xref stream index) 2019-09-17 19:48:47 -04:00
Jay Berkenbilt
17d431dfd5 Fix integer type warnings for big-endian systems 2019-09-17 19:14:27 -04:00
Jay Berkenbilt
5462dfce31 Prepare 9.0.0 release 2019-08-31 20:07:36 -04:00
Jay Berkenbilt
babd12c9b2 Add methods QPDF::anyWarnings and QPDF::closeInputSource 2019-08-31 15:51:20 -04:00
Jay Berkenbilt
4fa7b1eb60 Add remove_file and rename_file to QUtil 2019-08-31 15:51:04 -04:00
Jay Berkenbilt
0e51a9aca6 Don't encrypt trailer, fixes fuzz issue 15983
Ordinarily the trailer doesn't contain any strings, so this is usually
a non-issue, but if the trailer contains strings, linearizing and
encrypting with object streams would include encrypted strings in the
trailer, which would blow out the padding because encrypted strings
are longer than their cleartext counterparts.
2019-08-28 23:06:32 -04:00
Jay Berkenbilt
47a38a942d Detect stream in object stream, fixing fuzz 16214
It's detected in QPDFWriter instead of at parse time because I can't
figure out how to construct a test case in a reasonable time. This
commit moves the fuzz file into the regular test suite for a QTC
coverage case.
2019-08-28 12:49:04 -04:00
Jay Berkenbilt
ba5fb69164 Make popping pipeline stack safer
Use destructors to pop the pipeline stack, and ensure that code that
pops the stack is actually popping the intended thing.
2019-08-27 22:27:47 -04:00
Jay Berkenbilt
dadf8307c8 Fix fuzz issues 15316 and 15390 2019-08-27 20:39:06 -04:00
Jay Berkenbilt
456c285b02 Fix fuzz issue 16172 (overflow checking in OffsetInputSource) 2019-08-27 13:08:07 -04:00
Jay Berkenbilt
ad8081daf5 Fix fuzz issue 15442 (overflow checking in BufferInputSource) 2019-08-27 11:26:25 -04:00
Jay Berkenbilt
9a095c5c76 Seek in two stages to avoid overflow
When seeing to a position based on a value read from the input, we are
prone to integer overflow (fuzz issue 15442). Seek in two stages to
move the overflow check into the input source code.
2019-08-27 11:26:25 -04:00
Jay Berkenbilt
ac5e6de2e8 Fix fuzz issue 15387 (overflow checking xref size) 2019-08-27 11:26:25 -04:00
Jay Berkenbilt
6bc4cc3d48 Fix fuzz issue 15475 2019-08-25 22:52:25 -04:00
Jay Berkenbilt
94e86e2528 Fix fuzz issue 16301 2019-08-25 22:52:25 -04:00
Jay Berkenbilt
5da146c8b5 Track separately whether password was user/owner (fixes #159) 2019-08-24 11:01:19 -04:00
Jay Berkenbilt
5a0aef55a0 Split long line 2019-08-24 10:58:51 -04:00
Jay Berkenbilt
2794bfb1a6 Add flags to control zlib compression level (fixes #113) 2019-08-23 20:34:21 -04:00
Jay Berkenbilt
dac0598b94 Add ability to set zlib compression level globally 2019-08-23 20:34:21 -04:00
Jay Berkenbilt
3f1ab64066 Pass offset and length to ParserCallbacks::handleObject 2019-08-22 22:54:29 -04:00
Jay Berkenbilt
4b2e72c4cd Test for direct, rather than resolved nulls in parser
Just because we know an indirect reference is null, doesn't mean we
shouldn't keep it indirect.
2019-08-22 17:55:16 -04:00
Jay Berkenbilt
3f3dbe22ea Remove array null flattening
For some reason, qpdf from the beginning was replacing indirect
references to null with literal null in arrays even after removing the
old behavior of flattening scalar references. This seems like a bad
idea.
2019-08-22 17:55:16 -04:00
Jay Berkenbilt
225cd9dac2 Protect against coding error of re-entrant parsing 2019-08-22 17:55:16 -04:00
Jay Berkenbilt
ae5bd7102d Accept extraneous space before xref (fixes #341) 2019-08-19 22:24:53 -04:00
Jay Berkenbilt
8a9086a689 Accept extraneous space after stream keyword (fixes #329) 2019-08-19 21:43:44 -04:00
Jay Berkenbilt
43f91f58b8 Improve invalid name token warning message
This message used to only appear for PDF >= 1.2. The invalid name is
valid for PDF 1.0 and 1.1. However, since QPDFWriter may write a newer
version, it's better to detect and warn in all cases. Therefore make
the warning more informative.
2019-08-19 19:48:27 -04:00
Jay Berkenbilt
42d396f1dd Handle invalid name tokens symmetrically for PDF < 1.2 (fixes #332) 2019-08-19 19:48:27 -04:00
Jay Berkenbilt
d9dd99eca3 Attempt to repair /Type key in pages nodes (fixes #349) 2019-08-18 18:54:37 -04:00
Jay Berkenbilt
522d2b2227 Improve efficiency of fixDanglingReferences 2019-08-18 09:00:40 -04:00
Jay Berkenbilt
5187a3ec85 Shallow copy arrays without removing sparseness 2019-08-17 23:02:41 -04:00
Jay Berkenbilt
bf7c6a8070 Use SparseOHArray in parsing 2019-08-17 23:02:41 -04:00
Jay Berkenbilt
e5f504b6c5 Use SparseOHArray in QPDF_Array 2019-08-17 23:02:41 -04:00
Jay Berkenbilt
a89d8a0677 Refactor QPDF_Array in preparation for using SparseOHArray 2019-08-17 23:02:41 -04:00
Jay Berkenbilt
e83f3308fb SparseOHArray 2019-08-17 23:02:41 -04:00
Thorsten Schöning
8f06da7534 Change list to vector for outline helpers (fixes #297)
This change works around STL problems with Embarcadero C++ Builder
version 10.2, but std::vector is more common than std::list in qpdf,
and this is a relatively new API, so an API change is tolerable.

Thanks to Thorsten Schöning <6223655+ams-tschoening@users.noreply.github.com>
for the fix.
2019-07-03 20:08:47 -04:00
Jay Berkenbilt
4db1de97ce Convert some cases of logic_error to runtime_error
There were a few cases that could be caused by invalid input rather
than bugs in the code which were throwing logic_error instead of
runtime_error.
2019-06-25 12:43:06 -04:00
Jay Berkenbilt
201e8798d7 Convert previously overlooked static cast to QIntC 2019-06-25 12:43:06 -04:00
Jay Berkenbilt
04f45cf652 Treat all linearization errors as warnings
This also reverts the addition of a new checkLinearization that
distinguishes errors from warnings. There's no practical distinction
between what was considered an error and what was considered a
warning.
2019-06-23 13:45:45 -04:00
Jay Berkenbilt
c5ed1b8075 Handle invalid encryption Length (fixes #333) 2019-06-22 20:57:33 -04:00
Jay Berkenbilt
551dfbf697 Allow set*EncryptionParameters before filename iset (fixes #336) 2019-06-22 20:57:33 -04:00
Jay Berkenbilt
7bd38a3eb3 Provide error message in Windows crypto code (fixes #286)
Thanks to github user zdenop for supplying some additional
error-handling code.
2019-06-22 17:12:01 -04:00
Jay Berkenbilt
6c39aa8763 In shippable code, favor smart pointers (fixes #235)
Use PointerHolder in several places where manually memory allocation
and deallocation were being used. This helps to protect against memory
leaks when exceptions are thrown in surprising places.
2019-06-22 16:57:52 -04:00
Jay Berkenbilt
85a3f95a89 qpdf: exit 3 for linearization warnings without errors (fixes #50) 2019-06-22 16:57:51 -04:00
Jay Berkenbilt
1bde5c68a3 Add QUtil::read_file_into_memory
This code was essentially duplicated between test_driver and
standalone_fuzz_target_runner.
2019-06-22 10:14:25 -04:00
Jay Berkenbilt
658b5bb3be QPDFWriter: clean up overloaded functions
In a small number of cases, it makes sense to replace an overloaded
function with a function that takes a default argument. We can do this
now because we've already broken binary compatibility since the last
release.
2019-06-22 10:13:27 -04:00
Jay Berkenbilt
79f6b4823b Convert remaining public classes to use Members pattern
Have classes contain only a single private member of type
PointerHolder<Members>. This makes it safe to change the structure of
the Members class without breaking binary compatibility. Many of the
classes already follow this pattern quite successfully. This brings in
the rest of the class that are part of the public API.
2019-06-22 10:13:27 -04:00
Jay Berkenbilt
45dac410b5 Remove broken QPDFTokenizer::expectInlineImage 2019-06-21 22:29:31 -04:00
Jay Berkenbilt
25dd3c6750 Remove QPDF::copyForeignObject with unused parameter 2019-06-21 22:29:31 -04:00
Jay Berkenbilt
c6cfd64503 Rename QUtil::strcasecmp to QUtil::str_compare_nocase (fixes #242) 2019-06-21 22:29:31 -04:00
Jay Berkenbilt
848351f1fc Add missing #include <cstring> 2019-06-21 22:29:31 -04:00
Jay Berkenbilt
b07ad6794e Fix bugs found by fuzz tests
* Several assertions in linearization were not always true; change
  them to run time errors
* Handle a few cases of uninitialized objects
* Handle pages with no contents when doing form operations
* Handle invalid page tree nodes when traversing pages
2019-06-21 17:56:24 -04:00
Jay Berkenbilt
a35d4ce9cc Fix bounds error in utf16_to_utf8 conversion 2019-06-21 17:40:24 -04:00
Jay Berkenbilt
63a643a3c7 Remove implicit conversion from int/pointer to bool
This fixes cases of warning C4800 from msvc
2019-06-21 13:17:21 -04:00
Jay Berkenbilt
d71f05ca07 Fix sign and conversion warnings (major)
This makes all integer type conversions that have potential data loss
explicit with calls that do range checks and raise an exception. After
this commit, qpdf builds with no warnings when -Wsign-conversion
-Wconversion is used with gcc or clang or when -W3 -Wd4800 is used
with MSVC. This significantly reduces the likelihood of potential
crashes from bogus integer values.

There are some parts of the code that take int when they should take
size_t or an offset. Such places would make qpdf not support files
with more than 2^31 of something that usually wouldn't be so large. In
the event that such a file shows up and is valid, at least qpdf would
raise an error in the right spot so the issue could be legitimately
addressed rather than failing in some weird way because of a silent
overflow condition.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt
f40ffc9d63 Pl_Flate: constructor's out_bufsize is now unsigned int
This is the type we need for the underlying zlib implementation.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt
da30764bce Change QPDFObjectHandle::pipeStreamData's encode_flags type
Change from unsigned long to int since we pass enumerated type values
to this field.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt
3608afd5c5 Add new integer accessors to QPDFObjectHandle 2019-06-21 13:17:21 -04:00
Jay Berkenbilt
42306e2ff8 QUtil: add unsigned int/string functions 2019-06-21 13:17:21 -04:00
Jay Berkenbilt
2155815234 configure: determine wordsize automatically
Based on sizeof(size_t). Assumes 64 if not 32.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt
713d961990 Appearance streams: some floating point values were truncated
Bounding box X coordinates could be truncated, causing them to be off
by a fraction of a point. This was most likely not visible, but it was
still wrong.
2019-06-20 21:32:30 -04:00
Jay Berkenbilt
eb7948876b Fix problems found in fuzz corpus 2019-06-15 17:24:24 -04:00
Jay Berkenbilt
cf469d7890 Give up reading objects with too many consecutive errors 2019-06-15 08:52:19 -04:00
Jay Berkenbilt
cd830968ef Eliminate one potential integer overflow
There are more to handle, but this resolves an issue already caught by
oss-fuzz.
2019-06-15 08:52:19 -04:00
Jay Berkenbilt
31bde2f9d7 Handle empty DecodeParams array for (fixes #331)
On read, ignore /DecodeParms when empty list; on write, delete it.
Some files have been found that include an empty list for
/DecodeParms, but this is not technically compliant with the spec, and
the only sensible interpretation is to treat it as if there are no
decode parameters.
2019-06-09 17:19:49 -04:00
Jay Berkenbilt
b1a78be1a8 Prepare 8.4.2 release 2019-05-18 08:56:37 -04:00
Jay Berkenbilt
b3f0dbff62 Fix Windows memory error (fixes #330) 2019-05-16 14:26:51 -04:00
Jay Berkenbilt
a323f6f49f Prepare 8.4.1 release 2019-04-27 20:44:20 -04:00
Jay Berkenbilt
81205e007b Spell check 2019-04-21 13:09:11 -04:00
Jay Berkenbilt
011695dfdf Support Unicode in filenames (fixes #298) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt
4ccb29912a Tighten isPageObject (fixes #310) 2019-04-20 21:00:43 -04:00
Thorsten Schöning
2c704b99a1 Undefined functions because of missing std:: or header. (#295)
* [bcc32 Error] QPDF.cc(375): E2268 Call to undefined function 'atof'
  Full parser context
    QPDF.cc(358): parsing: void QPDF::parse(const char *)

* [bcc32 Error] QPDFTokenizer.cc(183): E2268 Call to undefined function 'strtol'
  Full parser context
    QPDFTokenizer.cc(163): parsing: void QPDFTokenizer::resolveLiteral()

* [bcc32 Error] pdf-split-pages.cc(52): E2268 Call to undefined function 'exit'
  Full parser context
    pdf-split-pages.cc(50): parsing: void usage()

* PR #295: Including "cstdlib" should be replaced with "stdlib.h" to be more consistent. At the same time I changed the order of the surrounding includes to reflect alphabetical order, because at some files this already have been the case.
2019-03-12 10:05:29 -04:00
Thorsten Schöning
71b7ed9f4f "_setmode" and "_stricmp" are not available on Borland C++Builder, neither the classic one nor newer ones based on CLANG. 2019-03-11 16:58:55 -04:00
Jay Berkenbilt
da7c2c0ee9 Fix json serialization for {x | -1 < x < 1} (fixes #308)
JSON serialization was preserving the value as presented, but JSON
doesn't accept decimal values without a 0 before the decimal point.
2019-03-11 16:22:59 -04:00
Jay Berkenbilt
03074ca5a0 Prepare 8.4.0 release 2019-02-01 22:25:25 -05:00
Jay Berkenbilt
fec5bb124c Spell check 2019-01-31 21:41:29 -05:00
Jay Berkenbilt
eb49e07c0a Make inline image token exactly contain the image data
Do not include the trailing EI, and handle cases where EI is not
preceded by a delimiter. Such cases have been seen in the wild.
2019-01-31 20:28:44 -05:00
Jay Berkenbilt
5211bcb5ea Externalize inline images (fixes #278) 2019-01-31 10:38:13 -05:00
Jay Berkenbilt
1eb35a355f Exclude space after ID in image data 2019-01-31 10:38:10 -05:00
Jay Berkenbilt
2b6c79bcae Improve locating inline image's EI
We've actually seen a PDF file in the wild that contained EI
surrounded by delimiters inside the image data, which confused qpdf's
naive code. This significantly improves EI detection.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt
ec9e310c9e Refactor QPDFTokenizer's inline image handling
Add a version of expectInlineImage that takes an input source and
searches for EI. This is in preparation for improving the way EI is
found. This commit just refactors the code without changing the
functionality and adds tests to make sure the old and new code behave
identically.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt
31372edce0 Inline image token value ends with EI, not delimiter
The inline image token erroneously included the delimiter that
followed EI. The ObjectHandle created from it was correct.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt
b776dcd2d3 Clean up some private functions 2019-01-29 22:14:20 -05:00
Jay Berkenbilt
8a9cfd2605 Handle direct page objects (fixes #164) 2019-01-29 17:01:36 -05:00
Jay Berkenbilt
2d0885bc11 Clarify documentation for copyForeignObject regarding pages
Make explicit that copyForeignObject can be used on page objects and
will copy them properly but not update the pages tree.
2019-01-28 21:53:55 -05:00
Jay Berkenbilt
2712869cf9 Fix logic for when to compress object and xref streams (fixes #271) 2019-01-28 21:43:06 -05:00
Jay Berkenbilt
52f9d326a5 Resolve duplicated page objects (fixes #268)
When linearizing a file or getting the list of all pages in a file,
detect if the pages tree contains a duplicated page object and, if so,
shallow copy it. This makes it possible to have a one to one mapping
of page positions to page objects.
2019-01-28 20:29:58 -05:00
Jay Berkenbilt
623f5b664e Convert pages to form XObjects
Support conversion of pages to form XObjects and placement of form
XObjects on pages.
2019-01-27 07:50:30 -05:00
Jay Berkenbilt
68ccd87c9e Move rectangle transformation into QPDFMatrix 2019-01-27 07:50:30 -05:00
Jay Berkenbilt
8cb245739c Add QPDFObjectHandle::getUniqueResourceName 2019-01-27 07:50:30 -05:00
Jay Berkenbilt
009767d97a Handle inheritable page attributes
Add getAttribute for handling inheritable page attributes, and fix
getPageImages and annotation flattening code to use it.
2019-01-25 22:30:05 -05:00
Jay Berkenbilt
2d32f4db8f Handle fallback font size in text appearances
If we end up using our fallback font size when generating appearances
for text fields, reflect that in the Tf operator used in the
appearance stream.
2019-01-21 07:38:21 -05:00
Jay Berkenbilt
9cb599875b Improve text objects used in text appearance streams 2019-01-20 23:05:58 -05:00
Jay Berkenbilt
930eade6d3 Fix omissions in text appearance generation
When generating appearance streams for variable text annotations,
properly handle the cases of there being no appearance dictionary, no
appearance stream, or an appearance stream with no BMC..EMC marker.
2019-01-20 23:05:58 -05:00
Jay Berkenbilt
65ef0bf313 When flattening, remove annotations with no appearance stream
With the exception of form field annotations when /NeedAppearances is
true, remove annotations that don't have appearance streams when
flattening. There is no reason to keep these when flattening since
they are invisible. This may include unchecked checkboxes, unshown
popup windows, etc.
2019-01-20 23:05:58 -05:00
Jay Berkenbilt
c18ee440a3 mingw workaround for QPDFExc destructor
mingw doesn't like it when you don't inline empty virtual destructors.
2019-01-19 10:14:07 -05:00
Jay Berkenbilt
e87d149918 Add QUtil::possible_repaired_encodings 2019-01-17 11:43:56 -05:00
Jay Berkenbilt
6ec22f117d Modernize encryption API for more granularity
Setting encryption permissions for R >= 3 set permission bits in
groups corresponding to menu options in Acrobat 5. The new API allows
the bits to be set individually.
2019-01-17 11:43:56 -05:00
Jay Berkenbilt
4630377731 Add status-reporting transcoders to QUtil 2019-01-17 11:43:56 -05:00
Jay Berkenbilt
8f389f14c0 QUtil::analyze_encoding 2019-01-17 11:43:56 -05:00
Jay Berkenbilt
6817ca585a Bidirectional transcoding for win, mac, pdf, utf8, utf16 2019-01-17 11:43:56 -05:00
Jay Berkenbilt
698485468a Move remaining existing transcoding to QUtil 2019-01-17 11:43:56 -05:00
Jay Berkenbilt
5cfcd4f361 Additional checks for unreferenced resources
Explicitly abandon removal of unreferenced resources if there are any
lexical errors in the page's contents. This case always generated a
warning, but it now also prevents removal of unreferenced resources,
this strongly decreasing the likelihood of data loss.
2019-01-17 11:43:56 -05:00
Jay Berkenbilt
4bc434000c Copy subdictionaries when removing resources (fixes #276)
When removing unreferenced resources, the code was copying the overall
resource dictionaries but not the subdictionaries being modified. This
was a "typo" in the code -- the comment clearly stated the need to do
this, but the code replaced the dictionary with itself rather than
with a shallow copy of itself.
2019-01-17 09:40:05 -05:00
Jay Berkenbilt
654c0e8caf Allow adding the same page more than once in --pages (fixes #272) 2019-01-12 10:01:47 -05:00
Jay Berkenbilt
4ecd1df6f2 Add configure option AVOID_WINDOWS_HANDLE
If set, we avoid using Windows I/O HANDLE, which is disallowed in some
versions of the Windows SDK, such as for Windows phones.
QUtil::same_file will always return false in this case. Only applies
to Windows builds.
2019-01-10 22:35:08 -05:00
Jay Berkenbilt
d24a120c7f Add QPDF::setImmediateCopyFrom 2019-01-10 22:35:08 -05:00
Jay Berkenbilt
b653929c93 Update version to 8.3.0 2019-01-07 11:16:54 -05:00
Jay Berkenbilt
aa602fd107 Fix integer overflow in large file test 2019-01-07 08:49:14 -05:00
Jay Berkenbilt
c3cee5f154 Exercise out of scope original pdf for copyForeignObject 2019-01-07 07:38:03 -05:00
Jay Berkenbilt
fddbcab0e7 Mostly don't require original QPDF for copyForeignObject (fixes #219)
The original QPDF is only required now when the source
QPDFObjectHandle is a stream that gets its stream data from a
QPDFObjectHandle::StreamDataProvider.
2019-01-07 00:11:15 -05:00
Jay Berkenbilt
fbbb0ee016 Make a static version of QPDF::pipeStreamData
This is in preparation of being able to pipe a stream's data without
keeping a copy of its containing qpdf object.
2019-01-07 00:11:15 -05:00
Jay Berkenbilt
7588cac295 Create an application-scope unique ID for each QPDF object
Use this instead of QPDF* as a map key for object_copiers.
2019-01-07 00:11:15 -05:00
Jay Berkenbilt
e27ac682e0 Move encryption parameters into a class 2019-01-06 09:58:16 -05:00
Jay Berkenbilt
a70fbaaf50 Honor other base encodings when generating appearances 2019-01-05 23:01:59 -05:00
Jay Berkenbilt
b341d742db Add WinAnsi and MacRoman encoding 2019-01-05 23:01:44 -05:00
Jay Berkenbilt
3ef1b77304 Refactor QUtil::utf8_to_ascii 2019-01-05 22:59:29 -05:00
Jay Berkenbilt
089ce5902e Move utf8_to_utf16 into QUtil 2019-01-05 22:59:27 -05:00
Jay Berkenbilt
ae18bfd142 Refactor string transcoding in QPDF_String 2019-01-05 22:56:58 -05:00
Jay Berkenbilt
2e342ee5bb Spell check 2019-01-04 21:33:14 -05:00
Jay Berkenbilt
16fd6e64f9 Add QPDFWriter::getFinalVersion (fixes #266) 2019-01-04 12:37:22 -05:00
Jay Berkenbilt
837dcf8fc2 Don't call assert while checking linearization data (fixes #209, #231)
Instead of calling assert for problems found during checking
linearization data, throw an exception which is later caught and
issued as an error. Ideally we would handle errors more robustly, but
this is still a significant improvement.
2019-01-04 11:55:42 -05:00
Jay Berkenbilt
a01359189b Fix dangling references (fixes #240)
On certain operations, such as iterating through all objects and
adding new indirect objects, walk through the entire object structure
and explicitly resolve any indirect references to non-existent
objects. That prevents new objects from springing into existence and
causing the previously dangling references to point to them.
2019-01-04 10:29:29 -05:00
Jay Berkenbilt
158156d506 Add basic appearance stream generation 2019-01-04 08:00:19 -05:00
Jay Berkenbilt
02281632cc Add QUtil::utf8_to_ascii 2019-01-03 23:18:13 -05:00
Jay Berkenbilt
b55567a0fa Add special case setV code for button fields 2019-01-03 23:18:13 -05:00
Jay Berkenbilt
e3144ac417 Add form fields to json output
Also add some additional methods for detecting form field types to
assist in the json creation and for later use.
2019-01-03 23:18:13 -05:00
Jay Berkenbilt
ca94ac68d9 Honor flags when flattening annotations 2019-01-03 11:59:55 -05:00
Jay Berkenbilt
06d6438ddf Minor fixes 2019-01-03 09:17:43 -05:00
Jay Berkenbilt
3e74916c5a Fix seg fault on empty xref stream (fixes #263)
Thanks to @p-cher for supplying a patch.
2019-01-03 09:17:43 -05:00
Jay Berkenbilt
f78ea057ca Switch annotation flattening to use the form xobjects
Instead of directly putting the contents of the annotation appearance
streams into the page's content stream, add commands to render the
form xobjects directly. This is a more robust way to do it than the
original solution as it works properly with patterns and avoids
problems with resource name clashes between the pages and the form
xobjects.
2019-01-02 21:49:47 -05:00
Jay Berkenbilt
3b8ce4f12a Annotation flattening including form fields
Flatten annotations by integrating their appearance streams into the
content stream of the containing page. In the case of form fields,
only flatten if /NeedAppearance is false (or equivalently absent). If
flattening form fields, also remove /AcroForm from the document
catalog.
2019-01-01 08:14:15 -05:00
Jay Berkenbilt
95d6b17a89 Add QPDFObjectHandle::mergeDictionary() 2019-01-01 08:12:56 -05:00
Jay Berkenbilt
104fd6da52 Add matrix and annotation appearance stream handling
Generate page content fragment for rendering appearance streams
including all matrix calculation.
2019-01-01 08:07:21 -05:00
Jay Berkenbilt
5059ec0d35 Add Matrix class under QPDFObjectHandle 2018-12-31 23:02:43 -05:00
Jay Berkenbilt
daeb5a85b6 Transformation matrix 2018-12-31 18:23:47 -05:00
Jay Berkenbilt
3440ea7d3c JSON::serialize -> unparse
Unparse is admittedly strange, but I'd rather be strange and
consistent, and everything else in the qpdf library uses unparse to
serialize. (If you're reading this, the convention of using "unparse"
comes from the "clu" programming language.)
2018-12-25 11:52:21 -05:00
Jay Berkenbilt
fa3664357b Move numrange code from qpdf.cc to QUtil.cc
Also move tests to libtests.
2018-12-21 19:11:57 -05:00
Jay Berkenbilt
d5d179f441 Add document and object helpers for outlines (bookmarks) 2018-12-21 19:11:57 -05:00
Jay Berkenbilt
30a0c070e4 Add QPDFObjectHandle::getJSON() 2018-12-21 18:34:56 -05:00
Jay Berkenbilt
651179b5da Add simple JSON serializer 2018-12-21 18:34:56 -05:00
Jay Berkenbilt
0776c00129 Add QPDFNameTreeObjectHelper 2018-12-21 18:34:56 -05:00
Jay Berkenbilt
cc500eda91 Minor cleanup 2018-12-21 17:25:31 -05:00
Jay Berkenbilt
6ef9e31233 Add QPDFPageLabelDocumentHelper 2018-12-18 16:59:24 -05:00
Jay Berkenbilt
f38df27aa3 Add QPDFNumberTreeObjectHelper 2018-12-18 16:46:10 -05:00
Jay Berkenbilt
077d3d4512 Add QPDFObjectHandle::wrapInArray()
Wrap an object in an array if it is not already an array.
2018-12-18 16:45:48 -05:00
Jay Berkenbilt
d1368a3851 Commit automatically generated files 2018-10-11 17:27:54 -04:00
Jay Berkenbilt
6ee761fc86 Prepare 8.2.1 release 2018-08-18 10:56:19 -04:00
Jay Berkenbilt
5e9e17e62a Prepare 8.2.0 release 2018-08-16 11:53:10 -04:00
Jay Berkenbilt
693cdaac35 Missing header for std::max 2018-08-16 11:53:10 -04:00
Jay Berkenbilt
b4ce557be5 Fix error in QPDFSystemError.cc 2018-08-14 11:39:07 -04:00
Jay Berkenbilt
b4bdc42b4f New exception class QPDFSystemError (fixes #221) 2018-08-13 20:01:51 -04:00
Jay Berkenbilt
5d9d80beba Fix fallback logic for encryption (fixes #229) 2018-08-12 22:32:40 -04:00
Jay Berkenbilt
60fe8061cb Fix one more identifier (fixes #236) 2018-08-12 22:01:51 -04:00
Jay Berkenbilt
a2f62935b3 Catch exceptions as const references (fixes #236)
This fix allows qpdf to compile/test cleanly with gcc 8.
2018-08-12 21:57:52 -04:00
Jay Berkenbilt
3d6615b276 Pl_Buffer: reduce memory growth (fixes #228)
Rather than keeping a list of buffers for every write, accumulate
bytes in a single buffer, doubling the size of the buffer when needed
to accommodate new data.

This is not the best possible implementation, but the change was
implemented in this way to avoid changing the shape of Pl_Buffer and
thus breaking backward compatibility.
2018-08-12 17:45:43 -04:00
Jay Berkenbilt
3873f5fd9b Protect headers with compliant identifiers (fixes #233) 2018-08-12 14:10:32 -04:00
Jay Berkenbilt
932799baab Fix memory access error
A previous fix introduced a potentially memory overrun under certain
rare conditions. The test suite now once again passes with address
sanitizer.
2018-08-12 13:16:17 -04:00
Jay Berkenbilt
b6e414b10b Remove some extraneous null pointer checks (fixes #234)
There were a few places in the code that were checking that a pointer
wasn't null before deleting it, even though C++ has always allowed
delete 0. Most of the code did not perform these checks.
2018-08-12 12:58:39 -04:00
Jay Berkenbilt
4a4736c695 Fix EOL handling inside strings (fixes #226)
CR, CRLF, and LF are all supposed to be treated as LF; only one EOL is
to be ignored after backslash.
2018-08-05 20:48:35 -04:00
Jay Berkenbilt
1619cad1e8 Return correct method for string encryption (fixes #227) 2018-08-05 16:58:21 -04:00
Jay Berkenbilt
e1cd5891af Fix infinite loop on small files with progress reporting (fixes #230)
Turns out you can keep adding zero to a number over and over again and
it just doesn't get any bigger. Who would have known?
2018-08-05 15:43:34 -04:00
Jay Berkenbilt
4f4c627b77 ClosedFileInputSource: add method to keep file open
During periods of intensive operation on a specific file, this method
can reduce the overhead of repeated open/close operations.
2018-08-04 19:52:46 -04:00
Jay Berkenbilt
1bd2a2e79b Prepare 8.1.0 release 2018-06-23 07:50:11 -04:00
Jay Berkenbilt
3aad28aed0 Bug fix: honor encryption key length with R=3 (fixes #212) 2018-06-22 19:24:26 -04:00
Jay Berkenbilt
a433ed24f9 Add progress reporting for QPDFWriter (fixes #200) 2018-06-22 16:14:54 -04:00
Jay Berkenbilt
2a82f6e1e0 Add method to get count of objects in QPDF 2018-06-22 15:53:40 -04:00
Jay Berkenbilt
c81836076f Correct incorrect comment 2018-06-22 13:13:09 -04:00
Jay Berkenbilt
4ccc8b1a44 Add ClosedFileInputSource
ClosedFileInputSource is an input source that keeps the file closed
when not reading it.
2018-06-22 12:52:45 -04:00
Jay Berkenbilt
c71dc6888c Don't prune resource dictionaries on errors or by request
If we are unable to filter a page's content streams, don't attempt to
remove objects from the page's resource dictionary. Also provide a
command line option to suppress resource removal in case we ever need
this as a workaround for some bug or broken PDF files.
2018-06-22 10:45:31 -04:00
Jay Berkenbilt
38c9ed23c3 Treat content stream parsing errors as an error, not a warning
If parsing content streams is treated as a warning, there is no way
for a caller to know if a parsing operation has failed. This is very
dangerous and will likely result in data loss when token filters are
parser callbacks are in use.
2018-06-22 10:44:08 -04:00
Jay Berkenbilt
6c89d4b35b When splitting files, remove unreferenced objects (fixes #203) 2018-06-21 21:03:30 -04:00
Jay Berkenbilt
ddd78c1b7f Fix QPDFObjectHandle::shallowCopy
It's not really a shallow copy. It just doesn't cross indirect object
boundaries. The old implementation had a bug that would cause multiple
shallow copies of the same object to share memory, which was not the
intention.
2018-06-21 20:34:45 -04:00
Jay Berkenbilt
397b097c46 Allow setting a form field's value 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
952a665a4e Better support for creating Unicode strings 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
e44c395c51 QUtil::toUTF16 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
0b05111db8 Implement helper class for interactive forms 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
2e7ee23bf6 Add QPDFPageDocumentHelper and QPDFPageObjectHelper
This is the beginning of higher-level API support using helper
classes. The goal is to be able to add more helpers without continuing
to pollute QPDF's and QPDFObjectHandle's public interfaces.
2018-06-21 15:57:13 -04:00
Jay Berkenbilt
4cded10821 Add QPDFObjectHandle::Rectangle type
Provide a convenient way of accessing rectangles.
2018-06-21 15:57:13 -04:00
Jay Berkenbilt
078cf9bf90 newline before endstream fix for object streams (fixes #205) 2018-05-12 13:17:43 -04:00
Jay Berkenbilt
15ed9f8565 Fix small logic error in Token construct (fixes #206)
The special case around name token was not reachable. This would only
affect constructors of name tokens that were represented in
non-canonical form such as with a hex substitution for a printable
character. The error was harmless but still a bug.
2018-05-05 17:47:56 -04:00
Jay Berkenbilt
b4d6cf6836 Limit depth of nesting in direct objects (fixes #202)
This fixes CVE-2018-9918.
2018-04-15 16:11:22 -04:00
Jay Berkenbilt
f8c8e4dcc0 Prepare 8.0.2 release 2018-03-06 11:34:07 -05:00
Jay Berkenbilt
e4e2e26d99 Properly handle pages with no contents (fixes #194)
Remove calls to assertPageObject(). All cases in the library that
called assertPageObject() work fine if you don't call
assertPageObject() because nothing assumes anything that was being
checked by that call. Removing the calls enables more files to be
successfully processed.
2018-03-06 11:34:07 -05:00
Jay Berkenbilt
1a4dcb4aaf Pl_Buffer starts in a ready state 2018-03-06 11:31:03 -05:00
Jay Berkenbilt
ee44aef8d0 Treat loop in xref tables as damage (fixes #192)
Prior to this fix, if there was a loop detected in following /Prev
pointers in xref streams/tables, it would cause qpdf to lose data.
Note that this condition causes many PDF readers to hang or fail.
2018-03-05 14:26:58 -05:00
Jay Berkenbilt
6fe1e9de40 Prepare 8.0.1 release 2018-03-04 07:16:20 -05:00
Jay Berkenbilt
7b9f23a99a Ignore zlib data check errors (fixes #191) 2018-03-03 11:35:01 -05:00
Jay Berkenbilt
3e8b643ae3 Release 8.0.0 2018-02-25 16:00:11 -05:00
Jay Berkenbilt
111ec50950 8.0.rc3 2018-02-25 14:17:59 -05:00
Jay Berkenbilt
d3d3970cf6 8.0.rc2 2018-02-25 13:50:22 -05:00
Jay Berkenbilt
a16d703f4d Update version to 8.0.rc1
This is for testing the release process, particularly as it pertains
to AppImage creation.
2018-02-25 09:03:27 -05:00
Jay Berkenbilt
82cae01a76 Bump version number and soname
Bump to an alpha release. This version is not being widely released
but is being used to push the new shared library version through the
debian packaging system and to test out github releases.
2018-02-20 21:31:38 -05:00
Jay Berkenbilt
4bb3046f0b Properly handle strings with PDF Doc Encoding (fixes #179)
The QPDF_String::getUTF8Val() method was not treating strings that
weren't explicitly Unicode as PDF Doc Encoded. This only affects
characters in the range 0x80 through 0xa0.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
2780a1871d Add C API for checking PDF files 2018-02-18 21:06:27 -05:00
Jay Berkenbilt
d0e99f195a More robust handling of type errors
Give objects descriptions and context so it is possible to issue
warnings instead of fatal errors for attempts to access objects of the
wrong type.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
c2e16827b6 Replace "file position" with "offset" in error messages
Sometimes it's an offset in an object stream or a content stream, so
file position is confusing in some cases.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
52e024f701 Include omitted object description in error message 2018-02-18 21:06:27 -05:00
Jay Berkenbilt
cb3b705cf9 Include filename in object stream parse error 2018-02-18 21:06:27 -05:00
Jay Berkenbilt
21b7481b0e Push members of QPDFObjectHandle into a Members object
As in other cases, this is to enable adding new member variables in
the future without breaking ABI compatibility.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
e410b0fe0d Simplify TokenFilter interface
Expose Pl_QPDFTokenizer, and have it do more of the work of managing
the token filter's pipeline.
2018-02-18 21:05:47 -05:00
Jay Berkenbilt
1fdd86a049 Move Pl_QPDFTokenizer to public interface 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
5708b5d0aa Add additional interface for filtering page contents 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
fd02944e19 Clean up comment 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
5136238f2a Detect and report bad tokens in content normalization 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
9910104442 Implement TokenFilter and refactor Pl_QPDFTokenizer
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
b8723e97f4 Add coalesce contents capability 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
25988e8d10 Bug fix: content normalizer should not add trailing newline
Adding a trailing newline in content normalization damages files whose
contents are split across streams in the middle of tokens. Let
QPDFWriter add the newline with the indicator to ignore the newline,
which it already does. This changes the way some qdf files look.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fcd611b61e Refactor parseContentStream 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
05ff619b09 Remove redundant method
Remove a redundant method that was equal to another one with
additional arguments. This breaks binary compatibility, but there are
other ABI breaking changes in the upcoming release, so now is the time
to do it.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
55ee55394c Use inline image token in content parser 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
ba453ba4ff Use space tokens in tokenizer filter 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
ec538792fa Use inline image token type in tokenizer filter 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fefe25030e Inline image token type 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
2699ecf13e Push QPDFTokenizer members into a nested structure
This is for protection against future ABI breaking changes.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
d97474868d Lexer enhancements: EOF, comment, space
Significant enhancements to the lexer to improve EOF handling and to
support comments and spaces as tokens. Various other minor issues were
fixed as well.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt
ebd5ed63de Add option to save pass 1 of lineariziation
This is useful only for debugging the linearization code.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt
2ebdd6929e Prepare 7.1.1 release 2018-02-04 18:31:42 -05:00
Jay Berkenbilt
e3167c1a60 Fix linearization for files with nonstandard ID length 2018-02-04 18:16:23 -05:00
Jay Berkenbilt
3b2a3cdd77 Fix setLineBuf for bsd (fixes #177)
Use 0 instead of NULL in a cast.
2018-02-04 14:19:00 -05:00
Jay Berkenbilt
d5bfd49cb2 Remove use of std::abs (fixes #172)
Different compilers want different choices of headers for std::abs.
It's easier to just to not use it.
2018-02-04 14:19:00 -05:00
Jay Berkenbilt
34a9b835b0 Fix indentation 2018-02-04 14:19:00 -05:00
Jay Berkenbilt
7e5e1a7158 Fix offset in error message 2018-02-04 14:19:00 -05:00
Jay Berkenbilt
633fb414af Pl_QPDFTokenizer: Use unsigned_char_pointer instead of copy 2018-01-28 18:34:43 -05:00
Jay Berkenbilt
13d9756a45 Minor fixes to tokenizer 2018-01-28 18:34:43 -05:00
Jay Berkenbilt
2e4ca7ecf4 Update version numbers for 7.1.0 2018-01-14 20:09:20 -05:00
Jay Berkenbilt
04e47deaf9 Fixes for clang 2018-01-14 19:18:04 -05:00
Jay Berkenbilt
569d74d36b Allow raw encryption key to be specified
Add options to enable the raw encryption key to be directly shown or
specified. Thanks to Didier Stevens <didier.stevens@gmail.com> for the
idea and contribution of one implementation of this idea.
2018-01-14 10:21:05 -05:00
Jay Berkenbilt
3e306ae64c Add QUtil::hex_decode 2018-01-14 09:04:13 -05:00
Jay Berkenbilt
791e0db762 Allow trailing . in numeric token (fixes #165) 2018-01-13 20:05:40 -05:00
Jay Berkenbilt
ec0087e3ce Support TIFF Predictor (fixes #171) 2018-01-13 19:49:42 -05:00
Jay Berkenbilt
53971d50be Add Pl_TIFFPredictor 2018-01-13 19:49:42 -05:00
Jay Berkenbilt
d9c9049708 Add signed support to BitStream and BitWriter 2018-01-13 19:49:42 -05:00
Jay Berkenbilt
661ed1d28e Minor fixes to Pl_PNGFilter
Fix comment, remove restriction that doesn't actually matter.
2018-01-13 19:49:42 -05:00
Jay Berkenbilt
be27d47bdc Use better error for getStreamData failure
If the stream isn't filterable but we call getStreamData, throw a
regular exception instead of a logic error so that normal error
handling and reporting mechanisms will be used.
2018-01-13 19:49:42 -05:00
Jay Berkenbilt
4edfe1f41d Add tests for new PNG filters 2017-12-25 18:20:52 -05:00
Jay Berkenbilt
a3a55be9cd Correct errors in PNG filters and make use from library 2017-12-25 14:24:48 -05:00
Casey Rojas
9a48720246 Initial implementation of other PNG decode filters
Initial implementation provided by Casey Rojas <crojas@infotechfl.com>
Some problems are fixed in a subsequent commit.
2017-12-24 22:59:51 -05:00
Jay Berkenbilt
0f1ce8e646 Prepare 7.0.0 release 2017-09-16 13:22:15 -04:00
Jay Berkenbilt
249e95f608 Fix test failure on MSVC 2017-09-15 23:09:04 -04:00
Jay Berkenbilt
6898bc8d98 Spell check 2017-09-15 23:09:04 -04:00
Jay Berkenbilt
f2ffb6968a Fix Windows compilation errors 2017-09-15 21:44:57 -04:00
Jay Berkenbilt
d31a7b76e7 Improve message for stream decoding error
Tweak the message so that we inform the user that we are mitigating
data loss.
2017-09-12 16:03:48 -04:00
Jay Berkenbilt
eaacf94005 Update C API with new QPDFWriter methods 2017-09-12 14:30:39 -04:00
Jay Berkenbilt
40ecba4172 Pl_DCT: Use custom source and destination managers (fixes #153)
Avoid calling jpeg_mem_src and jpeg_mem_dest. The custom destination
manager writes to the pipeline in smaller chunks to avoid having the
whole image in memory at once. The source manager works directly with
the Buffer object. Using customer managers avoids use of memory source
and destination managers, which are not present in older versions of
libjpeg still in use by some Linux distributions.
2017-09-07 22:59:11 -04:00
Jay Berkenbilt
3ef1be9783 PNGFilter: Better range checking for columns 2017-08-31 07:26:58 -04:00
Jay Berkenbilt
1868a10f8b Replace all atoi calls with QUtil::string_to_int
The latter catches underflow/overflow.
2017-08-29 12:28:32 -04:00
Jay Berkenbilt
742190bd98 Pl_PNGFilter: disallow columns = 0 2017-08-29 12:28:32 -04:00
Jay Berkenbilt
6d46346eb9 Detect integer overflow/underflow 2017-08-29 12:28:32 -04:00
Jay Berkenbilt
e999bbae43 Fix memory leak with bad jpeg data 2017-08-28 22:16:45 -04:00
Jay Berkenbilt
c6872d2c70 Clean up circular references in QPDF_Stream 2017-08-28 22:16:31 -04:00
Jay Berkenbilt
728dc9e6d8 Fix error caught by clang 2017-08-26 21:51:17 -04:00
Jay Berkenbilt
dea704f0ab Pad keys to avoid memory errors (fixes #147) 2017-08-26 21:35:59 -04:00
Jay Berkenbilt
021c229331 Fix Pl_Flate memory leak on error (fixes #148) 2017-08-25 22:26:53 -04:00
Jay Berkenbilt
ad527a64f9 Parse iteratively to avoid stack overflow (fixes #146) 2017-08-25 21:56:45 -04:00
Jay Berkenbilt
85f05cc57f Detect xref pointer infinite loop (fixes #149) 2017-08-25 19:58:31 -04:00
Jay Berkenbilt
1e52d33822 Bump soname to 18 and version to 7.0.b1 2017-08-22 16:50:48 -04:00
Jay Berkenbilt
e452d9dca6 Spell check 2017-08-22 14:22:20 -04:00
Jay Berkenbilt
6219111ed7 Update references to README files
Most of the README files have been renamed. Refer to the new names.
2017-08-22 14:13:10 -04:00
Jay Berkenbilt
83ec09f66c Do memory checks
Slightly improve memory cleanup in Pl_DCT
Make it easier to test with valgrind
2017-08-22 14:13:10 -04:00
Jay Berkenbilt
fabff0f3ec Limit token length during xref recovery
While scanning the file looking for objects, limit the length of
tokens we allow. This prevents us from getting caught up in reading a
file character by character while digging through large streams.
2017-08-22 14:13:10 -04:00
Jay Berkenbilt
caf5e39c2e Fix compiler warnings for clang/mac OS X 2017-08-22 14:13:10 -04:00
Jay Berkenbilt
6884ad2ead Fix logic error in recovery
A stray semicolon caused a condition to be incorrectly applied during
stream length recovery.
2017-08-22 07:19:41 -04:00
Jay Berkenbilt
ce435222b2 Push QPDFWriter member variables into a nested class 2017-08-21 22:04:07 -04:00
Jay Berkenbilt
a8c93bd324 Push QPDF member variables into a nested class
Pushing member variables into a nested class enables addition of new
member variables without breaking binary compatibility.
2017-08-21 21:35:11 -04:00
Jay Berkenbilt
198856a825 Improve pclm parameter settings 2017-08-21 21:05:48 -04:00
Jay Berkenbilt
8ab52fa558 Combine writePCLm with writeStandard
Reduce code duplication
2017-08-21 21:05:48 -04:00
Jay Berkenbilt
9f60a864a0 Combine PCLm header into writeHeader 2017-08-21 21:05:47 -04:00
Jay Berkenbilt
adbcfcff2d Remove duplicated coverage cases
Remove duplicated coverage cases from Sahil's code so existing test
suite passes.
2017-08-21 18:55:02 -04:00
Sahil Arora
b19210fa7d QPDFWriter: Add setPCLm() and writePCLm() methods
* Add support for PCLm using setPCLm() and writePCLm() methods in
  QPDFWriter.hh and QPDFWriter.cc
* Add a function writePCLmHeader() for PCLm header in QPDFWriter
2017-08-21 18:55:02 -04:00
Jay Berkenbilt
ddc6cf0cf6 Precheck streams by default
There is no need for a --precheck-streams option. We can do the
precheck without imposing any penalty, only re-encoding the stream if
it fails the first time.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
9744414c66 Enable finer grained control of stream decoding
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
ae90d2c485 Implement Pl_DCT pipeline
Additional testing is added in later commits to be supported by
additional changes in the library.
2017-08-21 17:44:02 -04:00
Jay Berkenbilt
2d2f619665 Implement Pl_RunLength pipeline 2017-08-19 14:50:55 -04:00
Jay Berkenbilt
cfa2eb97fb Add page rotation (fixes #132) 2017-08-12 22:57:38 -04:00
Jay Berkenbilt
8249a26d69 Fix infinite loop in QPDFWriter (fixes #143) 2017-08-12 08:36:36 -04:00
Jay Berkenbilt
36b3fe5af7 Fix --newline-before-endstream option (fixes #133)
Add a newline unconditionally before endstream even if a newline was
already written as part of the stream data.
2017-08-11 20:57:05 -04:00
Jay Berkenbilt
46611f0710 Prevent a division by zero error (fixes #141)
Bad /W in an xref stream could cause a division by zero error. Now
this is handled as a special case.
2017-08-11 20:11:19 -04:00
Jay Berkenbilt
8fe0b06cd8 Pad encryption parameters that are too short (fixes #96) 2017-08-11 19:53:56 -04:00
Jay Berkenbilt
e7d0019bf4 Generate libqpdf.map from autoconf
Rather than checking consistency of libqpdf.map, generate it.
2017-08-11 04:56:22 -04:00
Jay Berkenbilt
6247aaa57c Fix libqpdf.map and prevent future breakage
The build now checks to make sure libqpdf.map has the right library
version number in it.
2017-08-10 21:53:19 -04:00
Jay Berkenbilt
9a96e233b0 Remove PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
30f109e244 Read xref table without PCRE
Also accept more errors than before.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
98a843c2a2 Reconstruct xref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
ca5b1d267a Improve stream length recovery
Eliminate PCRE and find endobj not preceded by endstream. Be more lax
about placement of endstream and endobj.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
3082e4e606 Find xref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
90840be594 Find lindict without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
03aa9679ac Find starxref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
1765c6ec20 Find header without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
296b679d6e Implement findFirst and findLast in InputSource
Preparing to refactor some pattern searching code to use these instead
of their own memchr loops. This should simplify the code that replaces
PCRE.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
ef8ae5449d Allow QPDFTokenizer::readToken to return bad tokens
Sometimes we want to ignore bad tokens rather than having them throw
an exception. A coverage case is commented out here and added in a
later commit.
2017-08-10 19:01:41 -04:00
Jay Berkenbilt
8fe261d8b4 QUtil::strcasecmp 2017-08-05 10:22:33 -04:00
Pranjal Bhor
6f88fd36ab Include missing header in QPDFTokenizer.cc (fixes #125)
Required for strtol()
2017-07-30 08:47:05 -04:00
Jay Berkenbilt
2d5b854468 Allow reading command-line args from files (fixes #16) 2017-07-29 22:23:21 -04:00
Jay Berkenbilt
5993c3e83c Detect input file = output file (fixes #29) 2017-07-29 20:58:01 -04:00
Jay Berkenbilt
570db9b60b Catch more exceptions while resolving objects 2017-07-29 19:31:12 -04:00
Jay Berkenbilt
b43a0ac237 When recover stream length, indicate the length (fixes #44) 2017-07-29 19:15:06 -04:00