
663 Commits

Author SHA1 Message Date
Jay Berkenbilt ed62be888c Fix --completion-* args to work from AppImage (fixes #285) 2019-06-22 17:12:01 -04:00
Jay Berkenbilt 85a3f95a89 qpdf: exit 3 for linearization warnings without errors (fixes #50) 2019-06-22 16:57:51 -04:00
Jay Berkenbilt 1bde5c68a3 Add QUtil::read_file_into_memory
This code was essentially duplicated between test_driver and
standalone_fuzz_target_runner.
2019-06-22 10:14:25 -04:00
Jay Berkenbilt 45dac410b5 Remove broken QPDFTokenizer::expectInlineImage 2019-06-21 22:29:31 -04:00
Jay Berkenbilt c6cfd64503 Rename QUtil::strcasecmp to QUtil::str_compare_nocase (fixes #242) 2019-06-21 22:29:31 -04:00
Jay Berkenbilt b07ad6794e Fix bugs found by fuzz tests
* Several assertions in linearization were not always true; change
  them to run time errors
* Handle a few cases of uninitialized objects
* Handle pages with no contents when doing form operations
* Handle invalid page tree nodes when traversing pages
2019-06-21 17:56:24 -04:00
Jay Berkenbilt ed7f2a6c76 Add smaller image streams file for testing 2019-06-21 17:39:53 -04:00
Jay Berkenbilt d71f05ca07 Fix sign and conversion warnings (major)
This makes all integer type conversions that have potential data loss
explicit with calls that do range checks and raise an exception. After
this commit, qpdf builds with no warnings when -Wsign-conversion
-Wconversion is used with gcc or clang or when -W3 -Wd4800 is used
with MSVC. This significantly reduces the likelihood of potential
crashes from bogus integer values.

There are some parts of the code that take int when they should take
size_t or an offset. Such places would make qpdf not support files
with more than 2^31 of something that usually wouldn't be so large. In
the event that such a file shows up and is valid, at least qpdf would
raise an error in the right spot so the issue could be legitimately
addressed rather than failing in some weird way because of a silent
overflow condition.
2019-06-21 13:17:21 -04:00
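The range-checked conversion idea described in this commit message can be illustrated with a small sketch. It is written in Python rather than qpdf's C++, and the name to_int32 is hypothetical, not part of qpdf's API. It only shows the principle: a narrowing conversion either succeeds without loss or raises an error at the point of conversion, rather than overflowing silently.

```python
# Illustrative sketch only; to_int32 is a hypothetical name, not a qpdf function.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Return value unchanged if it fits in a signed 32-bit int, else raise."""
    if not INT32_MIN <= value <= INT32_MAX:
        # Fail loudly where the conversion happens instead of wrapping around.
        raise OverflowError(f"integer {value} does not fit in a 32-bit int")
    return value
```

A caller that receives an unexpectedly large count or offset from a file would then see an error at the conversion site, which is the behavior the commit message describes as preferable to a silent overflow.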
Jay Berkenbilt 3608afd5c5 Add new integer accessors to QPDFObjectHandle 2019-06-21 13:17:21 -04:00
Jay Berkenbilt 713d961990 Appearance streams: some floating point values were truncated
Bounding box X coordinates could be truncated, causing them to be off
by a fraction of a point. This was most likely not visible, but it was
still wrong.
2019-06-20 21:32:30 -04:00
Jay Berkenbilt bcfa407912 As a test suite, run stand-alone fuzzer on seed corpus
Temporarily skip fuzz tests on Windows. There are Windows-specific
failures to address later.
2019-06-15 17:24:24 -04:00
Jay Berkenbilt 320702c086 Add test files from oss-fuzz bugs (fixes #335) 2019-06-15 17:24:24 -04:00
Jay Berkenbilt cf469d7890 Give up reading objects with too many consecutive errors 2019-06-15 08:52:19 -04:00
Jay Berkenbilt 3a180a0591 Commit forgotten test files 2019-06-09 18:11:37 -04:00
Jay Berkenbilt 31bde2f9d7 Handle empty /DecodeParms array (fixes #331)
On read, ignore /DecodeParms when empty list; on write, delete it.
Some files have been found that include an empty list for
/DecodeParms, but this is not technically compliant with the spec, and
the only sensible interpretation is to treat it as if there are no
decode parameters.
2019-06-09 17:19:49 -04:00
Jay Berkenbilt b1a78be1a8 Prepare 8.4.2 release 2019-05-18 08:56:37 -04:00
Jay Berkenbilt a323f6f49f Prepare 8.4.1 release 2019-04-27 20:44:20 -04:00
Jay Berkenbilt 03e27709f3 Improve Unicode filename testing
Remove dependency on the behavior of perl for reliable creation of
Unicode file names on Windows.
2019-04-27 20:37:33 -04:00
Jay Berkenbilt 7ff234a92f Remove stray comment 2019-04-27 20:37:33 -04:00
Jay Berkenbilt 7db5bc289b Fix typo 2019-04-22 09:37:23 -04:00
Jay Berkenbilt 12b159118a Compare versions between CLI and library 2019-04-20 21:00:43 -04:00
Jay Berkenbilt 2b011f9d81 Add --remove-page-labels option (fixes #317) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt e50d5201df Add --keep-files-open-threshold (fixes #288) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt 011695dfdf Support Unicode in filenames (fixes #298) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt 131a21d36f Document that linearize disables qdf (fixes #312) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt a5a016cdd2 Revert preservation of outlines with --split-pages
The preservation of outlines didn't provide very useful behavior
anyway as it copied all outlines but most didn't work. This
implementation also caused a very significant performance hit and so
is being reverted until a proper solution can be coded. The eventual
solution will not be compatible with the reverted solution anyway, so
it's best not to leave this in.
2019-04-20 21:00:43 -04:00
Thorsten Schöning af42fe9daf Don't open more than 50 files.
Embarcadero C++Builder doesn't support more than 50 files open at the same time for legacy 32-bit apps, which makes a test fail when it tries to open more than that many files. This changes the number of open files for that test to far fewer so that the test succeeds. Alternatively, one could reduce the hard-coded number of 200 in QPDF itself, which I didn't do for now because it requires adapting the manuals etc. and needs to be discussed with the author of QPDF. I guess chances are better to get the test changed upstream.

This fixes #288: https://github.com/qpdf/qpdf/issues/288
2019-03-11 17:14:22 -04:00
Jay Berkenbilt 62baad2264
Merge pull request #294 from ams-tschoening/two_ops_same_val
Two operands must evaluate to the same value.
2019-03-11 16:59:42 -04:00
Thorsten Schöning de5c91f324 [bcc32 Error] test_driver.cc(1634): E2354 Two operands must evaluate to the same type
Full parser context
    test_driver.cc(208): parsing: void runtest(int,const char *,const char *)
2019-02-14 19:47:30 +01:00
Thorsten Schöning 2e7f81452f [bcc32 Error] qpdf.cc(3837): E2354 Two operands must evaluate to the same type
Full parser context
    qpdf.cc(3803): parsing: PointerHolder<Pipeline> ImageOptimizer::makePipeline(const std::string &,Pipeline *)
2019-02-14 19:45:00 +01:00
Thorsten Schöning 27f18e0f67 The kfo-PDF files for testing need to be copied using "binmode" or Windows will introduce \r\n.
qpdf: selecting --keep-open-files=n
qpdf: processing 001-kfo.pdf
WARNING: 001-kfo.pdf: file is damaged
WARNING: 001-kfo.pdf (offset 556): xref not found
WARNING: 001-kfo.pdf: Attempting to reconstruct cross-reference table
2019-02-14 18:54:38 +01:00
Jay Berkenbilt fc2e491f74 Add test for exception handling
There have been issues reported where exceptions are not thrown
properly across shared library/DLL boundaries, so add a test
specifically to ensure that exceptions are caught as thrown.
2019-02-07 19:21:26 -05:00
Jay Berkenbilt 8acf636b4e Incorporate improved Windows fragility workaround from qtest 2019-02-01 22:25:25 -05:00
Jay Berkenbilt fec5bb124c Spell check 2019-01-31 21:41:29 -05:00
Jay Berkenbilt 1fba24aada Add another test case for weird page trees 2019-01-31 21:29:28 -05:00
Jay Berkenbilt 0a470d2daf Don't optimize non-8-bit images
Also add test cases for additional coverage on image optimization.
2019-01-31 21:29:28 -05:00
Jay Berkenbilt eb49e07c0a Make inline image token exactly contain the image data
Do not include the trailing EI, and handle cases where EI is not
preceded by a delimiter. Such cases have been seen in the wild.
2019-01-31 20:28:44 -05:00
Jay Berkenbilt 5211bcb5ea Externalize inline images (fixes #278) 2019-01-31 10:38:13 -05:00
Jay Berkenbilt 22bcdbe786 Remove acroread from tests
This hasn't worked or been exercised in years since Adobe stopped
releasing a Linux version of reader.
2019-01-31 10:38:13 -05:00
Jay Berkenbilt 1eb35a355f Exclude space after ID in image data 2019-01-31 10:38:10 -05:00
Jay Berkenbilt 2b6c79bcae Improve locating inline image's EI
We've actually seen a PDF file in the wild that contained EI
surrounded by delimiters inside the image data, which confused qpdf's
naive code. This significantly improves EI detection.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt ec9e310c9e Refactor QPDFTokenizer's inline image handling
Add a version of expectInlineImage that takes an input source and
searches for EI. This is in preparation for improving the way EI is
found. This commit just refactors the code without changing the
functionality and adds tests to make sure the old and new code behave
identically.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt 31372edce0 Inline image token value ends with EI, not delimiter
The inline image token erroneously included the delimiter that
followed EI. The ObjectHandle created from it was correct.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt c136356378 Typo in message 2019-01-31 09:26:37 -05:00
Jay Berkenbilt 8d229e078f Improve info message in optimize images (fixes #280)
When qpdf can't optimize an image because of an unsupported color
space, state this specifically. Recognize that many valid colorspaces
are not represented as name objects.
2019-01-29 18:16:02 -05:00
Jay Berkenbilt 8a9cfd2605 Handle direct page objects (fixes #164) 2019-01-29 17:01:36 -05:00
Jay Berkenbilt 2712869cf9 Fix logic for when to compress object and xref streams (fixes #271) 2019-01-28 21:43:06 -05:00
Jay Berkenbilt 52f9d326a5 Resolve duplicated page objects (fixes #268)
When linearizing a file or getting the list of all pages in a file,
detect if the pages tree contains a duplicated page object and, if so,
shallow copy it. This makes it possible to have a one to one mapping
of page positions to page objects.
2019-01-28 20:29:58 -05:00
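The approach described in this commit message can be sketched as follows. This is a Python illustration under stated assumptions, not qpdf's implementation: pages are modelled as plain dictionaries, and make_pages_unique is a hypothetical helper name. A page object that appears more than once in the list is replaced by a shallow copy, so that every position maps to a distinct object.

```python
# Illustrative sketch only; make_pages_unique is a hypothetical helper.
import copy


def make_pages_unique(pages):
    """Return a list in which no page object appears twice (by identity)."""
    seen = set()
    result = []
    for page in pages:
        if id(page) in seen:
            # Duplicate entry in the page tree: give this position its own object.
            page = copy.copy(page)
        seen.add(id(page))
        result.append(page)
    return result
```

After this pass, the number of distinct page objects equals the number of page positions, which is the one-to-one mapping the commit message refers to.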
Jay Berkenbilt 426434c772 Add --overlay and --underlay to qpdf CLI (fixes #207) 2019-01-27 09:30:13 -05:00
Jay Berkenbilt c2ae35540e Add boundary condition test for getUniqueResourceName 2019-01-27 09:26:33 -05:00
Jay Berkenbilt 623f5b664e Convert pages to form XObjects
Support conversion of pages to form XObjects and placement of form
XObjects on pages.
2019-01-27 07:50:30 -05:00
Jay Berkenbilt 009767d97a Handle inheritable page attributes
Add getAttribute for handling inheritable page attributes, and fix
getPageImages and annotation flattening code to use it.
2019-01-25 22:30:05 -05:00
Jay Berkenbilt 2d32f4db8f Handle fallback font size in text appearances
If we end up using our fallback font size when generating appearances
for text fields, reflect that in the Tf operator used in the
appearance stream.
2019-01-21 07:38:21 -05:00
Jay Berkenbilt 9cb599875b Improve text objects used in text appearance streams 2019-01-20 23:05:58 -05:00
Jay Berkenbilt 930eade6d3 Fix omissions in text appearance generation
When generating appearance streams for variable text annotations,
properly handle the cases of there being no appearance dictionary, no
appearance stream, or an appearance stream with no BMC..EMC marker.
2019-01-20 23:05:58 -05:00
Jay Berkenbilt 65ef0bf313 When flattening, remove annotations with no appearance stream
With the exception of form field annotations when /NeedAppearances is
true, remove annotations that don't have appearance streams when
flattening. There is no reason to keep these when flattening since
they are invisible. This may include unchecked checkboxes, unshown
popup windows, etc.
2019-01-20 23:05:58 -05:00
Jay Berkenbilt 0a3057dc0a More testing for Unicode passwords 2019-01-19 14:16:03 -05:00
Jay Berkenbilt c2030d1f33 Implement password recovery suppression and password mode (fixes #215)
Allow fine control over how passwords are encoded for writing, and
allow password for reading to be given as a hexadecimal encoded
string. Allow suppression of password recovery as a means to ensure
that the password you specify is actually the right one.
2019-01-19 10:14:07 -05:00
Jay Berkenbilt 392f2ece51 Try passwords with different string encodings 2019-01-19 10:10:58 -05:00
Jay Berkenbilt e4fa5a3c2a Refactor qpdf processing
Push calls to processFile and processInputSource into separate
functions in preparation for password recovery changes
2019-01-19 10:10:58 -05:00
Jay Berkenbilt 997f4ab6cb Remove incorrect content code from test files 2019-01-17 11:43:56 -05:00
Jay Berkenbilt 966429e718 Update CLI and manual for new encryption granularity (fixes #214) 2019-01-17 11:43:56 -05:00
Jay Berkenbilt 6ec22f117d Modernize encryption API for more granularity
Setting encryption permissions for R >= 3 set permission bits in
groups corresponding to menu options in Acrobat 5. The new API allows
the bits to be set individually.
2019-01-17 11:43:56 -05:00
Jay Berkenbilt 429ffcf397 Unicode main for Windows qpdf.cc 2019-01-17 11:43:56 -05:00
Jay Berkenbilt 698485468a Move remaining existing transcoding to QUtil 2019-01-17 11:43:56 -05:00
Jay Berkenbilt 5cfcd4f361 Additional checks for unreferenced resources
Explicitly abandon removal of unreferenced resources if there are any
lexical errors in the page's contents. This case always generated a
warning, but it now also prevents removal of unreferenced resources,
thus strongly decreasing the likelihood of data loss.
2019-01-17 11:43:56 -05:00
Jay Berkenbilt e09ae710dc Add tests for shared font/xobject
The tests are in a separate commit so the bug-fix commit can be taken
as a patch for older versions.
2019-01-17 09:44:29 -05:00
Jay Berkenbilt 654c0e8caf Allow adding the same page more than once in --pages (fixes #272) 2019-01-12 10:01:47 -05:00
Jay Berkenbilt 53d8e916b7 Interpret . in --pages as a shortcut for the primary file 2019-01-12 09:59:03 -05:00
Jay Berkenbilt d24a120c7f Add QPDF::setImmediateCopyFrom 2019-01-10 22:35:08 -05:00
Jay Berkenbilt 3472f6c984 Update copyrights for 2019 2019-01-07 07:54:55 -05:00
Jay Berkenbilt 8a5ca0e406 Don't keep QPDF objects for merging longer than needed 2019-01-07 07:38:03 -05:00
Jay Berkenbilt c3cee5f154 Exercise out of scope original pdf for copyForeignObject 2019-01-07 07:38:03 -05:00
Jay Berkenbilt fddbcab0e7 Mostly don't require original QPDF for copyForeignObject (fixes #219)
The original QPDF is only required now when the source
QPDFObjectHandle is a stream that gets its stream data from a
QPDFObjectHandle::StreamDataProvider.
2019-01-07 00:11:15 -05:00
Jay Berkenbilt fbbb0ee016 Make a static version of QPDF::pipeStreamData
This is in preparation of being able to pipe a stream's data without
keeping a copy of its containing qpdf object.
2019-01-07 00:11:15 -05:00
Jay Berkenbilt a70fbaaf50 Honor other base encodings when generating appearances 2019-01-05 23:01:59 -05:00
Jay Berkenbilt 5c682f6d1e Fix image optimization evaluation
Don't attempt to pass data through a JPEG filter if we are unable to
filter the data.
2019-01-05 22:37:49 -05:00
Jay Berkenbilt ee437705fc Update documentation for new features 2019-01-04 21:58:22 -05:00
Jay Berkenbilt ab9f4cc212 Split help string
It was too long for some compilers.
2019-01-04 21:33:14 -05:00
Jay Berkenbilt 2e342ee5bb Spell check 2019-01-04 21:33:14 -05:00
Jay Berkenbilt ee2aad4381 Add CLI flags for image optimization 2019-01-04 21:33:14 -05:00
Jay Berkenbilt 6f3b76b6c1 Fix image-streams.pdf in test suite
Some of the images were supposed to have no filter, but somewhere
along the line, they ended up with /FlateDecode, most likely because
qpdf rewrote the file without having --compress-streams=n specified.
If this error is repeated, it will cause a test failure.
2019-01-04 20:13:56 -05:00
Jay Berkenbilt 7b6ab900dc Support page collation with --collate (fixes #259) 2019-01-04 15:13:02 -05:00
Jay Berkenbilt 16fd6e64f9 Add QPDFWriter::getFinalVersion (fixes #266) 2019-01-04 12:37:22 -05:00
Jay Berkenbilt a01359189b Fix dangling references (fixes #240)
On certain operations, such as iterating through all objects and
adding new indirect objects, walk through the entire object structure
and explicitly resolve any indirect references to non-existent
objects. That prevents new objects from springing into existence and
causing the previously dangling references to point to them.
2019-01-04 10:29:29 -05:00
Jay Berkenbilt 158156d506 Add basic appearance stream generation 2019-01-04 08:00:19 -05:00
Jay Berkenbilt b55567a0fa Add special case setV code for button fields 2019-01-03 23:18:13 -05:00
Jay Berkenbilt 1342612308 Replace need-appearances.pdf
Create a new need-appearances.pdf based on newer test files with more
modified fields.
2019-01-03 23:18:13 -05:00
Jay Berkenbilt e3144ac417 Add form fields to json output
Also add some additional methods for detecting form field types to
assist in the json creation and for later use.
2019-01-03 23:18:13 -05:00
Jay Berkenbilt 26393f5137 New test file with form field types 2019-01-03 23:18:13 -05:00
Jay Berkenbilt 87f855dbfc Rename test file 2019-01-03 23:18:13 -05:00
Jay Berkenbilt ca94ac68d9 Honor flags when flattening annotations 2019-01-03 11:59:55 -05:00
Jay Berkenbilt 3e74916c5a Fix seg fault on empty xref stream (fixes #263)
Thanks to @p-cher for supplying a patch.
2019-01-03 09:17:43 -05:00
Jay Berkenbilt f78ea057ca Switch annotation flattening to use the form xobjects
Instead of directly putting the contents of the annotation appearance
streams into the page's content stream, add commands to render the
form xobjects directly. This is a more robust way to do it than the
original solution as it works properly with patterns and avoids
problems with resource name clashes between the pages and the form
xobjects.
2019-01-02 21:49:47 -05:00
Jay Berkenbilt 23bcfeb336 Remove bogus test cheating code 2019-01-02 21:49:47 -05:00
Jay Berkenbilt 3b8ce4f12a Annotation flattening including form fields
Flatten annotations by integrating their appearance streams into the
content stream of the containing page. In the case of form fields,
only flatten if /NeedAppearances is false (or equivalently absent). If
flattening form fields, also remove /AcroForm from the document
catalog.
2019-01-01 08:14:15 -05:00
Jay Berkenbilt 95d6b17a89 Add QPDFObjectHandle::mergeDictionary() 2019-01-01 08:12:56 -05:00
Jay Berkenbilt 3440ea7d3c JSON::serialize -> unparse
Unparse is admittedly strange, but I'd rather be strange and
consistent, and everything else in the qpdf library uses unparse to
serialize. (If you're reading this, the convention of using "unparse"
comes from the "clu" programming language.)
2018-12-25 11:52:21 -05:00
Jay Berkenbilt 6048c6e2f0 Don't crash on @file when file doesn't exist (fixes #265)
When @file is used and file doesn't exist, just treat it as a normal
argument.
2018-12-23 11:46:56 -05:00
Jay Berkenbilt 968e7e60b7 Add json tests 2018-12-23 11:21:59 -05:00
Jay Berkenbilt 64c1579544 Support zsh completion 2018-12-23 11:21:59 -05:00
Jay Berkenbilt 76bf863aaa Add page position information to json 2018-12-23 09:15:40 -05:00
Jay Berkenbilt 52a0b767c8 Slightly improve bash completion arg parsing 2018-12-23 09:15:40 -05:00
Jay Berkenbilt 86f9b4c43b Add colorspace and depth information in json for images 2018-12-22 11:42:38 -05:00
Jay Berkenbilt 62ea3b9197 Add outlines to json at document level 2018-12-22 11:42:38 -05:00
Jay Berkenbilt ae9455bf44 Implement --json-objects 2018-12-22 11:42:38 -05:00
Jay Berkenbilt ce714ac9b8 Call cleanup between test sections 2018-12-22 11:42:38 -05:00
Jay Berkenbilt fa3051d977 Implement --json-keys 2018-12-22 11:42:38 -05:00
Jay Berkenbilt 2008d037b3 Handle help args using option tables; add json help 2018-12-22 11:42:38 -05:00
Jay Berkenbilt b3da5a2cba Switch json args and structure 2018-12-22 11:42:38 -05:00
Jay Berkenbilt 7985c77326 Completion: ignore characters at and after point 2018-12-22 11:42:37 -05:00
Jay Berkenbilt bb89382f93 Allow --show-object=trailer 2018-12-21 19:11:57 -05:00
Jay Berkenbilt dd1aca552c Support bash completion using complete -C 2018-12-21 19:11:57 -05:00
Jay Berkenbilt 3c075fc017 Table-driven parsing of encrypt options 2018-12-21 19:11:57 -05:00
Jay Berkenbilt 245723c570 Table-driven parsing for top-level arguments 2018-12-21 19:11:57 -05:00
Jay Berkenbilt 151206603b Move argument parsing into a class 2018-12-21 19:11:57 -05:00
Jay Berkenbilt 6580ffe983 Preliminary implementation of json mode
The json mode implemented in this commit is not the final version, nor
are the command line arguments used to invoke it.
2018-12-21 19:11:57 -05:00
Jay Berkenbilt fa3664357b Move numrange code from qpdf.cc to QUtil.cc
Also move tests to libtests.
2018-12-21 19:11:57 -05:00
Jay Berkenbilt 313ba08126 Preserve some outline functionality in page splitting 2018-12-21 19:11:57 -05:00
Jay Berkenbilt d5d179f441 Add document and object helpers for outlines (bookmarks) 2018-12-21 19:11:57 -05:00
Jay Berkenbilt 0776c00129 Add QPDFNameTreeObjectHelper 2018-12-21 18:34:56 -05:00
Jay Berkenbilt 352ce9b22b Preserve page labels (numbers) when splitting and merging 2018-12-18 16:59:24 -05:00
Jay Berkenbilt 6ef9e31233 Add QPDFPageLabelDocumentHelper 2018-12-18 16:59:24 -05:00
Jay Berkenbilt f38df27aa3 Add QPDFNumberTreeObjectHelper 2018-12-18 16:46:10 -05:00
Jay Berkenbilt 88fb2e5258 Workaround for fragile test on Windows 2018-10-16 11:41:00 -04:00
Jay Berkenbilt 28453a4908 Add --keep-files-open flag (fixes #237) 2018-08-18 10:56:01 -04:00
Jay Berkenbilt 7214ba2303 Fix memory error on virus workaround code 2018-08-14 16:41:13 -04:00
Jay Berkenbilt 164cbdde46 Protect against virus warnings (fixes #216)
Some files in the test suite trigger antivirus warnings. These are
not infected files with malicious intent. They are test files to
ensure that qpdf does not crash when it encounters the files. This
change enables those files to be obfuscated in the source repository
so that checking out qpdf from version control or extracting the
source code doesn't trigger antivirus warnings.
2018-08-13 19:26:20 -04:00
Jay Berkenbilt fb1e29476c Add --no-warn option to suppress warnings (fixes #232) 2018-08-12 22:20:40 -04:00
Jay Berkenbilt a2f62935b3 Catch exceptions as const references (fixes #236)
This fix allows qpdf to compile/test cleanly with gcc 8.
2018-08-12 21:57:52 -04:00
Jay Berkenbilt 4a4736c695 Fix EOL handling inside strings (fixes #226)
CR, CRLF, and LF are all supposed to be treated as LF; only one EOL is
to be ignored after backslash.
2018-08-05 20:48:35 -04:00
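The rule stated in this commit message can be sketched in a few lines. The following is a Python illustration, not qpdf's tokenizer, and normalize_string_eols is a hypothetical name. It handles only end-of-line processing inside a literal string body: a bare CR, CRLF, or LF becomes a single LF, and a backslash followed by an EOL removes the backslash and exactly one EOL. Other escape sequences are passed through undecoded.

```python
# Illustrative sketch only; normalize_string_eols is a hypothetical helper.
def normalize_string_eols(body: str) -> str:
    """Apply EOL rules to the body of a PDF literal string."""
    out = []
    i = 0
    n = len(body)
    while i < n:
        ch = body[i]
        if ch == "\\" and i + 1 < n and body[i + 1] in "\r\n":
            # Backslash + EOL is a line continuation: drop both,
            # counting CRLF as one EOL and never consuming a second EOL.
            i += 3 if body[i + 1 : i + 3] == "\r\n" else 2
        elif ch == "\\" and i + 1 < n:
            # Any other escape sequence is left untouched by this sketch.
            out.append(body[i : i + 2])
            i += 2
        elif ch == "\r":
            # CR or CRLF outside an escape is treated as a single LF.
            out.append("\n")
            i += 2 if body[i + 1 : i + 2] == "\n" else 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, a backslash followed by two line feeds yields one line feed in the result, because only the first EOL belongs to the continuation.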
Jay Berkenbilt e1cd5891af Fix infinite loop on small files with progress reporting (fixes #230)
Turns out you can keep adding zero to a number over and over again and
it just doesn't get any bigger. Who would have known?
2018-08-05 15:43:34 -04:00
Jay Berkenbilt fe769f2723 Keep file open while adding its pages during merge (fixes #217) 2018-08-04 19:58:13 -04:00
Jay Berkenbilt 3aad28aed0 Bug fix: honor encryption key length with R=3 (fixes #212) 2018-06-22 19:24:26 -04:00
Jay Berkenbilt c852af2a57 Add tests for progress and verbose changes 2018-06-22 16:14:54 -04:00
Jay Berkenbilt 6bf47ac6e8 With --verbose, give information on processing merge inputs 2018-06-22 16:14:54 -04:00
Jay Berkenbilt a433ed24f9 Add progress reporting for QPDFWriter (fixes #200) 2018-06-22 16:14:54 -04:00
Jay Berkenbilt 99593e0eef Use ClosedFileInputSource when merging files (fixes #154) 2018-06-22 12:53:41 -04:00
Jay Berkenbilt c71dc6888c Don't prune resource dictionaries on errors or by request
If we are unable to filter a page's content streams, don't attempt to
remove objects from the page's resource dictionary. Also provide a
command line option to suppress resource removal in case we ever need
this as a workaround for some bug or broken PDF files.
2018-06-22 10:45:31 -04:00
Jay Berkenbilt 38c9ed23c3 Treat content stream parsing errors as an error, not a warning
If parsing content streams is treated as a warning, there is no way
for a caller to know if a parsing operation has failed. This is very
dangerous and will likely result in data loss when token filters or
parser callbacks are in use.
2018-06-22 10:44:08 -04:00
Jay Berkenbilt 6c89d4b35b When splitting files, remove unreferenced objects (fixes #203) 2018-06-21 21:03:30 -04:00
Jay Berkenbilt ddd78c1b7f Fix QPDFObjectHandle::shallowCopy
It's not really a shallow copy. It just doesn't cross indirect object
boundaries. The old implementation had a bug that would cause multiple
shallow copies of the same object to share memory, which was not the
intention.
2018-06-21 20:34:45 -04:00
Jay Berkenbilt 84cd53f5af Make page range optional in --rotate (fixes #211) 2018-06-21 16:28:44 -04:00
Jay Berkenbilt 397b097c46 Allow setting a form field's value 2018-06-21 15:57:13 -04:00
Jay Berkenbilt 952a665a4e Better support for creating Unicode strings 2018-06-21 15:57:13 -04:00
Jay Berkenbilt 0b05111db8 Implement helper class for interactive forms 2018-06-21 15:57:13 -04:00
Jay Berkenbilt 0dadf17ab7 Convert command-line and test suite to use page helper classes
This provides better test coverage and more useful code for people to
read and copy.
2018-06-21 15:57:13 -04:00
Jay Berkenbilt 4cded10821 Add QPDFObjectHandle::Rectangle type
Provide a convenient way of accessing rectangles.
2018-06-21 15:57:13 -04:00
Jay Berkenbilt 078cf9bf90 newline before endstream fix for object streams (fixes #205) 2018-05-12 13:17:43 -04:00
Jay Berkenbilt b8ccbff413 doc: point out use of @filename for specifying password (fixes #198) 2018-05-05 17:52:04 -04:00
Jay Berkenbilt b4d6cf6836 Limit depth of nesting in direct objects (fixes #202)
This fixes CVE-2018-9918.
2018-04-15 16:11:22 -04:00
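A depth limit of the kind this commit title describes can be sketched as below. This is a Python illustration, not qpdf's parser: the token list, the check_nesting name, and the limit value are all assumptions made for the example, and the sketch does not verify that each closing delimiter matches its opening delimiter. It only shows how a counter can bound nesting so that hostile input cannot drive the parser arbitrarily deep.

```python
# Illustrative sketch only; the names and the limit value are hypothetical.
MAX_DEPTH = 500


def check_nesting(tokens, max_depth=MAX_DEPTH):
    """Return the deepest nesting level seen, or raise if it exceeds max_depth."""
    depth = 0
    deepest = 0
    for tok in tokens:
        if tok in ("[", "<<"):
            depth += 1
            if depth > max_depth:
                raise ValueError("objects nested too deeply")
            deepest = max(deepest, depth)
        elif tok in ("]", ">>"):
            if depth == 0:
                raise ValueError("unbalanced closing delimiter")
            depth -= 1
    return deepest
```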
Jay Berkenbilt e4e2e26d99 Properly handle pages with no contents (fixes #194)
Remove calls to assertPageObject(). All cases in the library that
called assertPageObject() work fine if you don't call
assertPageObject() because nothing assumes anything that was being
checked by that call. Removing the calls enables more files to be
successfully processed.
2018-03-06 11:34:07 -05:00
Jay Berkenbilt ee44aef8d0 Treat loop in xref tables as damage (fixes #192)
Prior to this fix, if there was a loop detected in following /Prev
pointers in xref streams/tables, it would cause qpdf to lose data.
Note that this condition causes many PDF readers to hang or fail.
2018-03-05 14:26:58 -05:00
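Loop detection while following /Prev pointers can be sketched with a visited set. This is a Python illustration, not qpdf's code: follow_prev_chain is a hypothetical name, and the cross-reference sections are modelled as a dictionary mapping each section's offset to its /Prev offset (or None for the last one). Revisiting an offset is reported as an error, which corresponds to treating the file as damaged.

```python
# Illustrative sketch only; follow_prev_chain is a hypothetical helper.
def follow_prev_chain(start_offset, prev_of):
    """Return the offsets visited from start_offset, raising on a loop."""
    visited = set()
    order = []
    offset = start_offset
    while offset is not None:
        if offset in visited:
            # A section points back to one already processed: the file is damaged.
            raise ValueError(f"loop detected in xref tables at offset {offset}")
        visited.add(offset)
        order.append(offset)
        offset = prev_of.get(offset)
    return order
```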
Jay Berkenbilt 666f794393 Support "r" in page ranges (fixes #155) 2018-03-04 07:05:14 -05:00
Jay Berkenbilt 7b9f23a99a Ignore zlib data check errors (fixes #191) 2018-03-03 11:35:01 -05:00
Jay Berkenbilt a8682e0b75 Spell check 2018-02-25 15:06:44 -05:00
Jay Berkenbilt 9a4ef8c95d Separate copyright notice from --version option 2018-02-25 09:03:27 -05:00
Jay Berkenbilt 4bb3046f0b Properly handle strings with PDF Doc Encoding (fixes #179)
The QPDF_String::getUTF8Val() method was not treating strings that
weren't explicitly Unicode as PDF Doc Encoded. This only affects
characters in the range 0x80 through 0xa0.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt 2780a1871d Add C API for checking PDF files 2018-02-18 21:06:27 -05:00
Jay Berkenbilt b72a38bf5f Reorganize some test cases
Too many test cases were "miscellaneous".
2018-02-18 21:06:27 -05:00
Jay Berkenbilt d0e99f195a More robust handling of type errors
Give objects descriptions and context so it is possible to issue
warnings instead of fatal errors for attempts to access objects of the
wrong type.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt c2e16827b6 Replace "file position" with "offset" in error messages
Sometimes it's an offset in an object stream or a content stream, so
file position is confusing in some cases.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt 52e024f701 Include omitted object description in error message 2018-02-18 21:06:27 -05:00
Jay Berkenbilt cb3b705cf9 Include filename in object stream parse error 2018-02-18 21:06:27 -05:00
Jay Berkenbilt e410b0fe0d Simplify TokenFilter interface
Expose Pl_QPDFTokenizer, and have it do more of the work of managing
the token filter's pipeline.
2018-02-18 21:05:47 -05:00
Jay Berkenbilt 5136238f2a Detect and report bad tokens in content normalization 2018-02-18 21:05:47 -05:00
Jay Berkenbilt 9910104442 Implement TokenFilter and refactor Pl_QPDFTokenizer
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt b8723e97f4 Add coalesce contents capability 2018-02-18 21:05:46 -05:00
Jay Berkenbilt 25988e8d10 Bug fix: content normalizer should not add trailing newline
Adding a trailing newline in content normalization damages files whose
contents are split across streams in the middle of tokens. Let
QPDFWriter add the newline with the indicator to ignore the newline,
which it already does. This changes the way some qdf files look.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt cc108a7f1b Use pipePageContents in tokenizer test 2018-02-18 21:05:46 -05:00
Jay Berkenbilt 6afe83978f Switch from parseContentStream to parsePageContents 2018-02-18 21:05:46 -05:00
Jay Berkenbilt fcd611b61e Refactor parseContentStream 2018-02-18 21:05:46 -05:00
Jay Berkenbilt ec538792fa Use inline image token type in tokenizer filter 2018-02-18 21:05:46 -05:00
Jay Berkenbilt fefe25030e Inline image token type 2018-02-18 21:05:46 -05:00
Jay Berkenbilt d97474868d Lexer enhancements: EOF, comment, space
Significant enhancements to the lexer to improve EOF handling and to
support comments and spaces as tokens. Various other minor issues were
fixed as well.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt bb9e91adbd Create isolated tokenizer tests
This tokenizes outer parts of the file, page content streams, and
object streams. It is for exercising the tokenizer in isolation and is
being introduced before reworking the lexical layer of qpdf.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt ebd5ed63de Add option to save pass 1 of linearization
This is useful only for debugging the linearization code.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt e3167c1a60 Fix linearization for files with nonstandard ID length 2018-02-04 18:16:23 -05:00
Jay Berkenbilt cffb6fd64a Test stream that ends with name token and no newline 2018-01-28 18:34:43 -05:00
Jay Berkenbilt 13d9756a45 Minor fixes to tokenizer 2018-01-28 18:34:43 -05:00
Jay Berkenbilt 569d74d36b Allow raw encryption key to be specified
Add options to enable the raw encryption key to be directly shown or
specified. Thanks to Didier Stevens <didier.stevens@gmail.com> for the
idea and contribution of one implementation of this idea.
2018-01-14 10:21:05 -05:00
Jay Berkenbilt 68572df2bf Update copyright to 2018 2018-01-13 20:25:58 -05:00
Jay Berkenbilt 791e0db762 Allow trailing . in numeric token (fixes #165) 2018-01-13 20:05:40 -05:00
Jay Berkenbilt 6299c64cf3 Use correct link directory order (fixes #158)
Make sure to link from the source tree before linking from the system.
In many environments, this is necessary to allow a newly built qpdf to
link properly instead of trying to link or resolve libraries from an
older installed version.
2018-01-13 19:53:52 -05:00
Jay Berkenbilt ec0087e3ce Support TIFF Predictor (fixes #171) 2018-01-13 19:49:42 -05:00
Jay Berkenbilt be27d47bdc Use better error for getStreamData failure
If the stream isn't filterable but we call getStreamData, throw a
regular exception instead of a logic error so that normal error
handling and reporting mechanisms will be used.
2018-01-13 19:49:42 -05:00
Jay Berkenbilt 48864b8d6e Clarify documentation of advanced parsing options 2017-12-25 18:42:33 -05:00
Jay Berkenbilt 4edfe1f41d Add tests for new PNG filters 2017-12-25 18:20:52 -05:00
Jay Berkenbilt 07c8bb2843 Additionally license under Apache License version 2.0
The Apache License version 2.0 is now the primary license for qpdf.
However, users may, at their option, continue to use Artistic version
2.0.
2017-09-14 12:59:25 -04:00
Jay Berkenbilt d31a7b76e7 Improve message for stream decoding error
Tweak the message so that we inform the user that we are mitigating
data loss.
2017-09-12 16:03:48 -04:00
Jay Berkenbilt eaacf94005 Update C API with new QPDFWriter methods 2017-09-12 14:30:39 -04:00
Jay Berkenbilt cbb2614975 Fix command-line parsing for --rotate 2017-09-07 22:58:37 -04:00
Jay Berkenbilt ec7d74a386 Add test case for overflow in PNG filter (fixes #150) 2017-08-29 12:33:01 -04:00
Jay Berkenbilt 1868a10f8b Replace all atoi calls with QUtil::string_to_int
The latter catches underflow/overflow.
2017-08-29 12:28:32 -04:00
Jay Berkenbilt abb3191c32 Add tests for previous memory issues
Now that the test suite runs clean with address sanitizer, add some
test cases that previously were used to expose memory errors.
2017-08-28 22:28:12 -04:00
Jay Berkenbilt 4f8c734d8e Missing free in some test code
There was a missing free causing a memory leak in some test code. The
memory leak was not in library code.
2017-08-26 22:04:49 -04:00
Jay Berkenbilt ad527a64f9 Parse iteratively to avoid stack overflow (fixes #146) 2017-08-25 21:56:45 -04:00
Jay Berkenbilt 85f05cc57f Detect xref pointer infinite loop (fixes #149) 2017-08-25 19:58:31 -04:00
Jay Berkenbilt e452d9dca6 Spell check 2017-08-22 14:22:20 -04:00
Jay Berkenbilt fabff0f3ec Limit token length during xref recovery
While scanning the file looking for objects, limit the length of
tokens we allow. This prevents us from getting caught up in reading a
file character by character while digging through large streams.
2017-08-22 14:13:10 -04:00
Jay Berkenbilt 6884ad2ead Fix logic error in recovery
A stray semicolon caused a condition to be incorrectly applied during
stream length recovery.
2017-08-22 07:19:41 -04:00
Jay Berkenbilt 8288a4eb3a Update copyright to 2017 2017-08-21 21:18:47 -04:00
Jay Berkenbilt f08ce00e62 Add tests for PCLm
Files written in PCLm mode have to be created in a very specific way.
qpdf doesn't know how to create PCLm files from scratch. All it knows
how to do is to write an already valid file in a suitable way.
Therefore there is no command-line support for PCLm.
2017-08-21 21:05:47 -04:00
Jay Berkenbilt ddc6cf0cf6 Precheck streams by default
There is no need for a --precheck-streams option. We can do the
precheck without imposing any penalty, only re-encoding the stream if
it fails the first time.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt 9744414c66 Enable finer grained control of stream decoding
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt e0d1cd1f4b Fix test case
There was an unintended recoverable error in a test file. It wasn't
hurting anything, but it was obscuring the actual intent of the test.
2017-08-19 14:50:55 -04:00
Jay Berkenbilt cfa2eb97fb Add page rotation (fixes #132) 2017-08-12 22:57:38 -04:00
Jay Berkenbilt d926d78059 Add --verbose flag 2017-08-12 12:30:18 -04:00
Jay Berkenbilt 2c6fe1805a Support groups of pages in --split-pages (fixes #30) 2017-08-12 12:08:23 -04:00
Jay Berkenbilt df33c368b4 Change --single-pages to --split-pages
This is in preparation for implementing page groups.
2017-08-12 11:49:04 -04:00
Jay Berkenbilt ad82706003 Note about veraPDF 2017-08-12 11:35:02 -04:00
Jay Berkenbilt 8249a26d69 Fix infinite loop in QPDFWriter (fixes #143) 2017-08-12 08:36:36 -04:00
Jay Berkenbilt 36b3fe5af7 Fix --newline-before-endstream option (fixes #133)
Add a newline unconditionally before endstream even if a newline was
already written as part of the stream data.
2017-08-11 20:57:05 -04:00
Jay Berkenbilt 46611f0710 Prevent a division by zero error (fixes #141)
Bad /W in an xref stream could cause a division by zero error. Now
this is handled as a special case.
2017-08-11 20:11:19 -04:00
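The kind of special case described above can be sketched as follows. In an xref stream, /W lists the byte widths of the fields in each entry, so the entry size is their sum and the entry count is the stream length divided by that size. The function name and error messages below are assumptions made for illustration, not qpdf's actual code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration only; not qpdf's implementation.
// Returns the number of fixed-width entries in an xref stream, given the
// decoded stream length and the field widths from /W.
std::size_t xref_entry_count(
    std::size_t data_length, std::vector<int> const& w)
{
    std::size_t entry_size = 0;
    for (int width : w)
    {
        if (width < 0)
        {
            throw std::runtime_error("negative field width in /W");
        }
        entry_size += static_cast<std::size_t>(width);
    }
    // A /W such as [0 0 0] yields an entry size of zero; reject it
    // before it is used as a divisor.
    if (entry_size == 0)
    {
        throw std::runtime_error("/W has zero total width");
    }
    return data_length / entry_size;
}
```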
Jay Berkenbilt 8fe0b06cd8 Pad encryption parameters that are too short (fixes #96) 2017-08-11 19:53:56 -04:00
Jay Berkenbilt 0c99cf874b Sanitize test suite
Remove problematic test files
2017-08-11 07:41:11 -04:00
Jay Berkenbilt 30f109e244 Read xref table without PCRE
Also accept more errors than before.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt ca5b1d267a Improve stream length recovery
Eliminate PCRE and find endobj not preceded by endstream. Be more lax
about placement of endstream and endobj.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt 3082e4e606 Find xref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt 90840be594 Find lindict without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt 03aa9679ac Find startxref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt 1765c6ec20 Find header without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt ef8ae5449d Allow QPDFTokenizer::readToken to return bad tokens
Sometimes we want to ignore bad tokens rather than having them throw
an exception. A coverage case is commented out here and added in a
later commit.
2017-08-10 19:01:41 -04:00
Jay Berkenbilt c5dc6d8067 Remove unused PointerHolder interface
Also fix a bug resulting from incorrect use of PointerHolder because
of this unused parameter.
2017-08-10 19:01:38 -04:00
Jay Berkenbilt ff6971fb1c Call PointerHolder constructor properly (fixes #135)
Passed arguments to the constructor in the wrong order.
2017-08-09 22:00:49 -04:00
Jay Berkenbilt 49825e5cb6 Add --split-pages option (fixes #30) 2017-08-05 10:22:33 -04:00
Jay Berkenbilt a60eb552d3 Split bug tests into separate chunk 2017-08-05 10:22:33 -04:00
Jay Berkenbilt 1ec59c299d Refactor write_output 2017-08-05 10:22:33 -04:00
Jay Berkenbilt 909daf9543 Move page spec processing earlier 2017-08-05 10:22:33 -04:00
Jay Berkenbilt 24f28f0768 Split qpdf.cc's main into reasonably sized functions
main() had gotten absurdly long. Split it into reasonable chunks. This
refactoring is in preparation for handling splitting output into
single pages.
2017-08-05 08:24:05 -04:00
Jay Berkenbilt c88eaae2f2 Fix off-by-one error in --pages argument parsing (fixes #129) 2017-08-02 21:08:43 -04:00
Jay Berkenbilt 2d5b854468 Allow reading command-line args from files (fixes #16) 2017-07-29 22:23:21 -04:00
Jay Berkenbilt 5993c3e83c Detect input file = output file (fixes #29) 2017-07-29 20:58:01 -04:00
Jay Berkenbilt 885b8781cc Allow --check to coexist with and precede other operations (fixes #42) 2017-07-29 19:56:21 -04:00
Jay Berkenbilt b43a0ac237 When recovering stream length, indicate the length (fixes #44) 2017-07-29 19:15:06 -04:00
Jay Berkenbilt f37d399d82 Add newline-before-endstream option (fixes #103) 2017-07-29 12:21:38 -04:00
Jay Berkenbilt 6a7d53ad2b Handle zlib data errors better (fixes #106) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt 07d6f770b2 Better recovery of bad stream start (fixes #104) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt b389268f16 Better handle split content streams (fixes #73)
When parsing content streams, allow content to be split arbitrarily
across stream boundaries.
2017-07-29 12:19:04 -04:00
Jay Berkenbilt 3a1ff5ded9 Add option to preserve unreferenced objects 2017-07-28 19:19:11 -04:00
Jay Berkenbilt a94a729fee Explicitly check root dictionary type
Very badly corrupted files may not have a retrievable root dictionary.
Handle that as a special case so that a more helpful error message can
be provided.
2017-07-28 18:03:30 -04:00
Jay Berkenbilt 7f8892525f Add precheck streams capability
When requested, QPDFWriter will do more aggressive prechecking of streams
to make sure it can actually succeed in decoding them before
attempting to do so. This will allow preservation of raw data even
when the raw data is corrupted relative to the specified filters.
2017-07-27 23:42:27 -04:00
Jay Berkenbilt 428d96dfe1 Convert many more errors to warnings 2017-07-27 22:57:55 -04:00
Jay Berkenbilt a4fd4b91c6 Convert stream filtering errors to warnings 2017-07-27 18:43:07 -04:00
Jay Berkenbilt 40f00122b8 Convert object parsing errors to warnings
QPDFObjectHandle::parseInternal now issues warnings instead of
throwing exceptions for all error conditions that it finds (except
internal logic errors) and has stronger recovery for things like
invalid tokens and malformed dictionaries. This should improve qpdf's
ability to recover from a wide range of broken files that currently
cause it to fail.
2017-07-27 18:20:31 -04:00
Jay Berkenbilt ac3c81a8ed Include tests for other infinite loop bugs
fixes #117
fixes #118
fixes #119
fixes #120

Several other infinite loop bugs were fixed by previous changes.
Include their test files in the test suite.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt 701b518d5c Detect recursion loops resolving objects (fixes #51)
During parsing of an object, sometimes parts of the object have to be
resolved. An example is stream lengths. If such an object directly or
indirectly points to the object being parsed, it can cause an infinite
loop. Guard against all cases of re-entrant resolution of objects.
2017-07-26 06:24:07 -04:00
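One common way to guard against re-entrant resolution is to track the IDs currently being resolved and refuse to start on one that is already in progress. The sketch below assumes that approach; the class name `ResolveGuard` and the use of a plain integer object ID are hypothetical, and qpdf's actual mechanism may differ.

```cpp
#include <set>
#include <stdexcept>

// Hypothetical RAII guard for illustration only; not qpdf's implementation.
// Marks an object ID as "being resolved" for the guard's lifetime and
// throws if that ID is already in progress.
class ResolveGuard
{
  public:
    ResolveGuard(std::set<int>& in_progress, int objid) :
        in_progress(in_progress),
        objid(objid)
    {
        // insert() reports whether the ID was newly added; if it was
        // already present, resolution has looped back on itself.
        if (!in_progress.insert(objid).second)
        {
            throw std::runtime_error("loop detected resolving object");
        }
    }
    ~ResolveGuard()
    {
        in_progress.erase(objid);
    }
    ResolveGuard(ResolveGuard const&) = delete;
    ResolveGuard& operator=(ResolveGuard const&) = delete;

  private:
    std::set<int>& in_progress;
    int objid;
};
```

Because the destructor does not run when the constructor throws, a rejected inner attempt leaves the outer guard's entry in place, and the entry is removed only when the outer resolution finishes.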
Jay Berkenbilt afe0242b26 Handle object ID 0 (fixes #99)
This is CVE-2017-9208.

The QPDF library uses object ID 0 internally as a sentinel to
represent a direct object, but prior to this fix, was not blocking
handling of 0 0 obj or 0 0 R as a special case. Creating an object in
the file with 0 0 obj could cause various infinite loops. The PDF spec
doesn't allow for object 0. Having qpdf handle object 0 might be a
better fix, but changing all the places in the code that assume objid
== 0 means direct would be risky.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt 315092dd98 Avoid xref reconstruction infinite loop (fixes #100)
This is CVE-2017-9209.
2017-07-26 06:24:07 -04:00
Jay Berkenbilt 603f222365 Fix infinite loop while reporting an error (fixes #101)
This is CVE-2017-9210.

The description string for an error message included unparsing an
object, which is too complex of a thing to try to do while throwing an
exception. There was only one example of this in the entire codebase,
so it is not a pervasive problem. Fixing this eliminated one class of
infinite loop errors.
2017-07-26 06:24:07 -04:00