Commit Graph

886 Commits

Author SHA1 Message Date
Jay Berkenbilt 278710fbe8 Refactor QPDFPageObjectHelper::removeUnreferencedResources()
Refactor removeUnreferencedResources to prepare for filtering form
XObjects.
2020-03-31 17:39:20 -04:00
Jay Berkenbilt bb6768b8f0 Include header for wcslen (fixes #405) 2020-02-29 08:43:33 -05:00
Jay Berkenbilt bb3137296d Handle root /Pages pointing to other than page tree root (fixes #398) 2020-02-22 11:10:31 -05:00
Jay Berkenbilt 52a2e95dd5 Prepare 9.1.1 release 2020-01-26 18:49:04 -05:00
Jay Berkenbilt 57c01ef81f In qdf mode, don't write extra XRef streams (fixes #386)
fix-qdf assumes there is exactly one XRef stream and that it is at the
end of the file.
2020-01-26 16:50:57 -05:00
Jay Berkenbilt bbc2f8ffae Bug fix: handle ColorSpace lookup for inline images (fixes #392)
If the value of /CS in the inline image dictionary was is key in the
page's /Resource -> /ColorSpace dictionary, properly resolve it by
referencing the proper colorspace, and not just the name, in the
external image dictionary.
2020-01-26 15:29:10 -05:00
Cloudmersive a8b6ff5763 Fix for Windows unable to acquire crypt context with new keyset (fixes #387)
Fix is based on guidance
https://support.microsoft.com/en-us/help/238187/cryptacquirecontext-use-and-troubleshooting
and is the proper fix for #285/#286
2020-01-14 18:45:54 -05:00
Jay Berkenbilt a44b5a34a0 Pull wmain -> main code from qpdf.cc into QUtil.cc 2020-01-14 11:40:51 -05:00
Jay Berkenbilt ab4061f1ee Add error detection for read_lines_from_file(FILE*) 2020-01-14 11:07:09 -05:00
Jay Berkenbilt 211a7f57be QUtil::read_lines_from_file: optional EOL preservation 2020-01-13 11:26:18 -05:00
Jay Berkenbilt 9a398504ca Refactor QUtil::read_lines_from_file
This commit adds the preserve_eol flags but doesn't implement EOL
preservation yet.
2020-01-13 09:19:53 -05:00
Jay Berkenbilt 9b0c6022d7 Prepare 9.1.0 release 2019-11-16 22:29:54 -05:00
Jay Berkenbilt 5e6dfc938e Prepare 9.1.rc1 release 2019-11-09 22:00:53 -05:00
Jay Berkenbilt c4478e5249 Allow odd/even modifiers in numeric range (fixes #364) 2019-11-09 13:23:12 -05:00
Jay Berkenbilt 5508f74603 Allow /P in encryption dictionary to be positive (fixes #382)
Even though this is disallowed by the spec, files like this have been
encountered in the wild.
2019-11-09 12:33:15 -05:00
Jay Berkenbilt 127a957aee Allow runtime inspection/override of crypto provider 2019-11-09 09:53:42 -05:00
Jay Berkenbilt 88bedb41fe Implement gnutls crypto provider (fixes #218)
Thanks to Zdenek Dohnal <zdohnal@redhat.com> for contributing the code
used for the gnutls crypto provider.
2019-11-09 09:53:38 -05:00
Jay Berkenbilt cc14523440 Update autoconf to support crypto selection 2019-11-09 08:18:02 -05:00
Jay Berkenbilt d0a53cd3ea Fix typos in configure.ac 2019-11-09 08:18:02 -05:00
Jay Berkenbilt c03ced09c0 Isolate source files used for native crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt d1ffe46c04 AES_PDF: move CBC logic from pipeline to AES_PDF implementation 2019-11-09 08:18:02 -05:00
Jay Berkenbilt c8cda4f965 AES_PDF: switch to pluggable crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt bb427bd117 SHA2: switch to pluggable crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt eadc222ff9 Rename SHA2 implementation (non-bisectable) 2019-11-09 08:18:02 -05:00
Jay Berkenbilt 4287fcc002 RC4: switch to pluggable crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt 0cdcd10228 Rename RC4 implementation (non-bisectable) 2019-11-09 08:18:02 -05:00
Jay Berkenbilt ce8f9b6608 MD5: switch to pluggable crypto 2019-11-09 08:18:02 -05:00
Jay Berkenbilt 5c3e856e9f Rename MD5 implementation (non-bisectable)
Just rename MD5 -> MD5_native in place so that git annotate will show
the lines as having originated there.
2019-11-09 08:18:02 -05:00
Jay Berkenbilt 2de41856a0 QPDFCryptoProvider: initial implementation 2019-11-09 08:18:02 -05:00
Jay Berkenbilt 700f5b961e Remove int type checks -- subsumed by C++-11 2019-11-09 08:18:02 -05:00
Jay Berkenbilt 653ce3550d Require C++-11
Includes updates to m4/ax_cxx_compile_stdcxx.m4 to make it work with
msvc, which supports C++-11 with no flags but doesn't set __cplusplus
to a recent value.
2019-11-09 08:18:02 -05:00
Jay Berkenbilt 9094fb1f8e Fix two additional fuzz test cases 2019-11-03 18:59:12 -05:00
Masamichi Hosoda 5a842792b6 Parse Contents in signature dictionary without encryption
Various PDF digital signing tools do not encrypt /Contents value in
signature dictionary. Adobe Acrobat Reader DC can handle a PDF with
the /Contents value not encrypted.

Write Contents in signature dictionary without encryption

Tests ensure that string /Contents are not handled specially when not
found in sig dicts.
2019-10-22 16:20:21 -04:00
Masamichi Hosoda cdc46d78f4 Add QPDFObject::getParsedOffset() 2019-10-22 16:19:06 -04:00
Masamichi Hosoda 50b329ee9f Add QPDFWriter::getWrittenXRefTable() 2019-10-22 16:16:16 -04:00
Masamichi Hosoda 5cf4090aee Add QPDFWriter::getRenumberedObjGen() 2019-10-22 16:16:16 -04:00
Masamichi Hosoda 46ac3e21b3 Add QPDF::getXRefTable() 2019-10-22 16:16:16 -04:00
Masamichi Hosoda 06b818dcd3 Exclude signature dictionary from compressible objects
It seems better not to compress signature dictionaries. Various PDF
digital signing tools, including Adobe Acrobat Reader DC, do not
compress signature dictionaries.

Table 8.93 "Entries in a signature dictionary" in PDF 1.5 reference
describes that /ByteRange in the signature dictionary shall be used to
describe a digest that does not include the signature value
(/Contents) itself.

The byte ranges cannot be determined if the dictionary is compressed.
2019-10-22 16:16:16 -04:00
Masamichi Hosoda 5e0ba12687 Fix /Contents value representation in a signature dictionary
Table 8.93 "Entries in a signature dictionary" in PDF 1.5 reference
describes that the value of Contents entry is a hexadecimal string
representation when ByteRange is specified.

This commit makes QPDF always uses hexadecimal strings representation
instead of literal strings for it.
2019-10-22 16:16:16 -04:00
Jay Berkenbilt 3094955dee Prepare 9.0.2 release 2019-10-12 19:37:40 -04:00
Jay Berkenbilt 4ea940b03c Prepare 9.0.1 release 2019-09-20 07:38:18 -04:00
Jay Berkenbilt 685250d7d6 Correct reversed Rectangle coordinates (fixes #363) 2019-09-19 21:25:34 -04:00
Jay Berkenbilt 48b7de2cc3 Fix typo in comment 2019-09-19 21:04:32 -04:00
Jay Berkenbilt 8b1e307741 Warn for duplicated dictionary keys (fixes #345) 2019-09-19 20:22:34 -04:00
Jay Berkenbilt bb83e65193 Fix fuzz issue 16953 (overflow checking in xref stream index) 2019-09-17 19:48:47 -04:00
Jay Berkenbilt 17d431dfd5 Fix integer type warnings for big-endian systems 2019-09-17 19:14:27 -04:00
Jay Berkenbilt 5462dfce31 Prepare 9.0.0 release 2019-08-31 20:07:36 -04:00
Jay Berkenbilt babd12c9b2 Add methods QPDF::anyWarnings and QPDF::closeInputSource 2019-08-31 15:51:20 -04:00
Jay Berkenbilt 4fa7b1eb60 Add remove_file and rename_file to QUtil 2019-08-31 15:51:04 -04:00
Jay Berkenbilt 0e51a9aca6 Don't encrypt trailer, fixes fuzz issue 15983
Ordinarily the trailer doesn't contain any strings, so this is usually
a non-issue, but if the trailer contains strings, linearizing and
encrypting with object streams would include encrypted strings in the
trailer, which would blow out the padding because encrypted strings
are longer than their cleartext counterparts.
2019-08-28 23:06:32 -04:00
Jay Berkenbilt 47a38a942d Detect stream in object stream, fixing fuzz 16214
It's detected in QPDFWriter instead of at parse time because I can't
figure out how to construct a test case in a reasonable time. This
commit moves the fuzz file into the regular test suite for a QTC
coverage case.
2019-08-28 12:49:04 -04:00
Jay Berkenbilt ba5fb69164 Make popping pipeline stack safer
Use destructors to pop the pipeline stack, and ensure that code that
pops the stack is actually popping the intended thing.
2019-08-27 22:27:47 -04:00
Jay Berkenbilt dadf8307c8 Fix fuzz issues 15316 and 15390 2019-08-27 20:39:06 -04:00
Jay Berkenbilt 456c285b02 Fix fuzz issue 16172 (overflow checking in OffsetInputSource) 2019-08-27 13:08:07 -04:00
Jay Berkenbilt ad8081daf5 Fix fuzz issue 15442 (overflow checking in BufferInputSource) 2019-08-27 11:26:25 -04:00
Jay Berkenbilt 9a095c5c76 Seek in two stages to avoid overflow
When seeing to a position based on a value read from the input, we are
prone to integer overflow (fuzz issue 15442). Seek in two stages to
move the overflow check into the input source code.
2019-08-27 11:26:25 -04:00
Jay Berkenbilt ac5e6de2e8 Fix fuzz issue 15387 (overflow checking xref size) 2019-08-27 11:26:25 -04:00
Jay Berkenbilt 6bc4cc3d48 Fix fuzz issue 15475 2019-08-25 22:52:25 -04:00
Jay Berkenbilt 94e86e2528 Fix fuzz issue 16301 2019-08-25 22:52:25 -04:00
Jay Berkenbilt 5da146c8b5 Track separately whether password was user/owner (fixes #159) 2019-08-24 11:01:19 -04:00
Jay Berkenbilt 5a0aef55a0 Split long line 2019-08-24 10:58:51 -04:00
Jay Berkenbilt 2794bfb1a6 Add flags to control zlib compression level (fixes #113) 2019-08-23 20:34:21 -04:00
Jay Berkenbilt dac0598b94 Add ability to set zlib compression level globally 2019-08-23 20:34:21 -04:00
Jay Berkenbilt 3f1ab64066 Pass offset and length to ParserCallbacks::handleObject 2019-08-22 22:54:29 -04:00
Jay Berkenbilt 4b2e72c4cd Test for direct, rather than resolved nulls in parser
Just because we know an indirect reference is null, doesn't mean we
shouldn't keep it indirect.
2019-08-22 17:55:16 -04:00
Jay Berkenbilt 3f3dbe22ea Remove array null flattening
For some reason, qpdf from the beginning was replacing indirect
references to null with literal null in arrays even after removing the
old behavior of flattening scalar references. This seems like a bad
idea.
2019-08-22 17:55:16 -04:00
Jay Berkenbilt 225cd9dac2 Protect against coding error of re-entrant parsing 2019-08-22 17:55:16 -04:00
Jay Berkenbilt ae5bd7102d Accept extraneous space before xref (fixes #341) 2019-08-19 22:24:53 -04:00
Jay Berkenbilt 8a9086a689 Accept extraneous space after stream keyword (fixes #329) 2019-08-19 21:43:44 -04:00
Jay Berkenbilt 43f91f58b8 Improve invalid name token warning message
This message used to only appear for PDF >= 1.2. The invalid name is
valid for PDF 1.0 and 1.1. However, since QPDFWriter may write a newer
version, it's better to detect and warn in all cases. Therefore make
the warning more informative.
2019-08-19 19:48:27 -04:00
Jay Berkenbilt 42d396f1dd Handle invalid name tokens symmetrically for PDF < 1.2 (fixes #332) 2019-08-19 19:48:27 -04:00
Jay Berkenbilt d9dd99eca3 Attempt to repair /Type key in pages nodes (fixes #349) 2019-08-18 18:54:37 -04:00
Jay Berkenbilt 522d2b2227 Improve efficiency of fixDanglingReferences 2019-08-18 09:00:40 -04:00
Jay Berkenbilt 5187a3ec85 Shallow copy arrays without removing sparseness 2019-08-17 23:02:41 -04:00
Jay Berkenbilt bf7c6a8070 Use SparseOHArray in parsing 2019-08-17 23:02:41 -04:00
Jay Berkenbilt e5f504b6c5 Use SparseOHArray in QPDF_Array 2019-08-17 23:02:41 -04:00
Jay Berkenbilt a89d8a0677 Refactor QPDF_Array in preparation for using SparseOHArray 2019-08-17 23:02:41 -04:00
Jay Berkenbilt e83f3308fb SparseOHArray 2019-08-17 23:02:41 -04:00
Thorsten Schöning 8f06da7534 Change list to vector for outline helpers (fixes #297)
This change works around STL problems with Embarcadero C++ Builder
version 10.2, but std::vector is more common than std::list in qpdf,
and this is a relatively new API, so an API change is tolerable.

Thanks to Thorsten Schöning <6223655+ams-tschoening@users.noreply.github.com>
for the fix.
2019-07-03 20:08:47 -04:00
Jay Berkenbilt 4db1de97ce Convert some cases of logic_error to runtime_error
There were a few cases that could be caused by invalid input rather
than bugs in the code which were throwing logic_error instead of
runtime_error.
2019-06-25 12:43:06 -04:00
Jay Berkenbilt 201e8798d7 Convert previously overlooked static cast to QIntC 2019-06-25 12:43:06 -04:00
Jay Berkenbilt 04f45cf652 Treat all linearization errors as warnings
This also reverts the addition of a new checkLinearization that
distinguishes errors from warnings. There's no practical distinction
between what was considered an error and what was considered a
warning.
2019-06-23 13:45:45 -04:00
Jay Berkenbilt c5ed1b8075 Handle invalid encryption Length (fixes #333) 2019-06-22 20:57:33 -04:00
Jay Berkenbilt 551dfbf697 Allow set*EncryptionParameters before filename iset (fixes #336) 2019-06-22 20:57:33 -04:00
Jay Berkenbilt 7bd38a3eb3 Provide error message in Windows crypto code (fixes #286)
Thanks to github user zdenop for supplying some additional
error-handling code.
2019-06-22 17:12:01 -04:00
Jay Berkenbilt 6c39aa8763 In shippable code, favor smart pointers (fixes #235)
Use PointerHolder in several places where manually memory allocation
and deallocation were being used. This helps to protect against memory
leaks when exceptions are thrown in surprising places.
2019-06-22 16:57:52 -04:00
Jay Berkenbilt 85a3f95a89 qpdf: exit 3 for linearization warnings without errors (fixes #50) 2019-06-22 16:57:51 -04:00
Jay Berkenbilt 1bde5c68a3 Add QUtil::read_file_into_memory
This code was essentially duplicated between test_driver and
standalone_fuzz_target_runner.
2019-06-22 10:14:25 -04:00
Jay Berkenbilt 658b5bb3be QPDFWriter: clean up overloaded functions
In a small number of cases, it makes sense to replace an overloaded
function with a function that takes a default argument. We can do this
now because we've already broken binary compatibility since the last
release.
2019-06-22 10:13:27 -04:00
Jay Berkenbilt 79f6b4823b Convert remaining public classes to use Members pattern
Have classes contain only a single private member of type
PointerHolder<Members>. This makes it safe to change the structure of
the Members class without breaking binary compatibility. Many of the
classes already follow this pattern quite successfully. This brings in
the rest of the class that are part of the public API.
2019-06-22 10:13:27 -04:00
Jay Berkenbilt 45dac410b5 Remove broken QPDFTokenizer::expectInlineImage 2019-06-21 22:29:31 -04:00
Jay Berkenbilt 25dd3c6750 Remove QPDF::copyForeignObject with unused parameter 2019-06-21 22:29:31 -04:00
Jay Berkenbilt c6cfd64503 Rename QUtil::strcasecmp to QUtil::str_compare_nocase (fixes #242) 2019-06-21 22:29:31 -04:00
Jay Berkenbilt 848351f1fc Add missing #include <cstring> 2019-06-21 22:29:31 -04:00
Jay Berkenbilt b07ad6794e Fix bugs found by fuzz tests
* Several assertions in linearization were not always true; change
  them to run time errors
* Handle a few cases of uninitialized objects
* Handle pages with no contents when doing form operations
* Handle invalid page tree nodes when traversing pages
2019-06-21 17:56:24 -04:00
Jay Berkenbilt a35d4ce9cc Fix bounds error in utf16_to_utf8 conversion 2019-06-21 17:40:24 -04:00
Jay Berkenbilt 63a643a3c7 Remove implicit conversion from int/pointer to bool
This fixes cases of warning C4800 from msvc
2019-06-21 13:17:21 -04:00
Jay Berkenbilt d71f05ca07 Fix sign and conversion warnings (major)
This makes all integer type conversions that have potential data loss
explicit with calls that do range checks and raise an exception. After
this commit, qpdf builds with no warnings when -Wsign-conversion
-Wconversion is used with gcc or clang or when -W3 -Wd4800 is used
with MSVC. This significantly reduces the likelihood of potential
crashes from bogus integer values.

There are some parts of the code that take int when they should take
size_t or an offset. Such places would make qpdf not support files
with more than 2^31 of something that usually wouldn't be so large. In
the event that such a file shows up and is valid, at least qpdf would
raise an error in the right spot so the issue could be legitimately
addressed rather than failing in some weird way because of a silent
overflow condition.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt f40ffc9d63 Pl_Flate: constructor's out_bufsize is now unsigned int
This is the type we need for the underlying zlib implementation.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt da30764bce Change QPDFObjectHandle::pipeStreamData's encode_flags type
Change from unsigned long to int since we pass enumerated type values
to this field.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt 3608afd5c5 Add new integer accessors to QPDFObjectHandle 2019-06-21 13:17:21 -04:00
Jay Berkenbilt 42306e2ff8 QUtil: add unsigned int/string functions 2019-06-21 13:17:21 -04:00
Jay Berkenbilt 2155815234 configure: determine wordsize automatically
Based on sizeof(size_t). Assumes 64 if not 32.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt 713d961990 Appearance streams: some floating point values were truncated
Bounding box X coordinates could be truncated, causing them to be off
by a fraction of a point. This was most likely not visible, but it was
still wrong.
2019-06-20 21:32:30 -04:00
Jay Berkenbilt eb7948876b Fix problems found in fuzz corpus 2019-06-15 17:24:24 -04:00
Jay Berkenbilt cf469d7890 Give up reading objects with too many consecutive errors 2019-06-15 08:52:19 -04:00
Jay Berkenbilt cd830968ef Eliminate one potential integer overflow
There are more to handle, but this resolves an issue already caught by
oss-fuzz.
2019-06-15 08:52:19 -04:00
Jay Berkenbilt 31bde2f9d7 Handle empty DecodeParams array for (fixes #331)
On read, ignore /DecodeParms when empty list; on write, delete it.
Some files have been found that include an empty list for
/DecodeParms, but this is not technically compliant with the spec, and
the only sensible interpretation is to treat it as if there are no
decode parameters.
2019-06-09 17:19:49 -04:00
Jay Berkenbilt b1a78be1a8 Prepare 8.4.2 release 2019-05-18 08:56:37 -04:00
Jay Berkenbilt b3f0dbff62 Fix Windows memory error (fixes #330) 2019-05-16 14:26:51 -04:00
Jay Berkenbilt a323f6f49f Prepare 8.4.1 release 2019-04-27 20:44:20 -04:00
Jay Berkenbilt 81205e007b Spell check 2019-04-21 13:09:11 -04:00
Jay Berkenbilt 011695dfdf Support Unicode in filenames (fixes #298) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt 4ccb29912a Tighten isPageObject (fixes #310) 2019-04-20 21:00:43 -04:00
Thorsten Schöning 2c704b99a1 Undefined functions because of missing std:: or header. (#295)
* [bcc32 Error] QPDF.cc(375): E2268 Call to undefined function 'atof'
  Full parser context
    QPDF.cc(358): parsing: void QPDF::parse(const char *)

* [bcc32 Error] QPDFTokenizer.cc(183): E2268 Call to undefined function 'strtol'
  Full parser context
    QPDFTokenizer.cc(163): parsing: void QPDFTokenizer::resolveLiteral()

* [bcc32 Error] pdf-split-pages.cc(52): E2268 Call to undefined function 'exit'
  Full parser context
    pdf-split-pages.cc(50): parsing: void usage()

* PR #295: Including "cstdlib" should be replaced with "stdlib.h" to be more consistent. At the same time I changed the order of the surrounding includes to reflect alphabetical order, because at some files this already have been the case.
2019-03-12 10:05:29 -04:00
Thorsten Schöning 71b7ed9f4f "_setmode" and "_stricmp" are not available on Borland C++Builder, neither the classic one nor newer ones based on CLANG. 2019-03-11 16:58:55 -04:00
Jay Berkenbilt da7c2c0ee9 Fix json serialization for {x | -1 < x < 1} (fixes #308)
JSON serialization was preserving the value as presented, but JSON
doesn't accept decimal values without a 0 before the decimal point.
2019-03-11 16:22:59 -04:00
Jay Berkenbilt 03074ca5a0 Prepare 8.4.0 release 2019-02-01 22:25:25 -05:00
Jay Berkenbilt fec5bb124c Spell check 2019-01-31 21:41:29 -05:00
Jay Berkenbilt eb49e07c0a Make inline image token exactly contain the image data
Do not include the trailing EI, and handle cases where EI is not
preceded by a delimiter. Such cases have been seen in the wild.
2019-01-31 20:28:44 -05:00
Jay Berkenbilt 5211bcb5ea Externalize inline images (fixes #278) 2019-01-31 10:38:13 -05:00
Jay Berkenbilt 1eb35a355f Exclude space after ID in image data 2019-01-31 10:38:10 -05:00
Jay Berkenbilt 2b6c79bcae Improve locating inline image's EI
We've actually seen a PDF file in the wild that contained EI
surrounded by delimiters inside the image data, which confused qpdf's
naive code. This significantly improves EI detection.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt ec9e310c9e Refactor QPDFTokenizer's inline image handling
Add a version of expectInlineImage that takes an input source and
searches for EI. This is in preparation for improving the way EI is
found. This commit just refactors the code without changing the
functionality and adds tests to make sure the old and new code behave
identically.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt 31372edce0 Inline image token value ends with EI, not delimiter
The inline image token erroneously included the delimiter that
followed EI. The ObjectHandle created from it was correct.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt b776dcd2d3 Clean up some private functions 2019-01-29 22:14:20 -05:00
Jay Berkenbilt 8a9cfd2605 Handle direct page objects (fixes #164) 2019-01-29 17:01:36 -05:00
Jay Berkenbilt 2d0885bc11 Clarify documentation for copyForeignObject regarding pages
Make explicit that copyForeignObject can be used on page objects and
will copy them properly but not update the pages tree.
2019-01-28 21:53:55 -05:00
Jay Berkenbilt 2712869cf9 Fix logic for when to compress object and xref streams (fixes #271) 2019-01-28 21:43:06 -05:00
Jay Berkenbilt 52f9d326a5 Resolve duplicated page objects (fixes #268)
When linearizing a file or getting the list of all pages in a file,
detect if the pages tree contains a duplicated page object and, if so,
shallow copy it. This makes it possible to have a one to one mapping
of page positions to page objects.
2019-01-28 20:29:58 -05:00
Jay Berkenbilt 623f5b664e Convert pages to form XObjects
Support conversion of pages to form XObjects and placement of form
XObjects on pages.
2019-01-27 07:50:30 -05:00
Jay Berkenbilt 68ccd87c9e Move rectangle transformation into QPDFMatrix 2019-01-27 07:50:30 -05:00
Jay Berkenbilt 8cb245739c Add QPDFObjectHandle::getUniqueResourceName 2019-01-27 07:50:30 -05:00
Jay Berkenbilt 009767d97a Handle inheritable page attributes
Add getAttribute for handling inheritable page attributes, and fix
getPageImages and annotation flattening code to use it.
2019-01-25 22:30:05 -05:00
Jay Berkenbilt 2d32f4db8f Handle fallback font size in text appearances
If we end up using our fallback font size when generating appearances
for text fields, reflect that in the Tf operator used in the
appearance stream.
2019-01-21 07:38:21 -05:00
Jay Berkenbilt 9cb599875b Improve text objects used in text appearance streams 2019-01-20 23:05:58 -05:00
Jay Berkenbilt 930eade6d3 Fix omissions in text appearance generation
When generating appearance streams for variable text annotations,
properly handle the cases of there being no appearance dictionary, no
appearance stream, or an appearance stream with no BMC..EMC marker.
2019-01-20 23:05:58 -05:00
Jay Berkenbilt 65ef0bf313 When flattening, remove annotations with no appearance stream
With the exception of form field annotations when /NeedAppearances is
true, remove annotations that don't have appearance streams when
flattening. There is no reason to keep these when flattening since
they are invisible. This may include unchecked checkboxes, unshown
popup windows, etc.
2019-01-20 23:05:58 -05:00
Jay Berkenbilt c18ee440a3 mingw workaround for QPDFExc destructor
mingw doesn't like it when you don't inline empty virtual destructors.
2019-01-19 10:14:07 -05:00
Jay Berkenbilt e87d149918 Add QUtil::possible_repaired_encodings 2019-01-17 11:43:56 -05:00
Jay Berkenbilt 6ec22f117d Modernize encryption API for more granularity
Setting encryption permissions for R >= 3 set permission bits in
groups corresponding to menu options in Acrobat 5. The new API allows
the bits to be set individually.
2019-01-17 11:43:56 -05:00
Jay Berkenbilt 4630377731 Add status-reporting transcoders to QUtil 2019-01-17 11:43:56 -05:00
Jay Berkenbilt 8f389f14c0 QUtil::analyze_encoding 2019-01-17 11:43:56 -05:00
Jay Berkenbilt 6817ca585a Bidirectional transcoding for win, mac, pdf, utf8, utf16 2019-01-17 11:43:56 -05:00
Jay Berkenbilt 698485468a Move remaining existing transcoding to QUtil 2019-01-17 11:43:56 -05:00
Jay Berkenbilt 5cfcd4f361 Additional checks for unreferenced resources
Explicitly abandon removal of unreferenced resources if there are any
lexical errors in the page's contents. This case always generated a
warning, but it now also prevents removal of unreferenced resources,
this strongly decreasing the likelihood of data loss.
2019-01-17 11:43:56 -05:00
Jay Berkenbilt 4bc434000c Copy subdictionaries when removing resources (fixes #276)
When removing unreferenced resources, the code was copying the overall
resource dictionaries but not the subdictionaries being modified. This
was a "typo" in the code -- the comment clearly stated the need to do
this, but the code replaced the dictionary with itself rather than
with a shallow copy of itself.
2019-01-17 09:40:05 -05:00
Jay Berkenbilt 654c0e8caf Allow adding the same page more than once in --pages (fixes #272) 2019-01-12 10:01:47 -05:00
Jay Berkenbilt 4ecd1df6f2 Add configure option AVOID_WINDOWS_HANDLE
If set, we avoid using Windows I/O HANDLE, which is disallowed in some
versions of the Windows SDK, such as for Windows phones.
QUtil::same_file will always return false in this case. Only applies
to Windows builds.
2019-01-10 22:35:08 -05:00
Jay Berkenbilt d24a120c7f Add QPDF::setImmediateCopyFrom 2019-01-10 22:35:08 -05:00
Jay Berkenbilt b653929c93 Update version to 8.3.0 2019-01-07 11:16:54 -05:00
Jay Berkenbilt aa602fd107 Fix integer overflow in large file test 2019-01-07 08:49:14 -05:00
Jay Berkenbilt c3cee5f154 Exercise out of scope original pdf for copyForeignObject 2019-01-07 07:38:03 -05:00
Jay Berkenbilt fddbcab0e7 Mostly don't require original QPDF for copyForeignObject (fixes #219)
The original QPDF is only required now when the source
QPDFObjectHandle is a stream that gets its stream data from a
QPDFObjectHandle::StreamDataProvider.
2019-01-07 00:11:15 -05:00
Jay Berkenbilt fbbb0ee016 Make a static version of QPDF::pipeStreamData
This is in preparation of being able to pipe a stream's data without
keeping a copy of its containing qpdf object.
2019-01-07 00:11:15 -05:00
Jay Berkenbilt 7588cac295 Create an application-scope unique ID for each QPDF object
Use this instead of QPDF* as a map key for object_copiers.
2019-01-07 00:11:15 -05:00
Jay Berkenbilt e27ac682e0 Move encryption parameters into a class 2019-01-06 09:58:16 -05:00
Jay Berkenbilt a70fbaaf50 Honor other base encodings when generating appearances 2019-01-05 23:01:59 -05:00
Jay Berkenbilt b341d742db Add WinAnsi and MacRoman encoding 2019-01-05 23:01:44 -05:00
Jay Berkenbilt 3ef1b77304 Refactor QUtil::utf8_to_ascii 2019-01-05 22:59:29 -05:00
Jay Berkenbilt 089ce5902e Move utf8_to_utf16 into QUtil 2019-01-05 22:59:27 -05:00
Jay Berkenbilt ae18bfd142 Refactor string transcoding in QPDF_String 2019-01-05 22:56:58 -05:00
Jay Berkenbilt 2e342ee5bb Spell check 2019-01-04 21:33:14 -05:00
Jay Berkenbilt 16fd6e64f9 Add QPDFWriter::getFinalVersion (fixes #266) 2019-01-04 12:37:22 -05:00
Jay Berkenbilt 837dcf8fc2 Don't call assert while checking linearization data (fixes #209, #231)
Instead of calling assert for problems found during checking
linearization data, throw an exception which is later caught and
issued as an error. Ideally we would handle errors more robustly, but
this is still a significant improvement.
2019-01-04 11:55:42 -05:00
Jay Berkenbilt a01359189b Fix dangling references (fixes #240)
On certain operations, such as iterating through all objects and
adding new indirect objects, walk through the entire object structure
and explicitly resolve any indirect references to non-existent
objects. That prevents new objects from springing into existence and
causing the previously dangling references to point to them.
2019-01-04 10:29:29 -05:00
Jay Berkenbilt 158156d506 Add basic appearance stream generation 2019-01-04 08:00:19 -05:00
Jay Berkenbilt 02281632cc Add QUtil::utf8_to_ascii 2019-01-03 23:18:13 -05:00
Jay Berkenbilt b55567a0fa Add special case setV code for button fields 2019-01-03 23:18:13 -05:00
Jay Berkenbilt e3144ac417 Add form fields to json output
Also add some additional methods for detecting form field types to
assist in the json creation and for later use.
2019-01-03 23:18:13 -05:00
Jay Berkenbilt ca94ac68d9 Honor flags when flattening annotations 2019-01-03 11:59:55 -05:00
Jay Berkenbilt 06d6438ddf Minor fixes 2019-01-03 09:17:43 -05:00
Jay Berkenbilt 3e74916c5a Fix seg fault on empty xref stream (fixes #263)
Thanks to @p-cher for supplying a patch.
2019-01-03 09:17:43 -05:00
Jay Berkenbilt f78ea057ca Switch annotation flattening to use the form xobjects
Instead of directly putting the contents of the annotation appearance
streams into the page's content stream, add commands to render the
form xobjects directly. This is a more robust way to do it than the
original solution as it works properly with patterns and avoids
problems with resource name clashes between the pages and the form
xobjects.
2019-01-02 21:49:47 -05:00
Jay Berkenbilt 3b8ce4f12a Annotation flattening including form fields
Flatten annotations by integrating their appearance streams into the
content stream of the containing page. In the case of form fields,
only flatten if /NeedAppearance is false (or equivalently absent). If
flattening form fields, also remove /AcroForm from the document
catalog.
2019-01-01 08:14:15 -05:00
Jay Berkenbilt 95d6b17a89 Add QPDFObjectHandle::mergeDictionary() 2019-01-01 08:12:56 -05:00
Jay Berkenbilt 104fd6da52 Add matrix and annotation appearance stream handling
Generate page content fragment for rendering appearance streams
including all matrix calculation.
2019-01-01 08:07:21 -05:00
Jay Berkenbilt 5059ec0d35 Add Matrix class under QPDFObjectHandle 2018-12-31 23:02:43 -05:00
Jay Berkenbilt daeb5a85b6 Transformation matrix 2018-12-31 18:23:47 -05:00
Jay Berkenbilt 3440ea7d3c JSON::serialize -> unparse
Unparse is admittedly strange, but I'd rather be strange and
consistent, and everything else in the qpdf library uses unparse to
serialize. (If you're reading this, the convention of using "unparse"
comes from the "clu" programming language.)
2018-12-25 11:52:21 -05:00
Jay Berkenbilt fa3664357b Move numrange code from qpdf.cc to QUtil.cc
Also move tests to libtests.
2018-12-21 19:11:57 -05:00
Jay Berkenbilt d5d179f441 Add document and object helpers for outlines (bookmarks) 2018-12-21 19:11:57 -05:00
Jay Berkenbilt 30a0c070e4 Add QPDFObjectHandle::getJSON() 2018-12-21 18:34:56 -05:00
Jay Berkenbilt 651179b5da Add simple JSON serializer 2018-12-21 18:34:56 -05:00
Jay Berkenbilt 0776c00129 Add QPDFNameTreeObjectHelper 2018-12-21 18:34:56 -05:00
Jay Berkenbilt cc500eda91 Minor cleanup 2018-12-21 17:25:31 -05:00
Jay Berkenbilt 6ef9e31233 Add QPDFPageLabelDocumentHelper 2018-12-18 16:59:24 -05:00
Jay Berkenbilt f38df27aa3 Add QPDFNumberTreeObjectHelper 2018-12-18 16:46:10 -05:00
Jay Berkenbilt 077d3d4512 Add QPDFObjectHandle::wrapInArray()
Wrap an object in an array if it is not already an array.
2018-12-18 16:45:48 -05:00
Jay Berkenbilt d1368a3851 Commit automatically generated files 2018-10-11 17:27:54 -04:00
Jay Berkenbilt 6ee761fc86 Prepare 8.2.1 release 2018-08-18 10:56:19 -04:00
Jay Berkenbilt 5e9e17e62a Prepare 8.2.0 release 2018-08-16 11:53:10 -04:00
Jay Berkenbilt 693cdaac35 Missing header for std::max 2018-08-16 11:53:10 -04:00
Jay Berkenbilt b4ce557be5 Fix error in QPDFSystemError.cc 2018-08-14 11:39:07 -04:00
Jay Berkenbilt b4bdc42b4f New exception class QPDFSystemError (fixes #221) 2018-08-13 20:01:51 -04:00
Jay Berkenbilt 5d9d80beba Fix fallback logic for encryption (fixes #229) 2018-08-12 22:32:40 -04:00
Jay Berkenbilt 60fe8061cb Fix one more identifier (fixes #236) 2018-08-12 22:01:51 -04:00
Jay Berkenbilt a2f62935b3 Catch exceptions as const references (fixes #236)
This fix allows qpdf to compile/test cleanly with gcc 8.
2018-08-12 21:57:52 -04:00
Jay Berkenbilt 3d6615b276 Pl_Buffer: reduce memory growth (fixes #228)
Rather than keeping a list of buffers for every write, accumulate
bytes in a single buffer, doubling the size of the buffer when needed
to accommodate new data.

This is not the best possible implementation, but the change was
implemented in this way to avoid changing the shape of Pl_Buffer and
thus breaking backward compatibility.
2018-08-12 17:45:43 -04:00
Jay Berkenbilt 3873f5fd9b Protect headers with compliant identifiers (fixes #233) 2018-08-12 14:10:32 -04:00
Jay Berkenbilt 932799baab Fix memory access error
A previous fix introduced a potentially memory overrun under certain
rare conditions. The test suite now once again passes with address
sanitizer.
2018-08-12 13:16:17 -04:00
Jay Berkenbilt b6e414b10b Remove some extraneous null pointer checks (fixes #234)
There were a few places in the code that were checking that a pointer
wasn't null before deleting it, even though C++ has always allowed
delete 0. Most of the code did not perform these checks.
2018-08-12 12:58:39 -04:00
Jay Berkenbilt 4a4736c695 Fix EOL handling inside strings (fixes #226)
CR, CRLF, and LF are all supposed to be treated as LF; only one EOL is
to be ignored after backslash.
2018-08-05 20:48:35 -04:00
Jay Berkenbilt 1619cad1e8 Return correct method for string encryption (fixes #227) 2018-08-05 16:58:21 -04:00
Jay Berkenbilt e1cd5891af Fix infinite loop on small files with progress reporting (fixes #230)
Turns out you can keep adding zero to a number over and over again and
it just doesn't get any bigger. Who would have known?
2018-08-05 15:43:34 -04:00
Jay Berkenbilt 4f4c627b77 ClosedFileInputSource: add method to keep file open
During periods of intensive operation on a specific file, this method
can reduce the overhead of repeated open/close operations.
2018-08-04 19:52:46 -04:00
Jay Berkenbilt 1bd2a2e79b Prepare 8.1.0 release 2018-06-23 07:50:11 -04:00
Jay Berkenbilt 3aad28aed0 Bug fix: honor encryption key length with R=3 (fixes #212) 2018-06-22 19:24:26 -04:00
Jay Berkenbilt a433ed24f9 Add progress reporting for QPDFWriter (fixes #200) 2018-06-22 16:14:54 -04:00
Jay Berkenbilt 2a82f6e1e0 Add method to get count of objects in QPDF 2018-06-22 15:53:40 -04:00
Jay Berkenbilt c81836076f Correct incorrect comment 2018-06-22 13:13:09 -04:00
Jay Berkenbilt 4ccc8b1a44 Add ClosedFileInputSource
ClosedFileInputSource is an input source that keeps the file closed
when not reading it.
2018-06-22 12:52:45 -04:00
Jay Berkenbilt c71dc6888c Don't prune resource dictionaries on errors or by request
If we are unable to filter a page's content streams, don't attempt to
remove objects from the page's resource dictionary. Also provide a
command line option to suppress resource removal in case we ever need
this as a workaround for some bug or broken PDF files.
2018-06-22 10:45:31 -04:00
Jay Berkenbilt 38c9ed23c3 Treat content stream parsing errors as an error, not a warning
If parsing content streams is treated as a warning, there is no way
for a caller to know if a parsing operation has failed. This is very
dangerous and will likely result in data loss when token filters are
parser callbacks are in use.
2018-06-22 10:44:08 -04:00
Jay Berkenbilt 6c89d4b35b When splitting files, remove unreferenced objects (fixes #203) 2018-06-21 21:03:30 -04:00
Jay Berkenbilt ddd78c1b7f Fix QPDFObjectHandle::shallowCopy
It's not really a shallow copy. It just doesn't cross indirect object
boundaries. The old implementation had a bug that would cause multiple
shallow copies of the same object to share memory, which was not the
intention.
2018-06-21 20:34:45 -04:00
Jay Berkenbilt 397b097c46 Allow setting a form field's value 2018-06-21 15:57:13 -04:00
Jay Berkenbilt 952a665a4e Better support for creating Unicode strings 2018-06-21 15:57:13 -04:00
Jay Berkenbilt e44c395c51 QUtil::toUTF16 2018-06-21 15:57:13 -04:00
Jay Berkenbilt 0b05111db8 Implement helper class for interactive forms 2018-06-21 15:57:13 -04:00
Jay Berkenbilt 2e7ee23bf6 Add QPDFPageDocumentHelper and QPDFPageObjectHelper
This is the beginning of higher-level API support using helper
classes. The goal is to be able to add more helpers without continuing
to pollute QPDF's and QPDFObjectHandle's public interfaces.
2018-06-21 15:57:13 -04:00
Jay Berkenbilt 4cded10821 Add QPDFObjectHandle::Rectangle type
Provide a convenient way of accessing rectangles.
2018-06-21 15:57:13 -04:00
Jay Berkenbilt 078cf9bf90 newline before endstream fix for object streams (fixes #205) 2018-05-12 13:17:43 -04:00
Jay Berkenbilt 15ed9f8565 Fix small logic error in Token construct (fixes #206)
The special case around name token was not reachable. This would only
affect constructors of name tokens that were represented in
non-canonical form such as with a hex substitution for a printable
character. The error was harmless but still a bug.
2018-05-05 17:47:56 -04:00
Jay Berkenbilt b4d6cf6836 Limit depth of nesting in direct objects (fixes #202)
This fixes CVE-2018-9918.
2018-04-15 16:11:22 -04:00
Jay Berkenbilt f8c8e4dcc0 Prepare 8.0.2 release 2018-03-06 11:34:07 -05:00
Jay Berkenbilt e4e2e26d99 Properly handle pages with no contents (fixes #194)
Remove calls to assertPageObject(). All cases in the library that
called assertPageObject() work fine if you don't call
assertPageObject() because nothing assumes anything that was being
checked by that call. Removing the calls enables more files to be
successfully processed.
2018-03-06 11:34:07 -05:00
Jay Berkenbilt 1a4dcb4aaf Pl_Buffer starts in a ready state 2018-03-06 11:31:03 -05:00
Jay Berkenbilt ee44aef8d0 Treat loop in xref tables as damage (fixes #192)
Prior to this fix, if there was a loop detected in following /Prev
pointers in xref streams/tables, it would cause qpdf to lose data.
Note that this condition causes many PDF readers to hang or fail.
2018-03-05 14:26:58 -05:00
Jay Berkenbilt 6fe1e9de40 Prepare 8.0.1 release 2018-03-04 07:16:20 -05:00
Jay Berkenbilt 7b9f23a99a Ignore zlib data check errors (fixes #191) 2018-03-03 11:35:01 -05:00
Jay Berkenbilt 3e8b643ae3 Release 8.0.0 2018-02-25 16:00:11 -05:00
Jay Berkenbilt 111ec50950 8.0.rc3 2018-02-25 14:17:59 -05:00
Jay Berkenbilt d3d3970cf6 8.0.rc2 2018-02-25 13:50:22 -05:00
Jay Berkenbilt a16d703f4d Update version to 8.0.rc1
This is for testing the release process, particularly as it pertains
to AppImage creation.
2018-02-25 09:03:27 -05:00
Jay Berkenbilt 82cae01a76 Bump version number and soname
Bump to an alpha release. This version is not being widely released
but is being used to push the new shared library version through the
debian packaging system and to test out github releases.
2018-02-20 21:31:38 -05:00
Jay Berkenbilt 4bb3046f0b Properly handle strings with PDF Doc Encoding (fixes #179)
The QPDF_String::getUTF8Val() method was not treating strings that
weren't explicitly Unicode as PDF Doc Encoded. This only affects
characters in the range 0x80 through 0xa0.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt 2780a1871d Add C API for checking PDF files 2018-02-18 21:06:27 -05:00
Jay Berkenbilt d0e99f195a More robust handling of type errors
Give objects descriptions and context so it is possible to issue
warnings instead of fatal errors for attempts to access objects of the
wrong type.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt c2e16827b6 Replace "file position" with "offset" in error messages
Sometimes it's an offset in an object stream or a content stream, so
file position is confusing in some cases.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt 52e024f701 Include omitted object description in error message 2018-02-18 21:06:27 -05:00
Jay Berkenbilt cb3b705cf9 Include filename in object stream parse error 2018-02-18 21:06:27 -05:00
Jay Berkenbilt 21b7481b0e Push members of QPDFObjectHandle into a Members object
As in other cases, this is to enable adding new member variables in
the future without breaking ABI compatibility.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt e410b0fe0d Simplify TokenFilter interface
Expose Pl_QPDFTokenizer, and have it do more of the work of managing
the token filter's pipeline.
2018-02-18 21:05:47 -05:00
Jay Berkenbilt 1fdd86a049 Move Pl_QPDFTokenizer to public interface 2018-02-18 21:05:47 -05:00
Jay Berkenbilt 5708b5d0aa Add additional interface for filtering page contents 2018-02-18 21:05:47 -05:00
Jay Berkenbilt fd02944e19 Clean up comment 2018-02-18 21:05:47 -05:00
Jay Berkenbilt 5136238f2a Detect and report bad tokens in content normalization 2018-02-18 21:05:47 -05:00
Jay Berkenbilt 9910104442 Implement TokenFilter and refactor Pl_QPDFTokenizer
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt b8723e97f4 Add coalesce contents capability 2018-02-18 21:05:46 -05:00