mirror of https://github.com/qpdf/qpdf.git synced 2024-12-22 19:08:59 +00:00
498 Commits

Author SHA1 Message Date
Jay Berkenbilt
85a3f95a89 qpdf: exit 3 for linearization warnings without errors (fixes #50) 2019-06-22 16:57:51 -04:00
Jay Berkenbilt
a5814d9302 ChangeLog: fix errors in previous entries 2019-06-22 16:57:51 -04:00
Jay Berkenbilt
1bde5c68a3 Add QUtil::read_file_into_memory
This code was essentially duplicated between test_driver and
standalone_fuzz_target_runner.
2019-06-22 10:14:25 -04:00
Jay Berkenbilt
864a546af6 Build with -fvisibility=hidden when supported 2019-06-21 22:29:31 -04:00
Jay Berkenbilt
45dac410b5 Remove broken QPDFTokenizer::expectInlineImage 2019-06-21 22:29:31 -04:00
Jay Berkenbilt
25dd3c6750 Remove QPDF::copyForeignObject with unused parameter 2019-06-21 22:29:31 -04:00
Jay Berkenbilt
c6cfd64503 Rename QUtil::strcasecmp to QUtil::str_compare_nocase (fixes #242) 2019-06-21 22:29:31 -04:00
Jay Berkenbilt
cc2e8853b5 Enable int warnings by default
Now that there aren't any more...
2019-06-21 13:17:21 -04:00
Jay Berkenbilt
d71f05ca07 Fix sign and conversion warnings (major)
This makes all integer type conversions that have potential data loss
explicit with calls that do range checks and raise an exception. After
this commit, qpdf builds with no warnings when -Wsign-conversion
-Wconversion is used with gcc or clang or when -W3 -Wd4800 is used
with MSVC. This significantly reduces the likelihood of potential
crashes from bogus integer values.

There are some parts of the code that take int when they should take
size_t or an offset. Such places would make qpdf not support files
with more than 2^31 of something that usually wouldn't be so large. In
the event that such a file shows up and is valid, at least qpdf would
raise an error in the right spot so the issue could be legitimately
addressed rather than failing in some weird way because of a silent
overflow condition.
2019-06-21 13:17:21 -04:00
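The range-checked conversions described in the commit above can be sketched roughly as follows. This is only an illustration of the idea, not qpdf's QIntC code; the name `checked_cast` is hypothetical and was chosen for this example.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper (not qpdf's QIntC API): convert between integer
// types and throw instead of silently truncating or changing sign.
template <typename To, typename From>
To checked_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_cast only supports integer types");

    bool const negative =
        std::is_signed<From>::value && (value < From(0));
    if (negative) {
        // A negative value never fits in an unsigned type.
        if (!std::is_signed<To>::value) {
            throw std::range_error("negative value converted to unsigned type");
        }
        // Both sides are signed here, so widen to long long to compare.
        if (static_cast<long long>(value) <
            static_cast<long long>(std::numeric_limits<To>::min())) {
            throw std::range_error("value below minimum of target type");
        }
    } else {
        // Non-negative values are compared as unsigned long long.
        if (static_cast<unsigned long long>(value) >
            static_cast<unsigned long long>(std::numeric_limits<To>::max())) {
            throw std::range_error("value above maximum of target type");
        }
    }
    return static_cast<To>(value);
}
```

With a helper like this, a conversion such as `checked_cast<int>(offset)` fails at the point of conversion rather than producing a wrapped value that causes trouble later.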
Jay Berkenbilt
f40ffc9d63 Pl_Flate: constructor's out_bufsize is now unsigned int
This is the type we need for the underlying zlib implementation.
2019-06-21 13:17:21 -04:00
Jay Berkenbilt
3608afd5c5 Add new integer accessors to QPDFObjectHandle 2019-06-21 13:17:21 -04:00
Jay Berkenbilt
42306e2ff8 QUtil: add unsigned int/string functions 2019-06-21 13:17:21 -04:00
Jay Berkenbilt
a66828caff New safe type converters in QIntC 2019-06-21 13:17:21 -04:00
Jay Berkenbilt
616ae15595 Remove qpdf_read_memory_fuzzer 2019-06-18 08:43:55 -04:00
Jay Berkenbilt
127859a6d3 Run tests with sanitizers in CI 2019-06-15 17:24:24 -04:00
Jay Berkenbilt
bcfa407912 As a test suite, run stand-alone fuzzer on seed corpus
Temporarily skip fuzz tests on Windows. There are Windows-specific
failures to address later.
2019-06-15 17:24:24 -04:00
Jay Berkenbilt
cf469d7890 Give up reading objects with too many consecutive errors 2019-06-15 08:52:19 -04:00
Jay Berkenbilt
3d03024ab2 oss-fuzz initial integration 2019-06-13 09:28:38 -04:00
Jay Berkenbilt
31bde2f9d7 Handle empty DecodeParams array (fixes #331)
On read, ignore /DecodeParms when empty list; on write, delete it.
Some files have been found that include an empty list for
/DecodeParms, but this is not technically compliant with the spec, and
the only sensible interpretation is to treat it as if there are no
decode parameters.
2019-06-09 17:19:49 -04:00
Jay Berkenbilt
b1a78be1a8 Prepare 8.4.2 release 2019-05-18 08:56:37 -04:00
Jay Berkenbilt
b3f0dbff62 Fix Windows memory error (fixes #330) 2019-05-16 14:26:51 -04:00
Jay Berkenbilt
a323f6f49f Prepare 8.4.1 release 2019-04-27 20:44:20 -04:00
Jay Berkenbilt
12b159118a Compare versions between CLI and library 2019-04-20 21:00:43 -04:00
Jay Berkenbilt
2b011f9d81 Add --remove-page-labels option (fixes #317) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt
e50d5201df Add --keep-files-open-threshold (fixes #288) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt
011695dfdf Support Unicode in filenames (fixes #298) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt
4ccb29912a Tighten isPageObject (fixes #310) 2019-04-20 21:00:43 -04:00
Jay Berkenbilt
a5a016cdd2 Revert preservations of outlines with --split-pages
The preservation of outlines didn't provide very useful behavior
anyway as it copied all outlines but most didn't work. This
implementation also caused a very significant performance hit and so
is being reverted until a proper solution can be coded. The eventual
solution will not be compatible with the reverted solution anyway, so
it's best not to leave this in.
2019-04-20 21:00:43 -04:00
Jay Berkenbilt
da7c2c0ee9 Fix json serialization for {x | -1 < x < 1} (fixes #308)
JSON serialization was preserving the value as presented, but JSON
doesn't accept decimal values without a 0 before the decimal point.
2019-03-11 16:22:59 -04:00
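The problem fixed in the commit above is that a PDF number such as `.5` or `-.5` is not a valid JSON number. A minimal sketch of the kind of adjustment involved, assuming the number arrives as a string; `to_json_number` is a hypothetical name and this is not qpdf's actual code:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: insert the leading zero that JSON requires when a
// number string starts with "." or "-.". Other strings pass through as-is.
std::string to_json_number(std::string const& pdf_number)
{
    // Skip an optional leading minus sign.
    std::size_t pos = (!pdf_number.empty() && pdf_number[0] == '-') ? 1 : 0;
    if (pos < pdf_number.size() && pdf_number[pos] == '.') {
        return pdf_number.substr(0, pos) + "0" + pdf_number.substr(pos);
    }
    return pdf_number;
}
```

Only the leading-zero case named in the commit message is handled here; any other differences between PDF and JSON number syntax are out of scope for this sketch.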
Jay Berkenbilt
03074ca5a0 Prepare 8.4.0 release 2019-02-01 22:25:25 -05:00
Jay Berkenbilt
0a470d2daf Don't optimize non-8-bit images
Also add test cases for additional coverage on image optimization.
2019-01-31 21:29:28 -05:00
Jay Berkenbilt
eb49e07c0a Make inline image token exactly contain the image data
Do not include the trailing EI, and handle cases where EI is not
preceded by a delimiter. Such cases have been seen in the wild.
2019-01-31 20:28:44 -05:00
Jay Berkenbilt
5211bcb5ea Externalize inline images (fixes #278) 2019-01-31 10:38:13 -05:00
Jay Berkenbilt
22bcdbe786 Remove acroread from tests
This hasn't worked or been exercised in years since Adobe stopped
releasing a Linux version of reader.
2019-01-31 10:38:13 -05:00
Jay Berkenbilt
1eb35a355f Exclude space after ID in image data 2019-01-31 10:38:10 -05:00
Jay Berkenbilt
2b6c79bcae Improve locating inline image's EI
We've actually seen a PDF file in the wild that contained EI
surrounded by delimiters inside the image data, which confused qpdf's
naive code. This significantly improves EI detection.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt
31372edce0 Inline image token value ends with EI, not delimiter
The inline image token erroneously included the delimiter that
followed EI. The ObjectHandle created from it was correct.
2019-01-31 09:26:37 -05:00
Jay Berkenbilt
8a9cfd2605 Handle direct page objects (fixes #164) 2019-01-29 17:01:36 -05:00
Jay Berkenbilt
2712869cf9 Fix logic for when to compress object and xref streams (fixes #271) 2019-01-28 21:43:06 -05:00
Jay Berkenbilt
52f9d326a5 Resolve duplicated page objects (fixes #268)
When linearizing a file or getting the list of all pages in a file,
detect if the pages tree contains a duplicated page object and, if so,
shallow copy it. This makes it possible to have a one to one mapping
of page positions to page objects.
2019-01-28 20:29:58 -05:00
Jay Berkenbilt
426434c772 Add --overlay and --underlay to qpdf CLI (fixes #207) 2019-01-27 09:30:13 -05:00
Jay Berkenbilt
2d1db06042 Example of form XObject, page overlay 2019-01-27 07:50:30 -05:00
Jay Berkenbilt
623f5b664e Convert pages to form XObjects
Support conversion of pages to form XObjects and placement of form
XObjects on pages.
2019-01-27 07:50:30 -05:00
Jay Berkenbilt
8cb245739c Add QPDFObjectHandle::getUniqueResourceName 2019-01-27 07:50:30 -05:00
Jay Berkenbilt
009767d97a Handle inheritable page attributes
Add getAttribute for handling inheritable page attributes, and fix
getPageImages and annotation flattening code to use it.
2019-01-25 22:30:05 -05:00
Jay Berkenbilt
2d32f4db8f Handle fallback font size in text appearances
If we end up using our fallback font size when generating appearances
for text fields, reflect that in the Tf operator used in the
appearance stream.
2019-01-21 07:38:21 -05:00
Jay Berkenbilt
930eade6d3 Fix omissions in text appearance generation
When generating appearance streams for variable text annotations,
properly handle the cases of there being no appearance dictionary, no
appearance stream, or an appearance stream with no BMC..EMC marker.
2019-01-20 23:05:58 -05:00
Jay Berkenbilt
65ef0bf313 When flattening, remove annotations with no appearance stream
With the exception of form field annotations when /NeedAppearances is
true, remove annotations that don't have appearance streams when
flattening. There is no reason to keep these when flattening since
they are invisible. This may include unchecked checkboxes, unshown
popup windows, etc.
2019-01-20 23:05:58 -05:00
Jay Berkenbilt
c2030d1f33 Implement password recovery suppression and password mode (fixes #215)
Allow fine control over how passwords are encoded for writing, and
allow password for reading to be given as a hexadecimal encoded
string. Allow suppression of password recovery as a means to ensure
that the password you specify is actually the right one.
2019-01-19 10:14:07 -05:00
Jay Berkenbilt
392f2ece51 Try passwords with different string encodings 2019-01-19 10:10:58 -05:00
Jay Berkenbilt
e87d149918 Add QUtil::possible_repaired_encodings 2019-01-17 11:43:56 -05:00
Jay Berkenbilt
966429e718 Update CLI and manual for new encryption granularity (fixes #214) 2019-01-17 11:43:56 -05:00
Jay Berkenbilt
6ec22f117d Modernize encryption API for more granularity
Setting encryption permissions for R >= 3 set permission bits in
groups corresponding to menu options in Acrobat 5. The new API allows
the bits to be set individually.
2019-01-17 11:43:56 -05:00
Jay Berkenbilt
4630377731 Add status-reporting transcoders to QUtil 2019-01-17 11:43:56 -05:00
Jay Berkenbilt
8f389f14c0 QUtil::analyze_encoding 2019-01-17 11:43:56 -05:00
Jay Berkenbilt
e09ae710dc Add tests for shared font/xobject
The tests are in a separate commit so the bug-fix commit can be taken
as a patch for older versions.
2019-01-17 09:44:29 -05:00
Jay Berkenbilt
654c0e8caf Allow adding the same page more than once in --pages (fixes #272) 2019-01-12 10:01:47 -05:00
Jay Berkenbilt
53d8e916b7 Interpret . in --pages as a shortcut for the primary file 2019-01-12 09:59:03 -05:00
Jay Berkenbilt
4ecd1df6f2 Add configure option AVOID_WINDOWS_HANDLE
If set, we avoid using Windows I/O HANDLE, which is disallowed in some
versions of the Windows SDK, such as for Windows phones.
QUtil::same_file will always return false in this case. Only applies
to Windows builds.
2019-01-10 22:35:08 -05:00
Jay Berkenbilt
d24a120c7f Add QPDF::setImmediateCopyFrom 2019-01-10 22:35:08 -05:00
Jay Berkenbilt
1dc235e56d Add completion files for packagers 2019-01-07 19:56:46 -05:00
Jay Berkenbilt
2d0336d862 Add --disable-check-autofiles to configure 2019-01-07 19:56:36 -05:00
Jay Berkenbilt
8f6f7cec50 Prepare 8.3.0 release 2019-01-07 11:16:54 -05:00
Jay Berkenbilt
74bef044cc Update release notes for 8.3.0 2019-01-07 11:16:54 -05:00
Jay Berkenbilt
fddbcab0e7 Mostly don't require original QPDF for copyForeignObject (fixes #219)
The original QPDF is only required now when the source
QPDFObjectHandle is a stream that gets its stream data from a
QPDFObjectHandle::StreamDataProvider.
2019-01-07 00:11:15 -05:00
Jay Berkenbilt
a70fbaaf50 Honor other base encodings when generating appearances 2019-01-05 23:01:59 -05:00
Jay Berkenbilt
b341d742db Add WinAnsi and MacRoman encoding 2019-01-05 23:01:44 -05:00
Jay Berkenbilt
089ce5902e Move utf8_to_utf16 into QUtil 2019-01-05 22:59:27 -05:00
Jay Berkenbilt
ee2aad4381 Add CLI flags for image optimization 2019-01-04 21:33:14 -05:00
Jay Berkenbilt
7b6ab900dc Support page collation with --collate (fixes #259) 2019-01-04 15:13:02 -05:00
Jay Berkenbilt
16fd6e64f9 Add QPDFWriter::getFinalVersion (fixes #266) 2019-01-04 12:37:22 -05:00
Jay Berkenbilt
837dcf8fc2 Don't call assert while checking linearization data (fixes #209, #231)
Instead of calling assert for problems found during checking
linearization data, throw an exception which is later caught and
issued as an error. Ideally we would handle errors more robustly, but
this is still a significant improvement.
2019-01-04 11:55:42 -05:00
Jay Berkenbilt
a01359189b Fix dangling references (fixes #240)
On certain operations, such as iterating through all objects and
adding new indirect objects, walk through the entire object structure
and explicitly resolve any indirect references to non-existent
objects. That prevents new objects from springing into existence and
causing the previously dangling references to point to them.
2019-01-04 10:29:29 -05:00
Jay Berkenbilt
158156d506 Add basic appearance stream generation 2019-01-04 08:00:19 -05:00
Jay Berkenbilt
02281632cc Add QUtil::utf8_to_ascii 2019-01-03 23:18:13 -05:00
Jay Berkenbilt
ca94ac68d9 Honor flags when flattening annotations 2019-01-03 11:59:55 -05:00
Jay Berkenbilt
06d6438ddf Minor fixes 2019-01-03 09:17:43 -05:00
Jay Berkenbilt
f78ea057ca Switch annotation flattening to use the form xobjects
Instead of directly putting the contents of the annotation appearance
streams into the page's content stream, add commands to render the
form xobjects directly. This is a more robust way to do it than the
original solution as it works properly with patterns and avoids
problems with resource name clashes between the pages and the form
xobjects.
2019-01-02 21:49:47 -05:00
Jay Berkenbilt
3b8ce4f12a Annotation flattening including form fields
Flatten annotations by integrating their appearance streams into the
content stream of the containing page. In the case of form fields,
only flatten if /NeedAppearances is false (or equivalently absent). If
flattening form fields, also remove /AcroForm from the document
catalog.
2019-01-01 08:14:15 -05:00
Jay Berkenbilt
95d6b17a89 Add QPDFObjectHandle::mergeDictionary() 2019-01-01 08:12:56 -05:00
Jay Berkenbilt
5059ec0d35 Add Matrix class under QPDFObjectHandle 2018-12-31 23:02:43 -05:00
Jay Berkenbilt
6048c6e2f0 Don't crash on @file when file doesn't exist (fixes #265)
When @file is used and file doesn't exist, just treat it as a normal
argument.
2018-12-23 11:46:56 -05:00
Jay Berkenbilt
64c1579544 Support zsh completion 2018-12-23 11:21:59 -05:00
Jay Berkenbilt
24aeb9ae22 Document json support 2018-12-22 14:05:01 -05:00
Jay Berkenbilt
bb89382f93 Allow --show-object=trailer 2018-12-21 19:11:57 -05:00
Jay Berkenbilt
dd1aca552c Support bash completion using complete -C 2018-12-21 19:11:57 -05:00
Jay Berkenbilt
313ba08126 Preserve some outline functionality in page splitting 2018-12-21 19:11:57 -05:00
Jay Berkenbilt
d5d179f441 Add document and object helpers for outlines (bookmarks) 2018-12-21 19:11:57 -05:00
Jay Berkenbilt
30a0c070e4 Add QPDFObjectHandle::getJSON() 2018-12-21 18:34:56 -05:00
Jay Berkenbilt
651179b5da Add simple JSON serializer 2018-12-21 18:34:56 -05:00
Jay Berkenbilt
0776c00129 Add QPDFNameTreeObjectHelper 2018-12-21 18:34:56 -05:00
Jay Berkenbilt
352ce9b22b Preserve page labels (numbers) when splitting and merging 2018-12-18 16:59:24 -05:00
Jay Berkenbilt
6ef9e31233 Add QPDFPageLabelDocumentHelper 2018-12-18 16:59:24 -05:00
Jay Berkenbilt
f38df27aa3 Add QPDFNumberTreeObjectHelper 2018-12-18 16:46:10 -05:00
Jay Berkenbilt
077d3d4512 Add QPDFObjectHandle::wrapInArray()
Wrap an object in an array if it is not already an array.
2018-12-18 16:45:48 -05:00
Jay Berkenbilt
a5ee55f2e8 ChangeLog 2018-10-11 19:16:26 -04:00
Jay Berkenbilt
4628461383 Set up Azure Pipelines
Use free Azure Pipelines to do Linux, Windows, and Mac build and test
and to generate Windows binary distributions.
2018-10-11 15:07:51 -04:00
Jay Berkenbilt
6ee761fc86 Prepare 8.2.1 release 2018-08-18 10:56:19 -04:00
Jay Berkenbilt
28453a4908 Add --keep-files-open flag (fixes #237) 2018-08-18 10:56:01 -04:00
Jay Berkenbilt
5e9e17e62a Prepare 8.2.0 release 2018-08-16 11:53:10 -04:00
Jay Berkenbilt
723b054bf9 Spell check 2018-08-16 11:53:10 -04:00
Jay Berkenbilt
e37ce85190 Clarify static vs. import library on Windows (fixes #225) 2018-08-14 16:57:37 -04:00
Jay Berkenbilt
b4bdc42b4f New exception class QPDFSystemError (fixes #221) 2018-08-13 20:01:51 -04:00
Jay Berkenbilt
fb1e29476c Add --no-warn option to suppress warnings (fixes #232) 2018-08-12 22:20:40 -04:00
Jay Berkenbilt
3d6615b276 Pl_Buffer: reduce memory growth (fixes #228)
Rather than keeping a list of buffers for every write, accumulate
bytes in a single buffer, doubling the size of the buffer when needed
to accommodate new data.

This is not the best possible implementation, but the change was
implemented in this way to avoid changing the shape of Pl_Buffer and
thus breaking backward compatibility.
2018-08-12 17:45:43 -04:00
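The single-buffer strategy described in the commit above (accumulate bytes in one buffer and double its size when needed) might look roughly like this. `GrowableBuffer` and its initial capacity of 16 are assumptions made for this sketch; it does not reproduce Pl_Buffer's interface.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical class illustrating growth by doubling; not Pl_Buffer itself.
class GrowableBuffer
{
  public:
    void write(unsigned char const* data, std::size_t len)
    {
        if (len == 0) {
            return;
        }
        if (size_ + len > capacity_) {
            // Start at an arbitrary small size, then double until it fits.
            std::size_t new_capacity = capacity_ ? capacity_ : 16;
            while (new_capacity < size_ + len) {
                new_capacity *= 2;
            }
            std::unique_ptr<unsigned char[]> bigger(
                new unsigned char[new_capacity]);
            if (size_ > 0) {
                std::memcpy(bigger.get(), data_.get(), size_);
            }
            data_ = std::move(bigger);
            capacity_ = new_capacity;
        }
        std::memcpy(data_.get() + size_, data, len);
        size_ += len;
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    unsigned char const* data() const { return data_.get(); }

  private:
    std::unique_ptr<unsigned char[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

Doubling keeps the number of reallocations logarithmic in the total bytes written, at the cost of up to twice the memory of the final contents.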
Jay Berkenbilt
4a4736c695 Fix EOL handling inside strings (fixes #226)
CR, CRLF, and LF are all supposed to be treated as LF; only one EOL is
to be ignored after backslash.
2018-08-05 20:48:35 -04:00
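The two rules in the commit above (CR, CRLF, and LF all become LF; a backslash swallows exactly one following EOL) can be illustrated with a small standalone function. `normalize_string_eol` is a hypothetical name, and the sketch passes every other escape sequence through unchanged rather than decoding it, which a real tokenizer would not do.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of EOL handling inside a PDF literal string body.
std::string normalize_string_eol(std::string const& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        char c = in[i];
        if (c == '\\' && i + 1 < in.size()) {
            char next = in[i + 1];
            if (next == '\r' || next == '\n') {
                // Line continuation: drop the backslash and one EOL only.
                i += 2;
                if (next == '\r' && i < in.size() && in[i] == '\n') {
                    ++i; // CRLF counts as a single EOL
                }
            } else {
                // Some other escape: copy both characters untouched.
                out += c;
                out += next;
                i += 2;
            }
        } else if (c == '\r') {
            // Bare CR or CRLF becomes a single LF.
            out += '\n';
            ++i;
            if (i < in.size() && in[i] == '\n') {
                ++i;
            }
        } else {
            out += c;
            ++i;
        }
    }
    return out;
}
```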
Jay Berkenbilt
e1cd5891af Fix infinite loop on small files with progress reporting (fixes #230)
Turns out you can keep adding zero to a number over and over again and
it just doesn't get any bigger. Who would have known?
2018-08-05 15:43:34 -04:00
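The commit message above does not say how the zero increment arose. One plausible cause, assumed here purely for illustration, is an increment derived by integer division that rounds down to zero for small inputs. The function name and the divisor of 100 are hypothetical.

```cpp
// Hypothetical sketch: count how many progress reports a loop would emit.
// Without the guard, total_objects < 100 gives increment == 0 and the loop
// below would never advance.
int progress_reports(int total_objects)
{
    int increment = total_objects / 100; // rounds down to 0 for small totals
    if (increment < 1) {
        increment = 1; // guard: always make forward progress
    }
    int reports = 0;
    for (int next = 0; next < total_objects; next += increment) {
        ++reports;
    }
    return reports;
}
```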
Jay Berkenbilt
fe769f2723 Keep file open while adding its pages during merge (fixes #217) 2018-08-04 19:58:13 -04:00
Jay Berkenbilt
4f4c627b77 ClosedFileInputSource: add method to keep file open
During periods of intensive operation on a specific file, this method
can reduce the overhead of repeated open/close operations.
2018-08-04 19:52:46 -04:00
Jay Berkenbilt
1bd2a2e79b Prepare 8.1.0 release 2018-06-23 07:50:11 -04:00
Jay Berkenbilt
6bf47ac6e8 With --verbose, give information on processing merge inputs 2018-06-22 16:14:54 -04:00
Jay Berkenbilt
a433ed24f9 Add progress reporting for QPDFWriter (fixes #200) 2018-06-22 16:14:54 -04:00
Jay Berkenbilt
2a82f6e1e0 Add method to get count of objects in QPDF 2018-06-22 15:53:40 -04:00
Jay Berkenbilt
99593e0eef Use ClosedFileInputSource when merging files (fixes #154) 2018-06-22 12:53:41 -04:00
Jay Berkenbilt
4ccc8b1a44 Add ClosedFileInputSource
ClosedFileInputSource is an input source that keeps the file closed
when not reading it.
2018-06-22 12:52:45 -04:00
Jay Berkenbilt
c71dc6888c Don't prune resource dictionaries on errors or by request
If we are unable to filter a page's content streams, don't attempt to
remove objects from the page's resource dictionary. Also provide a
command line option to suppress resource removal in case we ever need
this as a workaround for some bug or broken PDF files.
2018-06-22 10:45:31 -04:00
Jay Berkenbilt
6c89d4b35b When splitting files, remove unreferenced objects (fixes #203) 2018-06-21 21:03:30 -04:00
Jay Berkenbilt
84cd53f5af Make page range optional in --rotate (fixes #211) 2018-06-21 16:28:44 -04:00
Jay Berkenbilt
2e8a3e163f Add interactive form example 2018-06-21 16:04:54 -04:00
Jay Berkenbilt
397b097c46 Allow setting a form field's value 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
952a665a4e Better support for creating Unicode strings 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
0b05111db8 Implement helper class for interactive forms 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
2e6e1204a5 Convert examples to use new page helper classes 2018-06-21 15:57:13 -04:00
Jay Berkenbilt
2e7ee23bf6 Add QPDFPageDocumentHelper and QPDFPageObjectHelper
This is the beginning of higher-level API support using helper
classes. The goal is to be able to add more helpers without continuing
to pollute QPDF's and QPDFObjectHandle's public interfaces.
2018-06-21 15:57:13 -04:00
Jay Berkenbilt
4cded10821 Add QPDFObjectHandle::Rectangle type
Provide a convenient way of accessing rectangles.
2018-06-21 15:57:13 -04:00
Jay Berkenbilt
078cf9bf90 newline before endstream fix for object streams (fixes #205) 2018-05-12 13:17:43 -04:00
Jay Berkenbilt
b4d6cf6836 Limit depth of nesting in direct objects (fixes #202)
This fixes CVE-2018-9918.
2018-04-15 16:11:22 -04:00
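The idea of limiting nesting depth can be shown with a simplified scanner. This sketch only counts `[`/`]` and `<<`/`>>` in a string and ignores strings, comments, and everything else a real PDF parser must handle; `within_nesting_limit` is a hypothetical name and this is not how qpdf's parser is structured.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: return false as soon as arrays/dictionaries nest
// more deeply than max_depth, instead of recursing without bound.
bool within_nesting_limit(std::string const& text, int max_depth)
{
    int depth = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        bool const dict_open = text.compare(i, 2, "<<") == 0;
        bool const dict_close = text.compare(i, 2, ">>") == 0;
        if (text[i] == '[' || dict_open) {
            if (++depth > max_depth) {
                return false;
            }
        } else if (text[i] == ']' || dict_close) {
            if (depth > 0) {
                --depth;
            }
        }
        if (dict_open || dict_close) {
            ++i; // skip the second character of the two-character token
        }
    }
    return true;
}
```

Rejecting input once a fixed depth is exceeded bounds the stack usage of a recursive parser regardless of what the input file contains.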
Jay Berkenbilt
f8c8e4dcc0 Prepare 8.0.2 release 2018-03-06 11:34:07 -05:00
Jay Berkenbilt
e4e2e26d99 Properly handle pages with no contents (fixes #194)
Remove calls to assertPageObject(). All cases in the library that
called assertPageObject() work fine if you don't call
assertPageObject() because nothing assumes anything that was being
checked by that call. Removing the calls enables more files to be
successfully processed.
2018-03-06 11:34:07 -05:00
Jay Berkenbilt
ee44aef8d0 Treat loop in xref tables as damage (fixes #192)
Prior to this fix, if there was a loop detected in following /Prev
pointers in xref streams/tables, it would cause qpdf to lose data.
Note that this condition causes many PDF readers to hang or fail.
2018-03-05 14:26:58 -05:00
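Detecting a loop while following /Prev pointers amounts to remembering which offsets have already been visited. A minimal sketch, assuming the chain is given as a map from each xref offset to its /Prev offset; the function name and this representation are hypothetical, not qpdf's internals:

```cpp
#include <map>
#include <set>

// Hypothetical sketch: follow /Prev links starting at `start`.
// Returns true if the chain ends, false if an offset is seen twice.
bool xref_chain_is_acyclic(
    std::map<long long, long long> const& prev_of, long long start)
{
    std::set<long long> visited;
    long long offset = start;
    while (true) {
        if (!visited.insert(offset).second) {
            return false; // already visited: loop detected
        }
        auto it = prev_of.find(offset);
        if (it == prev_of.end()) {
            return true; // no /Prev entry: end of chain
        }
        offset = it->second;
    }
}
```

A caller could treat a `false` result as file damage and fall back to a recovery path, in line with the commit's description of treating the loop as damage.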
Jay Berkenbilt
6fe1e9de40 Prepare 8.0.1 release 2018-03-04 07:16:20 -05:00
Jay Berkenbilt
666f794393 Support "r" in page ranges (fixes #155) 2018-03-04 07:05:14 -05:00
Jay Berkenbilt
7b9f23a99a Ignore zlib data check errors (fixes #191) 2018-03-03 11:35:01 -05:00
Jay Berkenbilt
3e8b643ae3 Release 8.0.0 2018-02-25 16:00:11 -05:00
Jay Berkenbilt
4bb3046f0b Properly handle strings with PDF Doc Encoding (fixes #179)
The QPDF_String::getUTF8Val() method was not treating strings that
weren't explicitly Unicode as PDF Doc Encoded. This only affects
characters in the range 0x80 through 0xa0.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
2780a1871d Add C API for checking PDF files 2018-02-18 21:06:27 -05:00
Jay Berkenbilt
d0e99f195a More robust handling of type errors
Give objects descriptions and context so it is possible to issue
warnings instead of fatal errors for attempts to access objects of the
wrong type.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
c2e16827b6 Replace "file position" with "offset" in error messages
Sometimes it's an offset in an object stream or a content stream, so
file position is confusing in some cases.
2018-02-18 21:06:27 -05:00
Jay Berkenbilt
52e024f701 Include omitted object description in error message 2018-02-18 21:06:27 -05:00
Jay Berkenbilt
cb3b705cf9 Include filename in object stream parse error 2018-02-18 21:06:27 -05:00
Jay Berkenbilt
5708b5d0aa Add additional interface for filtering page contents 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
510d45d00d General comment in ChangeLog 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
5136238f2a Detect and report bad tokens in content normalization 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
30709935af Filter tokens example 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
9910104442 Implement TokenFilter and refactor Pl_QPDFTokenizer
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
b8723e97f4 Add coalesce contents capability 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
25988e8d10 Bug fix: content normalizer should not add trailing newline
Adding a trailing newline in content normalization damages files whose
contents are split across streams in the middle of tokens. Let
QPDFWriter add the newline with the indicator to ignore the newline,
which it already does. This changes the way some qdf files look.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
6afe83978f Switch from parseContentStream to parsePageContents 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fcd611b61e Refactor parseContentStream 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fefe25030e Inline image token type 2018-02-18 21:05:46 -05:00