2
1
mirror of https://github.com/qpdf/qpdf.git synced 2024-11-15 17:17:08 +00:00
Commit Graph

489 Commits

Author SHA1 Message Date
Jay Berkenbilt
5136238f2a Detect and report bad tokens in content normalization 2018-02-18 21:05:47 -05:00
Jay Berkenbilt
9910104442 Implement TokenFilter and refactor Pl_QPDFTokenizer
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
b8723e97f4 Add coalesce contents capability 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
25988e8d10 Bug fix: content normalizer should not add trailing newline
Adding a trailing newline in content normalization damages files whose
contents are split across streams in the middle of tokens. Let
QPDFWriter add the newline with the indicator to ignore the newline,
which it already does. This changes the way some qdf files look.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fcd611b61e Refactor parseContentStream 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
05ff619b09 Remove redundant method
Remove a redundant method that was equal to another one with
additional arguments. This breaks binary compatibility, but there are
other ABI breaking changes in the upcoming release, so now is the time
to do it.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
55ee55394c Use inline image token in content parser 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
ba453ba4ff Use space tokens in tokenizer filter 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
ec538792fa Use inline image token type in tokenizer filter 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
fefe25030e Inline image token type 2018-02-18 21:05:46 -05:00
Jay Berkenbilt
2699ecf13e Push QPDFTokenizer members into a nested structure
This is for protection against future ABI breaking changes.
2018-02-18 21:05:46 -05:00
Jay Berkenbilt
d97474868d Lexer enhancements: EOF, comment, space
Significant enhancements to the lexer to improve EOF handling and to
support comments and spaces as tokens. Various other minor issues were
fixed as well.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt
ebd5ed63de Add option to save pass 1 of lineariziation
This is useful only for debugging the linearization code.
2018-02-18 20:18:40 -05:00
Jay Berkenbilt
2ebdd6929e Prepare 7.1.1 release 2018-02-04 18:31:42 -05:00
Jay Berkenbilt
e3167c1a60 Fix linearization for files with nonstandard ID length 2018-02-04 18:16:23 -05:00
Jay Berkenbilt
3b2a3cdd77 Fix setLineBuf for bsd (fixes #177)
Use 0 instead of NULL in a cast.
2018-02-04 14:19:00 -05:00
Jay Berkenbilt
d5bfd49cb2 Remove use of std::abs (fixes #172)
Different compilers want different choices of headers for std::abs.
It's easier to just to not use it.
2018-02-04 14:19:00 -05:00
Jay Berkenbilt
34a9b835b0 Fix indentation 2018-02-04 14:19:00 -05:00
Jay Berkenbilt
7e5e1a7158 Fix offset in error message 2018-02-04 14:19:00 -05:00
Jay Berkenbilt
633fb414af Pl_QPDFTokenizer: Use unsigned_char_pointer instead of copy 2018-01-28 18:34:43 -05:00
Jay Berkenbilt
13d9756a45 Minor fixes to tokenizer 2018-01-28 18:34:43 -05:00
Jay Berkenbilt
2e4ca7ecf4 Update version numbers for 7.1.0 2018-01-14 20:09:20 -05:00
Jay Berkenbilt
04e47deaf9 Fixes for clang 2018-01-14 19:18:04 -05:00
Jay Berkenbilt
569d74d36b Allow raw encryption key to be specified
Add options to enable the raw encryption key to be directly shown or
specified. Thanks to Didier Stevens <didier.stevens@gmail.com> for the
idea and contribution of one implementation of this idea.
2018-01-14 10:21:05 -05:00
Jay Berkenbilt
3e306ae64c Add QUtil::hex_decode 2018-01-14 09:04:13 -05:00
Jay Berkenbilt
791e0db762 Allow trailing . in numeric token (fixes #165) 2018-01-13 20:05:40 -05:00
Jay Berkenbilt
ec0087e3ce Support TIFF Predictor (fixes #171) 2018-01-13 19:49:42 -05:00
Jay Berkenbilt
53971d50be Add Pl_TIFFPredictor 2018-01-13 19:49:42 -05:00
Jay Berkenbilt
d9c9049708 Add signed support to BitStream and BitWriter 2018-01-13 19:49:42 -05:00
Jay Berkenbilt
661ed1d28e Minor fixes to Pl_PNGFilter
Fix comment, remove restriction that doesn't actually matter.
2018-01-13 19:49:42 -05:00
Jay Berkenbilt
be27d47bdc Use better error for getStreamData failure
If the stream isn't filterable but we call getStreamData, throw a
regular exception instead of a logic error so that normal error
handling and reporting mechanisms will be used.
2018-01-13 19:49:42 -05:00
Jay Berkenbilt
4edfe1f41d Add tests for new PNG filters 2017-12-25 18:20:52 -05:00
Jay Berkenbilt
a3a55be9cd Correct errors in PNG filters and make use from library 2017-12-25 14:24:48 -05:00
Casey Rojas
9a48720246 Initial implementation of other PNG decode filters
Initial implementation provided by Casey Rojas <crojas@infotechfl.com>
Some problems are fixed in a subsequent commit.
2017-12-24 22:59:51 -05:00
Jay Berkenbilt
0f1ce8e646 Prepare 7.0.0 release 2017-09-16 13:22:15 -04:00
Jay Berkenbilt
249e95f608 Fix test failure on MSVC 2017-09-15 23:09:04 -04:00
Jay Berkenbilt
6898bc8d98 Spell check 2017-09-15 23:09:04 -04:00
Jay Berkenbilt
f2ffb6968a Fix Windows compilation errors 2017-09-15 21:44:57 -04:00
Jay Berkenbilt
d31a7b76e7 Improve message for stream decoding error
Tweak the message so that we inform the user that we are mitigating
data loss.
2017-09-12 16:03:48 -04:00
Jay Berkenbilt
eaacf94005 Update C API with new QPDFWriter methods 2017-09-12 14:30:39 -04:00
Jay Berkenbilt
40ecba4172 Pl_DCT: Use custom source and destination managers (fixes #153)
Avoid calling jpeg_mem_src and jpeg_mem_dest. The custom destination
manager writes to the pipeline in smaller chunks to avoid having the
whole image in memory at once. The source manager works directly with
the Buffer object. Using customer managers avoids use of memory source
and destination managers, which are not present in older versions of
libjpeg still in use by some Linux distributions.
2017-09-07 22:59:11 -04:00
Jay Berkenbilt
3ef1be9783 PNGFilter: Better range checking for columns 2017-08-31 07:26:58 -04:00
Jay Berkenbilt
1868a10f8b Replace all atoi calls with QUtil::string_to_int
The latter catches underflow/overflow.
2017-08-29 12:28:32 -04:00
Jay Berkenbilt
742190bd98 Pl_PNGFilter: disallow columns = 0 2017-08-29 12:28:32 -04:00
Jay Berkenbilt
6d46346eb9 Detect integer overflow/underflow 2017-08-29 12:28:32 -04:00
Jay Berkenbilt
e999bbae43 Fix memory leak with bad jpeg data 2017-08-28 22:16:45 -04:00
Jay Berkenbilt
c6872d2c70 Clean up circular references in QPDF_Stream 2017-08-28 22:16:31 -04:00
Jay Berkenbilt
728dc9e6d8 Fix error caught by clang 2017-08-26 21:51:17 -04:00
Jay Berkenbilt
dea704f0ab Pad keys to avoid memory errors (fixes #147) 2017-08-26 21:35:59 -04:00
Jay Berkenbilt
021c229331 Fix Pl_Flate memory leak on error (fixes #148) 2017-08-25 22:26:53 -04:00
Jay Berkenbilt
ad527a64f9 Parse iteratively to avoid stack overflow (fixes #146) 2017-08-25 21:56:45 -04:00
Jay Berkenbilt
85f05cc57f Detect xref pointer infinite loop (fixes #149) 2017-08-25 19:58:31 -04:00
Jay Berkenbilt
1e52d33822 Bump soname to 18 and version to 7.0.b1 2017-08-22 16:50:48 -04:00
Jay Berkenbilt
e452d9dca6 Spell check 2017-08-22 14:22:20 -04:00
Jay Berkenbilt
6219111ed7 Update references to README files
Most of the README files have been renamed. Refer to the new names.
2017-08-22 14:13:10 -04:00
Jay Berkenbilt
83ec09f66c Do memory checks
Slightly improve memory cleanup in Pl_DCT
Make it easier to test with valgrind
2017-08-22 14:13:10 -04:00
Jay Berkenbilt
fabff0f3ec Limit token length during xref recovery
While scanning the file looking for objects, limit the length of
tokens we allow. This prevents us from getting caught up in reading a
file character by character while digging through large streams.
2017-08-22 14:13:10 -04:00
Jay Berkenbilt
caf5e39c2e Fix compiler warnings for clang/mac OS X 2017-08-22 14:13:10 -04:00
Jay Berkenbilt
6884ad2ead Fix logic error in recovery
A stray semicolon caused a condition to be incorrectly applied during
stream length recovery.
2017-08-22 07:19:41 -04:00
Jay Berkenbilt
ce435222b2 Push QPDFWriter member variables into a nested class 2017-08-21 22:04:07 -04:00
Jay Berkenbilt
a8c93bd324 Push QPDF member variables into a nested class
Pushing member variables into a nested class enables addition of new
member variables without breaking binary compatibility.
2017-08-21 21:35:11 -04:00
Jay Berkenbilt
198856a825 Improve pclm parameter settings 2017-08-21 21:05:48 -04:00
Jay Berkenbilt
8ab52fa558 Combine writePCLm with writeStandard
Reduce code duplication
2017-08-21 21:05:48 -04:00
Jay Berkenbilt
9f60a864a0 Combine PCLm header into writeHeader 2017-08-21 21:05:47 -04:00
Jay Berkenbilt
adbcfcff2d Remove duplicated coverage cases
Remove duplicated coverage cases from Sahil's code so existing test
suite passes.
2017-08-21 18:55:02 -04:00
Sahil Arora
b19210fa7d QPDFWriter: Add setPCLm() and writePCLm() methods
* Add support for PCLm using setPCLm() and writePCLm() methods in
  QPDFWriter.hh and QPDFWriter.cc
* Add a function writePCLmHeader() for PCLm header in QPDFWriter
2017-08-21 18:55:02 -04:00
Jay Berkenbilt
ddc6cf0cf6 Precheck streams by default
There is no need for a --precheck-streams option. We can do the
precheck without imposing any penalty, only re-encoding the stream if
it fails the first time.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
9744414c66 Enable finer grained control of stream decoding
This commit adds several API methods that enable control over which
types of filters QPDF will attempt to decode. It also adds support for
/RunLengthDecode and /DCTDecode filters for both encoding and
decoding.
2017-08-21 17:44:22 -04:00
Jay Berkenbilt
ae90d2c485 Implement Pl_DCT pipeline
Additional testing is added in later commits to be supported by
additional changes in the library.
2017-08-21 17:44:02 -04:00
Jay Berkenbilt
2d2f619665 Implement Pl_RunLength pipeline 2017-08-19 14:50:55 -04:00
Jay Berkenbilt
cfa2eb97fb Add page rotation (fixes #132) 2017-08-12 22:57:38 -04:00
Jay Berkenbilt
8249a26d69 Fix infinite loop in QPDFWriter (fixes #143) 2017-08-12 08:36:36 -04:00
Jay Berkenbilt
36b3fe5af7 Fix --newline-before-endstream option (fixes #133)
Add a newline unconditionally before endstream even if a newline was
already written as part of the stream data.
2017-08-11 20:57:05 -04:00
Jay Berkenbilt
46611f0710 Prevent a division by zero error (fixes #141)
Bad /W in an xref stream could cause a division by zero error. Now
this is handled as a special case.
2017-08-11 20:11:19 -04:00
Jay Berkenbilt
8fe0b06cd8 Pad encryption parameters that are too short (fixes #96) 2017-08-11 19:53:56 -04:00
Jay Berkenbilt
e7d0019bf4 Generate libqpdf.map from autoconf
Rather than checking consistency of libqpdf.map, generate it.
2017-08-11 04:56:22 -04:00
Jay Berkenbilt
6247aaa57c Fix libqpdf.map and prevent future breakage
The build now checks to make sure libqpdf.map has the right library
version number in it.
2017-08-10 21:53:19 -04:00
Jay Berkenbilt
9a96e233b0 Remove PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
30f109e244 Read xref table without PCRE
Also accept more errors than before.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
98a843c2a2 Reconstruct xref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
ca5b1d267a Improve stream length recovery
Eliminate PCRE and find endobj not preceded by endstream. Be more lax
about placement of endstream and endobj.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
3082e4e606 Find xref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
90840be594 Find lindict without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
03aa9679ac Find starxref without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
1765c6ec20 Find header without PCRE 2017-08-10 21:30:32 -04:00
Jay Berkenbilt
296b679d6e Implement findFirst and findLast in InputSource
Preparing to refactor some pattern searching code to use these instead
of their own memchr loops. This should simplify the code that replaces
PCRE.
2017-08-10 21:30:32 -04:00
Jay Berkenbilt
ef8ae5449d Allow QPDFTokenizer::readToken to return bad tokens
Sometimes we want to ignore bad tokens rather than having them throw
an exception. A coverage case is commented out here and added in a
later commit.
2017-08-10 19:01:41 -04:00
Jay Berkenbilt
8fe261d8b4 QUtil::strcasecmp 2017-08-05 10:22:33 -04:00
Pranjal Bhor
6f88fd36ab Include missing header in QPDFTokenizer.cc (fixes #125)
Required for strtol()
2017-07-30 08:47:05 -04:00
Jay Berkenbilt
2d5b854468 Allow reading command-line args from files (fixes #16) 2017-07-29 22:23:21 -04:00
Jay Berkenbilt
5993c3e83c Detect input file = output file (fixes #29) 2017-07-29 20:58:01 -04:00
Jay Berkenbilt
570db9b60b Catch more exceptions while resolving objects 2017-07-29 19:31:12 -04:00
Jay Berkenbilt
b43a0ac237 When recover stream length, indicate the length (fixes #44) 2017-07-29 19:15:06 -04:00
Jay Berkenbilt
f37d399d82 Add newline-before-endstream option (fixes #103) 2017-07-29 12:21:38 -04:00
Jay Berkenbilt
6a7d53ad2b Handle zlib data errors better (fixes #106) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt
07d6f770b2 Better recovery of bad stream start (fixes #104) 2017-07-29 12:19:04 -04:00
Jay Berkenbilt
b389268f16 Better handle split content streams (fixes #73)
When parsing content streams, allow content to be split arbitrarily
across stream boundaries.
2017-07-29 12:19:04 -04:00
Jay Berkenbilt
a136824243 Fix exception catch 2017-07-29 12:19:04 -04:00
Jay Berkenbilt
ba2bae4acc Use 1.2 as the version if we can't read it from the header
The code was using 1.0, but we use /FlateDecode, which didn't appear
until 1.2.
2017-07-29 12:19:04 -04:00
Jay Berkenbilt
3a1ff5ded9 Add option to preserve unreferenced objects 2017-07-28 19:19:11 -04:00