mirror of https://github.com/qpdf/qpdf.git synced 2024-09-27 12:39:06 +00:00
Commit Graph

930 Commits

Author SHA1 Message Date
Jay Berkenbilt
fe36ef141c
Merge pull request #924 from cdosborn/main
Improve --optimize-images to find images nested within XObjects
2023-03-18 15:34:27 -04:00
Jay Berkenbilt
1e53da74bc
Merge pull request #918 from m-holger/fixqdf
Code tidy QdfFixer methods
2023-03-18 14:00:11 -04:00
Connor Osborn
f6b13fcc05 Add test validating that images in nested XObjects are included in optimization
The sample file (nested-images.pdf) includes a pdf with an image that is
nested within an XObject within an XObject in the Resources dict of the
only page. These images were ignored in prior versions of qpdf.
2023-03-15 23:27:05 -04:00
m-holger
cfcceff6aa Replace std::regex_search with string_view methods in QdfFixer::processLines 2023-03-09 12:16:33 +00:00
m-holger
011b1d7e3a Use std::string_view in QdfFixer::processLines
Change type of local var lines to string_view. Also, instead of
constructing a list of lines, read the entire input into a single string
and break it up into lines on the fly.
2023-03-09 11:44:26 +00:00
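The two QdfFixer::processLines commits above describe reading the input into one string, walking it line by line with std::string_view, and using string_view methods where std::regex_search was used before. A minimal sketch of that general technique follows; it is not qpdf's code, and the names for_each_line, starts_with, and lines_with_prefix are hypothetical helpers invented for illustration.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not taken from qpdf: call `fn` once per line of
// `input` without copying. A line ends at '\n'; the terminator is not
// part of the view passed to `fn`.
template <typename Fn>
void for_each_line(std::string_view input, Fn&& fn)
{
    while (!input.empty()) {
        auto pos = input.find('\n');
        if (pos == std::string_view::npos) {
            fn(input); // final line without a trailing newline
            break;
        }
        fn(input.substr(0, pos));
        input.remove_prefix(pos + 1);
    }
}

// Prefix test done with string_view methods instead of a regex.
inline bool starts_with(std::string_view line, std::string_view prefix)
{
    return line.substr(0, prefix.size()) == prefix;
}

// Example use: copy out only the lines that begin with `prefix`.
std::vector<std::string>
lines_with_prefix(std::string_view input, std::string_view prefix)
{
    std::vector<std::string> out;
    for_each_line(input, [&](std::string_view line) {
        if (starts_with(line, prefix)) {
            out.emplace_back(line);
        }
    });
    return out;
}
```

The views passed to the callback point into the original buffer, so the buffer has to outlive any view that is kept; the sketch copies matching lines into std::string for that reason.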
m-holger
82efe52b7d Tidy QdfFixer::adjustOstreamXref 2023-03-08 10:00:17 +00:00
m-holger
fc828c2a50 Tidy QdfFixer::checkObjId 2023-03-08 09:56:53 +00:00
m-holger
71bba5d40d Code tidy QdfFixer::writeBinary 2023-03-08 09:50:49 +00:00
Jay Berkenbilt
78f7dc9fe8 Overlay/underlay: capture original page as form XObject (fixes #904) 2023-02-25 12:58:51 -05:00
Jay Berkenbilt
0f97e98203 Handle linearization warnings as proper warning (fixes #851) 2023-02-18 19:38:49 -05:00
m-holger
bb89a60320 Add data member JSONParser::token_start 2023-02-04 13:52:55 +00:00
Jay Berkenbilt
1308c45090 Implement --remove-restrictions (fixes #833) 2023-01-28 13:42:19 -05:00
Jay Berkenbilt
e4e0f40fc0 Fix annotations properly for page with no /Resources (fixes #827) 2023-01-09 07:31:39 -05:00
Jay Berkenbilt
bf477fbb96 Do double indirect test correctly 2023-01-01 07:33:34 -05:00
Jay Berkenbilt
ce8e63cb9a Add test case for broken indirect object reference
...where the first "number" is an indirect object that happens to be a
number.
2022-12-31 15:12:58 -05:00
m-holger
0ca44ef84c Fix QPDFObjectHandle::isScalar
Exclude uninitialized, destroyed and reserved objects.
2022-12-31 09:27:19 -05:00
Jay Berkenbilt
ff42ea4e6c Fix logic for fixDanglingReferences 2022-11-26 18:13:46 -05:00
m-holger
3f632458ae Refactor QPDF::fixDanglingReferences 2022-11-26 16:26:42 -05:00
Jay Berkenbilt
19a8d3fea2 Add test case of dangling ref not found until xref reconstruction 2022-11-25 15:16:16 -05:00
Jay Berkenbilt
32251497c1 Temporary (revert after fix): mark test as expected failure 2022-11-25 15:16:16 -05:00
Jay Berkenbilt
bd337b8055 Preserve unreferenced objects in dangling test 2022-11-25 15:16:16 -05:00
Jay Berkenbilt
f6367bbada Dangling ref test: show new object ID 2022-11-25 15:16:16 -05:00
Jay Berkenbilt
5489f1d8d6 Code formatting updates 2022-11-25 15:16:16 -05:00
m-holger
b1eb1a9584 Refactor QPDFObjectHandle::copyObject1 2022-11-20 12:07:22 -05:00
Jay Berkenbilt
e9980efec8 Correctly handle reuse of xref stream (fixes #809) 2022-11-19 17:03:17 -05:00
m-holger
f69ed209d0 Use QPDF::newStream in examples 2022-11-19 14:10:42 -05:00
Jay Berkenbilt
db6598b449 Attempt to test for QPDFNameTreeObjectHelper's vtable
It has disappeared from the DLL on Windows a few times.
2022-10-06 08:40:08 -04:00
m-holger
5ccab4be03 Add private methods QPDF::damagedPDF 2022-10-01 11:17:39 -04:00
Jay Berkenbilt
f4ca04cec1 Fix edge case in character encoding (fixes #778)
Avoid representing as PDF Doc encoding any string whose PDF Doc
encoding representation starts with a UTF-16 or UTF-8 marker.
2022-09-26 08:06:47 -04:00
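The check described in the commit above can be pictured as a prefix test on the encoded bytes. The sketch below is an illustration only, not qpdf's implementation: starts_with_unicode_marker is a hypothetical name, and treating the little-endian UTF-16 byte order mark as a marker is an assumption that the commit message does not confirm.

```cpp
#include <string_view>

// Hypothetical helper, not qpdf's API: true if `bytes` begins with a
// Unicode byte order mark. A caller could use a result of true as the
// signal to pick a different string representation.
bool starts_with_unicode_marker(std::string_view bytes)
{
    auto has_prefix = [&](std::string_view p) {
        return bytes.substr(0, p.size()) == p;
    };
    return has_prefix("\xfe\xff")      // UTF-16 big-endian BOM
        || has_prefix("\xff\xfe")      // UTF-16 little-endian BOM (assumed)
        || has_prefix("\xef\xbb\xbf"); // UTF-8 BOM
}
```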
Jay Berkenbilt
77111086eb Add code to CI to verify signed/unsigned char
Make sure that our attempt to test both signed and unsigned char is
actually right.
2022-09-23 17:44:16 -04:00
m-holger
2e6869483b Replace calls to QUtil::int_to_string with std::to_string 2022-09-21 15:57:14 -04:00
Jay Berkenbilt
2394acf7a6 Remove explicit direct object check from getObject
An indirect object reference to 0, 0 is invalid. If it appears in the
file or is parsed from a string, the parser catches it. This check
would only be useful for someone explicitly calling getObject with 0,
0, and that would trigger an error during resolve().
2022-09-13 11:21:29 -04:00
Jay Berkenbilt
4963ce6a53 Remove obsolete LL_FMT check from build (fixes #768)
This was broken for cross-compilation and has probably been
unnecessary for several years now.

Also fix extraneous whitespace in some related tests.
2022-09-12 11:48:38 -04:00
Jay Berkenbilt
8a3cdfd2af Change QPDFObjectHandle == to isSameObjectAs
Replace operator== and operator!=, which were testing for the same
underlying object, with isSameObjectAs. This change was motivated by
the fact that pikepdf internally had its own operator== method for
QPDFObjectHandle that did structural comparison. I backed out qpdf's
operator== as a courtesy to pikepdf (in my own testing) but also
because I think people might naturally assume that operator== does a
structural comparison, and isSameObjectAs is clearer in its intent.
2022-09-09 18:09:40 -04:00
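The distinction drawn in the commit above, identity of the underlying object versus structural comparison, can be sketched with a toy handle class. Handle, Value, and equalContents are hypothetical stand-ins invented here; only the name isSameObjectAs comes from the commit message, and this is not QPDFObjectHandle.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload type for the illustration.
struct Value
{
    std::string data;
};

// Hypothetical stand-in for an object handle; copies share one Value.
class Handle
{
  public:
    explicit Handle(std::string data) :
        value(std::make_shared<Value>(Value{std::move(data)}))
    {
    }

    // Identity: do both handles refer to the same underlying object?
    bool isSameObjectAs(Handle const& other) const
    {
        return value == other.value;
    }

    // Structural comparison: do the underlying objects hold equal contents?
    bool equalContents(Handle const& other) const
    {
        return value->data == other.value->data;
    }

  private:
    std::shared_ptr<Value> value;
};
```

Two handles built separately from the same data compare equal structurally but are not the same object, which is the ambiguity that a named method avoids and an overloaded operator== hides.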
Jay Berkenbilt
3dbab589e3 Add C API functions for using custom loggers
Expose functions to the C API to create new loggers and to setLogger
and getLogger for QPDF and QPDFJob.
2022-09-09 10:49:25 -04:00
Andreas Stieger
7049588bff Fix tests with GNU grep 3.8
GNU grep 3.8 started to emit warnings when invoking egrep.
Convert all calls to grep -E.
2022-09-09 06:57:38 -04:00
Jay Berkenbilt
f1a2d3160a Add JSON v2 support to C API 2022-09-09 06:19:09 -04:00
Jay Berkenbilt
c7a4967d10 Change reset to disconnect and clarify comments
I decided that it's actually fine to copy a direct object to another
QPDF. Even if we eventually prevent a QPDFObject from having multiple
parents, this could happen if an object is moved.
2022-09-08 11:06:15 -04:00
Jay Berkenbilt
dba61da1bf Create a special "destroyed" type rather than using null
When a QPDF is destroyed, changing indirect objects to direct nulls
makes them effectively disappear silently when they sneak into other
places. Instead, we should treat this as an error. Adding a destroyed
object type makes this possible.
2022-09-08 10:36:39 -04:00
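A rough sketch of the idea in the commit above: once the owner is gone, the object is marked with a dedicated "destroyed" type, and later use raises an error instead of quietly looking like null. ObjType, Obj, and the choice of std::logic_error are assumptions made for illustration and are not taken from qpdf.

```cpp
#include <stdexcept>

// Hypothetical type tags; `destroyed` is distinct from `null_obj`.
enum class ObjType { null_obj, integer, destroyed };

// Hypothetical object, default-constructed as null.
struct Obj
{
    ObjType type{ObjType::null_obj};

    // Called when the owner goes away: mark the object as destroyed
    // rather than turning it into a null.
    void destroy()
    {
        type = ObjType::destroyed;
    }

    // Any accessor checks for the destroyed state before answering.
    bool isNull() const
    {
        ensureAlive();
        return type == ObjType::null_obj;
    }

  private:
    void ensureAlive() const
    {
        if (type == ObjType::destroyed) {
            throw std::logic_error(
                "attempt to use an object whose owner was destroyed");
        }
    }
};
```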
Jay Berkenbilt
264e25f391 Clear owning QPDF information for all objects, not just indirect 2022-09-08 10:19:38 -04:00
Jay Berkenbilt
a615985865 Update QPDFObject with comment
Also, since it's just there for compatibility, we don't need to add
new object types to it.
2022-09-08 10:19:38 -04:00
Jay Berkenbilt
4422588d7d Remove unneeded owning_qpdf from QPDFValue
The qpdf member was already sufficient. Removing this actually fixed a
few pre-existing issues around detecting foreign ownership and
allowing certain conditions to be warnings rather than exceptions.
2022-09-08 10:19:38 -04:00
Jay Berkenbilt
bac559559e Fix typo 2022-09-07 17:26:39 -04:00
Jay Berkenbilt
76cd7ea67a Clarify and improve QPDFPageObjectHelper::get*Box methods
Add copy_if_fallback and explain how it differs from copy_if_shared.
2022-09-06 19:00:40 -04:00
Jay Berkenbilt
c1def4ead4 Implement QPDFObjectHandle equality 2022-09-06 18:34:23 -04:00
Jay Berkenbilt
94c79bb8f6 Support --show-encryption without a valid password (fixes #598) 2022-09-06 12:45:12 -04:00
Jay Berkenbilt
55cc2ab680 Re-introduce QPDFObject.hh as deprecated
* Just removing a header file would cause build errors with no hint as
  to what happened. This way, people get a warning rather than error
  for the life of qpdf 11, and the warning tells them what to do.

* This avoids build surprises resulting from having two versions of
  QPDF headers installed at once. If you were building code out of a
  checkout of qpdf but had an older version installed on your system,
  if your code included <qpdf/QPDFObject.hh>, everything would work,
  but then your code would break without QPDFObject.hh later.
2022-09-05 18:52:59 -04:00
Jay Berkenbilt
a5ae042e2b Add workaround for bug in ghostscript 9.56 (fixes #732) 2022-09-02 11:51:38 -04:00
Jay Berkenbilt
31396f61c9 Disallow --empty with --replace-input (fixes #728) 2022-09-02 09:37:17 -04:00
Jay Berkenbilt
a59e7ac7ec Disable copying/assigning to QPDF objects, add QPDF::create() 2022-09-02 08:53:27 -04:00
Jay Berkenbilt
f772c43de8 Stop including QPDFObject.hh from other than private files
This required moving some newly inlined functions back to the cc file,
but that seems to have had no measurable performance impact.
2022-09-01 18:19:47 -04:00
Jay Berkenbilt
b663926538 Remove QPDFObject::object_type_e as alias for qpdf_object_type_e 2022-09-01 18:11:22 -04:00
Jay Berkenbilt
4f4b908605 Add a file with arrays with lots of nulls to the test suite
A bug was fixed between qpdf 8.4.2 and 9.0.0 regarding this type of
file (see #305 and #311), but it was necessary to retest after some
major refactoring work at the lexical and parsing layers. This lays
the groundwork for including this in performance benchmarks and in the
qpdf test suite rather than having to keep a large,
non-redistributable file around.

20 arrays of 20K nulls is plenty for performance memory testing and
doesn't take too long to run. Compared to qpdf 8.4.2, in qpdf 11.0.0,
the file generated here uses 3% of the RAM and runs over 4 times
faster.
2022-09-01 16:15:54 -04:00
Jay Berkenbilt
3d029fb17e
Merge pull request #730 from m-holger/allpages
Tidy QPDF::getAllPagesInternal and QPDF::pushInheritedAttributesToPageInternal
2022-09-01 15:28:32 -04:00
m-holger
805c1ad479 Reset QPDFValue::qpdf and QPDFValue::og when the owning QPDF object gets destroyed 2022-09-01 17:20:16 +01:00
m-holger
356b582cec Remove QPDFObjectHandle::newIndirect
Modify QPDFParser::parse to call QPDF::getObject instead.
2022-09-01 16:59:01 +01:00
m-holger
c5d0428da2 Modify QPDF::getObject not to resolve the object 2022-09-01 14:47:24 +01:00
m-holger
6670c685ab Move QPDFObjectHandle::parseInternal to new class QPDFParser
Part of #729
2022-08-30 05:56:23 +01:00
m-holger
931fbb6156 Integrate names into state machine in QPDFTokenizer 2022-08-25 11:26:38 +01:00
m-holger
e4fe0d5cf5 Refactor QPDFTokenizer::inHexstring 2022-08-25 10:50:06 +01:00
m-holger
ff69773b35 Fix warnings in QPDF::getAllPagesInternal 2022-08-01 13:29:14 +01:00
m-holger
9dea7d3080 Tune QPDF::getAllPagesInternal
Avoid calling getAllPagesInternal for each /Page object.
2022-08-01 13:29:14 +01:00
Jay Berkenbilt
12d065c751 Provide a simpler QPDF::writeJSON 2022-07-31 16:23:17 -04:00
Jay Berkenbilt
13cf35ce2f Use calledgetallpages and pushedinheritedpageresources 2022-07-31 16:23:17 -04:00
Jay Berkenbilt
69820847af Change the output of --json to use "qpdf" instead of "objects" 2022-07-31 15:17:01 -04:00
Jay Berkenbilt
d01c4f8819 Change --json-output format
from "qpdf-v2" to "qpdf": [..., ...]
2022-07-31 10:32:55 -04:00
Jay Berkenbilt
b3e6d445cb Tweak "AndGet" mutator functions again
Remove any ambiguity around whether old or new value is being
returned.
2022-07-24 15:42:23 -04:00
m-holger
afd35f9a30 Overload StreamDataProvider::provideStreamData
Use 'QPDFObjGen const&' instead of 'int, int' in signature.
2022-07-24 16:02:35 +01:00
m-holger
f0a8178091 Refactor QPDFObject creation and cloning
Move responsibility for creating shared pointers to objects and cloning from QPDFObjectHandle to QPDFObject.
2022-06-27 12:47:02 -04:00
Jay Berkenbilt
0c7c7e4ba4 Track whether certain page modifying methods have been called
We need to know whether pushInheritedAttributesToPage or getAllPages
have been called when generating JSON output. When reading the JSON
back in, we have to call the same methods so that object numbers will
line up properly.
2022-06-25 13:55:45 -04:00
Jay Berkenbilt
8a32515a62 Add warnings for some additional page tree repair 2022-06-25 13:25:35 -04:00
Jay Berkenbilt
eae75dbe44 Add Pl_Function -- a generic function pipeline 2022-06-19 09:12:29 -04:00
Jay Berkenbilt
bb0ea2f8e7 Add qpdfjob_register_progress_reporter 2022-06-19 08:46:58 -04:00
Jay Berkenbilt
87412eb05b Add QPDFJob::registerProgressReporter 2022-06-19 08:46:58 -04:00
Jay Berkenbilt
3a7ee7e938 Move C-based ProgressReporter helper into QPDFWriter 2022-06-19 08:46:58 -04:00
Jay Berkenbilt
daef4e8fb8 Add more flexible functions to qpdfjob C API 2022-06-19 08:46:58 -04:00
Jay Berkenbilt
e0720eaa78 Use the default logger for other writes to stdout/stderr
When there is no context for writing output or error messages, use the
default logger.
2022-06-18 10:38:50 -04:00
Jay Berkenbilt
83be2191b4 Use "save" logger when saving data to standard output
This includes the output PDF, streams from --show-object and
attachments from --save-attachment. This also enables --verbose and
--progress to work with saving to stdout.
2022-06-18 09:54:40 -04:00
Jay Berkenbilt
641e92c6a7 QPDF, QPDFJob: use QPDFLogger instead of custom output streams 2022-06-18 09:02:55 -04:00
Jay Berkenbilt
f1f711963b Add and test QPDFLogger class 2022-06-18 09:02:55 -04:00
Jay Berkenbilt
b7bbf12e85 In json mode, reveal recovered user password when otherwise unavailable 2022-05-30 20:03:08 -04:00
Jay Berkenbilt
f049a77c59 Add additional information when listing attachments 2022-05-30 20:03:08 -04:00
Jay Berkenbilt
27a42c16c7 Change default decode level to "none" with --json-output 2022-05-21 17:51:34 -04:00
Jay Berkenbilt
b0f1564376 Add another binary utf8 to JSON test 2022-05-21 17:39:35 -04:00
Jay Berkenbilt
752f43d4e4 Allow empty b: binary JSON strings 2022-05-21 17:36:32 -04:00
m-holger
6c69a747b9 Code clean up: use range-style for loops wherever possible
Remove variables obsoleted by commit 4f24617.
2022-05-21 16:06:29 -04:00
Jay Berkenbilt
905f47a55f Add json to large file test 2022-05-21 09:43:45 -04:00
Jay Berkenbilt
9b2eb01e25 Exercise object description in tests 2022-05-20 14:23:32 -04:00
Jay Berkenbilt
6c2fb5b8f0 Add test for bad data and bad datafile 2022-05-20 13:33:30 -04:00
Jay Berkenbilt
d065098089 Test --update-from-json 2022-05-20 11:10:12 -04:00
Jay Berkenbilt
6d4e3ba8a4 Test (and fix) handling of dangling references 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
35b1e1c493 Explicitly test ignoring unknown keys in JSON input 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
dc8df962d8 Make version default to latest for --json-output (like --json) 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
907df2c823 Round-trip tests with --json-stream-data=file 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
a83b7b0611 Tests with manually constructed qpdf json 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
7f8c4b183d Add tests for --json-input 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
1ec561daa4 Add more names and strings in good13
* native UTF-8 strings
* names whose PDF and canonical syntax differ in both dictionary key
  positions and other positions

For json, names are converted both as names and directly when used as
dictionary keys.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
6c5e590673 Rename all test files: _ to - 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
6f43bf8de3 Major rework -- see long comments
* Replace --create-from-json=file with --json-input, which causes the
  regular input to be treated as json.
* Eliminate --to-json
* In --json=2, bring back "objects" and eliminate "objectinfo". Stream
  data is never present.
* In --json-output=2, write "qpdf-v2" with "objects" and include
  stream data.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
56f1b411fe Back out fluent QPDFObjectHandle methods. Keep the andGet methods.
I decided these were confusing and inconsistent with how JSON works.
They muddle the API rather than improving it.
2022-05-20 09:16:25 -04:00