2
1
mirror of https://github.com/qpdf/qpdf.git synced 2024-12-23 11:28:56 +00:00
Commit Graph

1227 Commits

Author SHA1 Message Date
Jay Berkenbilt
87412eb05b Add QPDFJob::registerProgressReporter 2022-06-19 08:46:58 -04:00
Jay Berkenbilt
3a7ee7e938 Move C-based ProgressReporter helper into QPDFWriter 2022-06-19 08:46:58 -04:00
Jay Berkenbilt
8130d50e3b Add C API to QPDFLogger 2022-06-19 08:46:58 -04:00
Jay Berkenbilt
daef4e8fb8 Add more flexible funtions to qpdfjob C API 2022-06-19 08:46:58 -04:00
Jay Berkenbilt
e0720eaa78 Use the default logger for other writes to stdout/stderr
When there is no context for writing output or error messages, use the
default logger.
2022-06-18 10:38:50 -04:00
Jay Berkenbilt
83be2191b4 Use "save" logger when saving data to standard output
This includes the output PDF, streams from --show-object and
attachments from --save-attachment. This also enables --verbose and
--progress to work with saving to stdout.
2022-06-18 09:54:40 -04:00
Jay Berkenbilt
641e92c6a7 QPDF, QPDFJob: use QPDFLogger instead of custom output streams 2022-06-18 09:02:55 -04:00
Jay Berkenbilt
f1f711963b Add and test QPDFLogger class 2022-06-18 09:02:55 -04:00
Jay Berkenbilt
f588d74140 Add integer types to Pipeline::operator<< 2022-06-18 09:02:55 -04:00
m-holger
057bd659bc Code tidy: remove redundant variable in QPDF::writeJSON 2022-06-05 18:46:21 -04:00
Jay Berkenbilt
0bd908b550 Update documentation for qpdf JSON v2 2022-05-30 20:03:08 -04:00
Jay Berkenbilt
b7bbf12e85 In json mode, reveal recovered user password when otherwise unavailable 2022-05-30 20:03:08 -04:00
Jay Berkenbilt
f049a77c59 Add additional information when listing attachments 2022-05-30 20:03:08 -04:00
Jay Berkenbilt
04fc7c4bea Add conversions to ISO-8601 date format 2022-05-30 20:03:08 -04:00
Jay Berkenbilt
27a42c16c7 Change default decode level to "none" with --json-output 2022-05-21 17:51:34 -04:00
Jay Berkenbilt
752f43d4e4 Allow empty b: binary JSON strings 2022-05-21 17:36:32 -04:00
Jay Berkenbilt
05460d405c Format code 2022-05-21 16:11:42 -04:00
m-holger
6c69a747b9 Code clean up: use range-style for loops wherever possible
Remove variables obsoleted by commit 4f24617.
2022-05-21 16:06:29 -04:00
Jay Berkenbilt
c56a9ca7f6 JSON: Fix large file support 2022-05-21 09:43:45 -04:00
Jay Berkenbilt
47c093c48b Replace std::regex with validators for better performance 2022-05-21 08:43:21 -04:00
Jay Berkenbilt
9b2eb01e25 Exercise object description in tests 2022-05-20 14:23:32 -04:00
Jay Berkenbilt
6c2fb5b8f0 Add test for bad data and bad datafile 2022-05-20 13:33:30 -04:00
Jay Berkenbilt
d065098089 Test --update-from-json 2022-05-20 11:10:12 -04:00
Jay Berkenbilt
ef955b04b5 Bug fix: don't clobber stream length with replaceDict 2022-05-20 11:09:45 -04:00
Jay Berkenbilt
3eb77a7004 JSON: detect duplicate dictionary keys while parsing 2022-05-20 10:13:15 -04:00
Jay Berkenbilt
6d4e3ba8a4 Test (and fix) handling of dangling references 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
5a2aa59479 Bug fix: isReserved() true for indirect reference to reserved object 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
35b1e1c493 Explicitly test ignoring unknown keys in JSON input 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
dc8df962d8 Make version default to latest for --json-output (like --json) 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
6c7326b290 JSON fix: correctly parse UTF-16 surrogate pairs 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
6f43bf8de3 Major rework -- see long comments
* Replace --create-from-json=file with --json-input, which causes the
  regular input to be treated as json.
* Eliminate --to-json
* In --json=2, bring back "objects" and eliminate "objectinfo". Stream
  data is never present.
* In --json-output=2, write "qpdf-v2" with "objects" and include
  stream data.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
23fc6756f1 Add QUtil::FileCloser to the public API 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
0fe8d44762 Support stream data -- not tested
There are no automated tests yet, but committing work so far in
preparation for some refactoring.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
63c7eefe9d replaceStreamData: accept uninitialized filter/decode_parms
These mean to leave the original values alone. This is needed for
reconstructing streams from JSON given that the stream data and stream
dictionary may appear in any order in the JSON.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
56f1b411fe Back out fluent QPDFObjectHandle methods. Keep the andGet methods.
I decided these were confusing and inconsistent with how JSON works.
They muddle the API rather than improving it.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
7e7a9c4379 Parse objects; stream data is not yet handled 2022-05-20 09:16:25 -04:00
Jay Berkenbilt
9064542b5f Add private methods for reserving specific objects 2022-05-20 07:54:09 -04:00
Jay Berkenbilt
7fa5d1773b Implement top-level qpdf json parsing 2022-05-16 13:41:40 -04:00
Jay Berkenbilt
8d42eb2632 Add scaffolding for QPDF JSON reactor 2022-05-16 13:41:40 -04:00
Jay Berkenbilt
4fe2e06b47 Add --create-from-json and --update-from-json arguments
Also add stubs for top-level QPDF methods (createFromJSON,
updateFromJSON)
2022-05-16 13:41:40 -04:00
Jay Berkenbilt
9a0e9a1a9e Remove offset from missing /Root error
The last offset is irrelevant to not being able to find /Root.
2022-05-16 13:39:26 -04:00
Jay Berkenbilt
051ae7c282 Improve handling of replacing stream data with empty strings
When an empty string was passed to replaceStreamData, the code was
passing a null pointer to memcpy. Since a 0 size was also passed, this
was harmless, but it triggers sanitizer errors. The code properly
handles a null pointer as the buffer in other places.
2022-05-16 13:39:26 -04:00
Jay Berkenbilt
60ec94a7c3 Add QUtil::is_long_long 2022-05-16 13:39:26 -04:00
Jay Berkenbilt
4c7cfd5cbc JSON reactor: improve handling of nested containers
Call the parent container's item method before calling the child
item's start method so we can easily know the current nesting level
when nested items are added.
2022-05-14 17:35:06 -04:00
Jay Berkenbilt
2a2f7f1bba Add maxobjectid to JSON 2022-05-08 13:45:20 -04:00
Jay Berkenbilt
e9390aeaaa Add --to-json option 2022-05-08 13:45:20 -04:00
Jay Berkenbilt
c76536dd9a Implement JSON v2 output 2022-05-08 13:45:20 -04:00
Jay Berkenbilt
15272662f6 Fix typo in json output key name
moddify -> modify. Also carefully spell checked all remaining keys by
splitting them into words and running a spell checker, not just
relying on visual proofreading. That was the only one.
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
1bc8abfdd3 Implement JSON v2 for Stream
Not fully exercised in this commit
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
3246923cf2 Implement JSON v2 for String
Also refine the herustic for deciding whether to use hexadecimal
notation for a string.
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
16f4f94cd9 Prepare code for JSON v2
Update getJSON() methods and calls to them
2022-05-07 11:12:01 -04:00
Jay Berkenbilt
a9fbbd5dca Objectinfo json: write incrementally and in numeric order
This script was used on test data:

----------
#!/usr/bin/env python3
import json
import sys
import re

def json_dumps(data):
    return json.dumps(data, ensure_ascii=False,
                      indent=2, separators=(',', ': '))

for filename in sys.argv[1:]:
    with open(filename, 'r') as f:
        data = json.loads(f.read())
    if 'objectinfo' not in data:
        continue
    trailer = None
    to_sort = []
    for k, v in data['objectinfo'].items():
        if k == 'trailer':
            trailer = v
        else:
            m = re.match(r'^(\d+) \d+ R', k)
            if m:
                to_sort.append([int(m.group(1)), k, v])
    newobjectinfo = {x[1]: x[2] for x in sorted(to_sort)}
    if trailer is not None:
        newobjectinfo['trailer'] = trailer
    data['objectinfo'] = newobjectinfo
print(json_dumps(data))
----------
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
948de60990 Objects json: write incrementally and in numeric order
The following script was used to adjust test data:

----------
#!/usr/bin/env python3
import json
import sys
import re

def json_dumps(data):
    return json.dumps(data, ensure_ascii=False,
                      indent=2, separators=(',', ': '))

for filename in sys.argv[1:]:
    with open(filename, 'r') as f:
        data = json.loads(f.read())
    if 'objects' not in data:
        continue
    trailer = None
    to_sort = []
    for k, v in data['objects'].items():
        if k == 'trailer':
            trailer = v
        else:
            m = re.match(r'^(\d+) \d+ R', k)
            if m:
                to_sort.append([int(m.group(1)), k, v])
    newobjects = {x[1]: x[2] for x in sorted(to_sort)}
    if trailer is not None:
        newobjects['trailer'] = trailer
    data['objects'] = newobjects
print(json_dumps(data))
----------
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
f50274ef46 Pages json: write each page incrementally 2022-05-07 08:26:31 -04:00
Jay Berkenbilt
dc9b7287cd Top-level json: write incrementally
This commit just changes the order in which fields are written to the
json without changing their content. All the json files in the test
suite were modified with this script to ensure that we didn't get any
changes other than ordering.

----------
#!/usr/bin/env python3
import json
import sys

def json_dumps(data):
    return json.dumps(data, ensure_ascii=False,
                      indent=2, separators=(',', ': '))

for filename in sys.argv[1:]:
    with open(filename, 'r') as f:
        data = json.loads(f.read())
    newdata = {}
    for i in ('version', 'parameters', 'pages', 'pagelabels',
              'acroform', 'attachments', 'encrypt', 'outlines',
              'objects', 'objectinfo'):
        if i in data:
            newdata[i] = data[i]
print(json_dumps(newdata))
----------
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
7f65a5c21f Test json against schema only on demand
Testing json against schema requires an in-memory copy, so do it only
when requested by the test suite.
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
a3c9980395 Add next to Pl_String and fix comments 2022-05-07 08:26:31 -04:00
Jay Berkenbilt
b361c5ce19 Add --test-json-schema command-line option 2022-05-07 08:26:31 -04:00
Jay Berkenbilt
7604ac5cb2 QPDFJob: have doJSON write to a pipeline 2022-05-07 08:26:31 -04:00
Jay Berkenbilt
0500d4347a JSON: add blob type that generates base64-encoded binary data 2022-05-06 19:14:52 -04:00
Jay Berkenbilt
05fda4afa2 Change JSON parser to parse from an InputSource 2022-05-04 12:07:11 -04:00
Jay Berkenbilt
e5f3910c3e Add new FileInputSource constructors 2022-05-04 12:07:11 -04:00
Jay Berkenbilt
e259635986 JSON: add write methods and implement unparse() in terms of those 2022-05-04 12:07:11 -04:00
Jay Berkenbilt
8b25de24c9 Make "objects" and "pages" consistent in JSON output 2022-05-04 08:32:44 -04:00
Jay Berkenbilt
6b576797cd Don't call pushInheritedAttributesToPage in json mode
We used to have to do that, but for quite some time, the code that
gets images has no longer required it.
2022-05-04 07:11:13 -04:00
Jay Berkenbilt
f4206a0938 Add new Pl_String Pipeline 2022-05-03 18:54:51 -04:00
Jay Berkenbilt
16139d97c8 Add new Pl_OStream Pipeline 2022-05-03 18:54:51 -04:00
Jay Berkenbilt
21d6e3231f Make use of the new Pipeline methods in some places 2022-05-03 18:31:23 -04:00
Jay Berkenbilt
f1c6bb97db Add new Pipeline convenience methods 2022-05-03 18:31:22 -04:00
Jay Berkenbilt
59f3e09edf Make Pipeline::write take an unsigned char const* (API change) 2022-05-03 18:31:22 -04:00
Jay Berkenbilt
62bf296a9c Make assert handling less error-prone
Prevent my future self or other contributors from using assert in
tests and then having that assert not do anything because of the
NDEBUG macro.
2022-05-03 18:31:22 -04:00
Jay Berkenbilt
92b692466f Remove remaining incorrect assert calls from implementation 2022-05-03 18:31:22 -04:00
Jay Berkenbilt
3d9bac43da Add internal Pl_Base64
Bidirectional base64; will be used by JSON v2.
2022-05-03 18:31:22 -04:00
Jay Berkenbilt
6724a362c3 Move generate_auto_job to the top-level CMakeLists.txt 2022-05-03 08:39:50 -04:00
Jay Berkenbilt
8d2a0eda5a Add reactors to the JSON parser 2022-05-01 19:55:52 -04:00
Jay Berkenbilt
72e5c73419 Limit parser depth for json parser 2022-05-01 12:56:22 -04:00
Jay Berkenbilt
e34dbbfa18 Spell check 2022-05-01 12:56:22 -04:00
Jay Berkenbilt
8ccd3a8a89 Mark weak encryption with API changes (fixes #576) 2022-04-30 17:24:15 -04:00
Jay Berkenbilt
2213ed0c3d Remove deprecated (pre-8.4.0) encryption APIs 2022-04-30 17:23:58 -04:00
Jay Berkenbilt
cff26040d8 Using insecure crytpo from the CLI is now an error by default 2022-04-30 17:23:58 -04:00
Jay Berkenbilt
ce19471f18 Add comments around non-security-related uses of MD5 2022-04-30 14:15:07 -04:00
Jay Berkenbilt
c365a26e9d Revert "Remove QPDFObjectHandle::replaceOrRemoveKey"
This reverts commit dc059560e7.

I changed my mind. There's no harm in leaving it deprecated for a
release cycle.
2022-04-30 14:15:07 -04:00
Jay Berkenbilt
dc059560e7 Remove QPDFObjectHandle::replaceOrRemoveKey
See ChangeLog for rationale for not deprecating it as originally
planned.
2022-04-30 13:39:45 -04:00
Jay Berkenbilt
4f24617e1e Code clean up: use range-style for loops wherever possible
Where not possible, use "auto" to get the iterator type.

Editorial note: I have avoid this change for a long time because of
not wanting to make gratuitous changes to version history, which can
obscure when certain changes were made, but with having recently
touched every single file to apply automatic code formatting and with
making several broad changes to the API, I decided it was time to take
the plunge and get rid of the older (pre-C++11) verbose iterator
syntax. The new code is just easier to read and understand, and in
many cases, it will be more effecient as fewer temporary copies are
being made.

m-holger, if you're reading, you can see that I've finally come
around. :-)
2022-04-30 13:27:18 -04:00
Jay Berkenbilt
7f023701dd Formatting: remove space in range-style for loops
Change .clang-format and commit automated changes from a fresh run of
format-code
2022-04-30 13:26:43 -04:00
Jay Berkenbilt
2878c186bf Use fluent appendItem 2022-04-30 10:54:16 -04:00
Jay Berkenbilt
ab9d557cb0 Use fluent replaceKey 2022-04-29 20:39:54 -04:00
Jay Berkenbilt
d8fdf632a9 Use replaceKeyAndGet in a few places in existing code 2022-04-29 20:28:02 -04:00
Jay Berkenbilt
e80fad86e9 Add new QPDFObjectHandle methods for more fluent programming 2022-04-29 20:09:10 -04:00
Jay Berkenbilt
d0b7cc8ac6 QPDFJob json: make removeAttachment take an array (fixes #693) 2022-04-24 13:06:19 -04:00
Jay Berkenbilt
63c5a56f38 Fix build logic around generate_auto_job
It was being run at configuration time, not build time.
2022-04-24 13:06:16 -04:00
Jay Berkenbilt
08ba21cf49 Fix some bugs around null values in dictionaries
Make it so that a key with a null value is always treated as not being
present. This was inconsistent before.
2022-04-24 10:08:32 -04:00
Jay Berkenbilt
4be2f36049 Deprecate replaceOrRemoveKey -- it's the same as replaceKey 2022-04-24 09:31:32 -04:00
Jay Berkenbilt
4925f0d18c Have dictionary/streams mutators take const& where possible 2022-04-24 09:05:50 -04:00
Jay Berkenbilt
68e721981a Add new QPDF::warn that takes most of QPDFExc's arguments 2022-04-23 18:25:43 -04:00
Jay Berkenbilt
22b35c4928 Expose QUtil::get_next_utf8_codepoint 2022-04-23 18:25:43 -04:00
Jay Berkenbilt
5bbb0d4c30 Replace switch statements with static map initializers
Character transcoding from Unicode to single-byte characters used
hard-coded switch statements because the code predated our adoption of
C++11. Now we have thread-safe, static initialization of map literals,
so use that instead.
2022-04-23 18:25:43 -04:00
Jay Berkenbilt
ce5c3bcad8 QPDFJob: pass capture output streams through to underlying QPDF 2022-04-18 11:24:17 -04:00
Jay Berkenbilt
75fe4f60c3 Use anonymous namespaces for file-private classes 2022-04-16 13:35:27 -04:00
Jay Berkenbilt
80ed3076a0 Remove deprecated name/number tree constructors
Remove the name/number tree object helper constructors that don't take
a QPDF&.
2022-04-16 13:13:15 -04:00