Jay Berkenbilt
0e3d4cdc97
Fix/clarify meaning of depth parameter to json write methods
2022-07-31 10:32:55 -04:00
Jay Berkenbilt
4feb10fdaf
Merge pull request #734 from m-holger/nullptr
...
Code tidy : replace 0 with nullptr or true
2022-07-31 08:33:45 -04:00
m-holger
073808aa50
Code tidy : replace 0 with nullptr or true
2022-07-26 13:40:13 +01:00
Jay Berkenbilt
4674c04cb8
JSON schema: support multi-element array validation
2022-07-24 16:44:51 -04:00
Jay Berkenbilt
f8d1ab9462
JSON schema -- accept single item in place of array
...
When the schema wants a variable-length array, allow a single item as
well as allowing an array.
2022-07-24 16:17:03 -04:00
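The rule from commit f8d1ab9462 (a single item is accepted where the schema asks for a variable-length array) can be illustrated with a minimal sketch. This is not qpdf's C++ schema checker: the function name `matches_schema` and the toy schema conventions (a one-element list meaning "array of that item", any non-container schema node accepting any value) are assumptions made for illustration only.

```python
def matches_schema(schema, value):
    """Return True if value conforms to a toy schema (hypothetical conventions).

    - A dict schema requires a dict value; keys present in both are checked.
    - A one-element list schema [item] means "variable-length array of item";
      a bare item is treated as if it were a one-element array.
    - Any other schema node accepts any value.
    """
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return False
        return all(
            matches_schema(sub, value[key])
            for key, sub in schema.items()
            if key in value
        )
    if isinstance(schema, list) and len(schema) == 1:
        # Accept a single item in place of an array by wrapping it.
        items = value if isinstance(value, list) else [value]
        return all(matches_schema(schema[0], item) for item in items)
    return True


schema = {"a": [{"b": "description of b"}]}
as_array = matches_schema(schema, {"a": [{"b": 1}, {"b": 2}]})
as_single = matches_schema(schema, {"a": {"b": 1}})
wrong_type = matches_schema(schema, {"a": 3})
```

Under these assumptions, both the array form and the single-item form pass, while a scalar where a dictionary item is expected fails.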
Jay Berkenbilt
b3e6d445cb
Tweak "AndGet" mutator functions again
...
Remove any ambiguity around whether old or new value is being
returned.
2022-07-24 15:42:23 -04:00
m-holger
8b4afa428e
Revert making second parameter of QPDFObjGen::QPDFObjGen optional
...
Also, change test for QPDFObjGen::isIndirect to obj != 0.
Delete comment from commit afd35f9.
2022-07-24 16:55:10 +01:00
m-holger
afd35f9a30
Overload StreamDataProvider::provideStreamData
...
Use 'QPDFObjGen const&' instead of 'int, int' in signature.
2022-07-24 16:02:35 +01:00
m-holger
5d0469f1bc
QPDFObjGen : tidy QPDFJob
...
Use QPDFObjGen::unparse where appropriate.
2022-07-24 16:02:35 +01:00
m-holger
4b73d057fb
QPDFObjGen : tidy QPDF_Stream
...
Change method signatures to use QPDFObjGen.
Replace QPDF_Stream::objid and generation with QPDF_Stream::og.
2022-07-24 16:02:35 +01:00
m-holger
f7978db1f6
QPDFObjGen : tidy QPDF private methods
...
Change method signatures to use QPDFObjGen.
Use QPDFObjGen methods where possible.
Remove redundant QPDF::objGenToIndirect.
2022-07-24 16:02:35 +01:00
m-holger
3404ca8ac8
QPDFObjGen : tidy QPDFObjectHandle private methods
...
Change method signature to use QPDFObjGen.
2022-07-24 15:59:49 +01:00
m-holger
b123f79dfd
Replace QPDFObjectHandle::objid and generation with QPDFObjectHandle::og
2022-07-24 15:59:49 +01:00
m-holger
c0168cf88c
QPDFObjGen : tidy QPDF::readObjectAtOffset
...
Change method signature to use QPDFObjGen.
2022-07-24 15:59:49 +01:00
m-holger
eeb6162f76
Add optional parameter separator to QPDFObjGen::unparse
...
Also, revert inlining of unparse and operator << from commit 4c6640c in
order to avoid exposing QUtil.
2022-07-24 15:41:48 +01:00
Jay Berkenbilt
6f1041afb8
Clarify intent in readObjectAtOffset
...
Rather than using object id -1 to mean "don't care", use object ID 0,
and clarify the difference between that use and indication of a direct
object.
2022-07-24 09:40:11 -04:00
m-holger
4c6640cb45
Inline QPDFObjGen methods
...
ABI breaking change
2022-07-16 14:32:48 -04:00
Jay Berkenbilt
a603c1e395
Run format-code
2022-06-27 12:50:35 -04:00
m-holger
f0a8178091
Refactor QPDFObject creation and cloning
...
Move responsibility for creating shared pointers to objects and cloning from QPDFObjectHandle to QPDFObject.
2022-06-27 12:47:02 -04:00
m-holger
5aa8225f49
Refactor QPDFObjectTypeAccessor and QPDFObjectHandle::dereference
2022-06-27 10:39:04 -04:00
Jay Berkenbilt
0c7c7e4ba4
Track whether certain page modifying methods have been called
...
We need to know whether pushInheritedAttributesToPage or getAllPages
have been called when generating JSON output. When reading the JSON
back in, we have to call the same methods so that object numbers will
line up properly.
2022-06-25 13:55:45 -04:00
Jay Berkenbilt
25aff0bd52
TODO: abandon (again) and update notes about QPDFPagesTree
2022-06-25 13:26:53 -04:00
Jay Berkenbilt
8a32515a62
Add warnings for some additional page tree repair
2022-06-25 13:25:35 -04:00
Jay Berkenbilt
6c4537885e
Reformat code
2022-06-25 11:11:24 -04:00
m-holger
7836e19747
Code tidy: remove redundant calls to QPDFObjectHandle::isInitialized
2022-06-25 11:10:06 -04:00
m-holger
3b3bcab349
Remove QPDF_Stream::setStreamDescription
2022-06-25 08:26:46 -04:00
m-holger
9eda1fdc41
Remove redundant QPDF_Array::setDescription and QPDF_Dictionary::setDescription
2022-06-25 08:25:58 -04:00
m-holger
e9c1637353
Add private method QPDFObjectHandle::getObjGenAsStr
...
Also, use methods to access objid and generation.
2022-06-25 08:25:32 -04:00
m-holger
97f737a562
Code tidy: QPDFJob::doJSONPageLabels
...
Remove redundant variables pages and next.
2022-06-25 08:24:50 -04:00
Jay Berkenbilt
1eb2f208ec
Use Pl_Function in qpdflogger C API implementation
2022-06-19 09:12:59 -04:00
Jay Berkenbilt
eae75dbe44
Add Pl_Function -- a generic function pipeline
2022-06-19 09:12:29 -04:00
Jay Berkenbilt
bb0ea2f8e7
Add qpdfjob_register_progress_reporter
2022-06-19 08:46:58 -04:00
Jay Berkenbilt
87412eb05b
Add QPDFJob::registerProgressReporter
2022-06-19 08:46:58 -04:00
Jay Berkenbilt
3a7ee7e938
Move C-based ProgressReporter helper into QPDFWriter
2022-06-19 08:46:58 -04:00
Jay Berkenbilt
8130d50e3b
Add C API to QPDFLogger
2022-06-19 08:46:58 -04:00
Jay Berkenbilt
daef4e8fb8
Add more flexible functions to qpdfjob C API
2022-06-19 08:46:58 -04:00
Jay Berkenbilt
e0720eaa78
Use the default logger for other writes to stdout/stderr
...
When there is no context for writing output or error messages, use the
default logger.
2022-06-18 10:38:50 -04:00
Jay Berkenbilt
83be2191b4
Use "save" logger when saving data to standard output
...
This includes the output PDF, streams from --show-object and
attachments from --save-attachment. This also enables --verbose and
--progress to work with saving to stdout.
2022-06-18 09:54:40 -04:00
Jay Berkenbilt
641e92c6a7
QPDF, QPDFJob: use QPDFLogger instead of custom output streams
2022-06-18 09:02:55 -04:00
Jay Berkenbilt
f1f711963b
Add and test QPDFLogger class
2022-06-18 09:02:55 -04:00
Jay Berkenbilt
f588d74140
Add integer types to Pipeline::operator<<
2022-06-18 09:02:55 -04:00
m-holger
057bd659bc
Code tidy: remove redundant variable in QPDF::writeJSON
2022-06-05 18:46:21 -04:00
Jay Berkenbilt
0bd908b550
Update documentation for qpdf JSON v2
2022-05-30 20:03:08 -04:00
Jay Berkenbilt
b7bbf12e85
In json mode, reveal recovered user password when otherwise unavailable
2022-05-30 20:03:08 -04:00
Jay Berkenbilt
f049a77c59
Add additional information when listing attachments
2022-05-30 20:03:08 -04:00
Jay Berkenbilt
04fc7c4bea
Add conversions to ISO-8601 date format
2022-05-30 20:03:08 -04:00
Jay Berkenbilt
27a42c16c7
Change default decode level to "none" with --json-output
2022-05-21 17:51:34 -04:00
Jay Berkenbilt
752f43d4e4
Allow empty b: binary JSON strings
2022-05-21 17:36:32 -04:00
Jay Berkenbilt
05460d405c
Format code
2022-05-21 16:11:42 -04:00
m-holger
6c69a747b9
Code clean up: use range-style for loops wherever possible
...
Remove variables obsoleted by commit 4f24617.
2022-05-21 16:06:29 -04:00
Jay Berkenbilt
c56a9ca7f6
JSON: Fix large file support
2022-05-21 09:43:45 -04:00
Jay Berkenbilt
47c093c48b
Replace std::regex with validators for better performance
2022-05-21 08:43:21 -04:00
Jay Berkenbilt
9b2eb01e25
Exercise object description in tests
2022-05-20 14:23:32 -04:00
Jay Berkenbilt
6c2fb5b8f0
Add test for bad data and bad datafile
2022-05-20 13:33:30 -04:00
Jay Berkenbilt
d065098089
Test --update-from-json
2022-05-20 11:10:12 -04:00
Jay Berkenbilt
ef955b04b5
Bug fix: don't clobber stream length with replaceDict
2022-05-20 11:09:45 -04:00
Jay Berkenbilt
3eb77a7004
JSON: detect duplicate dictionary keys while parsing
2022-05-20 10:13:15 -04:00
Jay Berkenbilt
6d4e3ba8a4
Test (and fix) handling of dangling references
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
5a2aa59479
Bug fix: isReserved() true for indirect reference to reserved object
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
35b1e1c493
Explicitly test ignoring unknown keys in JSON input
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
dc8df962d8
Make version default to latest for --json-output (like --json)
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
6c7326b290
JSON fix: correctly parse UTF-16 surrogate pairs
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
6f43bf8de3
Major rework -- see long comments
...
* Replace --create-from-json=file with --json-input, which causes the
  regular input to be treated as json.
* Eliminate --to-json
* In --json=2, bring back "objects" and eliminate "objectinfo". Stream
  data is never present.
* In --json-output=2, write "qpdf-v2" with "objects" and include
  stream data.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
23fc6756f1
Add QUtil::FileCloser to the public API
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
0fe8d44762
Support stream data -- not tested
...
There are no automated tests yet, but committing work so far in
preparation for some refactoring.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
63c7eefe9d
replaceStreamData: accept uninitialized filter/decode_parms
...
These mean to leave the original values alone. This is needed for
reconstructing streams from JSON given that the stream data and stream
dictionary may appear in any order in the JSON.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
56f1b411fe
Back out fluent QPDFObjectHandle methods. Keep the andGet methods.
...
I decided these were confusing and inconsistent with how JSON works.
They muddle the API rather than improving it.
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
7e7a9c4379
Parse objects; stream data is not yet handled
2022-05-20 09:16:25 -04:00
Jay Berkenbilt
9064542b5f
Add private methods for reserving specific objects
2022-05-20 07:54:09 -04:00
Jay Berkenbilt
7fa5d1773b
Implement top-level qpdf json parsing
2022-05-16 13:41:40 -04:00
Jay Berkenbilt
8d42eb2632
Add scaffolding for QPDF JSON reactor
2022-05-16 13:41:40 -04:00
Jay Berkenbilt
4fe2e06b47
Add --create-from-json and --update-from-json arguments
...
Also add stubs for top-level QPDF methods (createFromJSON,
updateFromJSON)
2022-05-16 13:41:40 -04:00
Jay Berkenbilt
9a0e9a1a9e
Remove offset from missing /Root error
...
The last offset is irrelevant to not being able to find /Root.
2022-05-16 13:39:26 -04:00
Jay Berkenbilt
051ae7c282
Improve handling of replacing stream data with empty strings
...
When an empty string was passed to replaceStreamData, the code was
passing a null pointer to memcpy. Since a 0 size was also passed, this
was harmless, but it triggers sanitizer errors. The code properly
handles a null pointer as the buffer in other places.
2022-05-16 13:39:26 -04:00
Jay Berkenbilt
60ec94a7c3
Add QUtil::is_long_long
2022-05-16 13:39:26 -04:00
Jay Berkenbilt
4c7cfd5cbc
JSON reactor: improve handling of nested containers
...
Call the parent container's item method before calling the child
item's start method so we can easily know the current nesting level
when nested items are added.
2022-05-14 17:35:06 -04:00
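The callback ordering described in commit 4c7cfd5cbc (the parent container's item method runs before the child's start method) can be sketched as follows. The class `RecordingReactor`, its method names, and the `walk` helper are hypothetical and do not reproduce qpdf's C++ reactor interface; the sketch only demonstrates why this ordering makes the nesting level easy to track.

```python
class RecordingReactor:
    """Hypothetical reactor that records callbacks in the order received."""

    def __init__(self):
        self.events = []

    def container_start(self, depth):
        self.events.append(("start", depth))

    def item(self, depth, value):
        self.events.append(("item", depth, type(value).__name__))

    def container_end(self, depth):
        self.events.append(("end", depth))


def walk(value, reactor, depth=0):
    """Traverse nested lists/dicts, notifying the reactor at each step."""
    if not isinstance(value, (list, dict)):
        return
    reactor.container_start(depth)
    children = value.values() if isinstance(value, dict) else value
    for child in children:
        # The parent's item callback fires first, at the parent's depth...
        reactor.item(depth, child)
        # ...and only then does a nested child report its own start.
        walk(child, reactor, depth + 1)
    reactor.container_end(depth)


reactor = RecordingReactor()
walk([1, [2]], reactor)
events = reactor.events
```

With this ordering, a reactor that sees a nested container start already knows which parent item it belongs to and at what depth.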
Jay Berkenbilt
2a2f7f1bba
Add maxobjectid to JSON
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
e9390aeaaa
Add --to-json option
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
c76536dd9a
Implement JSON v2 output
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
15272662f6
Fix typo in json output key name
...
moddify -> modify. Also carefully spell checked all remaining keys by
splitting them into words and running a spell checker, not just
relying on visual proofreading. That was the only one.
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
1bc8abfdd3
Implement JSON v2 for Stream
...
Not fully exercised in this commit
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
3246923cf2
Implement JSON v2 for String
...
Also refine the heuristic for deciding whether to use hexadecimal
notation for a string.
2022-05-08 13:45:20 -04:00
Jay Berkenbilt
16f4f94cd9
Prepare code for JSON v2
...
Update getJSON() methods and calls to them
2022-05-07 11:12:01 -04:00
Jay Berkenbilt
a9fbbd5dca
Objectinfo json: write incrementally and in numeric order
...
This script was used on test data:
----------
#!/usr/bin/env python3
import json
import sys
import re

def json_dumps(data):
    return json.dumps(data, ensure_ascii=False,
                      indent=2, separators=(',', ': '))

for filename in sys.argv[1:]:
    with open(filename, 'r') as f:
        data = json.loads(f.read())
    if 'objectinfo' not in data:
        continue
    trailer = None
    to_sort = []
    for k, v in data['objectinfo'].items():
        if k == 'trailer':
            trailer = v
        else:
            m = re.match(r'^(\d+) \d+ R', k)
            if m:
                to_sort.append([int(m.group(1)), k, v])
    newobjectinfo = {x[1]: x[2] for x in sorted(to_sort)}
    if trailer is not None:
        newobjectinfo['trailer'] = trailer
    data['objectinfo'] = newobjectinfo
    print(json_dumps(data))
----------
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
948de60990
Objects json: write incrementally and in numeric order
...
The following script was used to adjust test data:
----------
#!/usr/bin/env python3
import json
import sys
import re

def json_dumps(data):
    return json.dumps(data, ensure_ascii=False,
                      indent=2, separators=(',', ': '))

for filename in sys.argv[1:]:
    with open(filename, 'r') as f:
        data = json.loads(f.read())
    if 'objects' not in data:
        continue
    trailer = None
    to_sort = []
    for k, v in data['objects'].items():
        if k == 'trailer':
            trailer = v
        else:
            m = re.match(r'^(\d+) \d+ R', k)
            if m:
                to_sort.append([int(m.group(1)), k, v])
    newobjects = {x[1]: x[2] for x in sorted(to_sort)}
    if trailer is not None:
        newobjects['trailer'] = trailer
    data['objects'] = newobjects
    print(json_dumps(data))
----------
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
f50274ef46
Pages json: write each page incrementally
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
dc9b7287cd
Top-level json: write incrementally
...
This commit just changes the order in which fields are written to the
json without changing their content. All the json files in the test
suite were modified with this script to ensure that we didn't get any
changes other than ordering.
----------
#!/usr/bin/env python3
import json
import sys

def json_dumps(data):
    return json.dumps(data, ensure_ascii=False,
                      indent=2, separators=(',', ': '))

for filename in sys.argv[1:]:
    with open(filename, 'r') as f:
        data = json.loads(f.read())
    newdata = {}
    for i in ('version', 'parameters', 'pages', 'pagelabels',
              'acroform', 'attachments', 'encrypt', 'outlines',
              'objects', 'objectinfo'):
        if i in data:
            newdata[i] = data[i]
    print(json_dumps(newdata))
----------
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
7f65a5c21f
Test json against schema only on demand
...
Testing json against schema requires an in-memory copy, so do it only
when requested by the test suite.
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
a3c9980395
Add next to Pl_String and fix comments
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
b361c5ce19
Add --test-json-schema command-line option
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
7604ac5cb2
QPDFJob: have doJSON write to a pipeline
2022-05-07 08:26:31 -04:00
Jay Berkenbilt
0500d4347a
JSON: add blob type that generates base64-encoded binary data
2022-05-06 19:14:52 -04:00
Jay Berkenbilt
05fda4afa2
Change JSON parser to parse from an InputSource
2022-05-04 12:07:11 -04:00
Jay Berkenbilt
e5f3910c3e
Add new FileInputSource constructors
2022-05-04 12:07:11 -04:00
Jay Berkenbilt
e259635986
JSON: add write methods and implement unparse() in terms of those
2022-05-04 12:07:11 -04:00
Jay Berkenbilt
8b25de24c9
Make "objects" and "pages" consistent in JSON output
2022-05-04 08:32:44 -04:00
Jay Berkenbilt
6b576797cd
Don't call pushInheritedAttributesToPage in json mode
...
We used to have to do that, but for quite some time, the code that
gets images has no longer required it.
2022-05-04 07:11:13 -04:00
Jay Berkenbilt
f4206a0938
Add new Pl_String Pipeline
2022-05-03 18:54:51 -04:00
Jay Berkenbilt
16139d97c8
Add new Pl_OStream Pipeline
2022-05-03 18:54:51 -04:00
Jay Berkenbilt
21d6e3231f
Make use of the new Pipeline methods in some places
2022-05-03 18:31:23 -04:00