mirror of
https://github.com/qpdf/qpdf.git
synced 2024-12-22 10:58:58 +00:00
Update documentation for qpdf JSON v2
parent
b7bbf12e85
commit
0bd908b550
201
TODO
@@ -2,14 +2,13 @@
|
|||||||
Next
|
Next
|
||||||
====
|
====
|
||||||
|
|
||||||
|
Before Release:
|
||||||
|
|
||||||
* At next release, hide release-qpdf-10.6.3.0cmake* versions at readthedocs
|
* At next release, hide release-qpdf-10.6.3.0cmake* versions at readthedocs
|
||||||
* Stay on top of https://github.com/pikepdf/pikepdf/pull/315
|
* Stay on top of https://github.com/pikepdf/pikepdf/pull/315
|
||||||
* Release qtest with updates to qtest-driver and copy back into qpdf
|
* Release qtest with updates to qtest-driver and copy back into qpdf
|
||||||
|
|
||||||
In order:
|
Pending changes:
|
||||||
* json v2
|
|
||||||
|
|
||||||
Other (do in any order):
|
|
||||||
|
|
||||||
* Good C API for json v2
|
* Good C API for json v2
|
||||||
* QPDFPagesTree -- avoid ever flattening the pages tree.
|
* QPDFPagesTree -- avoid ever flattening the pages tree.
|
||||||
@@ -50,180 +49,10 @@ Other (do in any order):
|
|||||||
* Rework tests so that nothing is written into the source directory.
|
* Rework tests so that nothing is written into the source directory.
|
||||||
Ideally then the entire build could be done with a read-only
|
Ideally then the entire build could be done with a read-only
|
||||||
source tree.
|
source tree.
|
||||||
|
* Consider adding fuzzer code for JSON
|
||||||
|
|
||||||
Soon: Break ground on "Document-level work"
|
Soon: Break ground on "Document-level work"
|
||||||
|
|
||||||
Output JSON v2
|
|
||||||
==============
|
|
||||||
|
|
||||||
Remaining work:
|
|
||||||
|
|
||||||
* Make sure all the information from informational options is
|
|
||||||
available in the json output.
|
|
||||||
|
|
||||||
* --check: add but maybe not by default?
|
|
||||||
|
|
||||||
* --show-linearization: add but maybe not by default? Also figure
|
|
||||||
out whether warnings reported for some of the PDF specs (1.7) are
|
|
||||||
qpdf problems. This may not be worth adding in the first
|
|
||||||
increment.
|
|
||||||
|
|
||||||
* --show-xref: add
|
|
||||||
|
|
||||||
* Consider having --check, --show-encryption, etc., just select the
|
|
||||||
right keys when in json mode. I don't think I want check on by
|
|
||||||
default, so that might be different.
|
|
||||||
|
|
||||||
* Consider having warnings be included in the json in a "warnings" key
|
|
||||||
in json mode.
|
|
||||||
|
|
||||||
Notes for documentation:
|
|
||||||
|
|
||||||
* Find all mentions of json in the manual and update.
|
|
||||||
|
|
||||||
* Document typo fix in encrypt in release notes along with any other
|
|
||||||
non-compatible json 2 changes. Scrutinize all the output to decide
|
|
||||||
what should change.
|
|
||||||
|
|
||||||
* Keys other than "qpdf-v2" are ignored so people can stash their own
|
|
||||||
stuff. Unknown keys are ignored at other places for future
|
|
||||||
compatibility. Readers of qpdf json should continue to ignore keys
|
|
||||||
they don't recognize.
|
|
||||||
|
|
||||||
* Change: names are written in canonical form with a leading slash
|
|
||||||
just as they are treated in the code. In v1, they were written in
|
|
||||||
PDF syntax in the json file. Example: /text#2fplain in pdf will be
|
|
||||||
written as /text/plain in json v2 and as /text#2fplain in json v1.
|
|
||||||
|
|
||||||
* Document changes to strings, objects, streams, object keys.
|
|
||||||
|
|
||||||
* CLI: --json-input, --json-output[=version], --update-from-json. With
|
|
||||||
--json-input, the input file is a JSON file instead of a PDF file.
|
|
||||||
It must be complete, meaning that a PDF version must be given, all
|
|
||||||
streams must have exactly one of data or datafile, and a trailer
|
|
||||||
dictionary must be present, even if empty.
|
|
||||||
|
|
||||||
With --update-from-json, the JSON file updates objects in place. If
|
|
||||||
updating an old stream, if stream data is omitted, the data remains
|
|
||||||
untouched. The dictionary is always required. Remember that
|
|
||||||
QPDFWriter does not preserve object numbers, though --json-output
|
|
||||||
does. Therefore, if you want to update a PDF with a JSON, the input
|
|
||||||
to --update-from-json must be the same PDF as the one that
|
|
||||||
--json-output was run on previously. Otherwise, object numbers won't
|
|
||||||
match. Show this with an example. When updating,
|
|
||||||
|
|
||||||
* Certain fields are ignored when reading the JSON. This includes
|
|
||||||
maxobjectid, any computed fields in trailer (such as /Size), and all
|
|
||||||
/Length keys in stream dictionaries. There is no need for the user
|
|
||||||
to correct, remove, or otherwise worry about any values those keys
|
|
||||||
might have. The maxobjectid field is present in the original output
|
|
||||||
to assist with adding new objects to the file.
|
|
||||||
|
|
||||||
* JSON strings within PDF objects:
|
|
||||||
|
|
||||||
* "n n R" is an indirect object
|
|
||||||
|
|
||||||
* "/Name" is a name in canonical form with a leading slash (like
|
|
||||||
"/text/plain"), not PDF syntax (like "/text#2fplain").
|
|
||||||
|
|
||||||
* "b:hex-digits" is a binary string ("b:feff03c0"). Hex digits may be
|
|
||||||
mixed case. There must be an even number of digits.
|
|
||||||
|
|
||||||
* "u:utf-8" is a UTF-8 encoded string ("u:π", "u:\u03c0"). UTF-16
|
|
||||||
surrogate pairs are allowed. These are all equivalent: "u:🥔",
|
|
||||||
"u:\ud83e\udd54", "b:FEFFD83EDD54", "b:efbbbff09fa594".
|
|
||||||
|
|
||||||
* Both "b:" and "u:" are valid representations of the empty string.
|
|
||||||
|
|
||||||
* Anything else is an error
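The string rules above can be sketched as a small classifier. This is an illustration only: `StringKind` and `classify_json_string` are hypothetical names, not part of the qpdf API, and treating any string that starts with a slash as a name is an assumption about edge cases the notes do not cover.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Kinds of PDF object that a JSON string can stand for in qpdf JSON v2.
enum class StringKind { indirect_ref, name, binary_string, unicode_string, error };

// Classify a JSON string value following the rules listed in the notes.
// Hypothetical helper for illustration; qpdf's own parser may differ.
StringKind classify_json_string(std::string const& s)
{
    static std::regex const ref_re("[0-9]+ [0-9]+ R");
    if (std::regex_match(s, ref_re)) {
        return StringKind::indirect_ref; // "n n R"
    }
    if (!s.empty() && s[0] == '/') {
        return StringKind::name; // canonical form, e.g. "/text/plain"
    }
    if (s.compare(0, 2, "u:") == 0) {
        return StringKind::unicode_string; // "u:" alone is the empty string
    }
    if (s.compare(0, 2, "b:") == 0) {
        // Hex digits may be mixed case; there must be an even number of them.
        if ((s.size() - 2) % 2 != 0) {
            return StringKind::error;
        }
        for (std::size_t i = 2; i < s.size(); ++i) {
            if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
                return StringKind::error;
            }
        }
        return StringKind::binary_string; // "b:" alone is the empty string
    }
    return StringKind::error; // anything else is an error
}
```

A reader of qpdf JSON could run a check like this on every string value before deciding how to interpret it.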
|
|
||||||
|
|
||||||
* Document use of --json-input and --json-output together to show
|
|
||||||
preservation of object numbers. Draw attention to "original object
|
|
||||||
ID" comments in qdf as another way to show it.
|
|
||||||
|
|
||||||
* Document top-level keys of "qpdf-v2" ("pdfversion", "objects",
|
|
||||||
"maxobjectid") noting that "maxobjectid" is ignored when reading.
|
|
||||||
|
|
||||||
* Stream data: "data" is base64-encoded stream data. "datafile" is the
|
|
||||||
path to a file (relative path recommended but not required)
|
|
||||||
containing the binary data. As with any PDF representation, the data
|
|
||||||
must be consistent with the filters. --decode-level is honored by
|
|
||||||
--json-output.
|
|
||||||
|
|
||||||
* Other changes from v1:
|
|
||||||
|
|
||||||
* in "objects", keys are "obj:o g R" or "trailer"
|
|
||||||
|
|
||||||
* Non-stream objects are dictionaries with a "value" key whose value
|
|
||||||
is the object. Stream objects are dictionaries with a "stream" key
|
|
||||||
whose value is {"dict": stream-dictionary}. The "/Length" key is
|
|
||||||
omitted from the stream dictionary.
|
|
||||||
|
|
||||||
* "objectinfo" is gone as it is now possible to tell a stream from a
|
|
||||||
non-stream directly. To get stream data, use the --json-output
|
|
||||||
option. Note about how "pages" may cause the pages tree to be
|
|
||||||
corrected.
|
|
||||||
|
|
||||||
For non-streams:
|
|
||||||
|
|
||||||
"obj:o g R": {
|
|
||||||
"value": ...
|
|
||||||
}
|
|
||||||
|
|
||||||
For streams:
|
|
||||||
|
|
||||||
"obj:o g R": {
|
|
||||||
"stream": {
|
|
||||||
"dict": { ... stream dictionary ... },
|
|
||||||
"data": "base64-encoded data",
|
|
||||||
"datafile": "path to file containing the binary data"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
Rationale of "obj:o g R" is that indirect object references are just
|
|
||||||
"o g R", and so code that wants to resolve one can do so easily by
|
|
||||||
just prepending "obj:" and not having to parse or split the string.
|
|
||||||
Having a prefix rather than making the key just "o g R" makes it much
|
|
||||||
easier to search in the JSON for the definition of an object.
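The lookup that this rationale describes might look like the sketch below. `ObjectTable`, `object_key`, and `resolve` are hypothetical names, and the table stores plain strings as a stand-in for whatever parsed JSON representation a real consumer would use.

```cpp
#include <map>
#include <string>

// Stand-in for the "objects" dictionary: key is "obj:o g R" or "trailer",
// value is a placeholder for the parsed object.
using ObjectTable = std::map<std::string, std::string>;

// Turn an indirect reference such as "3 0 R" into its key in "objects".
// No parsing or splitting is needed; the prefix is simply prepended.
std::string object_key(std::string const& ref)
{
    return "obj:" + ref;
}

// Return a pointer to the referenced object, or nullptr if it is absent.
std::string const* resolve(ObjectTable const& objects, std::string const& ref)
{
    auto it = objects.find(object_key(ref));
    return it == objects.end() ? nullptr : &it->second;
}
```

Because the key is a fixed prefix plus the reference text, searching a JSON file for `"obj:3 0 R"` in an editor finds the definition rather than every use of the reference.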
|
|
||||||
|
|
||||||
CLI:
|
|
||||||
|
|
||||||
Example workflow:
|
|
||||||
* qpdf in.pdf --json-output pdf.json
|
|
||||||
* edit pdf.json
|
|
||||||
* qpdf --json-input pdf.json out.pdf
|
|
||||||
|
|
||||||
* qpdf in.pdf --json-output pdf.json
|
|
||||||
* edit pdf.json keeping only objects that need to be changed
|
|
||||||
* qpdf in.pdf --update-from-json=pdf.json out.pdf
|
|
||||||
|
|
||||||
To modify a single object:
|
|
||||||
|
|
||||||
* qpdf in.pdf --json-output pdf.json --json-object=o,g
|
|
||||||
* edit pdf.json
|
|
||||||
* qpdf in.pdf --update-from-json=pdf.json out.pdf
|
|
||||||
|
|
||||||
Historical note: you can't create a PDF from v1 json because
|
|
||||||
|
|
||||||
* The PDF version header is not recorded
|
|
||||||
|
|
||||||
* Strings cannot be unambiguously encoded/decoded
|
|
||||||
|
|
||||||
* Can't tell string from name from indirect object
|
|
||||||
|
|
||||||
* Strings are treated as PDF doc encoding and output as UTF-8, which
|
|
||||||
doesn't work since multiple PDF doc code points are undefined and
|
|
||||||
is absurd for binary strings
|
|
||||||
|
|
||||||
* There is no representation of stream data
|
|
||||||
|
|
||||||
* You can't tell a stream from a dictionary except by looking in both
|
|
||||||
"objects" and "objectinfo".
|
|
||||||
|
|
||||||
* Using "n n R" as a key in "objects" and "objectinfo" makes it hard
|
|
||||||
to search for things when viewing the JSON file in an editor.
|
|
||||||
|
|
||||||
|
|
||||||
QPDFPagesTree
|
QPDFPagesTree
|
||||||
=============
|
=============
|
||||||
|
|
||||||
@@ -256,6 +85,28 @@ sure /Count and /Parent are correct.
|
|||||||
refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up
|
refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up
|
||||||
when done.
|
when done.
|
||||||
|
|
||||||
|
Possible future JSON enhancements
|
||||||
|
=================================
|
||||||
|
|
||||||
|
* Add to JSON output the information available from a few additional
|
||||||
|
informational options:
|
||||||
|
|
||||||
|
* --check: add but maybe not by default?
|
||||||
|
|
||||||
|
* --show-linearization: add but maybe not by default? Also figure
|
||||||
|
out whether warnings reported for some of the PDF specs (1.7) are
|
||||||
|
qpdf problems. This may not be worth adding in the first
|
||||||
|
increment.
|
||||||
|
|
||||||
|
* --show-xref: add
|
||||||
|
|
||||||
|
* Consider having --check, --show-encryption, etc., just select the
|
||||||
|
right keys when in json mode. I don't think I want check on by
|
||||||
|
default, so that might be different.
|
||||||
|
|
||||||
|
* Consider having warnings be included in the json in a "warnings" key
|
||||||
|
in json mode.
|
||||||
|
|
||||||
QPDFJob
|
QPDFJob
|
||||||
=======
|
=======
|
||||||
|
|
||||||
|
@@ -271,6 +271,7 @@
|
|||||||
"mkinstalldirs",
|
"mkinstalldirs",
|
||||||
"mklink",
|
"mklink",
|
||||||
"moddate",
|
"moddate",
|
||||||
|
"modifyannotations",
|
||||||
"monoseq",
|
"monoseq",
|
||||||
"msvc",
|
"msvc",
|
||||||
"msvcrt",
|
"msvcrt",
|
||||||
|
@@ -112,8 +112,11 @@ class QPDF
|
|||||||
|
|
||||||
// Create a PDF from an input source that contains JSON as written
|
// Create a PDF from an input source that contains JSON as written
|
||||||
// by writeJSON (or qpdf --json-output, version 2 or higher). The
|
// by writeJSON (or qpdf --json-output, version 2 or higher). The
|
||||||
// JSON must be a complete representation of a PDF. See "QPDF JSON
|
// JSON must be a complete representation of a PDF. See "qpdf
|
||||||
// Format" in the manual for details.
|
// JSON" in the manual for details. The input JSON may be
|
||||||
|
// arbitrarily large. QPDF does not load stream data into memory
|
||||||
|
// for more than one stream at a time, even if the stream data is
|
||||||
|
// specified inline.
|
||||||
QPDF_DLL
|
QPDF_DLL
|
||||||
void createFromJSON(std::string const& json_file);
|
void createFromJSON(std::string const& json_file);
|
||||||
QPDF_DLL
|
QPDF_DLL
|
||||||
@@ -122,24 +125,40 @@ class QPDF
|
|||||||
// Update a PDF from an input source that contains JSON in the
|
// Update a PDF from an input source that contains JSON in the
|
||||||
// same format as is written by writeJSON (or qpdf --json-output,
|
// same format as is written by writeJSON (or qpdf --json-output,
|
||||||
// version 2 or higher). Objects in the PDF and not in the JSON
|
// version 2 or higher). Objects in the PDF and not in the JSON
|
||||||
// are not modified. See "QPDF JSON Format" in the manual for
|
// are not modified. See "qpdf JSON" in the manual for details. As
|
||||||
// details.
|
// with createFromJSON, the input JSON may be arbitrarily large.
|
||||||
QPDF_DLL
|
QPDF_DLL
|
||||||
void updateFromJSON(std::string const& json_file);
|
void updateFromJSON(std::string const& json_file);
|
||||||
QPDF_DLL
|
QPDF_DLL
|
||||||
void updateFromJSON(std::shared_ptr<InputSource>);
|
void updateFromJSON(std::shared_ptr<InputSource>);
|
||||||
|
|
||||||
// Write qpdf json format. The only supported version is 2. If
|
// Write qpdf json format to the pipeline "p". The only supported
|
||||||
// wanted_objects is empty, write all objects. Otherwise, write
|
// version is 2. The finish() method is called on the pipeline at
|
||||||
// only objects whose keys are in wanted_objects. Keys may be
|
// the end. The decode_level parameter controls which streams are
|
||||||
// either "trailer" or of the form "obj:n n R". Invalid keys are
|
// uncompressed in the JSON. Use qpdf_dl_none to preserve all
|
||||||
// ignored.
|
// stream data exactly as it appears in the input. The possible
|
||||||
|
// values for json_stream_data can be found in qpdf/Constants.h
|
||||||
|
// and correspond to the --json-stream-data command-line argument.
|
||||||
|
// If json_stream_data is qpdf_sj_file, file_prefix must be
|
||||||
|
// specified. Each stream will be written to a file whose path is
|
||||||
|
// constructed by appending "-nnn" to file_prefix, where "nnn" is
|
||||||
|
// the object number (not zero-filled). If wanted_objects is
|
||||||
|
// empty, write all objects. Otherwise, write only objects whose
|
||||||
|
// keys are in wanted_objects. Keys may be either "trailer" or of
|
||||||
|
// the form "obj:n n R". Invalid keys are ignored. This
|
||||||
|
// corresponds to the --json-object command-line argument.
|
||||||
|
//
|
||||||
|
// QPDF is efficient with regard to memory when writing, allowing
|
||||||
|
// you to write arbitrarily large PDF files to a pipeline. You can
|
||||||
|
// use a pipeline like Pl_Buffer or Pl_String to capture the JSON
|
||||||
|
// output in memory, but do so with caution as this will allocate
|
||||||
|
// enough memory to hold the entire PDF file.
|
||||||
QPDF_DLL
|
QPDF_DLL
|
||||||
void writeJSON(
|
void writeJSON(
|
||||||
int version,
|
int version,
|
||||||
Pipeline*,
|
Pipeline* p,
|
||||||
qpdf_stream_decode_level_e,
|
qpdf_stream_decode_level_e decode_level,
|
||||||
qpdf_json_stream_data_e,
|
qpdf_json_stream_data_e json_stream_data,
|
||||||
std::string const& file_prefix,
|
std::string const& file_prefix,
|
||||||
std::set<std::string> wanted_objects);
|
std::set<std::string> wanted_objects);
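Two conventions from the writeJSON comment above, sketched as standalone helpers. These are illustrations under stated assumptions, not qpdf code: `stream_data_path` and `is_wanted_object_key` are hypothetical names, and the key check only mirrors the two forms the comment names ("trailer" and "obj:n n R").

```cpp
#include <regex>
#include <string>

// Path of the file that receives the data for the stream in object `objid`
// when json_stream_data is qpdf_sj_file: file_prefix followed by "-nnn",
// where nnn is the object number, not zero-filled.
std::string stream_data_path(std::string const& file_prefix, int objid)
{
    return file_prefix + "-" + std::to_string(objid);
}

// True if `key` has one of the forms accepted in wanted_objects.
// Per the comment above, keys of any other form are ignored.
bool is_wanted_object_key(std::string const& key)
{
    static std::regex const obj_re("obj:[0-9]+ [0-9]+ R");
    return key == "trailer" || std::regex_match(key, obj_re);
}
```

Checking keys before building the wanted_objects set can catch typos that would otherwise be ignored silently.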
|
||||||
|
|
||||||
|
4
job.sums
@@ -8,10 +8,10 @@ include/qpdf/auto_job_c_pages.hh b3cc0f21029f6d89efa043dcdbfa183cb59325b6506001c
|
|||||||
include/qpdf/auto_job_c_uo.hh ae21b69a1efa9333050f4833d465f6daff87e5b38e5106e49bbef5d4132e4ed1
|
include/qpdf/auto_job_c_uo.hh ae21b69a1efa9333050f4833d465f6daff87e5b38e5106e49bbef5d4132e4ed1
|
||||||
job.yml 3b2b3c6f92b48f6c76109711cbfdd74669fa31a80cd17379548b09f8e76be05d
|
job.yml 3b2b3c6f92b48f6c76109711cbfdd74669fa31a80cd17379548b09f8e76be05d
|
||||||
libqpdf/qpdf/auto_job_decl.hh 74df4d7fdbdf51ecd0d58ce1e9844bb5525b9adac5a45f7c9a787ecdda2868df
|
libqpdf/qpdf/auto_job_decl.hh 74df4d7fdbdf51ecd0d58ce1e9844bb5525b9adac5a45f7c9a787ecdda2868df
|
||||||
libqpdf/qpdf/auto_job_help.hh c1cc99f6fe17285ee5e40730f6280e37d17da1a5f408086ce34e01af121df7ad
|
libqpdf/qpdf/auto_job_help.hh 3aaae4cde004e5314d3ac6d554da575e40209c0f0611f6a308957986f9c7967b
|
||||||
libqpdf/qpdf/auto_job_init.hh 7ea8e0641dc26fdfba6e283e14dbbff0c016654e174cdace8054f8bef53750fd
|
libqpdf/qpdf/auto_job_init.hh 7ea8e0641dc26fdfba6e283e14dbbff0c016654e174cdace8054f8bef53750fd
|
||||||
libqpdf/qpdf/auto_job_json_decl.hh 06caa46eaf71db8a50c046f91866baa8087745a9474319fb7c86d92634cc8297
|
libqpdf/qpdf/auto_job_json_decl.hh 06caa46eaf71db8a50c046f91866baa8087745a9474319fb7c86d92634cc8297
|
||||||
libqpdf/qpdf/auto_job_json_init.hh 5f6b53e3c81d4b54ce5c4cf9c3f52d0c02f987c53bf8841c0280367bad23e335
|
libqpdf/qpdf/auto_job_json_init.hh 5f6b53e3c81d4b54ce5c4cf9c3f52d0c02f987c53bf8841c0280367bad23e335
|
||||||
libqpdf/qpdf/auto_job_schema.hh 9d543cd4a43eafffc2c4b8a6fee29e399c271c52cb6f7d417ae5497b3c1127dc
|
libqpdf/qpdf/auto_job_schema.hh 9d543cd4a43eafffc2c4b8a6fee29e399c271c52cb6f7d417ae5497b3c1127dc
|
||||||
manual/_ext/qpdf.py 6add6321666031d55ed4aedf7c00e5662bba856dfcd66ccb526563bffefbb580
|
manual/_ext/qpdf.py 6add6321666031d55ed4aedf7c00e5662bba856dfcd66ccb526563bffefbb580
|
||||||
manual/cli.rst 82ead389c03bbf5e0498bd0571a11dc06544d591f4e4454c00322e3473fc556d
|
manual/cli.rst e3f4331befa17450e0d0fff87569722a5aab42ea619ef64f0a3a04e1f99ed65c
|
||||||
|
@@ -817,4 +817,5 @@ QPDF::writeJSON(
|
|||||||
JSON::writeDictionaryClose(p, first_qpdf, 1);
|
JSON::writeDictionaryClose(p, first_qpdf, 1);
|
||||||
JSON::writeDictionaryClose(p, first, 0);
|
JSON::writeDictionaryClose(p, first, 0);
|
||||||
*p << "\n";
|
*p << "\n";
|
||||||
|
p->finish();
|
||||||
}
|
}
|
||||||
|
@@ -70,6 +70,9 @@ ap.addOptionHelp("--copyright", "help", "show copyright information", R"(Display
|
|||||||
ap.addOptionHelp("--show-crypto", "help", "show available crypto providers", R"(Show a list of available crypto providers, one per line. The
|
ap.addOptionHelp("--show-crypto", "help", "show available crypto providers", R"(Show a list of available crypto providers, one per line. The
|
||||||
default provider is shown first.
|
default provider is shown first.
|
||||||
)");
|
)");
|
||||||
|
ap.addOptionHelp("--job-json-help", "help", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by
|
||||||
|
--job-json-file.
|
||||||
|
)");
|
||||||
ap.addHelpTopic("general", "general options", R"(General options control qpdf's behavior in ways that are not
|
ap.addHelpTopic("general", "general options", R"(General options control qpdf's behavior in ways that are not
|
||||||
directly related to the operation it is performing.
|
directly related to the operation it is performing.
|
||||||
)");
|
)");
|
||||||
@@ -87,11 +90,11 @@ ap.addOptionHelp("--verbose", "general", "print additional information", R"(Outp
|
|||||||
doing, including information about files created and operations
|
doing, including information about files created and operations
|
||||||
performed.
|
performed.
|
||||||
)");
|
)");
|
||||||
ap.addOptionHelp("--progress", "general", "show progress when writing", R"(Indicate progress when writing files.
|
|
||||||
)");
|
|
||||||
}
|
}
|
||||||
static void add_help_2(QPDFArgParser& ap)
|
static void add_help_2(QPDFArgParser& ap)
|
||||||
{
|
{
|
||||||
|
ap.addOptionHelp("--progress", "general", "show progress when writing", R"(Indicate progress when writing files.
|
||||||
|
)");
|
||||||
ap.addOptionHelp("--no-warn", "general", "suppress printing of warning messages", R"(Suppress printing of warning messages. If warnings were
|
ap.addOptionHelp("--no-warn", "general", "suppress printing of warning messages", R"(Suppress printing of warning messages. If warnings were
|
||||||
encountered, qpdf still exits with exit status 3.
|
encountered, qpdf still exits with exit status 3.
|
||||||
Use --warning-exit-0 with --no-warn to completely ignore
|
Use --warning-exit-0 with --no-warn to completely ignore
|
||||||
@@ -172,12 +175,12 @@ companion tool "fix-qdf" can be used to repair hand-edited QDF
|
|||||||
files. QDF is a feature specific to the qpdf tool. Please see
|
files. QDF is a feature specific to the qpdf tool. Please see
|
||||||
the "QDF Mode" chapter in the manual.
|
the "QDF Mode" chapter in the manual.
|
||||||
)");
|
)");
|
||||||
ap.addOptionHelp("--no-original-object-ids", "transformation", "omit original object IDs in qdf", R"(Omit comments in a QDF file indicating the object ID an object
|
|
||||||
had in the original file.
|
|
||||||
)");
|
|
||||||
}
|
}
|
||||||
static void add_help_3(QPDFArgParser& ap)
|
static void add_help_3(QPDFArgParser& ap)
|
||||||
{
|
{
|
||||||
|
ap.addOptionHelp("--no-original-object-ids", "transformation", "omit original object IDs in qdf", R"(Omit comments in a QDF file indicating the object ID an object
|
||||||
|
had in the original file.
|
||||||
|
)");
|
||||||
ap.addOptionHelp("--compress-streams", "transformation", "compress uncompressed streams", R"(--compress-streams=[y|n]
|
ap.addOptionHelp("--compress-streams", "transformation", "compress uncompressed streams", R"(--compress-streams=[y|n]
|
||||||
|
|
||||||
Setting --compress-streams=n prevents qpdf from compressing
|
Setting --compress-streams=n prevents qpdf from compressing
|
||||||
@@ -188,9 +191,11 @@ ap.addOptionHelp("--decode-level", "transformation", "control which streams to u
|
|||||||
|
|
||||||
When uncompressing streams, control which types of compression
|
When uncompressing streams, control which types of compression
|
||||||
schemes should be uncompressed:
|
schemes should be uncompressed:
|
||||||
- none: don't uncompress anything. This is the default with --json-output.
|
- none: don't uncompress anything. This is the default with
|
||||||
|
--json-output.
|
||||||
- generalized: uncompress streams compressed with a
|
- generalized: uncompress streams compressed with a
|
||||||
general-purpose compression algorithm. This is the default.
|
general-purpose compression algorithm. This is the default
|
||||||
|
except when --json-output is given.
|
||||||
- specialized: in addition to generalized, also uncompress
|
- specialized: in addition to generalized, also uncompress
|
||||||
streams compressed with a special-purpose but non-lossy
|
streams compressed with a special-purpose but non-lossy
|
||||||
compression scheme
|
compression scheme
|
||||||
@@ -290,13 +295,13 @@ from the resulting set, not based on the original page numbers.
|
|||||||
ap.addHelpTopic("modification", "change parts of the PDF", R"(Modification options make systematic changes to certain parts of
|
ap.addHelpTopic("modification", "change parts of the PDF", R"(Modification options make systematic changes to certain parts of
|
||||||
the PDF, causing the PDF to render differently from the original.
|
the PDF, causing the PDF to render differently from the original.
|
||||||
)");
|
)");
|
||||||
|
}
|
||||||
|
static void add_help_4(QPDFArgParser& ap)
|
||||||
|
{
|
||||||
ap.addOptionHelp("--pages", "modification", "begin page selection", R"(--pages file [--password=password] [page-range] [...] --
|
ap.addOptionHelp("--pages", "modification", "begin page selection", R"(--pages file [--password=password] [page-range] [...] --
|
||||||
|
|
||||||
Run qpdf --help=page-selection for details.
|
Run qpdf --help=page-selection for details.
|
||||||
)");
|
)");
|
||||||
}
|
|
||||||
static void add_help_4(QPDFArgParser& ap)
|
|
||||||
{
|
|
||||||
ap.addOptionHelp("--collate", "modification", "collate with --pages", R"(--collate[=n]
|
ap.addOptionHelp("--collate", "modification", "collate with --pages", R"(--collate[=n]
|
||||||
|
|
||||||
Collate rather than concatenate pages specified with --pages.
|
Collate rather than concatenate pages specified with --pages.
|
||||||
@@ -460,14 +465,14 @@ ap.addOptionHelp("--assemble", "encryption", "restrict document assembly", R"(--
|
|||||||
Enable/disable document assembly (rotation and reordering of
|
Enable/disable document assembly (rotation and reordering of
|
||||||
pages). This option is not available with 40-bit encryption.
|
pages). This option is not available with 40-bit encryption.
|
||||||
)");
|
)");
|
||||||
|
}
|
||||||
|
static void add_help_5(QPDFArgParser& ap)
|
||||||
|
{
|
||||||
ap.addOptionHelp("--extract", "encryption", "restrict text/graphic extraction", R"(--extract=[y|n]
|
ap.addOptionHelp("--extract", "encryption", "restrict text/graphic extraction", R"(--extract=[y|n]
|
||||||
|
|
||||||
Enable/disable text/graphic extraction for purposes other than
|
Enable/disable text/graphic extraction for purposes other than
|
||||||
accessibility.
|
accessibility.
|
||||||
)");
|
)");
|
||||||
}
|
|
||||||
static void add_help_5(QPDFArgParser& ap)
|
|
||||||
{
|
|
||||||
ap.addOptionHelp("--form", "encryption", "restrict form filling", R"(--form=[y|n]
|
ap.addOptionHelp("--form", "encryption", "restrict form filling", R"(--form=[y|n]
|
||||||
|
|
||||||
Enable/disable whether filling form fields is allowed even if
|
Enable/disable whether filling form fields is allowed even if
|
||||||
@@ -638,6 +643,9 @@ ap.addOptionHelp("--remove-attachment", "attachments", "remove an embedded file"
|
|||||||
Remove an embedded file using its key. Get the key with
|
Remove an embedded file using its key. Get the key with
|
||||||
--list-attachments.
|
--list-attachments.
|
||||||
)");
|
)");
|
||||||
|
}
|
||||||
|
static void add_help_6(QPDFArgParser& ap)
|
||||||
|
{
|
||||||
ap.addHelpTopic("pdf-dates", "PDF date format", R"(When a date is required, the date should conform to the PDF date
|
ap.addHelpTopic("pdf-dates", "PDF date format", R"(When a date is required, the date should conform to the PDF date
|
||||||
format specification, which is "D:yyyymmddhhmmssz" where "z" is
|
format specification, which is "D:yyyymmddhhmmssz" where "z" is
|
||||||
either literally upper case "Z" for UTC or a timezone offset in
|
either literally upper case "Z" for UTC or a timezone offset in
|
||||||
@@ -650,9 +658,6 @@ Examples:
|
|||||||
- D:20210207161528-05'00' February 7, 2021 at 4:15:28 p.m.
|
- D:20210207161528-05'00' February 7, 2021 at 4:15:28 p.m.
|
||||||
- D:20210207211528Z February 7, 2021 at 21:15:28 UTC
|
- D:20210207211528Z February 7, 2021 at 21:15:28 UTC
|
||||||
)");
|
)");
|
||||||
}
|
|
||||||
static void add_help_6(QPDFArgParser& ap)
|
|
||||||
{
|
|
||||||
ap.addHelpTopic("add-attachment", "attach (embed) files", R"(The options listed below appear between --add-attachment and its
|
ap.addHelpTopic("add-attachment", "attach (embed) files", R"(The options listed below appear between --add-attachment and its
|
||||||
terminating "--".
|
terminating "--".
|
||||||
)");
|
)");
|
||||||
@@ -747,14 +752,14 @@ the linearization hint tables are correct.
|
|||||||
)");
|
)");
|
||||||
ap.addOptionHelp("--show-linearization", "inspection", "show linearization hint tables", R"(Check and display all data in the linearization hint tables.
|
ap.addOptionHelp("--show-linearization", "inspection", "show linearization hint tables", R"(Check and display all data in the linearization hint tables.
|
||||||
)");
|
)");
|
||||||
|
}
|
||||||
|
static void add_help_7(QPDFArgParser& ap)
|
||||||
|
{
|
||||||
ap.addOptionHelp("--show-xref", "inspection", "show cross reference data", R"(Show the contents of the cross-reference table or stream (object
|
ap.addOptionHelp("--show-xref", "inspection", "show cross reference data", R"(Show the contents of the cross-reference table or stream (object
|
||||||
locations in the file) in a human-readable form. This is
|
locations in the file) in a human-readable form. This is
|
||||||
especially useful for files with cross-reference streams, which
|
especially useful for files with cross-reference streams, which
|
||||||
are stored in a binary format.
|
are stored in a binary format.
|
||||||
)");
|
)");
|
||||||
}
|
|
||||||
static void add_help_7(QPDFArgParser& ap)
|
|
||||||
{
|
|
||||||
ap.addOptionHelp("--show-object", "inspection", "show contents of an object", R"(--show-object={trailer|obj[,gen]}
|
ap.addOptionHelp("--show-object", "inspection", "show contents of an object", R"(--show-object={trailer|obj[,gen]}
|
||||||
|
|
||||||
Show the contents of the given object. This is especially useful
|
Show the contents of the given object. This is especially useful
|
||||||
@@ -814,21 +819,20 @@ This option is repeatable. If given, only specified objects will
|
|||||||
be shown in the "objects" key of the JSON output. Otherwise, all
|
be shown in the "objects" key of the JSON output. Otherwise, all
|
||||||
objects will be shown.
|
objects will be shown.
|
||||||
)");
|
)");
|
||||||
ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by
|
|
||||||
--job-json-file.
|
|
||||||
)");
|
|
||||||
ap.addOptionHelp("--json-stream-data", "json", "how to handle streams in json output", R"(--json-stream-data={none|inline|file}
|
ap.addOptionHelp("--json-stream-data", "json", "how to handle streams in json output", R"(--json-stream-data={none|inline|file}
|
||||||
|
|
||||||
Control whether streams in json output should be omitted,
|
When used with --json-output, this option controls whether
|
||||||
written inline (base64-encoded) or written to a file. If "file"
|
streams in json output should be omitted, written inline
|
||||||
is chosen, the file will be the name of the input file appended
|
(base64-encoded) or written to a file. If "file" is chosen, the
|
||||||
with -nnn where nnn is the object number. The prefix can be
|
file will be the name of the output file appended with -nnn where
|
||||||
overridden with --json-stream-prefix.
|
nnn is the object number. The prefix can be overridden with
|
||||||
|
--json-stream-prefix.
|
||||||
)");
|
)");
|
||||||
ap.addOptionHelp("--json-stream-prefix", "json", "prefix for json stream data files", R"(--json-stream-prefix=file-prefix
|
ap.addOptionHelp("--json-stream-prefix", "json", "prefix for json stream data files", R"(--json-stream-prefix=file-prefix
|
||||||
|
|
||||||
When --json-stream-data=file is given, override the input file
|
When used with --json-output, --json-stream-prefix=file-prefix
|
||||||
name as the prefix for stream data files. Whatever is given here
|
sets the prefix for stream data files, overriding the default,
|
||||||
|
which is to use the output file name. Whatever is given here
|
||||||
will be appended with -nnn to create the name of the file that
|
will be appended with -nnn to create the name of the file that
|
||||||
will contain the data for the stream in object nnn.
|
will contain the data for the stream in object nnn.
|
||||||
)");
|
)");
|
||||||
@@ -836,19 +840,19 @@ ap.addOptionHelp("--json-output", "json", "serialize to JSON", R"(--json-output[
|
|||||||
|
|
||||||
The output file will be qpdf JSON format at the given version.
|
The output file will be qpdf JSON format at the given version.
|
||||||
"version" may be a specific version or "latest" (the default).
|
"version" may be a specific version or "latest" (the default).
|
||||||
Version 1 is not supported. See also --json-stream-data,
|
The only supported version is 2. See also --json-stream-data,
|
||||||
--json-stream-prefix, and --decode-level.
|
--json-stream-prefix, and --decode-level.
|
||||||
)");
|
)");
|
||||||
ap.addOptionHelp("--json-input", "json", "input file is qpdf JSON", R"(Treat the input file as a JSON file in qpdf JSON format as
|
ap.addOptionHelp("--json-input", "json", "input file is qpdf JSON", R"(Treat the input file as a JSON file in qpdf JSON format as
|
||||||
written by qpdf --json-output. See the "QPDF JSON Format"
|
written by qpdf --json-output. See the "qpdf JSON"
|
||||||
section of the manual for information about how to use this
|
section of the manual for information about how to use this
|
||||||
option.
|
option.
|
||||||
)");
|
)");
|
||||||
ap.addOptionHelp("--update-from-json", "json", "update a PDF from qpdf JSON", R"(--update-from-json=qpdf-json-file
|
ap.addOptionHelp("--update-from-json", "json", "update a PDF from qpdf JSON", R"(--update-from-json=qpdf-json-file
|
||||||
|
|
||||||
Update a PDF file from a JSON file. Please see the "QPDF JSON
|
Update a PDF file from a JSON file. Please see the "qpdf JSON"
|
||||||
Format" section of the manual for information about how to use
|
chapter of the manual for information about how to use this
|
||||||
this option.
|
option.
|
||||||
)");
|
)");
|
||||||
}
|
}
|
||||||
static void add_help_8(QPDFArgParser& ap)
|
static void add_help_8(QPDFArgParser& ap)
|
||||||
|
154
manual/cli.rst
@ -171,7 +171,9 @@ Related Options
|
|||||||
equivalent command-line arguments were supplied. It can be repeated
|
equivalent command-line arguments were supplied. It can be repeated
|
||||||
and mixed freely with other options. Run ``qpdf`` with
|
and mixed freely with other options. Run ``qpdf`` with
|
||||||
:qpdf:ref:`--job-json-help` for a description of the job JSON input
|
:qpdf:ref:`--job-json-help` for a description of the job JSON input
|
||||||
file format. For more information, see :ref:`qpdf-job`.
|
file format. For more information, see :ref:`qpdf-job`. Note that
|
||||||
|
this is unrelated to :qpdf:ref:`--json` but may be combined with
|
||||||
|
it.
|
||||||
|
|
||||||
.. _exit-status:
|
.. _exit-status:
|
||||||
|
|
||||||
@ -341,6 +343,17 @@ Related Options
|
|||||||
itself. The default provider is always listed first. See
|
itself. The default provider is always listed first. See
|
||||||
:ref:`crypto` for more information about crypto providers.
|
:ref:`crypto` for more information about crypto providers.
|
||||||
|
|
||||||
|
.. qpdf:option:: --job-json-help
|
||||||
|
|
||||||
|
.. help: show format of job JSON
|
||||||
|
|
||||||
|
Describe the format of the QPDFJob JSON input used by
|
||||||
|
--job-json-file.
|
||||||
|
|
||||||
|
Describe the format of the QPDFJob JSON input used by
|
||||||
|
:qpdf:ref:`--job-json-file`. For more information about QPDFJob,
|
||||||
|
see :ref:`qpdf-job`.
|
||||||
|
|
||||||
.. _general-options:
|
.. _general-options:
|
||||||
|
|
||||||
General Options
|
General Options
|
||||||
@ -852,9 +865,11 @@ Related Options
|
|||||||
|
|
||||||
When uncompressing streams, control which types of compression
|
When uncompressing streams, control which types of compression
|
||||||
schemes should be uncompressed:
|
schemes should be uncompressed:
|
||||||
- none: don't uncompress anything. This is the default with --json-output.
|
- none: don't uncompress anything. This is the default with
|
||||||
|
--json-output.
|
||||||
- generalized: uncompress streams compressed with a
|
- generalized: uncompress streams compressed with a
|
||||||
general-purpose compression algorithm. This is the default.
|
general-purpose compression algorithm. This is the default
|
||||||
|
except when --json-output is given.
|
||||||
- specialized: in addition to generalized, also uncompress
|
- specialized: in addition to generalized, also uncompress
|
||||||
streams compressed with a special-purpose but non-lossy
|
streams compressed with a special-purpose but non-lossy
|
||||||
compression scheme
|
compression scheme
|
||||||
@ -875,7 +890,8 @@ Related Options
|
|||||||
``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define
|
``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define
|
||||||
generalized filters as those to be used for general-purpose
|
generalized filters as those to be used for general-purpose
|
||||||
compression or encoding, as opposed to filters specifically
|
compression or encoding, as opposed to filters specifically
|
||||||
designed for image data. This is the default.
|
designed for image data. This is the default except when
|
||||||
|
:qpdf:ref:`--json-output` is given.
|
||||||
|
|
||||||
- :samp:`specialized`: in addition to generalized, decode streams
|
- :samp:`specialized`: in addition to generalized, decode streams
|
||||||
with supported non-lossy specialized filters; currently this is
|
with supported non-lossy specialized filters; currently this is
|
||||||
@ -3126,8 +3142,9 @@ Related Options
|
|||||||
is usually but not always equal to the file name and is needed by
|
is usually but not always equal to the file name and is needed by
|
||||||
some of the other options. See also :ref:`attachments`. Note that
|
some of the other options. See also :ref:`attachments`. Note that
|
||||||
this option displays dates in PDF timestamp syntax. When attachment
|
this option displays dates in PDF timestamp syntax. When attachment
|
||||||
information is included in json output (see :ref:`--json`), dates
|
information is included in json output in the ``"attachments"`` key
|
||||||
are shown in ISO-8601 format.
|
(see :ref:`--json`), dates are shown (just within that object) in
|
||||||
|
ISO-8601 format.
|
||||||
|
|
||||||
.. qpdf:option:: --show-attachment=key
|
.. qpdf:option:: --show-attachment=key
|
||||||
|
|
||||||
@ -3169,14 +3186,11 @@ Related Options
|
|||||||
|
|
||||||
Generate a JSON representation of the file. This is described in
|
Generate a JSON representation of the file. This is described in
|
||||||
depth in :ref:`json`. The version parameter can be used to specify
|
depth in :ref:`json`. The version parameter can be used to specify
|
||||||
which version of the qpdf JSON format should be output. The only
|
which version of the qpdf JSON format should be output. The version
|
||||||
supported value is ``1``, but it's possible that a new JSON output
|
number may be a number or ``latest``. The default is ``latest``. As of
|
||||||
version will be added in a future version. You can also specify
|
qpdf 11, the latest version is ``2``. If you have code that reads
|
||||||
``latest`` to use the latest JSON version. For backward
|
qpdf JSON output, you can tell what version of the JSON output you
|
||||||
compatibility, the default value will remain ``1`` until qpdf
|
have from the ``"version"`` key in the output. Use the
|
||||||
version 11, after which point it will become ``latest``. In all
|
|
||||||
case, you can tell what version of the JSON output you have from
|
|
||||||
the ``"version"`` key in the output. Use the
|
|
||||||
:qpdf:ref:`--json-help` option to get a description of the JSON
|
:qpdf:ref:`--json-help` option to get a description of the JSON
|
||||||
object.
|
object.
|
||||||
|
|
||||||
@ -3189,11 +3203,11 @@ Related Options
|
|||||||
containing descriptive text.
|
containing descriptive text.
|
||||||
|
|
||||||
Describe the format of the JSON output by writing to standard
|
Describe the format of the JSON output by writing to standard
|
||||||
output a JSON object with the same structure with the same keys as
|
output a JSON object with the same structure as the JSON generated
|
||||||
the JSON generated by qpdf. In the output written by
|
by qpdf. In the output written by ``--json-help``, each key's value
|
||||||
``--json-help``, each key's value is a description of the key. The
|
is a description of the key. The specific contract guaranteed by
|
||||||
specific contract guaranteed by qpdf in its JSON representation is
|
qpdf in its JSON representation is explained in more detail in the
|
||||||
explained in more detail in the :ref:`json`.
|
:ref:`json`.
|
||||||
|
|
||||||
.. qpdf:option:: --json-key=key
|
.. qpdf:option:: --json-key=key
|
||||||
|
|
||||||
@ -3216,53 +3230,50 @@ Related Options
|
|||||||
be shown in the "objects" key of the JSON output. Otherwise, all
|
be shown in the "objects" key of the JSON output. Otherwise, all
|
||||||
objects will be shown.
|
objects will be shown.
|
||||||
|
|
||||||
This option is repeatable. If given, only specified objects will
|
This option is repeatable. If given, only specified objects will be
|
||||||
be shown in the "``objects``" key of the JSON output. Otherwise, all
|
shown in the ``"objects"`` key of the JSON output. Otherwise, all
|
||||||
objects will be shown.
|
objects will be shown. For qpdf JSON version 1, this also affects
|
||||||
|
the ``"objectinfo"`` key, which is not present in version 2. This
|
||||||
.. qpdf:option:: --job-json-help
|
option may be used with :qpdf:ref:`--json` and also with
|
||||||
|
:qpdf:ref:`--json-output`.
|
||||||
.. help: show format of job JSON
|
|
||||||
|
|
||||||
Describe the format of the QPDFJob JSON input used by
|
|
||||||
--job-json-file.
|
|
||||||
|
|
||||||
Describe the format of the QPDFJob JSON input used by
|
|
||||||
:qpdf:ref:`--job-json-file`. For more information about QPDFJob,
|
|
||||||
see :ref:`qpdf-job`.
|
|
||||||
|
|
||||||
.. qpdf:option:: --json-stream-data={none|inline|file}
|
.. qpdf:option:: --json-stream-data={none|inline|file}
|
||||||
|
|
||||||
.. help: how to handle streams in json output
|
.. help: how to handle streams in json output
|
||||||
|
|
||||||
Control whether streams in json output should be omitted,
|
When used with --json-output, this option controls whether
|
||||||
written inline (base64-encoded) or written to a file. If "file"
|
streams in json output should be omitted, written inline
|
||||||
is chosen, the file will be the name of the input file appended
|
(base64-encoded) or written to a file. If "file" is chosen, the
|
||||||
with -nnn where nnn is the object number. The prefix can be
|
file will be the name of the output file appended with -nnn where
|
||||||
overridden with --json-stream-prefix.
|
nnn is the object number. The prefix can be overridden with
|
||||||
|
--json-stream-prefix.
|
||||||
|
|
||||||
Control whether streams in json output should be omitted, written
|
When used with :qpdf:ref:`--json-output`, this option controls
|
||||||
inline (base64-encoded) or written to a file. If ``file`` is
|
whether streams in JSON output should be omitted, written inline
|
||||||
chosen, the file will be the name of the input file appended with
|
(base64-encoded) or written to a file. If ``file`` is chosen, the
|
||||||
:samp:`-{nnn}` where :samp:`{nnn}` is the object number. The prefix
|
file will be the name of the output file appended with
|
||||||
can be overridden with :qpdf:ref:`--json-stream-prefix`. This
|
:samp:`-{nnn}` where :samp:`{nnn}` is the object number. The stream
|
||||||
option only applies when used with :qpdf:ref:`--json-output`.
|
data file prefix can be overridden with
|
||||||
|
:qpdf:ref:`--json-stream-prefix`. This option only applies when
|
||||||
|
used with :qpdf:ref:`--json-output`.
|
||||||
|
|
||||||
.. qpdf:option:: --json-stream-prefix=file-prefix
|
.. qpdf:option:: --json-stream-prefix=file-prefix
|
||||||
|
|
||||||
.. help: prefix for json stream data files
|
.. help: prefix for json stream data files
|
||||||
|
|
||||||
When --json-stream-data=file is given, override the input file
|
When used with --json-output, --json-stream-prefix=file-prefix
|
||||||
name as the prefix for stream data files. Whatever is given here
|
sets the prefix for stream data files, overriding the default,
|
||||||
|
which is to use the output file name. Whatever is given here
|
||||||
will be appended with -nnn to create the name of the file that
|
will be appended with -nnn to create the name of the file that
|
||||||
will contain the data for the stream in object nnn.
|
will contain the data for the stream in object nnn.
|
||||||
|
|
||||||
When :qpdf:ref:`--json-stream-data` is given with the value
|
When used with :qpdf:ref:`--json-output`,
|
||||||
``file``, override the input file name as the prefix for stream
|
``--json-stream-prefix=file-prefix`` sets the prefix for stream data
|
||||||
data files. Whatever is given here will be appended with
|
files, overriding the default, which is to use the output file
|
||||||
:samp:`-{nnn}` to create the name of the file that will contain the
|
name. Whatever is given here will be appended with :samp:`-{nnn}`
|
||||||
data for the stream stream in object :samp:`{nnn}`. This
|
to create the name of the file that will contain the data for the
|
||||||
option only applies when used with :qpdf:ref:`--json-output`.
|
stream in object :samp:`{nnn}`. This option only applies
|
||||||
|
when used with :qpdf:ref:`--json-output`.
|
||||||
|
|
||||||
.. qpdf:option:: --json-output[=version]
|
.. qpdf:option:: --json-output[=version]
|
||||||
|
|
||||||
@ -3270,44 +3281,45 @@ Related Options
|
|||||||
|
|
||||||
The output file will be qpdf JSON format at the given version.
|
The output file will be qpdf JSON format at the given version.
|
||||||
"version" may be a specific version or "latest" (the default).
|
"version" may be a specific version or "latest" (the default).
|
||||||
Version 1 is not supported. See also --json-stream-data,
|
The only supported version is 2. See also --json-stream-data,
|
||||||
--json-stream-prefix, and --decode-level.
|
--json-stream-prefix, and --decode-level.
|
||||||
|
|
||||||
The output file will be qpdf JSON format at the given version.
|
The output file, instead of being a PDF file, will be a JSON file
|
||||||
``version`` may be a specific version or ``latest`` (the default).
|
in qpdf JSON format at the given version. ``version`` may be a
|
||||||
Version 1 is not supported. See also :qpdf:ref:`--json-stream-data`
|
specific version or ``latest`` (the default). The only supported
|
||||||
and :qpdf:ref:`--json-stream-prefix`. The default decode level is
|
version is 2. See also :qpdf:ref:`--json-stream-data` and
|
||||||
``none``, but you can override it with :qpdf:ref:`--decode-level`.
|
:qpdf:ref:`--json-stream-prefix`. When this option is specified,
|
||||||
If you want to look at the contents of streams easily as you would
|
the default decode level for stream data is ``none``, but you can
|
||||||
in QDF mode (see :ref:`qdf`), you can use
|
override it with :qpdf:ref:`--decode-level`. If you want to look at
|
||||||
``--decode-level=generalized`` and ``--json-stream-data=file`` for
|
the contents of streams easily as you would in QDF mode (see
|
||||||
a convenient way to do that.
|
:ref:`qdf`), you can use ``--decode-level=generalized`` and
|
||||||
|
``--json-stream-data=file`` for a convenient way to do that.
|
||||||
|
|
||||||
.. qpdf:option:: --json-input
|
.. qpdf:option:: --json-input
|
||||||
|
|
||||||
.. help: input file is qpdf JSON
|
.. help: input file is qpdf JSON
|
||||||
|
|
||||||
Treat the input file as a JSON file in qpdf JSON format as
|
Treat the input file as a JSON file in qpdf JSON format as
|
||||||
written by qpdf --json-output. See the "QPDF JSON Format"
|
written by qpdf --json-output. See the "qpdf JSON Format"
|
||||||
section of the manual for information about how to use this
|
section of the manual for information about how to use this
|
||||||
option.
|
option.
|
||||||
|
|
||||||
Treat the input file as a JSON file in qpdf JSON format as written
|
Treat the input file as a JSON file in qpdf JSON format as written
|
||||||
by ``qpdf --json-output``. The input file must be complete and
|
by ``qpdf --json-output``. The input file must be complete and
|
||||||
include all stream data. For information about converting between
|
include all stream data. For information about converting between
|
||||||
PDF and JSON, please see :ref:`qpdf-json`.
|
PDF and JSON, please see :ref:`json`.
|
||||||
|
|
||||||
.. qpdf:option:: --update-from-json=qpdf-json-file
|
.. qpdf:option:: --update-from-json=qpdf-json-file
|
||||||
|
|
||||||
.. help: update a PDF from qpdf JSON
|
.. help: update a PDF from qpdf JSON
|
||||||
|
|
||||||
Update a PDF file from a JSON file. Please see the "QPDF JSON
|
Update a PDF file from a JSON file. Please see the "qpdf JSON"
|
||||||
Format" section of the manual for information about how to use
|
chapter of the manual for information about how to use this
|
||||||
this option.
|
option.
|
||||||
|
|
||||||
This option updates a PDF file from a qpdf JSON file. For a
|
This option updates a PDF file from the specified qpdf JSON file.
|
||||||
information about how to use this option, please see
|
For information about how to use this option, please see
|
||||||
:ref:`qpdf-json`.
|
:ref:`json`.
|
||||||
|
|
||||||
.. _test-options:
|
.. _test-options:
|
||||||
|
|
||||||
@ -3420,7 +3432,7 @@ Related Options
|
|||||||
|
|
||||||
This is used by qpdf's test suite to check consistency between the
|
This is used by qpdf's test suite to check consistency between the
|
||||||
output of ``qpdf --json`` and the output of ``qpdf --json-help``.
|
output of ``qpdf --json`` and the output of ``qpdf --json-help``.
|
||||||
This option causes an extra copy of the generated json to appear in
|
This option causes an extra copy of the generated JSON to appear in
|
||||||
memory and is therefore unsuitable for use with large files. This
|
memory and is therefore unsuitable for use with large files. This
|
||||||
is why it's also not on by default.
|
is why it's also not on by default.
|
||||||
|
|
||||||
|
@ -242,7 +242,7 @@ the current file position. If the token is a not either a dictionary or
|
|||||||
array opener, an object is immediately constructed from the single token
|
array opener, an object is immediately constructed from the single token
|
||||||
and the parser returns. Otherwise, the parser iterates in a special mode
|
and the parser returns. Otherwise, the parser iterates in a special mode
|
||||||
in which it accumulates objects until it finds a balancing closer.
|
in which it accumulates objects until it finds a balancing closer.
|
||||||
During this process, the "``R``" keyword is recognized and an indirect
|
During this process, the ``R`` keyword is recognized and an indirect
|
||||||
``QPDFObjectHandle`` may be constructed.
|
``QPDFObjectHandle`` may be constructed.
|
||||||
|
|
||||||
The ``QPDF::resolve()`` method, which is used to resolve an indirect
|
The ``QPDF::resolve()`` method, which is used to resolve an indirect
|
||||||
@ -280,15 +280,15 @@ file.
|
|||||||
it is looking before the last ``%%EOF``. After getting to ``trailer``
|
it is looking before the last ``%%EOF``. After getting to ``trailer``
|
||||||
keyword, it invokes the parser.
|
keyword, it invokes the parser.
|
||||||
|
|
||||||
- The parser sees "``<<``", so it calls itself recursively in
|
- The parser sees ``<<``, so it calls itself recursively in
|
||||||
dictionary creation mode.
|
dictionary creation mode.
|
||||||
|
|
||||||
- In dictionary creation mode, the parser keeps accumulating objects
|
- In dictionary creation mode, the parser keeps accumulating objects
|
||||||
until it encounters "``>>``". Each object that is read is pushed onto
|
until it encounters ``>>``. Each object that is read is pushed onto
|
||||||
a stack. If "``R``" is read, the last two objects on the stack are
|
a stack. If ``R`` is read, the last two objects on the stack are
|
||||||
inspected. If they are integers, they are popped off the stack and
|
inspected. If they are integers, they are popped off the stack and
|
||||||
their values are used to construct an indirect object handle which is
|
their values are used to construct an indirect object handle which is
|
||||||
then pushed onto the stack. When "``>>``" is finally read, the stack
|
then pushed onto the stack. When ``>>`` is finally read, the stack
|
||||||
is converted into a ``QPDF_Dictionary`` which is placed in a
|
is converted into a ``QPDF_Dictionary`` which is placed in a
|
||||||
``QPDFObjectHandle`` and returned.
|
``QPDFObjectHandle`` and returned.
|
||||||
|
|
||||||
|
796
manual/json.rst
@ -1,6 +1,9 @@
|
|||||||
|
.. cSpell:ignore moddifyannotations
|
||||||
|
.. cSpell:ignore feff
|
||||||
|
|
||||||
.. _json:
|
.. _json:
|
||||||
|
|
||||||
QPDF JSON
|
qpdf JSON
|
||||||
=========
|
=========
|
||||||
|
|
||||||
.. _json-overview:
|
.. _json-overview:
|
||||||
@ -8,27 +11,540 @@ QPDF JSON
|
|||||||
Overview
|
Overview
|
||||||
--------
|
--------
|
||||||
|
|
||||||
Beginning with qpdf version 8.3.0, the :command:`qpdf`
|
Beginning with qpdf version 11.0.0, the qpdf library and command-line
|
||||||
command-line program can produce a JSON representation of the
|
program can produce a JSON representation of a PDF file. qpdf
|
||||||
non-content data in a PDF file. It includes a dump in JSON format of all
|
version 11 introduces JSON format version 2. Prior to qpdf 11,
|
||||||
objects in the PDF file excluding the content of streams. This JSON
|
versions 8.3.0 onward had a more limited JSON representation
|
||||||
representation makes it very easy to look in detail at the structure of
|
accessible only from the command-line. For details on what changed,
|
||||||
a given PDF file, and it also provides a great way to work with PDF
|
see :ref:`json-v2-changes`. The rest of this chapter documents qpdf
|
||||||
files programmatically from the command-line in languages that can't
|
JSON version 2.
|
||||||
call or link with the qpdf library directly. Note that stream data can
|
|
||||||
be extracted from PDF files using other qpdf command-line options.
|
Please note: this chapter discusses *qpdf JSON format*, which
|
||||||
|
represents the contents of a PDF file. This is distinct from the
|
||||||
|
*QPDFJob JSON format* which provides a higher-level interface
|
||||||
|
for interacting with qpdf the way the command-line tool does. For
|
||||||
|
information about that, see :ref:`qpdf-job`.
|
||||||
|
|
||||||
|
The qpdf JSON format is specific to qpdf. There are two ways to use
|
||||||
|
qpdf JSON:
|
||||||
|
|
||||||
|
- The :qpdf:ref:`--json` command-line flag causes creation of a JSON
|
||||||
|
representation of all the objects in a PDF file, excluding stream
|
||||||
|
data. This includes an unambiguous representation of the PDF object
|
||||||
|
structure and also provides JSON-formatted summaries of other
|
||||||
|
information about the file. This functionality is built into
|
||||||
|
``QPDFJob`` and can be accessed from the ``qpdf`` command-line tool
|
||||||
|
or from the ``QPDFJob`` C or C++ API.
|
||||||
|
|
||||||
|
- qpdf can create a JSON file that completely represents a PDF file.
|
||||||
|
You can think of this as using JSON as an *alternative syntax* for
|
||||||
|
representing a PDF file. Using qpdf JSON, it is possible to
|
||||||
|
convert a PDF file to JSON, manipulate the structure or contents of
|
||||||
|
the objects at a low level, and convert the results back to a PDF
|
||||||
|
file. This functionality can be accessed from the command-line with
|
||||||
|
the :qpdf:ref:`--json-output`, :qpdf:ref:`--json-input`, and
|
||||||
|
:qpdf:ref:`--update-from-json` flags, or from the API using the
|
||||||
|
``QPDF::writeJSON``, ``QPDF::createFromJSON``, and
|
||||||
|
``QPDF::updateFromJSON`` methods.
|
||||||
|
|
||||||
|
.. _json-terminology:
|
||||||
|
|
||||||
|
JSON Terminology
|
||||||
|
----------------
|
||||||
|
|
||||||
|
Notes about terminology:
|
||||||
|
|
||||||
|
- In JavaScript and JSON, that thing that has keys and values is
|
||||||
|
typically called an *object*.
|
||||||
|
|
||||||
|
- In PDF, that thing that has keys and values is typically called a
|
||||||
|
*dictionary*. An *object* is a PDF object such as integer, real,
|
||||||
|
boolean, null, string, array, dictionary, or stream.
|
||||||
|
|
||||||
|
- Some languages that use JSON call an *object* a *dictionary*, a
|
||||||
|
*map*, or a *hash*.
|
||||||
|
|
||||||
|
- Sometimes, it's called an *object* if it has fixed keys and a
|
||||||
|
*dictionary* if it has variable keys.
|
||||||
|
|
||||||
|
This manual is not entirely consistent about its use of *dictionary*
|
||||||
|
vs. *object* because sometimes one term or another is clearer in
|
||||||
|
context. Just be aware of the ambiguity when reading the manual. We
|
||||||
|
frequently use the term *dictionary* to refer to a JSON object because
|
||||||
|
of the consistency with PDF terminology.
|
||||||
|
|
||||||
|
.. _what-qpdf-json-is-not:
|
||||||
|
|
||||||
|
What qpdf JSON is not
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
Please note that qpdf JSON offers a convenient syntax for manipulating
|
||||||
|
PDF files at a low level using JSON syntax. JSON syntax is much easier
|
||||||
|
to work with than native PDF syntax, and there are good JSON libraries
|
||||||
|
in virtually every commonly used programming language. Working with
|
||||||
|
PDF objects in JSON removes the need to worry about stream lengths,
|
||||||
|
cross reference tables, and PDF-specific representations of Unicode or
|
||||||
|
binary strings that appear outside of content streams. It does not
|
||||||
|
eliminate the need to understand the semantic structure of PDF files.
|
||||||
|
Working with qpdf JSON still requires familiarity with the PDF
|
||||||
|
specification.
|
||||||
|
|
||||||
|
In particular, qpdf JSON *does not* provide any of the following
|
||||||
|
capabilities:
|
||||||
|
|
||||||
|
- Text extraction. While you could use qpdf JSON syntax to navigate to
|
||||||
|
a page's content streams and font structures, text within pages is
|
||||||
|
still encoded using PDF syntax within content streams, and there is
|
||||||
|
no assistance for text extraction.
|
||||||
|
|
||||||
|
- Reflowing text, document structure. qpdf JSON does not add any new
|
||||||
|
information or insight into the content of PDF files. If you have a
|
||||||
|
PDF file that lacks any structural information, qpdf JSON won't help
|
||||||
|
you solve any of those problems.
|
||||||
|
|
||||||
|
This is what we mean when we say that JSON provides an *alternative
|
||||||
|
syntax* for working with PDF data. Semantically, it is identical to
|
||||||
|
native PDF.
|
||||||
|
|
||||||
.. _qpdf-json:
|
.. _qpdf-json:
|
||||||
|
|
||||||
QPDF JSON Format
|
qpdf JSON Format
|
||||||
----------------
|
----------------
|
||||||
|
|
||||||
XXX Write this.
|
This section describes how qpdf represents PDF objects in JSON format.
|
||||||
|
It also describes how to work with qpdf JSON to create or
|
||||||
|
modify PDF files.
|
||||||
|
|
||||||
|
.. _json.objects:
|
||||||
|
|
||||||
|
qpdf JSON Object Representation
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
This section describes the representation of PDF objects in qpdf JSON
|
||||||
|
version 2. PDF objects are represented within the ``"objects"``
|
||||||
|
dictionary of a qpdf JSON file. This is true both for PDF serialized
|
||||||
|
to JSON (:qpdf:ref:`--json-output`, ``QPDF::writeJSON``) and for objects as
|
||||||
|
they appear in the output of ``qpdf`` with the :qpdf:ref:`--json`
|
||||||
|
option.
|
||||||
|
|
||||||
|
Each key in the ``"objects"`` dictionary is either ``"trailer"`` or a
|
||||||
|
string of the form ``"obj:O G R"`` where ``O`` and ``G`` are the
|
||||||
|
object and generation numbers and ``R`` is the literal string ``R``.
|
||||||
|
This is the PDF syntax for the indirect object reference prepended by
|
||||||
|
``obj:``. The value, representing the object itself, is a JSON object
|
||||||
|
whose structure is described below.
|
||||||
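The key format described above can be illustrated with a short Python sketch. It is not part of qpdf: the helper name ``parse_object_key`` and the sample ``objects`` fragment are made up for this example, and the sample values only hint at the structure described later in this section.

```python
import re

# Matches keys of the form "obj:O G R", for example "obj:3 0 R".
_OBJ_KEY = re.compile(r"^obj:(\d+) (\d+) R$")


def parse_object_key(key):
    """Return (object, generation) for an "obj:O G R" key, None for "trailer"."""
    if key == "trailer":
        return None
    m = _OBJ_KEY.match(key)
    if m is None:
        raise ValueError(f"not a qpdf JSON object key: {key!r}")
    return int(m.group(1)), int(m.group(2))


# Hypothetical fragment of an "objects" dictionary (illustration only).
objects = {
    "obj:1 0 R": {"value": {"/Type": "/Catalog", "/Pages": "2 0 R"}},
    "obj:2 0 R": {"value": {"/Type": "/Pages", "/Kids": [], "/Count": 0}},
    "trailer": {"value": {"/Root": "1 0 R"}},
}

# Map each key to its (object, generation) pair, or None for the trailer.
ids = {key: parse_object_key(key) for key in objects}
```

Stripping the ``obj:`` prefix from a key yields exactly the string used for an indirect reference to that object, so the same pattern without the prefix would match reference values.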
|
|
||||||
|
Top-level Stream Objects
|
||||||
|
Stream objects are represented as a JSON object with the single key
|
||||||
|
``"stream"``. The stream object has a key called ``"dict"`` whose
|
||||||
|
value is the stream dictionary as an object value (described below)
|
||||||
|
with the ``"/Length"`` key omitted. Other keys are determined by the
|
||||||
|
value for json stream data (:qpdf:ref:`--json-stream-data`, or a
|
||||||
|
parameter of type ``qpdf_json_stream_data_e``) as follows:
|
||||||
|
|
||||||
|
- ``none``: stream data is not represented; no other keys are
|
||||||
|
present
|
||||||
|
|
||||||
|
- ``inline``: the stream data appears as a base64-encoded string as
|
||||||
|
the value of the ``"data"`` key
|
||||||
|
|
||||||
|
- ``file``: the stream data is written to a file, and the path to
|
||||||
|
the file is stored in the ``"datafile"`` key. A relative path is
|
||||||
|
interpreted as relative to the current directory when qpdf is
|
||||||
|
invoked.
|
||||||
|
|
||||||
|
Keys other than ``"dict"``, ``"data"``, and ``"datafile"`` are
|
||||||
|
ignored. This is primarily for future compatibility in case a newer
|
||||||
|
version of qpdf includes additional information.
|
||||||
|
|
||||||
|
As with the native PDF representation, the stream data must be
|
||||||
|
consistent with whatever filters and decode parameters are specified
|
||||||
|
in the stream dictionary.
|
||||||
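As a minimal sketch of how a consumer might read the stream representation described above, the following Python function returns the stream bytes for each of the three stream data modes. The function name ``stream_bytes`` and the sample object are assumptions made for this example; the function does no decoding of filters, so the returned bytes are whatever the stream dictionary says they are.

```python
import base64


def stream_bytes(obj):
    """Return the stream data of a top-level stream object, or None."""
    stream = obj["stream"]
    if "data" in stream:
        # Stream data written inline as a base64-encoded string.
        return base64.b64decode(stream["data"])
    if "datafile" in stream:
        # Stream data written to a separate file; a relative path is
        # resolved against the current directory.
        with open(stream["datafile"], "rb") as f:
            return f.read()
    # Stream data was not represented (only "dict" is present).
    return None


# Hypothetical stream object with inline data (illustration only).
sample = {
    "stream": {
        "dict": {},
        "data": base64.b64encode(b"q Q\n").decode("ascii"),
    }
}
content = stream_bytes(sample)
```

Unknown keys are simply never looked at, which matches the statement that keys other than ``"dict"``, ``"data"``, and ``"datafile"`` are ignored.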
|
|
||||||
|
Top-level Non-stream Objects
|
||||||
|
Non-stream objects are represented as a dictionary with the single
|
||||||
|
key ``"value"``. Other keys are ignored for future compatibility.
|
||||||
|
The value's structure is described in "Object Values" below.
|
||||||
|
|
||||||
|
Note: in files that use object streams, the trailer "dictionary" is
|
||||||
|
actually a stream, but in the JSON representation, the value of the
|
||||||
|
``"trailer"`` key is always written as a dictionary (with a
|
||||||
|
``"value"`` key like other non-stream objects). There will also be a
|
||||||
|
stream object whose key is the object ID of the cross-reference
|
||||||
|
stream, even though this stream will generally be unreferenced. This
|
||||||
|
makes it possible to assume ``"trailer"`` points to a dictionary
|
||||||
|
without having to consider whether the file uses object streams or
|
||||||
|
not. It is also consistent with how ``QPDF::getTrailer`` behaves in
|
||||||
|
the C++ API.
|
||||||
|
|
||||||
|
Object Values
|
||||||
|
Within ``"value"`` or ``"stream"."dict"``, PDF objects are
|
||||||
|
represented as follows:
|
||||||
|
|
||||||
|
- Objects of type Boolean or null are represented as JSON objects of
|
||||||
|
the same type.
|
||||||
|
|
||||||
|
- Objects that are numeric are represented as numeric in the JSON
|
||||||
|
without regard to precision. Internally, qpdf stores numeric
|
||||||
|
values as strings, so qpdf will preserve arbitrary precision
|
||||||
|
numerical values when reading and writing JSON. It is likely that
|
||||||
|
other JSON readers and writers will have implementation-dependent
|
||||||
|
ways of handling numerical values that are out of range.
|
||||||
|
|
||||||
|
- Name objects are represented as JSON strings that start with ``/``
|
||||||
|
and are followed by the PDF name in canonical form with all PDF
|
||||||
|
syntax resolved. For example, the name whose canonical form (per
|
||||||
|
the PDF specification) is ``text/plain`` would be represented in
|
||||||
|
JSON as ``"/text/plain"`` and in PDF as ``/text#2fplain``.
|
||||||
|
|
||||||
|
- Indirect object references are represented as JSON strings that
|
||||||
|
look like a PDF indirect object reference and have the form ``"O G
|
||||||
|
R"`` where ``O`` and ``G`` are the object and generation numbers
|
||||||
|
and ``R`` is the literal string ``R``. For example, ``"3 0 R"``
|
||||||
|
would represent a reference to the object with object ID 3 and
|
||||||
|
generation 0.
|
||||||
|
|
||||||
|
- PDF strings are represented as JSON strings in one of two ways:
|
||||||
|
|
||||||
|
- ``"u:utf8-encoded-string"``: this format is used when the PDF
|
||||||
|
string can be unambiguously represented as a Unicode string and
|
||||||
|
contains no unprintable characters. This is the case whether the
|
||||||
|
input string is encoded as UTF-16, UTF-8 (as allowed by PDF
|
||||||
|
2.0), or PDF doc encoding. Strings are only represented this way
|
||||||
|
if they can be encoded without loss of information.
|
||||||
|
|
||||||
|
- ``"b:hex-string"``: this format is used to represent any binary
|
||||||
|
string value that can't be represented as a Unicode string.
|
||||||
|
``hex-string`` must have an even number of characters that range
|
||||||
|
from ``a`` through ``f``, ``A`` through ``F``, or ``0`` through
|
||||||
|
``9``.
|
||||||
|
|
||||||
|
qpdf writes empty strings as ``"u:"``, but both ``"b:"`` and
|
||||||
|
``"u:"`` are valid representations of the empty string.
|
||||||
|
|
||||||
|
There is full support for UTF-16 surrogate pairs. Binary strings
|
||||||
|
encoded with ``"b:..."`` are the internal PDF representations.
|
||||||
|
As such, the following are equivalent:
|
||||||
|
|
||||||
|
- ``"u:\ud83e\udd54"`` -- representation of U+1F954 as a surrogate
|
||||||
|
pair in JSON syntax
|
||||||
|
|
||||||
|
- ``"b:FEFFD83EDD54"`` -- representation of U+1F954 as the bytes
|
||||||
|
of a UTF-16 string in PDF syntax with the leading ``FEFF``
|
||||||
|
indicating UTF-16
|
||||||
|
|
||||||
|
- ``"b:efbbbff09fa594"`` -- representation of U+1F954 as the
|
||||||
|
bytes of a UTF-8 string in PDF syntax (as allowed by PDF 2.0)
|
||||||
|
with the leading ``EF``, ``BB``, ``BF`` sequence (which is just
|
||||||
|
UTF-8 encoding of ``FEFF``).
|
||||||
|
|
||||||
|
- A JSON string whose contents are ``u:`` followed by the UTF-8
|
||||||
|
representation of U+1F954. This is the potato emoji.
|
||||||
|
Unfortunately, I am not able to render it in the PDF version
|
||||||
|
of this manual.
|
||||||
|
|
||||||
|
- PDF arrays are represented as JSON arrays of objects as described
|
||||||
|
above.
|
||||||
|
|
||||||
|
- PDF dictionaries are represented as JSON objects whose keys are
|
||||||
|
the string representations of names and whose values are
|
||||||
|
representations of PDF objects.
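
The string rules above can be checked with a short Python sketch. It is not part of qpdf; ``decode_qpdf_string`` is a hypothetical helper name, and it only handles the cases spelled out in the list: ``u:`` values, and ``b:`` values that carry a UTF-16 or UTF-8 marker. Binary strings without a marker are rejected rather than guessed at.

```python
import json


def decode_qpdf_string(value):
    """Return the text carried by a qpdf JSON v2 string value."""
    if value.startswith("u:"):
        # Already Unicode: everything after the prefix is the text.
        return value[2:]
    if value.startswith("b:"):
        raw = bytes.fromhex(value[2:])  # accepts upper- and lower-case hex
        if not raw:
            return ""  # "b:" is a valid spelling of the empty string
        if raw.startswith(b"\xfe\xff"):
            return raw[2:].decode("utf-16-be")  # UTF-16 with leading FEFF
        if raw.startswith(b"\xef\xbb\xbf"):
            return raw[3:].decode("utf-8")  # UTF-8 marker allowed by PDF 2.0
        raise ValueError("binary string has no Unicode marker")
    raise ValueError("expected a value starting with 'u:' or 'b:'")


potato = "\U0001F954"
equivalent_forms = [
    json.loads(r'"u:\ud83e\udd54"'),  # surrogate pair in JSON syntax
    "b:FEFFD83EDD54",                 # UTF-16 bytes
    "b:efbbbff09fa594",               # UTF-8 bytes
    "u:" + potato,                    # literal UTF-8 text
]
decoded = [decode_qpdf_string(form) for form in equivalent_forms]
```

All four entries of ``decoded`` are the single character U+1F954, which is what "equivalent" means in the list above.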
|
||||||
|
|
||||||
|
.. _json.output:
|
||||||
|
|
||||||
|
qpdf JSON Output
|
||||||
|
~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The format of the JSON written by qpdf's :qpdf:ref:`--json-output`
|
||||||
|
flag or the ``QPDF::writeJSON`` API call is a JSON object consisting
|
||||||
|
of a single key: ``"qpdf-v2"``. Any other top-level keys are ignored.
|
||||||
|
While unknown keys in other places are ignored for future
|
||||||
|
compatibility, in this case, ignoring other top-level keys is an
|
||||||
|
explicit decision to allow users to include other keys for their own
|
||||||
|
use. No new top-level keys will be added in JSON version 2.
|
||||||
|
|
||||||
|
The ``"qpdf-v2"`` key points to a JSON object with the following keys:
|
||||||
|
|
||||||
|
- ``"pdfversion"`` -- a string containing PDF version as indicated in
|
||||||
|
the PDF header (e.g. ``"1.7"``, ``"2.0"``)
|
||||||
|
|
||||||
|
- ``"maxobjectid"`` -- a number indicating the object ID of the
|
||||||
|
highest numbered object in the file. This is provided to make it
|
||||||
|
easier for software that wants to add new objects to the file as you
|
||||||
|
can safely start with one above that number when creating new
|
||||||
|
objects. Note that the value of ``"maxobjectid"`` may be higher than
|
||||||
|
the actual maximum object that appears in the input PDF since it
|
||||||
|
takes into consideration any dangling indirect object references
|
||||||
|
from the original file. This prevents you from unwittingly creating
|
||||||
|
an object that doesn't exist but that is referenced, which may have
|
||||||
|
unintended side effects. (The PDF specification explicitly allows
|
||||||
|
dangling references and says to treat them as nulls. This can happen
|
||||||
|
if objects are removed from a PDF file.)
|
||||||
|
|
||||||
|
- ``"objects"`` -- the actual PDF objects as described in
|
||||||
|
:ref:`json.objects`.
|
||||||
|
|
||||||
|
Note that writing JSON output is done by ``QPDF``, not ``QPDFWriter``.
|
||||||
|
As such, none of the things ``QPDFWriter`` does apply. This includes
|
||||||
|
recompression of streams, renumbering of objects, anything to do with
|
||||||
|
object streams (which are not represented by qpdf JSON at all since
|
||||||
|
they are PDF syntax, not semantics), encryption, decryption,
|
||||||
|
linearization, QDF mode, etc.
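
As an illustration of how ``"maxobjectid"`` might be used, the following Python sketch adds a new non-stream object to a parsed qpdf JSON document. ``add_object`` is a hypothetical helper, not a qpdf API, and it assumes that new objects are given generation 0.

```python
def add_object(doc, value):
    """Add a non-stream object under the next free object ID; return its key."""
    qpdf = doc["qpdf-v2"]
    new_id = qpdf["maxobjectid"] + 1  # one above the highest known object ID
    key = "obj:%d 0 R" % new_id
    qpdf["objects"][key] = {"value": value}
    # Local bookkeeping so a second call picks a fresh number.
    qpdf["maxobjectid"] = new_id
    return key


doc = {"qpdf-v2": {"pdfversion": "1.3", "maxobjectid": 5, "objects": {}}}
first = add_object(doc, {"/Type": "/Annot"})
second = add_object(doc, [1, 2, 3])
```

Starting from ``"maxobjectid": 5``, the two calls produce the keys ``"obj:6 0 R"`` and ``"obj:7 0 R"``.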
|
||||||
|
|
||||||
|
.. _json.example:
|
||||||
|
|
||||||
|
qpdf JSON Example
|
||||||
|
~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The JSON below shows an example of a simple PDF file represented in
|
||||||
|
qpdf JSON format.
|
||||||
|
|
||||||
|
.. code-block:: json

   {
     "qpdf-v2": {
       "pdfversion": "1.3",
       "maxobjectid": 5,
       "objects": {
         "obj:1 0 R": {
           "value": {
             "/Pages": "2 0 R",
             "/Type": "/Catalog"
           }
         },
         "obj:2 0 R": {
           "value": {
             "/Count": 1,
             "/Kids": [ "3 0 R" ],
             "/Type": "/Pages"
           }
         },
         "obj:3 0 R": {
           "value": {
             "/Contents": "4 0 R",
             "/MediaBox": [ 0, 0, 612, 792 ],
             "/Parent": "2 0 R",
             "/Resources": {
               "/Font": {
                 "/F1": "5 0 R"
               }
             },
             "/Type": "/Page"
           }
         },
         "obj:4 0 R": {
           "stream": {
             "data": "eJxzCuFSUNB3M1QwMlEISQOyzY2AyEAhJAXI1gjIL0ksyddUCMnicg3hAgDLAQnI",
             "dict": {
               "/Filter": "/FlateDecode"
             }
           }
         },
         "obj:5 0 R": {
           "value": {
             "/BaseFont": "/Helvetica",
             "/Encoding": "/WinAnsiEncoding",
             "/Subtype": "/Type1",
             "/Type": "/Font"
           }
         },
         "trailer": {
           "value": {
             "/ID": [
               "b:98b5a26966fba4d3a769b715b2558da6",
               "b:98b5a26966fba4d3a769b715b2558da6"
             ],
             "/Root": "1 0 R",
             "/Size": 6
           }
         }
       }
     }
   }
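
A document shaped like the example can be navigated with very little code. The Python sketch below is an illustration only: ``resolve``, ``page_refs``, and ``stream_bytes`` are hypothetical helpers, the sample document is a trimmed copy of the example with made-up stream content, and it assumes that ``"data"`` holds base64-encoded stream data that is still compressed when the dictionary contains ``/Filter /FlateDecode``.

```python
import base64
import zlib


def resolve(doc, ref):
    """Look up an indirect reference such as "3 0 R" by prepending "obj:"."""
    return doc["qpdf-v2"]["objects"]["obj:" + ref]


def page_refs(doc):
    """Return the /Kids of the root pages node (flat pages tree only)."""
    trailer = doc["qpdf-v2"]["objects"]["trailer"]["value"]
    catalog = resolve(doc, trailer["/Root"])["value"]
    pages = resolve(doc, catalog["/Pages"])["value"]
    return list(pages["/Kids"])


def stream_bytes(obj):
    """Decode a stream object's data, inflating it if it is Flate-compressed."""
    stream = obj["stream"]
    raw = base64.b64decode(stream["data"])
    if stream["dict"].get("/Filter") == "/FlateDecode":
        raw = zlib.decompress(raw)
    return raw


# Made-up page content, compressed and encoded the way the sketch assumes.
content = b"BT /F1 24 Tf 72 720 Td (Potato) Tj ET\n"
doc = {
    "qpdf-v2": {
        "pdfversion": "1.3",
        "maxobjectid": 5,
        "objects": {
            "obj:1 0 R": {"value": {"/Pages": "2 0 R", "/Type": "/Catalog"}},
            "obj:2 0 R": {"value": {"/Count": 1, "/Kids": ["3 0 R"],
                                    "/Type": "/Pages"}},
            "obj:3 0 R": {"value": {"/Contents": "4 0 R", "/Parent": "2 0 R",
                                    "/Type": "/Page"}},
            "obj:4 0 R": {"stream": {
                "data": base64.b64encode(zlib.compress(content)).decode("ascii"),
                "dict": {"/Filter": "/FlateDecode"},
            }},
            "trailer": {"value": {"/Root": "1 0 R", "/Size": 6}},
        },
    }
}

pages = page_refs(doc)
first_page = resolve(doc, pages[0])["value"]
page_content = stream_bytes(resolve(doc, first_page["/Contents"]))
```

Because object keys are just ``obj:`` followed by the reference string, following a reference never requires parsing the object and generation numbers.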
|
||||||
|
|
||||||
|
.. _json.input:
|
||||||
|
|
||||||
|
qpdf JSON Input
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Output in the JSON output format described in :ref:`json.output` can
|
||||||
|
be used in two different ways:
|
||||||
|
|
||||||
|
- By using the :qpdf:ref:`--json-input` flag or calling
|
||||||
|
``QPDF::createFromJSON`` in place of ``QPDF::processFile``, a qpdf
|
||||||
|
JSON file can be used in place of a PDF file as the input to qpdf.
|
||||||
|
|
||||||
|
- By using the :qpdf:ref:`--update-from-json` flag or calling
|
||||||
|
``QPDF::updateFromJSON`` on an initialized ``QPDF`` object, a qpdf
|
||||||
|
JSON file can be used to apply changes to an existing ``QPDF``
|
||||||
|
object. That ``QPDF`` object can have come from any source including
|
||||||
|
a PDF file, a qpdf JSON file, or the result of any other process
|
||||||
|
that results in a valid, initialized ``QPDF`` object.
|
||||||
|
|
||||||
|
Here are some important things to know about qpdf JSON input.
|
||||||
|
|
||||||
|
- When a qpdf JSON file is used as the primary input file, it must be
|
||||||
|
complete. This means:
|
||||||
|
|
||||||
|
- A PDF version number must be specified with the ``"pdfversion"``
|
||||||
|
key
|
||||||
|
|
||||||
|
- Stream data must be present for all streams
|
||||||
|
|
||||||
|
- The trailer dictionary must be present, though only the
|
||||||
|
``"/Root"`` key is required.
|
||||||
|
|
||||||
|
- Certain fields from the input are ignored whether creating or
|
||||||
|
updating from a JSON file:
|
||||||
|
|
||||||
|
- ``"maxobjectid"`` is ignored, so it is not necessary to update it
|
||||||
|
when adding new objects.
|
||||||
|
|
||||||
|
- ``"/Length"`` is ignored in all stream dictionaries. qpdf doesn't
|
||||||
|
put it there when it creates JSON output, and it is not necessary
|
||||||
|
to add it.
|
||||||
|
|
||||||
|
- ``"/Size"`` is ignored if it appears in a trailer dictionary as
|
||||||
|
that is always recomputed by ``QPDFWriter``.
|
||||||
|
|
||||||
|
- Unknown keys at the top level of the file, within ``objects``,
|
||||||
|
at the top level of each individual object (inside the object that
|
||||||
|
has the ``"value"`` or ``"stream"`` key) and directly within
|
||||||
|
``"stream"`` are ignored for future compatibility. You should
|
||||||
|
avoid putting your own values in those places if you wish to avoid
|
||||||
|
risking that your JSON files will not work in future versions of
|
||||||
|
qpdf. The exception to this advice is at the top level of the
|
||||||
|
overall file where it is explicitly supported for you to add your
|
||||||
|
own keys. For example, you could add your own metadata at the top
|
||||||
|
level, and qpdf will ignore it. Note that extra top-level keys are
|
||||||
|
not preserved when qpdf reads your JSON file.
|
||||||
|
|
||||||
|
- When qpdf reads a PDF file, the internal object numbers are always
|
||||||
|
preserved. However, when qpdf writes a file using ``QPDFWriter``,
|
||||||
|
``QPDFWriter`` does its own numbering and, in general, does not
|
||||||
|
preserve input object numbers. That means that a qpdf JSON file that
|
||||||
|
is used to update an existing PDF must have object numbers that
|
||||||
|
match the input file it is modifying. In practical terms, this means
|
||||||
|
that you can't use a JSON file created from one PDF file to modify
|
||||||
|
the *output of running qpdf on that file*.
|
||||||
|
|
||||||
|
To put this more concretely, the following is valid:
|
||||||
|
|
||||||
|
::

   qpdf --json-output in.pdf pdf.json
   # edit pdf.json
   qpdf in.pdf out.pdf --update-from-json=pdf.json
|
||||||
|
|
||||||
|
The following will not produce predictable results because
|
||||||
|
``out.pdf`` won't have the same object numbers as ``pdf.json`` and
|
||||||
|
``in.pdf``.
|
||||||
|
|
||||||
|
::

   qpdf --json-output in.pdf pdf.json
   # edit pdf.json
   qpdf in.pdf out.pdf --update-from-json=pdf.json
   # edit pdf.json again
   # Don't do this
   qpdf out.pdf out2.pdf --update-from-json=pdf.json
|
||||||
|
|
||||||
|
- When updating from a JSON file (:qpdf:ref:`--update-from-json`,
|
||||||
|
``QPDF::updateFromJSON``), existing objects are updated in place.
|
||||||
|
This has the following implications:
|
||||||
|
|
||||||
|
- You may omit both ``"data"`` and ``"datafile"`` if the object you
|
||||||
|
are updating is already a stream. In that case the original stream
|
||||||
|
data is preserved. You must always provide a stream dictionary,
|
||||||
|
but it may be empty. Note that an empty stream dictionary will
|
||||||
|
clear the old dictionary. There is no way to indicate that an old
|
||||||
|
stream dictionary should be left alone, so if your intention is to
|
||||||
|
replace the stream data and preserve the dictionary, the
|
||||||
|
original dictionary must appear in the JSON file.
|
||||||
|
|
||||||
|
- You can change one object type to another object type including
|
||||||
|
replacing a stream with a non-stream or a non-stream with a
|
||||||
|
stream. If you replace a non-stream with a stream, you must
|
||||||
|
provide data for the stream.
|
||||||
|
|
||||||
|
- Objects that you do not wish to modify can be omitted from the
|
||||||
|
JSON. That includes the trailer. That means you can use a qpdf JSON
file that was written using :qpdf:ref:`--json-object` so that it
includes only the objects you intend to modify.
|
||||||
|
|
||||||
|
- You can omit the ``"pdfversion"`` key. The input PDF version will
|
||||||
|
be preserved.
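
The update rules above suggest that an update file can be quite small. The following Python sketch builds one from a parsed qpdf JSON document. It is a sketch under stated assumptions, not a qpdf API: ``make_update`` and ``keep_stream_data`` are hypothetical helpers, ``/MyKey`` is a made-up dictionary key, and the result is assumed to be acceptable to ``--update-from-json`` because ``"pdfversion"`` and unmodified objects may be omitted.

```python
import copy


def make_update(doc, keys):
    """Return a qpdf JSON document holding only the objects named in keys."""
    objects = doc["qpdf-v2"]["objects"]
    return {"qpdf-v2": {"objects": {k: copy.deepcopy(objects[k]) for k in keys}}}


def keep_stream_data(stream_obj, dict_changes):
    """Build a stream entry that changes dictionary keys but omits the data."""
    new_dict = dict(stream_obj["stream"]["dict"])  # carry over the original keys
    new_dict.update(dict_changes)
    return {"stream": {"dict": new_dict}}  # no "data" or "datafile" key


original = {
    "qpdf-v2": {
        "pdfversion": "1.3",
        "maxobjectid": 5,
        "objects": {
            "obj:3 0 R": {"value": {"/MediaBox": [0, 0, 612, 792],
                                    "/Type": "/Page"}},
            "obj:4 0 R": {"stream": {"data": "AAAA",
                                     "dict": {"/Filter": "/FlateDecode"}}},
        },
    }
}

# Change the page size of object 3 and add a key to the stream dictionary of
# object 4 while leaving its existing stream data alone.
update = make_update(original, ["obj:3 0 R"])
update["qpdf-v2"]["objects"]["obj:3 0 R"]["value"]["/MediaBox"] = [0, 0, 595, 842]
update["qpdf-v2"]["objects"]["obj:4 0 R"] = keep_stream_data(
    original["qpdf-v2"]["objects"]["obj:4 0 R"], {"/MyKey": "u:note"}
)
```

Note that ``keep_stream_data`` copies the original dictionary before applying changes, since an empty or partial dictionary in the update would replace the old one.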
|
||||||
|
|
||||||
|
.. _json.workflow-cli:
|
||||||
|
|
||||||
|
qpdf JSON Workflow: CLI
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
This section includes a few examples of using qpdf JSON.
|
||||||
|
|
||||||
|
- Convert a PDF file to JSON format, edit the JSON, and convert back
|
||||||
|
to PDF. This is an alternative to using QDF mode (see :ref:`qdf`) to
|
||||||
|
modify PDF files in a text editor. Each method has its own
|
||||||
|
advantages and disadvantages.
|
||||||
|
|
||||||
|
::

   qpdf --json-output in.pdf pdf.json
   # edit pdf.json
   qpdf --json-input pdf.json out.pdf
|
||||||
|
|
||||||
|
- Extract only a specific object into a JSON file, modify the object
|
||||||
|
in JSON, and use the modified object to update the original PDF. In
|
||||||
|
this case, we're editing object 4, whatever that may happen to be.
|
||||||
|
You would have to know through some other means which object you
|
||||||
|
wanted to edit, such as by looking at other JSON output or using a
|
||||||
|
tool (possibly but not necessarily qpdf) to identify the object.
|
||||||
|
|
||||||
|
::

   qpdf --json-output in.pdf pdf.json --json-object=4,0
   # edit pdf.json
   qpdf in.pdf --update-from-json=pdf.json out.pdf
|
||||||
|
|
||||||
|
Rather than using :qpdf:ref:`--json-object` as in the above example,
|
||||||
|
you could edit the JSON file to remove the objects you didn't need.
|
||||||
|
You could also just leave them there, though the update process
|
||||||
|
would be slower.
|
||||||
|
|
||||||
|
You could also add new objects to a file by adding them to
|
||||||
|
``pdf.json``. Just be sure the object number doesn't conflict with
|
||||||
|
an existing object. The ``"maxobjectid"`` field in the original
|
||||||
|
output can help with this. You don't have to update it if you add
|
||||||
|
objects as it is ignored when the file is read back in.
|
||||||
|
|
||||||
|
- Use :qpdf:ref:`--json-input` and :qpdf:ref:`--json-output` together
|
||||||
|
to demonstrate preservation of object numbers. In this example,
|
||||||
|
``a.json`` and ``b.json`` will have the same objects and object
|
||||||
|
numbers. The files may not be identical since strings may be
|
||||||
|
normalized, fields may appear in a different order, etc. However,
|
||||||
|
``b.json`` and ``c.json`` are probably identical.
|
||||||
|
|
||||||
|
::

   qpdf --json-output in.pdf a.json
   qpdf --json-input --json-output a.json b.json
   qpdf --json-input --json-output b.json c.json
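
One way to check that two qpdf JSON files have the same objects and object numbers, even when their text differs, is to compare them after parsing. The Python sketch below uses two small made-up documents in place of ``a.json`` and ``b.json``; ``object_keys`` is a hypothetical helper.

```python
import json


def object_keys(doc):
    """Return the set of keys ("obj:O G R" and "trailer") under "objects"."""
    return set(doc["qpdf-v2"]["objects"])


# Same content, but the fields appear in a different order.
a_text = ('{"qpdf-v2": {"pdfversion": "1.3", "maxobjectid": 1, "objects": '
          '{"obj:1 0 R": {"value": {"/Pages": "2 0 R", "/Type": "/Catalog"}}}}}')
b_text = ('{"qpdf-v2": {"maxobjectid": 1, "pdfversion": "1.3", "objects": '
          '{"obj:1 0 R": {"value": {"/Type": "/Catalog", "/Pages": "2 0 R"}}}}}')

a = json.loads(a_text)
b = json.loads(b_text)

same_numbers = object_keys(a) == object_keys(b)  # same object numbers
same_content = a == b                            # dict equality ignores key order
```

A comparison like this ignores field order but not string normalization, so two files that differ only in how a string is spelled (for example ``u:`` versus ``b:``) would still compare as different.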
|
||||||
|
|
||||||
|
|
||||||
|
.. _json.workflow-api:
|
||||||
|
|
||||||
|
qpdf JSON Workflow: API
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Everything that can be done using the qpdf CLI can be done using the
|
||||||
|
C++ API. See comments in :file:`QPDF.hh` for ``writeJSON``,
|
||||||
|
``createFromJSON``, and ``updateFromJSON`` for details.
|
||||||
|
|
||||||
.. _json-guarantees:
|
.. _json-guarantees:
|
||||||
|
|
||||||
JSON Guarantees
|
JSON Compatibility Guarantees
|
||||||
---------------
|
-----------------------------
|
||||||
|
|
||||||
The qpdf JSON representation includes a JSON serialization of the raw
|
The qpdf JSON representation includes a JSON serialization of the raw
|
||||||
objects in the PDF file as well as some computed information in a more
|
objects in the PDF file as well as some computed information in a more
|
||||||
@ -37,24 +553,23 @@ format. These guarantees are designed to simplify the experience of a
|
|||||||
developer working with the JSON format.
|
developer working with the JSON format.
|
||||||
|
|
||||||
Compatibility
|
Compatibility
|
||||||
The top-level JSON object output is a dictionary. The JSON output
|
The top-level JSON object is a dictionary (JSON "object"). The JSON
|
||||||
contains various nested dictionaries and arrays. With the exception
|
output contains various nested dictionaries and arrays. With the
|
||||||
of dictionaries that are populated by the fields of objects from the
|
exception of dictionaries that are populated by the fields of
|
||||||
file, all instances of a dictionary are guaranteed to have exactly
|
PDF objects from the file, all instances of a dictionary are
|
||||||
the same keys. Future versions of qpdf are free to add additional
|
guaranteed to have exactly the same keys.
|
||||||
keys but not to remove keys or change the type of object that a key
|
|
||||||
points to. The qpdf program validates this guarantee, and in the
|
|
||||||
unlikely event that a bug in qpdf should cause it to generate data
|
|
||||||
that doesn't conform to this rule, it will ask you to file a bug
|
|
||||||
report.
|
|
||||||
|
|
||||||
The top-level JSON structure contains a "``version``" key whose value
|
The top-level JSON structure contains a ``"version"`` key whose
|
||||||
is simple integer. The value of the ``version`` key will be
|
value is a simple integer. The value of the ``version`` key will be
|
||||||
incremented if a non-compatible change is made. A non-compatible
|
incremented if a non-compatible change is made. A non-compatible
|
||||||
change would be any change that involves removal of a key, a change
|
change would be any change that involves removal of a key, a change
|
||||||
to the format of data pointed to by a key, or a semantic change that
|
to the format of data pointed to by a key, or a semantic change
|
||||||
requires a different interpretation of a previously existing key. A
|
that requires a different interpretation of a previously existing
|
||||||
strong effort will be made to avoid breaking compatibility.
|
key.
|
||||||
|
|
||||||
|
With a specific qpdf JSON version, future versions of qpdf are free
|
||||||
|
to add additional keys but not to remove keys or change the type of
|
||||||
|
object that a key points to.
|
||||||
|
|
||||||
Documentation
|
Documentation
|
||||||
The :command:`qpdf` command can be invoked with the
|
The :command:`qpdf` command can be invoked with the
|
||||||
@ -66,28 +581,29 @@ Documentation
|
|||||||
|
|
||||||
- A dictionary in the help output means that the corresponding
|
- A dictionary in the help output means that the corresponding
|
||||||
location in the actual JSON output is also a dictionary with
|
location in the actual JSON output is also a dictionary with
|
||||||
exactly the same keys; that is, no keys present in help are absent
|
exactly the same keys; that is, no keys present in help are
|
||||||
in the real output, and no keys will be present in the real output
|
absent in the real output, and no keys will be present in the
|
||||||
that are not in help. As a special case, if the dictionary has a
|
real output that are not in help. It is possible for a key to be
|
||||||
single key whose name starts with ``<`` and ends with ``>``, it
|
present and have a value that is explicitly ``null``. As a
|
||||||
means that the JSON output is a dictionary that can have any keys,
|
special case, if the dictionary has a single key whose name
|
||||||
each of which conforms to the value of the special key. This is
|
starts with ``<`` and ends with ``>``, it means that the JSON
|
||||||
used for cases in which the keys of the dictionary are things like
|
output is a dictionary that can have any value as a key. This is
|
||||||
object IDs.
|
used for cases in which the keys of the dictionary are things
|
||||||
|
like object IDs.
|
||||||
|
|
||||||
- A string in the help output is a description of the item that
|
- A string in the help output is a description of the item that
|
||||||
appears in the corresponding location of the actual output. The
|
appears in the corresponding location of the actual output. The
|
||||||
corresponding output can have any format.
|
corresponding output can have any value including ``null``.
|
||||||
|
|
||||||
- An array in the help output always contains a single element. It
|
- An array in the help output always contains a single element. It
|
||||||
indicates that the corresponding location in the actual output is
|
indicates that the corresponding location in the actual output is
|
||||||
also an array, and that each element of the array has whatever
|
an array of any length, and that each element of the array has
|
||||||
format is implied by the single element of the help output's
|
whatever format is implied by the single element of the help
|
||||||
array.
|
output's array.
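
The three rules above can be turned into a small checker. The Python sketch below is an illustration, not qpdf's own validation code: ``conforms`` is a hypothetical function, the description strings in ``help_schema`` are invented, and it assumes that values under a ``<...>`` wildcard key are checked against the value of that special key.

```python
def conforms(schema, value):
    """Check a JSON value against a help-output style description."""
    if isinstance(schema, str):
        # A string is only a description; any value, including None, is fine.
        return True
    if isinstance(schema, list):
        # One-element array: the output is an array of any length whose
        # elements all match that single element.
        return isinstance(value, list) and all(
            conforms(schema[0], item) for item in value
        )
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return False
        keys = list(schema)
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            # Wildcard key: any keys are allowed in the output.
            return all(conforms(schema[keys[0]], v) for v in value.values())
        # Otherwise the output must have exactly the same keys.
        return set(value) == set(schema) and all(
            conforms(schema[k], value[k]) for k in schema
        )
    return False


help_schema = {"pagelabels": [{"index": "description of index",
                               "label": "description of label"}]}
good = {"pagelabels": [{"index": 0, "label": None}, {"index": 3, "label": {}}]}
bad = {"pagelabels": [{"index": 0}]}  # "label" key is missing

wildcard_schema = {"<object id>": {"kind": "description"}}
wildcard_value = {"1 0 R": {"kind": "x"}, "2 0 R": {"kind": None}}
```

Here ``good`` and ``wildcard_value`` conform, while ``bad`` does not because one of its dictionaries lacks a key that the help output lists.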
|
||||||
|
|
||||||
For example, the help output indicates includes a "``pagelabels``"
|
For example, the help output includes a ``"pagelabels"``
|
||||||
key whose value is an array of one element. That element is a
|
key whose value is an array of one element. That element is a
|
||||||
dictionary with keys "``index``" and "``label``". In addition to
|
dictionary with keys ``"index"`` and ``"label"``. In addition to
|
||||||
describing the meaning of those keys, this tells you that the actual
|
describing the meaning of those keys, this tells you that the actual
|
||||||
JSON output will contain a ``pagelabels`` array, each of whose
|
JSON output will contain a ``pagelabels`` array, each of whose
|
||||||
elements is a dictionary that contains an ``index`` key, a ``label``
|
elements is a dictionary that contains an ``index`` key, a ``label``
|
||||||
@ -95,56 +611,13 @@ Documentation
|
|||||||
|
|
||||||
Directness and Simplicity
|
Directness and Simplicity
|
||||||
The JSON output contains the value of every object in the file, but
|
The JSON output contains the value of every object in the file, but
|
||||||
it also contains some processed data. This is analogous to how qpdf's
|
it also contains some summary data. This is analogous to how qpdf's
|
||||||
library interface works. The processed data is similar to the helper
|
library interface works. The summary data is similar to the helper
|
||||||
functions in that it allows you to look at certain aspects of the PDF
|
functions in that it allows you to look at certain aspects of the
|
||||||
file without having to understand all the nuances of the PDF
|
PDF file without having to understand all the nuances of the PDF
|
||||||
specification, while the raw objects allow you to mine the PDF for
|
specification, while the raw objects allow you to mine the PDF for
|
||||||
anything that the higher-level interfaces are lacking.
|
anything that the higher-level interfaces are lacking.
|
||||||
|
|
||||||
.. _json.limitations:
|
|
||||||
|
|
||||||
Limitations of JSON Representation
|
|
||||||
----------------------------------
|
|
||||||
|
|
||||||
There are a few limitations to be aware of with the JSON structure:
|
|
||||||
|
|
||||||
- Strings, names, and indirect object references in the original PDF
|
|
||||||
file are all converted to strings in the JSON representation. In the
|
|
||||||
case of a "normal" PDF file, you can tell the difference because a
|
|
||||||
name starts with a slash (``/``), and an indirect object reference
|
|
||||||
looks like ``n n R``, but if there were to be a string that looked
|
|
||||||
like a name or indirect object reference, there would be no way to
|
|
||||||
tell this from the JSON output. Note that there are certain cases
|
|
||||||
where you know for sure what something is, such as knowing that
|
|
||||||
dictionary keys in objects are always names and that certain things
|
|
||||||
in the higher-level computed data are known to contain indirect
|
|
||||||
object references.
|
|
||||||
|
|
||||||
- The JSON format doesn't support binary data very well. Mostly the
|
|
||||||
details are not important, but they are presented here for
|
|
||||||
information. When qpdf outputs a string in the JSON representation,
|
|
||||||
it converts the string to UTF-8, assuming usual PDF string semantics.
|
|
||||||
Specifically, if the original string is UTF-16, it is converted to
|
|
||||||
UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
|
|
||||||
converted to UTF-8 with that assumption. This causes strange things
|
|
||||||
to happen to binary strings. For example, if you had the binary
|
|
||||||
string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
|
|
||||||
because ``03`` is not a printable character and ``80`` is the bullet
|
|
||||||
character in PDF doc encoding and is mapped to the Unicode value
|
|
||||||
``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
|
|
||||||
convert back from here to a binary string, would have to recognize
|
|
||||||
Unicode values whose code points are higher than ``0xFF`` and map
|
|
||||||
those back to their corresponding PDF doc encoding characters. There
|
|
||||||
is no way to tell the difference between a Unicode string that was
|
|
||||||
originally encoded as UTF-16 or one that was converted from PDF doc
|
|
||||||
encoding. In other words, it's best if you don't try to use the JSON
|
|
||||||
format to extract binary strings from the PDF file, but if you really
|
|
||||||
had to, it could be done. Note that qpdf's
|
|
||||||
:qpdf:ref:`--show-object` option does not have this
|
|
||||||
limitation and will reveal the string as encoded in the original
|
|
||||||
file.
|
|
||||||
|
|
||||||
.. _json.considerations:
|
.. _json.considerations:
|
||||||
|
|
||||||
JSON: Special Considerations
|
JSON: Special Considerations
|
||||||
@ -157,12 +630,15 @@ be aware of:
|
|||||||
- If a PDF file has certain types of errors in its pages tree (such as
|
- If a PDF file has certain types of errors in its pages tree (such as
|
||||||
page objects that are direct or multiple pages sharing the same
|
page objects that are direct or multiple pages sharing the same
|
||||||
object ID), qpdf will automatically repair the pages tree. If you
|
object ID), qpdf will automatically repair the pages tree. If you
|
||||||
specify ``"objects"`` and/or ``"objectinfo"`` without any other
|
specify ``"objects"`` (and, with qpdf JSON version 1, also
|
||||||
keys, you will see the original pages tree without any corrections.
|
``"objectinfo"``) without any other keys, you will see the original
|
||||||
If you specify any of keys that require page tree traversal (for
|
pages tree without any corrections. If you specify any of the keys that
|
||||||
example, ``"pages"``, ``"outlines"``, or ``"pagelabel"``), then
|
require page tree traversal (for example, ``"pages"``,
|
||||||
``"objects"`` and ``"objectinfo"`` will show the repaired page tree
|
``"outlines"``, or ``"pagelabel"``), then ``"objects"`` (and
|
||||||
so that object references will be consistent throughout the file.
|
``"objectinfo"``) will show the repaired page tree so that object
|
||||||
|
references will be consistent throughout the file. This is not an
|
||||||
|
issue with :qpdf:ref:`--json-output`, which doesn't repair the pages
|
||||||
|
tree.
|
||||||
|
|
||||||
- While qpdf guarantees that keys present in the help will be present
|
- While qpdf guarantees that keys present in the help will be present
|
||||||
in the output, those fields may be null or empty if the information
|
in the output, those fields may be null or empty if the information
|
||||||
@ -177,22 +653,128 @@ be aware of:
|
|||||||
1. Note that JSON indexes from 0, and you would also use 0-based
|
1. Note that JSON indexes from 0, and you would also use 0-based
|
||||||
indexing using the API. However, 1-based indexing is easier in this
|
indexing using the API. However, 1-based indexing is easier in this
|
||||||
case because the command-line syntax for specifying page ranges is
|
case because the command-line syntax for specifying page ranges is
|
||||||
1-based. If you were going to write a program that looked through the
|
1-based. If you were going to write a program that looked through
|
||||||
JSON for information about specific pages and then use the
|
the JSON for information about specific pages and then use the
|
||||||
command-line to extract those pages, 1-based indexing is easier.
|
command-line to extract those pages, 1-based indexing is easier.
|
||||||
Besides, it's more convenient to subtract 1 from a program in a real
|
Besides, it's more convenient to subtract 1 in a real programming
|
||||||
programming language than it is to add 1 from shell code.
|
language than it is to add 1 in shell code.
|
||||||
|
|
||||||
- The image information included in the ``page`` section of the JSON
|
- The image information included in the ``page`` section of the JSON
|
||||||
output includes the key "``filterable``". Note that the value of this
|
output includes the key ``"filterable"``. Note that the value of
|
||||||
field may depend on the :qpdf:ref:`--decode-level` that
|
this field may depend on the :qpdf:ref:`--decode-level` that you
|
||||||
you invoke qpdf with. The JSON output includes a top-level key
|
invoke qpdf with. The JSON output includes a top-level key
|
||||||
"``parameters``" that indicates the decode level used for computing
|
``"parameters"`` that indicates the decode level that was used for
|
||||||
whether a stream was filterable. For example, jpeg images will be
|
computing whether a stream was filterable. For example, jpeg images
|
||||||
shown as not filterable by default, but they will be shown as
|
will be shown as not filterable by default, but they will be shown
|
||||||
filterable if you run :command:`qpdf --json
|
as filterable if you run :command:`qpdf --json
|
||||||
--decode-level=all`.
|
--decode-level=all`.
|
||||||
|
|
||||||
- The ``encrypt`` key's values will be populated for non-encrypted
|
- The ``encrypt`` key's values will be populated for non-encrypted
|
||||||
files. Some values will be null, and others will have values that
|
files. Some values will be null, and others will have values that
|
||||||
apply to unencrypted files.
|
apply to unencrypted files.
|
||||||
|
|
||||||
|
- The qpdf library itself never loads an entire PDF into memory. This
|
||||||
|
remains true for PDF files represented in JSON format. In general,
|
||||||
|
qpdf will hold the entire object structure in memory once a file has
|
||||||
|
been fully read (objects are loaded into memory lazily but stay
|
||||||
|
there once loaded), but it will never have more than two copies of a
|
||||||
|
stream in memory at once. That said, if you ask qpdf to write JSON
|
||||||
|
to memory, it will do so, so be careful about this if you are
|
||||||
|
working with very large PDF files. There is nothing in the qpdf
|
||||||
|
library itself that prevents working with PDF files much larger than
|
||||||
|
available system memory. qpdf can both read and write such files in
|
||||||
|
JSON format. If you need to work with a PDF file's JSON
|
||||||
|
representation in memory, it is recommended that you use either
|
||||||
|
``none`` or ``file`` as the argument to
|
||||||
|
:qpdf:ref:`--json-stream-data`, or if using the API, use
|
||||||
|
``qpdf_sj_none`` or ``qpdf_sj_file`` as the json stream data value.
|
||||||
|
If using ``none``, you can use other means to obtain the stream
|
||||||
|
data.
|
||||||
|
|
||||||
|
.. _json-v2-changes:
|
||||||
|
|
||||||
|
Changes from JSON v1 to v2
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
The following changes were made to qpdf's JSON output format for
|
||||||
|
version 2.
|
||||||
|
|
||||||
|
- The representation of objects has changed. For details, see
|
||||||
|
:ref:`json.objects`.
|
||||||
|
|
||||||
|
- The representation of strings is now unambiguous for all strings.
|
||||||
|
Strings are prefixed with either ``u:`` for Unicode strings or
|
||||||
|
``b:`` for byte strings.
|
||||||
|
|
||||||
|
- Names are shown in qpdf's canonical form rather than in PDF
|
||||||
|
syntax. (Example: the PDF-syntax name ``/text#2fplain`` appeared
|
||||||
|
as ``"/text#2fplain"`` in v1 but appears as ``"/text/plain"`` in
|
||||||
|
v2.)
|
||||||
|
|
||||||
|
- The top-level representation of an object in ``"objects"`` is a
|
||||||
|
dictionary containing either a ``"value"`` key or a ``"stream"``
|
||||||
|
key, making it possible to distinguish streams from other objects.
|
||||||
|
|
||||||
|
- The ``"objectinfo"`` key has been removed in favor of a
|
||||||
|
representation in ``"objects"`` that differentiates between a stream
|
||||||
|
and other kinds of objects. In v1, it was not possible to tell a
|
||||||
|
stream from a dictionary within ``"objects"``.
|
||||||
|
|
||||||
|
- Within the ``"objects"`` dictionary, keys are now ``"obj:O G R"``
|
||||||
|
where ``O`` and ``G`` are the object and generation number.
|
||||||
|
``"trailer"`` remains the key for the trailer dictionary. In v1, the
|
||||||
|
``obj:`` prefix was not present. The rationale for this change is as
|
||||||
|
follows:
|
||||||
|
|
||||||
|
- Having a unique prefix (``obj:``) makes it much easier to search
|
||||||
|
in the JSON file for the definition of an object
|
||||||
|
|
||||||
|
- Having the key still contain ``O G R`` makes it much easier to
|
||||||
|
construct the key from an indirect reference. You just have to
|
||||||
|
prepend ``obj:``. There is no need to parse the indirect object
|
||||||
|
reference.
|
||||||
|
|
||||||
|
- In the ``"encrypt"`` object, the ``"modifyannotations"`` key was
|
||||||
|
misspelled as ``"moddifyannotations"`` in v1. This has been
|
||||||
|
corrected.
|
||||||
|
|
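The changes listed above can be illustrated with a minimal Python sketch. It is not part of qpdf: the helper names (``canonical_name``, ``object_key``, ``string_kind``, ``is_stream``) and the sample ``objects`` dictionary are hypothetical. It relies only on the rules stated above: the ``obj:`` key prefix, the ``"value"`` and ``"stream"`` keys, the ``u:`` and ``b:`` string prefixes, and the ``/text#2fplain`` example.

```python
import re


def canonical_name(pdf_name: str) -> str:
    """Decode #xx escapes in a PDF-syntax name (simplified: treats each
    escaped byte as a single character, which is enough for ASCII names)."""
    return re.sub(
        r"#([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        pdf_name,
    )


def object_key(indirect_ref: str) -> str:
    """Build the v2 "objects" key from an indirect reference such as "4 0 R".
    No parsing of the reference is needed; the prefix is simply prepended."""
    return "obj:" + indirect_ref


def string_kind(json_string: str) -> str:
    """Classify a v2 string by its prefix."""
    if json_string.startswith("u:"):
        return "unicode"
    if json_string.startswith("b:"):
        return "bytes"
    return "other"  # e.g. a name or an indirect object reference


def is_stream(entry: dict) -> bool:
    """A top-level object entry has either a "value" key or a "stream" key."""
    return "stream" in entry


# Hypothetical sample data shaped like the v2 "objects" dictionary.
objects = {
    "obj:1 0 R": {"value": {"/Type": "/Catalog", "/Pages": "2 0 R"}},
    "obj:4 0 R": {"stream": {}},
    "trailer": {"value": {"/Root": "1 0 R"}},
}

root_ref = objects["trailer"]["value"]["/Root"]
root = objects[object_key(root_ref)]
```

Following ``/Root`` from the trailer requires only string concatenation, which is the rationale given above for keeping ``O G R`` in the key.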
||||||
|
Motivation for qpdf JSON version 2
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
qpdf JSON version 2 was created to make it possible to manipulate PDF
|
||||||
|
files using JSON syntax instead of native PDF syntax. This makes it
|
||||||
|
possible to make low-level updates to PDF files from just about any
|
||||||
|
programming language or even to do so from the command-line using
|
||||||
|
tools like ``jq`` or any editor that's capable of working with JSON
|
||||||
|
files. There were several limitations of JSON format version 1 that
|
||||||
|
made this impossible:
|
||||||
|
|
||||||
|
- Strings, names, and indirect object references in the original PDF
|
||||||
|
file were all converted to strings in the JSON representation. For
|
||||||
|
casual human inspection, this was fine, but in the general case,
|
||||||
|
there was no way to tell the difference between a string that looked
|
||||||
|
like a name or indirect object reference and an actual name or
|
||||||
|
indirect object reference.
|
||||||
|
|
||||||
|
- PDF strings were not unambiguously represented in the JSON format.
|
||||||
|
The way qpdf JSON v1 represented a string was to try to convert the
|
||||||
|
string to UTF-8. This was done by assuming a string that was not
|
||||||
|
explicitly marked as Unicode was encoded in PDF doc encoding. The
|
||||||
|
problem is that there is not a perfect bidirectional mapping between
|
||||||
|
Unicode and PDF doc encoding, so if a binary string happened to
|
||||||
|
contain characters that couldn't be bidirectionally mapped, there
|
||||||
|
would be no way to get back to the original PDF string. Even when
|
||||||
|
possible, trying to map from the JSON representation of a binary
|
||||||
|
string back to the original string required knowledge of the mapping
|
||||||
|
between PDF doc encoding and Unicode.
|
||||||
|
|
||||||
|
- There was no representation of stream data. If you wanted to extract
|
||||||
|
stream data, you could use :qpdf:ref:`--show-object`, so this wasn't
|
||||||
|
that important for inspection, but it was a blocker for being able
|
||||||
|
to go from JSON back to PDF. qpdf JSON version 2 allows stream data
|
||||||
|
to be included inline as base64-encoded data. There is also an
|
||||||
|
option to write all stream data to external files, which makes it
|
||||||
|
possible to work with very large PDF files in JSON format even with
|
||||||
|
tools that try to read the entire JSON structure into memory.
|
||||||
|
|
||||||
|
- The PDF version from the PDF header was not represented in qpdf JSON v1.
|
||||||
|
@ -70,12 +70,14 @@ Python
|
|||||||
qpdf's capabilities with other functionality provided by Python's
|
qpdf's capabilities with other functionality provided by Python's
|
||||||
rich standard library and available modules.
|
rich standard library and available modules.
|
||||||
|
|
||||||
Other Languages
|
Other Languages
Starting with version 11.0.0, the :command:`qpdf`
|
||||||
Starting with version 8.3.0, the :command:`qpdf`
|
command-line tool can produce an unambiguous JSON representation of
|
||||||
command-line tool can produce a JSON representation of the PDF file's
|
a PDF file and can also create or update PDF files using this JSON
|
||||||
non-content data. This can facilitate interacting programmatically
|
representation. qpdf versions from 8.3.0 through 10.6.3 had a more
|
||||||
with PDF files through qpdf's command line interface. For more
|
limited JSON output format. The qpdf JSON format makes it possible
|
||||||
information, please see :ref:`json`.
|
to inspect and modify the structure of a PDF file down to the
|
||||||
|
object level from the command-line or from any language that can
|
||||||
|
handle JSON data. Please see :ref:`json` for details.
|
||||||
|
|
||||||
Wrappers
|
Wrappers
|
||||||
The `qpdf Wiki <https://github.com/qpdf/qpdf/wiki>`__ contains a
|
The `qpdf Wiki <https://github.com/qpdf/qpdf/wiki>`__ contains a
|
||||||
|
@ -122,7 +122,7 @@ entries in ``/W`` above. Each entry consists of one or more fields, the
|
|||||||
first of which is the type of the field. The number of bytes for each
|
first of which is the type of the field. The number of bytes for each
|
||||||
field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
|
field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
|
||||||
is omitted and has the default value. The default value for the field
|
is omitted and has the default value. The default value for the field
|
||||||
type is "``1``". All other default values are "``0``".
|
type is ``1``. All other default values are ``0``.
|
||||||
|
|
||||||
PDF 1.5 has three field types:
|
PDF 1.5 has three field types:
|
||||||
|
|
||||||
|
@ -28,6 +28,13 @@ able to restore edited files to a correct state. The
|
|||||||
arguments. It reads a possibly edited QDF file from standard input and
|
arguments. It reads a possibly edited QDF file from standard input and
|
||||||
writes a repaired file to standard output.
|
writes a repaired file to standard output.
|
||||||
|
|
||||||
|
For another way to work with PDF files in an editor, see :ref:`json`.
|
||||||
|
Using qpdf JSON format allows you to edit the PDF file semantically
|
||||||
|
without having to be concerned about PDF syntax. However, QDF files
|
||||||
|
are actually valid PDF files, so the feedback cycle may be faster if
|
||||||
|
previewing with a PDF reader. Also, since QDF files are valid PDF, you
|
||||||
|
can experiment with all aspects of the PDF file, including syntax.
|
||||||
|
|
||||||
The following attributes characterize a QDF file:
|
The following attributes characterize a QDF file:
|
||||||
|
|
||||||
- All objects appear in numerical order in the PDF file, including when
|
- All objects appear in numerical order in the PDF file, including when
|
||||||
|
@ -27,6 +27,10 @@ executable is available from inside the C++ library using the
|
|||||||
|
|
||||||
- Use from the C API with ``qpdfjob_run_from_json`` from :file:`qpdfjob-c.h`
|
- Use from the C API with ``qpdfjob_run_from_json`` from :file:`qpdfjob-c.h`
|
||||||
|
|
||||||
|
- Note: this is unrelated to :qpdf:ref:`--json` but can be combined
|
||||||
|
with it. For more information on qpdf JSON (vs. QPDFJob JSON), see
|
||||||
|
:ref:`json`.
|
||||||
|
|
||||||
- The ``QPDFJob`` C++ API
|
- The ``QPDFJob`` C++ API
|
||||||
|
|
||||||
If you can understand how to use the :command:`qpdf` CLI, you can
|
If you can understand how to use the :command:`qpdf` CLI, you can
|
||||||
|
@ -60,7 +60,8 @@ For a detailed list of changes, please see the file
|
|||||||
- CLI: breaking changes
|
- CLI: breaking changes
|
||||||
|
|
||||||
- The default json output version when :qpdf:ref:`--json` is
|
- The default json output version when :qpdf:ref:`--json` is
|
||||||
specified has been changed from ``1`` to ``latest``.
|
specified has been changed from ``1`` to ``latest``, which is
|
||||||
|
now ``2``.
|
||||||
|
|
||||||
- The :qpdf:ref:`--allow-weak-crypto` flag is now mandatory when
|
- The :qpdf:ref:`--allow-weak-crypto` flag is now mandatory when
|
||||||
explicitly creating files with weak cryptographic algorithms.
|
explicitly creating files with weak cryptographic algorithms.
|
||||||
@ -100,7 +101,7 @@ For a detailed list of changes, please see the file
|
|||||||
|
|
||||||
- ``qpdf --list-attachments --verbose`` includes some additional
|
- ``qpdf --list-attachments --verbose`` includes some additional
|
||||||
information about attachments. Additional information about
|
information about attachments. Additional information about
|
||||||
attachments is also included in the ``attachments`` json key
|
attachments is also included in the ``attachments`` JSON key
|
||||||
with ``--json``.
|
with ``--json``.
|
||||||
|
|
||||||
- For encrypted files, ``qpdf --json`` reveals the user password
|
- For encrypted files, ``qpdf --json`` reveals the user password
|
||||||
@ -647,8 +648,8 @@ For a detailed list of changes, please see the file
|
|||||||
passwords from files or standard input than using
|
passwords from files or standard input than using
|
||||||
:samp:`@file` for this purpose.
|
:samp:`@file` for this purpose.
|
||||||
|
|
||||||
- Add some information about attachments to the json output, and
|
- Add some information about attachments to the JSON output, and
|
||||||
added ``attachments`` as an additional json key. The
|
added ``attachments`` as an additional JSON key. The
|
||||||
information included here is limited to the preferred name and
|
information included here is limited to the preferred name and
|
||||||
content stream and a reference to the file spec object. This is
|
content stream and a reference to the file spec object. This is
|
||||||
enough detail for clients to avoid the hassle of navigating a
|
enough detail for clients to avoid the hassle of navigating a
|
||||||
|