Update documentation for qpdf JSON v2

Jay Berkenbilt 2022-05-30 16:38:17 -04:00
parent b7bbf12e85
commit 0bd908b550
14 changed files with 901 additions and 417 deletions

201
TODO

@ -2,14 +2,13 @@
Next Next
==== ====
Before Release:
* At next release, hide release-qpdf-10.6.3.0cmake* versions at readthedocs * At next release, hide release-qpdf-10.6.3.0cmake* versions at readthedocs
* Stay on top of https://github.com/pikepdf/pikepdf/pull/315 * Stay on top of https://github.com/pikepdf/pikepdf/pull/315
* Release qtest with updates to qtest-driver and copy back into qpdf * Release qtest with updates to qtest-driver and copy back into qpdf
In order: Pending changes:
* json v2
Other (do in any order):
* Good C API for json v2 * Good C API for json v2
* QPDFPagesTree -- avoid ever flattening the pages tree. * QPDFPagesTree -- avoid ever flattening the pages tree.
@ -50,180 +49,10 @@ Other (do in any order):
* Rework tests so that nothing is written into the source directory. * Rework tests so that nothing is written into the source directory.
Ideally then the entire build could be done with a read-only Ideally then the entire build could be done with a read-only
source tree. source tree.
* Consider adding fuzzer code for JSON
Soon: Break ground on "Document-level work" Soon: Break ground on "Document-level work"
Output JSON v2
==============
Remaining work:
* Make sure all the information from informational options is
available in the json output.
* --check: add but maybe not by default?
* --show-linearization: add but maybe not by default? Also figure
out whether warnings reported for some of the PDF specs (1.7) are
qpdf problems. This may not be worth adding in the first
increment.
* --show-xref: add
* Consider having --check, --show-encryption, etc., just select the
right keys when in json mode. I don't think I want check on by
default, so that might be different.
* Consider having warnings be included in the json in a "warnings" key
in json mode.
Notes for documentation:
* Find all mentions of json in the manual and update.
* Document typo fix in encrypt in release notes along with any other
non-compatible json 2 changes. Scrutinize all the output to decide
what should change.
* Keys other than "qpdf-v2" are ignored so people can stash their own
stuff. Unknown keys are ignored at other places for future
compatibility. Readers of qpdf json should continue to ignore keys
they don't recognize.
* Change: names are written in canonical form with a leading slash
just as they are treated in the code. In v1, they were written in
PDF syntax in the json file. Example: /text#2fplain in pdf will be
written as /text/plain in json v2 and as /text#2fplain in json v1.
* Document changes to strings, objects, streams, object keys.
* CLI: --json-input, --json-output[=version], --update-from-json. With
--json-input, the input file is a JSON file instead of a PDF file.
It must be complete, meaning that a PDF version must be given, all
streams must have exactly one of data or datafile, and a trailer
dictionary must be present, even if empty.
With --update-from-json, the JSON file updates objects in place. If
updating an old stream, if stream data is omitted, the data remains
untouched. The dictionary is always required. Remember that
QPDFWriter does not preserve object numbers, though --json-output
does. Therefore, if you want to update a PDF with a JSON, the input
to --update-from-json must be the same PDF as the one that
--json-output was run on previously. Otherwise, object numbers won't
match. Show this with an example. When updating,
* Certain fields are ignored when reading the JSON. This includes
maxobjectid, any computed fields in trailer (such as /Size), and all
/Length keys in stream dictionaries. There is no need for the user
to correct, remove, or otherwise worry about any values those keys
might have. The maxobjectid field is present in the original output
to assist with adding new objects to the file.
* JSON strings within PDF objects:
* "n n R" is an indirect object
* "/Name" is a name in canonical form with a leading slash (like
"/text/plain"), not PDF syntax (like "/text#2fplain").
* "b:hex-digits" is a binary string ("b:feff03c0"). Hex digits may be
mixed case. There must be an even number of digits.
* "u:utf-8" is a UTF-8 encoded string ("u:π", "u:\u03c0"). UTF-16
surrogate pairs are allowed. These are all equivalent: "u:🥔",
"u:\ud83e\udd54", "b:FEFFD83EDD54", "b:efbbbff09fa594".
* Both "b:" and "u:" are valid representations of the empty string.
* Anything else is an error
* Document use of --json-input and --json-output together to show
preservation of object numbers. Draw attention to "original object
ID" comments in qdf as another way to show it.
* Document top-level keys of "qpdf-v2" ("pdfversion", "objects",
"maxobjectid") noting that "maxobjectid" is ignored when reading.
* Stream data: "data" is base64-encoded stream data. "datafile" is the
path to a file (relative path recommended but not required)
containing the binary data. As with any PDF representation, the data
must be consistent with the filters. --decode-level is honored by
--json-output.
* Other changes from v1:
* in "objects", keys are "obj:o g R" or "trailer"
* Non-stream objects are dictionaries with a "value" key whose value
is the object. Stream objects are dictionaries with a "stream" key
whose value is {"dict": stream-dictionary}. The "/Length" key is
omitted from the stream dictionary.
* "objectinfo" is gone as it is now possible to tell a stream from a
non-stream directly. To get stream data, use the --json-output
option. Note about how "pages" may cause the pages tree to be
corrected.
For non-streams:
"obj:o g R": {
"value": ...
}
For streams:
"obj:o g R": {
"stream": {
"dict": { ... stream dictionary ... },
"data": "base64-encoded data",
"datafile": "path to file containing binary data"
}
}
Rationale of "obj:o g R" is that indirect object references are just
"o g R", and so code that wants to resolve one can do so easily by
just prepending "obj:" and not having to parse or split the string.
Having a prefix rather than making the key just "o g R" makes it much
easier to search in the JSON for the definition of an object.
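The rationale above can be sketched in a few lines of Python. This is an illustration only, assuming the "objects" layout described in these notes ("obj:o g R" keys, "value" for non-streams, "stream" with "dict" for streams). The `resolve` helper and the sample data are hypothetical and are not part of qpdf.

```python
def resolve(objects, ref):
    """Look up an indirect reference such as "3 0 R" in a v2 "objects" map.

    Returns ("stream", stream_dictionary) or ("value", object_value).
    """
    # The key is the reference with "obj:" prepended; no parsing is needed.
    entry = objects["obj:" + ref]
    if "stream" in entry:
        return "stream", entry["stream"]["dict"]
    return "value", entry["value"]


# Hypothetical sample data shaped like the notes describe.
objects = {
    "obj:1 0 R": {"value": {"/Type": "/Catalog", "/Pages": "2 0 R"}},
    "obj:2 0 R": {"value": {"/Type": "/Pages", "/Kids": [], "/Count": 0}},
    "obj:3 0 R": {
        "stream": {
            "dict": {"/Filter": "/FlateDecode"},
            "datafile": "streams-3",
        }
    },
}

kind, catalog = resolve(objects, "1 0 R")
pages_kind, pages = resolve(objects, catalog["/Pages"])
```

Because a reference value and an object key differ only by the fixed prefix, following "/Pages" from the catalog is a single dictionary lookup.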
CLI:
Example workflow:
* qpdf in.pdf --json-output pdf.json
* edit pdf.json
* qpdf --json-input pdf.json out.pdf
* qpdf in.pdf --json-output pdf.json
* edit pdf.json keeping only objects that need to be changed
* qpdf in.pdf --update-from-json=pdf.json out.pdf
To modify a single object:
* qpdf in.pdf --json-output pdf.json --json-object=o,g
* edit pdf.json
* qpdf in.pdf --update-from-json=pdf.json out.pdf
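The "edit pdf.json keeping only objects that need to be changed" step could be scripted roughly as follows. This is a sketch under the assumptions in these notes: a top-level "qpdf-v2" key containing "pdfversion", "objects", and "maxobjectid". The key names in a released qpdf may differ, and `keep_only` is a hypothetical helper, not a qpdf API.

```python
import json


def keep_only(doc, wanted):
    """Return a copy of a qpdf JSON v2 document with only the wanted objects."""
    qpdf = dict(doc["qpdf-v2"])
    qpdf["objects"] = {
        key: value for key, value in qpdf["objects"].items() if key in wanted
    }
    trimmed = dict(doc)
    trimmed["qpdf-v2"] = qpdf
    return trimmed


# Hypothetical document shaped like the notes describe.
doc = {
    "qpdf-v2": {
        "pdfversion": "1.7",
        "maxobjectid": 2,
        "objects": {
            "obj:1 0 R": {"value": {"/Type": "/Catalog", "/Pages": "2 0 R"}},
            "obj:2 0 R": {"value": {"/Type": "/Pages", "/Kids": [], "/Count": 0}},
        },
    }
}

trimmed = keep_only(doc, {"obj:2 0 R"})
text = json.dumps(trimmed, indent=2)
```

The trimmed text would then be saved and passed to --update-from-json against the same PDF that produced it, since the notes above point out that object numbers must match.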
Historical note: you can't create a PDF from v1 json because
* The PDF version header is not recorded
* Strings cannot be unambiguously encoded/decoded
* Can't tell string from name from indirect object
* Strings are treated as PDF doc encoding and output as UTF-8, which
doesn't work since multiple PDF doc code points are undefined and
is absurd for binary strings
* There is no representation of stream data
* You can't tell a stream from a dictionary except by looking in both
"object" and "objectinfo".
* Using "n n R" as a key in "objects" and "objectinfo" makes it hard
to search for things when viewing the JSON file in an editor.
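In contrast to the v1 ambiguities listed above, the v2 string conventions in these notes ("n n R", "/Name", "b:hex-digits", "u:utf-8") can be told apart mechanically. The following Python sketch shows one way a reader might do that. `classify` is a hypothetical helper, and the rules it applies are only those stated in the notes.

```python
import json
import re

_REF = re.compile(r"[0-9]+ [0-9]+ R")
_HEX = re.compile(r"[0-9a-fA-F]*")


def classify(s):
    """Return (kind, payload) for a JSON string found inside a PDF object."""
    if _REF.fullmatch(s):
        obj, gen, _ = s.split(" ")
        return "reference", (int(obj), int(gen))
    if s.startswith("/"):
        # Names are already in canonical form, e.g. "/text/plain".
        return "name", s
    if s.startswith("u:"):
        return "text", s[2:]
    if s.startswith("b:"):
        digits = s[2:]
        # Mixed case is allowed, but the number of digits must be even.
        if len(digits) % 2 or not _HEX.fullmatch(digits):
            raise ValueError("invalid binary string: " + s)
        return "binary", bytes.fromhex(digits)
    raise ValueError("unrecognized string: " + s)


# A JSON parser resolves the surrogate pair before classify sees it.
potato = classify(json.loads('"u:\\ud83e\\udd54"'))
# The same text as UTF-16BE bytes with a leading FEFF marker.
kind, raw = classify("b:FEFFD83EDD54")
decoded = raw[2:].decode("utf-16-be")
```

The last three lines illustrate the equivalence mentioned in the notes: the "u:" form and the "b:" form with a UTF-16 byte order mark describe the same text.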
QPDFPagesTree QPDFPagesTree
============= =============
@ -256,6 +85,28 @@ sure /Count and /Parent are correct.
refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up
when done. when done.
Possible future JSON enhancements
=================================
* Add to JSON output the information available from a few additional
informational options:
* --check: add but maybe not by default?
* --show-linearization: add but maybe not by default? Also figure
out whether warnings reported for some of the PDF specs (1.7) are
qpdf problems. This may not be worth adding in the first
increment.
* --show-xref: add
* Consider having --check, --show-encryption, etc., just select the
right keys when in json mode. I don't think I want check on by
default, so that might be different.
* Consider having warnings be included in the json in a "warnings" key
in json mode.
QPDFJob QPDFJob
======= =======


@ -271,6 +271,7 @@
"mkinstalldirs", "mkinstalldirs",
"mklink", "mklink",
"moddate", "moddate",
"modifyannotations",
"monoseq", "monoseq",
"msvc", "msvc",
"msvcrt", "msvcrt",


@ -112,8 +112,11 @@ class QPDF
// Create a PDF from an input source that contains JSON as written // Create a PDF from an input source that contains JSON as written
// by writeJSON (or qpdf --json-output, version 2 or higher). The // by writeJSON (or qpdf --json-output, version 2 or higher). The
// JSON must be a complete representation of a PDF. See "QPDF JSON // JSON must be a complete representation of a PDF. See "qpdf
// Format" in the manual for details. // JSON" in the manual for details. The input JSON may be
// arbitrarily large. QPDF does not load stream data into memory
// for more than one stream at a time, even if the stream data is
// specified inline.
QPDF_DLL QPDF_DLL
void createFromJSON(std::string const& json_file); void createFromJSON(std::string const& json_file);
QPDF_DLL QPDF_DLL
@ -122,24 +125,40 @@ class QPDF
// Update a PDF from an input source that contains JSON in the // Update a PDF from an input source that contains JSON in the
// same format as is written by writeJSON (or qpdf --json-output, // same format as is written by writeJSON (or qpdf --json-output,
// version 2 or higher). Objects in the PDF and not in the JSON // version 2 or higher). Objects in the PDF and not in the JSON
// are not modified. See "QPDF JSON Format" in the manual for // are not modified. See "qpdf JSON" in the manual for details. As
// details. // with createFromJSON, the input JSON may be arbitrarily large.
QPDF_DLL QPDF_DLL
void updateFromJSON(std::string const& json_file); void updateFromJSON(std::string const& json_file);
QPDF_DLL QPDF_DLL
void updateFromJSON(std::shared_ptr<InputSource>); void updateFromJSON(std::shared_ptr<InputSource>);
// Write qpdf json format. The only supported version is 2. If // Write qpdf json format to the pipeline "p". The only supported
// wanted_objects is empty, write all objects. Otherwise, write // version is 2. The finish() method is called on the pipeline at
// only objects whose keys are in wanted_objects. Keys may be // the end. The decode_level parameter controls which streams are
// either "trailer" or of the form "obj:n n R". Invalid keys are // uncompressed in the JSON. Use qpdf_dl_none to preserve all
// ignored. // stream data exactly as it appears in the input. The possible
// values for json_stream_data can be found in qpdf/Constants.h
// and correspond to the --json-stream-data command-line argument.
// If json_stream_data is qpdf_sj_file, file_prefix must be
// specified. Each stream will be written to a file whose path is
// constructed by appending "-nnn" to file_prefix, where "nnn" is
// the object number (not zero-filled). If wanted_objects is
// empty, write all objects. Otherwise, write only objects whose
// keys are in wanted_objects. Keys may be either "trailer" or of
// the form "obj:n n R". Invalid keys are ignored. This
// corresponds to the --json-object command-line argument.
//
// QPDF is efficient with regard to memory when writing, allowing
// you to write arbitrarily large PDF files to a pipeline. You can
// use a pipeline like Pl_Buffer or Pl_String to capture the JSON
// output in memory, but do so with caution as this will allocate
// enough memory to hold the entire PDF file.
QPDF_DLL QPDF_DLL
void writeJSON( void writeJSON(
int version, int version,
Pipeline*, Pipeline* p,
qpdf_stream_decode_level_e, qpdf_stream_decode_level_e decode_level,
qpdf_json_stream_data_e, qpdf_json_stream_data_e json_stream_data,
std::string const& file_prefix, std::string const& file_prefix,
std::set<std::string> wanted_objects); std::set<std::string> wanted_objects);
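Two rules from the writeJSON comment above are easy to get wrong in calling code: how stream data file names are built, and which wanted_objects keys are honored. The Python sketch below restates both rules as described in the comment. It does not call qpdf, and both helper names are hypothetical.

```python
import re

_OBJ_KEY = re.compile(r"obj:[0-9]+ [0-9]+ R")


def stream_data_path(file_prefix, object_number):
    """Build the path for a stream's data: prefix, "-", object number.

    The object number is not zero-filled.
    """
    return f"{file_prefix}-{object_number}"


def filter_wanted(keys):
    """Keep only keys of the form "trailer" or "obj:n n R".

    Invalid keys are dropped silently, mirroring "Invalid keys are ignored".
    """
    return {k for k in keys if k == "trailer" or _OBJ_KEY.fullmatch(k)}


path = stream_data_path("out.json", 7)
wanted = filter_wanted({"trailer", "obj:3 0 R", "3 0 R", "bogus"})
```

Note that a bare reference such as "3 0 R" is not a valid key here; the "obj:" prefix is required.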


@ -8,10 +8,10 @@ include/qpdf/auto_job_c_pages.hh b3cc0f21029f6d89efa043dcdbfa183cb59325b6506001c
include/qpdf/auto_job_c_uo.hh ae21b69a1efa9333050f4833d465f6daff87e5b38e5106e49bbef5d4132e4ed1 include/qpdf/auto_job_c_uo.hh ae21b69a1efa9333050f4833d465f6daff87e5b38e5106e49bbef5d4132e4ed1
job.yml 3b2b3c6f92b48f6c76109711cbfdd74669fa31a80cd17379548b09f8e76be05d job.yml 3b2b3c6f92b48f6c76109711cbfdd74669fa31a80cd17379548b09f8e76be05d
libqpdf/qpdf/auto_job_decl.hh 74df4d7fdbdf51ecd0d58ce1e9844bb5525b9adac5a45f7c9a787ecdda2868df libqpdf/qpdf/auto_job_decl.hh 74df4d7fdbdf51ecd0d58ce1e9844bb5525b9adac5a45f7c9a787ecdda2868df
libqpdf/qpdf/auto_job_help.hh c1cc99f6fe17285ee5e40730f6280e37d17da1a5f408086ce34e01af121df7ad libqpdf/qpdf/auto_job_help.hh 3aaae4cde004e5314d3ac6d554da575e40209c0f0611f6a308957986f9c7967b
libqpdf/qpdf/auto_job_init.hh 7ea8e0641dc26fdfba6e283e14dbbff0c016654e174cdace8054f8bef53750fd libqpdf/qpdf/auto_job_init.hh 7ea8e0641dc26fdfba6e283e14dbbff0c016654e174cdace8054f8bef53750fd
libqpdf/qpdf/auto_job_json_decl.hh 06caa46eaf71db8a50c046f91866baa8087745a9474319fb7c86d92634cc8297 libqpdf/qpdf/auto_job_json_decl.hh 06caa46eaf71db8a50c046f91866baa8087745a9474319fb7c86d92634cc8297
libqpdf/qpdf/auto_job_json_init.hh 5f6b53e3c81d4b54ce5c4cf9c3f52d0c02f987c53bf8841c0280367bad23e335 libqpdf/qpdf/auto_job_json_init.hh 5f6b53e3c81d4b54ce5c4cf9c3f52d0c02f987c53bf8841c0280367bad23e335
libqpdf/qpdf/auto_job_schema.hh 9d543cd4a43eafffc2c4b8a6fee29e399c271c52cb6f7d417ae5497b3c1127dc libqpdf/qpdf/auto_job_schema.hh 9d543cd4a43eafffc2c4b8a6fee29e399c271c52cb6f7d417ae5497b3c1127dc
manual/_ext/qpdf.py 6add6321666031d55ed4aedf7c00e5662bba856dfcd66ccb526563bffefbb580 manual/_ext/qpdf.py 6add6321666031d55ed4aedf7c00e5662bba856dfcd66ccb526563bffefbb580
manual/cli.rst 82ead389c03bbf5e0498bd0571a11dc06544d591f4e4454c00322e3473fc556d manual/cli.rst e3f4331befa17450e0d0fff87569722a5aab42ea619ef64f0a3a04e1f99ed65c


@ -817,4 +817,5 @@ QPDF::writeJSON(
JSON::writeDictionaryClose(p, first_qpdf, 1); JSON::writeDictionaryClose(p, first_qpdf, 1);
JSON::writeDictionaryClose(p, first, 0); JSON::writeDictionaryClose(p, first, 0);
*p << "\n"; *p << "\n";
p->finish();
} }


@ -70,6 +70,9 @@ ap.addOptionHelp("--copyright", "help", "show copyright information", R"(Display
ap.addOptionHelp("--show-crypto", "help", "show available crypto providers", R"(Show a list of available crypto providers, one per line. The ap.addOptionHelp("--show-crypto", "help", "show available crypto providers", R"(Show a list of available crypto providers, one per line. The
default provider is shown first. default provider is shown first.
)"); )");
ap.addOptionHelp("--job-json-help", "help", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by
--job-json-file.
)");
ap.addHelpTopic("general", "general options", R"(General options control qpdf's behavior in ways that are not ap.addHelpTopic("general", "general options", R"(General options control qpdf's behavior in ways that are not
directly related to the operation it is performing. directly related to the operation it is performing.
)"); )");
@ -87,11 +90,11 @@ ap.addOptionHelp("--verbose", "general", "print additional information", R"(Outp
doing, including information about files created and operations doing, including information about files created and operations
performed. performed.
)"); )");
ap.addOptionHelp("--progress", "general", "show progress when writing", R"(Indicate progress when writing files.
)");
} }
static void add_help_2(QPDFArgParser& ap) static void add_help_2(QPDFArgParser& ap)
{ {
ap.addOptionHelp("--progress", "general", "show progress when writing", R"(Indicate progress when writing files.
)");
ap.addOptionHelp("--no-warn", "general", "suppress printing of warning messages", R"(Suppress printing of warning messages. If warnings were ap.addOptionHelp("--no-warn", "general", "suppress printing of warning messages", R"(Suppress printing of warning messages. If warnings were
encountered, qpdf still exits with exit status 3. encountered, qpdf still exits with exit status 3.
Use --warning-exit-0 with --no-warn to completely ignore Use --warning-exit-0 with --no-warn to completely ignore
@ -172,12 +175,12 @@ companion tool "fix-qdf" can be used to repair hand-edited QDF
files. QDF is a feature specific to the qpdf tool. Please see files. QDF is a feature specific to the qpdf tool. Please see
the "QDF Mode" chapter in the manual. the "QDF Mode" chapter in the manual.
)"); )");
ap.addOptionHelp("--no-original-object-ids", "transformation", "omit original object IDs in qdf", R"(Omit comments in a QDF file indicating the object ID an object
had in the original file.
)");
} }
static void add_help_3(QPDFArgParser& ap) static void add_help_3(QPDFArgParser& ap)
{ {
ap.addOptionHelp("--no-original-object-ids", "transformation", "omit original object IDs in qdf", R"(Omit comments in a QDF file indicating the object ID an object
had in the original file.
)");
ap.addOptionHelp("--compress-streams", "transformation", "compress uncompressed streams", R"(--compress-streams=[y|n] ap.addOptionHelp("--compress-streams", "transformation", "compress uncompressed streams", R"(--compress-streams=[y|n]
Setting --compress-streams=n prevents qpdf from compressing Setting --compress-streams=n prevents qpdf from compressing
@ -188,9 +191,11 @@ ap.addOptionHelp("--decode-level", "transformation", "control which streams to u
When uncompressing streams, control which types of compression When uncompressing streams, control which types of compression
schemes should be uncompressed: schemes should be uncompressed:
- none: don't uncompress anything. This is the default with --json-output. - none: don't uncompress anything. This is the default with
--json-output.
- generalized: uncompress streams compressed with a - generalized: uncompress streams compressed with a
general-purpose compression algorithm. This is the default. general-purpose compression algorithm. This is the default
except when --json-output is given.
- specialized: in addition to generalized, also uncompress - specialized: in addition to generalized, also uncompress
streams compressed with a special-purpose but non-lossy streams compressed with a special-purpose but non-lossy
compression scheme compression scheme
@ -290,13 +295,13 @@ from the resulting set, not based on the original page numbers.
ap.addHelpTopic("modification", "change parts of the PDF", R"(Modification options make systematic changes to certain parts of ap.addHelpTopic("modification", "change parts of the PDF", R"(Modification options make systematic changes to certain parts of
the PDF, causing the PDF to render differently from the original. the PDF, causing the PDF to render differently from the original.
)"); )");
}
static void add_help_4(QPDFArgParser& ap)
{
ap.addOptionHelp("--pages", "modification", "begin page selection", R"(--pages file [--password=password] [page-range] [...] -- ap.addOptionHelp("--pages", "modification", "begin page selection", R"(--pages file [--password=password] [page-range] [...] --
Run qpdf --help=page-selection for details. Run qpdf --help=page-selection for details.
)"); )");
}
static void add_help_4(QPDFArgParser& ap)
{
ap.addOptionHelp("--collate", "modification", "collate with --pages", R"(--collate[=n] ap.addOptionHelp("--collate", "modification", "collate with --pages", R"(--collate[=n]
Collate rather than concatenate pages specified with --pages. Collate rather than concatenate pages specified with --pages.
@ -460,14 +465,14 @@ ap.addOptionHelp("--assemble", "encryption", "restrict document assembly", R"(--
Enable/disable document assembly (rotation and reordering of Enable/disable document assembly (rotation and reordering of
pages). This option is not available with 40-bit encryption. pages). This option is not available with 40-bit encryption.
)"); )");
}
static void add_help_5(QPDFArgParser& ap)
{
ap.addOptionHelp("--extract", "encryption", "restrict text/graphic extraction", R"(--extract=[y|n] ap.addOptionHelp("--extract", "encryption", "restrict text/graphic extraction", R"(--extract=[y|n]
Enable/disable text/graphic extraction for purposes other than Enable/disable text/graphic extraction for purposes other than
accessibility. accessibility.
)"); )");
}
static void add_help_5(QPDFArgParser& ap)
{
ap.addOptionHelp("--form", "encryption", "restrict form filling", R"(--form=[y|n] ap.addOptionHelp("--form", "encryption", "restrict form filling", R"(--form=[y|n]
Enable/disable whether filling form fields is allowed even if Enable/disable whether filling form fields is allowed even if
@ -638,6 +643,9 @@ ap.addOptionHelp("--remove-attachment", "attachments", "remove an embedded file"
Remove an embedded file using its key. Get the key with Remove an embedded file using its key. Get the key with
--list-attachments. --list-attachments.
)"); )");
}
static void add_help_6(QPDFArgParser& ap)
{
ap.addHelpTopic("pdf-dates", "PDF date format", R"(When a date is required, the date should conform to the PDF date ap.addHelpTopic("pdf-dates", "PDF date format", R"(When a date is required, the date should conform to the PDF date
format specification, which is "D:yyyymmddhhmmssz" where "z" is format specification, which is "D:yyyymmddhhmmssz" where "z" is
either literally upper case "Z" for UTC or a timezone offset in either literally upper case "Z" for UTC or a timezone offset in
@ -650,9 +658,6 @@ Examples:
- D:20210207161528-05'00' February 7, 2021 at 4:15:28 p.m. - D:20210207161528-05'00' February 7, 2021 at 4:15:28 p.m.
- D:20210207211528Z February 7, 2021 at 21:15:28 UTC - D:20210207211528Z February 7, 2021 at 21:15:28 UTC
)"); )");
}
static void add_help_6(QPDFArgParser& ap)
{
ap.addHelpTopic("add-attachment", "attach (embed) files", R"(The options listed below appear between --add-attachment and its ap.addHelpTopic("add-attachment", "attach (embed) files", R"(The options listed below appear between --add-attachment and its
terminating "--". terminating "--".
)"); )");
@ -747,14 +752,14 @@ the linearization hint tables are correct.
)"); )");
ap.addOptionHelp("--show-linearization", "inspection", "show linearization hint tables", R"(Check and display all data in the linearization hint tables. ap.addOptionHelp("--show-linearization", "inspection", "show linearization hint tables", R"(Check and display all data in the linearization hint tables.
)"); )");
}
static void add_help_7(QPDFArgParser& ap)
{
ap.addOptionHelp("--show-xref", "inspection", "show cross reference data", R"(Show the contents of the cross-reference table or stream (object ap.addOptionHelp("--show-xref", "inspection", "show cross reference data", R"(Show the contents of the cross-reference table or stream (object
locations in the file) in a human-readable form. This is locations in the file) in a human-readable form. This is
especially useful for files with cross-reference streams, which especially useful for files with cross-reference streams, which
are stored in a binary format. are stored in a binary format.
)"); )");
}
static void add_help_7(QPDFArgParser& ap)
{
ap.addOptionHelp("--show-object", "inspection", "show contents of an object", R"(--show-object={trailer|obj[,gen]} ap.addOptionHelp("--show-object", "inspection", "show contents of an object", R"(--show-object={trailer|obj[,gen]}
Show the contents of the given object. This is especially useful Show the contents of the given object. This is especially useful
@ -814,21 +819,20 @@ This option is repeatable. If given, only specified objects will
be shown in the "objects" key of the JSON output. Otherwise, all be shown in the "objects" key of the JSON output. Otherwise, all
objects will be shown. objects will be shown.
)"); )");
ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by
--job-json-file.
)");
ap.addOptionHelp("--json-stream-data", "json", "how to handle streams in json output", R"(--json-stream-data={none|inline|file} ap.addOptionHelp("--json-stream-data", "json", "how to handle streams in json output", R"(--json-stream-data={none|inline|file}
Control whether streams in json output should be omitted, When used with --json-output, this option controls whether
written inline (base64-encoded) or written to a file. If "file" streams in json output should be omitted, written inline
is chosen, the file will be the name of the input file appended (base64-encoded) or written to a file. If "file" is chosen, the
with -nnn where nnn is the object number. The prefix can be file will be the name of the output file appended with -nnn where
overridden with --json-stream-prefix. nnn is the object number. The prefix can be overridden with
--json-stream-prefix.
)"); )");
ap.addOptionHelp("--json-stream-prefix", "json", "prefix for json stream data files", R"(--json-stream-prefix=file-prefix ap.addOptionHelp("--json-stream-prefix", "json", "prefix for json stream data files", R"(--json-stream-prefix=file-prefix
When --json-stream-data=file is given, override the input file When used with --json-output, --json-stream-prefix=file-prefix
name as the prefix for stream data files. Whatever is given here sets the prefix for stream data files, overriding the default,
which is to use the output file name. Whatever is given here
will be appended with -nnn to create the name of the file that will be appended with -nnn to create the name of the file that
will contain the data for the stream in object nnn. will contain the data for the stream in object nnn.
)"); )");
@ -836,19 +840,19 @@ ap.addOptionHelp("--json-output", "json", "serialize to JSON", R"(--json-output[
The output file will be qpdf JSON format at the given version. The output file will be qpdf JSON format at the given version.
"version" may be a specific version or "latest" (the default). "version" may be a specific version or "latest" (the default).
Version 1 is not supported. See also --json-stream-data, The only supported version is 2. See also --json-stream-data,
--json-stream-prefix, and --decode-level. --json-stream-prefix, and --decode-level.
)"); )");
ap.addOptionHelp("--json-input", "json", "input file is qpdf JSON", R"(Treat the input file as a JSON file in qpdf JSON format as ap.addOptionHelp("--json-input", "json", "input file is qpdf JSON", R"(Treat the input file as a JSON file in qpdf JSON format as
written by qpdf --json-output. See the "QPDF JSON Format" written by qpdf --json-output. See the "qpdf JSON Format"
section of the manual for information about how to use this section of the manual for information about how to use this
option. option.
)"); )");
ap.addOptionHelp("--update-from-json", "json", "update a PDF from qpdf JSON", R"(--update-from-json=qpdf-json-file ap.addOptionHelp("--update-from-json", "json", "update a PDF from qpdf JSON", R"(--update-from-json=qpdf-json-file
Update a PDF file from a JSON file. Please see the "QPDF JSON Update a PDF file from a JSON file. Please see the "qpdf JSON"
Format" section of the manual for information about how to use chapter of the manual for information about how to use this
this option. option.
)"); )");
} }
static void add_help_8(QPDFArgParser& ap) static void add_help_8(QPDFArgParser& ap)


@ -171,7 +171,9 @@ Related Options
equivalent command-line arguments were supplied. It can be repeated equivalent command-line arguments were supplied. It can be repeated
and mixed freely with other options. Run ``qpdf`` with and mixed freely with other options. Run ``qpdf`` with
:qpdf:ref:`--job-json-help` for a description of the job JSON input :qpdf:ref:`--job-json-help` for a description of the job JSON input
file format. For more information, see :ref:`qpdf-job`. file format. For more information, see :ref:`qpdf-job`. Note that
this is unrelated to :qpdf:ref:`--json` but may be combined with
it.
.. _exit-status: .. _exit-status:
@ -341,6 +343,17 @@ Related Options
itself. The default provider is always listed first. See itself. The default provider is always listed first. See
:ref:`crypto` for more information about crypto providers. :ref:`crypto` for more information about crypto providers.
.. qpdf:option:: --job-json-help
.. help: show format of job JSON
Describe the format of the QPDFJob JSON input used by
--job-json-file.
Describe the format of the QPDFJob JSON input used by
:qpdf:ref:`--job-json-file`. For more information about QPDFJob,
see :ref:`qpdf-job`.
.. _general-options: .. _general-options:
General Options General Options
@ -852,9 +865,11 @@ Related Options
When uncompressing streams, control which types of compression When uncompressing streams, control which types of compression
schemes should be uncompressed: schemes should be uncompressed:
- none: don't uncompress anything. This is the default with --json-output. - none: don't uncompress anything. This is the default with
--json-output.
- generalized: uncompress streams compressed with a - generalized: uncompress streams compressed with a
general-purpose compression algorithm. This is the default. general-purpose compression algorithm. This is the default
except when --json-output is given.
- specialized: in addition to generalized, also uncompress - specialized: in addition to generalized, also uncompress
streams compressed with a special-purpose but non-lossy streams compressed with a special-purpose but non-lossy
compression scheme compression scheme
@ -875,7 +890,8 @@ Related Options
``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define
generalized filters as those to be used for general-purpose generalized filters as those to be used for general-purpose
compression or encoding, as opposed to filters specifically compression or encoding, as opposed to filters specifically
designed for image data. This is the default. designed for image data. This is the default except when
:qpdf:ref:`--json-output` is given.
- :samp:`specialized`: in addition to generalized, decode streams - :samp:`specialized`: in addition to generalized, decode streams
with supported non-lossy specialized filters; currently this is with supported non-lossy specialized filters; currently this is
@ -3126,8 +3142,9 @@ Related Options
is usually but not always equal to the file name and is needed by is usually but not always equal to the file name and is needed by
some of the other options. See also :ref:`attachments`. Note that some of the other options. See also :ref:`attachments`. Note that
this option displays dates in PDF timestamp syntax. When attachment this option displays dates in PDF timestamp syntax. When attachment
information is included in json output (see :ref:`--json`), dates information is included in json output in the ``"attachments"`` key
are shown in ISO-8601 format. (see :ref:`--json`), dates are shown (just within that object) in
ISO-8601 format.
.. qpdf:option:: --show-attachment=key .. qpdf:option:: --show-attachment=key
@ -3169,14 +3186,11 @@ Related Options
Generate a JSON representation of the file. This is described in Generate a JSON representation of the file. This is described in
depth in :ref:`json`. The version parameter can be used to specify depth in :ref:`json`. The version parameter can be used to specify
which version of the qpdf JSON format should be output. The only which version of the qpdf JSON format should be output. The version
supported value is ``1``, but it's possible that a new JSON output number may be a number or ``latest``. The default is ``latest``. As of
version will be added in a future version. You can also specify qpdf 11, the latest version is ``2``. If you have code that reads
``latest`` to use the latest JSON version. For backward qpdf JSON output, you can tell what version of the JSON output you
compatibility, the default value will remain ``1`` until qpdf have from the ``"version"`` key in the output. Use the
version 11, after which point it will become ``latest``. In all
case, you can tell what version of the JSON output you have from
the ``"version"`` key in the output. Use the
:qpdf:ref:`--json-help` option to get a description of the JSON :qpdf:ref:`--json-help` option to get a description of the JSON
object. object.
@ -3189,11 +3203,11 @@ Related Options
containing descriptive text. containing descriptive text.
Describe the format of the JSON output by writing to standard Describe the format of the JSON output by writing to standard
output a JSON object with the same structure with the same keys as output a JSON object with the same structure as the JSON generated
the JSON generated by qpdf. In the output written by by qpdf. In the output written by ``--json-help``, each key's value
``--json-help``, each key's value is a description of the key. The is a description of the key. The specific contract guaranteed by
specific contract guaranteed by qpdf in its JSON representation is qpdf in its JSON representation is explained in more detail in the
explained in more detail in the :ref:`json`. :ref:`json`.
.. qpdf:option:: --json-key=key .. qpdf:option:: --json-key=key
@ -3216,53 +3230,50 @@ Related Options
be shown in the "objects" key of the JSON output. Otherwise, all be shown in the "objects" key of the JSON output. Otherwise, all
objects will be shown. objects will be shown.
This option is repeatable. If given, only specified objects will This option is repeatable. If given, only specified objects will be
be shown in the "``objects``" key of the JSON output. Otherwise, all shown in the ``"objects"`` key of the JSON output. Otherwise, all
objects will be shown. objects will be shown. For qpdf JSON version 1, this also affects
the ``"objectinfo"`` key, which is not present in version 2. This
.. qpdf:option:: --job-json-help option may be used with :qpdf:ref:`--json` and also with
:qpdf:ref:`--json-output`.
.. help: show format of job JSON
Describe the format of the QPDFJob JSON input used by
--job-json-file.
Describe the format of the QPDFJob JSON input used by
:qpdf:ref:`--job-json-file`. For more information about QPDFJob,
see :ref:`qpdf-job`.
.. qpdf:option:: --json-stream-data={none|inline|file} .. qpdf:option:: --json-stream-data={none|inline|file}
.. help: how to handle streams in json output .. help: how to handle streams in json output
Control whether streams in json output should be omitted, When used with --json-output, this option controls whether
written inline (base64-encoded) or written to a file. If "file" streams in json output should be omitted, written inline
is chosen, the file will be the name of the input file appended (base64-encoded) or written to a file. If "file" is chosen, the
with -nnn where nnn is the object number. The prefix can be file will be the name of the output file appended with -nnn where
overridden with --json-stream-prefix. nnn is the object number. The prefix can be overridden with
--json-stream-prefix.
Control whether streams in json output should be omitted, written When used with :qpdf:ref:`--json-output`, this option controls
inline (base64-encoded) or written to a file. If ``file`` is whether streams in JSON output should be omitted, written inline
chosen, the file will be the name of the input file appended with (base64-encoded) or written to a file. If ``file`` is chosen, the
:samp:`-{nnn}` where :samp:`{nnn}` is the object number. The prefix file will be the name of the output file appended with
can be overridden with :qpdf:ref:`--json-stream-prefix`. This :samp:`-{nnn}` where :samp:`{nnn}` is the object number. The stream
option only applies when used with :qpdf:ref:`--json-output`. data file prefix can be overridden with
:qpdf:ref:`--json-stream-prefix`. This option only applies when
used with :qpdf:ref:`--json-output`.
.. qpdf:option:: --json-stream-prefix=file-prefix .. qpdf:option:: --json-stream-prefix=file-prefix
.. help: prefix for json stream data files .. help: prefix for json stream data files
When --json-stream-data=file is given, override the input file When used with --json-output, --json-stream-prefix=file-prefix
name as the prefix for stream data files. Whatever is given here sets the prefix for stream data files, overriding the default,
which is to use the output file name. Whatever is given here
will be appended with -nnn to create the name of the file that will be appended with -nnn to create the name of the file that
will contain the data for the stream in object nnn.
When :qpdf:ref:`--json-stream-data` is given with the value When used with :qpdf:ref:`--json-output`,
``file``, override the input file name as the prefix for stream ``--json-stream-prefix=file-prefix`` sets the prefix for stream data
data files. Whatever is given here will be appended with files, overriding the default, which is to use the output file
:samp:`-{nnn}` to create the name of the file that will contain the name. Whatever is given here will be appended with :samp:`-{nnn}`
data for the stream stream in object :samp:`{nnn}`. This to create the name of the file that will contain the data for the
option only applies when used with :qpdf:ref:`--json-output`. stream in object :samp:`{nnn}`. This option only applies
when used with :qpdf:ref:`--json-output`.
.. qpdf:option:: --json-output[=version] .. qpdf:option:: --json-output[=version]
@ -3270,44 +3281,45 @@ Related Options
The output file will be qpdf JSON format at the given version. The output file will be qpdf JSON format at the given version.
"version" may be a specific version or "latest" (the default). "version" may be a specific version or "latest" (the default).
Version 1 is not supported. See also --json-stream-data, The only supported version is 2. See also --json-stream-data,
--json-stream-prefix, and --decode-level. --json-stream-prefix, and --decode-level.
The output file will be qpdf JSON format at the given version. The output file, instead of being a PDF file, will be a JSON file
``version`` may be a specific version or ``latest`` (the default). in qpdf JSON format at the given version. ``version`` may be a
Version 1 is not supported. See also :qpdf:ref:`--json-stream-data` specific version or ``latest`` (the default). The only supported
and :qpdf:ref:`--json-stream-prefix`. The default decode level is version is 2. See also :qpdf:ref:`--json-stream-data` and
``none``, but you can override it with :qpdf:ref:`--decode-level`. :qpdf:ref:`--json-stream-prefix`. When this option is specified,
If you want to look at the contents of streams easily as you would the default decode level for stream data is ``none``, but you can
in QDF mode (see :ref:`qdf`), you can use override it with :qpdf:ref:`--decode-level`. If you want to look at
``--decode-level=generalized`` and ``--json-stream-data=file`` for the contents of streams easily as you would in QDF mode (see
a convenient way to do that. :ref:`qdf`), you can use ``--decode-level=generalized`` and
``--json-stream-data=file`` for a convenient way to do that.
.. qpdf:option:: --json-input .. qpdf:option:: --json-input
.. help: input file is qpdf JSON .. help: input file is qpdf JSON
Treat the input file as a JSON file in qpdf JSON format as Treat the input file as a JSON file in qpdf JSON format as
written by qpdf --json-output. See the "QPDF JSON Format" written by qpdf --json-output. See the "qpdf JSON Format"
section of the manual for information about how to use this section of the manual for information about how to use this
option. option.
Treat the input file as a JSON file in qpdf JSON format as written Treat the input file as a JSON file in qpdf JSON format as written
by ``qpdf --json-output``. The input file must be complete and by ``qpdf --json-output``. The input file must be complete and
include all stream data. For information about converting between include all stream data. For information about converting between
PDF and JSON, please see :ref:`qpdf-json`. PDF and JSON, please see :ref:`json`.
.. qpdf:option:: --update-from-json=qpdf-json-file .. qpdf:option:: --update-from-json=qpdf-json-file
.. help: update a PDF from qpdf JSON .. help: update a PDF from qpdf JSON
Update a PDF file from a JSON file. Please see the "QPDF JSON Update a PDF file from a JSON file. Please see the "qpdf JSON"
Format" section of the manual for information about how to use chapter of the manual for information about how to use this
this option. option.
This option updates a PDF file from a qpdf JSON file. For a This option updates a PDF file from the specified qpdf JSON file.
information about how to use this option, please see For information about how to use this option, please see
:ref:`qpdf-json`. :ref:`json`.
.. _test-options: .. _test-options:
@ -3420,7 +3432,7 @@ Related Options
This is used by qpdf's test suite to check consistency between the This is used by qpdf's test suite to check consistency between the
output of ``qpdf --json`` and the output of ``qpdf --json-help``. output of ``qpdf --json`` and the output of ``qpdf --json-help``.
This option causes an extra copy of the generated json to appear in This option causes an extra copy of the generated JSON to appear in
memory and is therefore unsuitable for use with large files. This memory and is therefore unsuitable for use with large files. This
is why it's also not on by default. is why it's also not on by default.


@ -242,7 +242,7 @@ the current file position. If the token is a not either a dictionary or
array opener, an object is immediately constructed from the single token array opener, an object is immediately constructed from the single token
and the parser returns. Otherwise, the parser iterates in a special mode and the parser returns. Otherwise, the parser iterates in a special mode
in which it accumulates objects until it finds a balancing closer. in which it accumulates objects until it finds a balancing closer.
During this process, the "``R``" keyword is recognized and an indirect During this process, the ``R`` keyword is recognized and an indirect
``QPDFObjectHandle`` may be constructed. ``QPDFObjectHandle`` may be constructed.
The ``QPDF::resolve()`` method, which is used to resolve an indirect The ``QPDF::resolve()`` method, which is used to resolve an indirect
@ -280,15 +280,15 @@ file.
it is looking before the last ``%%EOF``. After getting to ``trailer`` it is looking before the last ``%%EOF``. After getting to ``trailer``
keyword, it invokes the parser. keyword, it invokes the parser.
- The parser sees "``<<``", so it calls itself recursively in - The parser sees ``<<``, so it calls itself recursively in
dictionary creation mode. dictionary creation mode.
- In dictionary creation mode, the parser keeps accumulating objects - In dictionary creation mode, the parser keeps accumulating objects
until it encounters "``>>``". Each object that is read is pushed onto until it encounters ``>>``. Each object that is read is pushed onto
a stack. If "``R``" is read, the last two objects on the stack are a stack. If ``R`` is read, the last two objects on the stack are
inspected. If they are integers, they are popped off the stack and inspected. If they are integers, they are popped off the stack and
their values are used to construct an indirect object handle which is their values are used to construct an indirect object handle which is
then pushed onto the stack. When "``>>``" is finally read, the stack then pushed onto the stack. When ``>>`` is finally read, the stack
is converted into a ``QPDF_Dictionary`` which is placed in a is converted into a ``QPDF_Dictionary`` which is placed in a
``QPDFObjectHandle`` and returned. ``QPDFObjectHandle`` and returned.


@ -1,6 +1,9 @@
.. cSpell:ignore moddifyannotations
.. cSpell:ignore feff
.. _json: .. _json:
QPDF JSON qpdf JSON
========= =========
.. _json-overview: .. _json-overview:
@ -8,27 +11,540 @@ QPDF JSON
Overview Overview
-------- --------
Beginning with qpdf version 8.3.0, the :command:`qpdf` Beginning with qpdf version 11.0.0, the qpdf library and command-line
command-line program can produce a JSON representation of the program can produce a JSON representation of the data in a PDF file. qpdf
non-content data in a PDF file. It includes a dump in JSON format of all version 11 introduces JSON format version 2. Prior to qpdf 11,
objects in the PDF file excluding the content of streams. This JSON versions 8.3.0 onward had a more limited JSON representation
representation makes it very easy to look in detail at the structure of accessible only from the command-line. For details on what changed,
a given PDF file, and it also provides a great way to work with PDF see :ref:`json-v2-changes`. The rest of this chapter documents qpdf
files programmatically from the command-line in languages that can't JSON version 2.
call or link with the qpdf library directly. Note that stream data can
be extracted from PDF files using other qpdf command-line options. Please note: this chapter discusses *qpdf JSON format*, which
represents the contents of a PDF file. This is distinct from the
*QPDFJob JSON format*, which provides a higher-level interface for
interacting with qpdf the way the command-line tool does. For
information about that, see :ref:`qpdf-job`.
The qpdf JSON format is specific to qpdf. There are two ways to use
qpdf JSON:
- The :qpdf:ref:`--json` command-line flag causes creation of a JSON
representation of all the objects in a PDF file, excluding stream
data. This includes an unambiguous representation of the PDF object
structure and also provides JSON-formatted summaries of other
information about the file. This functionality is built into
``QPDFJob`` and can be accessed from the ``qpdf`` command-line tool
or from the ``QPDFJob`` C or C++ API.
- qpdf can create a JSON file that completely represents a PDF file.
You can think of this as using JSON as an *alternative syntax* for
representing a PDF file. Using qpdf JSON, it is possible to
convert a PDF file to JSON, manipulate the structure or contents of
the objects at a low level, and convert the results back to a PDF
file. This functionality can be accessed from the command-line with
the :qpdf:ref:`--json-output`, :qpdf:ref:`--json-input`, and
:qpdf:ref:`--update-from-json` flags, or from the API using the
``QPDF::writeJSON``, ``QPDF::createFromJSON``, and
``QPDF::updateFromJSON`` methods.
.. _json-terminology:
JSON Terminology
----------------
Notes about terminology:
- In JavaScript and JSON, that thing that has keys and values is
typically called an *object*.
- In PDF, that thing that has keys and values is typically called a
*dictionary*. An *object* is a PDF object such as integer, real,
boolean, null, string, array, dictionary, or stream.
- Some languages that use JSON call an *object* a *dictionary*, a
*map*, or a *hash*.
- Sometimes, it's called an *object* if it has fixed keys and a
*dictionary* if it has variable keys.
This manual is not entirely consistent about its use of *dictionary*
vs. *object* because sometimes one term or another is clearer in
context. Just be aware of the ambiguity when reading the manual. We
frequently use the term *dictionary* to refer to a JSON object because
of the consistency with PDF terminology.
.. _what-qpdf-json-is-not:
What qpdf JSON is not
---------------------
Please note that qpdf JSON offers a convenient syntax for manipulating
PDF files at a low level using JSON syntax. JSON syntax is much easier
to work with than native PDF syntax, and there are good JSON libraries
in virtually every commonly used programming language. Working with
PDF objects in JSON removes the need to worry about stream lengths,
cross reference tables, and PDF-specific representations of Unicode or
binary strings that appear outside of content streams. It does not
eliminate the need to understand the semantic structure of PDF files.
Working with qpdf JSON still requires familiarity with the PDF
specification.
In particular, qpdf JSON *does not* provide any of the following
capabilities:
- Text extraction. While you could use qpdf JSON syntax to navigate to
a page's content streams and font structures, text within pages is
still encoded using PDF syntax within content streams, and there is
no assistance for text extraction.
- Reflowing text, document structure. qpdf JSON does not add any new
information or insight into the content of PDF files. If you have a
PDF file that lacks any structural information, qpdf JSON won't help
you solve any of those problems.
This is what we mean when we say that JSON provides an *alternative
syntax* for working with PDF data. Semantically, it is identical to
native PDF.
.. _qpdf-json: .. _qpdf-json:
QPDF JSON Format qpdf JSON Format
---------------- ----------------
XXX Write this. This section describes how qpdf represents PDF objects in JSON format.
It also describes how to work with qpdf JSON to create or
modify PDF files.
.. _json.objects:
qpdf JSON Object Representation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section describes the representation of PDF objects in qpdf JSON
version 2. PDF objects are represented within the ``"objects"``
dictionary of a qpdf JSON file. This is true both for PDF serialized
to JSON (:qpdf:ref:`--json-output`, ``QPDF::writeJSON``) and for
objects as they appear in the output of ``qpdf`` with the
:qpdf:ref:`--json` option.
Each key in the ``"objects"`` dictionary is either ``"trailer"`` or a
string of the form ``"obj:O G R"`` where ``O`` and ``G`` are the
object and generation numbers and ``R`` is the literal string ``R``.
This is the PDF syntax for the indirect object reference prepended by
``obj:``. The value, representing the object itself, is a JSON object
whose structure is described below.
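
The key format described above can be handled with a small helper. The
following is a minimal Python sketch, not part of qpdf; the function
names ``parse_object_key`` and ``make_object_key`` are hypothetical and
chosen only for illustration.

```python
import re

# Keys in "objects" are either "trailer" or "obj:O G R" (per the text above).
_OBJ_KEY = re.compile(r"obj:(\d+) (\d+) R")


def parse_object_key(key):
    """Return None for "trailer", or an (object, generation) tuple."""
    if key == "trailer":
        return None
    m = _OBJ_KEY.fullmatch(key)
    if m is None:
        raise ValueError(f"not a qpdf JSON object key: {key!r}")
    return int(m.group(1)), int(m.group(2))


def make_object_key(obj, gen=0):
    """Build the "objects" key for an indirect object (hypothetical helper)."""
    return f"obj:{obj} {gen} R"
```

Stripping the ``obj:`` prefix from a key gives the same text that is
used for an indirect reference to that object elsewhere in the file.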
Top-level Stream Objects
Stream objects are represented as a JSON object with the single key
``"stream"``. The stream object has a key called ``"dict"`` whose
value is the stream dictionary as an object value (described below)
with the ``"/Length"`` key omitted. Other keys are determined by the
value for json stream data (:qpdf:ref:`--json-stream-data`, or a
parameter of type ``qpdf_json_stream_data_e``) as follows:
- ``none``: stream data is not represented; no other keys are
present
- ``inline``: the stream data appears as a base64-encoded string as
the value of the ``"data"`` key
- ``file``: the stream data is written to a file, and the path to
the file is stored in the ``"datafile"`` key. A relative path is
interpreted as relative to the current directory when qpdf is
invoked.
Keys other than ``"dict"``, ``"data"``, and ``"datafile"`` are
ignored. This is primarily for future compatibility in case a newer
version of qpdf includes additional information.
As with the native PDF representation, the stream data must be
consistent with whatever filters and decode parameters are specified
in the stream dictionary.
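
As an illustration of the three cases above, here is a minimal Python
sketch that fetches the raw bytes of a stream from its JSON
representation. The function name ``read_stream_data`` is hypothetical,
and the sketch assumes the script runs from the same directory in which
qpdf was invoked, so that a relative ``"datafile"`` path resolves the
same way.

```python
import base64


def read_stream_data(obj):
    """Return the raw bytes for a top-level stream object, or None.

    `obj` is the JSON object stored under an "obj:O G R" key.
    """
    stream = obj["stream"]
    if "data" in stream:
        # "inline": the data is a base64-encoded string
        return base64.b64decode(stream["data"])
    if "datafile" in stream:
        # "file": the data lives in a separate file
        with open(stream["datafile"], "rb") as f:
            return f.read()
    # "none": only "dict" is present, so there is no data to return
    return None
```

The bytes returned are the stream data as stored, so any filters named
in ``"dict"`` (such as ``/FlateDecode``) still have to be applied by the
caller.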
Top-level Non-stream Objects
Non-stream objects are represented as a dictionary with the single
key ``"value"``. Other keys are ignored for future compatibility.
The value's structure is described in "Object Values" below.
Note: in files that use object streams, the trailer "dictionary" is
actually a stream, but in the JSON representation, the value of the
``"trailer"`` key is always written as a dictionary (with a
``"value"`` key like other non-stream objects). There will also be a
stream object whose key is the object ID of the cross-reference
stream, even though this stream will generally be unreferenced. This
makes it possible to assume ``"trailer"`` points to a dictionary
without having to consider whether the file uses object streams or
not. It is also consistent with how ``QPDF::getTrailer`` behaves in
the C++ API.
Object Values
Within ``"value"`` or ``"stream"."dict"``, PDF objects are
represented as follows:
- Objects of type Boolean or null are represented as JSON objects of
the same type.
- Objects that are numeric are represented as numeric in the JSON
without regard to precision. Internally, qpdf stores numeric
values as strings, so qpdf will preserve arbitrary precision
numerical values when reading and writing JSON. It is likely that
other JSON readers and writers will have implementation-dependent
ways of handling numerical values that are out of range.
- Name objects are represented as JSON strings that start with ``/``
and are followed by the PDF name in canonical form with all PDF
syntax resolved. For example, the name whose canonical form (per
the PDF specification) is ``text/plain`` would be represented in
JSON as ``"/text/plain"`` and in PDF as ``/text#2fplain``.
- Indirect object references are represented as JSON strings that
look like a PDF indirect object reference and have the form ``"O G
R"`` where ``O`` and ``G`` are the object and generation numbers
and ``R`` is the literal string ``R``. For example, ``"3 0 R"``
would represent a reference to the object with object ID 3 and
generation 0.
- PDF strings are represented as JSON strings in one of two ways:
- ``"u:utf8-encoded-string"``: this format is used when the PDF
string can be unambiguously represented as a Unicode string and
contains no unprintable characters. This is the case whether the
input string is encoded as UTF-16, UTF-8 (as allowed by PDF
2.0), or PDF doc encoding. Strings are only represented this way
if they can be encoded without loss of information.
- ``"b:hex-string"``: this format is used to represent any binary
string value that can't be represented as a Unicode string.
``hex-string`` must have an even number of characters that range
from ``a`` through ``f``, ``A`` through ``F``, or ``0`` through
``9``.
qpdf writes empty strings as ``"u:"``, but both ``"b:"`` and
``"u:"`` are valid representations of the empty string.
There is full support for UTF-16 surrogate pairs. Binary strings
encoded with ``"b:..."`` are the internal PDF representations.
As such, the following are equivalent:
- ``"u:\ud83e\udd54"`` -- representation of U+1F954 as a surrogate
pair in JSON syntax
- ``"b:FEFFD83EDD54"`` -- representation of U+1F954 as the bytes
of a UTF-16 string in PDF syntax with the leading ``FEFF``
indicating UTF-16
- ``"b:efbbbff09fa594"`` -- representation of U+1F954 as the
bytes of a UTF-8 string in PDF syntax (as allowed by PDF 2.0)
with the leading ``EF``, ``BB``, ``BF`` sequence (which is just
UTF-8 encoding of ``FEFF``).
- A JSON string whose contents are ``u:`` followed by the UTF-8
representation of U+1F954. This is the potato emoji.
Unfortunately, I am not able to render it in the PDF version
of this manual.
- PDF arrays are represented as JSON arrays of objects as described
above
- PDF dictionaries are represented as JSON objects whose keys are
the string representations of names and whose values are
representations of PDF objects.
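
The string rules above can be exercised with a short Python sketch. This
is an illustration only, not qpdf's own logic: the function name
``decode_pdf_string`` is hypothetical, the caller is assumed to know
already that the value is a PDF string (not a name or an indirect
reference), and ``"b:"`` strings without a byte order mark are returned
as raw bytes rather than interpreted.

```python
import codecs


def decode_pdf_string(value):
    """Decode a qpdf JSON "u:" or "b:" string.

    Returns str for "u:" strings and for "b:" strings that start with
    a UTF-16 or UTF-8 byte order mark; returns bytes for any other
    "b:" string.
    """
    if value.startswith("u:"):
        return value[2:]
    if value.startswith("b:"):
        raw = bytes.fromhex(value[2:])  # accepts upper- and lower-case hex
        if raw.startswith(codecs.BOM_UTF16_BE):  # FE FF
            return raw[2:].decode("utf-16-be")
        if raw.startswith(codecs.BOM_UTF8):  # EF BB BF
            return raw[3:].decode("utf-8")
        return raw
    raise ValueError(f"not a qpdf JSON string: {value!r}")
```

With this helper, ``"b:FEFFD83EDD54"`` and ``"b:efbbbff09fa594"`` both
decode to the single character U+1F954, matching the equivalences listed
above.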
.. _json.output:
qpdf JSON Output
~~~~~~~~~~~~~~~~
The format of the JSON written by qpdf's :qpdf:ref:`--json-output`
flag or the ``QPDF::writeJSON`` API call is a JSON object consisting
of a single key: ``"qpdf-v2"``. Any other top-level keys are ignored.
While unknown keys in other places are ignored for future
compatibility, in this case, ignoring other top-level keys is an
explicit decision to allow users to include other keys for their own
use. No new top-level keys will be added in JSON version 2.
The ``"qpdf-v2"`` key points to a JSON object with the following keys:
- ``"pdfversion"`` -- a string containing the PDF version as indicated in
the PDF header (e.g. ``"1.7"``, ``"2.0"``)
- ``"maxobjectid"`` -- a number indicating the object ID of the
highest numbered object in the file. This makes it easier for
software to add new objects to the file, since new objects can
safely be numbered starting at one above this value. Note that the
value of ``"maxobjectid"`` may be higher than
the actual maximum object that appears in the input PDF since it
takes into consideration any dangling indirect object references
from the original file. This prevents you from unwittingly creating
an object that doesn't exist but that is referenced, which may have
unintended side effects. (The PDF specification explicitly allows
dangling references and says to treat them as nulls. This can happen
if objects are removed from a PDF file.)
- ``"objects"`` -- the actual PDF objects as described in
:ref:`json.objects`.
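
A minimal Python sketch of how ``"maxobjectid"`` might be used when
adding an object follows. The function name ``add_object`` is
hypothetical, and the sketch assumes the new object is a non-stream
object with generation 0.

```python
def add_object(doc, value):
    """Add a non-stream object under a fresh object ID and return its key.

    `doc` is a parsed qpdf JSON file (for example, from json.load).
    """
    qpdf = doc["qpdf-v2"]
    new_id = qpdf["maxobjectid"] + 1
    key = f"obj:{new_id} 0 R"
    qpdf["objects"][key] = {"value": value}
    # Track the new maximum locally so repeated calls get distinct IDs.
    qpdf["maxobjectid"] = new_id
    return key
```

Updating ``"maxobjectid"`` here is only bookkeeping for the script
itself, so that a second call does not reuse the same ID.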
Note that writing JSON output is done by ``QPDF``, not ``QPDFWriter``.
As such, none of the things ``QPDFWriter`` does apply. This includes
recompression of streams, renumbering of objects, anything to do with
object streams (which are not represented by qpdf JSON at all since
they are PDF syntax, not semantics), encryption, decryption,
linearization, QDF mode, etc.
.. _json.example:
qpdf JSON Example
~~~~~~~~~~~~~~~~~
The JSON below shows an example of a simple PDF file represented in
qpdf JSON format.
.. code-block:: json
{
"qpdf-v2": {
"pdfversion": "1.3",
"maxobjectid": 5,
"objects": {
"obj:1 0 R": {
"value": {
"/Pages": "2 0 R",
"/Type": "/Catalog"
}
},
"obj:2 0 R": {
"value": {
"/Count": 1,
"/Kids": [ "3 0 R" ],
"/Type": "/Pages"
}
},
"obj:3 0 R": {
"value": {
"/Contents": "4 0 R",
"/MediaBox": [ 0, 0, 612, 792 ],
"/Parent": "2 0 R",
"/Resources": {
"/Font": {
"/F1": "5 0 R"
}
},
"/Type": "/Page"
}
},
"obj:4 0 R": {
"stream": {
"data": "eJxzCuFSUNB3M1QwMlEISQOyzY2AyEAhJAXI1gjIL0ksyddUCMnicg3hAgDLAQnI",
"dict": {
"/Filter": "/FlateDecode"
}
}
},
"obj:5 0 R": {
"value": {
"/BaseFont": "/Helvetica",
"/Encoding": "/WinAnsiEncoding",
"/Subtype": "/Type1",
"/Type": "/Font"
}
},
"trailer": {
"value": {
"/ID": [
"b:98b5a26966fba4d3a769b715b2558da6",
"b:98b5a26966fba4d3a769b715b2558da6"
],
"/Root": "1 0 R",
"/Size": 6
}
}
}
}
}
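
To show how such a file can be navigated, the following Python sketch
follows indirect references from the trailer to the first page. It
embeds a trimmed copy of the example above; the helper name ``resolve``
is hypothetical, and the sketch assumes a flat page tree in which
``/Kids`` points directly at pages.

```python
import json

EXAMPLE = json.loads("""
{
  "qpdf-v2": {
    "pdfversion": "1.3",
    "maxobjectid": 5,
    "objects": {
      "obj:1 0 R": {"value": {"/Pages": "2 0 R", "/Type": "/Catalog"}},
      "obj:2 0 R": {"value": {"/Count": 1, "/Kids": ["3 0 R"], "/Type": "/Pages"}},
      "obj:3 0 R": {"value": {"/Contents": "4 0 R", "/Parent": "2 0 R", "/Type": "/Page"}},
      "trailer": {"value": {"/Root": "1 0 R", "/Size": 6}}
    }
  }
}
""")


def resolve(objects, ref):
    """Look up an indirect reference such as "1 0 R" in the objects dictionary."""
    return objects["obj:" + ref]


objects = EXAMPLE["qpdf-v2"]["objects"]
catalog = resolve(objects, objects["trailer"]["value"]["/Root"])["value"]
pages = resolve(objects, catalog["/Pages"])["value"]
first_page = resolve(objects, pages["/Kids"][0])["value"]
print(first_page["/Type"])  # prints /Page
```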
.. _json.input:
qpdf JSON Input
~~~~~~~~~~~~~~~
Output in the JSON output format described in :ref:`json.output` can
be used in two different ways:
- By using the :qpdf:ref:`--json-input` flag or calling
``QPDF::createFromJSON`` in place of ``QPDF::processFile``, a qpdf
JSON file can be used in place of a PDF file as the input to qpdf.
- By using the :qpdf:ref:`--update-from-json` flag or calling
``QPDF::updateFromJSON`` on an initialized ``QPDF`` object, a qpdf
JSON file can be used to apply changes to an existing ``QPDF``
object. That ``QPDF`` object can have come from any source including
a PDF file, a qpdf JSON file, or the result of any other process
that results in a valid, initialized ``QPDF`` object.
Here are some important things to know about qpdf JSON input.
- When a qpdf JSON file is used as the primary input file, it must be
complete. This means:
- A PDF version number must be specified with the ``"pdfversion"``
key
- Stream data must be present for all streams
- The trailer dictionary must be present, though only the
``"/Root"`` key is required.
- Certain fields from the input are ignored whether creating or
updating from a JSON file:
- ``"maxobjectid"`` is ignored, so it is not necessary to update it
when adding new objects.
- ``"/Length"`` is ignored in all stream dictionaries. qpdf doesn't
put it there when it creates JSON output, and it is not necessary
to add it.
- ``"/Size"`` is ignored if it appears in a trailer dictionary as
that is always recomputed by ``QPDFWriter``.
- Unknown keys at the top level of the file, within ``objects``,
at the top level of each individual object (inside the object that
has the ``"value"`` or ``"stream"`` key) and directly within
``"stream"`` are ignored for future compatibility. You should
avoid putting your own values in those places if you wish to avoid
risking that your JSON files will not work in future versions of
qpdf. The exception to this advice is at the top level of the
overall file where it is explicitly supported for you to add your
own keys. For example, you could add your own metadata at the top
level, and qpdf will ignore it. Note that extra top-level keys are
not preserved when qpdf reads your JSON file.
- When qpdf reads a PDF file, the internal object numbers are always
preserved. However, when qpdf writes a file using ``QPDFWriter``,
``QPDFWriter`` does its own numbering and, in general, does not
preserve input object numbers. That means that a qpdf JSON file that
is used to update an existing PDF must have object numbers that
match the input file it is modifying. In practical terms, this means
that you can't use a JSON file created from one PDF file to modify
the *output of running qpdf on that file*.
To put this more concretely, the following is valid:
::
qpdf --json-output in.pdf pdf.json
# edit pdf.json
qpdf in.pdf out.pdf --update-from-json=pdf.json
The following will not produce predictable results because
``out.pdf`` won't have the same object numbers as ``pdf.json`` and
``in.pdf``.
::
qpdf --json-output in.pdf pdf.json
# edit pdf.json
qpdf in.pdf out.pdf --update-from-json=pdf.json
# edit pdf.json again
# Don't do this
qpdf out.pdf out2.pdf --update-from-json=pdf.json
- When updating from a JSON file (:qpdf:ref:`--update-from-json`,
``QPDF::updateFromJSON``), existing objects are updated in place.
This has the following implications:
- You may omit both ``"data"`` and ``"datafile"`` if the object you
are updating is already a stream. In that case the original stream
data is preserved. You must always provide a stream dictionary,
but it may be empty. Note that an empty stream dictionary will
clear the old dictionary. There is no way to indicate that an old
stream dictionary should be left alone, so if your intention is to
replace the stream data and preserve the dictionary, the
original dictionary must appear in the JSON file.
- You can change one object type to another object type including
replacing a stream with a non-stream or a non-stream with a
stream. If you replace a non-stream with a stream, you must
provide data for the stream.
- Objects that you do not wish to modify can be omitted from the
JSON. That includes the trailer. That means you can use a qpdf
JSON file that was written with :qpdf:ref:`--json-object` so that
it includes only the objects you intend to modify.
- You can omit the ``"pdfversion"`` key. The input PDF version will
be preserved.
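
The completeness requirements for a primary input file listed above can
be checked before handing a file to qpdf. The following Python sketch is
only an approximation under that reading of the rules; it does not
reproduce qpdf's own validation, and the function name
``check_complete`` is hypothetical.

```python
def check_complete(doc):
    """Return a list of problems that would make `doc` incomplete as primary input."""
    problems = []
    qpdf = doc.get("qpdf-v2", {})
    if "pdfversion" not in qpdf:
        problems.append('missing "pdfversion"')
    objects = qpdf.get("objects", {})
    trailer = objects.get("trailer", {}).get("value", {})
    if "/Root" not in trailer:
        problems.append('trailer has no "/Root"')
    for key, obj in objects.items():
        stream = obj.get("stream")
        # Every stream needs either inline data or a data file.
        if stream is not None and not ("data" in stream or "datafile" in stream):
            problems.append(f"{key}: stream has no data")
    return problems
```

A file intended only for updating an existing PDF does not need to pass
this check, since objects, the trailer, stream data, and
``"pdfversion"`` may all be omitted in that case.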
.. _json.workflow-cli:
qpdf JSON Workflow: CLI
~~~~~~~~~~~~~~~~~~~~~~~
This section includes a few examples of using qpdf JSON.
- Convert a PDF file to JSON format, edit the JSON, and convert back
to PDF. This is an alternative to using QDF mode (see :ref:`qdf`) to
modify PDF files in a text editor. Each method has its own
advantages and disadvantages.
::
qpdf --json-output in.pdf pdf.json
# edit pdf.json
qpdf --json-input pdf.json out.pdf
- Extract only a specific object into a JSON file, modify the object
in JSON, and use the modified object to update the original PDF. In
this case, we're editing object 4, whatever that may happen to be.
You would have to know through some other means which object you
wanted to edit, such as by looking at other JSON output or using a
tool (possibly but not necessarily qpdf) to identify the object.
::
qpdf --json-output in.pdf pdf.json --json-object=4,0
# edit pdf.json
qpdf in.pdf --update-from-json=pdf.json out.pdf
Rather than using :qpdf:ref:`--json-object` as in the above example,
you could edit the JSON file to remove the objects you didn't need.
You could also just leave them there, though the update process
would be slower.
You could also add new objects to a file by adding them to
``pdf.json``. Just be sure the object number doesn't conflict with
an existing object. The ``"maxobjectid"`` field in the original
output can help with this. You don't have to update it if you add
objects as it is ignored when the file is read back in.
- Use :qpdf:ref:`--json-input` and :qpdf:ref:`--json-output` together
to demonstrate preservation of object numbers. In this example,
``a.json`` and ``b.json`` will have the same objects and object
numbers. The files may not be identical since strings may be
normalized, fields may appear in a different order, etc. However,
``b.json`` and ``c.json`` are probably identical.
::
qpdf --json-output in.pdf a.json
qpdf --json-input --json-output a.json b.json
qpdf --json-input --json-output b.json c.json
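
When adding objects to ``pdf.json`` as described above, the new key
must not collide with an existing object number. The following is a
minimal sketch of one way to pick a free key. It is not part of qpdf:
``next_object_key`` is a hypothetical helper, and it assumes only that
object keys have the form ``"obj:O G R"`` and that the caller has
already read the ``"maxobjectid"`` value, if any, from the original
output.

```python
import re

# Keys for indirect objects look like "obj:O G R" (object number, generation).
_OBJ_KEY = re.compile(r"obj:(\d+) (\d+) R")


def next_object_key(existing_keys, maxobjectid=0):
    """Return an unused "obj:N 0 R" key (hypothetical helper, not a qpdf API).

    existing_keys: iterable of keys taken from the JSON object dictionary
    maxobjectid:   the "maxobjectid" value from the original output, if known
    """
    highest = maxobjectid
    for key in existing_keys:
        match = _OBJ_KEY.fullmatch(key)
        if match:  # skips non-object keys such as "trailer"
            highest = max(highest, int(match.group(1)))
    return f"obj:{highest + 1} 0 R"


keys = ["obj:1 0 R", "obj:2 0 R", "obj:4 0 R", "trailer"]
print(next_object_key(keys, maxobjectid=6))  # obj:7 0 R
print(next_object_key(keys))                 # obj:5 0 R
```

Taking the larger of ``"maxobjectid"`` and the highest key actually
present keeps the sketch safe when the JSON file was produced with
:qpdf:ref:`--json-object` and therefore contains only some of the
objects.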
.. _json.workflow-api:
qpdf JSON Workflow: API
~~~~~~~~~~~~~~~~~~~~~~~
Everything that can be done using the qpdf CLI can be done using the
C++ API. See comments in :file:`QPDF.hh` for ``writeJSON``,
``createFromJSON``, and ``updateFromJSON`` for details.
.. _json-guarantees: .. _json-guarantees:
JSON Guarantees JSON Compatibility Guarantees
--------------- -----------------------------
The qpdf JSON representation includes a JSON serialization of the raw The qpdf JSON representation includes a JSON serialization of the raw
objects in the PDF file as well as some computed information in a more objects in the PDF file as well as some computed information in a more
@ -37,24 +553,23 @@ format. These guarantees are designed to simplify the experience of a
developer working with the JSON format. developer working with the JSON format.
Compatibility Compatibility
The top-level JSON object output is a dictionary. The JSON output The top-level JSON object is a dictionary (JSON "object"). The JSON
contains various nested dictionaries and arrays. With the exception output contains various nested dictionaries and arrays. With the
of dictionaries that are populated by the fields of objects from the exception of dictionaries that are populated by the fields of
file, all instances of a dictionary are guaranteed to have exactly PDF objects from the file, all instances of a dictionary are
the same keys. Future versions of qpdf are free to add additional guaranteed to have exactly the same keys.
keys but not to remove keys or change the type of object that a key
points to. The qpdf program validates this guarantee, and in the
unlikely event that a bug in qpdf should cause it to generate data
that doesn't conform to this rule, it will ask you to file a bug
report.
The top-level JSON structure contains a "``version``" key whose value The top-level JSON structure contains a ``"version"`` key whose
is simple integer. The value of the ``version`` key will be value is simple integer. The value of the ``version`` key will be
incremented if a non-compatible change is made. A non-compatible incremented if a non-compatible change is made. A non-compatible
change would be any change that involves removal of a key, a change change would be any change that involves removal of a key, a change
to the format of data pointed to by a key, or a semantic change that to the format of data pointed to by a key, or a semantic change
requires a different interpretation of a previously existing key. A that requires a different interpretation of a previously existing
strong effort will be made to avoid breaking compatibility. key.
With a specific qpdf JSON version, future versions of qpdf are free
to add additional keys but not to remove keys or change the type of
object that a key points to.
Documentation Documentation
The :command:`qpdf` command can be invoked with the The :command:`qpdf` command can be invoked with the
@ -66,28 +581,29 @@ Documentation
- A dictionary in the help output means that the corresponding - A dictionary in the help output means that the corresponding
location in the actual JSON output is also a dictionary with location in the actual JSON output is also a dictionary with
exactly the same keys; that is, no keys present in help are absent exactly the same keys; that is, no keys present in help are
in the real output, and no keys will be present in the real output absent in the real output, and no keys will be present in the
that are not in help. As a special case, if the dictionary has a real output that are not in help. It is possible for a key to be
single key whose name starts with ``<`` and ends with ``>``, it present and have a value that is explicitly ``null``. As a
means that the JSON output is a dictionary that can have any keys, special case, if the dictionary has a single key whose name
each of which conforms to the value of the special key. This is starts with ``<`` and ends with ``>``, it means that the JSON
used for cases in which the keys of the dictionary are things like output is a dictionary that can have any value as a key. This is
object IDs. used for cases in which the keys of the dictionary are things
like object IDs.
- A string in the help output is a description of the item that - A string in the help output is a description of the item that
appears in the corresponding location of the actual output. The appears in the corresponding location of the actual output. The
corresponding output can have any format. corresponding output can have any value including ``null``.
- An array in the help output always contains a single element. It - An array in the help output always contains a single element. It
indicates that the corresponding location in the actual output is indicates that the corresponding location in the actual output is
also an array, and that each element of the array has whatever an array of any length, and that each element of the array has
format is implied by the single element of the help output's whatever format is implied by the single element of the help
array. output's array.
For example, the help output includes a "``pagelabels``" For example, the help output includes a ``"pagelabels"``
key whose value is an array of one element. That element is a key whose value is an array of one element. That element is a
dictionary with keys "``index``" and "``label``". In addition to dictionary with keys ``"index"`` and ``"label"``. In addition to
describing the meaning of those keys, this tells you that the actual describing the meaning of those keys, this tells you that the actual
JSON output will contain a ``pagelabels`` array, each of whose JSON output will contain a ``pagelabels`` array, each of whose
elements is a dictionary that contains an ``index`` key, a ``label`` elements is a dictionary that contains an ``index`` key, a ``label``
@ -95,56 +611,13 @@ Documentation
Directness and Simplicity Directness and Simplicity
The JSON output contains the value of every object in the file, but The JSON output contains the value of every object in the file, but
it also contains some processed data. This is analogous to how qpdf's it also contains some summary data. This is analogous to how qpdf's
library interface works. The processed data is similar to the helper library interface works. The summary data is similar to the helper
functions in that it allows you to look at certain aspects of the PDF functions in that it allows you to look at certain aspects of the
file without having to understand all the nuances of the PDF PDF file without having to understand all the nuances of the PDF
specification, while the raw objects allow you to mine the PDF for specification, while the raw objects allow you to mine the PDF for
anything that the higher-level interfaces are lacking. anything that the higher-level interfaces are lacking.
.. _json.limitations:
Limitations of JSON Representation
----------------------------------
There are a few limitations to be aware of with the JSON structure:
- Strings, names, and indirect object references in the original PDF
file are all converted to strings in the JSON representation. In the
case of a "normal" PDF file, you can tell the difference because a
name starts with a slash (``/``), and an indirect object reference
looks like ``n n R``, but if there were to be a string that looked
like a name or indirect object reference, there would be no way to
tell this from the JSON output. Note that there are certain cases
where you know for sure what something is, such as knowing that
dictionary keys in objects are always names and that certain things
in the higher-level computed data are known to contain indirect
object references.
- The JSON format doesn't support binary data very well. Mostly the
details are not important, but they are presented here for
information. When qpdf outputs a string in the JSON representation,
it converts the string to UTF-8, assuming usual PDF string semantics.
Specifically, if the original string is UTF-16, it is converted to
UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
converted to UTF-8 with that assumption. This causes strange things
to happen to binary strings. For example, if you had the binary
string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
because ``03`` is not a printable character and ``80`` is the bullet
character in PDF doc encoding and is mapped to the Unicode value
``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
convert back from here to a binary string, would have to recognize
Unicode values whose code points are higher than ``0xFF`` and map
those back to their corresponding PDF doc encoding characters. There
is no way to tell the difference between a Unicode string that was
originally encoded as UTF-16 or one that was converted from PDF doc
encoding. In other words, it's best if you don't try to use the JSON
format to extract binary strings from the PDF file, but if you really
had to, it could be done. Note that qpdf's
:qpdf:ref:`--show-object` option does not have this
limitation and will reveal the string as encoded in the original
file.
.. _json.considerations: .. _json.considerations:
JSON: Special Considerations JSON: Special Considerations
@ -157,12 +630,15 @@ be aware of:
- If a PDF file has certain types of errors in its pages tree (such as - If a PDF file has certain types of errors in its pages tree (such as
page objects that are direct or multiple pages sharing the same page objects that are direct or multiple pages sharing the same
object ID), qpdf will automatically repair the pages tree. If you object ID), qpdf will automatically repair the pages tree. If you
specify ``"objects"`` and/or ``"objectinfo"`` without any other specify ``"objects"`` (and, with qpdf JSON version 1, also
keys, you will see the original pages tree without any corrections. ``"objectinfo"``) without any other keys, you will see the original
If you specify any of keys that require page tree traversal (for pages tree without any corrections. If you specify any of keys that
example, ``"pages"``, ``"outlines"``, or ``"pagelabel"``), then require page tree traversal (for example, ``"pages"``,
``"objects"`` and ``"objectinfo"`` will show the repaired page tree ``"outlines"``, or ``"pagelabel"``), then ``"objects"`` (and
so that object references will be consistent throughout the file. ``"objectinfo"``) will show the repaired page tree so that object
references will be consistent throughout the file. This is not an
issue with :qpdf:ref:`--json-output`, which doesn't repair the pages
tree.
- While qpdf guarantees that keys present in the help will be present - While qpdf guarantees that keys present in the help will be present
in the output, those fields may be null or empty if the information in the output, those fields may be null or empty if the information
@ -177,22 +653,128 @@ be aware of:
1. Note that JSON indexes from 0, and you would also use 0-based 1. Note that JSON indexes from 0, and you would also use 0-based
indexing using the API. However, 1-based indexing is easier in this indexing using the API. However, 1-based indexing is easier in this
case because the command-line syntax for specifying page ranges is case because the command-line syntax for specifying page ranges is
1-based. If you were going to write a program that looked through the 1-based. If you were going to write a program that looked through
JSON for information about specific pages and then use the the JSON for information about specific pages and then use the
command-line to extract those pages, 1-based indexing is easier. command-line to extract those pages, 1-based indexing is easier.
Besides, it's more convenient to subtract 1 from a program in a real Besides, it's more convenient to subtract 1 in a real programming
programming language than it is to add 1 from shell code. language than it is to add 1 in shell code.
- The image information included in the ``page`` section of the JSON - The image information included in the ``page`` section of the JSON
output includes the key "``filterable``". Note that the value of this output includes the key ``"filterable"``. Note that the value of
field may depend on the :qpdf:ref:`--decode-level` that this field may depend on the :qpdf:ref:`--decode-level` that you
you invoke qpdf with. The JSON output includes a top-level key invoke qpdf with. The JSON output includes a top-level key
"``parameters``" that indicates the decode level used for computing ``"parameters"`` that indicates the decode level that was used for
whether a stream was filterable. For example, jpeg images will be computing whether a stream was filterable. For example, jpeg images
shown as not filterable by default, but they will be shown as will be shown as not filterable by default, but they will be shown
filterable if you run :command:`qpdf --json as filterable if you run :command:`qpdf --json
--decode-level=all`. --decode-level=all`.
- The ``encrypt`` key's values will be populated for non-encrypted - The ``encrypt`` key's values will be populated for non-encrypted
files. Some values will be null, and others will have values that files. Some values will be null, and others will have values that
apply to unencrypted files. apply to unencrypted files.
- The qpdf library itself never loads an entire PDF into memory. This
remains true for PDF files represented in JSON format. In general,
qpdf will hold the entire object structure in memory once a file has
been fully read (objects are loaded into memory lazily but stay
there once loaded), but it will never have more than two copies of a
stream in memory at once. That said, if you ask qpdf to write JSON
to memory, it will do so, so be careful about this if you are
working with very large PDF files. There is nothing in the qpdf
library itself that prevents working with PDF files much larger than
available system memory. qpdf can both read and write such files in
JSON format. If you need to work with a PDF file's JSON
representation in memory, it is recommended that you use either
``none`` or ``file`` as the argument to
:qpdf:ref:`--json-stream-data`, or if using the API, use
``qpdf_sj_none`` or ``qpdf_sj_file`` as the JSON stream data value.
If using ``none``, you can use other means to obtain the stream
data.
.. _json-v2-changes:
Changes from JSON v1 to v2
--------------------------
The following changes were made to qpdf's JSON output format for
version 2.
- The representation of objects has changed. For details, see
:ref:`json.objects`.
- The representation of strings is now unambiguous for all strings.
Strings are prefixed with either ``u:`` for Unicode strings or
``b:`` for byte strings.
- Names are shown in qpdf's canonical form rather than in PDF
syntax. (Example: the PDF-syntax name ``/text#2fplain`` appeared
as ``"/text#2fplain"`` in v1 but appears as ``"/text/plain"`` in
v2.)
- The top-level representation of an object in ``"objects"`` is a
dictionary containing either a ``"value"`` key or a ``"stream"``
key, making it possible to distinguish streams from other objects.
- The ``"objectinfo"`` key has been removed in favor of a
representation in ``"objects"`` that differentiates between a stream
and other kinds of objects. In v1, it was not possible to tell a
stream from a dictionary within ``"objects"``.
- Within the ``"objects"`` dictionary, keys are now ``"obj:O G R"``
where ``O`` and ``G`` are the object and generation numbers.
``"trailer"`` remains the key for the trailer dictionary. In v1, the
``obj:`` prefix was not present. The rationale for this change is as
follows:
- Having a unique prefix (``obj:``) makes it much easier to search
in the JSON file for the definition of an object.
- Having the key still contain ``O G R`` makes it much easier to
construct the key from an indirect reference. You just have to
prepend ``obj:``. There is no need to parse the indirect object
reference.
- In the ``"encrypt"`` object, the ``"modifyannotations"`` key was
misspelled as ``"moddifyannotations"`` in v1. This has been
corrected.
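
The v2 conventions listed above can be sketched in a few lines of
Python. This is an illustration under stated assumptions, not qpdf
code: the function names are hypothetical, and the payload after
``b:`` is returned undecoded because its encoding is described in
:ref:`json.objects` rather than here.

```python
import re

# An indirect object reference looks like "O G R", e.g. "4 0 R".
_INDIRECT_REF = re.compile(r"\d+ \d+ R")


def key_for_reference(ref):
    """Turn an indirect reference such as "4 0 R" into its "objects" key."""
    if not _INDIRECT_REF.fullmatch(ref):
        raise ValueError(f"not an indirect object reference: {ref!r}")
    return "obj:" + ref  # no parsing needed; just prepend the prefix


def classify_scalar(value):
    """Classify a JSON string found inside a v2 object (hypothetical helper)."""
    if value.startswith("u:"):
        return ("unicode-string", value[2:])
    if value.startswith("b:"):
        return ("byte-string", value[2:])  # payload left undecoded
    if _INDIRECT_REF.fullmatch(value):
        return ("reference", key_for_reference(value))
    if value.startswith("/"):
        return ("name", value)
    return ("unknown", value)


def is_stream(entry):
    """An "objects" entry holds either a "value" key or a "stream" key."""
    return "stream" in entry


print(key_for_reference("4 0 R"))      # obj:4 0 R
print(classify_scalar("u:Potato"))     # ('unicode-string', 'Potato')
print(classify_scalar("/text/plain"))  # ('name', '/text/plain')
print(classify_scalar("4 0 R"))        # ('reference', 'obj:4 0 R')
```

Because strings always carry a ``u:`` or ``b:`` prefix in v2, a bare
value that looks like a name or an indirect reference can be taken at
face value, which was not possible with v1.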
Motivation for qpdf JSON version 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
qpdf JSON version 2 was created to make it possible to manipulate PDF
files using JSON syntax instead of native PDF syntax. This makes it
possible to make low-level updates to PDF files from just about any
programming language or even to do so from the command-line using
tools like ``jq`` or any editor that's capable of working with JSON
files. There were several limitations of JSON format version 1 that
made this impossible:
- Strings, names, and indirect object references in the original PDF
file were all converted to strings in the JSON representation. For
casual human inspection, this was fine, but in the general case,
there was no way to tell the difference between a string that looked
like a name or indirect object reference and an actual name or
indirect object reference.
- PDF strings were not unambiguously represented in the JSON format.
The way qpdf JSON v1 represented a string was to try to convert the
string to UTF-8. This was done by assuming a string that was not
explicitly marked as Unicode was encoded in PDF doc encoding. The
problem is that there is not a perfect bidirectional mapping between
Unicode and PDF doc encoding, so if a binary string happened to
contain characters that couldn't be bidirectionally mapped, there
would be no way to get back to the original PDF string. Even when
possible, trying to map from the JSON representation of a binary
string back to the original string required knowledge of the mapping
between PDF doc encoding and Unicode.
- There was no representation of stream data. If you wanted to extract
stream data, you could use :qpdf:ref:`--show-object`, so this wasn't
that important for inspection, but it was a blocker for being able
to go from JSON back to PDF. qpdf JSON version 2 allows stream data
to be included inline as base64-encoded data. There is also an
option to write all stream data to external files, which makes it
possible to work with very large PDF files in JSON format even with
tools that try to read the entire JSON structure into memory.
- The PDF version from the PDF header was not represented in qpdf JSON v1.


@ -70,12 +70,14 @@ Python
qpdf's capabilities with other functionality provided by Python's qpdf's capabilities with other functionality provided by Python's
rich standard library and available modules. rich standard library and available modules.
Other Languages Other Languages Starting with version 11.0.0, the :command:`qpdf`
Starting with version 8.3.0, the :command:`qpdf` command-line tool can produce an unambiguous JSON representation of
command-line tool can produce a JSON representation of the PDF file's a PDF file and can also create or update PDF files using this JSON
non-content data. This can facilitate interacting programmatically representation. qpdf versions from 8.3.0 through 10.6.3 had a more
with PDF files through qpdf's command line interface. For more limited JSON output format. The qpdf JSON format makes it possible
information, please see :ref:`json`. to inspect and modify the structure of a PDF file down to the
object level from the command-line or from any language that can
handle JSON data. Please see :ref:`json` for details.
Wrappers Wrappers
The `qpdf Wiki <https://github.com/qpdf/qpdf/wiki>`__ contains a The `qpdf Wiki <https://github.com/qpdf/qpdf/wiki>`__ contains a


@ -122,7 +122,7 @@ entries in ``/W`` above. Each entry consists of one or more fields, the
first of which is the type of the field. The number of bytes for each first of which is the type of the field. The number of bytes for each
field is given by ``/W`` above. A 0 in ``/W`` indicates that the field field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
is omitted and has the default value. The default value for the field is omitted and has the default value. The default value for the field
type is "``1``". All other default values are "``0``". type is ``1``. All other default values are ``0``.
PDF 1.5 has three field types: PDF 1.5 has three field types:


@ -28,6 +28,13 @@ able to restore edited files to a correct state. The
arguments. It reads a possibly edited QDF file from standard input and arguments. It reads a possibly edited QDF file from standard input and
writes a repaired file to standard output. writes a repaired file to standard output.
For another way to work with PDF files in an editor, see :ref:`json`.
Using qpdf JSON format allows you to edit the PDF file semantically
without having to be concerned about PDF syntax. However, QDF files
are actually valid PDF files, so the feedback cycle may be faster if
previewing with a PDF reader. Also, since QDF files are valid PDF, you
can experiment with all aspects of the PDF file, including syntax.
The following attributes characterize a QDF file: The following attributes characterize a QDF file:
- All objects appear in numerical order in the PDF file, including when - All objects appear in numerical order in the PDF file, including when


@ -27,6 +27,10 @@ executable is available from inside the C++ library using the
- Use from the C API with ``qpdfjob_run_from_json`` from :file:`qpdfjob-c.h` - Use from the C API with ``qpdfjob_run_from_json`` from :file:`qpdfjob-c.h`
- Note: this is unrelated to :qpdf:ref:`--json` but can be combined
with it. For more information on qpdf JSON (vs. QPDFJob JSON), see
:ref:`json`.
- The ``QPDFJob`` C++ API - The ``QPDFJob`` C++ API
If you can understand how to use the :command:`qpdf` CLI, you can If you can understand how to use the :command:`qpdf` CLI, you can


@ -60,7 +60,8 @@ For a detailed list of changes, please see the file
- CLI: breaking changes - CLI: breaking changes
- The default json output version when :qpdf:ref:`--json` is - The default json output version when :qpdf:ref:`--json` is
specified has been changed from ``1`` to ``latest``. specified has been changed from ``1`` to ``latest``, which is
now ``2``.
- The :qpdf:ref:`--allow-weak-crypto` flag is now mandatory when - The :qpdf:ref:`--allow-weak-crypto` flag is now mandatory when
explicitly creating files with weak cryptographic algorithms. explicitly creating files with weak cryptographic algorithms.
@ -100,7 +101,7 @@ For a detailed list of changes, please see the file
- ``qpdf --list-attachments --verbose`` include some additional - ``qpdf --list-attachments --verbose`` include some additional
information about attachments. Additional information about information about attachments. Additional information about
attachments is also included in the ``attachments`` json key attachments is also included in the ``attachments`` JSON key
with ``--json``. with ``--json``.
- For encrypted files, ``qpdf --json`` reveals the user password - For encrypted files, ``qpdf --json`` reveals the user password
@ -647,8 +648,8 @@ For a detailed list of changes, please see the file
passwords from files or standard input than using passwords from files or standard input than using
:samp:`@file` for this purpose. :samp:`@file` for this purpose.
- Add some information about attachments to the json output, and - Add some information about attachments to the JSON output, and
added ``attachments`` as an additional json key. The added ``attachments`` as an additional JSON key. The
information included here is limited to the preferred name and information included here is limited to the preferred name and
content stream and a reference to the file spec object. This is content stream and a reference to the file spec object. This is
enough detail for clients to avoid the hassle of navigating a enough detail for clients to avoid the hassle of navigating a