mirror of https://github.com/qpdf/qpdf.git synced 2024-12-22 02:49:00 +00:00

QPDFJob: documentation

Jay Berkenbilt 2022-02-01 07:18:23 -05:00
parent 5a7bb3474e
commit cc5485dac1
16 changed files with 589 additions and 49 deletions


@ -124,14 +124,32 @@ CODING RULES
HOW TO ADD A COMMAND-LINE ARGUMENT
QPDFJob is documented in three places:
* This section provides a quick reminder for how to add a command-line
argument
* generate_auto_job has a detailed explanation about how QPDFJob and
generate_auto_job work together
* The manual ("QPDFJob Design" in qpdf-job.rst) discusses the design
approach, rationale, and evolution of QPDFJob.
Command-line arguments are closely coupled with QPDFJob. To add a new
command-line argument, add the option to the appropriate table in
job.yml. This will automatically declare a method in the private
ArgParser class in QPDFJob_argv.cc which you have to implement. The
implementation should make calls to methods in QPDFJob. Then, add the
same option to either the no-json section of job.yml if it is to be
excluded from the job json structure, or add it under the json
structure to the place where it should appear in the json structure.
implementation should make calls to methods in QPDFJob via its Config
classes. Then, add the same option to either the no-json section of
job.yml if it is to be excluded from the job json structure, or add it
under the json structure to the place where it should appear in the
json structure.
In most cases, adding a new option will automatically declare and call
the appropriate Config method, which you then have to implement. If
you need a manual handler, you have to declare the option as manual in
job.yml and implement the handler yourself, though the automatically
generated code will declare it for you.
The build will fail until the new option is documented in
manual/cli.rst. To do that, create documentation for the option by
@ -148,6 +166,10 @@ When done, the following should happen:
* qpdf --help=topic should list --new-option for the correct topic
* --new-option should appear in the manual
* --new-option should be in the command-line option index in the manual
* A Config method (in Config or one of the other Config classes in
QPDFJob) should exist that corresponds to the command-line flag
* The job JSON file should have a new key in the schema corresponding
to the new option
RELEASE PREPARATION


@ -100,6 +100,7 @@
"encodable",
"encp",
"endianness",
"endl",
"endobj",
"endstream",
"enspliel",
@ -128,6 +129,7 @@
"fuzzer",
"fuzzers",
"fvisibility",
"iostream",
"gajic",
"gajić",
"gcurl",


@ -8,13 +8,13 @@ BINS_examples = \
pdf-filter-tokens \
pdf-invert-images \
pdf-mod-info \
pdf-job \
pdf-name-number-tree \
pdf-npages \
pdf-overlay-page \
pdf-parse-content \
pdf-set-form-values \
pdf-split-pages
pdf-split-pages \
qpdf-job
CBINS_examples = \
pdf-c-objects \
pdf-linearize


@ -9,6 +9,121 @@ import json
import filecmp
from contextlib import contextmanager
# The purpose of this code is to automatically generate various parts
# of the QPDFJob class. It is fairly complicated and extremely
# bespoke, so understanding it is important if modifications are to be
# made.
# Documentation of QPDFJob is divided among three places:
#
# * "HOW TO ADD A COMMAND-LINE ARGUMENT" in README-maintainer provides
# a quick reminder for how to add a command-line argument
#
# * This file has a detailed explanation about how QPDFJob and
# generate_auto_job work together
#
# * The manual ("QPDFJob Design" in qpdf-job.rst) discusses the design
# approach, rationale, and evolution of QPDFJob.
#
# QPDFJob solved the problem of moving extensive functionality that
# lived in qpdf.cc into the library. The QPDFJob class consists of
# four major sections:
#
# * The run() method and its subsidiaries are responsible for
# performing the actual operations on PDF files. This is implemented
# in QPDFJob.cc
#
# * The nested Config class and the other classes it creates provide
# an API for setting up a QPDFJob instance and correspond to the
# command-line arguments of the qpdf executable. This is implemented
# in QPDFJob_config.cc
#
# * The argument parsing code reads an argv array and calls
# configuration methods. This is implemented in QPDFJob_argv.cc. The
# argument parsing logic itself is implemented in the QPDFArgParser
# class.
#
# * The job JSON handling code, which reads a QPDFJob JSON file and
# calls configuration methods. This is implemented in
# QPDFJob_json.cc. The JSON parsing code is in the JSON class. A
# SAX-like JSON handler class that calls callbacks in response to
# items in the JSON is implemented in the JSONHandler class.
#
# This code has the job of ensuring that configuration, command-line
# arguments, and JSON are all consistent and complete so that a
# developer or user can freely move among those different ways of
# interacting with QPDFJob in a predictable fashion. In addition, help
# information for each option appears in manual/cli.rst, and that
# information is used in creation of the job JSON schema and to supply
# help text to QPDFArgParser. This code also ensures that there is an
# exact match between options in job.yml and options in cli.rst.
#
# The job.yml file contains the data that drives this code. To
# understand job.yml, here are some important concepts.
#
# QPDFArgParser option table. There is support for positional
# arguments, options consisting of flags and optional parameters, and
# subparsers that start with a regular parameterless flag, have their
# own positional and option sections, and are terminated with -- by
# itself. Examples of this include --encrypt and --pages. An "option
# table" contains an optional positional argument handler and a list
# of valid options with specifications about their parameters. There
# are three kinds of option tables:
#
# * The built-in "help" option table contains help commands, like
# --help and --version, that are only valid when they appear as the
# single command-line argument.
#
# * The "main" option table contains the options that are valid
# starting at the beginning of argument parsing.
#
# * A named option table can be started manually by the argument
# parsing code to switch the argument parser's context. Switching
# the parser to a new option table is manual (via a call to
# selectOptionTable). Context reverts to the main option table
# automatically when -- is encountered.
#
# In QPDFJob.hh, there is a Config class for each option table except
# help.
#
# Option type: bare, required/optional parameter, required/optional
# choices. A bare argument is just a flag, like --qdf. A parameter
# option takes an arbitrary parameter, like --password. A choices
# option takes one of a fixed list of choices, like --object-streams.
# If a parameter or choices option's parameter is optional, the empty
# string may be specified as an option, such as --collate (or
# --collate=). For a bare option, --option= is always the same as just
# --option. This makes it possible to switch an option from bare to
# optional choice to optional parameter all without breaking
# compatibility.
#
# JSON "schema". This is a qpdf-specific "schema" for JSON. It is not
# related to any kind of standard JSON schema. It is described in
# JSON.hh and in the manual. QPDFJob uses the JSON "schema" in a mode
# in which keys in the schema are all optional in the JSON object.
#
# Here is the mapping between configuration, argv, and JSON.
#
# The help options table is implemented solely for argv processing and
# has no counterpart in configuration or JSON.
#
# The config() method returns a shared pointer to a Config object.
# Every command-line option in the main option table has a
# corresponding method in Config whose name is the option converted to
# camel case. For bare options and options with optional parameters, a
# version exists that takes no arguments. For others, a version exists
# that takes a char const*. For example, the --qdf flag implies a
# qdf() method in Config, and the --object-streams flag implies an
# objectStreams(char const*) method in Config. For flags in option
# tables, the method is declared inside a config class specific to the
# option table. The mapping between option tables and config classes
# is explicit in job.yml. Positional arguments are handled
# individually and manually -- see QPDFJob.hh in the CONFIGURATION
# section for details. See examples/qpdf-job.cc for an example.
#
# To understand the rest, start at main and follow comments in the
# code.
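The naming rule described in the comment above can be sketched in a few lines of Python. This is an illustration only, not the code that generate_auto_job uses: the function names `to_camel_case` and `config_signatures` are hypothetical, and the kind name `required_choices` is assumed by analogy with the `required_parameter`, `optional_parameter`, and `optional_choices` keys that appear in this commit.

```python
def to_camel_case(option):
    """Convert a command-line flag such as --object-streams to objectStreams."""
    words = option.lstrip("-").split("-")
    # Keep the first word as is and capitalize the first letter of the rest.
    return words[0] + "".join(w[:1].upper() + w[1:] for w in words[1:])


def config_signatures(option, kind):
    """List the Config method signatures implied by an option and its kind.

    Bare options and options with optional parameters get a version that
    takes no arguments. Every kind other than bare gets a version that
    takes a char const*.
    """
    name = to_camel_case(option)
    signatures = []
    if kind in ("bare", "optional_parameter", "optional_choices"):
        signatures.append(f"{name}()")
    if kind != "bare":
        signatures.append(f"{name}(char const*)")
    return signatures


if __name__ == "__main__":
    for opt, kind in [
        ("--qdf", "bare"),
        ("--object-streams", "required_choices"),  # assumed kind name
        ("--collate", "optional_parameter"),
    ]:
        print(opt, config_signatures(opt, kind))
```

The two-signature case is the one that job.yml warns about: nothing generated calls the no-argument overload, so it has to be implemented by hand.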
whoami = os.path.basename(sys.argv[0])
BANNER = f'''//
// This file is automatically generated by {whoami}.
@ -33,12 +148,18 @@ def write_file(filename):
class Main:
# SOURCES is a list of source files whose contents are used by
# this program. If they change, we are out of date.
SOURCES = [
whoami,
'manual/_ext/qpdf.py',
'job.yml',
'manual/cli.rst',
]
# DESTS is a map to the output files this code generates. These
# generated files, as well as those added to DESTS later in the
# code, are included in various places by QPDFJob.hh or any of the
# implementing QPDFJob*.cc files.
DESTS = {
'decl': 'libqpdf/qpdf/auto_job_decl.hh',
'init': 'libqpdf/qpdf/auto_job_init.hh',
@ -48,6 +169,11 @@ class Main:
'json_init': 'libqpdf/qpdf/auto_job_json_init.hh',
# Others are added in top
}
# SUMS contains a checksum for each source and destination and is
# used to detect whether we're up to date without having to force
# recompilation all the time. This way the build can invoke this
# script unconditionally without causing stuff to rebuild every
# time.
SUMS = 'job.sums'
def main(self, args=sys.argv[1:], prog=whoami):
@ -71,8 +197,17 @@ class Main:
def top(self, options):
with open('job.yml', 'r') as f:
data = yaml.safe_load(f.read())
# config_decls maps a config key from an option in "options"
# (from job.yml) to a list of declarations. A declaration is
# generated for each config method for that option table.
self.config_decls = {}
# Keep track of which configs we've declared since we can have
# option tables share a config class, as with the encryption
# tables.
self.declared_configs = set()
# Update DESTS -- see above. This ensures that each config
# class's contents are included in job.sums.
for o in data['options']:
config = o.get('config', None)
if config is not None:
@ -257,12 +392,21 @@ class Main:
def generate(self, data):
warn(f'{whoami}: regenerating auto job files')
self.validate(data)
# Add the built-in help options to tables that we populate as
# we read job.yml since we won't encounter these in job.yml
# Keep track of which options are help options since they are
# handled specially. Add the built-in help options to tables
# that we populate as we read job.yml since we won't encounter
# these in job.yml
self.help_options = set(
['--completion-bash', '--completion-zsh', '--help']
)
# Keep track of which options we have encountered but haven't
# seen help text for. This enables us to report if any option
# is missing help.
self.options_without_help = set(self.help_options)
# Compute the information needed for generated files and write
# the files.
self.prepare(data)
with write_file(self.DESTS['decl']) as f:
print(BANNER, file=f)
@ -276,6 +420,11 @@ class Main:
with open('manual/cli.rst', 'r') as df:
print(BANNER, file=f)
self.generate_doc(df, f)
# Compute the json files after the config and arg parsing
# files. We need to have full information about all the
# options before we can generate the schema. Generating the
# schema also generates the json header files.
self.generate_schema(data)
with write_file(self.DESTS['schema']) as f:
print('static constexpr char const* JOB_SCHEMA_DATA = R"(' +
@ -301,6 +450,9 @@ class Main:
# DON'T ADD CODE TO generate AFTER update_hashes
def handle_trivial(self, i, identifier, cfg, prefix, kind, v):
# A "trivial" option is one whose handler does nothing other
# than to call the config method with the same name (switched
# to camelCase).
decl_arg = 1
decl_arg_optional = False
if kind == 'bare':
@ -341,11 +493,18 @@ class Main:
# strategy enables us to change an option from bare to
# optional_parameter or optional_choices without
# breaking binary compatibility. The overloaded
# methods both have to be implemented manually.
# methods both have to be implemented manually. They
# are not automatically called, so if you forget,
# someone will get a link error if they try to call
# one.
self.config_decls[cfg].append(
f'QPDF_DLL {config_prefix}* {identifier}();')
def handle_flag(self, i, identifier, kind, v):
# For flags that require manual handlers, declare the handler
# and register it. They have to be implemented manually in
# QPDFJob_argv.cc. You get compiler/linker errors for any
# missing methods.
if kind == 'bare':
self.decls.append(f'void {identifier}();')
self.init.append(f'this->ap.addBare("{i}", '
@ -371,14 +530,17 @@ class Main:
f', false, {v}_choices);')
def prepare(self, data):
self.decls = []
self.init = []
self.json_decls = []
self.json_init = []
self.jdata = {}
self.by_table = {}
self.decls = [] # argv handler declarations
self.init = [] # initialize arg parsing code
self.json_decls = [] # json handler declarations
self.json_init = [] # initialize json handlers
self.jdata = {} # running data used for json generate
self.by_table = {} # table information by name for easy lookup
def add_jdata(flag, table, details):
# Keep track of each flag and where it appears so we can
# check consistency between the json information and the
# options section.
nonlocal self
if table == 'help':
self.help_options.add(f'--{flag}')
@ -389,6 +551,7 @@ class Main:
'tables': {table: details},
}
# helper functions
self.init.append('auto b = [this](void (ArgParser::*f)()) {')
self.init.append(' return QPDFArgParser::bindBare(f, this);')
self.init.append('};')
@ -396,6 +559,8 @@ class Main:
self.init.append(' return QPDFArgParser::bindParam(f, this);')
self.init.append('};')
self.init.append('')
# static variables for each set of choices for choices options
for k, v in data['choices'].items():
s = f'static char const* {k}_choices[] = {{'
for i in v:
@ -406,6 +571,8 @@ class Main:
self.init.append('')
self.json_init.append('')
# constants for the table names to reduce hard-coding strings
# in the handlers
for o in data['options']:
table = o['table']
if table in ('main', 'help'):
@ -413,6 +580,20 @@ class Main:
i = self.to_identifier(table, 'O', True)
self.decls.append(f'static constexpr char const* {i} = "{table}";')
self.decls.append('')
# Walk through all the options adding declarations for the
# option handlers and initialization code to register the
# handlers in QPDFArgParser. For "trivial" cases,
# QPDFArgParser will call the corresponding config method
# automatically. Otherwise, it will declare a handler that you
# have to explicitly implement.
# If you add a new option table, you have to set config to the
# name of a member variable that you declare in the ArgParser
# class in QPDFJob_argv.cc. Then there should be an option in
# the main table, also listed as manual in job.yml, that
# switches to it. See implementations of any of the existing
# options that do this for examples.
for o in data['options']:
table = o['table']
config = o.get('config', None)
@ -437,8 +618,8 @@ class Main:
self.decls.append(f'void {arg_prefix}Positional(char*);')
self.init.append('this->ap.addPositional('
f'p(&ArgParser::{arg_prefix}Positional));')
flags = {}
flags = {}
for i in o.get('bare', []):
flags[i] = ['bare', None]
for i, v in o.get('required_parameter', {}).items():
@ -462,6 +643,11 @@ class Main:
self.handle_trivial(
i, identifier, config, config_prefix, kind, v)
# Subsidiary options tables need end methods to do any
# final checking within the option table. Final checking
# for the main option table is handled by
# checkConfiguration, which is called explicitly in the
# QPDFJob code.
if table not in ('main', 'help'):
identifier = self.to_identifier(table, 'argEnd', False)
self.decls.append(f'void {identifier}();')
@ -510,6 +696,19 @@ class Main:
return self.option_to_json_key(schema_key)
def build_schema(self, j, path, flag, expected, options_seen):
# j: the part of data from "json" in job.yml as we traverse it
# path: a string representation of the path in the json
# flag: the command-line flag
# expected: a map of command-line options we expect to eventually see
# options_seen: which options we have seen so far
# As described in job.yml, the json can have keys that don't
# map to options. This includes keys whose values are
# dictionaries as well as keys that correspond to positional
# arguments. These start with _ and get their help from
# job.yml. Things that correspond to options get their help
# from the help text we gathered from cli.rst.
if flag in expected:
options_seen.add(flag)
elif isinstance(j, str):
@ -519,6 +718,19 @@ class Main:
elif not (flag == '' or flag.startswith('_')):
raise Exception(f'json: unknown key {flag}')
# The logic here is subtle and makes sense if you understand
# how our JSON schemas work. They are described in JSON.hh,
# but basically, if you see a dictionary, the schema should
# have a dictionary with the same keys whose values are
# descriptive. If you see an array, the array should have
# a single member that describes each element of the array. See
# JSON.hh for details.
# See comments in QPDFJob_json.cc in the Handlers class
# declaration to understand how and why the methods called
# here work. The idea is that Handlers keeps a stack of
# JSONHandler shared pointers so that we can register our
# handlers in the right place as we go.
if isinstance(j, dict):
schema_value = {}
if flag:
@ -579,14 +791,20 @@ class Main:
def generate_schema(self, data):
# Check to make sure that every command-line option is
# represented in data['json'].
# Build a list of options that we expect. If an option appears
# once, we just expect to see it once. If it appears in more
# than one options table, we need to see a separate version of
# it for each option table. It is represented in job.yml
# prepended with the table prefix. The table prefix is removed
# in the schema.
# represented in data['json']. Build a list of options that we
# expect. If an option appears once, we just expect to see it
# once. If it appears in more than one options table, we need
# to see a separate version of it for each option table. It is
# represented in job.yml prepended with the table prefix. The
# table prefix is removed in the schema. Example: "password"
# appears multiple times, so the json section of job.yml has
# main.password, uo.password, etc. But most options appear
# only once, so we can just list them as they are. There is a
# nearly exact match between option tables and dictionaries in
# the job json schema, but it's not perfect because of how
# positional arguments are handled, so we have to do this
# extra work. Information about which tables a particular
# option appeared in is gathered up in prepare().
expected = {}
for k, v in self.jdata.items():
tables = v['tables']
@ -600,7 +818,11 @@ class Main:
# Walk through the json information building the schema as we
# go. This verifies consistency between command-line options
# and the json section of the data and builds up a schema by
# populating with help information as available.
# populating with help information as available. In addition
# to generating the schema, we declare and register json
# handlers that correspond with it. That way, we can first
# check a job JSON file against the schema, and if it matches,
# we have fewer error opportunities while calling handlers.
self.schema = self.build_schema(
data['json'], '', '', expected, options_seen)
if options_seen != set(expected.keys()):
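The comments in this file describe the qpdf-specific JSON "schema" used for job files: a dictionary in the schema describes a dictionary with the same keys (all optional in the job case), and an array in the schema holds a single member that describes every element. The sketch below shows how a check against such a schema could work. It is a simplified illustration under those stated rules, not qpdf's implementation (which lives in the C++ JSON class); `check_schema` and the sample schema are hypothetical.

```python
def check_schema(value, schema, path="$", errors=None):
    """Collect mismatches between a JSON value and a qpdf-style schema.

    dict in schema : value must be a dict; every key is optional, but
                     unknown keys are reported.
    list in schema : value must be a list; schema[0] describes each item.
    anything else  : treated as descriptive text; the value is not checked.
    """
    if errors is None:
        errors = []
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            errors.append(f"{path}: expected an object")
        else:
            for key, item in value.items():
                if key not in schema:
                    errors.append(f"{path}.{key}: unknown key")
                else:
                    check_schema(item, schema[key], f"{path}.{key}", errors)
    elif isinstance(schema, list):
        if not isinstance(value, list):
            errors.append(f"{path}: expected an array")
        else:
            for i, item in enumerate(value):
                check_schema(item, schema[0], f"{path}[{i}]", errors)
    return errors


# Hypothetical fragment of a job schema; the values are help text.
SAMPLE_SCHEMA = {
    "inputFile": "input filename",
    "pages": [{"file": "source file", "range": "page range"}],
}

if __name__ == "__main__":
    print(check_schema({"inputFile": "a.pdf"}, SAMPLE_SCHEMA))
    print(check_schema({"bogus": 1}, SAMPLE_SCHEMA))
```

Checking against the schema first means the handlers that run afterwards see only keys and shapes they were registered for.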


@ -62,10 +62,10 @@ class QPDFJob
// the regular API. This is exposed in the C API, which makes it
// easier to get certain high-level qpdf functionality from other
// languages. If there are any command-line errors, this method
// will throw QPDFArgParser::Usage which is derived from
// std::runtime_error. Other exceptions may be thrown in some
// cases. Note that argc, and argv should be UTF-8 encoded. If you
// are calling this from a Windows Unicode-aware main (wmain), see
// will throw QPDFUsage which is derived from std::runtime_error.
// Other exceptions may be thrown in some cases. Note that argc,
// and argv should be UTF-8 encoded. If you are calling this from
// a Windows Unicode-aware main (wmain), see
// QUtil::call_main_from_wmain for information about converting
// arguments to UTF-8. This method will mutate arguments that are
// passed to it.
@ -76,7 +76,7 @@ class QPDFJob
// Initialize a QPDFJob from json. Passing partial = true prevents
// this method from doing the final checks (calling
// checkConfiguration) after processing the json file. This makes
// it possible to initialze QPDFJob in stages using multiple json
// it possible to initialize QPDFJob in stages using multiple json
// files or to have a json file that can be processed from the CLI
// with --job-json-file and be combined with other arguments. For
// example, you might include only encryption parameters, leaving
@ -84,7 +84,11 @@ class QPDFJob
// input and output files. initializeFromJson is called with
// partial = true when invoked from the command line. To make sure
// that the json file is fully valid on its own, just don't
// specify any other command-line flags.
// specify any other command-line flags. If there are any
// configuration errors, QPDFUsage is thrown. Some error messages
// may be CLI-centric. If an exception tells you to use the
// "--some-option" option, set the "someOption" key in the JSON
// object instead.
QPDF_DLL
void initializeFromJson(std::string const& json, bool partial = false);
@ -160,7 +164,7 @@ class QPDFJob
// object. The Config object contains methods that correspond with
// qpdf command-line arguments. You can use a fluent interface to
// configure a QPDFJob object that would do exactly the same thing
// as a specific qpdf command. The example pdf-job.cc contains an
// as a specific qpdf command. The example qpdf-job.cc contains an
// example of this usage. You can also use initializeFromJson or
// initializeFromArgv to initialize a QPDFJob object.
@ -180,6 +184,10 @@ class QPDFJob
// with references. Returning pointers instead of references
// makes for a more uniform interface.
// Maintainer documentation: see the section in README-maintainer
// called "HOW TO ADD A COMMAND-LINE ARGUMENT", which contains
// references to additional places in the documentation.
class Config;
class AttConfig
@ -330,7 +338,10 @@ class QPDFJob
// Return a top-level configuration item. See CONFIGURATION above
// for details. If an invalid configuration is created (such as
// supplying contradictory options, omitting an input file, etc.),
// QPDFUsage is thrown.
// QPDFUsage is thrown. Note that error messages are CLI-centric,
// but you can map them into config calls. For example, if an
// exception tells you to use the --some-option flag, you should
// call config()->someOption() instead.
QPDF_DLL
std::shared_ptr<Config> config();


@ -1,17 +1,17 @@
# Generated by generate_auto_job
generate_auto_job 1fdb113412a444aad67b0232f3f6c4f50d9e2a5701691e5146fd1b559039ef2e
generate_auto_job 5d6ec1e4f0b94d8f73df665061d8a2188cbbe8f25ea42be78ec576547261d5ac
include/qpdf/auto_job_c_att.hh 7ad43bb374c1370ef32ebdcdcb7b73a61d281f7f4e3f12755585872ab30fb60e
include/qpdf/auto_job_c_copy_att.hh 32275d03cdc69b703dd7e02ba0bbe15756e714e9ad185484773a6178dc09e1ee
include/qpdf/auto_job_c_enc.hh 72e138c7b96ed5aacdce78c1dec04b1c20d361faec4f8faf52f64c1d6be99265
include/qpdf/auto_job_c_main.hh 69d5ea26098bcb6ec5b5e37ba0bca9e7d16a784d2618e0c05d635046848d5123
include/qpdf/auto_job_c_pages.hh 931840b329a36ca0e41401190e04537b47f2867671a6643bfd8da74014202671
include/qpdf/auto_job_c_uo.hh 0585b7de459fa479d9e51a45fa92de0ff6dee748efc9ec1cedd0dde6cee1ad50
job.yml effc93a805fb74503be2213ad885238db21991ba3d084fbfeff01183c66cb002
job.yml 9544c6e046b25d3274731fbcd07ba25b300fd67055021ac4364ad8a91f77c6b6
libqpdf/qpdf/auto_job_decl.hh 9f79396ec459f191be4c5fe34cf88c265cf47355a1a945fa39169d1c94cf04f6
libqpdf/qpdf/auto_job_help.hh 6002f503368f319a3d717484ac39d1558f34e67989d442f394791f6f6f5f0500
libqpdf/qpdf/auto_job_help.hh 43184f01816b5210bbc981de8de48446546fb94f4fd6e63cfc7f2fbac3578e6b
libqpdf/qpdf/auto_job_init.hh fd13b9f730e6275a39a15d193bd9af19cf37f4495699ec1886c2b208d7811ab1
libqpdf/qpdf/auto_job_json_decl.hh c5e3fd38a3b0c569eb0c6b4c60953a09cd6bc7d3361a357a81f64fe36af2b0cf
libqpdf/qpdf/auto_job_json_init.hh 3f86ce40931ca8f417d050fcd49104d73c1fa4e977ad19d54b372831a8ea17ed
libqpdf/qpdf/auto_job_schema.hh 18a3780671d95224cb9a27dcac627c421cae509d59f33a63e6bda0ab53cce923
manual/_ext/qpdf.py e9ac9d6c70642a3d29281ee5ad92ae2422dee8be9306fb8a0bc9dba0ed5e28f3
manual/cli.rst 35289dbf593085016a62249f760cdcad50d5cce76d799ea4acf5dff58b78679a
manual/cli.rst 3746df6c4f115387cca0d921f25619a6b8407fc10b0e4c9dcf40b0b1656c6f8a
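The listing above pairs each source and generated file with a 64-character hex digest, which is how the generator can be invoked on every build without forcing a rebuild. A minimal sketch of that kind of check follows. It assumes the digests are SHA-256 (consistent with their length, but not stated in this commit) and uses hypothetical function names; the real logic is in generate_auto_job.

```python
import hashlib
import os


def file_checksum(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_sums(sums_path):
    """Parse 'name digest' lines, skipping blank lines and '#' comments."""
    recorded = {}
    with open(sums_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, digest = line.rsplit(" ", 1)
            recorded[name] = digest
    return recorded


def write_sums(sums_path, files):
    """Record the current checksum of every listed file."""
    with open(sums_path, "w") as f:
        f.write("# Generated by sketch\n")
        for name in sorted(files):
            f.write(f"{name} {file_checksum(name)}\n")


def up_to_date(sums_path, files):
    """True only if every file exists and matches its recorded checksum."""
    if not os.path.exists(sums_path):
        return False
    recorded = read_sums(sums_path)
    for name in files:
        if not os.path.exists(name):
            return False
        if recorded.get(name) != file_checksum(name):
            return False
    return True
```

Because both sources and destinations are listed, a hand-edited generated file is detected the same way as a changed input.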


@ -1,4 +1,11 @@
# See "HOW TO ADD A COMMAND-LINE ARGUMENT" in README-maintainer.
# REMEMBER: if you add an optional_choices or optional_parameter, you
# have to explicitly remember to implement the overloaded config
# method that takes no arguments. Since no generated code will call it
# automatically, there is no automated reminder to do this. If you
# forget, it will be a link error if someone tries to call it.
choices:
yn:
- "y"


@ -646,7 +646,6 @@ QPDFJob::createsOutput() const
void
QPDFJob::checkConfiguration()
{
// QXXXQ messages are CLI-centric
if (m->replace_input)
{
if (m->outfilename)
@ -722,7 +721,8 @@ QPDFJob::checkConfiguration()
{
QTC::TC("qpdf", "qpdf same file error");
usage("input file and output file are the same;"
" use --replace-input to intentionally overwrite the input file");
" use --replace-input to intentionally"
" overwrite the input file");
}
}


@ -28,7 +28,6 @@ QPDFJob::Config::emptyInput()
{
if (o.m->infilename == 0)
{
// QXXXQ decide whether to fix this or just leave the comment:
// Various places in QPDFJob.cc know that the empty string for
// infile means empty. This means that passing "" as the
// argument to inputFile, or equivalently using "" as a


@ -29,6 +29,28 @@ namespace
typedef std::function<void(char const*)> param_handler_t;
typedef std::function<void(JSON)> json_handler_t;
// The code that calls these methods is automatically
// generated by generate_auto_job. This describes how we
// implement what it does. We keep a stack of handlers in
// json_handlers. The top of the stack is the "current" json
// handler, initially for the top-level object. Whenever we
// encounter a scalar, we add a handler using addBare,
// addParameter, or addChoices. Whenever we encounter a
// dictionary, we first add the dictionary handlers. Then we
// walk into the dictionary and, for each key, we register a
// dict key handler and push it to the stack, then do the same
// process for the key's value. Then we pop the key handler
// off the stack. When we encounter an array, we add the array
// handlers, push an item handler to the stack, call
// recursively for the array's single item (as this is what is
// expected in a schema), and pop the item handler. Note that
// we don't pop dictionary start/end handlers. The dictionary
// handlers and the key handlers are at the same level in
// JSONHandler. This logic is subtle and took several tries to
// get right. It's best understood by carefully understanding
// the behavior of JSONHandler, the JSON schema, and the code
// in generate_auto_job.
void addBare(bare_handler_t);
void addParameter(param_handler_t);
void addChoices(char const** choices, bool required, param_handler_t);
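The stack discipline described in the comment above can be modeled in Python to make the push and pop order concrete. This is a toy model under the rules stated in that comment, not the generated code or the Handlers class: handlers are represented as path strings, and `walk` and the sample schema are hypothetical.

```python
def walk(schema, stack, log):
    """Record which handler would be registered at each schema node.

    stack[-1] plays the role of the "current" JSON handler.
    """
    current = stack[-1]
    if isinstance(schema, dict):
        # Dictionary start/end handlers are added but never popped.
        log.append((current, "dict handlers"))
        for key, value in schema.items():
            stack.append(f"{current}.{key}")  # push the key handler
            walk(value, stack, log)
            stack.pop()                       # pop the key handler
    elif isinstance(schema, list):
        log.append((current, "array handlers"))
        stack.append(f"{current}[]")          # push the item handler
        walk(schema[0], stack, log)           # a schema array has one member
        stack.pop()                           # pop the item handler
    else:
        # Scalars correspond to addBare, addParameter, or addChoices.
        log.append((current, "scalar handler"))


if __name__ == "__main__":
    events = []
    handlers = ["$"]  # the top-level object handler
    walk({"inputFile": "help", "pages": [{"file": "help"}]}, handlers, events)
    for event in events:
        print(event)
```

After the walk finishes, the stack holds only the top-level handler again, which is the invariant the comment's push and pop pairing is meant to preserve.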


@ -812,7 +812,8 @@ This option is repeatable. If given, only specified objects will
be shown in the "objects" key of the JSON output. Otherwise, all
objects will be shown.
)");
ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input.
ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by
--job-json-file.
)");
ap.addHelpTopic("testing", "options for testing or debugging", R"(The options below are useful when writing automated test code that
includes files created by qpdf or when testing qpdf itself.


@ -167,9 +167,11 @@ Related Options
description of the JSON input file format.
Specify the name of a file whose contents are expected to contain a
QPDFJob JSON file. QXXXQ ref. This file is read and treated as if
the equivalent command-line arguments were supplied. It can be
mixed freely with other options.
QPDFJob JSON file. This file is read and treated as if the
equivalent command-line arguments were supplied. It can be repeated
and mixed freely with other options. Run ``qpdf`` with
:qpdf:ref:`--job-json-help` for a description of the job JSON input
file format. For more information, see :ref:`qpdf-job`.
.. _exit-status:
@ -3200,9 +3202,12 @@ Related Options
.. help: show format of job JSON
Describe the format of the QPDFJob JSON input.
Describe the format of the QPDFJob JSON input used by
--job-json-file.
Describe the format of the QPDFJob JSON input. QXXXQ doc ref.
Describe the format of the QPDFJob JSON input used by
:qpdf:ref:`--job-json-file`. For more information about QPDFJob,
see :ref:`qpdf-job`.
.. _test-options:


@ -28,6 +28,7 @@ documentation, please visit `https://qpdf.readthedocs.io
weak-crypto
json
design
qpdf-job
linearization
object-streams
encryption

manual/qpdf-job.rst Normal file

@ -0,0 +1,248 @@
.. _qpdf-job:
QPDFJob: a Job-Based Interface
==============================
All of the functionality from the :command:`qpdf` command-line
executable is available from inside the C++ library using the
``QPDFJob`` class. There are several ways to access this functionality:
- Command-line options
- Run the :command:`qpdf` command line
- Use from the C++ API with ``QPDFJob::initializeFromArgv``
- Use from the C API with QXXXQ
- The job JSON file format
- Use from the CLI with the :qpdf:ref:`--job-json-file` parameter
- Use from the C++ API with ``QPDFJob::initializeFromJson``
- Use from the C API with QXXXQ
- The ``QPDFJob`` C++ API
If you can understand how to use the :command:`qpdf` CLI, you can
understand the ``QPDFJob`` class and the json file. qpdf guarantees
that all of the above methods are in sync. Here's how it works:
.. list-table:: QPDFJob Interfaces
:widths: 30 30 30
:header-rows: 1
- - CLI
- JSON
- C++
- - ``--some-option``
- ``"someOption": ""``
- ``config()->someOption()``
- - ``--some-option=value``
- ``"someOption": "value"``
- ``config()->someOption("value")``
- - positional argument
- ``"otherOption": "value"``
- ``config()->otherOption("value")``
In the JSON file, the JSON structure is an object (dictionary) whose
keys are command-line flags converted to camelCase. Positional
arguments have some corresponding key, which you can find by running
``qpdf`` with the :qpdf:ref:`--job-json-help` flag. For example, input
and output files are named by positional arguments on the CLI. In the
JSON, they are ``"inputFile"`` and ``"outputFile"``. The following are
equivalent:
.. It would be nice to have an automated test that these are all the
same, but we have so few live examples that it's not worth it for
now.
CLI:
::
qpdf infile.pdf outfile.pdf \
--pages . other.pdf --password=x 1-5 -- \
--encrypt user owner 256 --print=low -- \
--object-streams=generate
Job JSON:
.. code-block:: json
{
"inputFile": "infile.pdf",
"outputFile": "outfile.pdf",
"pages": [
{
"file": "."
},
{
"file": "other.pdf",
"password": "x",
"range": "1-5"
}
],
"encrypt": {
"userPassword": "user",
"ownerPassword": "owner",
"256bit": {
"print": "low"
}
},
"objectStreams": "generate"
}
C++ code:
.. code-block:: c++
#include <qpdf/QPDFJob.hh>
#include <qpdf/QPDFUsage.hh>
#include <iostream>
int main(int argc, char* argv[])
{
try
{
QPDFJob j;
j.config()
->inputFile("infile.pdf")
->outputFile("outfile.pdf")
->pages()
->pageSpec(".", "1-z")
->pageSpec("other.pdf", "1-5", "x")
->endPages()
->encrypt(256, "user", "owner")
->print("low")
->endEncrypt()
->objectStreams("generate")
->checkConfiguration();
j.run();
}
catch (QPDFUsage& e)
{
std::cerr << "configuration error: " << e.what() << std::endl;
return 2;
}
catch (std::exception& e)
{
std::cerr << "other error: " << e.what() << std::endl;
return 2;
}
return 0;
}
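The chained calls in the example above follow a nested fluent-builder pattern: each config method returns a pointer that the next call is made on, and an ``end...`` method returns to the enclosing config. The sketch below shows that pattern in miniature. ``MiniConfig`` and ``MiniPagesConfig`` are hypothetical classes invented for illustration, and the way they store values is an assumption; this is not how qpdf's own Config classes are implemented.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of a nested fluent configuration API.
// These classes illustrate the chaining pattern only.
class MiniConfig;

class MiniPagesConfig
{
  public:
    explicit MiniPagesConfig(MiniConfig* parent) : parent(parent) {}
    // Record one page specification and stay in the nested config.
    MiniPagesConfig* pageSpec(
        std::string const& file, std::string const& range);
    // Leave the nested config and return to the enclosing one.
    MiniConfig* endPages() { return parent; }

  private:
    MiniConfig* parent;
};

class MiniConfig
{
  public:
    MiniConfig() : pages_config(this) {}
    // The nested config holds a pointer to this object, so copying
    // is disabled to keep that pointer valid.
    MiniConfig(MiniConfig const&) = delete;
    MiniConfig& operator=(MiniConfig const&) = delete;

    MiniConfig* inputFile(std::string const& f)
    {
        input = f;
        return this;
    }
    // Enter the nested "pages" config.
    MiniPagesConfig* pages() { return &pages_config; }
    MiniConfig* objectStreams(std::string const& mode)
    {
        object_streams = mode;
        return this;
    }

    std::string input;
    std::string object_streams;
    std::vector<std::string> page_specs;

  private:
    MiniPagesConfig pages_config;
};

inline MiniPagesConfig* MiniPagesConfig::pageSpec(
    std::string const& file, std::string const& range)
{
    parent->page_specs.push_back(file + ":" + range);
    return this;
}
```

With these classes, ``c.inputFile("infile.pdf")->pages()->pageSpec(".", "1-z")->endPages()->objectStreams("generate")`` compiles only if the nested section is closed with ``endPages()`` before calling a top-level method, which is the property the pattern is meant to provide.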
It is also possible to mix and match command-line options and JSON
from the CLI. For example, you could create a file called
:file:`my-options.json` containing the following:
.. code-block:: json
{
"encrypt": {
"userPassword": "",
"ownerPassword": "owner",
"256bit": {
}
},
"objectStreams": "generate"
}
and use it with other options to create 256-bit encrypted (but
unrestricted) files with object streams while specifying other
parameters on the command line, such as
::
qpdf infile.pdf outfile.pdf --job-json-file=my-options.json
See also :file:`examples/qpdf-job.cc` in the source distribution as
well as comments in ``QPDFJob.hh``.
.. _qpdfjob-design:
QPDFJob Design
--------------
This section describes some of the design rationale and history behind
``QPDFJob``.
Documentation of ``QPDFJob`` is divided among three places:
- "HOW TO ADD A COMMAND-LINE ARGUMENT" in :file:`README-maintainer`
provides a quick reminder for how to add a command-line argument
- The source file :file:`generate_auto_job` has a detailed explanation
about how ``QPDFJob`` and ``generate_auto_job`` work together
- This chapter of the manual has other details.
Prior to qpdf version 10.6.0, the qpdf CLI executable had a lot of
functionality built into the executable that was not callable from the
library as such. This created a number of problems:
- Some of the logic in :file:`qpdf.cc` was pretty complex, such as
image optimization, generating json output, and many of the page
manipulations. While those things could all be coded using the C++
API, there would be a lot of duplicated code.
- Page splitting and merging will get more complicated over time as
qpdf supports a wider range of document-level options. It would be
nice to be able to expose this to library users instead of baking it
all into the CLI.
- Users of other languages who just wanted an interface to do things
that the CLI could do didn't have a good way to do it, such as just
handing a library call a set of command-line options or an
equivalent JSON object that could be passed in as a string.
- The qpdf CLI itself was almost 8,000 lines of code. It needed to be
refactored, cleaned up, and split.
- Exposing a new feature via the command-line required making lots of
small edits to lots of small bits of code, and it was easy to forget
something. Adding a code generator, while complex in some ways,
greatly reduces the chances of error when extending qpdf.
Here are a few notes on some design decisions about QPDFJob and its
various interfaces.
- Bare command-line options (flags with no parameter) map to config
functions that take no arguments and to JSON keys whose values are
required to be the empty string. The rationale is that we can later
change these bare options to options that take an optional parameter
without breaking backward compatibility in the CLI or the JSON.
Options that take optional parameters generate two config functions:
one that has no arguments and one that has a ``char const*`` argument.
This means that adding an optional parameter to a previously bare
option also doesn't break binary compatibility.
- Adding a new argument to :file:`job.yml` automatically triggers
almost everything by declaring and referencing things that you have
to implement. This way, once you get the code to compile and link,
you know you haven't forgotten anything. There are two tricky cases:
- If an argument handler has to do something special, like call a
nested config method or select an option table, you have to
implement it manually. This is discussed in
:file:`generate_auto_job`.
- When you add an option that has optional parameters or choices,
both of the handlers described above are declared, but only the
one that takes an argument is referenced. You have to remember to
implement the one that doesn't take an argument or else people
will get a linker error if they try to call it. The assumption is
that things with optional parameters started out as bare, so the
argument-less version is already there.
- If you have to add a new option that requires its own option table,
you will have to do some extra work, including adding a new nested
Config class, adding a config member variable to ``ArgParser`` in
:file:`QPDFJob_argv.cc` and ``Handlers`` in :file:`QPDFJob_json.cc`,
and making sure that manually implemented handlers are consistent with
each other. It is best in these cases to add explicit test cases for
all the various ways to get to the option.
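The pair of config functions described in the design notes above, one without arguments and one taking ``char const*``, can be sketched as follows. ``SketchConfig`` and its members are hypothetical names used only for illustration. Treating the bare form as equivalent to an empty parameter is an assumption made here to mirror the JSON convention, not a statement about how qpdf implements any particular option.

```cpp
#include <string>

// Hypothetical config class (not part of qpdf) illustrating the two
// functions that exist for an option with an optional parameter.
class SketchConfig
{
  public:
    // Bare form, corresponding to "--some-option" with no parameter.
    // Assumption: it behaves like the parameterized form with "".
    SketchConfig* someOption()
    {
        return someOption("");
    }

    // Parameterized form, corresponding to "--some-option=value".
    SketchConfig* someOption(char const* parameter)
    {
        some_option_set = true;
        some_option_value = parameter;
        return this;
    }

    bool some_option_set = false;
    std::string some_option_value;
};
```

Because the argument-less function keeps its signature when the ``char const*`` overload is added later, code written against the bare form continues to compile and link unchanged.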


@ -2303,9 +2303,9 @@ For a detailed list of changes, please see the file
been added to the :command:`qpdf` command-line
tool. See :ref:`page-selection`.
- Options have been added to the :command:`qpdf`
command-line tool for copying encryption parameters from another
file. (QXXXQ Link)
- The :qpdf:ref:`--copy-encryption` option has been added to the
:command:`qpdf` command-line tool for copying encryption
parameters from another file.
- New methods have been added to the ``QPDF`` object for adding and
removing pages. See :ref:`adding-and-remove-pages`.