From cc5485dac1f224f856ce48781278b357f61f74bd Mon Sep 17 00:00:00 2001 From: Jay Berkenbilt Date: Tue, 1 Feb 2022 07:18:23 -0500 Subject: [PATCH] QPDFJob: documentation --- README-maintainer | 30 +++- cSpell.json | 2 + examples/build.mk | 4 +- examples/{pdf-job.cc => qpdf-job.cc} | 0 generate_auto_job | 260 +++++++++++++++++++++++++-- include/qpdf/QPDFJob.hh | 27 ++- job.sums | 8 +- job.yml | 7 + libqpdf/QPDFJob.cc | 4 +- libqpdf/QPDFJob_config.cc | 1 - libqpdf/QPDFJob_json.cc | 22 +++ libqpdf/qpdf/auto_job_help.hh | 3 +- manual/cli.rst | 15 +- manual/index.rst | 1 + manual/qpdf-job.rst | 248 +++++++++++++++++++++++++ manual/release-notes.rst | 6 +- 16 files changed, 589 insertions(+), 49 deletions(-) rename examples/{pdf-job.cc => qpdf-job.cc} (100%) create mode 100644 manual/qpdf-job.rst diff --git a/README-maintainer b/README-maintainer index 7ea049dc..49dc643f 100644 --- a/README-maintainer +++ b/README-maintainer @@ -124,14 +124,32 @@ CODING RULES HOW TO ADD A COMMAND-LINE ARGUMENT +QPDFJob is documented in three places: + +* This section provides a quick reminder for how to add a command-line + argument + +* generate_auto_job has a detailed explanation about how QPDFJob and + generate_auto_job work together + +* The manual ("QPDFJob Design" in qpdf-job.rst) discusses the design + approach, rationale, and evolution of QPDFJob. + Command-line arguments are closely coupled with QPDFJob. To add a new command-line argument, add the option to the appropriate table in job.yml. This will automatically declare a method in the private ArgParser class in QPDFJob_argv.cc which you have to implement. The -implementation should make calls to methods in QPDFJob. Then, add the -same option to either the no-json section of job.yml if it is to be -excluded from the job json structure, or add it under the json -structure to the place where it should appear in the json structure. +implementation should make calls to methods in QPDFJob via its Config +classes. 
Then, add the same option to either the no-json section of +job.yml if it is to be excluded from the job json structure, or add it +under the json structure to the place where it should appear in the +json structure. + +In most cases, adding a new option will automatically declare and call +the appropriate Config method, which you then have to implement. If +you need a manual handler, you have to declare the option as manual in +job.yml and implement the handler yourself, though the automatically +generated code will declare it for you. The build will fail until the new option is documented in manual/cli.rst. To do that, create documentation for the option by @@ -148,6 +166,10 @@ When done, the following should happen: * qpdf --help=topic should list --new-option for the correct topic * --new-option should appear in the manual * --new-option should be in the command-line option index in the manual +* A Config method (in Config or one of the other Config classes in + QPDFJob) should exist that corresponds to the command-line flag +* The job JSON file should have a new key in the schema corresponding + to the new option RELEASE PREPARATION diff --git a/cSpell.json b/cSpell.json index aacb3051..688c9f1d 100644 --- a/cSpell.json +++ b/cSpell.json @@ -100,6 +100,7 @@ "encodable", "encp", "endianness", + "endl", "endobj", "endstream", "enspliel", @@ -128,6 +129,7 @@ "fuzzer", "fuzzers", "fvisibility", + "iostream", "gajic", "gajić", "gcurl", diff --git a/examples/build.mk b/examples/build.mk index 5472fba5..b4366c1a 100644 --- a/examples/build.mk +++ b/examples/build.mk @@ -8,13 +8,13 @@ BINS_examples = \ pdf-filter-tokens \ pdf-invert-images \ pdf-mod-info \ - pdf-job \ pdf-name-number-tree \ pdf-npages \ pdf-overlay-page \ pdf-parse-content \ pdf-set-form-values \ - pdf-split-pages + pdf-split-pages \ + qpdf-job CBINS_examples = \ pdf-c-objects \ pdf-linearize diff --git a/examples/pdf-job.cc b/examples/qpdf-job.cc similarity index 100% rename from examples/pdf-job.cc 
rename to examples/qpdf-job.cc diff --git a/generate_auto_job b/generate_auto_job index 5e1e7e8a..e56c0e60 100755 --- a/generate_auto_job +++ b/generate_auto_job @@ -9,6 +9,121 @@ import json import filecmp from contextlib import contextmanager +# The purpose of this code is to automatically generate various parts +# of the QPDFJob class. It is fairly complicated and extremely +# bespoke, so understanding it is important if modifications are to be +# made. + +# Documentation of QPDFJob is divided among three places: +# +# * "HOW TO ADD A COMMAND-LINE ARGUMENT" in README-maintainer provides +# a quick reminder for how to add a command-line argument +# +# * This file has a detailed explanation about how QPDFJob and +# generate_auto_job work together +# +# * The manual ("QPDFJob Design" in qpdf-job.rst) discusses the design +# approach, rationale, and evolution of QPDFJob. +# +# QPDFJob solved the problem of moving extensive functionality that +# lived in qpdf.cc into the library. The QPDFJob class consists of +# four major sections: +# +# * The run() method and its subsidiaries are responsible for +# performing the actual operations on PDF files. This is implemented +# in QPDFJob.cc. +# +# * The nested Config class and the other classes it creates provide +# an API for setting up a QPDFJob instance and correspond to the +# command-line arguments of the qpdf executable. This is implemented +# in QPDFJob_config.cc. +# +# * The argument parsing code reads an argv array and calls +# configuration methods. This is implemented in QPDFJob_argv.cc. The +# argument parsing logic itself is implemented in the QPDFArgParser +# class. +# +# * The job JSON handling code reads a QPDFJob JSON file and +# calls configuration methods. This is implemented in +# QPDFJob_json.cc. The JSON parsing code is in the JSON class. A +# SAX-like JSON handler class that calls callbacks in response to +# items in the JSON is implemented in the JSONHandler class.
+# +# This code has the job of ensuring that configuration, command-line +# arguments, and JSON are all consistent and complete so that a +# developer or user can freely move among those different ways of +# interacting with QPDFJob in a predictable fashion. In addition, help +# information for each option appears in manual/cli.rst, and that +# information is used in creation of the job JSON schema and to supply +# help text to QPDFArgParser. This code also ensures that there is an +# exact match between options in job.yml and options in cli.rst. +# +# The job.yml file contains the data that drives this code. To +# understand job.yml, here are some important concepts. +# +# QPDFArgParser option table. There is support for positional +# arguments, options consisting of flags and optional parameters, and +# subparsers that start with a regular parameterless flag, have their +# own positional and option sections, and are terminated with -- by +# itself. Examples of this include --encrypt and --pages. An "option +# table" contains an optional positional argument handler and a list +# of valid options with specifications about their parameters. There +# are three kinds of option tables: +# +# * The built-in "help" option table contains help commands, like +# --help and --version, that are only valid when they appear as the +# single command-line argument. +# +# * The "main" option table contains the options that are valid +# starting at the beginning of argument parsing. +# +# * A named option table can be started manually by the argument +# parsing code to switch the argument parser's context. Switching +# the parser to a new option table is manual (via a call to +# selectOptionTable). Context reverts to the main option table +# automatically when -- is encountered. +# +# In QPDFJob.hh, there is a Config class for each option table except +# help. +# +# Option type: bare, required/optional parameter, required/optional +# choices. 
A bare argument is just a flag, like --qdf. A parameter +# option takes an arbitrary parameter, like --password. A choices +# option takes one of a fixed list of choices, like --object-streams. +# If a parameter or choices option's parameter is optional, the empty +# string may be specified as its value, such as --collate (or +# --collate=). For a bare option, --option= is always the same as just +# --option. This makes it possible to switch an option from bare to +# optional choice to optional parameter all without breaking +# compatibility. +# +# JSON "schema". This is a qpdf-specific "schema" for JSON. It is not +# related to any kind of standard JSON schema. It is described in +# JSON.hh and in the manual. QPDFJob uses the JSON "schema" in a mode +# in which keys in the schema are all optional in the JSON object. +# +# Here is the mapping between configuration, argv, and JSON. +# +# The help options table is implemented solely for argv processing and +# has no counterpart in configuration or JSON. +# +# The config() method returns a shared pointer to a Config object. +# Every command-line option in the main option table has a +# corresponding method in Config whose name is the option converted to +# camelCase. For bare options and options with optional parameters, a +# version exists that takes no arguments. For options that take a +# parameter, a version exists that takes a char const*. For example, the +# --qdf flag implies a qdf() method in Config, and the --object-streams +# flag implies an objectStreams(char const*) method in Config. For flags +# in other option tables, the method is declared inside a config class +# specific to the option table. The mapping between option tables and +# config classes is explicit in job.yml. Positional arguments are handled +# individually and manually -- see QPDFJob.hh in the CONFIGURATION +# section for details. See examples/qpdf-job.cc for an example. +# +# To understand the rest, start at main and follow comments in the +# code.
+ whoami = os.path.basename(sys.argv[0]) BANNER = f'''// // This file is automatically generated by {whoami}. @@ -33,12 +148,18 @@ def write_file(filename): class Main: + # SOURCES is a list of source files whose contents are used by + # this program. If they change, we are out of date. SOURCES = [ whoami, 'manual/_ext/qpdf.py', 'job.yml', 'manual/cli.rst', ] + # DESTS is a map to the output files this code generates. These + # generated files, as well as those added to DESTS later in the + # code, are included in various places by QPDFJob.hh or any of the + # implementing QPDFJob*.cc files. DESTS = { 'decl': 'libqpdf/qpdf/auto_job_decl.hh', 'init': 'libqpdf/qpdf/auto_job_init.hh', @@ -48,6 +169,11 @@ class Main: 'json_init': 'libqpdf/qpdf/auto_job_json_init.hh', # Others are added in top } + # SUMS contains a checksum for each source and destination and is + # used to detect whether we're up to date without having to force + # recompilation all the time. This way the build can invoke this + # script unconditionally without causing stuff to rebuild every + # time. SUMS = 'job.sums' def main(self, args=sys.argv[1:], prog=whoami): @@ -71,8 +197,17 @@ class Main: def top(self, options): with open('job.yml', 'r') as f: data = yaml.safe_load(f.read()) + # config_decls maps a config key from an option in "options" + # (from job.yml) to a list of declarations. A declaration is + # generated for each config method for that option table. self.config_decls = {} + # Keep track of which configs we've declared since we can have + # option tables share a config class, as with the encryption + # tables. self.declared_configs = set() + + # Update DESTS -- see above. This ensures that each config + # class's contents are included in job.sums.
for o in data['options']: config = o.get('config', None) if config is not None: @@ -257,12 +392,21 @@ class Main: def generate(self, data): warn(f'{whoami}: regenerating auto job files') self.validate(data) - # Add the built-in help options to tables that we populate as - # we read job.yml since we won't encounter these in job.yml + + # Keep track of which options are help options since they are + # handled specially. Add the built-in help options to tables + # that we populate as we read job.yml since we won't encounter + # these in job.yml self.help_options = set( ['--completion-bash', '--completion-zsh', '--help'] ) + # Keep track of which options we have encountered but haven't + # seen help text for. This enables us to report if any option + # is missing help. self.options_without_help = set(self.help_options) + + # Compute the information needed for generated files and write + # the files. self.prepare(data) with write_file(self.DESTS['decl']) as f: print(BANNER, file=f) @@ -276,6 +420,11 @@ class Main: with open('manual/cli.rst', 'r') as df: print(BANNER, file=f) self.generate_doc(df, f) + + # Compute the json files after the config and arg parsing + # files. We need to have full information about all the + # options before we can generate the schema. Generating the + # schema also generates the json header files. self.generate_schema(data) with write_file(self.DESTS['schema']) as f: print('static constexpr char const* JOB_SCHEMA_DATA = R"(' + @@ -301,6 +450,9 @@ class Main: # DON'T ADD CODE TO generate AFTER update_hashes def handle_trivial(self, i, identifier, cfg, prefix, kind, v): + # A "trivial" option is one whose handler does nothing other + # than to call the config method with the same name (switched + # to camelCase). decl_arg = 1 decl_arg_optional = False if kind == 'bare': @@ -341,11 +493,18 @@ class Main: # strategy enables us to change an option from bare to # optional_parameter or optional_choices without # breaking binary compatibility. 
The overloaded - # methods both have to be implemented manually. + # methods both have to be implemented manually. They + # are not automatically called, so if you forget, + # someone will get a link error if they try to call + # one. self.config_decls[cfg].append( f'QPDF_DLL {config_prefix}* {identifier}();') def handle_flag(self, i, identifier, kind, v): + # For flags that require manual handlers, declare the handler + # and register it. They have to be implemented manually in + # QPDFJob_argv.cc. You get compiler/linker errors for any + # missing methods. if kind == 'bare': self.decls.append(f'void {identifier}();') self.init.append(f'this->ap.addBare("{i}", ' @@ -371,14 +530,17 @@ class Main: f', false, {v}_choices);') def prepare(self, data): - self.decls = [] - self.init = [] - self.json_decls = [] - self.json_init = [] - self.jdata = {} - self.by_table = {} + self.decls = [] # argv handler declarations + self.init = [] # initialize arg parsing code + self.json_decls = [] # json handler declarations + self.json_init = [] # initialize json handlers + self.jdata = {} # running data used for json generation + self.by_table = {} # table information by name for easy lookup def add_jdata(flag, table, details): + # Keep track of each flag and where it appears so we can + # check consistency between the json information and the + # options section.
nonlocal self if table == 'help': self.help_options.add(f'--{flag}') @@ -389,6 +551,7 @@ class Main: 'tables': {table: details}, } + # helper functions self.init.append('auto b = [this](void (ArgParser::*f)()) {') self.init.append(' return QPDFArgParser::bindBare(f, this);') self.init.append('};') @@ -396,6 +559,8 @@ class Main: self.init.append(' return QPDFArgParser::bindParam(f, this);') self.init.append('};') self.init.append('') + + # static variables for each set of choices for choices options for k, v in data['choices'].items(): s = f'static char const* {k}_choices[] = {{' for i in v: @@ -406,6 +571,8 @@ class Main: self.init.append('') self.json_init.append('') + # constants for the table names to reduce hard-coding strings + # in the handlers for o in data['options']: table = o['table'] if table in ('main', 'help'): @@ -413,6 +580,20 @@ class Main: i = self.to_identifier(table, 'O', True) self.decls.append(f'static constexpr char const* {i} = "{table}";') self.decls.append('') + + # Walk through all the options adding declarations for the + # option handlers and initialization code to register the + # handlers in QPDFArgParser. For "trivial" cases, + # QPDFArgParser will call the corresponding config method + # automatically. Otherwise, we declare a handler that you + # have to implement explicitly. + + # If you add a new option table, you have to set config to the + # name of a member variable that you declare in the ArgParser + # class in QPDFJob_argv.cc. Then there should be an option in + # the main table, also listed as manual in job.yml, that + # switches to it. See implementations of any of the existing + # options that do this for examples.
for o in data['options']: table = o['table'] config = o.get('config', None) @@ -437,8 +618,8 @@ class Main: self.decls.append(f'void {arg_prefix}Positional(char*);') self.init.append('this->ap.addPositional(' f'p(&ArgParser::{arg_prefix}Positional));') - flags = {} + flags = {} for i in o.get('bare', []): flags[i] = ['bare', None] for i, v in o.get('required_parameter', {}).items(): @@ -462,6 +643,11 @@ class Main: self.handle_trivial( i, identifier, config, config_prefix, kind, v) + # Subsidiary options tables need end methods to do any + # final checking within the option table. Final checking + # for the main option table is handled by + # checkConfiguration, which is called explicitly in the + # QPDFJob code. if table not in ('main', 'help'): identifier = self.to_identifier(table, 'argEnd', False) self.decls.append(f'void {identifier}();') @@ -510,6 +696,19 @@ class Main: return self.option_to_json_key(schema_key) def build_schema(self, j, path, flag, expected, options_seen): + # j: the part of data from "json" in job.yml as we traverse it + # path: a string representation of the path in the json + # flag: the command-line flag + # expected: a map of command-line options we expect to eventually see + # options_seen: which options we have seen so far + + # As described in job.yml, the json can have keys that don't + # map to options. This includes keys whose values are + # dictionaries as well as keys that correspond to positional + # arguments. These start with _ and get their help from + # job.yml. Things that correspond to options get their help + # from the help text we gathered from cli.rst. + if flag in expected: options_seen.add(flag) elif isinstance(j, str): @@ -519,6 +718,19 @@ class Main: elif not (flag == '' or flag.startswith('_')): raise Exception(f'json: unknown key {flag}') + # The logic here is subtle and makes sense if you understand + # how our JSON schemas work. 
They are described in JSON.hh, + # but basically, if you see a dictionary, the schema should + # have a dictionary with the same keys whose values are + # descriptive. If you see an array, the array should have + # a single member that describes each element of the array. See + # JSON.hh for details. + + # See comments in QPDFJob_json.cc in the Handlers class + # declaration to understand how and why the methods called + # here work. The idea is that Handlers keeps a stack of + # JSONHandler shared pointers so that we can register our + # handlers in the right place as we go. if isinstance(j, dict): schema_value = {} if flag: @@ -579,14 +791,20 @@ class Main: def generate_schema(self, data): # Check to make sure that every command-line option is - # represented in data['json']. - - # Build a list of options that we expect. If an option appears - # once, we just expect to see it once. If it appears in more - # than one options table, we need to see a separate version of - # it for each option table. It is represented in job.yml - # prepended with the table prefix. The table prefix is removed - # in the schema. + # represented in data['json']. Build a list of options that we + # expect. If an option appears once, we just expect to see it + # once. If it appears in more than one option table, we need + # to see a separate version of it for each option table. It is + # represented in job.yml prepended with the table prefix. The + # table prefix is removed in the schema. Example: "password" + # appears multiple times, so the json section of job.yml has + # main.password, uo.password, etc. But most options appear + # only once, so we can just list them as they are. There is a + # nearly exact match between option tables and dictionaries in + # the job json schema, but it's not perfect because of how + # positional arguments are handled, so we have to do this + # extra work. Information about which tables a particular + # option appeared in is gathered up in prepare().
expected = {} for k, v in self.jdata.items(): tables = v['tables'] @@ -600,7 +818,11 @@ class Main: # Walk through the json information building the schema as we # go. This verifies consistency between command-line options # and the json section of the data and builds up a schema by - # populating with help information as available. + # populating with help information as available. In addition + # to generating the schema, we declare and register json + # handlers that correspond with it. That way, we can first + # check a job JSON file against the schema, and if it matches, + # we have fewer error opportunities while calling handlers. self.schema = self.build_schema( data['json'], '', '', expected, options_seen) if options_seen != set(expected.keys()): diff --git a/include/qpdf/QPDFJob.hh b/include/qpdf/QPDFJob.hh index 5a8c88cc..64075bc1 100644 --- a/include/qpdf/QPDFJob.hh +++ b/include/qpdf/QPDFJob.hh @@ -62,10 +62,10 @@ class QPDFJob // the regular API. This is exposed in the C API, which makes it // easier to get certain high-level qpdf functionality from other // languages. If there are any command-line errors, this method - // will throw QPDFArgParser::Usage which is derived from - // std::runtime_error. Other exceptions may be thrown in some - // cases. Note that argc, and argv should be UTF-8 encoded. If you - // are calling this from a Windows Unicode-aware main (wmain), see + // will throw QPDFUsage which is derived from std::runtime_error. + // Other exceptions may be thrown in some cases. Note that the + // strings in argv should be UTF-8 encoded. If you are calling this from + // a Windows Unicode-aware main (wmain), see // QUtil::call_main_from_wmain for information about converting // arguments to UTF-8. This method will mutate arguments that are // passed to it. @@ -76,7 +76,7 @@ class QPDFJob // Initialize a QPDFJob from json.
Passing partial = true prevents // this method from doing the final checks (calling // checkConfiguration) after processing the json file. This makes - // it possible to initialze QPDFJob in stages using multiple json + // it possible to initialize QPDFJob in stages using multiple json // files or to have a json file that can be processed from the CLI // with --job-json-file and be combined with other arguments. For // example, you might include only encryption parameters, leaving @@ -84,7 +84,11 @@ class QPDFJob // input and output files. initializeFromJson is called with // partial = true when invoked from the command line. To make sure // that the json file is fully valid on its own, just don't - // specify any other command-line flags. + // specify any other command-line flags. If there are any + // configuration errors, QPDFUsage is thrown. Some error messages + // may be CLI-centric. If an exception tells you to use the + // "--some-option" option, set the "someOption" key in the JSON + // object instead. QPDF_DLL void initializeFromJson(std::string const& json, bool partial = false); @@ -160,7 +164,7 @@ class QPDFJob // object. The Config object contains methods that correspond with // qpdf command-line arguments. You can use a fluent interface to // configure a QPDFJob object that would do exactly the same thing - // as a specific qpdf command. The example pdf-job.cc contains an + // as a specific qpdf command. The example qpdf-job.cc contains an // example of this usage. You can also use initializeFromJson or // initializeFromArgv to initialize a QPDFJob object. @@ -180,6 +184,10 @@ class QPDFJob // with references. Returning pointers instead of references // makes for a more uniform interface. + // Maintainer documentation: see the section in README-maintainer + // called "HOW TO ADD A COMMAND-LINE ARGUMENT", which contains + // references to additional places in the documentation.
+ class Config; class AttConfig @@ -330,7 +338,10 @@ class QPDFJob // Return a top-level configuration item. See CONFIGURATION above // for details. If an invalid configuration is created (such as // supplying contradictory options, omitting an input file, etc.), - // QPDFUsage is thrown. + // QPDFUsage is thrown. Note that error messages are CLI-centric, + // but you can map them into config calls. For example, if an + // exception tells you to use the --some-option flag, you should + // call config()->someOption() instead. QPDF_DLL std::shared_ptr<Config> config(); diff --git a/job.sums b/job.sums index 0c574cc1..d434c642 100644 --- a/job.sums +++ b/job.sums @@ -1,17 +1,17 @@ # Generated by generate_auto_job -generate_auto_job 1fdb113412a444aad67b0232f3f6c4f50d9e2a5701691e5146fd1b559039ef2e +generate_auto_job 5d6ec1e4f0b94d8f73df665061d8a2188cbbe8f25ea42be78ec576547261d5ac include/qpdf/auto_job_c_att.hh 7ad43bb374c1370ef32ebdcdcb7b73a61d281f7f4e3f12755585872ab30fb60e include/qpdf/auto_job_c_copy_att.hh 32275d03cdc69b703dd7e02ba0bbe15756e714e9ad185484773a6178dc09e1ee include/qpdf/auto_job_c_enc.hh 72e138c7b96ed5aacdce78c1dec04b1c20d361faec4f8faf52f64c1d6be99265 include/qpdf/auto_job_c_main.hh 69d5ea26098bcb6ec5b5e37ba0bca9e7d16a784d2618e0c05d635046848d5123 include/qpdf/auto_job_c_pages.hh 931840b329a36ca0e41401190e04537b47f2867671a6643bfd8da74014202671 include/qpdf/auto_job_c_uo.hh 0585b7de459fa479d9e51a45fa92de0ff6dee748efc9ec1cedd0dde6cee1ad50 -job.yml effc93a805fb74503be2213ad885238db21991ba3d084fbfeff01183c66cb002 +job.yml 9544c6e046b25d3274731fbcd07ba25b300fd67055021ac4364ad8a91f77c6b6 libqpdf/qpdf/auto_job_decl.hh 9f79396ec459f191be4c5fe34cf88c265cf47355a1a945fa39169d1c94cf04f6 -libqpdf/qpdf/auto_job_help.hh 6002f503368f319a3d717484ac39d1558f34e67989d442f394791f6f6f5f0500 +libqpdf/qpdf/auto_job_help.hh 43184f01816b5210bbc981de8de48446546fb94f4fd6e63cfc7f2fbac3578e6b libqpdf/qpdf/auto_job_init.hh fd13b9f730e6275a39a15d193bd9af19cf37f4495699ec1886c2b208d7811ab1
libqpdf/qpdf/auto_job_json_decl.hh c5e3fd38a3b0c569eb0c6b4c60953a09cd6bc7d3361a357a81f64fe36af2b0cf libqpdf/qpdf/auto_job_json_init.hh 3f86ce40931ca8f417d050fcd49104d73c1fa4e977ad19d54b372831a8ea17ed libqpdf/qpdf/auto_job_schema.hh 18a3780671d95224cb9a27dcac627c421cae509d59f33a63e6bda0ab53cce923 manual/_ext/qpdf.py e9ac9d6c70642a3d29281ee5ad92ae2422dee8be9306fb8a0bc9dba0ed5e28f3 -manual/cli.rst 35289dbf593085016a62249f760cdcad50d5cce76d799ea4acf5dff58b78679a +manual/cli.rst 3746df6c4f115387cca0d921f25619a6b8407fc10b0e4c9dcf40b0b1656c6f8a diff --git a/job.yml b/job.yml index eb6a6b01..eb5b7753 100644 --- a/job.yml +++ b/job.yml @@ -1,4 +1,11 @@ # See "HOW TO ADD A COMMAND-LINE ARGUMENT" in README-maintainer. + +# REMEMBER: if you add an optional_choices or optional_parameter, you +# have to explicitly remember to implement the overloaded config +# method that takes no arguments. Since no generated code will call it +# automatically, there is no automated reminder to do this. If you +# forget, it will be a link error if someone tries to call it. 
+ choices: yn: - "y" diff --git a/libqpdf/QPDFJob.cc b/libqpdf/QPDFJob.cc index 1c6a16d6..a06f87bc 100644 --- a/libqpdf/QPDFJob.cc +++ b/libqpdf/QPDFJob.cc @@ -646,7 +646,6 @@ QPDFJob::createsOutput() const void QPDFJob::checkConfiguration() { - // QXXXQ messages are CLI-centric if (m->replace_input) { if (m->outfilename) @@ -722,7 +721,8 @@ QPDFJob::checkConfiguration() { QTC::TC("qpdf", "qpdf same file error"); usage("input file and output file are the same;" - " use --replace-input to intentionally overwrite the input file"); + " use --replace-input to intentionally" + " overwrite the input file"); } } diff --git a/libqpdf/QPDFJob_config.cc b/libqpdf/QPDFJob_config.cc index fb61924c..68eaf5c8 100644 --- a/libqpdf/QPDFJob_config.cc +++ b/libqpdf/QPDFJob_config.cc @@ -28,7 +28,6 @@ QPDFJob::Config::emptyInput() { if (o.m->infilename == 0) { - // QXXXQ decide whether to fix this or just leave the comment: // Various places in QPDFJob.cc know that the empty string for // infile means empty. This means that passing "" as the // argument to inputFile, or equivalently using "" as a diff --git a/libqpdf/QPDFJob_json.cc b/libqpdf/QPDFJob_json.cc index cc4e2ff7..c0de8666 100644 --- a/libqpdf/QPDFJob_json.cc +++ b/libqpdf/QPDFJob_json.cc @@ -29,6 +29,28 @@ namespace typedef std::function param_handler_t; typedef std::function json_handler_t; + // The code that calls these methods is automatically + // generated by generate_auto_job. This describes how we + // implement what it does. We keep a stack of handlers in + // json_handlers. The top of the stack is the "current" json + // handler, initially for the top-level object. Whenever we + // encounter a scalar, we add a handler using addBare, + // addParameter, or addChoices. Whenever we encounter a + // dictionary, we first add the dictionary handlers. Then we + // walk into the dictionary and, for each key, we register a + // dict key handler and push it to the stack, then do the same + // process for the key's value.
Then we pop the key handler + // off the stack. When we encounter an array, we add the array + // handlers, push an item handler to the stack, call + // recursively for the array's single item (as this is what is + // expected in a schema), and pop the item handler. Note that + // we don't pop dictionary start/end handlers. The dictionary + // handlers and the key handlers are at the same level in + // JSONHandler. This logic is subtle and took several tries to + // get right. It's best understood by carefully studying + // the behavior of JSONHandler, the JSON schema, and the code + // in generate_auto_job. + void addBare(bare_handler_t); void addParameter(param_handler_t); void addChoices(char const** choices, bool required, param_handler_t); diff --git a/libqpdf/qpdf/auto_job_help.hh b/libqpdf/qpdf/auto_job_help.hh index 49ac3494..38d275b5 100644 --- a/libqpdf/qpdf/auto_job_help.hh +++ b/libqpdf/qpdf/auto_job_help.hh @@ -812,7 +812,8 @@ This option is repeatable. If given, only specified objects will be shown in the "objects" key of the JSON output. Otherwise, all objects will be shown. )"); -ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input. +ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by +--job-json-file. )"); ap.addHelpTopic("testing", "options for testing or debugging", R"(The options below are useful when writing automated test code that includes files created by qpdf or when testing qpdf itself. diff --git a/manual/cli.rst b/manual/cli.rst index 7dd955c4..614be80d 100644 --- a/manual/cli.rst +++ b/manual/cli.rst @@ -167,9 +167,11 @@ Related Options description of the JSON input file format. Specify the name of a file whose contents are expected to contain a - QPDFJob JSON file. QXXXQ ref. This file is read and treated as if - the equivalent command-line arguments were supplied.
It can be - mixed freely with other options. + QPDFJob JSON file. This file is read and treated as if the + equivalent command-line arguments were supplied. It can be repeated + and mixed freely with other options. Run ``qpdf`` with + :qpdf:ref:`--job-json-help` for a description of the job JSON input + file format. For more information, see :ref:`qpdf-job`. .. _exit-status: @@ -3200,9 +3202,12 @@ Related Options .. help: show format of job JSON - Describe the format of the QPDFJob JSON input. + Describe the format of the QPDFJob JSON input used by + --job-json-file. - Describe the format of the QPDFJob JSON input. QXXXQ doc ref. + Describe the format of the QPDFJob JSON input used by + :qpdf:ref:`--job-json-file`. For more information about QPDFJob, + see :ref:`qpdf-job`. .. _test-options: diff --git a/manual/index.rst b/manual/index.rst index 7f8b1483..5aa59346 100644 --- a/manual/index.rst +++ b/manual/index.rst @@ -28,6 +28,7 @@ documentation, please visit `https://qpdf.readthedocs.io weak-crypto json design + qpdf-job linearization object-streams encryption diff --git a/manual/qpdf-job.rst b/manual/qpdf-job.rst new file mode 100644 index 00000000..72e02305 --- /dev/null +++ b/manual/qpdf-job.rst @@ -0,0 +1,248 @@ + +.. _qpdf-job: + +QPDFJob: a Job-Based Interface +============================== + +All of the functionality from the :command:`qpdf` command-line +executable is available from inside the C++ library using the +``QPDFJob`` class. 
There are several ways to access this functionality: + +- Command-line options + + - Run the :command:`qpdf` command line + + - Use from the C++ API with ``QPDFJob::initializeFromArgv`` + + - Use from the C API with QXXXQ + +- The job JSON file format + + - Use from the CLI with the :qpdf:ref:`--job-json-file` parameter + + - Use from the C++ API with ``QPDFJob::initializeFromJson`` + + - Use from the C API with QXXXQ + +- The ``QPDFJob`` C++ API + +If you can understand how to use the :command:`qpdf` CLI, you can +understand the ``QPDFJob`` class and the json file. qpdf guarantees +that all of the above methods are in sync. Here's how it works: + +.. list-table:: QPDFJob Interfaces + :widths: 30 30 30 + :header-rows: 1 + + - - CLI + - JSON + - C++ + + - - ``--some-option`` + - ``"someOption": ""`` + - ``config()->someOption()`` + + - - ``--some-option=value`` + - ``"someOption": "value"`` + - ``config()->someOption("value")`` + + - - positional argument + - ``"otherOption": "value"`` + - ``config()->otherOption("value")`` + +In the JSON file, the JSON structure is an object (dictionary) whose +keys are command-line flags converted to camelCase. Positional +arguments have some corresponding key, which you can find by running +``qpdf`` with the :qpdf:ref:`--job-json-help` flag. For example, input +and output files are named by positional arguments on the CLI. In the +JSON, they are ``"inputFile"`` and ``"outputFile"``. The following are +equivalent: + +.. It would be nice to have an automated test that these are all the + same, but we have so few live examples that it's not worth it for + now. + +CLI: + :: + + qpdf infile.pdf outfile.pdf \ + --pages . other.pdf --password=x 1-5 -- \ + --encrypt user owner 256 --print=low -- \ + --object-streams=generate + +Job JSON: + .. code-block:: json + + { + "inputFile": "infile.pdf", + "outputFile": "outfile.pdf", + "pages": [ + { + "file": "." 
+ }, + { + "file": "other.pdf", + "password": "x", + "range": "1-5" + } + ], + "encrypt": { + "userPassword": "user", + "ownerPassword": "owner", + "256bit": { + "print": "low" + } + }, + "objectStreams": "generate" + } + +C++ code: + .. code-block:: c++ + + #include <qpdf/QPDFJob.hh> + #include <qpdf/QPDFUsage.hh> + #include <iostream> + + int main(int argc, char* argv[]) + { + try + { + QPDFJob j; + j.config() + ->inputFile("infile.pdf") + ->outputFile("outfile.pdf") + ->pages() + ->pageSpec(".", "1-z") + ->pageSpec("other.pdf", "1-5", "x") + ->endPages() + ->encrypt(256, "user", "owner") + ->print("low") + ->endEncrypt() + ->objectStreams("generate") + ->checkConfiguration(); + j.run(); + } + catch (QPDFUsage& e) + { + std::cerr << "configuration error: " << e.what() << std::endl; + return 2; + } + catch (std::exception& e) + { + std::cerr << "other error: " << e.what() << std::endl; + return 2; + } + return 0; + } + +It is also possible to mix and match command-line options and json +from the CLI. For example, you could create a file called +:file:`my-options.json` containing the following: + +.. code-block:: json + + { + "encrypt": { + "userPassword": "", + "ownerPassword": "owner", + "256bit": { + } + }, + "objectStreams": "generate" + } + +and use it with other options to create 256-bit encrypted (but +unrestricted) files with object streams while specifying other +parameters on the command line, such as + +:: + + qpdf infile.pdf outfile.pdf --job-json-file=my-options.json + +See also :file:`examples/qpdf-job.cc` in the source distribution as +well as comments in ``QPDFJob.hh``. + + +.. _qpdfjob-design: + +QPDFJob Design +-------------- + +This section describes some of the design rationale and history behind +``QPDFJob``.
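The interface table earlier in this chapter states that job JSON keys are command-line flags converted to camelCase. The short C++ sketch below makes that rule concrete. ``flagToJsonKey`` and ``checkFlagToJsonKey`` are hypothetical helpers written only for this illustration. qpdf does not provide them, and they are not how qpdf derives its keys. Positional arguments (``"inputFile"``, ``"outputFile"``) and some nested keys (such as ``"256bit"`` under ``"encrypt"``) do not follow this simple rule, so the output of ``--job-json-help`` remains the authoritative reference.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of qpdf: turn a long option such as
// "--object-streams" into the camelCase job JSON key "objectStreams".
// Leading dashes are dropped, and each character that follows an
// interior '-' is upper-cased.
std::string flagToJsonKey(std::string const& flag)
{
    std::string::size_type i = 0;
    while (i < flag.size() && flag[i] == '-') {
        ++i; // skip the leading "--"
    }
    std::string key;
    bool upper_next = false;
    for (; i < flag.size(); ++i) {
        char c = flag[i];
        if (c == '-') {
            upper_next = true; // drop the dash and capitalize what follows
        } else if (upper_next) {
            key += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            upper_next = false;
        } else {
            key += c;
        }
    }
    return key;
}

// Spot-check the helper against rows of the interface table.
void checkFlagToJsonKey()
{
    assert(flagToJsonKey("--some-option") == "someOption");
    assert(flagToJsonKey("--object-streams") == "objectStreams");
}
```

Because the sketch handles only the flag spelling, any ``=value`` part of an option belongs in the JSON value, for example ``"objectStreams": "generate"``.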
+ +Documentation of ``QPDFJob`` is divided among three places: + +- "HOW TO ADD A COMMAND-LINE ARGUMENT" in :file:`README-maintainer` + provides a quick reminder for how to add a command-line argument + +- The source file :file:`generate_auto_job` has a detailed explanation + about how ``QPDFJob`` and ``generate_auto_job`` work together + +- This chapter of the manual has other details. + +Prior to qpdf version 10.6.0, the qpdf CLI executable had a lot of +functionality built into the executable that was not callable from the +library as such. This created a number of problems: + +- Some of the logic in :file:`qpdf.cc` was pretty complex, such as + image optimization, generating json output, and many of the page + manipulations. While those things could all be coded using the C++ + API, there would be a lot of duplicated code. + +- Page splitting and merging will get more complicated over time as + qpdf supports a wider range of document-level options. It would be + nice to be able to expose this to library users instead of baking it + all into the CLI. + +- Users of other languages who just wanted an interface to do things + that the CLI could do didn't have a good way to do it, such as just + handing a library call a set of command-line options or an + equivalent JSON object that could be passed in as a string. + +- The qpdf CLI itself was almost 8,000 lines of code. It needed to be + refactored, cleaned up, and split. + +- Exposing a new feature via the command-line required making lots of + small edits to lots of small bits of code, and it was easy to forget + something. Adding a code generator, while complex in some ways, + greatly reduces the chances of error when extending qpdf. + +Here are a few notes on some design decisions about ``QPDFJob`` and its +various interfaces. + +- Bare command-line options (flags with no parameter) map to config + functions that take no arguments and to json keys whose values are + required to be the empty string.
The rationale is that we can later + change these bare options to options that take an optional parameter + without breaking backward compatibility in the CLI or the JSON. + Options that take optional parameters generate two config functions: + one that has no arguments and one that has a ``char const*`` argument. + This means that adding an optional parameter to a previously bare + option also doesn't break binary compatibility. + +- Adding a new argument to :file:`job.yml` automatically triggers + almost everything by declaring and referencing things that you have + to implement. This way, once you get the code to compile and link, + you know you haven't forgotten anything. There are two tricky cases: + + - If an argument handler has to do something special, like call a + nested config method or select an option table, you have to + implement it manually. This is discussed in + :file:`generate_auto_job`. + + - When you add an option that has optional parameters or choices, + both of the config functions described above are declared, but only the + one that takes an argument is referenced. You have to remember to + implement the one that doesn't take an argument or else people + will get a linker error if they try to call it. The assumption is + that things with optional parameters started out as bare, so the + argument-less version is already there. + +- If you have to add a new option that requires its own option table, + you will have to do some extra work, including adding a new nested + Config class, adding a config member variable to ``ArgParser`` in + :file:`QPDFJob_argv.cc` and ``Handlers`` in :file:`QPDFJob_json.cc`, + and making sure that manually implemented handlers are consistent with + each other. In these cases, it is best to add explicit test cases for + all the various ways to get to the option.
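The config-function rules described in the design notes can be sketched in a few self-contained C++ classes. A bare flag gets one no-argument method. An option with an optional parameter gets both a no-argument overload and a ``char const*`` overload. A nested option table gets its own config object with an ``end...`` method. Everything below is hypothetical and heavily simplified. ``Config``, ``EncryptConfig``, ``otherFlag``, ``someOption``, ``demoConfigChain``, and the member variables are illustrative names modeled on the examples in this chapter. They are not the real classes declared in ``QPDFJob.hh``.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified sketch of the fluent config pattern. These
// are NOT the Config classes declared in QPDFJob.hh.
class Config;

// Nested config for an option that has its own option table.
class EncryptConfig
{
  public:
    explicit EncryptConfig(Config* p) : parent(p) {}

    EncryptConfig* print(char const* parameter)
    {
        print_level = parameter;
        return this;
    }

    // Return to the enclosing Config so the chain can continue.
    Config* endEncrypt() { return parent; }

    std::string print_level;

  private:
    Config* parent;
};

class Config
{
  public:
    // Bare flag: exactly one method, taking no arguments.
    Config* otherFlag()
    {
        other_flag = true;
        return this;
    }

    // Option with an optional parameter: two overloads. If this option
    // started out bare, code calling someOption() keeps compiling and
    // linking after the char const* overload is added.
    Config* someOption()
    {
        some_option_given = true;
        return this;
    }

    Config* someOption(char const* parameter)
    {
        some_option_given = true;
        some_option_value = parameter;
        return this;
    }

    // Entering a nested option table hands back the nested config.
    EncryptConfig* encrypt()
    {
        encrypt_config.reset(new EncryptConfig(this));
        return encrypt_config.get();
    }

    bool other_flag = false;
    bool some_option_given = false;
    std::string some_option_value;
    std::unique_ptr<EncryptConfig> encrypt_config;
};

// Usage: one chained expression that ends back at the top-level Config.
void demoConfigChain()
{
    Config c;
    Config* back = c.otherFlag()
                       ->someOption("value")
                       ->encrypt()
                       ->print("low")
                       ->endEncrypt();
    assert(back == &c);
    assert(c.other_flag && c.some_option_given);
    assert(c.some_option_value == "value");
    assert(c.encrypt_config->print_level == "low");
}
```

Every method returns a pointer to the object that should receive the next call, so the chain in ``demoConfigChain`` ends back at the top-level ``Config``. This sketch shows one way that style can let the nested ``encrypt()`` ... ``endEncrypt()`` calls in the C++ example earlier in this chapter read as a single expression. It does not describe how the real classes are implemented.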
diff --git a/manual/release-notes.rst b/manual/release-notes.rst index 8c2af683..6b5b85f4 100644 --- a/manual/release-notes.rst +++ b/manual/release-notes.rst @@ -2303,9 +2303,9 @@ For a detailed list of changes, please see the file been added to the :command:`qpdf` command-line tool. See :ref:`page-selection`. - - Options have been added to the :command:`qpdf` - command-line tool for copying encryption parameters from another - file. (QXXXQ Link) + - The :qpdf:ref:`--copy-encryption` option has been added to the + :command:`qpdf` command-line tool for copying encryption + parameters from another file. - New methods have been added to the ``QPDF`` object for adding and removing pages. See :ref:`adding-and-remove-pages`.