mirror of
https://github.com/qpdf/qpdf.git
synced 2025-01-05 08:02:11 +00:00
TODO-pages: introduce QPDFAssembler and QPDFSplitter
This commit is contained in:
parent
e52b026db4
commit
f7dd653d5f
@ -24,18 +24,15 @@ of the following properties, among others:
|
|||||||
* Contains information used by pages (named destinations)
|
* Contains information used by pages (named destinations)
|
||||||
|
|
||||||
As long as qpdf has had the ability to copy pages from one PDF to another, it has had robust
|
As long as qpdf has had the ability to copy pages from one PDF to another, it has had robust
|
||||||
handling of page-level data. Prior to the implementation of the pages epic, with the exception of
|
handling of page-level data. When qpdf creates a new PDF file from existing PDF files, it starts
|
||||||
page labels and form fields, qpdf has ignored document-level data during page copy operations.
|
with a specific PDF, known as the _primary input_. The primary input may be a file or the built-in
|
||||||
Specifically, when qpdf creates a new PDF file from existing PDF files, it always starts with a
|
_empty PDF_. Prior to the implementation of the pages epic, qpdf has ignored document-level data
|
||||||
specific PDF, known as the _primary input_. The primary input may be a file or the built-in _empty
|
(except for page labels and interactive form fields) when merging and splitting files. Any
|
||||||
PDF_. With the exception of page labels and form fields, document-level constructs that appear in
|
document-level data in the primary input was preserved, and any document-level data other than form
|
||||||
the primary input are preserved, and document-level constructs from the other PDF files are ignored.
|
fields and page labels was discarded from the other files. After this work is complete, qpdf will
|
||||||
With page labels, qpdf always ensures that any given page has the same label in the final output as
|
handle other document-level data in a manner that preserves the functionality of all pages in the
|
||||||
it had in whichever input file it originated from, which is usually (but not always) the desired
|
final PDF. Here are several examples of problems in qpdf prior to the implementation of the pages
|
||||||
behavior. With form fields, qpdf has awareness and ensures that all form fields remain operational.
|
epic:
|
||||||
The goal is to extend this document-level-awareness to other document-level constructs.
|
|
||||||
|
|
||||||
Here are several examples of problems in qpdf prior to the implementation of the pages epic:
|
|
||||||
* If two files with optional content (layers) are merged, all layers in all but the primary input
|
* If two files with optional content (layers) are merged, all layers in all but the primary input
|
||||||
will be visible in the combined file.
|
will be visible in the combined file.
|
||||||
* If two files with file attachments are merged, attachments will be retained on the primary input
|
* If two files with file attachments are merged, attachments will be retained on the primary input
|
||||||
@ -46,9 +43,10 @@ Here are several examples of problems in qpdf prior to the implementation of the
|
|||||||
entirety, including outlines that point to pages that are no longer there, and outlines will be
|
entirety, including outlines that point to pages that are no longer there, and outlines will be
|
||||||
lost from all files except the primary input.
|
lost from all files except the primary input.
|
||||||
|
|
||||||
With the above limitations, qpdf allows combining pages from arbitrary numbers of input PDFs to
|
Regarding page assembly, prior to the pages epic, qpdf allows combining pages from arbitrary numbers
|
||||||
create an output PDF, or in the case of page splitting, multiple output PDFs. The API allows
|
of input PDFs to create an output PDF, or in the case of page splitting, multiple output PDFs. The
|
||||||
arbitrary combinations of input and output files. The command-line allows only the following:
|
API allows arbitrary combinations of input and output files. The command-line allows only the
|
||||||
|
following:
|
||||||
* Merge: creation of a single output file from a primary input and any number of other inputs by
|
* Merge: creation of a single output file from a primary input and any number of other inputs by
|
||||||
selecting pages by index from the beginning or end of the file
|
selecting pages by index from the beginning or end of the file
|
||||||
* Split: creation of multiple output files from a single input or the result of a merge into files
|
* Split: creation of multiple output files from a single input or the result of a merge into files
|
||||||
@ -79,10 +77,13 @@ Here are some examples of things that will become possible:
|
|||||||
The rest of this document describes the details of what how these features will work and what needs
|
The rest of this document describes the details of what how these features will work and what needs
|
||||||
to be done to make them possible to build.
|
to be done to make them possible to build.
|
||||||
|
|
||||||
# Architectural Thoughts
|
# Architecture
|
||||||
|
|
||||||
Open question: if I do all the complex logic in `QPDFJob`, what are the implications for pikepdf or
|
Create a `QPDFAssembler` class to handle merging and a `QPDFSplitter` to handle splitting. The
|
||||||
other wrappers? This will need to be discussed in the discussion ticket.
|
complex assembly logic can be handled by `QPDFAssembler`. `QPDFSplitter` can invoke `QPDFAssembler`
|
||||||
|
with a previous `QPDFAssembler`'s output (or any `QPDF`) multiple times to create the split files.
|
||||||
|
This will mostly involve moving code from `QPDFJob` to `QPDFAssembler` and `QPDFSplitter` and having
|
||||||
|
`QPDFJob` invoke them.
|
||||||
|
|
||||||
Prior to implementation of the pages epic, `QPDFJob` goes through the following stages:
|
Prior to implementation of the pages epic, `QPDFJob` goes through the following stages:
|
||||||
|
|
||||||
@ -123,8 +124,16 @@ Prior to implementation of the pages epic, `QPDFJob` goes through the following
|
|||||||
* Preserve form fields and page labels
|
* Preserve form fields and page labels
|
||||||
|
|
||||||
Broadly, the above has to be modified in the following ways:
|
Broadly, the above has to be modified in the following ways:
|
||||||
* From the C++ API, make it possible to use an arbitrary QPDF as an input rather than having to
|
* The transformations step has to be pulled out as that wil stay in `QPDFJob`.
|
||||||
start with a file. That makes it possible to do arbitrary work on the PDF prior to submitting it.
|
* Most of write QPDF will stay in `QPDFJob`, but the split logic will move to `QPDFSplitter`.
|
||||||
|
* The entire create QPDF logic will move into `QPDFAssembler`.
|
||||||
|
* `QPDFAssembler`'s API will allow using an arbitrary QPDF as an input rather than having to start
|
||||||
|
with a file. That makes it possible to do arbitrary work on the PDF prior to passing it to
|
||||||
|
`QPDFAssembler`.
|
||||||
|
* `QPDFAssembler` and `QPDFSplitter` may need a C API, or perhaps C users will have to work through
|
||||||
|
`QPDFJob`, which will expose nearly all of the functionality.
|
||||||
|
|
||||||
|
Within `QPDFAssembler`, we will extend the create QPDF logic in the following ways:
|
||||||
* Allow creation of blank pages as an additional input source
|
* Allow creation of blank pages as an additional input source
|
||||||
* Generalize underlay/overlay
|
* Generalize underlay/overlay
|
||||||
* Enable controlling placement
|
* Enable controlling placement
|
||||||
@ -132,17 +141,32 @@ Broadly, the above has to be modified in the following ways:
|
|||||||
* Add additional reordering options
|
* Add additional reordering options
|
||||||
* We don't need to provide hooks for this. If someone is going to code a hook, they can just
|
* We don't need to provide hooks for this. If someone is going to code a hook, they can just
|
||||||
compute the page ordering directly.
|
compute the page ordering directly.
|
||||||
* Have a page composition phase after the overlay/underlay stage
|
* Have a page composition stage after the overlay/underlay stage
|
||||||
* Allow n-up, left-to-right (can reverse page order to get rtl), top-to-bottom, or modular
|
* Allow n-up, left-to-right (can reverse page order to get rtl), top-to-bottom, or modular
|
||||||
composition like pstops
|
composition like pstops
|
||||||
* Add additional ways to select pages besides range (e.g. based on outlines)
|
* Add additional ways to select pages besides range (e.g. based on outlines)
|
||||||
* Add additional ways to specify boundaries for splitting
|
|
||||||
* Enhance existing logic to handle other document-level structures, preferably in a way that
|
* Enhance existing logic to handle other document-level structures, preferably in a way that
|
||||||
requires less duplication between split and merge.
|
requires less duplication between split and merge.
|
||||||
* We don't need to turn on and off most types of document constructs individually. People can
|
* We don't need to turn on and off most types of document constructs individually. People can
|
||||||
preprocess using the API or qpdf JSON if they want fine-grained control.
|
preprocess using the API or qpdf JSON if they want fine-grained control.
|
||||||
* For things like attachments and outlines, we can add additional flags.
|
* For things like attachments and outlines, we can add additional flags.
|
||||||
|
|
||||||
|
Within `QPDFSplitter`, we will add additional ways to specify boundaries for splitting.
|
||||||
|
|
||||||
|
We must take care with the implementations and APIs for `QPDFSplitter`, `QPDFAssembler`, and
|
||||||
|
`QPDFJob` to avoid excessive duplication. Perhaps `QPDFJob` can create and configure a
|
||||||
|
`QPDFAssembler` and `QPDFSplitter` on the fly to avoid too much duplication of state.
|
||||||
|
|
||||||
|
Much of the logic will actually reside in other helper classes. For example, `QPDFAssembler` will
|
||||||
|
probably not operate with numeric ranges, leaving that to `QPDFJob` and `QUtil` but will instead
|
||||||
|
have vectors of page numbers. The logic for creating page groups from outlines, threads, or
|
||||||
|
structure will most likely live in the document helpers for those bits of functionality. This keeps
|
||||||
|
needless clutter out of `QPDFAssembler` and also makes it possible for people to perform their own
|
||||||
|
subset of functionality by calling lower-level interfaces. The main power of `QPDFAssembler` will be
|
||||||
|
to manage sequencing and destination tracking as well as to provide a future-proof API that will
|
||||||
|
allow developers to automatically benefit from additional document-level support as it is added to
|
||||||
|
qpdf.
|
||||||
|
|
||||||
## Flexible Assembly
|
## Flexible Assembly
|
||||||
|
|
||||||
This section discusses modifications to the command-line syntax to make it easier to add flexibility
|
This section discusses modifications to the command-line syntax to make it easier to add flexibility
|
||||||
@ -189,10 +213,10 @@ are handled, specify placement options, etc. Given the above framework, it would
|
|||||||
additional features incrementally, without breaking compatibility, such as selecting or splitting
|
additional features incrementally, without breaking compatibility, such as selecting or splitting
|
||||||
pages based on tags, article threads, or outlines.
|
pages based on tags, article threads, or outlines.
|
||||||
|
|
||||||
It's tempting to allow assemblies to be nested, but this gets very complicated. From the C++ API, we
|
It's tempting to allow assemblies to be nested, but this gets very complicated. From the C++ API,
|
||||||
could modify QPDFJob to allow the use any QPDF as an input, but supporting this from the CLI is hard
|
there is no problem using the output of one `QPDFAssembler` as the input to another, but supporting
|
||||||
because of the way JSON/arg parsing is set up. If people need to do that, they can just create
|
this from the CLI is hard because of the way JSON/arg parsing is set up. If people need to do that,
|
||||||
intermediate files.
|
they can just create intermediate files.
|
||||||
|
|
||||||
Proposed CLI enhancements:
|
Proposed CLI enhancements:
|
||||||
|
|
||||||
@ -424,7 +448,8 @@ Last checked: 2023-12-29
|
|||||||
gh search issues label:pages --repo qpdf/qpdf --limit 200 --state=open
|
gh search issues label:pages --repo qpdf/qpdf --limit 200 --state=open
|
||||||
```
|
```
|
||||||
|
|
||||||
* Allow an existing `QPDF` to be an input to a merge operation when using the QPDFJob C++ API
|
* Allow an existing `QPDF` to be an input to a merge or underly/overlay operation when using the
|
||||||
|
`QPDFAssembler` C++ API
|
||||||
* Issues: none
|
* Issues: none
|
||||||
* Generate a mapping from source to destination for all destinations
|
* Generate a mapping from source to destination for all destinations
|
||||||
* Issues: #1077
|
* Issues: #1077
|
||||||
|
Loading…
Reference in New Issue
Block a user