Mirror of https://github.com/qpdf/qpdf.git, synced 2024-09-28 04:59:05 +00:00
Update documentation to clarify some limitations of qpdf JSON
Commit f95e0549cc (parent ed04b80caf)
TODO (19 changed lines)
@@ -11,8 +11,6 @@ Next
 Before Release:

 * Stay on top of https://github.com/pikepdf/pikepdf/pull/315
-* Consider whether otherwise unreferenced object streams should be
-  included in json output. Probably not. Or maybe optionally.
 * Support json v2 in the C API. At a minimum, write_json,
   create_from_json, and update_from_json need to be there and should
   take the same kinds of functions as the C API for logger.
@@ -56,6 +54,20 @@ direct objects, which are always "resolved" in QPDFObjectHandle.
 Possible future JSON enhancements
 =================================

+* Consider not including unreferenced objects and trimming the trailer
+  in the same way that QPDFWriter does (except don't remove `/ID`).
+  This means excluding the linearization dictionary and hint stream,
+  the encryption dictionary, all keys from the trailer that are removed by
+  QPDFWriter::getTrimmedTrailer except `/ID`, any object streams, and
+  the xref stream as long as all those objects are unreferenced. (They
+  always should be, but there could be some bizarre case of someone
+  creating a PDF file that has an indirect reference to one of those,
+  in which case we need to preserve it.) If this is done, make
+  `--preserve-unreferenced` preserve unreferenced objects and also
+  those extra keys. Search for "linear" and "trailer" in json.rst to
+  update the various places in the documentation that discuss this.
+  Also update the help for --json and --preserve-unreferenced.
+
 * Add to JSON output the information available from a few additional
   informational options:

@@ -376,7 +388,8 @@ I find it useful to make reference to them in this list.
   convertible back to a valid PDF. Since providing the password may
   reveal additional details, --show-encryption could potentially retry
   with this option if the first time doesn't work. Then, with the file
-  open, we can read the encryption dictionary normally.
+  open, we can read the encryption dictionary normally. If this is
+  done, search for "raw, encrypted" in json.rst.

 * In libtests, separate executables that need the object library
   from those that strictly use public API. Move as many of the test

@@ -52,6 +52,22 @@ changes a handful of defaults so that the resulting JSON is as close
 as possible to the original input and is ready for being converted
 back to PDF.

+The qpdf JSON data includes unreferenced objects. This may be
+addressed in a future version of qpdf. For now, that means that
+certain objects that are not useful in the JSON representation are
+included. This includes linearization and encryption dictionaries,
+linearization hint streams, object streams, and the cross-reference
+(xref) stream associated with the trailer dictionary where applicable.
+For the best experience with qpdf JSON, you can run the file through
+qpdf first to remove encryption, linearization, and object streams.
+For example:
+
+::
+
+   qpdf --decrypt --object-streams=disable in.pdf out.pdf
+   qpdf --json-output out.pdf out.json
+
+
 .. _json-terminology:

 JSON Terminology
@@ -299,10 +315,46 @@ Object Values
 Note that writing JSON output is done by ``QPDF``, not ``QPDFWriter``.
 As such, none of the things ``QPDFWriter`` does apply. This includes
 recompression of streams, renumbering of objects, removal of
-unreferenced objects, anything to do with object streams (which are
-not represented by qpdf JSON at all since they are PDF syntax, not
-semantics), encryption, decryption, linearization, QDF mode, etc. See
-:ref:`rewriting` for a more in-depth discussion.
+unreferenced objects, encryption, decryption, linearization, QDF
+mode, etc. See :ref:`rewriting` for a more in-depth discussion. This
+has a few noteworthy implications:
+
+- Decryption is handled transparently by qpdf. As there are no QPDF
+  APIs, even internal to the library, that allow retrieval of
+  encrypted data in its raw, encrypted form, qpdf JSON always includes
+  decrypted data. It is possible that a future version of qpdf may
+  allow access to raw, encrypted string and stream data.
+
+- Objects that are related to a PDF file's structure, rather than its
+  content, are included in the JSON output, even though they are not
+  particularly useful. In a future version of qpdf, this may be fixed,
+  and the :qpdf:ref:`--preserve-unreferenced` flag may be able to be
+  used to get the existing behavior. For now, to avoid this, run the
+  file through ``qpdf --decrypt --object-streams=disable in.pdf
+  out.pdf`` to generate a new PDF file that contains no unreferenced
+  or structural objects.
+
+- Linearized PDF files include a linearization dictionary which is not
+  referenced from any other object and which references the
+  linearization hint stream by offset. The JSON from a linearized PDF
+  file contains both of these objects, even though they are not useful
+  in the JSON. Offset information is not represented in the JSON, so
+  there's no way to find the linearization hint stream from the
+  JSON. If a new PDF is created from JSON that was written, the
+  objects will be read back in but will just be unreferenced objects
+  that will be ignored by ``QPDFWriter`` when the file is rewritten.
+
+- The JSON from a file with object streams will include the original
+  object stream and will also include all the objects in the stream
+  as top-level objects.
+
+- In files with object streams, the trailer "dictionary" is a
+  stream. In qpdf JSON files, the ``"trailer"`` key will contain a
+  dictionary with all the keys in it relating to the stream, and the
+  stream will also appear as an unreferenced object.
+
+- Encrypted files are decrypted, but the encryption dictionary still
+  appears in the JSON output.

 .. _json.example:

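The new json.rst text explains that qpdf JSON output still contains structural objects. These are linearization and encryption dictionaries, object streams, and the xref stream. To see which of these a particular JSON file contains, here is a minimal Python sketch.

The sketch assumes the qpdf JSON v2 layout written by `--json-output`:

- a top-level `"qpdf"` array whose second element maps keys such as `"obj:4 0 R"` and `"trailer"` to objects;
- stream dictionaries stored under `"stream"`/`"dict"` and other values under `"value"`;
- names written as `"/Name"` and indirect references as strings like `"5 0 R"`.

`structural_objects` and `find_structural.py` are hypothetical names, not part of qpdf. As the text notes, the JSON carries no offsets, so the sketch cannot locate linearization hint streams.

```python
import json
import sys
from typing import Dict

# Assumption: /Type values that mark PDF-syntax objects rather than content.
STRUCTURAL_TYPES = {"/ObjStm": "object stream", "/XRef": "xref stream"}


def structural_objects(doc: dict) -> Dict[str, str]:
    """Hypothetical helper: map object keys in qpdf JSON v2 output to the
    kind of structural object each one appears to be."""
    objects = doc["qpdf"][1]  # assumed layout: [header, objects]
    found: Dict[str, str] = {}

    # The encryption dictionary is reached through the trailer's /Encrypt
    # entry when that entry is an indirect reference such as "5 0 R".
    trailer = objects.get("trailer", {}).get("value", {})
    encrypt = trailer.get("/Encrypt")
    if isinstance(encrypt, str) and "obj:" + encrypt in objects:
        found["obj:" + encrypt] = "encryption dictionary"

    for key, obj in objects.items():
        if key == "trailer":
            continue  # the trailer entry was handled above
        # Streams keep their dictionary under "stream" -> "dict";
        # non-stream objects keep their value under "value".
        if "stream" in obj:
            d = obj["stream"].get("dict", {})
        else:
            d = obj.get("value")
        if not isinstance(d, dict):
            continue
        obj_type = d.get("/Type")
        if "/Linearized" in d:
            found[key] = "linearization dictionary"
        elif isinstance(obj_type, str) and obj_type in STRUCTURAL_TYPES:
            found[key] = STRUCTURAL_TYPES[obj_type]
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_structural.py out.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for key, kind in sorted(structural_objects(json.load(f)).items()):
            print(f"{key}: {kind}")
```

Consider JSON produced from a copy made with `qpdf --decrypt --object-streams=disable`. There we would expect the helper to report nothing. `QPDFWriter` drops unreferenced objects, and it writes neither object streams nor linearization unless asked. This is an expectation, not something the source states.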