mirror of https://github.com/qpdf/qpdf.git
synced 2025-02-13 00:58:28 +00:00
TODO: more JSON notes
parent 3c4d2bfb21
commit 7882b85b06
TODO | 112
@@ -39,6 +39,108 @@ Soon: Break ground on "Document-level work"
 Output JSON v2
 ==============
 
+----
+notes from 5/2:
+
+Need new pipelines:
+* Pl_OStream(std::ostream) with semantics like Pl_StdioFile
+* Pl_String to std::string with semantics like Pl_Buffer
+* Pl_Base64
+
+New Pipeline methods:
+* writeString(std::string const&)
+* writeCString(char*)
+* writeChars(char*, size_t)
+
+* Consider templated operator<< which could specialize for char* and
+  std::string and could use std::ostringstream otherwise
+
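A minimal sketch of how the proposed write helpers and a templated operator<< could fit together. `Sink` and `StringSink` are hypothetical stand-ins invented for this example, not qpdf's actual Pipeline classes, and the helpers take `char const*` rather than the `char*` written in the notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical stand-in for a pipeline stage; not qpdf's Pipeline class.
class Sink
{
  public:
    virtual ~Sink() = default;
    // The single primitive each stage implements.
    virtual void write(char const* data, std::size_t len) = 0;

    // Convenience helpers layered on the primitive.
    void writeChars(char const* data, std::size_t len) { write(data, len); }
    void writeCString(char const* cstr) { write(cstr, std::strlen(cstr)); }
    void writeString(std::string const& str) { write(str.data(), str.size()); }
};

// Collects everything written to it in a std::string (the Pl_String idea).
class StringSink: public Sink
{
  public:
    void write(char const* data, std::size_t len) override { buf.append(data, len); }
    std::string const& str() const { return buf; }

  private:
    std::string buf;
};

// Cheap cases: strings go straight through without a temporary stream.
inline Sink& operator<<(Sink& s, std::string const& str)
{
    s.writeString(str);
    return s;
}

inline Sink& operator<<(Sink& s, char const* cstr)
{
    s.writeCString(cstr);
    return s;
}

// Everything else is formatted through std::ostringstream first.
template <typename T>
Sink& operator<<(Sink& s, T const& value)
{
    std::ostringstream os;
    os << value;
    s.writeString(os.str());
    return s;
}
```

With these definitions, `out << "obj " << 7` on a `StringSink` picks the `char const*` overload for the literal and falls back to the template for the integer.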
+See if I can change all output and error messages issued by the
+library, when context is available, to have a pipeline rather than a
+FILE* or std::ostream. This makes it possible for people to capture
+output more flexibly.
+
+JSON: rather than unparse() -> string, there should be a write method
+that takes a pipeline and a depth. Then rewrite all the unparse
+methods to use it. This makes incremental write possible as well as
+writing arbitrarily large amounts of output.
+
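One way to picture the write-with-depth idea, as a sketch only: each node writes itself to a sink at a given nesting depth, and unparse() becomes a thin wrapper over write(). The `Sink`, `Node`, `Scalar`, and `Array` types are hypothetical and much simpler than qpdf's JSON class; the two-space indent is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink standing in for a pipeline.
struct Sink
{
    virtual ~Sink() = default;
    virtual void write(std::string const& s) = 0;
};

struct StringSink: Sink
{
    std::string buf;
    void write(std::string const& s) override { buf += s; }
};

struct Node
{
    virtual ~Node() = default;
    // depth is the nesting level of this node; children use depth + 1.
    virtual void write(Sink& out, std::size_t depth) const = 0;
    // unparse() no longer builds the string itself; it delegates to write().
    std::string unparse() const
    {
        StringSink s;
        write(s, 0);
        return s.buf;
    }
};

struct Scalar: Node
{
    std::string text; // already-encoded JSON literal, e.g. 42 or "abc"
    explicit Scalar(std::string t): text(std::move(t)) {}
    void write(Sink& out, std::size_t) const override { out.write(text); }
};

struct Array: Node
{
    std::vector<std::unique_ptr<Node>> items;
    void write(Sink& out, std::size_t depth) const override
    {
        if (items.empty()) {
            out.write("[]");
            return;
        }
        out.write("[");
        bool first = true;
        for (auto const& item: items) {
            out.write(first ? "\n" : ",\n");
            first = false;
            out.write(std::string(2 * (depth + 1), ' '));
            item->write(out, depth + 1);
        }
        out.write("\n" + std::string(2 * depth, ' ') + "]");
    }
};
```

Because nothing is accumulated inside the nodes, the same write() call could target a file-backed sink and emit output of any size.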
+JSON::parse should work from an InputSource. BufferInputSource can
+already start with a std::string.
+
+Have a json blob defined by a function that takes a pipeline and
+writes data to the pipeline. Its writer should create a Pl_Base64 ->
+Pl_Concatenate in front of the pipeline passed to write and call the
+function with that.
+
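The blob idea could look roughly like the sketch below, under several assumptions: `Sink`, `Base64Encoder`, `BlobProvider`, and `writeBlob` are names made up for this example, and the encoder's finish() intentionally does not finish the downstream sink, which is the job the notes give to Pl_Concatenate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical sink standing in for a pipeline.
struct Sink
{
    virtual ~Sink() = default;
    virtual void write(char const* data, std::size_t len) = 0;
};

struct StringSink: Sink
{
    std::string buf;
    void write(char const* d, std::size_t n) override { buf.append(d, n); }
};

// Base64-encodes whatever is written and forwards the result to `next`.
// finish() flushes padding only; it never finishes `next`, so the caller
// can keep writing to the downstream sink afterwards.
class Base64Encoder: public Sink
{
  public:
    explicit Base64Encoder(Sink& next): next(next) {}

    void write(char const* data, std::size_t len) override
    {
        for (std::size_t i = 0; i < len; ++i) {
            pending[count++] = static_cast<unsigned char>(data[i]);
            if (count == 3) {
                flush();
            }
        }
    }

    void finish()
    {
        if (count > 0) {
            flush();
        }
    }

  private:
    // Encode the 1 to 3 pending bytes as one 4-character group.
    void flush()
    {
        static char const* const tbl =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        unsigned long v = 0;
        for (std::size_t i = 0; i < 3; ++i) {
            v = (v << 8) | (i < count ? pending[i] : 0u);
        }
        char out[4] = {
            tbl[(v >> 18) & 0x3f],
            tbl[(v >> 12) & 0x3f],
            count > 1 ? tbl[(v >> 6) & 0x3f] : '=',
            count > 2 ? tbl[v & 0x3f] : '='};
        next.write(out, 4);
        count = 0;
    }

    Sink& next;
    unsigned char pending[3] = {0, 0, 0};
    std::size_t count = 0;
};

// A blob is defined by a function that writes raw bytes to a sink.
using BlobProvider = std::function<void(Sink&)>;

// Emit the blob as a JSON string containing base64 text.
void writeBlob(Sink& out, BlobProvider const& provider)
{
    out.write("\"", 1);
    Base64Encoder enc(out);
    provider(enc);
    enc.finish();
    out.write("\"", 1);
}
```

The provider never sees the JSON quoting or the encoding, so stream data could be pushed through in chunks without being held in memory.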
+Add methods needed to do incremental writes. Basically we need to
+expose the functionality of the array and dictionary unparse methods.
+Maybe we can have a DictionaryWriter and an ArrayWriter that deal with
+the first/depth logic and have writeElement or writeEntry(key, value)
+methods.
+
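A possible shape for the DictionaryWriter mentioned above, as a sketch under stated assumptions: the class interface, the `StringSink` type, and the two-space indent are all invented here, keys are assumed to need no escaping, and values are passed as already-encoded JSON text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sink standing in for a pipeline.
struct StringSink
{
    std::string buf;
    void write(std::string const& s) { buf += s; }
};

// Writes a JSON dictionary one entry at a time, tracking the "first"
// flag and indentation depth so callers do not have to.
class DictionaryWriter
{
  public:
    DictionaryWriter(StringSink& out, std::size_t depth): out(out), depth(depth)
    {
        out.write("{");
    }

    // value must already be valid JSON text; key is assumed to need no escaping.
    void writeEntry(std::string const& key, std::string const& value)
    {
        out.write(first ? "\n" : ",\n");
        first = false;
        out.write(std::string(2 * (depth + 1), ' ') + "\"" + key + "\": " + value);
    }

    // Must be called exactly once after the last entry.
    void close()
    {
        if (!first) {
            out.write("\n" + std::string(2 * depth, ' '));
        }
        out.write("}");
    }

  private:
    StringSink& out;
    std::size_t depth;
    bool first = true;
};
```

An ArrayWriter would follow the same pattern with a writeElement(value) method in place of writeEntry.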
+For json output, do not unparse to string. Use the writers instead.
+Write incrementally. This changes ordering only, but we should be able
+to manually update the test output for those cases. Objects should be
+written in numerical order, not lexically sorted. It probably makes
+sense to put the trailer at the end since that's where it is in a
+regular PDF.
+
+When we get to full serialization, add json serialization performance
+test.
+
+Some if not all of the json output functionality for v2 should move
+into QPDF proper rather than living in QPDFJob. There can be a
+top-level QPDF method that takes a pipeline and writes the JSON
+serialization to it.
+
+Decide what the API/CLI will be for serializing to v2. Will it just be
+part of --json or will it be its own separate thing? Probably we
+should make it so that a serialized PDF is different but uses the same
+object format as regular json mode.
+
+For going back from JSON to PDF, a separate utility will be needed.
+It's not practical for QPDFObjectHandle to be able to read JSON
+because of the special handling that is required for indirect objects,
+and QPDF can't just accept JSON because the way InputSource is used is
+completely different. Instead, we will need a separate utility that has
+logic similar to what copyForeignObject does. It will go something
+like this:
+
+* Create an empty QPDF (not emptyPDF, one with no objects in it at
+  all). This works:
+
+```
+%PDF-1.3
+xref
+0 1
+0000000000 65535 f
+trailer << /Size 1 >>
+startxref
+9
+%%EOF
+```
+
+For each object:
+
+* Walk through the object detecting any indirect objects. For each one
+  that is not already known, reserve the object. We can also validate
+  but we should try to do the best we can with invalid JSON so people
+  can get good error messages.
+* Construct a QPDFObjectHandle from the JSON
+* If the object is the trailer, update the trailer
+* Else if the object doesn't exist, reserve it
+* If the object is reserved, call replaceReserved()
+* Else the object already exists; this is an error.
+
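The per-object steps above can be modelled with a toy object table. This is a sketch, not qpdf code: `ObjectTable`, `Entry`, and `State` are hypothetical, object values are plain strings, generation numbers are ignored, and the caller is assumed to have already extracted the list of referenced object IDs.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model of the object table; values are opaque strings here.
enum class State { reserved, defined };

struct Entry
{
    State state;
    std::string value;
};

class ObjectTable
{
  public:
    // Reserve an ID only if nothing is known about it yet.
    void reserveIfUnknown(int id)
    {
        objects.emplace(id, Entry{State::reserved, ""}); // no-op if id exists
    }

    // The trailer is not an indirect object; it is simply updated.
    void loadTrailer(std::string const& value) { trailer = value; }

    // Load one non-trailer object, given the IDs it refers to.
    void load(int id, std::string const& value, std::vector<int> const& refs)
    {
        for (int ref: refs) {
            reserveIfUnknown(ref); // forward references become reservations
        }
        reserveIfUnknown(id); // "else if the object doesn't exist, reserve it"
        Entry& e = objects.at(id);
        if (e.state != State::reserved) {
            // The object was already defined earlier in the input.
            throw std::runtime_error(
                "object " + std::to_string(id) + " defined more than once");
        }
        e = Entry{State::defined, value}; // analogous to replacing the reservation
    }

    bool isReserved(int id) const
    {
        auto it = objects.find(id);
        return it != objects.end() && it->second.state == State::reserved;
    }

    std::string const& getTrailer() const { return trailer; }

  private:
    std::map<int, Entry> objects;
    std::string trailer;
};
```

Anything still reserved after the last object has been loaded was referenced but never defined, which is a natural place to report a dangling reference.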
+This can almost be done through public API. I think all we need is the
+ability to create a reserved object with a specific object ID.
+
+The choices for json_key (job.yml) will be different for v1 and v2.
+That information is already duplicated in multiple places.
+
+----
+
 Remember typo: search for "Typo" In QPDFJob::doJSONEncrypt.
 
 Remember to test interaction between generators and schemas.
@@ -173,21 +275,25 @@ JSON:
 object. No dictionary merges or anything like that are performed.
 It will call replaceObject.
 
-Within .qpdf.objects, the key is "obj:o,g" or "obj:trailer", and the
+Within .qpdf.objects, the key is "obj:o g R" or "obj:trailer", and the
 value is a dictionary with exactly one of "value" or "stream" as its
 single key.
 
+Rationale of "obj:o g R" is that indirect object references are just
+"o g R", and so code that wants to resolve one can do so easily by
+just prepending "obj:" and not having to parse or split the string.
+
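To illustrate the rationale, here is a minimal lookup sketch. The `Objects` map and the `resolve` helper are hypothetical, and values are kept as raw JSON text for brevity; the point is only that a reference string becomes a key by prepending "obj:".

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified view of .qpdf.objects: key -> JSON text of the entry.
using Objects = std::map<std::string, std::string>;

// ref is an indirect reference exactly as it appears in the JSON, e.g. "1 0 R".
// Returns nullptr when no such object exists.
inline std::string const* resolve(Objects const& objects, std::string const& ref)
{
    auto it = objects.find("obj:" + ref); // no parsing or splitting required
    return it == objects.end() ? nullptr : &it->second;
}
```

With the older "obj:o,g" form, the same lookup would first have to split "1 0 R" into its object and generation numbers and reassemble them.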
 For non-streams:
 
   {
-    "obj:o,g": {
+    "obj:o g R": {
       "value": ...
     }
   }
 
 For streams:
 
-  "obj:o,g": {
+  "obj:o g R": {
     "stream": {
       "dict": { ... stream dictionary ... },
       "filterable": bool,