mirror of https://github.com/qpdf/qpdf.git synced 2025-01-03 15:17:29 +00:00

TODO: Move lexical stuff and add detail

Jay Berkenbilt 2019-12-30 09:17:05 -05:00
parent 0ae19c375e
commit 49f4600dd6

TODO

@@ -59,29 +59,6 @@ C++-11
  time.
Lexical
=======
* Make it possible to run the lexer (tokenizer) over a whole file
such that the following things would be possible:
* Rewrite fix-qdf in C++ so that there is no longer a runtime perl
dependency
* Make it possible to replace all strings in a file lexically even
on badly broken files. Ideally this should work with files that are
lacking xref, have broken links, etc., and ideally it should work
with encrypted files if possible. This should go through the
streams and strings and replace them with fixed or random
characters, preferably, but not necessarily, in a manner that
works with fonts. One possibility would be to detect whether a
string contains characters with normal encoding, and if so, use
0x41. If the string uses character maps, use 0x01. The output
should otherwise be unrelated to the input. This could be built
after the filtering and tokenizer rewrite and should be done in a
manner that takes advantage of the other lexical features. This
sanitizer should also clear metadata and replace images.
Page splitting/merging
======================
@@ -407,3 +384,21 @@ I find it useful to make reference to them in this list
* If I ever decide to make appearance stream-generation aware of
  fonts or font metrics, see email from Tobias with Message-ID
  <5C3C9C6C.8000102@thax.hardliners.org> dated 2019-01-14.
* Consider creating a sanitizer to make it easier for people to send
broken files. Now that we have json mode, this is probably no
longer worth doing. Here is the previous idea, possibly implemented
by making it possible to run the lexer (tokenizer) over a whole
file. Make it possible to replace all strings in a file lexically
even on badly broken files. Ideally this should work with files that are
lacking xref, have broken links, etc., and ideally it should work
with encrypted files if possible. This should go through the
streams and strings and replace them with fixed or random
characters, preferably, but not necessarily, in a manner that works
with fonts. One possibility would be to detect whether a string
contains characters with normal encoding, and if so, use 0x41. If
the string uses character maps, use 0x01. The output should
otherwise be unrelated to the input. This could be built after the
filtering and tokenizer rewrite and should be done in a manner that
takes advantage of the other lexical features. This sanitizer
should also clear metadata and replace images.