mirror of https://github.com/qpdf/qpdf.git, synced 2025-01-03 15:17:29 +00:00
TODO: Move lexical stuff and add detail
This commit is contained in:
parent 0ae19c375e
commit 49f4600dd6
TODO (41 lines changed)
@@ -59,29 +59,6 @@ C++-11
 time.
 
-Lexical
-=======
-
-* Make it possible to run the lexer (tokenizer) over a whole file
-  such that the following things would be possible:
-
-  * Rewrite fix-qdf in C++ so that there is no longer a runtime perl
-    dependency
-
-  * Make it possible to replace all strings in a file lexically even
-    on badly broken files. Ideally this should work on files that are
-    lacking xref, have broken links, etc., and ideally it should work
-    with encrypted files if possible. This should go through the
-    streams and strings and replace them with fixed or random
-    characters, preferably, but not necessarily, in a manner that
-    works with fonts. One possibility would be to detect whether a
-    string contains characters with normal encoding, and if so, use
-    0x41. If the string uses character maps, use 0x01. The output
-    should otherwise be unrelated to the input. This could be built
-    after the filtering and tokenizer rewrite and should be done in a
-    manner that takes advantage of the other lexical features. This
-    sanitizer should also clear metadata and replace images.
-
 Page splitting/merging
 ======================
 
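A rough sketch of the "run the lexer (tokenizer) over a whole file" idea
from the section removed above (and restated in the hunk below), assuming
qpdf's character-at-a-time QPDFTokenizer interface (presentCharacter,
presentEOF, getToken); the program is illustrative only, not the planned
implementation:

    // Illustrative sketch: feed an entire file through QPDFTokenizer
    // one byte at a time, with no dependence on the xref table or on
    // object structure, so it can survive badly broken files.
    #include <qpdf/QPDFTokenizer.hh>
    #include <fstream>
    #include <iostream>

    int main(int argc, char* argv[])
    {
        if (argc != 2) {
            std::cerr << "usage: tokenize-file file.pdf" << std::endl;
            return 2;
        }
        std::ifstream in(argv[1], std::ios::binary);
        QPDFTokenizer tokenizer;
        tokenizer.allowEOF(); // tolerate EOF mid-token in damaged files
        QPDFTokenizer::Token token;
        bool unread = false;
        char back = '\0';
        char ch = '\0';
        while (in.get(ch)) {
            tokenizer.presentCharacter(ch);
            // getToken returns true once a complete token is available.
            // If unread is set, the last character starts the next token
            // and has to be presented again.
            while (tokenizer.getToken(token, unread, back)) {
                // A sanitizer would rewrite tt_string tokens here
                // instead of printing values.
                std::cout << token.getValue() << std::endl;
                if (!unread) {
                    break;
                }
                tokenizer.presentCharacter(back);
            }
        }
        tokenizer.presentEOF();
        if (tokenizer.getToken(token, unread, back)) {
            std::cout << token.getValue() << std::endl;
        }
        return 0;
    }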
@@ -407,3 +384,21 @@ I find it useful to make reference to them in this list
 * If I ever decide to make appearance stream-generation aware of
   fonts or font metrics, see email from Tobias with Message-ID
   <5C3C9C6C.8000102@thax.hardliners.org> dated 2019-01-14.
+
+* Consider creating a sanitizer to make it easier for people to send
+  broken files. Now that we have json mode, this is probably no
+  longer worth doing. Here is the previous idea, possibly implemented
+  by making it possible to run the lexer (tokenizer) over a whole
+  file. Make it possible to replace all strings in a file lexically
+  even on badly broken files. Ideally this should work on files that
+  are lacking xref, have broken links, etc., and ideally it should
+  work with encrypted files if possible. This should go through the
+  streams and strings and replace them with fixed or random
+  characters, preferably, but not necessarily, in a manner that works
+  with fonts. One possibility would be to detect whether a string
+  contains characters with normal encoding, and if so, use 0x41. If
+  the string uses character maps, use 0x01. The output should
+  otherwise be unrelated to the input. This could be built after the
+  filtering and tokenizer rewrite and should be done in a manner that
+  takes advantage of the other lexical features. This sanitizer
+  should also clear metadata and replace images.
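A minimal sketch of the 0x41/0x01 replacement heuristic described in the
item above; sanitize_string is a hypothetical helper, and "normal
encoding" is approximated here as printable ASCII, which is an assumption
(a real implementation would presumably consult the font's encoding):

    #include <cctype>
    #include <string>

    // Hypothetical helper: if every byte of the string looks like a
    // normally encoded character (printable ASCII as a rough test),
    // replace the whole string with 0x41 ("A"); otherwise assume it
    // indexes a character map and use 0x01. The result has the same
    // length as the input but is otherwise unrelated to it.
    std::string sanitize_string(std::string const& value)
    {
        bool normal_encoding = true;
        for (unsigned char c : value) {
            if (!(std::isprint(c) || std::isspace(c))) {
                normal_encoding = false;
                break;
            }
        }
        return std::string(value.size(), normal_encoding ? '\x41' : '\x01');
    }

Keeping the output the same length as the input also means byte offsets
elsewhere in a broken file do not shift, which seems desirable when the
xref cannot be regenerated.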