mirror of
https://github.com/qpdf/qpdf.git
synced 2025-01-22 22:58:33 +00:00
General
=======

 * Improve the random number seed to make it more secure so that we
   have stronger random numbers, particularly when multiple files are
   generated in the same second. This code may need to be
   OS-specific. Probably we should add a method in QUtil to seed with
   a strong random number and call this automatically the first time
   QUtil::random() is called.

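   A minimal sketch of the seed-on-first-use idea, assuming the
   standard library's std::random_device is an acceptable entropy
   source on the target platform (on some platforms it is not, which
   is where OS-specific code would come in). LazySeededRandom is a
   hypothetical name, not an existing QUtil interface, and
   std::mt19937 is not a cryptographically secure generator; it only
   illustrates where the seeding would happen.

```cpp
#include <cstdint>
#include <random>

// Hypothetical wrapper: seeds itself from std::random_device the first
// time a value is requested, so callers never have to seed explicitly.
class LazySeededRandom
{
  public:
    // Return a pseudo-random 32-bit value, seeding on first use.
    std::uint32_t next()
    {
        if (!seeded_)
        {
            std::random_device rd;
            // Gather several words of entropy rather than a single one.
            std::seed_seq seq{rd(), rd(), rd(), rd()};
            engine_.seed(seq);
            seeded_ = true;
        }
        return static_cast<std::uint32_t>(engine_());
    }

    // True once the generator has been seeded.
    bool seeded() const { return seeded_; }

  private:
    std::mt19937 engine_;
    bool seeded_ = false;
};
```

   Because the seed no longer depends on the clock, two files
   generated in the same second would not share a seed under this
   scheme.
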
 * Consider the possibility of doing something locale-aware to support
   non-ASCII passwords. Update documentation if this is done.

 * Look for %PDF header somewhere within the first 1024 bytes of the
   file. Also accept headers of the form "%!PS-Adobe-N.n PDF-M.m".
   See Implementation notes 13 and 14 in appendix H of the PDF 1.7
   specification. This is bug 3267974.

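   A rough sketch of the header search described above, operating on
   an in-memory buffer. find_pdf_header is a hypothetical helper, not
   an existing qpdf function, and the rule that " PDF-" must appear on
   the same line as "%!PS-Adobe-" is my assumption about how the
   second form should be matched.

```cpp
#include <cstddef>
#include <string>

// Return the offset of a PDF header that lies within the first 1024
// bytes of data, or std::string::npos if none is found. Accepts both
// "%PDF-" and "%!PS-Adobe-N.n PDF-M.m" style headers.
std::size_t find_pdf_header(const std::string& data)
{
    // Only the first 1024 bytes are examined.
    const std::string head = data.substr(0, 1024);

    std::size_t pos = head.find("%PDF-");
    if (pos != std::string::npos)
    {
        return pos;
    }

    pos = head.find("%!PS-Adobe-");
    if (pos != std::string::npos)
    {
        // Assumption: " PDF-" has to follow on the same line.
        std::size_t eol = head.find_first_of("\r\n", pos);
        std::size_t pdf = head.find(" PDF-", pos);
        if ((pdf != std::string::npos) &&
            ((eol == std::string::npos) || (pdf < eol)))
        {
            return pos;
        }
    }
    return std::string::npos;
}
```

   A caller would presumably treat the returned offset as the start of
   the file for the purpose of interpreting xref offsets, but that part
   is not sketched here.
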
 * Consider impact of article threads on page splitting/merging.
   Subramanyam provided a test file; see ../misc/article-threads.pdf.
   Email Q-Count: 431864 from 2009-11-03. Other things to consider:
   outlines, page labels, thumbnails, zones. There are probably
   others.

 * See if we can avoid preserving unreferenced objects in object
   streams even when preserving the object streams.

 * For debugging linearization bugs, consider adding an option to save
   pass 1 of linearization. The patch below is sufficient to do this.
   What remains is to change the interface to allow specification of
   a pass1 file, which would change the behavior as in this patch.

------------------------------
Index: QPDFWriter.cc
===================================================================
--- QPDFWriter.cc (revision 932)
+++ QPDFWriter.cc (working copy)
@@ -1965,11 +1965,15 @@
 
     // Write file in two passes. Part numbers refer to PDF spec 1.4.
 
+    FILE* XXX = 0;
     for (int pass = 1; pass <= 2; ++pass)
     {
         if (pass == 1)
         {
-            pushDiscardFilter();
+//          pushDiscardFilter();
+            XXX = fopen("/tmp/pass1.pdf", "w");
+            pushPipeline(new Pl_StdioFile("pass1", XXX));
+            activatePipelineStack();
         }
 
         // Part 1: header
@@ -2204,6 +2208,8 @@
 
             // Restore hint offset
             this->xref[hint_id] = QPDFXRefEntry(1, hint_offset, 0);
+            fclose(XXX);
+            XXX = 0;
         }
     }
 }
------------------------------

 * Provide APIs for embedded files. See *attachments*.pdf in the test
   suite. The private method findAttachmentStreams finds at least
   the cases that arise with modern versions of Adobe Reader (>= 1.7,
   maybe earlier). PDF Reference 1.7 section 3.10, "File
   Specifications", discusses this.

   A SourceForge user asks whether qpdf can handle extracting and
   embedding resources and references these tools, which may be
   useful as a reference.

   http://multivalent.sourceforge.net/Tools/pdf/Extract.html
   http://multivalent.sourceforge.net/Tools/pdf/Embed.html

 * The description of Crypt filters is unclear with respect to how to
   use them to override /StmF for specific streams. I'm not sure
   whether qpdf will do the right thing for any specific individual
   streams that might have crypt filters, but I believe it does based
   on my testing of a limited subset. The specification seems to imply
   that only embedded file streams and metadata streams can have crypt
   filters, and there are already special cases in the code to handle
   those. Most likely, it won't be a problem, but someday someone may
   find a file that qpdf doesn't work on because of crypt filters.
   There is an example in the spec of using a crypt filter on a
   metadata stream.

   For now, we notice /Crypt filters and decode parameters consistent
   with the example in the PDF specification, and the right thing
   happens for metadata streams that happen to be uncompressed or
   otherwise compressed in a way we can filter. This should handle
   all normal cases, but it's more or less just a guess since I don't
   have any test files that actually use stream-specific crypt
   filters.

 * The second xref stream for linearized files has to be padded only
   because we need file_size as computed in pass 1 to be accurate. If
   we were not allowing writing to a pipe, we could seek back to the
   beginning and fill in the value of /L in the linearization
   dictionary as an optimization to alleviate the need for this
   padding. Doing so would require us to pad the /L value
   individually and also to save the file descriptor and determine
   whether it's seekable. This is probably not worth bothering with.

 * The whole xref handling code in the QPDF object allows the same
   object with more than one generation to coexist, but a lot of logic
   assumes this isn't the case. Anything that creates mappings keyed
   only by the object number and not by the generation makes this
   assumption, including most of the interaction between QPDFWriter
   and QPDF. If we wanted to allow the same object with more than one
   generation to coexist, which I'm not sure is allowed, we could fix
   this by changing xref_table. Alternatively, we could detect and
   disallow that case. In fact, it appears that Adobe Reader and other
   PDF viewing software silently ignore objects of this type, so this
   is probably not a big deal.

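   For the "detect and disallow" alternative, a sketch of what the
   detection could look like. ObjGenKey and
   find_multi_generation_objects are hypothetical names used only for
   illustration; they are not the types or functions qpdf actually
   uses for xref_table.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical key type: (object number, generation).
typedef std::pair<int, int> ObjGenKey;

// Return, in ascending order, the object numbers that appear in keys
// with more than one generation.
std::vector<int>
find_multi_generation_objects(const std::set<ObjGenKey>& keys)
{
    // Count how many distinct generations each object number has.
    std::map<int, int> generations_seen;
    for (auto const& key : keys)
    {
        ++generations_seen[key.first];
    }

    std::vector<int> result;
    for (auto const& entry : generations_seen)
    {
        if (entry.second > 1)
        {
            result.push_back(entry.first);
        }
    }
    return result;
}
```

   A non-empty result would be the point at which to warn or refuse,
   since any mapping keyed only by object number would silently merge
   those entries.
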
 * Pl_PNGFilter is only partially implemented. If we ever decoded
   images, we'd have to finish implementing it along with the other
   filter decode parameters and types. For just handling xref
   streams, there's really no need as it wouldn't make sense to use
   any kind of predictor other than 12 (PNG UP filter).

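   For reference, a standalone sketch of what decoding the PNG Up
   filter involves. This is not the Pl_PNGFilter code;
   decode_png_up is a hypothetical free function that assumes one
   byte per sample, so that "columns" is the number of data bytes in
   each row. Each encoded row starts with a filter-type byte (2 for
   Up), and each data byte is stored as the difference from the byte
   directly above it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Decode rows encoded with the PNG Up filter. The input consists of
// rows of (1 + columns) bytes: a filter-type byte followed by the
// filtered data. Returns the decoded data without the type bytes.
std::vector<unsigned char>
decode_png_up(const std::vector<unsigned char>& in, std::size_t columns)
{
    if ((columns == 0) || (in.size() % (columns + 1) != 0))
    {
        throw std::invalid_argument("input is not a whole number of rows");
    }

    std::vector<unsigned char> out;
    // The row above the first row is treated as all zeroes.
    std::vector<unsigned char> prev(columns, 0);
    for (std::size_t row = 0; row < in.size(); row += columns + 1)
    {
        if (in[row] != 2)
        {
            throw std::runtime_error("only PNG Up (type 2) rows handled");
        }
        for (std::size_t c = 0; c < columns; ++c)
        {
            // Up: decoded = filtered + byte above, modulo 256.
            prev[c] = static_cast<unsigned char>(prev[c] + in[row + 1 + c]);
            out.push_back(prev[c]);
        }
    }
    return out;
}
```

   Finishing the filter for images would mean handling the other PNG
   row types (None, Sub, Average, Paeth) and multi-byte samples, none
   of which this sketch attempts.
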
 * If we ever want to have check mode check the integrity of the free
   list, this can be done by looking at the code from prior to the
   object stream support of 4/5/2008. It's in an if (0) block and
   there's a comment about it. There's also something about it in
   qpdf.test -- search for "free table". On the other hand, the value
   of doing this seems very low since no viewer seems to care, so it's
   probably not worth it.

 * QPDFObjectHandle::getPageImages() doesn't notice images in
   inherited resource dictionaries. See comments in that function.

 * Based on an idea suggested by user "Atom Smasher", consider
   providing some mechanism to recover earlier versions of a file
   embedded prior to appended sections.

 * From a suggestion in bug 3152169, consider having an option to
   re-encode inline images with an ASCII encoding.

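   One candidate ASCII encoding is the one that the PDF
   ASCIIHexDecode filter reverses: two hex digits per byte, terminated
   by ">". A minimal sketch follows; ascii_hex_encode is a
   hypothetical helper rather than anything in qpdf, and choosing hex
   over ASCII85 here is purely for simplicity.

```cpp
#include <string>

// Encode arbitrary bytes as lowercase hex digits followed by the '>'
// end-of-data marker expected by ASCIIHexDecode.
std::string ascii_hex_encode(const std::string& data)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(data.size() * 2 + 1);
    for (unsigned char ch : data)
    {
        out += digits[ch >> 4];   // high nibble
        out += digits[ch & 0x0f]; // low nibble
    }
    out += '>';
    return out;
}
```

   Hex doubles the size of the image data; ASCII85 would be more
   compact at the cost of a slightly more involved encoder.
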
 * From github issue 2, provide more in-depth output for examining
   hint stream contents.