mirror of
https://github.com/qpdf/qpdf.git
synced 2025-01-03 07:12:28 +00:00
4.1.0
=====

 * If possible, support user-pluggable stream filters. This would
   enable external code to provide interpretation for filters that are
   missing from qpdf.

 * If possible, consider adding RLE, CCITT3, CCITT4, or any other easy
   filters. (Low priority for 4.1.0.)

 * If possible, support the following types of broken files:

   - Files that lack %%EOF at the end but otherwise have a valid
     startxref near the end

   - Files that have no whitespace token after "endobj" such that
     endobj collides with the start of the next object

   - Files with individual corrupted streams. Just replace the
     streams with empty streams or possibly uncompress as much as
     possible

   - See ../misc/broken-files

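For the first case above (no %%EOF but a usable startxref), a minimal
sketch of the idea follows. The function name find_startxref is
hypothetical and is not part of qpdf; it only shows that the last
"startxref" keyword in the tail of the file can be located and parsed
without depending on a trailing %%EOF marker.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper, not qpdf code: find the last "startxref" keyword
// in the tail of a file and parse the decimal offset that follows it.
// No %%EOF marker is required.
std::optional<long long> find_startxref(std::string const& tail)
{
    static std::string const keyword = "startxref";
    std::size_t pos = tail.rfind(keyword);
    if (pos == std::string::npos)
    {
        return std::nullopt;
    }
    std::size_t i = pos + keyword.size();
    // Skip the whitespace between the keyword and the number.
    while ((i < tail.size()) &&
           std::isspace(static_cast<unsigned char>(tail[i])))
    {
        ++i;
    }
    long long offset = 0;
    int digits = 0;
    while ((i < tail.size()) &&
           std::isdigit(static_cast<unsigned char>(tail[i])))
    {
        if (++digits > 18)
        {
            // Refuse absurdly long numbers rather than overflow.
            return std::nullopt;
        }
        offset = offset * 10 + (tail[i] - '0');
        ++i;
    }
    if (digits == 0)
    {
        return std::nullopt;
    }
    return offset;
}
```

A caller would read the last kilobyte or so of the file into a string,
call find_startxref on it, and fall back to full reconstruction of the
xref table if no value is returned.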
 * The mingw64 package is broken. It contains a 32-bit version of
   libstdc++-6.dll. Fix this and make sure it can never happen
   again. Ideally we should test in a sandbox, but failing that, at
   least run the "file" command on all the DLLs to make sure they are
   of the right type.

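As an alternative to running "file" by hand, the architecture of a DLL
can be read from its PE header. The following is only a sketch: the
names PeArch, pe_arch, and make_fake_pe are made up for illustration,
and the byte layout used (offset of the PE header at 0x3C, "PE\0\0"
signature, then the 16-bit machine field) is the standard PE/COFF
layout rather than anything specific to qpdf's build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PeArch { not_pe, x86, x64, other };

// Hypothetical checker: classify the contents of a DLL by the machine
// field of its COFF header.
PeArch pe_arch(std::vector<unsigned char> const& data)
{
    // DOS header starts with "MZ"; the 32-bit little-endian offset of
    // the PE header is stored at 0x3C.
    if ((data.size() < 0x40) || (data[0] != 'M') || (data[1] != 'Z'))
    {
        return PeArch::not_pe;
    }
    std::uint32_t pe =
        static_cast<std::uint32_t>(data[0x3C]) |
        (static_cast<std::uint32_t>(data[0x3D]) << 8) |
        (static_cast<std::uint32_t>(data[0x3E]) << 16) |
        (static_cast<std::uint32_t>(data[0x3F]) << 24);
    // Need 4 signature bytes plus 2 machine bytes at that offset.
    if ((pe > data.size()) || (data.size() - pe < 6))
    {
        return PeArch::not_pe;
    }
    if ((data[pe] != 'P') || (data[pe + 1] != 'E') ||
        (data[pe + 2] != 0) || (data[pe + 3] != 0))
    {
        return PeArch::not_pe;
    }
    unsigned int machine = data[pe + 4] | (data[pe + 5] << 8);
    switch (machine)
    {
      case 0x014c: return PeArch::x86;   // IMAGE_FILE_MACHINE_I386
      case 0x8664: return PeArch::x64;   // IMAGE_FILE_MACHINE_AMD64
      default:     return PeArch::other;
    }
}

// Test fixture only: build a minimal fake PE image with the given
// machine value. This is not a loadable DLL.
std::vector<unsigned char> make_fake_pe(unsigned int machine)
{
    std::vector<unsigned char> d(0x80, 0);
    d[0] = 'M';
    d[1] = 'Z';
    d[0x3C] = 0x40;                       // PE header at offset 0x40
    d[0x40] = 'P';
    d[0x41] = 'E';
    d[0x44] = static_cast<unsigned char>(machine & 0xff);
    d[0x45] = static_cast<unsigned char>((machine >> 8) & 0xff);
    return d;
}
```

A packaging step could load each DLL in the 64-bit package into a
buffer and refuse to build the package unless pe_arch returns
PeArch::x64 for all of them.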
 * Add to documentation, and mention this documentation in
   README.maintainer:

   Casting policy.

   The C++ code in qpdf is free of old-style casts except where
   unavoidable (e.g. where the old-style cast is in a macro provided
   by a third-party header file). When there is a need for a cast, it
   is handled, in order of preference, by rewriting the code to avoid
   the need for a cast, calling const_cast, calling static_cast,
   calling reinterpret_cast, or calling some combination of the above.
   The casting policy explicitly prohibits casting between sizes for
   no purpose other than to quiet a compiler warning when there is no
   reasonable chance of a problem resulting. The reason for this
   prohibition is that such casts take away the option of enabling
   additional compiler warnings as a tool for making future
   improvements to this aspect of the code, and they also damage the
   readability of the code. As a last resort, a compiler-specific
   pragma may be used to suppress a warning that we don't want to
   fix. Examples may include suppressing warnings about the use of
   old-style casts in code that is shared between C and C++ code.

   There are a few significant areas where casting is common in the
   qpdf sources or where casting would be required to quiet higher
   levels of compiler warnings but is omitted at present:

   * signed vs. unsigned char. For historical reasons, there are a
     lot of places in qpdf's internals that deal with unsigned char,
     which means that a lot of casting is required to interoperate
     with standard library calls and std::string. In retrospect,
     qpdf should probably have used signed char everywhere and just
     cast to unsigned char when needed. There are reinterpret_cast
     calls to go between char* and unsigned char*, and there are
     static_cast calls to go between char and unsigned char. These
     should always be safe.

   * non-const unsigned char* used in Pipeline interface. The
     pipeline interface has a write() call that uses unsigned char*
     without a const qualifier. The main reason for this is to
     support pipelines that make calls to third-party libraries, such
     as zlib, that don't include const in their interfaces.
     Unfortunately, there are many places in the code where it is
     desirable to have const char* with pipelines. None of the
     pipeline implementations in qpdf currently modify the data
     passed to write, and doing so would be counter to the intent of
     Pipeline. There are places in the code where const_cast is used
     to remove the const-ness of pointers going into Pipelines. This
     could potentially be unsafe, but there is adequate testing to
     assert that it is safe in qpdf's code.

   * size_t vs. qpdf_offset_t. This is pretty much unavoidable since
     offsets are signed types and sizes are unsigned types. Whenever
     it is necessary to seek by an amount given by a size_t, it
     becomes necessary to mix and match between size_t and
     qpdf_offset_t. Additionally, qpdf sometimes treats memory
     buffers like files, and those seek interfaces have to be
     consistent with file-based input sources. Neither gcc nor MSVC
     gives warnings for this case by default, but both have warning
     flags that can enable this. (MSVC: /w14267 or /W3 (which also
     enables some additional warnings that we ignore); gcc:
     -Wconversion -Wsign-conversion). This could matter for files
     whose sizes are larger than 2^63 bytes, but it is reasonable to
     expect that a world where such files are common would also have
     larger size_t and qpdf_offset_t types in it. I am not aware of
     any cases where 32-bit systems that have size_t smaller than
     qpdf_offset_t could run into problems, though I can't
     conclusively rule out the possibility. In the event that
     someone should produce a file that qpdf can't handle because of
     what is suspected to be issues involving the handling of size_t
     vs. qpdf_offset_t (such files may behave properly on 64-bit
     systems but not on 32-bit systems and may have very large
     embedded files or streams, for example), the above-mentioned
     warning flags could be enabled and all those implicit
     conversions could be carefully scrutinized. (I have already
     gone through that exercise once in adding support for files >
     4GB in size.) I continue to be committed to supporting large
     files on 32-bit systems, but I would not go to any lengths to
     support corner cases involving large embedded files or large
     streams that work on 64-bit systems but not on 32-bit systems
     because of size_t being too small. It is reasonable to assume
     that anyone working with such files would be using a 64-bit
     system anyway.

   * size_t vs. int. There are some cases where size_t and int or
     size_t and unsigned int are used interchangeably. These cases
     occur when working with very small amounts of memory, such as
     with the bit readers (where we're working with just a few bytes
     at a time), some cases of strlen, and a few other cases. I have
     scrutinized all of these cases and determined them to be safe,
     but there is no mechanism in the code to ensure that new unsafe
     conversions between int and size_t aren't introduced short of
     good testing and strong awareness of the issues. Again, if any
     such bugs are suspected in the future, enabling the additional
     warning flags and scrutinizing the warnings would be in order.

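To make the cases above concrete, here is a small sketch of what
explicit, checked conversions could look like. None of these helpers
exist in qpdf; offset_t is a stand-in for qpdf_offset_t and is assumed
here to be a signed 64-bit type. The point is only to show
reinterpret_cast for char* vs. unsigned char*, and a range check
instead of a bare cast when going between sizes and offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Stand-in for qpdf_offset_t (assumption: signed 64-bit).
using offset_t = long long;

// char* <-> unsigned char*: reinterpret_cast, per the casting policy.
inline unsigned char const* as_unsigned(std::string const& s)
{
    return reinterpret_cast<unsigned char const*>(s.data());
}

// True if a signed offset can be represented as a size_t on this
// platform. On 32-bit systems this rejects offsets >= 2^32.
inline bool fits_in_size(offset_t off)
{
    return (off >= 0) &&
        (static_cast<unsigned long long>(off) <=
         std::numeric_limits<std::size_t>::max());
}

// Checked conversion from offset to size; throws instead of truncating.
inline std::size_t offset_to_size(offset_t off)
{
    if (! fits_in_size(off))
    {
        throw std::range_error("offset does not fit in size_t");
    }
    return static_cast<std::size_t>(off);
}

// Checked conversion from size to offset; throws if the size exceeds
// the largest positive offset.
inline offset_t size_to_offset(std::size_t n)
{
    if (static_cast<unsigned long long>(n) >
        static_cast<unsigned long long>(
            std::numeric_limits<offset_t>::max()))
    {
        throw std::range_error("size does not fit in offset type");
    }
    return static_cast<offset_t>(n);
}
```

Helpers of this kind differ from casts added only to quiet a warning:
they fail loudly in the cases that the warning exists to catch.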
 * New public interfaces have been added.


General
=======

 * Consider providing a Windows installer for qpdf using NSIS.

 * Improve the random number seed to make it more secure so that we
   have stronger random numbers, particularly when multiple files are
   generated in the same second. This code may need to be
   OS-specific. Probably we should add a method in QUtil to seed with
   a strong random number and call this automatically the first time
   QUtil::random() is called.

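One portable way to get the "seed automatically on first use" behavior
is sketched below using only the C++ standard library. The namespace
and function names are illustrative and are not the QUtil interface.
Two caveats: std::random_device is not guaranteed to be
non-deterministic on every platform, and std::mt19937_64 is not a
cryptographically secure generator, so OS-specific code may still be
needed if the values must be unpredictable.

```cpp
#include <cassert>
#include <random>

namespace sketch
{
    // Returns the shared engine, seeding it on first use. A
    // function-local static is initialized exactly once, even if
    // several threads reach it at the same time.
    inline std::mt19937_64& engine()
    {
        static std::mt19937_64 e = [] {
            std::random_device rd;
            // Gather several words of seed material rather than one.
            std::seed_seq seq{rd(), rd(), rd(), rd(),
                              rd(), rd(), rd(), rd()};
            return std::mt19937_64(seq);
        }();
        return e;
    }

    // Illustrative stand-in for a random() call: returns a
    // non-negative value that fits in 31 bits. Calls are not
    // synchronized; callers on multiple threads would need a lock.
    inline long random()
    {
        return static_cast<long>(engine()() & 0x7fffffffULL);
    }
}
```

With this arrangement, two processes started in the same second no
longer share a seed as long as the platform's random_device supplies
real entropy.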
 * Study what's required to support savable forms that can be saved by
   Adobe Reader. Does this require actually signing the document with
   an Adobe private key? Search for "Digital signatures" in the PDF
   spec, and look at ~/Q/pdf-collection/form-with-full-save.pdf, which
   came from Adobe's example site.

 * Consider the possibility of doing something locale-aware to support
   non-ASCII passwords. Update documentation if this is done.
   Consider implementing full Unicode password algorithms from newer
   encryption formats.

 * Consider impact of article threads on page splitting/merging.
   Subramanyam provided a test file; see ../misc/article-threads.pdf.
   Email Q-Count: 431864 from 2009-11-03. Other things to consider:
   outlines, page labels, thumbnails, zones. There are probably
   others.

 * See if we can avoid preserving unreferenced objects in object
   streams even when preserving the object streams.

 * For debugging linearization bugs, consider adding an option to save
   pass 1 of linearization. This code is sufficient. Change the
   interface to allow specification of a pass1 file, which would
   change the behavior as in this patch.

------------------------------
Index: QPDFWriter.cc
===================================================================
--- QPDFWriter.cc (revision 932)
+++ QPDFWriter.cc (working copy)
@@ -1965,11 +1965,15 @@
 
     // Write file in two passes. Part numbers refer to PDF spec 1.4.
 
+    FILE* XXX = 0;
     for (int pass = 1; pass <= 2; ++pass)
     {
         if (pass == 1)
         {
-            pushDiscardFilter();
+//            pushDiscardFilter();
+            XXX = QUtil::safe_fopen("/tmp/pass1.pdf", "w");
+            pushPipeline(new Pl_StdioFile("pass1", XXX));
+            activatePipelineStack();
         }
 
         // Part 1: header
@@ -2204,6 +2208,8 @@
 
             // Restore hint offset
             this->xref[hint_id] = QPDFXRefEntry(1, hint_offset, 0);
+            fclose(XXX);
+            XXX = 0;
         }
     }
 }
------------------------------

 * Provide APIs for embedded files. See *attachments*.pdf in test
   suite. The private method findAttachmentStreams finds at least
   the cases for modern versions of Adobe Reader (>= 1.7, maybe
   earlier). PDF Reference 1.7 section 3.10, "File Specifications",
   discusses this.

   A sourceforge user asks if qpdf can handle extracting and embedding
   resources and references these tools, which may be useful as a
   reference.

   http://multivalent.sourceforge.net/Tools/pdf/Extract.html
   http://multivalent.sourceforge.net/Tools/pdf/Embed.html

 * The description of Crypt filters is unclear with respect to how to
   use them to override /StmF for specific streams. I'm not sure
   whether qpdf will do the right thing for any specific individual
   streams that might have crypt filters, but I believe it does based
   on my testing of a limited subset. The specification seems to imply
   that only embedded file streams and metadata streams can have crypt
   filters, and there are already special cases in the code to handle
   those. Most likely, it won't be a problem, but someday someone may
   find a file that qpdf doesn't work on because of crypt filters.
   There is an example in the spec of using a crypt filter on a
   metadata stream.

   For now, we notice /Crypt filters and decode parameters consistent
   with the example in the PDF specification, and the right thing
   happens for metadata streams that happen to be uncompressed or
   otherwise compressed in a way we can filter. This should handle
   all normal cases, but it's more or less just a guess since I don't
   have any test files that actually use stream-specific crypt filters
   in them.

 * The second xref stream for linearized files has to be padded only
   because we need file_size as computed in pass 1 to be accurate. If
   we were not allowing writing to a pipe, we could seek back to the
   beginning and fill in the value of /L in the linearization
   dictionary as an optimization to alleviate the need for this
   padding. Doing so would require us to pad the /L value
   individually and also to save the file descriptor and determine
   whether it's seekable. This is probably not worth bothering with.

 * The whole xref handling code in the QPDF object allows the same
   object with more than one generation to coexist, but a lot of logic
   assumes this isn't the case. Anything that creates mappings only
   with the object number and not the generation is this way,
   including most of the interaction between QPDFWriter and QPDF. If
   we wanted to allow the same object with more than one generation to
   coexist, which I'm not sure is allowed, we could fix this by
   changing xref_table. Alternatively, we could detect and disallow
   that case. In fact, it appears that Adobe Reader and other PDF
   viewing software silently ignore objects of this type, so this is
   probably not a big deal.

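The "detect and disallow" option could be as simple as the following
sketch. The type alias and function name are hypothetical and do not
correspond to qpdf's actual xref_table; the input is just a list of
(object number, generation) pairs such as might be collected while
reading the xref table.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// (object number, generation); illustrative only.
using ObjGen = std::pair<int, int>;

// Return the object numbers that appear with more than one distinct
// generation. Repeats of the same (object, generation) pair are not
// counted as conflicts.
std::set<int>
objects_with_multiple_generations(std::vector<ObjGen> const& entries)
{
    std::map<int, std::set<int>> generations;
    for (auto const& e : entries)
    {
        generations[e.first].insert(e.second);
    }
    std::set<int> result;
    for (auto const& kv : generations)
    {
        if (kv.second.size() > 1)
        {
            result.insert(kv.first);
        }
    }
    return result;
}
```

If the returned set is non-empty, the reader could warn about the
listed objects, or reject the file, before any code that maps by
object number alone gets to see them.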
 * Pl_PNGFilter is only partially implemented. If we ever decoded
   images, we'd have to finish implementing it along with the other
   filter decode parameters and types. For just handling xref
   streams, there's really no need as it wouldn't make sense to use
   any kind of predictor other than 12 (PNG UP filter).

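For reference, decoding the PNG Up predictor is small enough to show
in full. This is a standalone sketch and not the Pl_PNGFilter code;
the function name decode_png_up is made up. It assumes the usual
layout for PNG predictors in PDF streams: each row is one tag byte
(2 for Up) followed by "columns" data bytes, and each decoded byte is
the encoded byte plus the byte directly above it, modulo 256.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Decode rows encoded with the PNG Up filter only. Rows with any
// other tag byte are rejected rather than guessed at.
std::vector<unsigned char>
decode_png_up(std::vector<unsigned char> const& in, std::size_t columns)
{
    if ((columns == 0) || (in.size() % (columns + 1) != 0))
    {
        throw std::invalid_argument("input is not a whole number of rows");
    }
    // The row above the first row is treated as all zeroes.
    std::vector<unsigned char> prev(columns, 0);
    std::vector<unsigned char> out;
    out.reserve(in.size() - in.size() / (columns + 1));
    for (std::size_t pos = 0; pos < in.size(); pos += columns + 1)
    {
        if (in[pos] != 2)
        {
            throw std::runtime_error("only the PNG Up filter is handled");
        }
        for (std::size_t i = 0; i < columns; ++i)
        {
            // Unsigned char arithmetic wraps modulo 256 after the cast.
            prev[i] = static_cast<unsigned char>(prev[i] + in[pos + 1 + i]);
            out.push_back(prev[i]);
        }
    }
    return out;
}
```

Supporting image data would additionally require the Sub, Average,
and Paeth filter types and the other decode parameters, which is the
unfinished part referred to above.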
 * If we ever want to have check mode check the integrity of the free
   list, this can be done by looking at the code from prior to the
   object stream support of 4/5/2008. It's in an if (0) block and
   there's a comment about it. There's also something about it in
   qpdf.test -- search for "free table". On the other hand, the value
   of doing this seems very low since no viewer seems to care, so it's
   probably not worth it.

 * QPDFObjectHandle::getPageImages() doesn't notice images in
   inherited resource dictionaries. See comments in that function.

 * Based on an idea suggested by user "Atom Smasher", consider
   providing some mechanism to recover earlier versions of a file
   embedded prior to appended sections.

 * From a suggestion in bug 3152169, consider having an option to
   re-encode inline images with an ASCII encoding.

 * From GitHub issue 2, provide more in-depth output for examining
   hint stream contents.