2.1
===

 * Update documentation to reflect new command line flags and any
   other relevant changes. Should read through ChangeLog and the
   manual before releasing 2.1.

 * Update release documentation to remember not to include debugging
   information in the Windows release and to strip the DLL and
   executables. Consider making the "install" target do something
   useful for Windows. In that case, update README.windows, including
   taking out the mention of strip since it should be handled by the
   install step. Determine whether building with -g and then stripping
   gives a different result from building without -g and stripping.

 * Add comments for the security functions that map them back to the
   items in Adobe's products.

 * Have force version at least turn off object streams and maybe
   change security settings?

 * Add error codes to QPDFException. Change the error interface so
   that warnings and errors are pointers that can be queried using
   more C API functions. We need a way to get a full string as well
   as an error code, file name, offset, and message. We should go
   through all error messages to try to include all these fields as
   appropriate. Make sure invalid password is specifically
   detectable. I/O errors and so forth should also be
   distinguishable. Make sure all errors include information about
   the most recent read location including byte offset and
   object/generation number.

 * It might be nice to be able to trap I/O errors separately from
   other errors, and especially to separate errors that the user can
   fix (like permission errors) from errors that they probably can't
   fix (like corrupted PDF files, unsupported filters, or internal
   errors). However, only QPDF::processFile(), which does the initial
   read, and QPDFWriter::QPDFWriter(), which does the initial write,
   are at all likely to generate such errors for a case other than a
   catastrophic failure.

 * "Delphi wrapper unit 'qpdf.pas' created by Zarko Gajic
   (http://delphi.about.com). .. use at your own risk and for whatever
   the purpose you want .. no support provided. Sample code provided."

 * R = 4, V = 4 encryption.

   - Update C API for R4 encryption

   - When we write encrypted files, we must remember to omit any
     encryption filter settings from original streams.

   - test various combinations with and without cleartext-metadata
     and aes in compression tests

   - figure out a way to test crypt filters defined on a stream

   - test combinations of linearization and v4 encryption

   - would be nice to test strings and streams with different
     encryption types, but without sample data, we'd have to write
     them ourselves which is not that useful

   - figure out how to look at the metadata so I can tell whether
     /EncryptMetadata is working the way it's supposed to

   - Do something with embedded files, but what and how?

   - General notes:

     /CF - keys are crypt filter names, values are crypt
     dictionaries

     Individual streams may also have crypt filters. Such streams use
     filter type /Crypt; /DecodeParms must contain a Crypt filter
     decode parameters dictionary whose /Name entry specifies the
     particular filter to be used. If /Name is missing, use /Identity.
     /DecodeParms << /Crypt << /Name /XYZ >> >> where /XYZ is
     /Identity or a key in /CF.

     /Identity means not to encrypt.

     Crypt Dictionaries

     /Type (optional) /CryptFilter
     /CFM:
       /V2 - use rc4
       /AESV2 - use aes
     /Length - supposed to be key length, but the one file I have
       has a bogus value for it, so I'm ignoring it.

     We will ignore remaining fields and values.
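
For the item about adding error codes to QPDFException, here is a minimal sketch of an exception that carries an error code, file name, object/generation, offset, and message as separate fields and also builds the full string from them. PDFError and ErrorCode, and the particular set of codes, are hypothetical names for illustration, not an existing qpdf interface. The idea is that a C API could expose each field on its own, so an invalid password or an I/O error can be detected by code instead of by parsing the message.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical error categories (not an existing qpdf interface).
enum class ErrorCode
{
    internal,     // a bug in the library
    system,       // I/O or permission problem the user may be able to fix
    unsupported,  // valid PDF feature that isn't handled, e.g. a filter
    password,     // invalid password
    damaged_pdf   // the file itself is corrupt
};

// Hypothetical exception type: every field is kept separately so a C
// API could expose each one, and what() returns the combined string.
class PDFError : public std::runtime_error
{
  public:
    PDFError(ErrorCode code, std::string const& filename,
             std::string const& object, long long offset,
             std::string const& message) :
        std::runtime_error(describe(filename, object, offset, message)),
        code(code),
        filename(filename),
        object(object),
        offset(offset),
        message(message)
    {
    }

    ErrorCode code;
    std::string filename;
    std::string object;  // e.g. "3 0" (object/generation); empty if unknown
    long long offset;    // byte offset of the most recent read; -1 if unknown
    std::string message;

  private:
    // Build the full string from the individual fields.
    static std::string describe(std::string const& filename,
                                std::string const& object,
                                long long offset,
                                std::string const& message)
    {
        std::string result = filename;
        if (! object.empty())
        {
            result += " (object " + object + ")";
        }
        if (offset >= 0)
        {
            result += " (file position " + std::to_string(offset) + ")";
        }
        return result + ": " + message;
    }
};
```

A caller that only cares about bad passwords can then test `e.code == ErrorCode::password` and never look at the text.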
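
The crypt filter rules in the general notes above can be sketched as a small lookup, assuming the /CF dictionary has already been reduced to a map from each crypt filter name to its /CFM value. CFDict, Cipher, cipherForMethod, and resolveCryptFilter are hypothetical names. A missing /Name falls back to /Identity, /Identity means no encryption, /V2 selects RC4 and /AESV2 selects AES; /Length and the remaining fields are ignored, as the notes say.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Which cipher to use for a stream (hypothetical type).
enum class Cipher { none, rc4, aes };

// Hypothetical, simplified /CF: crypt filter name (e.g. "/StdCF")
// mapped to the /CFM value in its crypt dictionary. /Length and the
// remaining fields are ignored, as in the notes.
using CFDict = std::map<std::string, std::string>;

// /CFM /V2 means RC4 and /AESV2 means AES; anything else is rejected here.
Cipher cipherForMethod(std::string const& cfm)
{
    if (cfm == "/V2")
    {
        return Cipher::rc4;
    }
    if (cfm == "/AESV2")
    {
        return Cipher::aes;
    }
    throw std::runtime_error("unsupported /CFM " + cfm);
}

// Hypothetical helper. `name` is the /Name from the stream's crypt
// filter decode parameters, or an empty string if /Name is missing.
Cipher resolveCryptFilter(std::string const& name, CFDict const& cf)
{
    std::string effective = name.empty() ? "/Identity" : name;
    if (effective == "/Identity")
    {
        return Cipher::none;  // /Identity means not to encrypt
    }
    auto it = cf.find(effective);
    if (it == cf.end())
    {
        throw std::runtime_error("crypt filter " + effective + " not in /CF");
    }
    return cipherForMethod(it->second);
}
```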

2.2
===

 * Add ability to create new streams or replace stream data. Consider
   allowing stream data sources to include a file and offset, a
   buffer, or some kind of callback mechanism. Find messages exchanged
   with Stefan Heinsen <stefan.heinsen@gmx.de> in August, 2009. He
   seems to like to send encrypted mail. (key 01FCC336)

 * Look at page splitting.
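
For the item about creating new streams or replacing stream data, one way to cover the three kinds of sources it lists (a file and offset, a buffer, or a callback) is a small abstract interface. StreamDataSource and its subclasses are hypothetical names used only for this sketch, which also assumes it is acceptable to read the whole stream into memory when it is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface (not an existing qpdf class): anything that
// can produce a stream's data when it is time to write it.
class StreamDataSource
{
  public:
    virtual ~StreamDataSource() = default;
    virtual std::string getData() = 0;
};

// Data already held in memory.
class BufferSource : public StreamDataSource
{
  public:
    explicit BufferSource(std::string data) : data(std::move(data)) {}
    std::string getData() override { return data; }

  private:
    std::string data;
};

// `length` bytes starting at `offset` in an existing file.
class FileSource : public StreamDataSource
{
  public:
    FileSource(std::string filename, long offset, std::size_t length) :
        filename(std::move(filename)), offset(offset), length(length)
    {
    }

    std::string getData() override
    {
        std::FILE* f = std::fopen(filename.c_str(), "rb");
        if (f == nullptr)
        {
            throw std::runtime_error("can't open " + filename);
        }
        std::string result(length, '\0');
        bool ok = (std::fseek(f, offset, SEEK_SET) == 0) &&
            (std::fread(&result[0], 1, length, f) == length);
        std::fclose(f);
        if (! ok)
        {
            throw std::runtime_error("short read from " + filename);
        }
        return result;
    }

  private:
    std::string filename;
    long offset;
    std::size_t length;
};

// Data generated on demand by a caller-supplied function.
class CallbackSource : public StreamDataSource
{
  public:
    explicit CallbackSource(std::function<std::string()> fn) : fn(std::move(fn)) {}
    std::string getData() override { return fn(); }

  private:
    std::function<std::string()> fn;
};
```

The writer would only ever call getData() through the base class, so it does not need to know where the bytes come from.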

General
=======

 * Handle embedded files. PDF Reference 1.7 section 3.10, "File
   Specifications", discusses this. Once we can definitely recognize
   all embedded files in a document, we can update the encryption
   code to handle them properly. In QPDF_encryption.cc, search for
   cf_file. Remove the exception thrown if cf_file is different from
   cf_stream, and write code in the stream decryption section to use
   cf_file instead of cf_stream. In general, add interfaces to get the
   list of embedded files and to extract them. To handle general
   embedded files associated with the whole document, follow
   root -> /Names -> /EmbeddedFiles -> /Names to get to the file
   specification dictionaries. Then, in each file specification
   dictionary, follow /EF -> /F to the actual stream.

 * The description of Crypt filters is unclear with respect to how to
   use them to override /StmF for specific streams. I'm not sure
   whether qpdf will do the right thing for any individual streams
   that might have crypt filters. The specification seems to imply
   that only embedded file streams and metadata streams can have crypt
   filters, and there are already special cases in the code to handle
   those. Most likely, it won't be a problem, but someday someone may
   find a file that qpdf doesn't work on because of crypt filters.

 * The second xref stream for linearized files has to be padded only
   because we need file_size as computed in pass 1 to be accurate. If
   we were not allowing writing to a pipe, we could seek back to the
   beginning and fill in the value of /L in the linearization
   dictionary as an optimization to alleviate the need for this
   padding. Doing so would require us to pad the /L value
   individually and also to save the file descriptor and determine
   whether it's seekable. This is probably not worth bothering with.

 * The whole xref handling code in the QPDF object allows the same
   object with more than one generation to coexist, but a lot of logic
   assumes this isn't the case. Anything that creates mappings keyed
   only by the object number and not the generation makes this
   assumption, including most of the interaction between QPDFWriter
   and QPDF. If we wanted to allow the same object with more than one
   generation to coexist, which I'm not sure is allowed, we could fix
   this by changing xref_table. Alternatively, we could detect and
   disallow that case. In fact, it appears that Adobe Reader and other
   PDF viewing software silently ignore objects of this type, so this
   is probably not a big deal.

 * Pl_PNGFilter is only partially implemented. If we ever decoded
   images, we'd have to finish implementing it along with the other
   filter decode parameters and types. For just handling xref
   streams, there's really no need as it wouldn't make sense to use
   any kind of predictor other than 12 (PNG UP filter).

 * If we ever want to have check mode check the integrity of the free
   list, this can be done by looking at the code from prior to the
   object stream support of 4/5/2008. It's in an if (0) block and
   there's a comment about it. There's also something about it in
   qpdf.test -- search for "free table". On the other hand, the value
   of doing this seems very low since no viewer seems to care, so it's
   probably not worth it.

 * QPDFObjectHandle::getPageImages() doesn't notice images in
   inherited resource dictionaries. See comments in that function.

 * Based on an idea suggested by user "Atom Smasher", consider
   providing some mechanism to recover earlier versions of a file
   embedded prior to appended sections.
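
For the embedded files item, the traversal it describes (root -> /Names -> /EmbeddedFiles -> /Names, then /EF -> /F in each file specification dictionary) can be sketched over a toy object model. Obj, getKey, and embeddedFiles are hypothetical stand-ins, not qpdf's QPDFObjectHandle API. The sketch assumes the top-level name tree node has a /Names array of alternating names and file specifications; a name tree that uses /Kids would need a recursive walk, which is left out here.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Minimal hypothetical object model standing in for real PDF objects:
// a node may hold dictionary entries, array items, or a leaf value
// (a string, or the contents of a stream).
struct Obj
{
    std::map<std::string, std::shared_ptr<Obj>> dict;
    std::vector<std::shared_ptr<Obj>> array;
    std::string value;
};
using ObjPtr = std::shared_ptr<Obj>;

// Return obj[key], or nullptr if obj is null or has no such key.
ObjPtr getKey(ObjPtr const& obj, std::string const& key)
{
    if (! obj)
    {
        return nullptr;
    }
    auto it = obj->dict.find(key);
    return it == obj->dict.end() ? nullptr : it->second;
}

// Map each embedded file's name to its stream. Only a /Names array on
// the top-level name tree node is handled; /Kids is not followed.
std::map<std::string, ObjPtr> embeddedFiles(ObjPtr const& root)
{
    std::map<std::string, ObjPtr> result;
    ObjPtr names =
        getKey(getKey(getKey(root, "/Names"), "/EmbeddedFiles"), "/Names");
    if (! names)
    {
        return result;
    }
    // The array alternates: name, file specification, name, ...
    for (std::size_t i = 0; i + 1 < names->array.size(); i += 2)
    {
        ObjPtr stream = getKey(getKey(names->array[i + 1], "/EF"), "/F");
        if (stream && names->array[i])
        {
            result[names->array[i]->value] = stream;
        }
    }
    return result;
}
```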
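
For the item about padding the second xref stream in linearized files, this sketch shows the alternative it mentions: reserve a fixed-width field for /L, write the rest of the output, then seek back and fill in the real size. It assumes a seekable output (a std::stringstream stands in for a file here), and the linearization dictionary is cut down to just enough to show the idea; writeWithBackpatchedLength and kLWidth are hypothetical names. As the item says, none of this works when writing to a pipe.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical width reserved for the /L value; wide enough for any
// realistic file size.
constexpr int kLWidth = 10;

// Hypothetical helper: write a (heavily simplified) linearization
// dictionary with a blank, fixed-width /L field, then the rest of the
// file, then seek back and overwrite the field with the final size.
std::string writeWithBackpatchedLength(std::string const& body)
{
    std::stringstream out;
    out << "<< /Linearized 1 /L ";
    std::streampos l_pos = out.tellp();
    out << std::string(kLWidth, ' ') << " >>\n";  // placeholder for /L
    out << body;
    std::streamoff file_size = out.tellp();
    out.seekp(l_pos);
    // Left-justify and pad so exactly kLWidth characters are replaced
    // and nothing after the placeholder moves.
    out << std::left << std::setw(kLWidth) << file_size;
    return out.str();
}
```

Padding the single /L value this way keeps every other offset in the file stable, which is why the item notes it would have to be padded individually.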
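
For the item about the same object number appearing with more than one generation, a reader could detect that case before building any mapping keyed by object number alone, and then either reject it or key everything by the full (object, generation) pair. objectsWithMultipleGenerations is a hypothetical helper that works on pairs taken from the xref entries.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// (object number, generation), as found in xref entries.
using ObjGen = std::pair<int, int>;

// Hypothetical helper: return the object numbers that appear with more
// than one generation. Any mapping keyed only by object number would
// silently conflate these objects.
std::set<int> objectsWithMultipleGenerations(std::vector<ObjGen> const& entries)
{
    std::map<int, std::set<int>> generations;
    for (auto const& og : entries)
    {
        generations[og.first].insert(og.second);
    }
    std::set<int> result;
    for (auto const& entry : generations)
    {
        if (entry.second.size() > 1)
        {
            result.insert(entry.first);
        }
    }
    return result;
}
```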
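
For the Pl_PNGFilter item, this is a sketch of decoding xref stream data written with /Predictor 12, assuming one color component of 8 bits so that /Columns is the number of data bytes per row (the sum of the /W entries). With the PNG predictors, every row starts with its own filter type byte; this handles only types 0 (None) and 2 (Up), which is the case the item says matters for xref streams. decodePNGUp is a hypothetical name, not Pl_PNGFilter's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: undo PNG None/Up filtering. Each encoded row is
// one filter type byte followed by `columns` data bytes.
std::vector<std::uint8_t> decodePNGUp(std::vector<std::uint8_t> const& in,
                                      std::size_t columns)
{
    std::size_t row_size = columns + 1;  // filter type byte + data
    if (columns == 0 || in.size() % row_size != 0)
    {
        throw std::runtime_error("data is not a whole number of rows");
    }
    std::vector<std::uint8_t> out;
    out.reserve(in.size() / row_size * columns);
    std::vector<std::uint8_t> prev(columns, 0);  // row above the first is all zero
    for (std::size_t pos = 0; pos < in.size(); pos += row_size)
    {
        std::uint8_t filter = in[pos];
        if (filter != 0 && filter != 2)
        {
            throw std::runtime_error("unsupported PNG filter type");
        }
        for (std::size_t i = 0; i < columns; ++i)
        {
            std::uint8_t byte = in[pos + 1 + i];
            if (filter == 2)
            {
                // Up: add the decoded byte directly above, modulo 256.
                byte = static_cast<std::uint8_t>(byte + prev[i]);
            }
            prev[i] = byte;
            out.push_back(byte);
        }
    }
    return out;
}
```

Supporting images would mean adding the Sub, Average, and Paeth filter types and honoring /Colors and /BitsPerComponent, which is the unfinished part the item refers to.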
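
For the free list item, here is a sketch of the strict check it alludes to, for a classic xref table: the free entries should form one linked list that starts at object 0, follows each free entry's first field to the next free object number, and ends by pointing back to 0. XrefEntry and freeListIsConsistent are hypothetical names. As the item says, viewers don't seem to enforce this, so it would only be useful as a check-mode diagnostic.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical representation of one entry in a classic xref table.
// For a free ('f') entry, field1 is the object number of the next free
// object; for an in-use ('n') entry it is the object's byte offset.
struct XrefEntry
{
    char type;  // 'f' or 'n'
    long field1;
    int generation;
};

// Strict check: the free entries must form a single list that starts at
// object 0, links each free entry to the next, and ends by pointing
// back to 0, visiting every free entry exactly once.
bool freeListIsConsistent(std::vector<XrefEntry> const& table)
{
    if (table.empty() || table[0].type != 'f')
    {
        return false;
    }
    std::set<long> visited;
    long current = 0;
    do
    {
        bool in_range = current >= 0 &&
            static_cast<std::size_t>(current) < table.size();
        if (! in_range ||
            table[static_cast<std::size_t>(current)].type != 'f' ||
            ! visited.insert(current).second)
        {
            // Points outside the table, at an in-use object, or loops.
            return false;
        }
        current = table[static_cast<std::size_t>(current)].field1;
    } while (current != 0);

    std::size_t free_count = 0;
    for (auto const& entry : table)
    {
        if (entry.type == 'f')
        {
            ++free_count;
        }
    }
    // Any free entry not reached from object 0 is orphaned.
    return visited.size() == free_count;
}
```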
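
For the getPageImages() item, the missing piece is that /Resources is an inheritable page attribute: if a page has no /Resources of its own, the nearest ancestor in the pages tree that has one applies, and a closer /Resources replaces an inherited one rather than merging with it. The sketch models only that lookup, using a hypothetical PageTreeNode struct (a /Parent link plus the /XObject entries of its /Resources, mapped to each XObject's /Subtype); it is not the QPDFObjectHandle API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical page tree node (a page or an intermediate /Pages node).
struct PageTreeNode
{
    PageTreeNode const* parent = nullptr;  // /Parent; null at the root
    bool has_resources = false;            // does this node have /Resources?
    // /Resources /XObject entries: resource name -> XObject /Subtype
    std::map<std::string, std::string> xobjects;
};

// Hypothetical helper: names of image XObjects available to a page,
// honoring inheritance. The nearest node (the page itself or an
// ancestor) that has /Resources wins.
std::vector<std::string> pageImageNames(PageTreeNode const& page)
{
    std::vector<std::string> result;
    for (PageTreeNode const* node = &page; node != nullptr; node = node->parent)
    {
        if (! node->has_resources)
        {
            continue;  // not here; try the parent
        }
        for (auto const& entry : node->xobjects)
        {
            if (entry.second == "/Image")
            {
                result.push_back(entry.first);
            }
        }
        break;  // inherited values are replaced, not merged
    }
    return result;
}
```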


Splitting by Pages
==================

Although qpdf does not currently support splitting a file into pages,
the work done for linearization covers almost all the work needed to
do page splitting. If this functionality is needed, study
obj_user_to_objects and object_to_obj_users created in
QPDF_optimization for ideas. It's quite possible that the information
computed by calculateLinearizationData is actually sufficient to do
page splitting in many circumstances. That code knows which objects
are used by which pages, though it doesn't do anything page-specific
with outlines, thumbnails, page labels, or anything else.

Another approach would be to traverse only the pages that are being
output, taking care not to traverse into the pages tree, and then to
fabricate a new pages tree.
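
The second approach can be sketched as a graph walk that starts from the selected pages and refuses to follow /Parent, so the original pages tree, and the other pages reachable through it, are never pulled in. The Graph representation (each object's outgoing references keyed by dictionary key) and objectsForPages are hypothetical. Other back-references, such as an annotation's /P or a link destination naming another page, would still need the care described for outlines, threads, and so on.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical object graph: object number -> (dictionary key ->
// referenced object number). A multimap allows several references
// under one key, as with the entries of a /Kids array.
using Graph = std::map<int, std::multimap<std::string, int>>;

// Hypothetical helper: collect every object reachable from the selected
// pages without following /Parent. A new pages tree would then be
// fabricated to hold just these pages.
std::set<int> objectsForPages(Graph const& graph, std::vector<int> const& pages)
{
    std::set<int> seen;
    std::vector<int> pending(pages.begin(), pages.end());
    while (! pending.empty())
    {
        int obj = pending.back();
        pending.pop_back();
        if (! seen.insert(obj).second)
        {
            continue;  // already visited
        }
        auto it = graph.find(obj);
        if (it == graph.end())
        {
            continue;  // no outgoing references
        }
        for (auto const& ref : it->second)
        {
            if (ref.first != "/Parent")
            {
                pending.push_back(ref.second);
            }
        }
    }
    return seen;
}
```

Shared objects such as a font used by several pages are picked up naturally, because the walk follows every reference except /Parent.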

Either way, care must be taken to handle other things such as
outlines, page labels, thumbnails, threads, zones, etc. in a sensible
way. This may include simply omitting information other than page
content.