2011-12-06 15:58:29 +00:00
|
|
|
|
Next
|
|
|
|
|
====
|
|
|
|
|
|
2012-06-20 15:20:57 +00:00
|
|
|
|
*** ABI changes have been made. build.mk has been updated.
|
2012-05-19 13:23:00 +00:00
|
|
|
|
|
2012-06-20 22:26:56 +00:00
|
|
|
|
* 64-bit windows build, remaining steps
|
|
|
|
|
|
|
|
|
|
- new external-libs have been built and copied into
|
|
|
|
|
~/Q/storage/releases/qpdf/external-libs. Release is done in
|
|
|
|
|
git. Just need to upload when ready. Remember to document that
|
2012-06-21 21:07:35 +00:00
|
|
|
|
this version is needed for > 2.3.1.
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-06-21 21:07:35 +00:00
|
|
|
|
- update README-windows.txt docs to indicate that MSVC 2010 is the
|
2012-06-26 23:11:21 +00:00
|
|
|
|
supported version and to update the information about mingw,
|
|
|
|
|
including the need for the _FILE_OFFSET_BITS workaround on the
|
|
|
|
|
32-bit version.
|
2011-12-28 22:19:40 +00:00
|
|
|
|
|
2012-06-24 19:29:39 +00:00
|
|
|
|
* Document that your compiler has to support long long.
|
2012-06-22 03:06:48 +00:00
|
|
|
|
|
2012-07-07 21:33:45 +00:00
|
|
|
|
* Make sure that the release notes call attention to the one API
|
|
|
|
|
breaking change: removal of length from replaceStreamData.
|
2012-06-24 19:26:28 +00:00
|
|
|
|
|
2012-06-24 19:29:39 +00:00
|
|
|
|
* Add a way to create new QPDFObjectHandles with a string
|
|
|
|
|
representation of them, such as
|
|
|
|
|
QPDFObjectHandle::parse("<< /a 1 /b 2 >>");
|
2012-06-22 03:06:48 +00:00
|
|
|
|
|
2012-07-08 20:00:50 +00:00
|
|
|
|
* Document thread safety: One individual QPDF or QPDFWriter object
|
|
|
|
|
can only be used by one thread at a time, but multiple threads can
|
|
|
|
|
simultaneously use separate objects.
|
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
* Write some documentation about the design of copyForeignObject.
|
2012-06-22 03:06:48 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
* copyForeignObject still to do:
|
2011-12-28 22:19:40 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
- qpdf command
|
2011-12-28 22:19:40 +00:00
|
|
|
|
|
2012-07-16 01:27:08 +00:00
|
|
|
|
Command line could be something like (but not exactly)
|
2011-12-06 15:58:29 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
--pages [ --new ] { file [password] numeric-range ... } ... --
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
The first file referenced would be the one whose other data would
|
|
|
|
|
be preserved (like trailer, info, encryption, outlines, etc.).
|
|
|
|
|
--new as first file would just use an empty file as the starting
|
|
|
|
|
point. Be explicit about whether outlines, etc., are handled.
|
|
|
|
|
They are not handled initially.
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
Example: to grab pages 1-5 from file1 and 11-15 from file2
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
--pages file1.pdf 1-5 file2.pdf 11-15 --
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
To implement this, we would remove all pages from file1 except
|
|
|
|
|
pages 1 through 5. Then we would take pages 11 through 15 from
|
|
|
|
|
file2, copy them to the file, and add them as pages.
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-07-16 01:27:08 +00:00
|
|
|
|
(Look at ~/source/examples/perl/numrange.pl for numeric range
|
|
|
|
|
parsing code.)
|
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
- document that makeIndirectObject doesn't handle foreign objects
|
|
|
|
|
automatically because copying a foreign object is a big enough
|
|
|
|
|
deal that it should be explicit. However addPages* does handle
|
|
|
|
|
foreign page objects automatically.
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
- Test /Outlines and see whether there's any point in handling
|
|
|
|
|
them in the API. Maybe just copying them over works. What
|
|
|
|
|
about command line tool? Also think about page labels.
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
- Tests through qpdf command line: copy pages from multiple PDFs
|
|
|
|
|
starting with one PDF and also starting with empty.
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-07-16 01:15:24 +00:00
|
|
|
|
* Document --copy-encryption and --encryption-file-password in
|
|
|
|
|
manual. Mention that the first half of /ID as well as all the
|
|
|
|
|
encryption parameters are copied. Maybe mention about StrF and
|
|
|
|
|
StrM with respect to AES here and also with encryption
|
|
|
|
|
preservation.
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
Soon
|
|
|
|
|
====
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
* See if I can support the new encryption formats mentioned in the
|
|
|
|
|
open bug on sourceforge. Check other sourceforge bugs.
|
2012-06-20 22:26:56 +00:00
|
|
|
|
|
2011-12-06 15:58:29 +00:00
|
|
|
|
|
2011-06-25 18:39:05 +00:00
|
|
|
|
General
|
|
|
|
|
=======
|
2011-06-23 18:47:01 +00:00
|
|
|
|
|
2011-04-30 18:20:35 +00:00
|
|
|
|
* Look for %PDF header somewhere within the first 1024 bytes of the
|
|
|
|
|
file. Also accept headers of the form "%!PS−Adobe−N.n PDF−M.m".
|
|
|
|
|
See Implementation notes 13 and 14 in appendix H of the PDF 1.7
|
|
|
|
|
specification. This is bug 3267974.
|
|
|
|
|
|
2010-12-27 15:42:48 +00:00
|
|
|
|
* Update qpdf docs about non-ascii passwords. See thread from
|
|
|
|
|
2010-12-07,08 for details.
|
|
|
|
|
|
2012-07-11 19:29:41 +00:00
|
|
|
|
* Consider impact of article threads on page splitting/merging.
|
|
|
|
|
Subramanyam provided a test file; see ../misc/article-threads.pdf.
|
|
|
|
|
Email Q-Count: 431864 from 2009-11-03. Other things to consider:
|
|
|
|
|
outlines, page labels, thumbnails, zones. There are probably
|
|
|
|
|
others.
|
2010-08-10 02:28:08 +00:00
|
|
|
|
|
2010-03-27 15:53:44 +00:00
|
|
|
|
* See whether it's possible to remove the call to
|
|
|
|
|
flattenScalarReferences. I can't easily figure out why I do it,
|
|
|
|
|
but removing it causes strange test failures in linearization. I
|
|
|
|
|
would have to study the optimization and linearization code to
|
|
|
|
|
figure out why I added this to begin with and what in the code
|
|
|
|
|
assumes it's the case. For enqueueObject and unparseChild in
|
|
|
|
|
QPDFWriter, simply removing the checks for indirect scalars seems
|
2010-06-06 15:45:02 +00:00
|
|
|
|
sufficient. Looking back at the branch in the apex epub
|
|
|
|
|
repository, before flattening scalar references, there was special
|
|
|
|
|
case code in QPDFWriter to avoid writing out indirect nulls. It's
|
|
|
|
|
still not obvious to me why I did it though.
|
2010-03-27 15:53:44 +00:00
|
|
|
|
|
|
|
|
|
To pursue this, remove the call to flattenScalarReferences in
|
|
|
|
|
QPDFWriter.cc and disable the logic_error exceptions for indirect
|
|
|
|
|
scalars. Just search for flattenScalarReferences in QPDFWriter.cc
|
|
|
|
|
since the logic errors have comments that mention
|
|
|
|
|
flattenScalarReferences. Then run the test suite. Several files
|
|
|
|
|
that explicitly test flattening of scalar references fail, but the
|
|
|
|
|
indirect scalars are properly preserved and written. But then
|
|
|
|
|
there are some linearized files that have a bunch of unreferenced
|
|
|
|
|
objects that contain scalars. Need to figure out what these are
|
|
|
|
|
and why they're there. Maybe they're objects that used to be
|
|
|
|
|
stream lengths. Probably we just need to make sure don't traverse
|
|
|
|
|
through a stream's /Length stream when enqueueing stream
|
2010-03-27 16:00:41 +00:00
|
|
|
|
dictionaries. This could potentially happen with any object that
|
|
|
|
|
QPDFWriter replaces when writing out files. Such objects would be
|
|
|
|
|
orphaned in the newly written file. This could be fixed, but it
|
|
|
|
|
may not be worth fixing.
|
2010-03-27 15:53:44 +00:00
|
|
|
|
|
|
|
|
|
If flattenScalarReferences is removed, a new method will be needed
|
|
|
|
|
for checking PDF files.
|
|
|
|
|
|
2010-06-06 15:45:02 +00:00
|
|
|
|
* See if we can avoid preserving unreferenced objects in object
|
2010-03-27 16:00:41 +00:00
|
|
|
|
streams even when preserving the object streams.
|
|
|
|
|
|
2010-01-18 03:37:01 +00:00
|
|
|
|
* For debugging linearization bugs, consider adding an option to save
|
|
|
|
|
pass 1 of linearization. This code is sufficient. Change the
|
|
|
|
|
interface to allow specification of a pass1 file, which would
|
|
|
|
|
change the behavior as in this patch.
|
|
|
|
|
|
|
|
|
|
------------------------------
|
|
|
|
|
Index: QPDFWriter.cc
|
|
|
|
|
===================================================================
|
2010-01-25 01:27:12 +00:00
|
|
|
|
--- QPDFWriter.cc (revision 932)
|
2010-01-18 03:37:01 +00:00
|
|
|
|
+++ QPDFWriter.cc (working copy)
|
2010-01-25 01:27:12 +00:00
|
|
|
|
@@ -1965,11 +1965,15 @@
|
2010-01-18 03:37:01 +00:00
|
|
|
|
|
|
|
|
|
// Write file in two passes. Part numbers refer to PDF spec 1.4.
|
|
|
|
|
|
|
|
|
|
+ FILE* XXX = 0;
|
|
|
|
|
for (int pass = 1; pass <= 2; ++pass)
|
|
|
|
|
{
|
|
|
|
|
if (pass == 1)
|
|
|
|
|
{
|
|
|
|
|
- pushDiscardFilter();
|
|
|
|
|
+// pushDiscardFilter();
|
|
|
|
|
+ XXX = fopen("/tmp/pass1.pdf", "w");
|
|
|
|
|
+ pushPipeline(new Pl_StdioFile("pass1", XXX));
|
|
|
|
|
+ activatePipelineStack();
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Part 1: header
|
2010-01-25 01:27:12 +00:00
|
|
|
|
@@ -2204,6 +2208,8 @@
|
2010-01-18 03:37:01 +00:00
|
|
|
|
|
|
|
|
|
// Restore hint offset
|
|
|
|
|
this->xref[hint_id] = QPDFXRefEntry(1, hint_offset, 0);
|
|
|
|
|
+ fclose(XXX);
|
|
|
|
|
+ XXX = 0;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
------------------------------
|
|
|
|
|
|
2009-10-18 19:54:24 +00:00
|
|
|
|
* Handle embedded files. PDF Reference 1.7 section 3.10, "File
|
|
|
|
|
Specifications", discusses this. Once we can definitely recongize
|
|
|
|
|
all embedded files in a docucment, we can update the encryption
|
|
|
|
|
code to handle it properly. In QPDF_encryption.cc, search for
|
|
|
|
|
cf_file. Remove exception thrown if cf_file is different from
|
|
|
|
|
cf_stream, and write code in the stream decryption section to use
|
2011-08-11 12:17:39 +00:00
|
|
|
|
cf_file instead of cf_stream. In general, add interfaces to get
|
|
|
|
|
the list of embedded files and to extract them. To handle general
|
|
|
|
|
embedded files associated with the whole document, follow root ->
|
|
|
|
|
/Names -> /EmbeddedFiles -> /Names to get to the file specification
|
|
|
|
|
dictionaries. Then, in each file specification dictionary, follow
|
|
|
|
|
/EF -> /F to the actual stream. There may be other places file
|
|
|
|
|
specification dictionaries may appear, and there are also /RF keys
|
|
|
|
|
with related files, so reread section 3.10 carefully.
|
2009-10-18 19:54:24 +00:00
|
|
|
|
|
|
|
|
|
* The description of Crypt filters is unclear with respect to how to
|
|
|
|
|
use them to override /StmF for specific streams. I'm not sure
|
|
|
|
|
whether qpdf will do the right thing for any specific individual
|
|
|
|
|
streams that might have crypt filters. The specification seems to
|
|
|
|
|
imply that only embedded file streams and metadata streams can have
|
|
|
|
|
crypt filters, and there are already special cases in the code to
|
|
|
|
|
handle those. Most likely, it won't be a problem, but someday
|
|
|
|
|
someone may find a file that qpdf doesn't work on because of crypt
|
2009-10-19 00:17:11 +00:00
|
|
|
|
filters. There is an example in the spec of using a crypt filter
|
|
|
|
|
on a metadata stream.
|
|
|
|
|
|
2009-10-19 01:58:31 +00:00
|
|
|
|
For now, we notice /Crypt filters and decode parameters consistent
|
|
|
|
|
with the example in the PDF specification, and the right thing
|
|
|
|
|
happens for metadata filters that happen to be uncompressed or
|
|
|
|
|
otherwise compressed in a way we can filter. This should handle
|
|
|
|
|
all normal cases, but it's more or less just a guess since I don't
|
|
|
|
|
have any test files that actually use stream-specific crypt filters
|
|
|
|
|
in them.
|
2009-10-18 19:54:24 +00:00
|
|
|
|
|
2008-04-29 12:55:25 +00:00
|
|
|
|
* The second xref stream for linearized files has to be padded only
|
|
|
|
|
because we need file_size as computed in pass 1 to be accurate. If
|
|
|
|
|
we were not allowing writing to a pipe, we could seek back to the
|
|
|
|
|
beginning and fill in the value of /L in the linearization
|
|
|
|
|
dictionary as an optimization to alleviate the need for this
|
|
|
|
|
padding. Doing so would require us to pad the /L value
|
|
|
|
|
individually and also to save the file descriptor and determine
|
|
|
|
|
whether it's seekable. This is probably not worth bothering with.
|
|
|
|
|
|
|
|
|
|
* The whole xref handling code in the QPDF object allows the same
|
|
|
|
|
object with more than one generation to coexist, but a lot of logic
|
|
|
|
|
assumes this isn't the case. Anything that creates mappings only
|
|
|
|
|
with the object number and not the generation is this way,
|
|
|
|
|
including most of the interaction between QPDFWriter and QPDF. If
|
|
|
|
|
we wanted to allow the same object with more than one generation to
|
|
|
|
|
coexist, which I'm not sure is allowed, we could fix this by
|
|
|
|
|
changing xref_table. Alternatively, we could detect and disallow
|
|
|
|
|
that case. In fact, it appears that Adobe reader and other PDF
|
|
|
|
|
viewing software silently ignores objects of this type, so this is
|
|
|
|
|
probably not a big deal.
|
|
|
|
|
|
|
|
|
|
* Pl_PNGFilter is only partially implemented. If we ever decoded
|
|
|
|
|
images, we'd have to finish implementing it along with the other
|
|
|
|
|
filter decode parameters and types. For just handling xref
|
|
|
|
|
streams, there's really no need as it wouldn't make sense to use
|
|
|
|
|
any kind of predictor other than 12 (PNG UP filter).
|
|
|
|
|
|
|
|
|
|
* If we ever want to have check mode check the integrity of the free
|
|
|
|
|
list, this can be done by looking at the code from prior to the
|
|
|
|
|
object stream support of 4/5/2008. It's in an if (0) block and
|
|
|
|
|
there's a comment about it. There's also something about it in
|
|
|
|
|
qpdf.test -- search for "free table". On the other hand, the value
|
|
|
|
|
of doing this seems very low since no viewer seems to care, so it's
|
|
|
|
|
probably not worth it.
|
|
|
|
|
|
|
|
|
|
* QPDFObjectHandle::getPageImages() doesn't notice images in
|
|
|
|
|
inherited resource dictionaries. See comments in that function.
|
|
|
|
|
|
2009-02-22 22:24:49 +00:00
|
|
|
|
* Based on an idea suggested by user "Atom Smasher", consider
|
|
|
|
|
providing some mechanism to recover earlier versions of a file
|
|
|
|
|
embedded prior to appended sections.
|
|
|
|
|
|
2011-04-30 18:20:35 +00:00
|
|
|
|
* From a suggestion in bug 3152169, consisder having an option to
|
|
|
|
|
re-encode inline images with an ASCII encoding.
|