TODO: rescope some items

2022-08-06 16:35:40 -04:00 · 2022-08-06 16:35:40 -04:00 · 48dfae6443
parent 433be3718a
commit 48dfae6443
1 changed files with 170 additions and 161 deletions
--- a/113
+++ b/113
@@ -21,31 +21,15 @@ Pending changes:
  appimage build specifically is setting the runpath, which is
  actually desirable in this case. Make sure to understand and
  document this. Maybe add a check for it in the build.
-* Decide what to do about #664 (get*Box)
-* Add an option --ignore-encryption to ignore encryption information
-  and treat encrypted files as if they weren't encrypted. This should
-  make it possible to solve #598 (--show-encryption without a
-  password). We'll need to make sure we don't try to filter any
-  streams in this mode. Ideally we should be able to combine this with
-  --json so we can look at the raw encrypted strings and streams if we
-  want to, though be sure to document that the resulting JSON won't be
-  convertible back to a valid PDF. Since providing the password may
-  reveal additional details, --show-encryption could potentially retry
-  with this option if the first time doesn't work. Then, with the file
-  open, we can read the encryption dictionary normally.
-* In libtests, separate executables that need the object library
-  from those that strictly use public API. Move as many of the test
-  drivers from the qpdf directory into the latter category as long
-  as doing so isn't too troublesome from a coverage standpoint.
-* Consider adding fuzzer code for JSON
-* Consider generating a non-flat pages tree before creating output to
-  better handle files with lots of pages. If there are more than 256
-  pages, add a second layer with the second layer nodes having no more
-  than 256 nodes and being as evenly sized as possible. Don't worry
-  about the case of more than 65,536 pages. If the top node has more
-  than 256 children, we'll live with it.

-Parent pointer idea:
+Soon: Break ground on "Document-level work"
+
+Fix Multiple Direct Object Owner Issue
+======================================
+
+These are some ideas I've had, but I'm parking them until I fully
+understand m-holger's proposal to split QPDFObject into QPDFObject and
+QPDFValue.

 * Add std::weak_ptr<QPDFObject> parent to QPDFObject. When adding a
  direct object to an array or dictionary, set its parent. When
@@ -65,8 +49,6 @@ Note that arrays and dictionaries still need to contain
 QPDFObjectHandle because of indirect objects. This only pertains to
 direct objects, which are always "resolved" in QPDFObjectHandle.

-Soon: Break ground on "Document-level work"
-
 Possible future JSON enhancements
 =================================

@@ -376,24 +358,51 @@ directory or that are otherwise not publicly accessible. This includes
 things sent to me by email that are specifically not public. Even so,
 I find it useful to make reference to them in this list.

- * Look at https://bestpractices.coreinfrastructure.org/en
+* Add an option --ignore-encryption to ignore encryption information
+  and treat encrypted files as if they weren't encrypted. This should
+  make it possible to solve #598 (--show-encryption without a
+  password). We'll need to make sure we don't try to filter any
+  streams in this mode. Ideally we should be able to combine this with
+  --json so we can look at the raw encrypted strings and streams if we
+  want to, though be sure to document that the resulting JSON won't be
+  convertible back to a valid PDF. Since providing the password may
+  reveal additional details, --show-encryption could potentially retry
+  with this option if the first time doesn't work. Then, with the file
+  open, we can read the encryption dictionary normally.

- * Rework tests so that nothing is written into the source directory.
+* In libtests, separate executables that need the object library
+  from those that strictly use public API. Move as many of the test
+  drivers from the qpdf directory into the latter category as long
+  as doing so isn't too troublesome from a coverage standpoint.
+
+* Consider generating a non-flat pages tree before creating output to
+  better handle files with lots of pages. If there are more than 256
+  pages, add a second layer with the second layer nodes having no more
+  than 256 nodes and being as evenly sized as possible. Don't worry
+  about the case of more than 65,536 pages. If the top node has more
+  than 256 children, we'll live with it. This is only safe if all
+  intermediate page nodes have only /Kids, /Parent, /Type, and /Count.
+
+* Look at https://bestpractices.coreinfrastructure.org/en
+
+* Consider adding fuzzer code for JSON
+
+* Rework tests so that nothing is written into the source directory.
  Ideally then the entire build could be done with a read-only
  source tree.

- * Large file tests fail with linux32 before and after cmake. This was
+* Large file tests fail with linux32 before and after cmake. This was
  first noticed after 10.6.3. I don't think it's worth fixing.

- * Consider updating the fuzzer with code that exercises
+* Consider updating the fuzzer with code that exercises
  copyAnnotations, file attachments, and name and number trees. Check
  fuzzer coverage.

- * Add code for creation of a file attachment annotation. It should
+* Add code for creation of a file attachment annotation. It should
  also be possible to create a widget annotation and a form field.
  Update the pdf-attach-file.cc example with new APIs when ready.

- * Flattening of form XObjects seems like something that would be
+* Flattening of form XObjects seems like something that would be
  useful in the library. We are seeing more cases of completely valid
  PDF files with form XObjects that cause problems in other software.
  Flattening of form XObjects could be a useful way to work around
@@ -409,18 +418,18 @@ I find it useful to make reference to them in this list.
  dictionary may need to be changed -- create test cases with lots of
  duplicated/overlapping keys.

- * Part of closed_file_input_source.cc is disabled on Windows because
+* Part of closed_file_input_source.cc is disabled on Windows because
  of odd failures. It might be worth investigating so we can fully
  exercise this in the test suite. That said, ClosedFileInputSource
  is exercised elsewhere in qpdf's test suite, so this is not that
  pressing.

- * If possible, consider adding CCITT3, CCITT4, or any other easy
+* If possible, consider adding CCITT3, CCITT4, or any other easy
  filters. For some reference code that we probably can't use but may
  be handy anyway, see
  http://partners.adobe.com/public/developer/ps/sdk/index_archive.html

- * If possible, support the following types of broken files:
+* If possible, support the following types of broken files:

   - Files that have no whitespace token after "endobj" such that
     endobj collides with the start of the next object
@@ -431,13 +440,13 @@ I find it useful to make reference to them in this list.
     snapshot of the google doc and linked PDF files from issue #476.
     Please see the issue for details.

- * Additional form features
+* Additional form features
  * set value from CLI? Specify title, and provide a way to
    disambiguate, probably by giving objgen of field

- * Pl_TIFFPredictor is pretty slow.
+* Pl_TIFFPredictor is pretty slow.

- * Support for handling file names with Unicode characters in Windows
+* Support for handling file names with Unicode characters in Windows
  is incomplete. qpdf seems to support them okay from a functionality
  standpoint, and the right thing happens if you pass in UTF-8
  encoded filenames to QPDF library routines in Windows (they are
@@ -445,20 +454,20 @@ I find it useful to make reference to them in this list.
  UTF-8 on output, which doesn't produce nice error messages or
  output on Windows in some cases.

- * If we ever wanted to do anything more with character encoding, see
+* If we ever wanted to do anything more with character encoding, see
  ../misc/character-encoding/, which includes a machine-readable dump
  of table D.2 in the ISO-32000 PDF spec. This shows the mapping
  between Unicode, StandardEncoding, WinAnsiEncoding,
  MacRomanEncoding, and PDFDocEncoding.

- * Some test cases on bad files fail because qpdf is unable to find
+* Some test cases on bad files fail because qpdf is unable to find
  the root dictionary when it fails to read the trailer. Recovery
  could find the root dictionary and even the info dictionary in
  other ways. In particular, issue-202.pdf can be opened by evince,
  and there's no real reason that qpdf couldn't be made to be able to
  recover that file as well.

- * Audit every place where qpdf allocates memory to see whether there
+* Audit every place where qpdf allocates memory to see whether there
  are cases where malicious inputs could cause qpdf to attempt to
  grab very large amounts of memory. Certainly there are cases like
  this, such as if a very highly compressed, very large image stream
@@ -466,7 +475,7 @@ I find it useful to make reference to them in this list.
  filtering doesn't ever try to do this. QPDFWriter should be checked
  carefully too. See also bugs/private/from-email-663916/

- * Interactive form modification:
+* Interactive form modification:
  https://github.com/qpdf/qpdf/issues/213 contains a good discussion
  of some ideas for adding methods to modify annotations and form
  fields if we want to make it easier to support modifications to
@@ -476,19 +485,19 @@ I find it useful to make reference to them in this list.
  for "Regarding write functionality", and read that comment and the
  responses to it.

- * Look at ~/Q/pdf-collection/forms-from-appian/
+* Look at ~/Q/pdf-collection/forms-from-appian/

- * When decrypting files with /R=6, hash_V5 is called more than once
+* When decrypting files with /R=6, hash_V5 is called more than once
  with the same inputs.  Caching the results or refactoring to reduce
  the number of identical calls could improve performance for
  workloads that involve processing large numbers of small files.

- * Consider adding a method to balance the pages tree.  It would call
+* Consider adding a method to balance the pages tree.  It would call
  pushInheritedAttributesToPage, construct a pages tree from scratch,
  and replace the /Pages key of the root dictionary with the new
  tree.

- * Study what's required to support savable forms that can be saved by
+* Study what's required to support savable forms that can be saved by
  Adobe Reader. Does this require actually signing the document with
  an Adobe private key? Search for "Digital signatures" in the PDF
  spec, and look at ~/Q/pdf-collection/form-with-full-save.pdf, which
@@ -497,7 +506,7 @@ I find it useful to make reference to them in this list.
  implemented, update the docs on crypto providers, which mention
  that this may happen in the future.

- * Qpdf does not honor /EFF when adding new file attachments. When it
+* Qpdf does not honor /EFF when adding new file attachments. When it
  encrypts, it never generates streams with explicit crypt filters.
  Prior to 10.2, there was an incorrect attempt to treat /EFF as a
  default value for decrypting file attachment streams, but it is not
@@ -505,7 +514,7 @@ I find it useful to make reference to them in this list.
  writers to obey this when adding new attachments. Qpdf is not a
  conforming writer in that respect.

- * The whole xref handling code in the QPDF object allows the same
+* The whole xref handling code in the QPDF object allows the same
  object with more than one generation to coexist, but a lot of logic
  assumes this isn't the case.  Anything that creates mappings only
  with the object number and not the generation is this way,
@@ -517,10 +526,10 @@ I find it useful to make reference to them in this list.
  viewing software silently ignores objects of this type, so this is
  probably not a big deal.

- * From a suggestion in bug 3152169, consider having an option to
+* From a suggestion in bug 3152169, consider having an option to
  re-encode inline images with an ASCII encoding.

- * From github issue 2, provide more in-depth output for examining
+* From github issue 2, provide more in-depth output for examining
  hint stream contents. Consider adding an option to provide a
  human-readable dump of linearization hint tables. This should
  include improving the 'overflow reading bit stream' message as
@@ -532,11 +541,11 @@ I find it useful to make reference to them in this list.
  logic error and what could happen because of malformed user input.
  See also ../misc/linearization-errors.

- * If I ever decide to make appearance stream-generation aware of
+* If I ever decide to make appearance stream-generation aware of
  fonts or font metrics, see email from Tobias with Message-ID
  <5C3C9C6C.8000102@thax.hardliners.org> dated 2019-01-14.

- * Look at places in the code where object traversal is being done and,
+* Look at places in the code where object traversal is being done and,
  where possible, try to avoid it entirely or at least avoid ever
  traversing the same objects multiple times.
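The item about generating a non-flat pages tree mostly comes down to one piece of arithmetic: how many pages each second-layer node should hold. What follows is a minimal sketch, assuming the fan-out of 256 mentioned in the note. `second_layer_sizes` is a hypothetical helper, not an existing qpdf function, and it only computes node sizes. Creating the intermediate /Pages dictionaries and setting /Kids, /Parent, /Type, and /Count on them is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of qpdf): given the number of leaf pages,
// return how many pages each second-layer /Pages node should hold when no
// node may have more than max_kids children. An empty result means the
// tree can stay flat.
std::vector<std::size_t>
second_layer_sizes(std::size_t n_pages, std::size_t max_kids = 256)
{
    assert(max_kids > 0);
    std::vector<std::size_t> sizes;
    if (n_pages <= max_kids) {
        return sizes; // the root can hold every page directly
    }
    // Use the fewest intermediate nodes that keep each one at or below
    // max_kids. If this exceeds max_kids (more than max_kids^2 pages), the
    // root simply ends up with more than max_kids children.
    std::size_t groups = (n_pages + max_kids - 1) / max_kids;
    std::size_t base = n_pages / groups;
    std::size_t extra = n_pages % groups;
    for (std::size_t i = 0; i < groups; ++i) {
        // The first `extra` nodes take one extra page, so node sizes
        // differ by at most one.
        sizes.push_back(base + (i < extra ? 1 : 0));
    }
    return sizes;
}
```

For example, 257 pages would be split into two nodes of 129 and 128 pages. At 65,536 pages there are exactly 256 nodes of 256 pages each. Beyond that, the top node gets more than 256 children, which matches the note's "we'll live with it".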