Fixes to ChangeLog and manual for 10.0.0 changes

2025-01-31 10:58:25 +00:00 · 2020-04-05 21:46:21 -04:00 · 2020-04-05 21:46:21 -04:00 · 3d0de5b924
commit 3d0de5b924
parent 98174373b9
3 changed files with 293 additions and 18 deletions
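
Note on the ChangeLog entry about unique ID generation added in this commit: the entry says the library now uses std::atomic instead of a random number generator. The following is a minimal sketch of that general technique, not qpdf's actual code. The names `next_unique_id` and `first_two_ids_differ` are hypothetical and exist only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not part of qpdf's API): returns a value that is
// unique within the process. fetch_add is a single atomic
// read-modify-write, so two concurrent callers can never receive the
// same value, and no random number generator is involved.
unsigned long long next_unique_id()
{
    static std::atomic<unsigned long long> counter{0};
    return counter.fetch_add(1) + 1;
}

// Small self-check: two consecutive calls must yield different IDs.
void first_two_ids_differ()
{
    unsigned long long a = next_unique_id();
    unsigned long long b = next_unique_id();
    assert(a != b);
}
```

A counter of this kind can only collide if it wraps around, which is the sense in which the entry's "already extremely low chance of a collision" goes away.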
--- a/16
+++ b/16
@@ -8,6 +8,12 @@
 	recovery when objects are copied from other files and when
 	"immediate copy from" is enabled.

+	* When copying foreign streams with immediateCopyFrom set, the
+	same type of recovery from streams with filtering errors is
+	performed as when dealing with streams in the original input. This
+	could happen, for example, if you are using the --pages option to
+	take pages from another file and that file has errors in it.
+
 	* Add a new version of QPDFObjectHandle::pipeStreamData whose
 	return value indicates overall success or failure rather than
 	whether nor not filtering was attempted. It should have always
@@ -36,6 +42,12 @@
 	--preserve-unreferenced-resources is now a synonym for
 	--remove-unreferenced-resources=no.

+	* Use std::atomic for unique ID generation internally within the
+	library. This eliminates the already extremely low chance of a
+	collision, improves thread safety, and removes a dependency on a
+	random number generator. Thanks to Dean Scarff for the
+	contribution.
+
 2020-04-03  Jay Berkenbilt  <ejb@ql.org>

 	* Allow qpdf to be built on systems without wchar_t. All "normal"
@@ -50,6 +62,10 @@
 	maximally fill the destination rectangle. Prior to this change,
 	placeFormXObject might shrink it but would never expand it.

+	* When calling the C API, accept any non-zero value as TRUE rather
+	than just 1. This appears to resolve issues on Windows when
+	calling some versions of the DLL directly from other languages.
+
 2020-04-02  Jay Berkenbilt  <ejb@ql.org>

 	* Add method QPDFObjectHandle::unsafeShallowCopy for copying only
--- a/manual/qpdf-manual.xml
+++ b/manual/qpdf-manual.xml
@@ -1944,21 +1944,51 @@ outfile.pdf</option>
       </para>
      </listitem>
     </varlistentry>
+     <varlistentry>
+      <term><option>--remove-unreferenced-resources=<replaceable>option</replaceable></option></term>
+      <listitem>
+       <para>
+        The <replaceable>option</replaceable> may be
+        <literal>auto</literal>, <literal>yes</literal>, or
+        <literal>no</literal>. The default is <literal>auto</literal>.
+       </para>
+       <para>
+        Starting with qpdf 8.1, when splitting pages, qpdf is able to
+        attempt to remove images and fonts that are not used by a page
+        even if they are referenced in the page's resources
+        dictionary. When shared resources are in use, this behavior
+        can greatly reduce the file sizes of split pages, but the
+        analysis is very slow. In versions from 8.1 through 9.1.1,
+        qpdf did this analysis by default. Starting in qpdf 10.0.0, if
+        <literal>auto</literal> is used, qpdf does a quick analysis of
+        the file to determine whether the file is likely to have
+        unreferenced objects on pages, a pattern that frequently
+        occurs when resource dictionaries are shared across multiple
+        pages and rarely occurs otherwise. If it discovers this
+        pattern, then it will attempt to remove unreferenced
+        resources. Usually this means you get the slower splitting
+        speed only when it's actually going to create smaller files.
+        You can suppress removal of unreferenced resources altogether
+        by specifying <literal>no</literal> or force it to do the full
+        algorithm by specifying <literal>yes</literal>.
+       </para>
+       <para>
+        Other than cases in which you don't care about file size and
+        care a lot about runtime, there are few reasons to use this
+        option, especially now that <literal>auto</literal> mode is
+        supported. One reason to use this is if you suspect that qpdf
+        is removing resources it shouldn't be removing. If you
+        encounter that case, please report it as a bug at <ulink
+        url="https://github.com/qpdf/qpdf/issues/">https://github.com/qpdf/qpdf/issues/</ulink>.
+       </para>
+      </listitem>
+     </varlistentry>
     <varlistentry>
      <term><option>--preserve-unreferenced-resources</option></term>
      <listitem>
       <para>
-        Starting with qpdf 8.1, when splitting pages, qpdf ordinarily
-        attempts to remove images and fonts that are not used by a
-        page even if they are referenced in the page's resources
-        dictionary. This option suppresses that behavior. There are
-        few reasons to use this option. One reason to use this is if
-        you suspect that qpdf is removing resources it shouldn't be
-        removing. If you encounter that case, please report it as a
-        bug. Another reason is that the new behavior can be much
-        slower for files that include a very large number of images or
-        other XObjects on a page. In that case, using this option will
-        return qpdf to the old behavior and speed.
+        This is a synonym for
+        <option>--remove-unreferenced-resources=no</option>.
       </para>
       <para>
        See also <option>--preserve-unreferenced</option>, which does
@@ -4700,6 +4730,239 @@ print "\n";
   <filename>ChangeLog</filename> in the source distribution.
  </para>
  <variablelist>
+<!--
+   <varlistentry>
+    <term>x.y.z: Month dd, YYYY</term>
+    <listitem>
+     <itemizedlist>
+      <listitem>
+       <para>
+        Category
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          Item
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Item
+         </para>
+        </listitem>
+       </itemizedlist>
+      </listitem>
+      <listitem>
+       <para>
+        Category
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          Item
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Item
+         </para>
+        </listitem>
+       </itemizedlist>
+      </listitem>
+     </itemizedlist>
+    </listitem>
+   </varlistentry>
+-->
+   <varlistentry>
+    <term>10.0.0: April 6, 2020</term>
+    <listitem>
+     <itemizedlist>
+      <listitem>
+       <para>
+        Performance Enhancements
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          The qpdf library and executable should run much faster in
+          this version than in the last several releases. Several
+          internal library optimizations have been made, and there has
+          been improved behavior on page splitting as well. This
+          version of qpdf should outperform any of the 8.x or 9.x
+          versions.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </listitem>
+      <listitem>
+       <para>
+        CLI Enhancements
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          Add <literal>objectinfo</literal> key to the JSON output.
+          This will be a place to put computed metadata or other
+          information about PDF objects that are not immediately
+          evident in other ways or that seem useful for some other
+          reason. In this version, information is provided about each
+          object indicating whether it is a stream and, if so, what
+          its length and filters are. Without this, it was not
+          possible to tell conclusively from the JSON output alone
+          whether or not an object was a stream. Run <command>qpdf
+          --json-help</command> for details.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Add new option
+          <option>--remove-unreferenced-resources</option> which takes
+          <literal>auto</literal>, <literal>yes</literal>, or
+          <literal>no</literal> as arguments. The new
+          <literal>auto</literal> mode, which is the default, performs
+          a fast heuristic over a PDF file when splitting pages to
+          determine whether the expensive process of finding and
+          removing unreferenced resources is likely to be of benefit.
+          For most files, this new default will result in a
+          significant performance improvement for splitting pages. See
+          <xref linkend="ref.advanced-transformation"/> for a more
+          detailed discussion.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          The <option>--preserve-unreferenced-resources</option> option is
+          now just a synonym for
+          <option>--remove-unreferenced-resources=no</option>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          If the <literal>QPDF_EXECUTABLE</literal> environment
+          variable is set when invoking <command>qpdf
+          --bash-completion</command> or <command>qpdf
+          --zsh-completion</command>, the completion command that it
+          outputs will refer to qpdf using the value of that variable
+          rather than what <command>qpdf</command> determines its
+          executable path to be. This can be useful when wrapping
+          <command>qpdf</command> with a script, working with a
+          version in the source tree, using an AppImage, or other
+          situations where there is some indirection.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </listitem>
+      <listitem>
+       <para>
+        Library Enhancements
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          Add a new version of
+          <function>QPDFObjectHandle::StreamDataProvider::provideStreamData</function>
+          that accepts the <function>suppress_warnings</function> and
+          <function>will_retry</function> options and allows a success
+          code to be returned. This makes it possible to implement a
+          <classname>StreamDataProvider</classname> that calls
+          <function>pipeStreamData</function> on another stream and to
+          pass the response back to the caller, which enables better
+          error handling on those proxied streams.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Update <function>QPDFObjectHandle::pipeStreamData</function>
+          to return an overall success code that goes beyond whether
+          or not filtered data was written successfully. This allows
+          better error handling of cases that were not filtering
+          errors. You have to call the new version explicitly. Methods
+          in previously existing APIs have the same semantics as before.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          The
+          <function>QPDFPageObjectHelper::placeFormXObject</function>
+          method now allows separate control over whether it should be
+          willing to shrink or expand objects to fit them better into
+          the destination rectangle. The previous behavior was that
+          shrinking was allowed but expansion was not. The previous
+          behavior is still the default.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          When calling the C API, any non-zero value passed to a
+          boolean parameter is treated as <literal>TRUE</literal>.
+          Previously only the value <literal>1</literal> was accepted.
+          This makes the C API behave more like most C interfaces and
+          is known to improve compatibility with some Windows
+          environments that dynamically load the DLL and call
+          functions from it.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Add <function>QPDFObjectHandle::unsafeShallowCopy</function>
+          for copying only top-level dictionary keys or array items.
+          This is unsafe because it creates a situation in which
+          changing a lower-level item in one object may also change it
+          in another object, but for cases in which you
+          <emphasis>know</emphasis> you are only inserting or
+          replacing top-level items, it is much faster than
+          <function>QPDFObjectHandle::shallowCopy</function>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Add <function>QPDFObjectHandle::filterAsContents</function>,
+          which filters a stream's data as a content stream. This is
+          useful for parsing the contents for form XObjects in the
+          same way as parsing page content streams.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </listitem>
+      <listitem>
+       <para>
+        Bug Fixes
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          When detecting and removing unreferenced resources during
+          page splitting, traverse into form XObjects and handle their
+          resources dictionaries as well.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          The same error recovery is applied to streams in files other
+          than the primary input file when merging or splitting pages.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </listitem>
+      <listitem>
+       <para>
+        Build Changes
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          Allow qpdf to be built on stripped-down systems whose C/C++
+          libraries lack the <classname>wchar_t</classname> type.
+          Search for <classname>wchar_t</classname> in qpdf's
+          README.md for details. This should be very rare, but it is
+          known to be helpful in some embedded environments.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </listitem>
+     </itemizedlist>
+    </listitem>
+   </varlistentry>
   <varlistentry>
   <term>9.1.1: January 26, 2020</term>
    <listitem>
@@ -4804,8 +5067,6 @@ print "\n";
     </itemizedlist>
    </listitem>
   </varlistentry>
-  </variablelist>
-  <variablelist>
   <varlistentry>
    <term>9.1.0: November 17, 2019</term>
    <listitem>
@@ -4905,8 +5166,6 @@ print "\n";
     </itemizedlist>
    </listitem>
   </varlistentry>
-  </variablelist>
-  <variablelist>
   <varlistentry>
    <term>9.0.2: October 12, 2019</term>
    <listitem>
@@ -5272,7 +5531,7 @@ print "\n";
          in dynamically linked code catching exceptions or
          subclassing, this could be the reason. If you see this,
          please report a bug at <ulink
-          url="https://github.com/qpdf/qpdf/issues/">pikepdf</ulink>.
+          url="https://github.com/qpdf/qpdf/issues/">https://github.com/qpdf/qpdf/issues/</ulink>.
         </para>
        </listitem>
        <listitem>
--- a/qpdf/qpdf.cc
+++ b/qpdf/qpdf.cc
@@ -1483,10 +1483,10 @@ ArgParser::argHelp()
        << "--normalize-content=[yn]  enables or disables normalization of content streams\n"
        << "--object-streams=mode     controls handing of object streams\n"
        << "--preserve-unreferenced   preserve unreferenced objects\n"
-        << "--preserve-unreferenced-resources\n"
-        << "                          synonym for --remove-unreferenced-resources=no\n"
        << "--remove-unreferenced-resources={auto,yes,no}\n"
        << "                          whether to remove unreferenced page resources\n"
+        << "--preserve-unreferenced-resources\n"
+        << "                          synonym for --remove-unreferenced-resources=no\n"
        << "--newline-before-endstream  always put a newline before endstream\n"
        << "--coalesce-contents       force all pages' content to be a single stream\n"
        << "--flatten-annotations=option\n"
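
Note on the C API boolean change described in this commit: the release notes say that any non-zero value is now treated as TRUE, where previously only 1 was accepted. The sketch below contrasts the two interpretations. It is an illustration under stated assumptions, not the library's code: `EXAMPLE_BOOL`, `accepts_only_one`, and `accepts_any_nonzero` are hypothetical names, and the int-typed flag is an assumed stand-in for the C API's boolean type.

```cpp
#include <cassert>

// Hypothetical stand-in for an int-typed boolean passed across a C API.
typedef int EXAMPLE_BOOL;

// Old interpretation as described in the release notes: only the exact
// value 1 counts as true, so a caller passing another non-zero value
// is silently treated as false.
bool accepts_only_one(EXAMPLE_BOOL v)
{
    return v == 1;
}

// New interpretation: any non-zero value counts as true, which matches
// the usual C convention for truth values.
bool accepts_any_nonzero(EXAMPLE_BOOL v)
{
    return v != 0;
}

// Self-check showing where the two interpretations disagree.
void compare_interpretations()
{
    assert(accepts_only_one(1) && accepts_any_nonzero(1));
    assert(!accepts_only_one(0) && !accepts_any_nonzero(0));
    // A caller whose "true" is not 1 (here -1) is only handled by the
    // non-zero test.
    assert(!accepts_only_one(-1) && accepts_any_nonzero(-1));
}
```

The difference matters only for callers whose notion of true is something other than 1, which is plausibly the situation behind the Windows DLL reports mentioned in the ChangeLog.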