mirror of https://github.com/qpdf/qpdf.git synced 2025-01-03 15:17:29 +00:00

Describe content normalization edge cases in manual

Jay Berkenbilt 2018-02-20 21:12:55 -05:00
parent 30380b64e3
commit e429a2e170


@@ -1050,7 +1050,10 @@ outfile.pdf</option>
<term><option>--normalize-content=[yn]</option></term>
<listitem>
<para>
Enables or disables normalization of content streams.
Enables or disables normalization of content streams. Content
normalization is enabled by default in QDF mode. Please see
<xref linkend="ref.qdf"/> for additional discussion of QDF
mode.
</para>
</listitem>
</varlistentry>
@@ -1205,6 +1208,36 @@ outfile.pdf</option>
who wish to study PDF content streams or to debug PDF content.
You should not use this for &ldquo;production&rdquo; PDF files.
</para>
<para>
This paragraph discusses edge cases of content normalization that
are not of concern to most users and are not relevant when content
normalization is not enabled. When normalizing content, if qpdf
runs into any lexical errors, it will print a warning indicating
that content may be damaged. The only situation in which qpdf is
known to cause damage during content normalization is when a
page's contents are split across multiple streams and streams are
split in the middle of a lexical token such as a string, name, or
inline image. There may be some pathological cases in which qpdf
could damage content without noticing this, such as if the partial
tokens at the end of one stream and the beginning of the next
stream are both valid, but usually qpdf will be able to detect
this case. For slightly increased safety, you can specify
<option>--coalesce-contents</option> in addition to
<option>--normalize-content</option> or <option>--qdf</option>.
This will cause qpdf to combine all the content streams into one,
thus recombining any split tokens. However, doing this will prevent
you from being able to see the original layout of the content
streams. If you must inspect the original content streams in an
uncompressed format, you can always run with <option>--qdf
--normalize-content=n</option> for a QDF file without content
normalization, or alternatively
<option>--stream-data=uncompress</option> for a regular non-QDF
mode file with uncompressed streams. These will both uncompress
all the streams but will not attempt to normalize content. Please
note that if you are using content normalization or QDF mode for
the purpose of manually inspecting files, you don't have to care
about these edge cases.
</para>
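The option combinations described in the paragraph above can be summarized in a small sketch. This is not part of the commit or of qpdf itself. The helper name `qpdf_command`, the mode labels, and the file names are hypothetical and chosen only for illustration. The options themselves are the ones named in the paragraph. The helper only builds argument lists and does not run qpdf.

```python
# Hypothetical helper (not part of qpdf): builds argument lists for the
# invocations discussed in the paragraph above. It does not execute qpdf.

def qpdf_command(mode, infile, outfile):
    """Return an argv list for one of the three cases described above."""
    options = {
        # QDF mode (content normalization is on by default in QDF mode),
        # with all content streams of a page combined into one
        "qdf-coalesced": ["--qdf", "--coalesce-contents"],
        # QDF file without content normalization
        "qdf-unnormalized": ["--qdf", "--normalize-content=n"],
        # regular non-QDF file with uncompressed streams
        "uncompressed": ["--stream-data=uncompress"],
    }
    if mode not in options:
        raise ValueError("unknown mode: " + mode)
    return ["qpdf"] + options[mode] + [infile, outfile]


if __name__ == "__main__":
    # File names are placeholders.
    for mode in ("qdf-coalesced", "qdf-unnormalized", "uncompressed"):
        print(" ".join(qpdf_command(mode, "infile.pdf", "outfile.pdf")))
```

The first mode trades visibility of the original stream layout for slightly increased safety. The other two keep the original layout and skip normalization entirely.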
<para>
Object streams, also known as compressed objects, were introduced
into the PDF specification at version 1.5, corresponding to