mirror of
https://github.com/qpdf/qpdf.git
synced 2025-01-31 10:58:25 +00:00
Document casting policy
This commit is contained in:
parent
1ec1b12864
commit
e8ddac8950
@ -1,3 +1,8 @@
|
||||
2013-03-25 Jay Berkenbilt <ejb@ql.org>
|
||||
|
||||
* manual/qpdf-manual.xml: Document the casting policy that is
|
||||
followed in qpdf's implementation.
|
||||
|
||||
2013-03-11 Jay Berkenbilt <ejb@ql.org>
|
||||
|
||||
* When creating Windows binary distributions, make sure to only
|
||||
|
@ -31,7 +31,9 @@ Release Reminders
|
||||
* Check all open issues in the sourceforge trackers and on github.
|
||||
|
||||
* If any interfaces were added or changed, check C API to see whether
|
||||
changes are appropriate there as well.
|
||||
changes are appropriate there as well. If necessary, review the
|
||||
casting policy in the manual, and ensure that integer types are
|
||||
properly handled.
|
||||
|
||||
* Increment shared library version information as needed
|
||||
(libqpdf/build.mk)
|
||||
|
107
TODO
107
TODO
@ -1,3 +1,9 @@
|
||||
4.1.0
|
||||
=====
|
||||
|
||||
* New public interfaces have been added.
|
||||
|
||||
|
||||
4.2.0
|
||||
=====
|
||||
|
||||
@ -38,107 +44,6 @@
|
||||
- See ../misc/broken-files
|
||||
|
||||
|
||||
4.1.0
|
||||
=====
|
||||
|
||||
* Add to documentation, and mention this documentation in
|
||||
README.maintainer:
|
||||
|
||||
Casting policy.
|
||||
|
||||
The C++ code in qpdf is free of old-style casts except where
|
||||
unavoidable (e.g. where the old-style cast is in a macro provided
|
||||
by a third-party header file). When there is a need for a cast, it
|
||||
is handled, in order of preference by rewriting the code to avoid
|
||||
the need for a cast, calling const_cast, calling static_cast,
|
||||
calling reinterpret_cast, or calling some combination of the above.
|
||||
The casting policy explicitly prohibits casting between sizes for
|
||||
no purpose other than to quiet a compiler warning when there is no
|
||||
reasonable chance of a problem resulting. The reason for this
|
||||
exclusion is that it takes away enabling additional compiler
|
||||
warnings as a tool for making future improvements to this aspect of
|
||||
the code and also damages the readability of the code. As a last
|
||||
resort, a compiler-specific pragma may be used to suppress a
|
||||
warning that we don't want to fix. Examples may include
|
||||
suppressing warnings about the use of old-style casts in code that
|
||||
is shared between C and C++ code.
|
||||
|
||||
There are a few significant areas where casting is common in the qpdf
|
||||
sources or where casting would be required to quiet higher levels
|
||||
of compiler warnings but is omitted at present:
|
||||
|
||||
* signed vs. unsigned char. For historical reasons, there are a
|
||||
lot of places in qpdf's internals that deal with unsigned char,
|
||||
which means that a lot of casting is required to interoperate
|
||||
with standard library calls and std::string. In retrospect,
|
||||
qpdf should have probably used signed char everywhere and just
|
||||
cast to unsigned char when needed. There are reinterpret_cast
|
||||
calls to go between char* and unsigned char*, and there are
|
||||
static_cast calls to go between char and unsigned char. These
|
||||
should always be safe.
|
||||
|
||||
* non-const unsigned char* used in Pipeline interface. The
|
||||
pipeline interface has a write() call that uses unsigned char*
|
||||
without a const qualifier. The main reason for this is to
|
||||
support pipelines that make calls to third-party libraries, such
|
||||
as zlib, that don't include const in their interfaces.
|
||||
Unfortunately, there are many places in the code where it is
|
||||
desirable to have const char* with pipelines. None of the
|
||||
pipeline implementations in qpdf currently modify the data
|
||||
passed to write, and doing so would be counter to the intent of
|
||||
Pipeline. There are places in the code where const_cast is used
|
||||
to remove the const-ness of pointers going into Pipelines. This
|
||||
could be potentially unsafe, but there is adequate testing to
|
||||
assert that it is safe in qpdf's code.
|
||||
|
||||
* size_t vs. qpdf_offset_t. This is pretty much unavoidable since
|
||||
offsets are signed types and sizes are unsigned types. Whenever
|
||||
it is necessary to seek by an amount given by a size_t, it
|
||||
becomes necessary to mix and match between size_t and
|
||||
qpdf_offset_t. Additionally, qpdf sometimes treats memory
|
||||
buffers like files, and those seek interfaces have to be
|
||||
consistent with file-based input sources. Neither gcc nor MSVC
|
||||
give warnings for this case by default, but both have warning
|
||||
flags that can enable this. (MSVC: /W14267 or /W3 (which also
|
||||
enables some additional warnings that we ignore); gcc:
|
||||
-Wconversion -Wsign-conversion). This could matter for files
|
||||
whose sizes are larger than 2^63 bytes, but it is reasonable to
|
||||
expect that a world where such files are common would also have
|
||||
larger size_t and qpdf_offset_t types in it. I am not aware of
|
||||
any cases where 32-bit systems that have size_t smaller than
|
||||
qpdf_offset_t could run into problems, though I can't
|
||||
conclusively rule out the possibility. In the event that
|
||||
someone should produce a file that qpdf can't handle because of
|
||||
what is suspected to be issues involving the handling of size_t
|
||||
vs. qpdf_offset_t (such files may behave properly on 64-bit
|
||||
systems but not on 32-bit systems and may have very large
|
||||
embedded files or streams, for example), the above mentioned
|
||||
warning flags could be enabled and all those implicit
|
||||
conversions could be carefully scrutinized. (I have already
|
||||
gone through that exercise once in adding support for files >
|
||||
4GB in size.) I continue to be commited to supporting large
|
||||
files on 32-bit systems, but I would not go to any lengths to
|
||||
support corner cases involving large embedded files or large
|
||||
streams that work on 64-bit systems but not on 32-bit systems
|
||||
because of size_t being too small. It is reasonable to assume
|
||||
that anyone working with such files would be using a 64-bit
|
||||
system anyway.
|
||||
|
||||
* size_t vs. int. There are some cases where size_t and int or
|
||||
size_t and unsigned int are used interchangeably. These cases
|
||||
occur when working with very small amounts of memory, such as
|
||||
with the bit readers (where we're working with just a few bytes
|
||||
at a time), some cases of strlen, and a few other cases. I have
|
||||
scrutinized all of these cases and determined them to be safe,
|
||||
but there is no mechanism in the code to ensure that new unsafe
|
||||
conversions between int and size_t aren't introduced short of
|
||||
good testing and strong awareness of the issues. Again, if any
|
||||
such bugs are suspected in the future, enable the additional
|
||||
warning flags and scrutinizing the warnings would be in order.
|
||||
|
||||
* New public interfaces have been added.
|
||||
|
||||
|
||||
General
|
||||
=======
|
||||
|
||||
|
@ -1623,6 +1623,166 @@ outfile.pdf</option>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1 id="ref.casting">
|
||||
<title>Casting Policy</title>
|
||||
<para>
|
||||
This section describes the casting policy followed by qpdf's
|
||||
implementation. This is no concern to qpdf's end users and
|
||||
largely of no concern to people writing code that uses qpdf, but
|
||||
it could be of interest to people who are porting qpdf to a new
|
||||
platform or who are making modifications to the code.
|
||||
</para>
|
||||
<para>
|
||||
The C++ code in qpdf is free of old-style casts except where
|
||||
unavoidable (e.g. where the old-style cast is in a macro provided
|
||||
by a third-party header file). When there is a need for a cast,
|
||||
it is handled, in order of preference, by rewriting the code to
|
||||
avoid the need for a cast, calling
|
||||
<function>const_cast</function>, calling
|
||||
<function>static_cast</function>, calling
|
||||
<function>reinterpret_cast</function>, or calling some combination
|
||||
of the above. As a last resort, a compiler-specific
|
||||
<literal>#pragma</literal> may be used to suppress a warning that
|
||||
we don't want to fix. Examples may include suppressing warnings
|
||||
about the use of old-style casts in code that is shared between C
|
||||
and C++ code.
|
||||
</para>
|
||||
<para>
|
||||
The casting policy explicitly prohibits casting between integer
|
||||
sizes for no purpose other than to quiet a compiler warning when
|
||||
there is no reasonable chance of a problem resulting. The reason
|
||||
for this exclusion is that the practice of adding these additional
|
||||
casts precludes future use of additional compiler warnings as a
|
||||
tool for making future improvements to this aspect of the code,
|
||||
and it also damages the readability of the code.
|
||||
</para>
|
||||
<para>
|
||||
There are a few significant areas where casting is common in the
|
||||
qpdf sources or where casting would be required to quiet higher
|
||||
levels of compiler warnings but is omitted at present:
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
<type>char</type> vs. <type>unsigned char</type>. For
|
||||
historical reasons, there are a lot of places in qpdf's
|
||||
internals that deal with <type>unsigned char</type>, which
|
||||
means that a lot of casting is required to interoperate with
|
||||
standard library calls and <type>std::string</type>. In
|
||||
retrospect, qpdf should have probably used regular (signed)
|
||||
<type>char</type> and <type>char*</type> everywhere and just
|
||||
cast to <type>unsigned char</type> when needed, but it's too
|
||||
late to make that change now. There are
|
||||
<function>reinterpret_cast</function> calls to go between
|
||||
<type>char*</type> and <type>unsigned char*</type>, and there
|
||||
are <function>static_cast</function> calls to go between
|
||||
<type>char</type> and <type>unsigned char</type>. These should
|
||||
always be safe.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Non-const <type>unsigned char*</type> used in the
|
||||
<type>Pipeline</type> interface. The pipeline interface has a
|
||||
<function>write</function> call that uses <type>unsigned
|
||||
char*</type> without a <type>const</type> qualifier. The main
|
||||
reason for this is to support pipelines that make calls to
|
||||
third-party libraries, such as zlib, that don't include
|
||||
<type>const</type> in their interfaces. Unfortunately, there
|
||||
are many places in the code where it is desirable to have
|
||||
<type>const char*</type> with pipelines. None of the pipeline
|
||||
implementations in qpdf currently modify the data passed to
|
||||
write, and doing so would be counter to the intent of
|
||||
<type>Pipeline</type>, but there is nothing in the code to
|
||||
prevent this from being done. There are places in the code
|
||||
where <function>const_cast</function> is used to remove the
|
||||
const-ness of pointers going into <type>Pipeline</type>s. This
|
||||
could theoretically be unsafe, but there is adequate testing to
|
||||
assert that it is safe and will remain safe in qpdf's code.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<type>size_t</type> vs. <type>qpdf_offset_t</type>. This is
|
||||
pretty much unavoidable since sizes are unsigned types and
|
||||
offsets are signed types. Whenever it is necessary to seek by
|
||||
an amount given by a <type>size_t</type>, it becomes necessary
|
||||
to mix and match between <type>size_t</type> and
|
||||
<type>qpdf_offset_t</type>. Additionally, qpdf sometimes
|
||||
treats memory buffers like files (as with
|
||||
<type>BufferInputSource</type>, and those seek interfaces have
|
||||
to be consistent with file-based input sources. Neither gcc
|
||||
nor MSVC give warnings for this case by default, but both have
|
||||
warning flags that can enable this. (MSVC:
|
||||
<option>/W14267</option> or <option>/W3</option>, which also
|
||||
enables some additional warnings that we ignore; gcc:
|
||||
<option>-Wconversion -Wsign-conversion</option>). This could
|
||||
matter for files whose sizes are larger than
|
||||
2<superscript>63</superscript> bytes, but it is reasonable to
|
||||
expect that a world where such files are common would also have
|
||||
larger <type>size_t</type> and <type>qpdf_offset_t</type> types
|
||||
in it. On most 64-bit systems at the time of this writing (the
|
||||
release of version 4.1.0 of qpdf), both <type>size_t</type> and
|
||||
<type>qpdf_offset_t</type> are 64-bit integer types, while on
|
||||
many current 32-bit systems, <type>size_t</type> is a 32-bit
|
||||
type while <type>qpdf_offset_t</type> is a 64-bit type. I am
|
||||
not aware of any cases where 32-bit systems that have
|
||||
<type>size_t</type> smaller than <type>qpdf_offset_t</type>
|
||||
could run into problems. Although I can't conclusively rule
|
||||
out the possibility of such problems existing, I suspect any
|
||||
cases would be pretty contrived. In the event that someone
|
||||
should produce a file that qpdf can't handle because of what is
|
||||
suspected to be issues involving the handling of
|
||||
<type>size_t</type> vs. <type>qpdf_offset_t</type> (such files
|
||||
may behave properly on 64-bit systems but not on 32-bit systems
|
||||
because they have very large embedded files or streams, for
|
||||
example), the above mentioned warning flags could be enabled
|
||||
and all those implicit conversions could be carefully
|
||||
scrutinized. (I have already gone through that exercise once
|
||||
in adding support for files larger than 4 GB in size.) I
|
||||
continue to be commited to supporting large files on 32-bit
|
||||
systems, but I would not go to any lengths to support corner
|
||||
cases involving large embedded files or large streams that work
|
||||
on 64-bit systems but not on 32-bit systems because of
|
||||
<type>size_t</type> being too small. It is reasonable to
|
||||
assume that anyone working with such files would be using a
|
||||
64-bit system anyway since many 32-bit applications would have
|
||||
similar difficulties.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<type>size_t</type> vs. <type>int</type> or <type>long</type>.
|
||||
There are some cases where <type>size_t</type> and
|
||||
<type>int</type> or <type>long</type> or <type>size_t</type>
|
||||
and <type>unsigned int</type> or <type>unsigned long</type> are
|
||||
used interchangeably. These cases occur when working with very
|
||||
small amounts of memory, such as with the bit readers (where
|
||||
we're working with just a few bytes at a time), some cases of
|
||||
<function>strlen</function>, and a few other cases. I have
|
||||
scrutinized all of these cases and determined them to be safe,
|
||||
but there is no mechanism in the code to ensure that new unsafe
|
||||
conversions between <type>int</type> and <type>size_t</type>
|
||||
aren't introduced short of good testing and strong awareness of
|
||||
the issues. Again, if any such bugs are suspected in the
|
||||
future, enabling the additional warning flags and scrutinizing
|
||||
the warnings would be in order.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
To be clear, I believe qpdf to be well-behaved with respect to
|
||||
sizes and offsets, and qpdf's test suite includes actual
|
||||
generation and full processing of files larger than 4 GB in
|
||||
size. The issues raised here are largely academic and should not
|
||||
in any way be interpreted to mean that qpdf has practical problems
|
||||
involving sloppiness with integer types. I also believe that
|
||||
appropriate measures have been taken in the code to avoid problems
|
||||
with signed vs. unsigned integers from resulting in memory
|
||||
overwrites or other issues with potential security implications,
|
||||
though there are never any absolute guarantees.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1 id="ref.encryption">
|
||||
<title>Encryption</title>
|
||||
<para>
|
||||
|
Loading…
x
Reference in New Issue
Block a user