diff --git a/ChangeLog b/ChangeLog index c00f7a18..36f7f051 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,8 @@ +2013-03-25 Jay Berkenbilt + + * manual/qpdf-manual.xml: Document the casting policy that is + followed in qpdf's implementation. + 2013-03-11 Jay Berkenbilt * When creating Windows binary distributions, make sure to only diff --git a/README.maintainer b/README.maintainer index 4be45f51..a1ac5847 100644 --- a/README.maintainer +++ b/README.maintainer @@ -31,7 +31,9 @@ Release Reminders * Check all open issues in the sourceforge trackers and on github. * If any interfaces were added or changed, check C API to see whether - changes are appropriate there as well. + changes are appropriate there as well. If necessary, review the + casting policy in the manual, and ensure that integer types are + properly handled. * Increment shared library version information as needed (libqpdf/build.mk) diff --git a/TODO b/TODO index 97e7d2c4..ebbd8105 100644 --- a/TODO +++ b/TODO @@ -1,3 +1,9 @@ +4.1.0 +===== + + * New public interfaces have been added. + + 4.2.0 ===== @@ -38,107 +44,6 @@ - See ../misc/broken-files -4.1.0 -===== - - * Add to documentation, and mention this documentation in - README.maintainer: - - Casting policy. - - The C++ code in qpdf is free of old-style casts except where - unavoidable (e.g. where the old-style cast is in a macro provided - by a third-party header file). When there is a need for a cast, it - is handled, in order of preference by rewriting the code to avoid - the need for a cast, calling const_cast, calling static_cast, - calling reinterpret_cast, or calling some combination of the above. - The casting policy explicitly prohibits casting between sizes for - no purpose other than to quiet a compiler warning when there is no - reasonable chance of a problem resulting. The reason for this - exclusion is that it takes away enabling additional compiler - warnings as a tool for making future improvements to this aspect of - the code and also damages the readability of the code. As a last - resort, a compiler-specific pragma may be used to suppress a - warning that we don't want to fix. Examples may include - suppressing warnings about the use of old-style casts in code that - is shared between C and C++ code. - - There are a few significant areas where casting is common in the qpdf - sources or where casting would be required to quiet higher levels - of compiler warnings but is omitted at present: - - * signed vs. unsigned char. For historical reasons, there are a - lot of places in qpdf's internals that deal with unsigned char, - which means that a lot of casting is required to interoperate - with standard library calls and std::string. In retrospect, - qpdf should have probably used signed char everywhere and just - cast to unsigned char when needed. There are reinterpret_cast - calls to go between char* and unsigned char*, and there are - static_cast calls to go between char and unsigned char. These - should always be safe. - - * non-const unsigned char* used in Pipeline interface. The - pipeline interface has a write() call that uses unsigned char* - without a const qualifier. The main reason for this is to - support pipelines that make calls to third-party libraries, such - as zlib, that don't include const in their interfaces. - Unfortunately, there are many places in the code where it is - desirable to have const char* with pipelines. None of the - pipeline implementations in qpdf currently modify the data - passed to write, and doing so would be counter to the intent of - Pipeline. There are places in the code where const_cast is used - to remove the const-ness of pointers going into Pipelines. This - could be potentially unsafe, but there is adequate testing to - assert that it is safe in qpdf's code. - - * size_t vs. qpdf_offset_t. This is pretty much unavoidable since - offsets are signed types and sizes are unsigned types. Whenever - it is necessary to seek by an amount given by a size_t, it - becomes necessary to mix and match between size_t and - qpdf_offset_t. Additionally, qpdf sometimes treats memory - buffers like files, and those seek interfaces have to be - consistent with file-based input sources. Neither gcc nor MSVC - give warnings for this case by default, but both have warning - flags that can enable this. (MSVC: /W14267 or /W3 (which also - enables some additional warnings that we ignore); gcc: - -Wconversion -Wsign-conversion). This could matter for files - whose sizes are larger than 2^63 bytes, but it is reasonable to - expect that a world where such files are common would also have - larger size_t and qpdf_offset_t types in it. I am not aware of - any cases where 32-bit systems that have size_t smaller than - qpdf_offset_t could run into problems, though I can't - conclusively rule out the possibility. In the event that - someone should produce a file that qpdf can't handle because of - what is suspected to be issues involving the handling of size_t - vs. qpdf_offset_t (such files may behave properly on 64-bit - systems but not on 32-bit systems and may have very large - embedded files or streams, for example), the above mentioned - warning flags could be enabled and all those implicit - conversions could be carefully scrutinized. (I have already - gone through that exercise once in adding support for files > - 4GB in size.) I continue to be commited to supporting large - files on 32-bit systems, but I would not go to any lengths to - support corner cases involving large embedded files or large - streams that work on 64-bit systems but not on 32-bit systems - because of size_t being too small. It is reasonable to assume - that anyone working with such files would be using a 64-bit - system anyway. - - * size_t vs. int. There are some cases where size_t and int or - size_t and unsigned int are used interchangeably. These cases - occur when working with very small amounts of memory, such as - with the bit readers (where we're working with just a few bytes - at a time), some cases of strlen, and a few other cases. I have - scrutinized all of these cases and determined them to be safe, - but there is no mechanism in the code to ensure that new unsafe - conversions between int and size_t aren't introduced short of - good testing and strong awareness of the issues. Again, if any - such bugs are suspected in the future, enable the additional - warning flags and scrutinizing the warnings would be in order. - - * New public interfaces have been added. - - General ======= diff --git a/manual/qpdf-manual.xml b/manual/qpdf-manual.xml index 59b6c355..cfa6ec2b 100644 --- a/manual/qpdf-manual.xml +++ b/manual/qpdf-manual.xml @@ -1623,6 +1623,166 @@ outfile.pdf + + Casting Policy + + This section describes the casting policy followed by qpdf's + implementation. This is no concern to qpdf's end users and + largely of no concern to people writing code that uses qpdf, but + it could be of interest to people who are porting qpdf to a new + platform or who are making modifications to the code. + + + The C++ code in qpdf is free of old-style casts except where + unavoidable (e.g. where the old-style cast is in a macro provided + by a third-party header file). When there is a need for a cast, + it is handled, in order of preference, by rewriting the code to + avoid the need for a cast, calling + const_cast, calling + static_cast, calling + reinterpret_cast, or calling some combination + of the above. As a last resort, a compiler-specific + #pragma may be used to suppress a warning that + we don't want to fix. Examples may include suppressing warnings + about the use of old-style casts in code that is shared between C + and C++ code. + + + The casting policy explicitly prohibits casting between integer + sizes for no purpose other than to quiet a compiler warning when + there is no reasonable chance of a problem resulting. The reason + for this exclusion is that the practice of adding these additional + casts precludes future use of additional compiler warnings as a + tool for making future improvements to this aspect of the code, + and it also damages the readability of the code. + + + There are a few significant areas where casting is common in the + qpdf sources or where casting would be required to quiet higher + levels of compiler warnings but is omitted at present: + + + + char vs. unsigned char. For + historical reasons, there are a lot of places in qpdf's + internals that deal with unsigned char, which + means that a lot of casting is required to interoperate with + standard library calls and std::string. In + retrospect, qpdf should have probably used regular (signed) + char and char* everywhere and just + cast to unsigned char when needed, but it's too + late to make that change now. There are + reinterpret_cast calls to go between + char* and unsigned char*, and there + are static_cast calls to go between + char and unsigned char. These should + always be safe. + + + + + Non-const unsigned char* used in the + Pipeline interface. The pipeline interface has a + write call that uses unsigned + char* without a const qualifier. The main + reason for this is to support pipelines that make calls to + third-party libraries, such as zlib, that don't include + const in their interfaces. Unfortunately, there + are many places in the code where it is desirable to have + const char* with pipelines. None of the pipeline + implementations in qpdf currently modify the data passed to + write, and doing so would be counter to the intent of + Pipeline, but there is nothing in the code to + prevent this from being done. There are places in the code + where const_cast is used to remove the + const-ness of pointers going into Pipelines. This + could theoretically be unsafe, but there is adequate testing to + assert that it is safe and will remain safe in qpdf's code. + + + + + size_t vs. qpdf_offset_t. This is + pretty much unavoidable since sizes are unsigned types and + offsets are signed types. Whenever it is necessary to seek by + an amount given by a size_t, it becomes necessary + to mix and match between size_t and + qpdf_offset_t. Additionally, qpdf sometimes + treats memory buffers like files (as with + BufferInputSource, and those seek interfaces have + to be consistent with file-based input sources. Neither gcc + nor MSVC give warnings for this case by default, but both have + warning flags that can enable this. (MSVC: + or , which also + enables some additional warnings that we ignore; gcc: + ). This could + matter for files whose sizes are larger than + 263 bytes, but it is reasonable to + expect that a world where such files are common would also have + larger size_t and qpdf_offset_t types + in it. On most 64-bit systems at the time of this writing (the + release of version 4.1.0 of qpdf), both size_t and + qpdf_offset_t are 64-bit integer types, while on + many current 32-bit systems, size_t is a 32-bit + type while qpdf_offset_t is a 64-bit type. I am + not aware of any cases where 32-bit systems that have + size_t smaller than qpdf_offset_t + could run into problems. Although I can't conclusively rule + out the possibility of such problems existing, I suspect any + cases would be pretty contrived. In the event that someone + should produce a file that qpdf can't handle because of what is + suspected to be issues involving the handling of + size_t vs. qpdf_offset_t (such files + may behave properly on 64-bit systems but not on 32-bit systems + because they have very large embedded files or streams, for + example), the above mentioned warning flags could be enabled + and all those implicit conversions could be carefully + scrutinized. (I have already gone through that exercise once + in adding support for files larger than 4 GB in size.) I + continue to be commited to supporting large files on 32-bit + systems, but I would not go to any lengths to support corner + cases involving large embedded files or large streams that work + on 64-bit systems but not on 32-bit systems because of + size_t being too small. It is reasonable to + assume that anyone working with such files would be using a + 64-bit system anyway since many 32-bit applications would have + similar difficulties. + + + + + size_t vs. int or long. + There are some cases where size_t and + int or long or size_t + and unsigned int or unsigned long are + used interchangeably. These cases occur when working with very + small amounts of memory, such as with the bit readers (where + we're working with just a few bytes at a time), some cases of + strlen, and a few other cases. I have + scrutinized all of these cases and determined them to be safe, + but there is no mechanism in the code to ensure that new unsafe + conversions between int and size_t + aren't introduced short of good testing and strong awareness of + the issues. Again, if any such bugs are suspected in the + future, enabling the additional warning flags and scrutinizing + the warnings would be in order. + + + + + + To be clear, I believe qpdf to be well-behaved with respect to + sizes and offsets, and qpdf's test suite includes actual + generation and full processing of files larger than 4 GB in + size. The issues raised here are largely academic and should not + in any way be interpreted to mean that qpdf has practical problems + involving sloppiness with integer types. I also believe that + appropriate measures have been taken in the code to avoid problems + with signed vs. unsigned integers from resulting in memory + overwrites or other issues with potential security implications, + though there are never any absolute guarantees. + + Encryption