From fddbcab0e7cc5978802251696055efa64667f637 Mon Sep 17 00:00:00 2001 From: Jay Berkenbilt Date: Sun, 6 Jan 2019 21:18:36 -0500 Subject: [PATCH] Mostly don't require original QPDF for copyForeignObject (fixes #219) The original QPDF is only required now when the source QPDFObjectHandle is a stream that gets its stream data from a QPDFObjectHandle::StreamDataProvider. --- ChangeLog | 10 +++ TODO | 52 ++++++++++---- include/qpdf/QPDF.hh | 115 ++++++++++++++++++++---------- libqpdf/QPDF.cc | 135 +++++++++++++++++++++++++++++++++--- libqpdf/QPDF_Stream.cc | 24 +++++++ libqpdf/qpdf/QPDF_Stream.hh | 6 ++ qpdf/qpdf.testcov | 1 + 7 files changed, 285 insertions(+), 58 deletions(-) diff --git a/ChangeLog b/ChangeLog index db5b897f..6a62bdec 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,13 @@ +2019-01-06 Jay Berkenbilt + + * Remove the restriction in most cases that the source QPDF used + in a copyForeignObject call has to stick around until the + destination QPDF is written. The exceptional case is when the + source stream gets is data using a + QPDFObjectHandle::StreamDataProvider. For a more in-depth + discussion, see comments around copyForeignObject in QPDF.hh. + Fixes #219. + 2019-01-05 Jay Berkenbilt * When generating appearances, if the font uses one of the diff --git a/TODO b/TODO index e49684f3..d17e68f7 100644 --- a/TODO +++ b/TODO @@ -3,19 +3,6 @@ Soon * Set up OSS-Fuzz (Google). See starred email in qpdf label. - * Issue #219, not requiring QPDF reference of copied object to be - kept around. Comments are in the issue. Most likely we should take - all the encryption parameters and push them into a separate class - within QPDF. We can convert the encryption computation methods to - methods of that class and then hang onto that along with the input - stream for foreign streams. That should handle the encrypted file - case. For the stream data provider case, we don't have to do - anything special. In that case, it's definitely up to the user - to make sure the stream data provider and everything it uses stay - in scope. The user can use a PointerHolder if they want as - long as the original QPDF object is allocated this way. That would - be a good case for the test suite. - * Figure out how to render Gajić correctly in the PDF version of the qpdf manual. @@ -333,3 +320,42 @@ I find it useful to make reference to them in this list of all that code to make the checks more robust. In particular, it's hard to look at the code and quickly determine what is a true logic error and what could happen because of malformed user input. + + * There are a few known limitations of copying foreign streams. These + are somewhat discussed in github issue 219. They are most likely + not worth fixing. + + * The original pipeStreamData in QPDF_Stream has various logic for + reporting warnings and letting the caller retry. This logic is + not implemented for stream data providers. When copying foreign + streams, qpdf uses a stream data provider + (QPDF::CopiedStreamDataProvider) to read the stream data from the + original file. While a warning is issued for that case, there is + no way to actually propagate failure information back through + because StreamDataProvider::provideStreamData doesn't take the + suppress_warnings or will_retry options, and adding them would + break source compatibility. + + * When copying streams that use StreamDataProvider provider, the + original QPDF object must still be kept around. This is the case + of where QPDF q1 has a stream for which replaceStreamData was + called to provide a StreamDataProvider, and then that stream from + q1 was subsequently copied to q2. In that case, q1 must be kept + around until q2 is written. That is a pretty unusual case as, + most of the time, people could just attach the stream data + provider directly to the q2. That said, it actually appears in + qpdf itself. qpdf allows you to combine the --pages and --split + options. In that case, a merged file, which uses + StreamDataProvider to access the original stream, is subsequently + copied to the final split output files. Solving this restriction + is difficult and would probably require breaking source + compatibility because QPDF_Stream doesn't have enough information + to know whether the original QPDF object is needed by the + stream's data provider. Also, the interface is designed to pass + the object ID and generation of the object whose data is being + provided so the provider can use it for lookup or any other + purpose. As such, copying a stream data provider to a new stream + potentially changes the meaning of the parameters passed to the + provider. This happens with CopiedStreamDataProvider, which uses + the object ID and generation of the new stream to locate the + parameters of the old stream. diff --git a/include/qpdf/QPDF.hh b/include/qpdf/QPDF.hh index 38f3f7b3..333d22d5 100644 --- a/include/qpdf/QPDF.hh +++ b/include/qpdf/QPDF.hh @@ -237,19 +237,26 @@ class QPDF replaceReserved(QPDFObjectHandle reserved, QPDFObjectHandle replacement); - // Copy an object from another QPDF to this one. Please note that - // the QPDF object containing the object being copied must stick - // around because it is still used to retrieve any stream data - // referenced by the copied objects. + // Copy an object from another QPDF to this one. Starting with + // qpdf version 8.3.0, it is no longer necessary to keep the + // original QPDF around after the call to copyForeignObject as + // long as the source of any copied stream data is still + // available. Usually this means you just have to keep the input + // file around, not the QPDF object. The exception to this is if + // you copy a stream that gets its data from a + // QPDFObjectHandle::StreamDataProvider. In this case only, the + // original stream's QPDF object must stick around because the + // QPDF object is itself the source of the original stream data. + // For a more in-depth discussion, please see the TODO file. // - // The return value is an indirect reference to the copied object - // in this file. This method is intended to be used to copy - // non-page objects and will not copy page objects. To copy page - // objects, pass the foreign page object directly to addPage (or - // addPageAt). If you copy objects that contain references to - // pages, you should copy the pages first using addPage(At). - // Otherwise references to the pages that have not been copied - // will be replaced with nulls. + // The return value of this method is an indirect reference to the + // copied object in this file. This method is intended to be used + // to copy non-page objects and will not copy page objects. To + // copy page objects, pass the foreign page object directly to + // addPage (or addPageAt). If you copy objects that contain + // references to pages, you should copy the pages first using + // addPage(At). Otherwise references to the pages that have not + // been copied will be replaced with nulls. // When copying objects with this method, object structure will be // preserved, so all indirectly referenced indirect objects will @@ -641,9 +648,59 @@ class QPDF std::set visiting; }; + class EncryptionParameters + { + friend class QPDF; + public: + EncryptionParameters(); + + private: + bool encrypted; + bool encryption_initialized; + int encryption_V; + int encryption_R; + bool encrypt_metadata; + std::map crypt_filters; + encryption_method_e cf_stream; + encryption_method_e cf_string; + encryption_method_e cf_file; + std::string provided_password; + std::string user_password; + std::string encryption_key; + std::string cached_object_encryption_key; + int cached_key_objid; + int cached_key_generation; + }; + + class ForeignStreamData + { + friend class QPDF; + public: + ForeignStreamData( + PointerHolder encp, + PointerHolder file, + int foreign_objid, + int foreign_generation, + qpdf_offset_t offset, + size_t length, + bool is_attachment_stream, + QPDFObjectHandle local_dict); + + private: + PointerHolder encp; + PointerHolder file; + int foreign_objid; + int foreign_generation; + qpdf_offset_t offset; + size_t length; + bool is_attachment_stream; + QPDFObjectHandle local_dict; + }; + class CopiedStreamDataProvider: public QPDFObjectHandle::StreamDataProvider { public: + CopiedStreamDataProvider(QPDF& destination_qpdf); virtual ~CopiedStreamDataProvider() { } @@ -651,9 +708,14 @@ class QPDF Pipeline* pipeline); void registerForeignStream(QPDFObjGen const& local_og, QPDFObjectHandle foreign_stream); + void registerForeignStream(QPDFObjGen const& local_og, + PointerHolder); private: + QPDF& destination_qpdf; std::map foreign_streams; + std::map > foreign_stream_data; }; class StringDecrypter: public QPDFObjectHandle::StringDecrypter @@ -692,30 +754,6 @@ class QPDF }; friend class ResolveRecorder; - class EncryptionParameters - { - friend class QPDF; - public: - EncryptionParameters(); - - private: - bool encrypted; - bool encryption_initialized; - int encryption_V; - int encryption_R; - bool encrypt_metadata; - std::map crypt_filters; - encryption_method_e cf_stream; - encryption_method_e cf_string; - encryption_method_e cf_file; - std::string provided_password; - std::string user_password; - std::string encryption_key; - std::string cached_object_encryption_key; - int cached_key_objid; - int cached_key_generation; - }; - void parse(char const* password); void warn(QPDFExc const& e); void setTrailer(QPDFObjectHandle obj); @@ -759,6 +797,11 @@ class QPDF Pipeline* pipeline, bool suppress_warnings, bool will_retry); + bool pipeForeignStreamData( + PointerHolder, + Pipeline*, + unsigned long encode_flags, + qpdf_stream_decode_level_e decode_level); static bool pipeStreamData(PointerHolder encp, PointerHolder file, QPDF& qpdf_for_warning, diff --git a/libqpdf/QPDF.cc b/libqpdf/QPDF.cc index f5267dea..c5172fd7 100644 --- a/libqpdf/QPDF.cc +++ b/libqpdf/QPDF.cc @@ -18,6 +18,7 @@ #include #include #include +#include std::string QPDF::qpdf_version = "8.2.1"; @@ -39,13 +40,50 @@ static char const* EMPTY_PDF = "110\n" "%%EOF\n"; +QPDF::ForeignStreamData::ForeignStreamData( + PointerHolder encp, + PointerHolder file, + int foreign_objid, + int foreign_generation, + qpdf_offset_t offset, + size_t length, + bool is_attachment_stream, + QPDFObjectHandle local_dict) + : + encp(encp), + file(file), + foreign_objid(foreign_objid), + foreign_generation(foreign_generation), + offset(offset), + length(length), + is_attachment_stream(is_attachment_stream), + local_dict(local_dict) +{ +} + +QPDF::CopiedStreamDataProvider::CopiedStreamDataProvider( + QPDF& destination_qpdf) : + destination_qpdf(destination_qpdf) +{ +} + void QPDF::CopiedStreamDataProvider::provideStreamData( int objid, int generation, Pipeline* pipeline) { - QPDFObjectHandle foreign_stream = - this->foreign_streams[QPDFObjGen(objid, generation)]; - foreign_stream.pipeStreamData(pipeline, 0, qpdf_dl_none); + PointerHolder foreign_data = + this->foreign_stream_data[QPDFObjGen(objid, generation)]; + if (foreign_data.getPointer()) + { + destination_qpdf.pipeForeignStreamData( + foreign_data, pipeline, 0, qpdf_dl_none); + } + else + { + QPDFObjectHandle foreign_stream = + this->foreign_streams[QPDFObjGen(objid, generation)]; + foreign_stream.pipeStreamData(pipeline, 0, qpdf_dl_none); + } } void @@ -55,6 +93,14 @@ QPDF::CopiedStreamDataProvider::registerForeignStream( this->foreign_streams[local_og] = foreign_stream; } +void +QPDF::CopiedStreamDataProvider::registerForeignStream( + QPDFObjGen const& local_og, + PointerHolder foreign_stream) +{ + this->foreign_stream_data[local_og] = foreign_stream; +} + QPDF::StringDecrypter::StringDecrypter(QPDF* qpdf, int objid, int gen) : qpdf(qpdf), objid(objid), @@ -2307,15 +2353,67 @@ QPDF::replaceForeignIndirectObjects( if (this->m->copied_stream_data_provider == 0) { this->m->copied_stream_data_provider = - new CopiedStreamDataProvider(); + new CopiedStreamDataProvider(*this); this->m->copied_streams = this->m->copied_stream_data_provider; } QPDFObjGen local_og(result.getObjGen()); - this->m->copied_stream_data_provider->registerForeignStream( - local_og, foreign); - result.replaceStreamData(this->m->copied_streams, - dict.getKey("/Filter"), - dict.getKey("/DecodeParms")); + // Copy information from the foreign stream so we can pipe its + // data later without keeping the original QPDF object around. + QPDF* foreign_stream_qpdf = foreign.getOwningQPDF(); + if (! foreign_stream_qpdf) + { + throw std::logic_error("unable to retrieve owning qpdf" + " from foreign stream"); + } + QPDF_Stream* stream = + dynamic_cast( + QPDFObjectHandle::ObjAccessor::getObject( + foreign).getPointer()); + if (! stream) + { + throw std::logic_error("unable to retrieve underlying" + " stream object from foreign stream"); + } + PointerHolder stream_buffer = + stream->getStreamDataBuffer(); + PointerHolder stream_provider = + stream->getStreamDataProvider(); + if (stream_buffer.getPointer()) + { +// QTC::TC("qpdf", "QPDF copy foreign stream with buffer"); + result.replaceStreamData(stream_buffer, + dict.getKey("/Filter"), + dict.getKey("/DecodeParms")); + } + else if (stream_provider.getPointer()) + { + // In this case, the remote stream's QPDF must stay in scope. +// QTC::TC("qpdf", "QPDF copy foreign stream with provider"); + this->m->copied_stream_data_provider->registerForeignStream( + local_og, foreign); + result.replaceStreamData(this->m->copied_streams, + dict.getKey("/Filter"), + dict.getKey("/DecodeParms")); + } + else + { + PointerHolder foreign_stream_data = + new ForeignStreamData( + foreign_stream_qpdf->m->encp, + foreign_stream_qpdf->m->file, + foreign.getObjectID(), + foreign.getGeneration(), + stream->getOffset(), + stream->getLength(), + (foreign_stream_qpdf->m->attachment_streams.count( + foreign.getObjGen()) > 0), + dict); + this->m->copied_stream_data_provider->registerForeignStream( + local_og, foreign_stream_data); + result.replaceStreamData(this->m->copied_streams, + dict.getKey("/Filter"), + dict.getKey("/DecodeParms")); + } } else { @@ -2613,6 +2711,25 @@ QPDF::pipeStreamData(int objid, int generation, pipeline, suppress_warnings, will_retry); } +bool +QPDF::pipeForeignStreamData( + PointerHolder foreign, + Pipeline* pipeline, + unsigned long encode_flags, + qpdf_stream_decode_level_e decode_level) +{ + if (foreign->encp->encrypted) + { + QTC::TC("qpdf", "QPDF pipe foreign encrypted stream"); + } + return pipeStreamData( + foreign->encp, foreign->file, *this, + foreign->foreign_objid, foreign->foreign_generation, + foreign->offset, foreign->length, + foreign->local_dict, foreign->is_attachment_stream, + pipeline, false, false); +} + void QPDF::findAttachmentStreams() { diff --git a/libqpdf/QPDF_Stream.cc b/libqpdf/QPDF_Stream.cc index 0a5f53b4..3733940d 100644 --- a/libqpdf/QPDF_Stream.cc +++ b/libqpdf/QPDF_Stream.cc @@ -133,6 +133,30 @@ QPDF_Stream::isDataModified() const return (! this->token_filters.empty()); } +qpdf_offset_t +QPDF_Stream::getOffset() const +{ + return this->offset; +} + +size_t +QPDF_Stream::getLength() const +{ + return this->length; +} + +PointerHolder +QPDF_Stream::getStreamDataBuffer() const +{ + return this->stream_data; +} + +PointerHolder +QPDF_Stream::getStreamDataProvider() const +{ + return this->stream_provider; +} + PointerHolder QPDF_Stream::getStreamData(qpdf_stream_decode_level_e decode_level) { diff --git a/libqpdf/qpdf/QPDF_Stream.hh b/libqpdf/qpdf/QPDF_Stream.hh index b38bed08..647b600d 100644 --- a/libqpdf/qpdf/QPDF_Stream.hh +++ b/libqpdf/qpdf/QPDF_Stream.hh @@ -24,6 +24,12 @@ class QPDF_Stream: public QPDFObject QPDFObjectHandle getDict() const; bool isDataModified() const; + // Methods to help QPDF copy foreign streams + qpdf_offset_t getOffset() const; + size_t getLength() const; + PointerHolder getStreamDataBuffer() const; + PointerHolder getStreamDataProvider() const; + // See comments in QPDFObjectHandle.hh for these methods. bool pipeStreamData(Pipeline*, unsigned long encode_flags, diff --git a/qpdf/qpdf.testcov b/qpdf/qpdf.testcov index 8e1a62f1..6a9c90d7 100644 --- a/qpdf/qpdf.testcov +++ b/qpdf/qpdf.testcov @@ -407,3 +407,4 @@ qpdf image optimize no shrink 0 qpdf image optimize too small 0 QPDFFormFieldObjectHelper WinAnsi 0 QPDF_encryption attachment stream 0 +QPDF pipe foreign encrypted stream 0