Mostly don't require original QPDF for copyForeignObject (fixes #219)

The original QPDF is only required now when the source
QPDFObjectHandle is a stream that gets its stream data from a
QPDFObjectHandle::StreamDataProvider.
This commit is contained in:
Jay Berkenbilt 2019-01-06 21:18:36 -05:00
parent fbbb0ee016
commit fddbcab0e7
7 changed files with 285 additions and 58 deletions

View File

@ -1,3 +1,13 @@
2019-01-06 Jay Berkenbilt <ejb@ql.org>
* Remove the restriction in most cases that the source QPDF used
in a copyForeignObject call has to stick around until the
destination QPDF is written. The exceptional case is when the
source stream gets is data using a
QPDFObjectHandle::StreamDataProvider. For a more in-depth
discussion, see comments around copyForeignObject in QPDF.hh.
Fixes #219.
2019-01-05 Jay Berkenbilt <ejb@ql.org>
* When generating appearances, if the font uses one of the

52
TODO
View File

@ -3,19 +3,6 @@ Soon
* Set up OSS-Fuzz (Google). See starred email in qpdf label.
* Issue #219, not requiring QPDF reference of copied object to be
kept around. Comments are in the issue. Most likely we should take
all the encryption parameters and push them into a separate class
within QPDF. We can convert the encryption computation methods to
methods of that class and then hang onto that along with the input
stream for foreign streams. That should handle the encrypted file
case. For the stream data provider case, we don't have to do
anything special. In that case, it's definitely up to the user
to make sure the stream data provider and everything it uses stay
in scope. The user can use a PointerHolder<QPDF> if they want as
long as the original QPDF object is allocated this way. That would
be a good case for the test suite.
* Figure out how to render Gajić correctly in the PDF version of the
qpdf manual.
@ -333,3 +320,42 @@ I find it useful to make reference to them in this list
of all that code to make the checks more robust. In particular,
it's hard to look at the code and quickly determine what is a true
logic error and what could happen because of malformed user input.
* There are a few known limitations of copying foreign streams. These
are somewhat discussed in github issue 219. They are most likely
not worth fixing.
* The original pipeStreamData in QPDF_Stream has various logic for
reporting warnings and letting the caller retry. This logic is
not implemented for stream data providers. When copying foreign
streams, qpdf uses a stream data provider
(QPDF::CopiedStreamDataProvider) to read the stream data from the
original file. While a warning is issued for that case, there is
no way to actually propagate failure information back through
because StreamDataProvider::provideStreamData doesn't take the
suppress_warnings or will_retry options, and adding them would
break source compatibility.
* When copying streams that use StreamDataProvider provider, the
original QPDF object must still be kept around. This is the case
of where QPDF q1 has a stream for which replaceStreamData was
called to provide a StreamDataProvider, and then that stream from
q1 was subsequently copied to q2. In that case, q1 must be kept
around until q2 is written. That is a pretty unusual case as,
most of the time, people could just attach the stream data
provider directly to the q2. That said, it actually appears in
qpdf itself. qpdf allows you to combine the --pages and --split
options. In that case, a merged file, which uses
StreamDataProvider to access the original stream, is subsequently
copied to the final split output files. Solving this restriction
is difficult and would probably require breaking source
compatibility because QPDF_Stream doesn't have enough information
to know whether the original QPDF object is needed by the
stream's data provider. Also, the interface is designed to pass
the object ID and generation of the object whose data is being
provided so the provider can use it for lookup or any other
purpose. As such, copying a stream data provider to a new stream
potentially changes the meaning of the parameters passed to the
provider. This happens with CopiedStreamDataProvider, which uses
the object ID and generation of the new stream to locate the
parameters of the old stream.

View File

@ -237,19 +237,26 @@ class QPDF
replaceReserved(QPDFObjectHandle reserved,
QPDFObjectHandle replacement);
// Copy an object from another QPDF to this one. Please note that
// the QPDF object containing the object being copied must stick
// around because it is still used to retrieve any stream data
// referenced by the copied objects.
// Copy an object from another QPDF to this one. Starting with
// qpdf version 8.3.0, it is no longer necessary to keep the
// original QPDF around after the call to copyForeignObject as
// long as the source of any copied stream data is still
// available. Usually this means you just have to keep the input
// file around, not the QPDF object. The exception to this is if
// you copy a stream that gets its data from a
// QPDFObjectHandle::StreamDataProvider. In this case only, the
// original stream's QPDF object must stick around because the
// QPDF object is itself the source of the original stream data.
// For a more in-depth discussion, please see the TODO file.
//
// The return value is an indirect reference to the copied object
// in this file. This method is intended to be used to copy
// non-page objects and will not copy page objects. To copy page
// objects, pass the foreign page object directly to addPage (or
// addPageAt). If you copy objects that contain references to
// pages, you should copy the pages first using addPage(At).
// Otherwise references to the pages that have not been copied
// will be replaced with nulls.
// The return value of this method is an indirect reference to the
// copied object in this file. This method is intended to be used
// to copy non-page objects and will not copy page objects. To
// copy page objects, pass the foreign page object directly to
// addPage (or addPageAt). If you copy objects that contain
// references to pages, you should copy the pages first using
// addPage(At). Otherwise references to the pages that have not
// been copied will be replaced with nulls.
// When copying objects with this method, object structure will be
// preserved, so all indirectly referenced indirect objects will
@ -641,9 +648,59 @@ class QPDF
std::set<QPDFObjGen> visiting;
};
class EncryptionParameters
{
friend class QPDF;
public:
EncryptionParameters();
private:
bool encrypted;
bool encryption_initialized;
int encryption_V;
int encryption_R;
bool encrypt_metadata;
std::map<std::string, encryption_method_e> crypt_filters;
encryption_method_e cf_stream;
encryption_method_e cf_string;
encryption_method_e cf_file;
std::string provided_password;
std::string user_password;
std::string encryption_key;
std::string cached_object_encryption_key;
int cached_key_objid;
int cached_key_generation;
};
class ForeignStreamData
{
friend class QPDF;
public:
ForeignStreamData(
PointerHolder<EncryptionParameters> encp,
PointerHolder<InputSource> file,
int foreign_objid,
int foreign_generation,
qpdf_offset_t offset,
size_t length,
bool is_attachment_stream,
QPDFObjectHandle local_dict);
private:
PointerHolder<EncryptionParameters> encp;
PointerHolder<InputSource> file;
int foreign_objid;
int foreign_generation;
qpdf_offset_t offset;
size_t length;
bool is_attachment_stream;
QPDFObjectHandle local_dict;
};
class CopiedStreamDataProvider: public QPDFObjectHandle::StreamDataProvider
{
public:
CopiedStreamDataProvider(QPDF& destination_qpdf);
virtual ~CopiedStreamDataProvider()
{
}
@ -651,9 +708,14 @@ class QPDF
Pipeline* pipeline);
void registerForeignStream(QPDFObjGen const& local_og,
QPDFObjectHandle foreign_stream);
void registerForeignStream(QPDFObjGen const& local_og,
PointerHolder<ForeignStreamData>);
private:
QPDF& destination_qpdf;
std::map<QPDFObjGen, QPDFObjectHandle> foreign_streams;
std::map<QPDFObjGen,
PointerHolder<ForeignStreamData> > foreign_stream_data;
};
class StringDecrypter: public QPDFObjectHandle::StringDecrypter
@ -692,30 +754,6 @@ class QPDF
};
friend class ResolveRecorder;
class EncryptionParameters
{
friend class QPDF;
public:
EncryptionParameters();
private:
bool encrypted;
bool encryption_initialized;
int encryption_V;
int encryption_R;
bool encrypt_metadata;
std::map<std::string, encryption_method_e> crypt_filters;
encryption_method_e cf_stream;
encryption_method_e cf_string;
encryption_method_e cf_file;
std::string provided_password;
std::string user_password;
std::string encryption_key;
std::string cached_object_encryption_key;
int cached_key_objid;
int cached_key_generation;
};
void parse(char const* password);
void warn(QPDFExc const& e);
void setTrailer(QPDFObjectHandle obj);
@ -759,6 +797,11 @@ class QPDF
Pipeline* pipeline,
bool suppress_warnings,
bool will_retry);
bool pipeForeignStreamData(
PointerHolder<ForeignStreamData>,
Pipeline*,
unsigned long encode_flags,
qpdf_stream_decode_level_e decode_level);
static bool pipeStreamData(PointerHolder<QPDF::EncryptionParameters> encp,
PointerHolder<InputSource> file,
QPDF& qpdf_for_warning,

View File

@ -18,6 +18,7 @@
#include <qpdf/QPDFExc.hh>
#include <qpdf/QPDF_Null.hh>
#include <qpdf/QPDF_Dictionary.hh>
#include <qpdf/QPDF_Stream.hh>
std::string QPDF::qpdf_version = "8.2.1";
@ -39,13 +40,50 @@ static char const* EMPTY_PDF =
"110\n"
"%%EOF\n";
QPDF::ForeignStreamData::ForeignStreamData(
PointerHolder<EncryptionParameters> encp,
PointerHolder<InputSource> file,
int foreign_objid,
int foreign_generation,
qpdf_offset_t offset,
size_t length,
bool is_attachment_stream,
QPDFObjectHandle local_dict)
:
encp(encp),
file(file),
foreign_objid(foreign_objid),
foreign_generation(foreign_generation),
offset(offset),
length(length),
is_attachment_stream(is_attachment_stream),
local_dict(local_dict)
{
}
QPDF::CopiedStreamDataProvider::CopiedStreamDataProvider(
QPDF& destination_qpdf) :
destination_qpdf(destination_qpdf)
{
}
void
QPDF::CopiedStreamDataProvider::provideStreamData(
int objid, int generation, Pipeline* pipeline)
{
QPDFObjectHandle foreign_stream =
this->foreign_streams[QPDFObjGen(objid, generation)];
foreign_stream.pipeStreamData(pipeline, 0, qpdf_dl_none);
PointerHolder<ForeignStreamData> foreign_data =
this->foreign_stream_data[QPDFObjGen(objid, generation)];
if (foreign_data.getPointer())
{
destination_qpdf.pipeForeignStreamData(
foreign_data, pipeline, 0, qpdf_dl_none);
}
else
{
QPDFObjectHandle foreign_stream =
this->foreign_streams[QPDFObjGen(objid, generation)];
foreign_stream.pipeStreamData(pipeline, 0, qpdf_dl_none);
}
}
void
@ -55,6 +93,14 @@ QPDF::CopiedStreamDataProvider::registerForeignStream(
this->foreign_streams[local_og] = foreign_stream;
}
void
QPDF::CopiedStreamDataProvider::registerForeignStream(
QPDFObjGen const& local_og,
PointerHolder<ForeignStreamData> foreign_stream)
{
this->foreign_stream_data[local_og] = foreign_stream;
}
QPDF::StringDecrypter::StringDecrypter(QPDF* qpdf, int objid, int gen) :
qpdf(qpdf),
objid(objid),
@ -2307,15 +2353,67 @@ QPDF::replaceForeignIndirectObjects(
if (this->m->copied_stream_data_provider == 0)
{
this->m->copied_stream_data_provider =
new CopiedStreamDataProvider();
new CopiedStreamDataProvider(*this);
this->m->copied_streams = this->m->copied_stream_data_provider;
}
QPDFObjGen local_og(result.getObjGen());
this->m->copied_stream_data_provider->registerForeignStream(
local_og, foreign);
result.replaceStreamData(this->m->copied_streams,
dict.getKey("/Filter"),
dict.getKey("/DecodeParms"));
// Copy information from the foreign stream so we can pipe its
// data later without keeping the original QPDF object around.
QPDF* foreign_stream_qpdf = foreign.getOwningQPDF();
if (! foreign_stream_qpdf)
{
throw std::logic_error("unable to retrieve owning qpdf"
" from foreign stream");
}
QPDF_Stream* stream =
dynamic_cast<QPDF_Stream*>(
QPDFObjectHandle::ObjAccessor::getObject(
foreign).getPointer());
if (! stream)
{
throw std::logic_error("unable to retrieve underlying"
" stream object from foreign stream");
}
PointerHolder<Buffer> stream_buffer =
stream->getStreamDataBuffer();
PointerHolder<QPDFObjectHandle::StreamDataProvider> stream_provider =
stream->getStreamDataProvider();
if (stream_buffer.getPointer())
{
// QTC::TC("qpdf", "QPDF copy foreign stream with buffer");
result.replaceStreamData(stream_buffer,
dict.getKey("/Filter"),
dict.getKey("/DecodeParms"));
}
else if (stream_provider.getPointer())
{
// In this case, the remote stream's QPDF must stay in scope.
// QTC::TC("qpdf", "QPDF copy foreign stream with provider");
this->m->copied_stream_data_provider->registerForeignStream(
local_og, foreign);
result.replaceStreamData(this->m->copied_streams,
dict.getKey("/Filter"),
dict.getKey("/DecodeParms"));
}
else
{
PointerHolder<ForeignStreamData> foreign_stream_data =
new ForeignStreamData(
foreign_stream_qpdf->m->encp,
foreign_stream_qpdf->m->file,
foreign.getObjectID(),
foreign.getGeneration(),
stream->getOffset(),
stream->getLength(),
(foreign_stream_qpdf->m->attachment_streams.count(
foreign.getObjGen()) > 0),
dict);
this->m->copied_stream_data_provider->registerForeignStream(
local_og, foreign_stream_data);
result.replaceStreamData(this->m->copied_streams,
dict.getKey("/Filter"),
dict.getKey("/DecodeParms"));
}
}
else
{
@ -2613,6 +2711,25 @@ QPDF::pipeStreamData(int objid, int generation,
pipeline, suppress_warnings, will_retry);
}
bool
QPDF::pipeForeignStreamData(
PointerHolder<ForeignStreamData> foreign,
Pipeline* pipeline,
unsigned long encode_flags,
qpdf_stream_decode_level_e decode_level)
{
if (foreign->encp->encrypted)
{
QTC::TC("qpdf", "QPDF pipe foreign encrypted stream");
}
return pipeStreamData(
foreign->encp, foreign->file, *this,
foreign->foreign_objid, foreign->foreign_generation,
foreign->offset, foreign->length,
foreign->local_dict, foreign->is_attachment_stream,
pipeline, false, false);
}
void
QPDF::findAttachmentStreams()
{

View File

@ -133,6 +133,30 @@ QPDF_Stream::isDataModified() const
return (! this->token_filters.empty());
}
qpdf_offset_t
QPDF_Stream::getOffset() const
{
return this->offset;
}
size_t
QPDF_Stream::getLength() const
{
return this->length;
}
PointerHolder<Buffer>
QPDF_Stream::getStreamDataBuffer() const
{
return this->stream_data;
}
PointerHolder<QPDFObjectHandle::StreamDataProvider>
QPDF_Stream::getStreamDataProvider() const
{
return this->stream_provider;
}
PointerHolder<Buffer>
QPDF_Stream::getStreamData(qpdf_stream_decode_level_e decode_level)
{

View File

@ -24,6 +24,12 @@ class QPDF_Stream: public QPDFObject
QPDFObjectHandle getDict() const;
bool isDataModified() const;
// Methods to help QPDF copy foreign streams
qpdf_offset_t getOffset() const;
size_t getLength() const;
PointerHolder<Buffer> getStreamDataBuffer() const;
PointerHolder<QPDFObjectHandle::StreamDataProvider> getStreamDataProvider() const;
// See comments in QPDFObjectHandle.hh for these methods.
bool pipeStreamData(Pipeline*,
unsigned long encode_flags,

View File

@ -407,3 +407,4 @@ qpdf image optimize no shrink 0
qpdf image optimize too small 0
QPDFFormFieldObjectHelper WinAnsi 0
QPDF_encryption attachment stream 0
QPDF pipe foreign encrypted stream 0