mirror of https://github.com/qpdf/qpdf.git synced 2024-12-22 19:08:59 +00:00

Support copying objects from another QPDF object

This includes QPDF::copyForeignObject and supporting foreign objects
as arguments to addPage*.
Jay Berkenbilt 2012-07-11 15:29:41 -04:00
parent 8a217eb3a2
commit e7b8f297ba
16 changed files with 1151 additions and 108 deletions


@ -1,3 +1,21 @@
2012-07-11 Jay Berkenbilt <ejb@ql.org>
* Added new APIs to copy objects from one QPDF to another. This
includes letting QPDF::addPage() (and QPDF::addPageAt()) accept a
page object from another QPDF and adding
QPDF::copyForeignObject(). See QPDF.hh for details.
* Add method QPDFObjectHandle::getOwningQPDF() to return the QPDF
object associated with an indirect QPDFObjectHandle.
* Add convenience methods to QPDFObjectHandle: assertIndirect(),
isPageObject(), isPagesObject()
* Cache when QPDF::pushInheritedAttributesToPage() has been called
to avoid traversing the pages trees multiple times. This state is
cleared by QPDF::updateAllPagesCache() and ignored by
QPDF::flattenPagesTree().
2012-07-08 Jay Berkenbilt <ejb@ql.org>
* Add QPDFObjectHandle::newReserved to create a reserved object

TODO

@ -28,35 +28,11 @@ Next
 can only be used by one thread at a time, but multiple threads can
 simultaneously use separate objects.
+* Write some documentation about the design of copyForeignObject.
-Soon
-====
-* Provide an option to copy encryption parameters from another file.
-This would make it possible to decrypt a file, manually work with
-it, and then re-encrypt it using the original encryption parameters
-including a possibly unknown owner password.
-* See if I can support the new encryption formats mentioned in the
-open bug on sourceforge. Check other sourceforge bugs.
-* Splitting/merging concepts
-newPDF() could create a PDF with just a trailer, no pages, and a
-minimal info. Then the page routines could be used to add pages to
-it.
-Starting with any pdf, you should be able to copy objects from
-another pdf. The copy should be smart about never traversing into
-a /Page or /Pages.
-We could provide a method of copying objects from one PDF into
-another. This would do whatever optimization is necessary (maybe
-just optimizePagesTree) and then traverse the set of objects
-specified to find all objects referenced by the set. Each of those
-would be copied over with a table mapping old ID to new ID. This
-would be done from bottom up most likely disallowing cycles or
-handling them sanely.
+* copyForeignObject still to do:
+- qpdf command
 Command line could be something like
@ -65,7 +41,8 @@ Soon
 The first file referenced would be the one whose other data would
 be preserved (like trailer, info, encryption, outlines, etc.).
 --new as first file would just use an empty file as the starting
-point.
+point. Be explicit about whether outlines, etc., are handled.
+They are not handled initially.
 Example: to grab pages 1-5 from file1 and 11-15 from file2
@ -73,31 +50,32 @@ Soon
 To implement this, we would remove all pages from file1 except
 pages 1 through 5. Then we would take pages 11 through 15 from
-file2 and add them to a set for transfer. This would end up
-generating a list of indirect objects. We would copy those objects
-shallowly to the new PDF keeping track of the mapping and replacing
-any indirect object keys as appropriate, much like QPDFWriter does.
-When all the objects are registered, we would add those pages to
-the result.
-This approach could work for both splitting and merging. It's
-possible it could be implemented now without any new APIs, but most
-of the work should be doable by the library with only a small set
-of additions.
-newPDF()
-QPDFObjectCopier c(qpdf1, qpdf2)
-QPDFObjectHandle obj = c.copyObject(<object from qpdf1>)
-Without traversing pages, copies all indirect objects referenced
-by <object from qpdf1> preserving referential integrity and
-returns an object handle in qpdf2 of the same object. If called
-multiple times on the same object, retraverses in case there were
-changes.
-QPDFObjectHandle obj = c.getMapping(<object from qpdf1>)
-find the object in qpdf2 corresponding to the object from qpdf1.
-Return the null object if none.
+file2, copy them to the file, and add them as pages.
+- document that makeIndirectObject doesn't handle foreign objects
+automatically because copying a foreign object is a big enough
+deal that it should be explicit. However addPages* does handle
+foreign page objects automatically.
+- Test /Outlines and see whether there's any point in handling
+them in the API. Maybe just copying them over works. What
+about command line tool? Also think about page labels.
+- Tests through qpdf command line: copy pages from multiple PDFs
+starting with one PDF and also starting with empty.
+* (Hopefully) Provide an option to copy encryption parameters from
+another file. This would make it possible to decrypt a file,
+manually work with it, and then re-encrypt it using the original
+encryption parameters including a possibly unknown owner password.
+Soon
+====
+* See if I can support the new encryption formats mentioned in the
+open bug on sourceforge. Check other sourceforge bugs.
 General
 =======
@ -110,23 +88,11 @@ General
 * Update qpdf docs about non-ascii passwords. See thread from
 2010-12-07,08 for details.
-* Look at page splitting. Subramanyam provided a test file; see
-../misc/article-threads.pdf. Email Q-Count: 431864 from
-2009-11-03. See also "Splitting by Pages" below.
-* Consider writing a PDF merge utility. With 2.2, it would be
-possible to have a StreamDataProvider that would allow stream data
-to be directly copied from one PDF file to another. One possible
-strategy would be to have a program that adds all the pages of one
-file to the end of another file. The basic
-strategy would be to create a table that adds new streams to the
-original file, mapping the new streams' obj/gen to a stream in the
-file whose pages are being appended. The StreamDataProvider, when
-asked, could simply pipe the streams of the file being appended to
-the provided pipeline and could copy the filter and decode
-parameters from the original file. Being able to do this requires
-a lot of the same logic as being able to do splitting, so a general
-split/merge program would be a great addition.
+* Consider impact of article threads on page splitting/merging.
+Subramanyam provided a test file; see ../misc/article-threads.pdf.
+Email Q-Count: 431864 from 2009-11-03. Other things to consider:
+outlines, page labels, thumbnails, zones. There are probably
+others.
 * See whether it's possible to remove the call to
 flattenScalarReferences. I can't easily figure out why I do it,
@ -279,26 +245,3 @@ Index: QPDFWriter.cc
 * From a suggestion in bug 3152169, consisder having an option to
 re-encode inline images with an ASCII encoding.
-Splitting by Pages
-==================
-Although qpdf does not currently support splitting a file into pages,
-the work done for linearization covers almost all the work. To do
-page splitting. If this functionality is needed, study
-obj_user_to_objects and object_to_obj_users created in
-QPDF_optimization for ideas. It's quite possible that the information
-computed by calculateLinearizationData is actually sufficient to do
-page splitting in many circumstances. That code knows which objects
-are used by which pages, though it doesn't do anything page-specific
-with outlines, thumbnails, page labels, or anything else.
-Another approach would be to traverse only pages that are being output
-taking care not to traverse into the pages tree, and then to fabricate
-a new pages tree.
-Either way, care must be taken to handle other things such as
-outlines, page labels, thumbnails, threads, zones, etc. in a sensible
-way. This may include simply omitting information other than page
-content.


@ -190,6 +190,28 @@ class QPDF
replaceReserved(QPDFObjectHandle reserved,
QPDFObjectHandle replacement);
// Copy an object from another QPDF to this one. The return value
// is an indirect reference to the copied object in this file.
// This method is intended to be used to copy non-page objects and
// will not copy page objects. To copy page objects, pass the
// foreign page object directly to addPage (or addPageAt). If you
// copy objects that contain references to pages, you should copy
// the pages first using addPage(At). Otherwise references to the
// pages that have not been copied will be replaced with nulls.
// When copying objects with this method, object structure will be
// preserved, so all indirectly referenced indirect objects will
// be copied as well. This includes any circular references that
// may exist. The QPDF object keeps a record of what has already
// been copied, so shared objects will not be copied multiple
// times. This also means that if you mutate an object that has
// already been copied and try to copy it again, it won't work
// since the modified object will not be recopied. Therefore, you
// should do all mutation on the original file that you are going
// to do before you start copying its objects to a new file.
QPDF_DLL
QPDFObjectHandle copyForeignObject(QPDFObjectHandle foreign);
// Encryption support
enum encryption_method_e { e_none, e_unknown, e_rc4, e_aes };
@ -380,7 +402,10 @@ class QPDF
// modify /Pages structures directly, you must call this method
// afterwards. This method updates the internal list of pages, so
// after calling this method, any previous references returned by
-// getAllPages() will be valid again.
+// getAllPages() will be valid again. It also resets any state
// about having pushed inherited attributes in /Pages objects down
// to the pages, so if you add any inheritable attributes to a
// /Pages object, you should also call this method.
QPDF_DLL
void updateAllPagesCache();
@ -389,11 +414,19 @@ class QPDF
// resolved by explicitly setting the values in each /Page.
void pushInheritedAttributesToPage();
-// Add new page at the beginning or the end of the current pdf
+// Add new page at the beginning or the end of the current pdf.
// The newpage parameter may be either a direct object, an
// indirect object from this QPDF, or an indirect object from
// another QPDF. If it is a direct object, it will be made
// indirect. If it is an indirect object from another QPDF, this
// method will call pushInheritedAttributesToPage on the other
// file and then copy the page to this QPDF using the same
// underlying code as copyForeignObject.
QPDF_DLL
void addPage(QPDFObjectHandle newpage, bool first);
-// Add new page before or after refpage
+// Add new page before or after refpage. See comments for addPage
// for details about what newpage should be.
QPDF_DLL
void addPageAt(QPDFObjectHandle newpage, bool before,
QPDFObjectHandle refpage);
@ -542,6 +575,29 @@ class QPDF
qpdf_offset_t end_after_space;
};
class ObjCopier
{
public:
std::map<ObjGen, QPDFObjectHandle> object_map;
std::vector<QPDFObjectHandle> to_copy;
std::set<ObjGen> visiting;
};
class CopiedStreamDataProvider: public QPDFObjectHandle::StreamDataProvider
{
public:
virtual ~CopiedStreamDataProvider()
{
}
virtual void provideStreamData(int objid, int generation,
Pipeline* pipeline);
void registerForeignStream(ObjGen const& local_og,
QPDFObjectHandle foreign_stream);
private:
std::map<ObjGen, QPDFObjectHandle> foreign_streams;
};
void parse(char const* password);
void warn(QPDFExc const& e);
void setTrailer(QPDFObjectHandle obj);
@ -602,6 +658,14 @@ class QPDF
QPDFObjectHandle& stream_dict,
std::vector<PointerHolder<Pipeline> >& heap);
// Methods to support object copying
QPDFObjectHandle copyForeignObject(
QPDFObjectHandle foreign, bool allow_page);
void reserveObjects(QPDFObjectHandle foreign, ObjCopier& obj_copier,
bool top);
QPDFObjectHandle replaceForeignIndirectObjects(
QPDFObjectHandle foreign, ObjCopier& obj_copier, bool top);
// Linearization Hint table structures.
// Naming conventions:
@ -960,7 +1024,12 @@ class QPDF
QPDFObjectHandle trailer;
std::vector<QPDFObjectHandle> all_pages;
std::map<ObjGen, int> pageobj_to_pages_pos;
bool pushed_inherited_attributes_to_pages;
std::vector<QPDFExc> warnings;
std::map<QPDF*, ObjCopier> object_copiers;
PointerHolder<QPDFObjectHandle::StreamDataProvider> copied_streams;
// copied_stream_data_provider is owned by copied_streams
CopiedStreamDataProvider* copied_stream_data_provider;
// Linearization data
qpdf_offset_t first_xref_item_offset; // actual value from file


@ -222,6 +222,11 @@ class QPDFObjectHandle
QPDF_DLL
bool isOrHasName(std::string const&);
// Return the QPDF object that owns an indirect object. Returns
// null for a direct object.
QPDF_DLL
QPDF* getOwningQPDF();
// Create a shallow copy of an object as a direct object. Since
// this is a shallow copy, for dictionaries and arrays, any keys
// or items that were indirect objects will still be indirect
@ -453,10 +458,17 @@ class QPDFObjectHandle
QPDF_DLL
void assertReserved();
QPDF_DLL
void assertIndirect();
QPDF_DLL
void assertScalar();
QPDF_DLL
void assertNumber();
QPDF_DLL
bool isPageObject();
QPDF_DLL
bool isPagesObject();
QPDF_DLL
void assertPageObject();


@ -348,6 +348,23 @@ QPDF::ObjGen::operator<(ObjGen const& rhs) const
((this->obj == rhs.obj) && (this->gen < rhs.gen)));
}
void
QPDF::CopiedStreamDataProvider::provideStreamData(
int objid, int generation, Pipeline* pipeline)
{
QPDFObjectHandle foreign_stream =
this->foreign_streams[ObjGen(objid, generation)];
foreign_stream.pipeStreamData(pipeline, false, false, false);
}
void
QPDF::CopiedStreamDataProvider::registerForeignStream(
ObjGen const& local_og, QPDFObjectHandle foreign_stream)
{
this->foreign_streams[local_og] = foreign_stream;
}
std::string const&
QPDF::QPDFVersion()
{
@ -369,6 +386,8 @@ QPDF::QPDF() :
cf_file(e_none),
cached_key_objid(0),
cached_key_generation(0),
pushed_inherited_attributes_to_pages(false),
copied_stream_data_provider(0),
first_xref_item_offset(0),
uncompressed_after_compressed(false)
{
@ -2067,6 +2086,244 @@ QPDF::replaceReserved(QPDFObjectHandle reserved,
replacement);
}
QPDFObjectHandle
QPDF::copyForeignObject(QPDFObjectHandle foreign)
{
return copyForeignObject(foreign, false);
}
QPDFObjectHandle
QPDF::copyForeignObject(QPDFObjectHandle foreign, bool allow_page)
{
if (! foreign.isIndirect())
{
QTC::TC("qpdf", "QPDF copyForeign direct");
throw std::logic_error(
"QPDF::copyForeign called with direct object handle");
}
QPDF* other = foreign.getOwningQPDF();
if (other == this)
{
QTC::TC("qpdf", "QPDF copyForeign not foreign");
throw std::logic_error(
"QPDF::copyForeign called with object from this QPDF");
}
ObjCopier& obj_copier = this->object_copiers[other];
if (! obj_copier.visiting.empty())
{
throw std::logic_error("obj_copier.visiting is not empty"
" at the beginning of copyForeignObject");
}
// Make sure we have an object in this file for every referenced
// object in the old file. obj_copier.object_map maps foreign
// ObjGen to local objects. For everything new that we have to
// copy, the local object will be a reservation, unless it is a
// stream, in which case the local object will already be a
// stream.
reserveObjects(foreign, obj_copier, true);
if (! obj_copier.visiting.empty())
{
throw std::logic_error("obj_copier.visiting is not empty"
" after reserving objects");
}
// Copy any new objects and replace the reservations.
for (std::vector<QPDFObjectHandle>::iterator iter =
obj_copier.to_copy.begin();
iter != obj_copier.to_copy.end(); ++iter)
{
QPDFObjectHandle& to_copy = *iter;
QPDFObjectHandle copy =
replaceForeignIndirectObjects(to_copy, obj_copier, true);
if (! to_copy.isStream())
{
ObjGen og(to_copy.getObjectID(), to_copy.getGeneration());
replaceReserved(obj_copier.object_map[og], copy);
}
}
obj_copier.to_copy.clear();
return obj_copier.object_map[ObjGen(foreign.getObjectID(),
foreign.getGeneration())];
}
void
QPDF::reserveObjects(QPDFObjectHandle foreign, ObjCopier& obj_copier,
bool top)
{
if (foreign.isReserved())
{
throw std::logic_error(
"QPDF: attempting to copy a foreign reserved object");
}
if (foreign.isPagesObject())
{
QTC::TC("qpdf", "QPDF not copying pages object");
return;
}
if ((! top) && foreign.isPageObject())
{
QTC::TC("qpdf", "QPDF not crossing page boundary");
return;
}
if (foreign.isIndirect())
{
ObjGen foreign_og(foreign.getObjectID(), foreign.getGeneration());
if (obj_copier.visiting.find(foreign_og) != obj_copier.visiting.end())
{
QTC::TC("qpdf", "QPDF loop reserving objects");
return;
}
QTC::TC("qpdf", "QPDF copy indirect");
obj_copier.visiting.insert(foreign_og);
std::map<ObjGen, QPDFObjectHandle>::iterator mapping =
obj_copier.object_map.find(foreign_og);
if (mapping == obj_copier.object_map.end())
{
obj_copier.to_copy.push_back(foreign);
QPDFObjectHandle reservation;
if (foreign.isStream())
{
reservation = QPDFObjectHandle::newStream(this);
}
else
{
reservation = QPDFObjectHandle::newReserved(this);
}
obj_copier.object_map[foreign_og] = reservation;
}
}
if (foreign.isArray())
{
QTC::TC("qpdf", "QPDF reserve array");
int n = foreign.getArrayNItems();
for (int i = 0; i < n; ++i)
{
reserveObjects(foreign.getArrayItem(i), obj_copier, false);
}
}
else if (foreign.isDictionary())
{
QTC::TC("qpdf", "QPDF reserve dictionary");
std::set<std::string> keys = foreign.getKeys();
for (std::set<std::string>::iterator iter = keys.begin();
iter != keys.end(); ++iter)
{
reserveObjects(foreign.getKey(*iter), obj_copier, false);
}
}
else if (foreign.isStream())
{
QTC::TC("qpdf", "QPDF reserve stream");
reserveObjects(foreign.getDict(), obj_copier, false);
}
if (foreign.isIndirect())
{
ObjGen foreign_og(foreign.getObjectID(), foreign.getGeneration());
obj_copier.visiting.erase(foreign_og);
}
}
QPDFObjectHandle
QPDF::replaceForeignIndirectObjects(
QPDFObjectHandle foreign, ObjCopier& obj_copier, bool top)
{
QPDFObjectHandle result;
if ((! top) && foreign.isIndirect())
{
QTC::TC("qpdf", "QPDF replace indirect");
ObjGen foreign_og(foreign.getObjectID(), foreign.getGeneration());
std::map<ObjGen, QPDFObjectHandle>::iterator mapping =
obj_copier.object_map.find(foreign_og);
if (mapping == obj_copier.object_map.end())
{
// This case would occur if this is a reference to a Page
// or Pages object that we didn't traverse into.
QTC::TC("qpdf", "QPDF replace foreign indirect with null");
result = QPDFObjectHandle::newNull();
}
else
{
result = obj_copier.object_map[foreign_og];
}
}
else if (foreign.isArray())
{
QTC::TC("qpdf", "QPDF replace array");
result = QPDFObjectHandle::newArray();
int n = foreign.getArrayNItems();
for (int i = 0; i < n; ++i)
{
result.appendItem(
replaceForeignIndirectObjects(
foreign.getArrayItem(i), obj_copier, false));
}
}
else if (foreign.isDictionary())
{
QTC::TC("qpdf", "QPDF replace dictionary");
result = QPDFObjectHandle::newDictionary();
std::set<std::string> keys = foreign.getKeys();
for (std::set<std::string>::iterator iter = keys.begin();
iter != keys.end(); ++iter)
{
result.replaceKey(
*iter,
replaceForeignIndirectObjects(
foreign.getKey(*iter), obj_copier, false));
}
}
else if (foreign.isStream())
{
QTC::TC("qpdf", "QPDF replace stream");
ObjGen foreign_og(foreign.getObjectID(), foreign.getGeneration());
result = obj_copier.object_map[foreign_og];
result.assertStream();
QPDFObjectHandle dict = result.getDict();
QPDFObjectHandle old_dict = foreign.getDict();
std::set<std::string> keys = old_dict.getKeys();
for (std::set<std::string>::iterator iter = keys.begin();
iter != keys.end(); ++iter)
{
dict.replaceKey(
*iter,
replaceForeignIndirectObjects(
old_dict.getKey(*iter), obj_copier, false));
}
if (this->copied_stream_data_provider == 0)
{
this->copied_stream_data_provider = new CopiedStreamDataProvider();
this->copied_streams = this->copied_stream_data_provider;
}
ObjGen local_og(result.getObjectID(), result.getGeneration());
this->copied_stream_data_provider->registerForeignStream(
local_og, foreign);
result.replaceStreamData(this->copied_streams,
dict.getKey("/Filter"),
dict.getKey("/DecodeParms"));
}
else
{
foreign.assertScalar();
result = foreign;
result.makeDirect();
}
if (top && (! result.isStream()) && result.isIndirect())
{
throw std::logic_error("replacement for foreign object is indirect");
}
return result;
}
void
QPDF::swapObjects(int objid1, int generation1, int objid2, int generation2)
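
The copy added in this hunk works in two passes: reserveObjects() first gives every reachable foreign object a local placeholder, and replaceForeignIndirectObjects() then fills the placeholders, rewriting references through object_map. The sketch below illustrates that reserve-then-fill pattern on a toy object graph, assuming objects are plain structs that refer to each other by integer id. Node, Doc, Copier, and copyForeign are hypothetical names, not qpdf types. It also simplifies the original in two ways: it uses the persistent object_map to stop at cycles instead of a separate visiting set, and it has no notion of pages, streams, or direct objects.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model, not qpdf types: each object has a label and refers
// to other objects in the same document by integer id.
struct Node
{
    std::string label;
    std::vector<int> refs;
};

struct Doc
{
    std::map<int, Node> objects;
    int next_id;
    Doc() : next_id(1) {}

    // Allocate an id now; the contents are filled in later.
    int reserve()
    {
        int id = next_id++;
        objects[id] = Node();
        return id;
    }
};

// Per-source-document state, kept between calls like ObjCopier in the diff.
struct Copier
{
    std::map<int, int> object_map; // foreign id -> local id
    std::vector<int> to_copy;      // foreign ids reserved but not yet filled
};

// Phase 1: reserve a local id for every reachable foreign object.
void reserveObjects(Doc const& src, int id, Doc& dst, Copier& c)
{
    if (c.object_map.count(id))
    {
        return; // already mapped; this check also terminates cycles
    }
    c.object_map[id] = dst.reserve();
    c.to_copy.push_back(id);
    std::vector<int> const& refs = src.objects.at(id).refs;
    for (std::size_t i = 0; i < refs.size(); ++i)
    {
        reserveObjects(src, refs[i], dst, c);
    }
}

// Phase 2: fill each reservation, rewriting references through object_map.
int copyForeign(Doc const& src, int id, Doc& dst, Copier& c)
{
    reserveObjects(src, id, dst, c);
    for (std::size_t i = 0; i < c.to_copy.size(); ++i)
    {
        Node const& foreign = src.objects.at(c.to_copy[i]);
        Node copy;
        copy.label = foreign.label;
        for (std::size_t j = 0; j < foreign.refs.size(); ++j)
        {
            copy.refs.push_back(c.object_map.at(foreign.refs[j]));
        }
        dst.objects[c.object_map.at(c.to_copy[i])] = copy;
    }
    c.to_copy.clear();
    return c.object_map.at(id);
}
```

Reserving ids before copying any contents is what lets two objects that refer to each other be copied without infinite recursion. Since the mapping persists in Copier, a second copyForeign() call for an object that is already mapped returns the existing copy and does not pick up later changes to the source, which matches the caveat in the QPDF.hh comment about mutating before copying.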


@ -355,6 +355,14 @@ QPDFObjectHandle::isOrHasName(std::string const& value)
return false;
}
// Indirect object accessors
QPDF*
QPDFObjectHandle::getOwningQPDF()
{
// Will be null for direct objects
return this->qpdf;
}
// Dictionary mutators
void
@ -784,6 +792,7 @@ QPDFObjectHandle::makeDirectInternal(std::set<int>& visited)
}
dereference();
this->qpdf = 0;
this->objid = 0;
this->generation = 0;
@ -945,6 +954,16 @@ QPDFObjectHandle::assertReserved()
assertType("Reserved", isReserved());
}
void
QPDFObjectHandle::assertIndirect()
{
if (! isIndirect())
{
throw std::logic_error(
"operation for indirect object attempted on direct object");
}
}
void
QPDFObjectHandle::assertScalar()
{
@ -957,11 +976,24 @@ QPDFObjectHandle::assertNumber()
assertType("Number", isNumber());
}
bool
QPDFObjectHandle::isPageObject()
{
return (this->isDictionary() && this->hasKey("/Type") &&
(this->getKey("/Type").getName() == "/Page"));
}
bool
QPDFObjectHandle::isPagesObject()
{
return (this->isDictionary() && this->hasKey("/Type") &&
(this->getKey("/Type").getName() == "/Pages"));
}
void
QPDFObjectHandle::assertPageObject()
{
-if (! (this->isDictionary() && this->hasKey("/Type") &&
-(this->getKey("/Type").getName() == "/Page")))
+if (! isPageObject())
{
throw std::logic_error("page operation called on non-Page object");
}


@ -232,6 +232,14 @@ QPDF::pushInheritedAttributesToPage(bool allow_changes, bool warn_skipped_keys)
// Traverse pages tree pushing all inherited resources down to the
// page level.
// The record of whether we've done this is cleared by
// updateAllPagesCache(). If we're warning for skipped keys,
// re-traverse unconditionally.
if (this->pushed_inherited_attributes_to_pages && (! warn_skipped_keys))
{
return;
}
// key_ancestors is a mapping of page attribute keys to a stack of
// Pages nodes that contain values for them.
std::map<std::string, std::vector<QPDFObjectHandle> > key_ancestors;
@ -240,6 +248,7 @@ QPDF::pushInheritedAttributesToPage(bool allow_changes, bool warn_skipped_keys)
this->trailer.getKey("/Root").getKey("/Pages"),
key_ancestors, this->all_pages, allow_changes, warn_skipped_keys);
assert(key_ancestors.empty());
this->pushed_inherited_attributes_to_pages = true;
}
void
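
The change in this hunk makes pushInheritedAttributesToPage() return early once it has run, unless warnings for skipped keys were requested, and the companion change to updateAllPagesCache() clears the flag. A minimal sketch of that caching behavior follows. PagesTree and traversalCount() are hypothetical and exist only to make the number of traversals observable; the real method walks the pages tree instead of incrementing a counter.

```cpp
#include <cassert>

// Hypothetical toy class, not the qpdf implementation: it only counts how
// often the expensive traversal actually runs.
class PagesTree
{
  public:
    PagesTree() : pushed(false), traversals(0) {}

    void pushInheritedAttributesToPage(bool warn_skipped_keys = false)
    {
        // Skip the walk if it has been done, unless warnings were requested.
        if (this->pushed && (! warn_skipped_keys))
        {
            return;
        }
        ++this->traversals; // stands in for walking the pages tree
        this->pushed = true;
    }

    // Mirrors the reset added to updateAllPagesCache() in this commit.
    void updateAllPagesCache()
    {
        this->pushed = false;
    }

    int traversalCount() const
    {
        return this->traversals;
    }

  private:
    bool pushed;
    int traversals;
};
```

In this sketch, calling pushInheritedAttributesToPage() twice in a row performs one traversal, while a call with warn_skipped_keys set or a call after updateAllPagesCache() performs another.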


@ -89,6 +89,7 @@ QPDF::updateAllPagesCache()
QTC::TC("qpdf", "QPDF updateAllPagesCache");
this->all_pages.clear();
this->pageobj_to_pages_pos.clear();
this->pushed_inherited_attributes_to_pages = false;
getAllPages();
}
@ -161,6 +162,12 @@ QPDF::insertPage(QPDFObjectHandle newpage, int pos)
QTC::TC("qpdf", "QPDF insert non-indirect page");
newpage = this->makeIndirectObject(newpage);
}
else if (newpage.getOwningQPDF() != this)
{
QTC::TC("qpdf", "QPDF insert foreign page");
newpage.getOwningQPDF()->pushInheritedAttributesToPage();
newpage = this->copyForeignObject(newpage, true);
}
else
{
QTC::TC("qpdf", "QPDF insert indirect page");
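
With this change insertPage() distinguishes three kinds of incoming page: a direct object, an indirect object owned by another QPDF, and an indirect object already in this QPDF. The sketch below shows that three-way decision in isolation, assuming a handle records its owner as a pointer that is null for direct objects, as getOwningQPDF() is documented to do. Document, Handle, PageAction, and classifyNewPage are hypothetical names introduced for illustration and do not appear in qpdf.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins, not qpdf types.
struct Document {};

struct Handle
{
    Document* owner; // null for a direct object, like getOwningQPDF()
};

enum PageAction
{
    make_indirect, // direct object: wrap it in a new indirect object
    copy_foreign,  // owned by another document: copy it across first
    use_as_is      // already an indirect object in this document
};

PageAction classifyNewPage(Handle const& page, Document const* self)
{
    if (page.owner == NULL)
    {
        return make_indirect;
    }
    if (page.owner != self)
    {
        return copy_foreign;
    }
    return use_as_is;
}
```

The order of the checks matters in this sketch: the direct case is tested first so that a null owner is never mistaken for a foreign document.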


@ -218,3 +218,18 @@ QPDF unknown key not inherited 0
QPDF_Stream provider length not provided 0
QPDF_Stream unknown stream length 0
QPDF replaceReserved 0
QPDF copyForeign direct 0
QPDF copyForeign not foreign 0
QPDF copy indirect 0
QPDF loop reserving objects 0
QPDF replace indirect 0
QPDF replace array 0
QPDF replace dictionary 0
QPDF replace stream 0
QPDF reserve array 0
QPDF reserve dictionary 0
QPDF reserve stream 0
QPDF not crossing page boundary 0
QPDF replace foreign indirect with null 0
QPDF not copying pages object 0
QPDF insert foreign page 0


@ -379,6 +379,27 @@ $td->runtest("check output",
{$td->FILE => "a.pdf"},
{$td->FILE => "from-scratch-0.pdf"});
# ----------
$td->notify("--- Copy Foreign Objects ---");
$n_tests += 7;
foreach my $d ([25, 1], [26, 2], [27, 3])
{
my ($testn, $outn) = @$d;
$td->runtest("copy objects $outn",
{$td->COMMAND => "test_driver $testn" .
" copy-foreign-objects-in.pdf"},
{$td->STRING => "test $testn done\n", $td->EXIT_STATUS => 0},
$td->NORMALIZE_NEWLINES);
$td->runtest("check output",
{$td->FILE => "a.pdf"},
{$td->FILE => "copy-foreign-objects-out$outn.pdf"});
}
$td->runtest("copy objects error",
{$td->COMMAND => "test_driver 28 copy-foreign-objects-in.pdf"},
{$td->FILE => "copy-foreign-objects-errors.out",
$td->EXIT_STATUS => 0},
$td->NORMALIZE_NEWLINES);
# ----------
$td->notify("--- Error Condition Tests ---");
# $n_tests incremented after initialization of badfiles below.


@ -0,0 +1,3 @@
logic error: QPDF::copyForeign called with object from this QPDF
logic error: QPDF::copyForeign called with direct object handle
test 28 done


@ -0,0 +1,335 @@
%PDF-1.3
%¿÷¢þ
%QDF-1.0
% This test file is specifically crafted for testing copyForeignObject
% and also for testing addPage when called with a page from another
% file.
% The /QTest key in trailer has pointers to several indirect objects:
% O1, O2, O3 where O1 is an array that contains a dictionary that has
% a key that points to O2, O2 is a dictionary that contains an array
% that points to O1, and O3 is a page object that inherits some
% resource from its parent /Pages and also points to some other page.
% O1 also points to a stream whose dictionary has a key that points to
% another stream whose dictionary points back to the first stream.
1 0 obj
<<
/Pages 2 0 R
/Type /Catalog
>>
endobj
2 0 obj
<<
/Count 5
/Kids [
3 0 R
4 0 R
5 0 R
6 0 R
7 0 R
]
/Rotate 180
/Type /Pages
>>
endobj
%% Page 1
3 0 obj
<<
/Contents 8 0 R
/MediaBox [
0
0
612
792
]
/Parent 2 0 R
/Resources <<
/Font <<
/F1 10 0 R
>>
/ProcSet [
/PDF
/Text
]
>>
/Type /Page
>>
endobj
%% Page 2
4 0 obj
<<
/Contents 11 0 R
/MediaBox [
0
0
612
792
]
/Parent 2 0 R
/Resources <<
/Font <<
/F1 10 0 R
>>
/ProcSet [
/PDF
/Text
]
>>
/Type /Page
>>
endobj
%% Page 3, object O3
5 0 obj
<<
/This-is-O3 true
/Contents 13 0 R
/MediaBox [
0
0
612
792
]
/Parent 2 0 R
/Resources <<
/Font <<
/F1 10 0 R
>>
/ProcSet [
/PDF
/Text
]
>>
/OtherPage 6 0 R
/Type /Page
>>
endobj
%% Page 4
6 0 obj
<<
/This-is-O3-other-page true
/Contents 15 0 R
/MediaBox [
0
0
612
792
]
/Parent 2 0 R
/Resources <<
/Font <<
/F1 10 0 R
>>
/ProcSet [
/PDF
/Text
]
>>
/Type /Page
>>
endobj
%% Page 5
7 0 obj
<<
/Contents 17 0 R
/MediaBox [
0
0
612
792
]
/Parent 2 0 R
/Resources <<
/Font <<
/F1 10 0 R
>>
/ProcSet [
/PDF
/Text
]
>>
/Type /Page
>>
endobj
%% Contents for page 1
8 0 obj
<<
/Length 9 0 R
>>
stream
BT /F1 15 Tf 72 720 Td (Original page 0) Tj ET
endstream
endobj
9 0 obj
47
endobj
10 0 obj
<<
/BaseFont /Times-Roman
/Encoding /WinAnsiEncoding
/Subtype /Type1
/Type /Font
>>
endobj
%% Contents for page 2
11 0 obj
<<
/Length 12 0 R
>>
stream
BT /F1 15 Tf 72 720 Td (Original page 1) Tj ET
endstream
endobj
12 0 obj
47
endobj
%% Contents for page 3
13 0 obj
<<
/Length 14 0 R
>>
stream
BT /F1 15 Tf 72 720 Td (Original page 2) Tj ET
endstream
endobj
14 0 obj
47
endobj
%% Contents for page 4
15 0 obj
<<
/Length 16 0 R
>>
stream
BT /F1 15 Tf 72 720 Td (Original page 3) Tj ET
endstream
endobj
16 0 obj
47
endobj
%% Contents for page 5
17 0 obj
<<
/Length 18 0 R
>>
stream
BT /F1 15 Tf 72 720 Td (Original page 4) Tj ET
endstream
endobj
18 0 obj
47
endobj
% O1
19 0 obj
[
/This-is-O1
/potato
<< /O2 [3.14159 << /O2 20 0 R >> 2.17828 ] >>
/salad
/O2 20 0 R
/Stream1 21 0 R
]
endobj
% O2
20 0 obj
<<
/This-is-O2 true
/K1 [2.236 /O1 19 0 R 1.732]
/O1 19 0 R
>>
endobj
% stream1
21 0 obj
<<
/This-is-Stream1 true
/Length 22 0 R
/Stream2 23 0 R
>>
stream
This is stream 1.
endstream
endobj
22 0 obj
18
endobj
% stream2
23 0 obj
<<
/This-is-Stream2 true
/Length 24 0 R
/Stream1 21 0 R
>>
stream
This is stream 2.
endstream
endobj
24 0 obj
18
endobj
% QTest
25 0 obj
<< /This-is-QTest true /O1 19 0 R /O2 20 0 R /O3 5 0 R >>
endobj
xref
0 26
0000000000 65535 f
0000000655 00000 n
0000000709 00000 n
0000000845 00000 n
0000001073 00000 n
0000001313 00000 n
0000001580 00000 n
0000001839 00000 n
0000002081 00000 n
0000002183 00000 n
0000002202 00000 n
0000002334 00000 n
0000002438 00000 n
0000002481 00000 n
0000002585 00000 n
0000002628 00000 n
0000002732 00000 n
0000002775 00000 n
0000002879 00000 n
0000002904 00000 n
0000003042 00000 n
0000003138 00000 n
0000003255 00000 n
0000003285 00000 n
0000003402 00000 n
0000003430 00000 n
trailer <<
/Root 1 0 R
/Size 26
/QTest 25 0 R
/ID [<d15f7aca3be584a96c1c94adb0931e71><9adb6b2fdb22e857340f7103917b16e4>]
>>
startxref
3505
%%EOF

@ -0,0 +1,66 @@
%PDF-1.3
%¿÷¢þ
1 0 obj
<< /Pages 3 0 R /Type /Catalog >>
endobj
2 0 obj
<< /O1 4 0 R /O2 5 0 R /This-is-QTest true >>
endobj
3 0 obj
<< /Count 1 /Kids [ 6 0 R ] /Type /Pages >>
endobj
4 0 obj
[ /This-is-O1 /potato << /O2 [ 3.14159 << /O2 5 0 R >> 2.17828 ] >> /salad /O2 5 0 R /Stream1 7 0 R ]
endobj
5 0 obj
<< /K1 [ 2.236 /O1 4 0 R 1.732 ] /O1 4 0 R /This-is-O2 true >>
endobj
6 0 obj
<< /Contents 8 0 R /MediaBox [ 0 0 612 792 ] /Parent 3 0 R /Resources << /Font << /F1 9 0 R >> /ProcSet 10 0 R >> /Type /Page >>
endobj
7 0 obj
<< /Stream2 11 0 R /This-is-Stream1 true /Length 18 >>
stream
This is stream 1.
endstream
endobj
8 0 obj
<< /Length 44 >>
stream
BT
/F1 24 Tf
72 720 Td
(Potato) Tj
ET
endstream
endobj
9 0 obj
<< /BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font >>
endobj
10 0 obj
[ /PDF /Text ]
endobj
11 0 obj
<< /Stream1 7 0 R /This-is-Stream2 true /Length 18 >>
stream
This is stream 2.
endstream
endobj
xref
0 12
0000000000 65535 f
0000000015 00000 n
0000000064 00000 n
0000000125 00000 n
0000000184 00000 n
0000000301 00000 n
0000000379 00000 n
0000000523 00000 n
0000000628 00000 n
0000000721 00000 n
0000000828 00000 n
0000000859 00000 n
trailer << /QTest 2 0 R /Root 1 0 R /Size 12 /ID [<31415926535897932384626433832795><31415926535897932384626433832795>] >>
startxref
964
%%EOF

@ -0,0 +1,81 @@
%PDF-1.3
%¿÷¢þ
1 0 obj
<< /Pages 3 0 R /Type /Catalog >>
endobj
2 0 obj
<< /O1 4 0 R /O2 5 0 R /O3 6 0 R /This-is-QTest true >>
endobj
3 0 obj
<< /Count 2 /Kids [ 7 0 R 6 0 R ] /Type /Pages >>
endobj
4 0 obj
[ /This-is-O1 /potato << /O2 [ 3.14159 << /O2 5 0 R >> 2.17828 ] >> /salad /O2 5 0 R /Stream1 8 0 R ]
endobj
5 0 obj
<< /K1 [ 2.236 /O1 4 0 R 1.732 ] /O1 4 0 R /This-is-O2 true >>
endobj
6 0 obj
<< /Contents 9 0 R /MediaBox [ 0 0 612 792 ] /Parent 3 0 R /Resources << /Font << /F1 10 0 R >> /ProcSet [ /PDF /Text ] >> /Rotate 180 /This-is-O3 true /Type /Page >>
endobj
7 0 obj
<< /Contents 11 0 R /MediaBox [ 0 0 612 792 ] /Parent 3 0 R /Resources << /Font << /F1 12 0 R >> /ProcSet 13 0 R >> /Type /Page >>
endobj
8 0 obj
<< /Stream2 14 0 R /This-is-Stream1 true /Length 18 >>
stream
This is stream 1.
endstream
endobj
9 0 obj
<< /Length 47 >>
stream
BT /F1 15 Tf 72 720 Td (Original page 2) Tj ET
endstream
endobj
10 0 obj
<< /BaseFont /Times-Roman /Encoding /WinAnsiEncoding /Subtype /Type1 /Type /Font >>
endobj
11 0 obj
<< /Length 44 >>
stream
BT
/F1 24 Tf
72 720 Td
(Potato) Tj
ET
endstream
endobj
12 0 obj
<< /BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font >>
endobj
13 0 obj
[ /PDF /Text ]
endobj
14 0 obj
<< /Stream1 8 0 R /This-is-Stream2 true /Length 18 >>
stream
This is stream 2.
endstream
endobj
xref
0 15
0000000000 65535 f
0000000015 00000 n
0000000064 00000 n
0000000135 00000 n
0000000200 00000 n
0000000317 00000 n
0000000395 00000 n
0000000577 00000 n
0000000723 00000 n
0000000828 00000 n
0000000924 00000 n
0000001024 00000 n
0000001118 00000 n
0000001226 00000 n
0000001257 00000 n
trailer << /QTest 2 0 R /Root 1 0 R /Size 15 /ID [<31415926535897932384626433832795><31415926535897932384626433832795>] >>
startxref
1362
%%EOF

@ -0,0 +1,92 @@
%PDF-1.3
%¿÷¢þ
1 0 obj
<< /Pages 3 0 R /Type /Catalog >>
endobj
2 0 obj
<< /O1 4 0 R /O2 5 0 R /O3 6 0 R /This-is-QTest true >>
endobj
3 0 obj
<< /Count 3 /Kids [ 7 0 R 8 0 R 6 0 R ] /Type /Pages >>
endobj
4 0 obj
[ /This-is-O1 /potato << /O2 [ 3.14159 << /O2 5 0 R >> 2.17828 ] >> /salad /O2 5 0 R /Stream1 9 0 R ]
endobj
5 0 obj
<< /K1 [ 2.236 /O1 4 0 R 1.732 ] /O1 4 0 R /This-is-O2 true >>
endobj
6 0 obj
<< /Contents 10 0 R /MediaBox [ 0 0 612 792 ] /OtherPage 8 0 R /Parent 3 0 R /Resources << /Font << /F1 11 0 R >> /ProcSet [ /PDF /Text ] >> /Rotate 180 /This-is-O3 true /Type /Page >>
endobj
7 0 obj
<< /Contents 12 0 R /MediaBox [ 0 0 612 792 ] /Parent 3 0 R /Resources << /Font << /F1 13 0 R >> /ProcSet 14 0 R >> /Type /Page >>
endobj
8 0 obj
<< /Contents 15 0 R /MediaBox [ 0 0 612 792 ] /Parent 3 0 R /Resources << /Font << /F1 11 0 R >> /ProcSet [ /PDF /Text ] >> /Rotate 180 /This-is-O3-other-page true /Type /Page >>
endobj
9 0 obj
<< /Stream2 16 0 R /This-is-Stream1 true /Length 18 >>
stream
This is stream 1.
endstream
endobj
10 0 obj
<< /Length 47 >>
stream
BT /F1 15 Tf 72 720 Td (Original page 2) Tj ET
endstream
endobj
11 0 obj
<< /BaseFont /Times-Roman /Encoding /WinAnsiEncoding /Subtype /Type1 /Type /Font >>
endobj
12 0 obj
<< /Length 44 >>
stream
BT
/F1 24 Tf
72 720 Td
(Potato) Tj
ET
endstream
endobj
13 0 obj
<< /BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font >>
endobj
14 0 obj
[ /PDF /Text ]
endobj
15 0 obj
<< /Length 47 >>
stream
BT /F1 15 Tf 72 720 Td (Original page 3) Tj ET
endstream
endobj
16 0 obj
<< /Stream1 9 0 R /This-is-Stream2 true /Length 18 >>
stream
This is stream 2.
endstream
endobj
xref
0 17
0000000000 65535 f
0000000015 00000 n
0000000064 00000 n
0000000135 00000 n
0000000206 00000 n
0000000323 00000 n
0000000401 00000 n
0000000601 00000 n
0000000747 00000 n
0000000941 00000 n
0000001046 00000 n
0000001143 00000 n
0000001243 00000 n
0000001337 00000 n
0000001445 00000 n
0000001476 00000 n
0000001573 00000 n
trailer << /QTest 2 0 R /Root 1 0 R /Size 17 /ID [<31415926535897932384626433832795><31415926535897932384626433832795>] >>
startxref
1678
%%EOF

@ -916,6 +916,89 @@ void runtest(int n, char const* filename)
w.setStreamDataMode(qpdf_s_preserve);
w.write();
}
else if (n == 25)
{
// The copy object tests are designed to work with a specific
// file. See the test suite for the name of that file, and see
// the comments in the file itself for a description of its
// structure.
// Copy qtest without crossing page boundaries. Should get O1
// and O2 and their streams but not O3 or any other pages.
QPDF newpdf;
newpdf.processFile("minimal.pdf");
QPDFObjectHandle qtest = pdf.getTrailer().getKey("/QTest");
newpdf.getTrailer().replaceKey(
"/QTest", newpdf.copyForeignObject(qtest));
QPDFWriter w(newpdf, "a.pdf");
w.setStaticID(true);
w.setStreamDataMode(qpdf_s_preserve);
w.write();
}
else if (n == 26)
{
// Copy the O3 page using addPage. Copy qtest without
// crossing page boundaries. In addition to previous results,
// should get page O3 but no other pages including the page
// that O3 points to. Also, inherited objects will have been
// pushed down and will be preserved.
QPDF newpdf;
newpdf.processFile("minimal.pdf");
QPDFObjectHandle qtest = pdf.getTrailer().getKey("/QTest");
QPDFObjectHandle O3 = qtest.getKey("/O3");
newpdf.addPage(O3, false);
newpdf.getTrailer().replaceKey(
"/QTest", newpdf.copyForeignObject(qtest));
QPDFWriter w(newpdf, "a.pdf");
w.setStaticID(true);
w.setStreamDataMode(qpdf_s_preserve);
w.write();
}
else if (n == 27)
{
// Copy O3 and the page O3 refers to before copying qtest.
// Should get qtest plus only the O3 page and the page that O3
// points to. Inherited objects should be preserved.
QPDF newpdf;
newpdf.processFile("minimal.pdf");
QPDFObjectHandle qtest = pdf.getTrailer().getKey("/QTest");
QPDFObjectHandle O3 = qtest.getKey("/O3");
newpdf.addPage(O3.getKey("/OtherPage"), false);
newpdf.addPage(O3, false);
newpdf.getTrailer().replaceKey(
"/QTest", newpdf.copyForeignObject(qtest));
QPDFWriter w(newpdf, "a.pdf");
w.setStaticID(true);
w.setStreamDataMode(qpdf_s_preserve);
w.write();
}
else if (n == 28)
{
// Copy foreign object errors
try
{
pdf.copyForeignObject(pdf.getTrailer().getKey("/QTest"));
std::cout << "oops -- didn't throw" << std::endl;
}
catch (std::logic_error const& e)
{
std::cout << "logic error: " << e.what() << std::endl;
}
try
{
pdf.copyForeignObject(QPDFObjectHandle::newInteger(1));
std::cout << "oops -- didn't throw" << std::endl;
}
catch (std::logic_error const& e)
{
std::cout << "logic error: " << e.what() << std::endl;
}
}
else
{
throw std::runtime_error(std::string("invalid test ") +