Implement TokenFilter and refactor Pl_QPDFTokenizer

Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
Jay Berkenbilt 2018-02-02 18:21:34 -05:00
parent b8723e97f4
commit 9910104442
16 changed files with 635 additions and 119 deletions


@ -107,6 +107,49 @@
applications that use page-level APIs in QPDFObjectHandle to be
more tolerant of certain types of damaged files.
* Add QPDFObjectHandle::TokenFilter class and methods to use it to
perform lexical filtering on content streams. You can call
QPDFObjectHandle::addTokenFilter on a stream object, or you can call
the higher level QPDFObjectHandle::addContentTokenFilter on a page
object to cause the stream's contents to be passed through a token
filter while being retrieved by QPDFWriter or any other consumer.
For details on using TokenFilter, please see comments in
QPDFObjectHandle.hh.
* Enhance the (type, string) QPDFTokenizer::Token constructor to
initialize a raw value in addition to a value. Tokens have a
value, which is a canonical representation, and a raw value. For
all tokens except strings and names, the raw value and the value
are the same. For strings, the value excludes the outer delimiters
and has non-printing characters normalized. For names, the value
resolves non-printing characters. In order to better facilitate
token filters that mostly preserve contents and to enable
developers to be mostly unconcerned about the nuances of token
values and raw values, creating string and name tokens now
properly handles this subtlety of values and raw values. When
constructing string tokens, take care to avoid passing in the
outer delimiters. This has always been the case, but it is now
clarified in comments in QPDFObjectHandle.hh::TokenFilter. This
has no impact on any existing code unless there's some code
somewhere that was relying on Token::getRawValue() returning an
empty string for a manually constructed token. The token class's
operator== method still only looks at type and value, not raw
value. For example, string tokens for <41> and (A) would still be
equal because both are representations of the string "A".
* Add QPDFObjectHandle::isDataModified method. This method just
returns true if addTokenFilter has been called on the stream. It
enables a caller to determine whether it is safe to optimize away
piping of stream data in cases where the input and output are
expected to be the same. QPDFWriter uses this internally to skip
the optimization of not re-compressing already compressed streams
if addTokenFilter has been called. Most developers will not have
to worry about this as it is used internally in the library in the
places that need it. If you are manually retrieving stream data
with QPDFObjectHandle::getStreamData or
QPDFObjectHandle::pipeStreamData, you don't need to worry about
this at all.
2018-02-04 Jay Berkenbilt <ejb@ql.org>
* Add QPDFWriter::setLinearizationPass1Filename method and
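
The ChangeLog entry above distinguishes a token's value from its raw value and notes that operator== ignores the raw value. The following is a minimal sketch of that rule only. SketchToken and TokType are hypothetical stand-ins written for illustration; they are not the real QPDFTokenizer::Token.

```cpp
#include <string>

// Stand-in types for illustration only; NOT the real qpdf classes.
enum class TokType { Bad, String, Name, Word };

struct SketchToken
{
    TokType type;
    std::string value;      // canonical form, e.g. A
    std::string raw_value;  // form as written in the stream, e.g. (A) or <41>

    // Mirrors the documented rule: compare type and value, ignore the
    // raw value. A bad token never compares equal to anything.
    bool operator==(SketchToken const& rhs) const
    {
        return (type != TokType::Bad) &&
               (type == rhs.type) &&
               (value == rhs.value);
    }
};
```

Under this rule, SketchToken{TokType::String, "A", "<41>"} and SketchToken{TokType::String, "A", "(A)"} compare equal even though they would be written to output differently.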


@ -35,6 +35,7 @@
#include <qpdf/PointerHolder.hh>
#include <qpdf/Buffer.hh>
#include <qpdf/InputSource.hh>
#include <qpdf/QPDFTokenizer.hh>
#include <qpdf/QPDFObject.hh>
@ -76,6 +77,66 @@ class QPDFObjectHandle
Pipeline* pipeline) = 0;
};
// The TokenFilter class provides a way to filter content streams
// in a lexically aware fashion. TokenFilters can be attached to
// streams using the addTokenFilter or addContentTokenFilter
// methods. The handleToken method is called for each token,
// including the eof token, and then handleEOF is called at the
// very end. Handlers may call write (or writeToken) to pass data
// downstream. The finish() method must be called exactly one time
// to ensure that any written data is flushed out. The default
// handleEOF calls finish. If you override handleEOF, you must
// ensure that finish() is called either there or in response to
// whatever event causes you to terminate creation of output.
// Failure to call finish() may result in some of the data you
// have written being lost. You should not rely on a destructor
// for calling finish() since the destructor call may occur later
// than you expect. Please see examples/token-filters.cc for
// examples of using TokenFilters.
//
// Please note that when you call token.getValue() on a token of
// type tt_string, you get the string value without any
// delimiters. token.getRawValue() will return something suitable
// for being written to output, or calling writeToken with a
// string token will also work. The correct way to construct a
// string token that would write the literal value (str) is
// QPDFTokenizer::Token(QPDFTokenizer::tt_string, "str").
class TokenFilter
{
public:
QPDF_DLL
TokenFilter()
{
}
QPDF_DLL
virtual ~TokenFilter()
{
}
virtual void handleToken(QPDFTokenizer::Token const&) = 0;
virtual void handleEOF()
{
// If you override handleEOF, you must be sure to call
// finish().
finish();
}
// This is called internally by the qpdf library.
void setPipeline(Pipeline*);
protected:
QPDF_DLL
void write(char const* data, size_t len);
QPDF_DLL
void write(std::string const& str);
QPDF_DLL
void writeToken(QPDFTokenizer::Token const&);
QPDF_DLL
void finish();
private:
Pipeline* pipeline;
};
// This class is used by parse to decrypt strings when reading an
// object that contains encrypted strings.
class StringDecrypter
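
The TokenFilter comments in this hunk describe a contract: handleToken is called per token, handleEOF is called once at the end, and finish() must be called exactly once. The sketch below models that contract without linking against qpdf. SketchToken, SketchSink, SketchFilter, ReplaceString, and run_replace_sketch are all hypothetical names invented here, and the sink stands in for the downstream Pipeline.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these are the real qpdf classes.
struct SketchToken
{
    bool is_string;
    std::string value;      // string contents without delimiters
    std::string raw_value;  // text suitable for writing to output
};

class SketchSink
{
  public:
    void write(std::string const& s) { data += s; }
    void finish() { ++finish_calls; }
    std::string data;
    int finish_calls = 0;
};

class SketchFilter
{
  public:
    virtual ~SketchFilter() {}
    virtual void handleToken(SketchToken const&) = 0;
    // Default behavior: nothing extra to emit, so just flush.
    virtual void handleEOF() { finish(); }
    void setSink(SketchSink* s) { sink = s; }

  protected:
    void write(std::string const& s)
    {
        if (!sink) { throw std::logic_error("write called before setSink"); }
        sink->write(s);
    }
    void writeToken(SketchToken const& t) { write(t.raw_value); }
    void finish()
    {
        if (!sink) { throw std::logic_error("finish called before setSink"); }
        sink->finish();
    }

  private:
    SketchSink* sink = nullptr;
};

// Replaces the string "Potato" and passes every other token through.
class ReplaceString : public SketchFilter
{
  public:
    void handleToken(SketchToken const& t) override
    {
        if (t.is_string && t.value == "Potato") { write("(Salad)"); }
        else { writeToken(t); }
    }
    void handleEOF() override
    {
        write("\n% done\n");
        finish();  // an overriding handleEOF must call finish() itself
    }
};

// Drives the filter by hand, the way a tokenizing pipeline would.
SketchSink run_replace_sketch()
{
    SketchSink sink;
    ReplaceString f;
    f.setSink(&sink);
    f.handleToken({true, "Potato", "(Potato)"});
    f.handleToken({false, " ", " "});
    f.handleToken({false, "Tj", "Tj"});
    f.handleEOF();
    return sink;
}
```

The design point worth noting is that the filter never owns the downstream object: it only holds a pointer that the library sets before any token arrives, which is why write and finish check for a null sink.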
@ -223,6 +284,23 @@ class QPDFObjectHandle
static void parseContentStream(QPDFObjectHandle stream_or_array,
ParserCallbacks* callbacks);
// Attach a token filter to a page's contents. If the page's
// contents is an array of streams, it is automatically coalesced.
// The token filter is applied to the page's contents as a single
// stream.
QPDF_DLL
void addContentTokenFilter(PointerHolder<TokenFilter> token_filter);
// As of qpdf 8, it is possible to add custom token filters to a
// stream. The tokenized stream data is passed through the token
// filter after all original filters but before content stream
// normalization if requested. This is a low-level interface to
// add it to a stream. You will usually want to call
// addContentTokenFilter instead, which can be applied to a page
// object, and which will automatically handle the case of pages
// whose contents are split across multiple streams.
void addTokenFilter(PointerHolder<TokenFilter> token_filter);
// Type-specific factories
QPDF_DLL
static QPDFObjectHandle newNull();
@ -414,6 +492,13 @@ class QPDFObjectHandle
QPDF_DLL
QPDFObjectHandle getDict();
// If addTokenFilter has been called for this stream, then the
// original data should be considered to be modified. This means we
// should avoid optimizations such as not filtering a stream that
// is already compressed.
QPDF_DLL
bool isDataModified();
// Returns filtered (uncompressed) stream data. Throws an
// exception if the stream is filtered and we can't decode it.
QPDF_DLL
@ -608,7 +693,7 @@ class QPDFObjectHandle
// stream or an array of streams. If this page's content is an
// array, concatenate the streams into a single stream. This can
// be useful when working with files that split content streams in
// arbitary spots, such as in the middle of a token, as that can
// arbitrary spots, such as in the middle of a token, as that can
// confuse some software. You could also call this after calling
// addPageContents.
QPDF_DLL


@ -62,13 +62,8 @@ class QPDFTokenizer
{
public:
Token() : type(tt_bad) {}
Token(token_type_e type, std::string const& value) :
type(type),
value(value)
{
}
QPDF_DLL
Token(token_type_e type, std::string const& value);
Token(token_type_e type, std::string const& value,
std::string raw_value, std::string error_message) :
type(type),
@ -93,7 +88,7 @@ class QPDFTokenizer
{
return this->error_message;
}
bool operator==(Token const& rhs)
bool operator==(Token const& rhs) const
{
// Ignore fields other than type and value
return ((this->type != tt_bad) &&


@ -0,0 +1,77 @@
#include <qpdf/ContentNormalizer.hh>
#include <qpdf/QUtil.hh>
ContentNormalizer::ContentNormalizer()
{
}
ContentNormalizer::~ContentNormalizer()
{
}
void
ContentNormalizer::handleToken(QPDFTokenizer::Token const& token)
{
std::string value = token.getRawValue();
QPDFTokenizer::token_type_e token_type = token.getType();
switch (token_type)
{
case QPDFTokenizer::tt_space:
{
size_t len = value.length();
for (size_t i = 0; i < len; ++i)
{
char ch = value.at(i);
if (ch == '\r')
{
if ((i + 1 < len) && (value.at(i + 1) == '\n'))
{
// ignore
}
else
{
write("\n");
}
}
else
{
write(&ch, 1);
}
}
}
break;
case QPDFTokenizer::tt_string:
// Replacing string and name tokens in this way normalizes
// their representation as this will automatically handle
// quoting of unprintable characters, etc.
writeToken(QPDFTokenizer::Token(
QPDFTokenizer::tt_string, token.getValue()));
break;
case QPDFTokenizer::tt_name:
writeToken(QPDFTokenizer::Token(
QPDFTokenizer::tt_name, token.getValue()));
break;
default:
writeToken(token);
break;
}
value = token.getRawValue();
if (((token_type == QPDFTokenizer::tt_string) ||
(token_type == QPDFTokenizer::tt_name)) &&
((value.find('\r') != std::string::npos) ||
(value.find('\n') != std::string::npos)))
{
write("\n");
}
}
void
ContentNormalizer::handleEOF()
{
finish();
}
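
The tt_space branch of ContentNormalizer::handleToken above rewrites line endings inside whitespace tokens: a CR followed by LF collapses to LF, and a lone CR becomes LF. The same loop is sketched here as a standalone function so the rule can be checked in isolation. The function name normalize_space_sketch is made up for this example and does not exist in qpdf.

```cpp
#include <string>

// Standalone sketch of the whitespace rule; the name is hypothetical.
std::string normalize_space_sketch(std::string const& value)
{
    std::string out;
    std::string::size_type len = value.length();
    for (std::string::size_type i = 0; i < len; ++i)
    {
        char ch = value[i];
        if (ch == '\r')
        {
            // "\r\n": drop the CR; the LF is copied on the next iteration.
            // Lone "\r": emit "\n" in its place.
            if (!((i + 1 < len) && (value[i + 1] == '\n')))
            {
                out += '\n';
            }
        }
        else
        {
            out += ch;
        }
    }
    return out;
}
```

For instance, the input " \r\n " yields " \n ", and "\r\r\n" yields two newlines because only the second CR is paired with an LF.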


@ -1,107 +1,51 @@
#include <qpdf/Pl_QPDFTokenizer.hh>
#include <qpdf/QPDF_String.hh>
#include <qpdf/QPDF_Name.hh>
#include <qpdf/QTC.hh>
#include <qpdf/QUtil.hh>
#include <stdexcept>
#include <string.h>
Pl_QPDFTokenizer::Pl_QPDFTokenizer(char const* identifier, Pipeline* next) :
Pipeline(identifier, next),
just_wrote_nl(false),
Pl_QPDFTokenizer::Members::Members() :
filter(0),
last_char_was_cr(false),
unread_char(false),
char_to_unread('\0')
{
tokenizer.allowEOF();
tokenizer.includeIgnorable();
}
Pl_QPDFTokenizer::Members::~Members()
{
}
Pl_QPDFTokenizer::Pl_QPDFTokenizer(
char const* identifier,
QPDFObjectHandle::TokenFilter* filter)
:
Pipeline(identifier, 0),
m(new Members)
{
m->filter = filter;
m->tokenizer.allowEOF();
m->tokenizer.includeIgnorable();
}
Pl_QPDFTokenizer::~Pl_QPDFTokenizer()
{
}
void
Pl_QPDFTokenizer::writeNext(char const* buf, size_t len)
{
if (len)
{
getNext()->write(QUtil::unsigned_char_pointer(buf), len);
this->just_wrote_nl = (buf[len-1] == '\n');
}
}
void
Pl_QPDFTokenizer::writeToken(QPDFTokenizer::Token& token)
{
std::string value = token.getRawValue();
switch (token.getType())
{
case QPDFTokenizer::tt_space:
{
size_t len = value.length();
for (size_t i = 0; i < len; ++i)
{
char ch = value.at(i);
if (ch == '\r')
{
if ((i + 1 < len) && (value.at(i + 1) == '\n'))
{
// ignore
}
else
{
writeNext("\n", 1);
}
}
else
{
writeNext(&ch, 1);
}
}
}
value.clear();
break;
case QPDFTokenizer::tt_string:
value = QPDF_String(token.getValue()).unparse();
break;
case QPDFTokenizer::tt_name:
value = QPDF_Name(token.getValue()).unparse();
break;
default:
break;
}
writeNext(value.c_str(), value.length());
}
void
Pl_QPDFTokenizer::processChar(char ch)
{
tokenizer.presentCharacter(ch);
this->m->tokenizer.presentCharacter(ch);
QPDFTokenizer::Token token;
if (tokenizer.getToken(token, this->unread_char, this->char_to_unread))
if (this->m->tokenizer.getToken(
token, this->m->unread_char, this->m->char_to_unread))
{
writeToken(token);
std::string value = token.getRawValue();
QPDFTokenizer::token_type_e token_type = token.getType();
if (((token_type == QPDFTokenizer::tt_string) ||
(token_type == QPDFTokenizer::tt_name)) &&
((value.find('\r') != std::string::npos) ||
(value.find('\n') != std::string::npos)))
this->m->filter->handleToken(token);
if ((token.getType() == QPDFTokenizer::tt_word) &&
(token.getValue() == "ID"))
{
writeNext("\n", 1);
}
if ((token.getType() == QPDFTokenizer::tt_word) &&
(token.getValue() == "ID"))
{
QTC::TC("qpdf", "Pl_QPDFTokenizer found ID");
tokenizer.expectInlineImage();
}
this->m->tokenizer.expectInlineImage();
}
}
}
@ -109,10 +53,10 @@ Pl_QPDFTokenizer::processChar(char ch)
void
Pl_QPDFTokenizer::checkUnread()
{
if (this->unread_char)
if (this->m->unread_char)
{
processChar(this->char_to_unread);
if (this->unread_char)
processChar(this->m->char_to_unread);
if (this->m->unread_char)
{
throw std::logic_error(
"INTERNAL ERROR: unread_char still true after processing "
@ -135,20 +79,13 @@ Pl_QPDFTokenizer::write(unsigned char* buf, size_t len)
void
Pl_QPDFTokenizer::finish()
{
this->tokenizer.presentEOF();
this->m->tokenizer.presentEOF();
QPDFTokenizer::Token token;
if (tokenizer.getToken(token, this->unread_char, this->char_to_unread))
if (this->m->tokenizer.getToken(
token, this->m->unread_char, this->m->char_to_unread))
{
writeToken(token);
if (unread_char)
{
if (this->char_to_unread == '\r')
{
this->char_to_unread = '\n';
}
writeNext(&this->char_to_unread, 1);
}
this->m->filter->handleToken(token);
}
getNext()->finish();
this->m->filter->handleEOF();
}


@ -62,6 +62,50 @@ CoalesceProvider::provideStreamData(int, int, Pipeline* p)
concat.manualFinish();
}
void
QPDFObjectHandle::TokenFilter::setPipeline(Pipeline* p)
{
this->pipeline = p;
}
void
QPDFObjectHandle::TokenFilter::write(char const* data, size_t len)
{
if (! this->pipeline)
{
throw std::logic_error(
"TokenFilter::write called before setPipeline");
}
if (len)
{
this->pipeline->write(QUtil::unsigned_char_pointer(data), len);
}
}
void
QPDFObjectHandle::TokenFilter::write(std::string const& str)
{
write(str.c_str(), str.length());
}
void
QPDFObjectHandle::TokenFilter::writeToken(QPDFTokenizer::Token const& token)
{
std::string value = token.getRawValue();
write(value.c_str(), value.length());
}
void
QPDFObjectHandle::TokenFilter::finish()
{
if (! this->pipeline)
{
throw std::logic_error(
"TokenFilter::finish called before setPipeline");
}
this->pipeline->finish();
}
void
QPDFObjectHandle::ParserCallbacks::terminateParsing()
{
@ -508,6 +552,13 @@ QPDFObjectHandle::getDict()
return dynamic_cast<QPDF_Stream*>(obj.getPointer())->getDict();
}
bool
QPDFObjectHandle::isDataModified()
{
assertStream();
return dynamic_cast<QPDF_Stream*>(obj.getPointer())->isDataModified();
}
void
QPDFObjectHandle::replaceDict(QPDFObjectHandle new_dict)
{
@ -1033,6 +1084,21 @@ QPDFObjectHandle::parseContentStream_data(
}
}
void
QPDFObjectHandle::addContentTokenFilter(PointerHolder<TokenFilter> filter)
{
coalesceContentStreams();
this->getKey("/Contents").addTokenFilter(filter);
}
void
QPDFObjectHandle::addTokenFilter(PointerHolder<TokenFilter> filter)
{
assertStream();
return dynamic_cast<QPDF_Stream*>(
obj.getPointer())->addTokenFilter(filter);
}
QPDFObjectHandle
QPDFObjectHandle::parse(PointerHolder<InputSource> input,
std::string const& object_description,


@ -7,6 +7,7 @@
#include <qpdf/QTC.hh>
#include <qpdf/QPDFExc.hh>
#include <qpdf/QUtil.hh>
#include <qpdf/QPDFObjectHandle.hh>
#include <stdexcept>
#include <string.h>
@ -39,6 +40,23 @@ QPDFTokenizer::Members::~Members()
{
}
QPDFTokenizer::Token::Token(token_type_e type, std::string const& value) :
type(type),
value(value),
raw_value(value)
{
if (type == tt_string)
{
raw_value = QPDFObjectHandle::newString(value).unparse();
}
else if (type == tt_name)
{
raw_value = QPDFObjectHandle::newName(value).unparse();
}
}
QPDFTokenizer::QPDFTokenizer() :
m(new Members())
{


@ -1591,7 +1591,8 @@ QPDFWriter::unparseObject(QPDFObjectHandle object, int level,
{
is_metadata = true;
}
bool filter = (this->m->compress_streams ||
bool filter = (object.isDataModified() ||
this->m->compress_streams ||
this->m->stream_decode_level);
if (this->m->compress_streams)
{
@ -1602,7 +1603,8 @@ QPDFWriter::unparseObject(QPDFObjectHandle object, int level,
// compressed with a lossy compression scheme, but we
// don't support any of those right now.
QPDFObjectHandle filter_obj = stream_dict.getKey("/Filter");
if (filter_obj.isName() &&
if ((! object.isDataModified()) &&
filter_obj.isName() &&
((filter_obj.getName() == "/FlateDecode") ||
(filter_obj.getName() == "/Fl")))
{
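
The two QPDFWriter hunks above make isDataModified() both force filtering and block the "already Flate-compressed" shortcut. The decision can be sketched as a pure function. This rests on an assumption: the body of the inner if is not shown in the diff, and the sketch assumes it sets filter to false. The function name and the boolean parameters are hypothetical simplifications (the real code reads member state and a decode-level value).

```cpp
#include <string>

// Hypothetical helper, not part of qpdf. filter_name is the stream's
// /Filter entry when it is a single name, or "" otherwise.
bool should_filter_sketch(bool data_modified,
                          bool compress_streams,
                          bool decode_streams,
                          std::string const& filter_name)
{
    // A token filter on the stream always forces the data to be piped.
    bool filter = data_modified || compress_streams || decode_streams;
    if (compress_streams)
    {
        // Shortcut only applies when nothing will alter the data.
        if (!data_modified &&
            ((filter_name == "/FlateDecode") || (filter_name == "/Fl")))
        {
            filter = false;  // ASSUMED: elided branch disables filtering
        }
    }
    return filter;
}
```

With these assumptions, an unmodified Flate-compressed stream is copied as is, while the same stream with a token filter attached is decoded, filtered, and recompressed.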


@ -13,7 +13,7 @@
#include <qpdf/Pl_RunLength.hh>
#include <qpdf/Pl_DCT.hh>
#include <qpdf/Pl_Count.hh>
#include <qpdf/ContentNormalizer.hh>
#include <qpdf/QTC.hh>
#include <qpdf/QPDF.hh>
#include <qpdf/QPDFExc.hh>
@ -91,6 +91,12 @@ QPDF_Stream::getDict() const
return this->stream_dict;
}
bool
QPDF_Stream::isDataModified() const
{
return (! this->token_filters.empty());
}
PointerHolder<Buffer>
QPDF_Stream::getStreamData(qpdf_stream_decode_level_e decode_level)
{
@ -440,21 +446,36 @@ QPDF_Stream::pipeStreamData(Pipeline* pipeline,
// create to be deleted when this function finishes.
std::vector<PointerHolder<Pipeline> > to_delete;
PointerHolder<ContentNormalizer> normalizer;
if (filter)
{
if (encode_flags & qpdf_ef_compress)
{
pipeline = new Pl_Flate("compress object stream", pipeline,
pipeline = new Pl_Flate("compress stream", pipeline,
Pl_Flate::a_deflate);
to_delete.push_back(pipeline);
}
if (encode_flags & qpdf_ef_normalize)
{
pipeline = new Pl_QPDFTokenizer("normalizer", pipeline);
normalizer = new ContentNormalizer();
normalizer->setPipeline(pipeline);
pipeline = new Pl_QPDFTokenizer(
"normalizer", normalizer.getPointer());
to_delete.push_back(pipeline);
}
for (std::vector<PointerHolder<
QPDFObjectHandle::TokenFilter> >::reverse_iterator iter =
this->token_filters.rbegin();
iter != this->token_filters.rend(); ++iter)
{
(*iter)->setPipeline(pipeline);
pipeline = new Pl_QPDFTokenizer(
"token filter", (*iter).getPointer());
to_delete.push_back(pipeline);
}
for (std::vector<std::string>::reverse_iterator iter = filters.rbegin();
iter != filters.rend(); ++iter)
{
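
pipeStreamData above builds its pipeline back to front: each new stage wraps the current head, so the stage constructed last is the first to see data. Iterating token_filters in reverse therefore makes the filters run in the order they were added. The sketch below shows that effect with a toy stage that appends a tag. TagStage and run_chain_sketch are hypothetical and unrelated to qpdf's Pipeline class.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stage: appends its tag, then forwards downstream.
class TagStage
{
  public:
    TagStage(std::string const& tag, TagStage* next) :
        tag(tag),
        next(next)
    {
    }
    void write(std::string const& data, std::string& out)
    {
        std::string tagged = data + tag;
        if (next) { next->write(tagged, out); }
        else { out = tagged; }  // end of chain: deliver the result
    }

  private:
    std::string tag;
    TagStage* next;
};

// Builds the chain in reverse, as the loop over token_filters does,
// and returns the tags in the order the stages actually ran.
std::string run_chain_sketch(std::vector<std::string> const& tags)
{
    std::vector<std::unique_ptr<TagStage>> to_delete;  // owns the stages
    TagStage* head = nullptr;
    for (auto iter = tags.rbegin(); iter != tags.rend(); ++iter)
    {
        to_delete.push_back(
            std::unique_ptr<TagStage>(new TagStage(*iter, head)));
        head = to_delete.back().get();
    }
    std::string out;
    if (head) { head->write("", out); }
    return out;
}
```

Calling run_chain_sketch with the tags A, B, C returns "ABC": the first tag added ends up at the head of the chain even though it was constructed last.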
@ -612,6 +633,13 @@ QPDF_Stream::replaceStreamData(
replaceFilterData(filter, decode_parms, 0);
}
void
QPDF_Stream::addTokenFilter(
PointerHolder<QPDFObjectHandle::TokenFilter> token_filter)
{
this->token_filters.push_back(token_filter);
}
void
QPDF_Stream::replaceFilterData(QPDFObjectHandle const& filter,
QPDFObjectHandle const& decode_parms,


@ -9,6 +9,7 @@ SRCS_libqpdf = \
libqpdf/BitWriter.cc \
libqpdf/Buffer.cc \
libqpdf/BufferInputSource.cc \
libqpdf/ContentNormalizer.cc \
libqpdf/FileInputSource.cc \
libqpdf/InputSource.cc \
libqpdf/InsecureRandomDataProvider.cc \


@ -0,0 +1,15 @@
#ifndef __CONTENTNORMALIZER_HH__
#define __CONTENTNORMALIZER_HH__
#include <qpdf/QPDFObjectHandle.hh>
class ContentNormalizer: public QPDFObjectHandle::TokenFilter
{
public:
ContentNormalizer();
virtual ~ContentNormalizer();
virtual void handleToken(QPDFTokenizer::Token const&);
virtual void handleEOF();
};
#endif // __CONTENTNORMALIZER_HH__


@ -4,6 +4,8 @@
#include <qpdf/Pipeline.hh>
#include <qpdf/QPDFTokenizer.hh>
#include <qpdf/PointerHolder.hh>
#include <qpdf/QPDFObjectHandle.hh>
//
// Treat incoming text as a stream consisting of valid PDF tokens, but
@ -16,7 +18,8 @@
class Pl_QPDFTokenizer: public Pipeline
{
public:
Pl_QPDFTokenizer(char const* identifier, Pipeline* next);
Pl_QPDFTokenizer(char const* identifier,
QPDFObjectHandle::TokenFilter* filter);
virtual ~Pl_QPDFTokenizer();
virtual void write(unsigned char* buf, size_t len);
virtual void finish();
@ -24,14 +27,25 @@ class Pl_QPDFTokenizer: public Pipeline
private:
void processChar(char ch);
void checkUnread();
void writeNext(char const*, size_t len);
void writeToken(QPDFTokenizer::Token&);
QPDFTokenizer tokenizer;
bool just_wrote_nl;
bool last_char_was_cr;
bool unread_char;
char char_to_unread;
class Members
{
friend class Pl_QPDFTokenizer;
public:
~Members();
private:
Members();
Members(Members const&);
QPDFObjectHandle::TokenFilter* filter;
QPDFTokenizer tokenizer;
bool last_char_was_cr;
bool unread_char;
char char_to_unread;
};
PointerHolder<Members> m;
};
#endif // __PL_QPDFTOKENIZER_HH__


@ -20,6 +20,7 @@ class QPDF_Stream: public QPDFObject
virtual QPDFObject::object_type_e getTypeCode() const;
virtual char const* getTypeName() const;
QPDFObjectHandle getDict() const;
bool isDataModified() const;
// See comments in QPDFObjectHandle.hh for these methods.
bool pipeStreamData(Pipeline*,
@ -35,6 +36,8 @@ class QPDF_Stream: public QPDFObject
PointerHolder<QPDFObjectHandle::StreamDataProvider> provider,
QPDFObjectHandle const& filter,
QPDFObjectHandle const& decode_parms);
void addTokenFilter(
PointerHolder<QPDFObjectHandle::TokenFilter> token_filter);
void replaceDict(QPDFObjectHandle new_dict);
@ -72,6 +75,8 @@ class QPDF_Stream: public QPDFObject
size_t length;
PointerHolder<Buffer> stream_data;
PointerHolder<QPDFObjectHandle::StreamDataProvider> stream_provider;
std::vector<
PointerHolder<QPDFObjectHandle::TokenFilter> > token_filters;
};
#endif // __QPDF_STREAM_HH__


@ -756,6 +756,19 @@ $td->runtest("check output",
{$td->FILE => "a.pdf"},
{$td->FILE => "coalesce-out.pdf"});
show_ntests();
# ----------
$td->notify("--- Token filters ---");
$n_tests += 2;
$td->runtest("token filter",
{$td->COMMAND => "test_driver 41 coalesce.pdf"},
{$td->STRING => "test 41 done\n", $td->EXIT_STATUS => 0},
$td->NORMALIZE_NEWLINES);
$td->runtest("check output",
{$td->FILE => "a.pdf"},
{$td->FILE => "token-filters-out.pdf"});
show_ntests();
# ----------
$td->notify("--- Newline before endstream ---");


@ -0,0 +1,171 @@
%PDF-1.3
%¿÷¢þ
%QDF-1.0
%% Original object ID: 1 0
1 0 obj
<<
/Pages 2 0 R
/Type /Catalog
>>
endobj
%% Original object ID: 2 0
2 0 obj
<<
/Count 2
/Kids [
3 0 R
4 0 R
]
/Type /Pages
>>
endobj
%% Page 1
%% Original object ID: 3 0
3 0 obj
<<
/Contents 5 0 R
/MediaBox [
0
0
612
792
]
/Parent 2 0 R
/Resources <<
/Font <<
/F1 7 0 R
>>
/ProcSet 8 0 R
>>
/Type /Page
>>
endobj
%% Page 2
%% Original object ID: 4 0
4 0 obj
<<
/Contents 9 0 R
/MediaBox [
0
0
612
792
]
/Parent 2 0 R
/Resources <<
/Font <<
/F1 11 0 R
>>
/ProcSet 12 0 R
>>
/Type /Page
>>
endobj
%% Contents for page 1
%% Original object ID: 19 0
5 0 obj
<<
/Length 6 0 R
>>
stream
BT
/F1 24 Tf
72 720 Td
(Salad) Tj
ET [ /array/split ] BI
/CS /G/W 66/H 47/BPC 8/F/Fl/DP<</Predictor 15/Columns 66>>
ID xœÅÖIà P|ÿC;UÈ`ÀÓ7 ¦ĘÚæ<C39A>}Dðï_´øÉW©„œÄ-”ˆ>ÿ‡À<E280A1>>”^&®¡uâ]€"!‡•*¬&<26>E|Sy® ðd-€<<3C>B0Bú@Nê+<hlèKÐî/56L <C2A0>ã £–¹¦>0>Y<>ù!cì\Y Ø%Yð¥Ö8?& Öëˆ}jûè<>3<EFBFBD>ÂÖlpÛsHöûtúQØTt*hÌUúãwÍÕÐ%¨)p³"•DiRj¹DYNUÓÙAvFà& <0A>ÍÔu#c•ÆW ô߉W“O
EI/bye
endstream
endobj
6 0 obj
375
endobj
%% Original object ID: 13 0
7 0 obj
<<
/BaseFont /Helvetica
/Encoding /WinAnsiEncoding
/Name /F1
/Subtype /Type1
/Type /Font
>>
endobj
%% Original object ID: 14 0
8 0 obj
[
/PDF
/Text
]
endobj
%% Contents for page 2
%% Original object ID: 15 0
9 0 obj
<<
/Length 10 0 R
>>
stream
BT
/F1 24 Tf
72 720 Td
(Salad) Tj
ET
/bye
endstream
endobj
10 0 obj
48
endobj
%% Original object ID: 17 0
11 0 obj
<<
/BaseFont /Helvetica
/Encoding /WinAnsiEncoding
/Name /F1
/Subtype /Type1
/Type /Font
>>
endobj
%% Original object ID: 18 0
12 0 obj
[
/PDF
/Text
]
endobj
xref
0 13
0000000000 65535 f
0000000052 00000 n
0000000133 00000 n
0000000252 00000 n
0000000481 00000 n
0000000726 00000 n
0000001156 00000 n
0000001204 00000 n
0000001350 00000 n
0000001436 00000 n
0000001540 00000 n
0000001588 00000 n
0000001735 00000 n
trailer <<
/Root 1 0 R
/Size 13
/ID [<fa46a90bcf56476b9904a2e7adb75024><31415926535897932384626433832795>]
>>
startxref
1771
%%EOF


@ -97,6 +97,36 @@ ParserCallbacks::handleEOF()
std::cout << "-EOF-" << std::endl;
}
class TokenFilter: public QPDFObjectHandle::TokenFilter
{
public:
TokenFilter()
{
}
virtual ~TokenFilter()
{
}
virtual void handleToken(QPDFTokenizer::Token const& t)
{
if (t == QPDFTokenizer::Token(QPDFTokenizer::tt_string, "Potato"))
{
// Exercise unparsing of strings by token constructor
writeToken(
QPDFTokenizer::Token(QPDFTokenizer::tt_string, "Salad"));
}
else
{
writeToken(t);
}
}
virtual void handleEOF()
{
writeToken(QPDFTokenizer::Token(QPDFTokenizer::tt_name, "/bye"));
write("\n");
finish();
}
};
static std::string getPageContents(QPDFObjectHandle page)
{
PointerHolder<Buffer> b1 =
@ -1345,6 +1375,22 @@ void runtest(int n, char const* filename1, char const* arg2)
w.setStaticID(true);
w.write();
}
else if (n == 41)
{
// Apply a token filter. This test case is crafted to work
// with coalesce.pdf.
std::vector<QPDFObjectHandle> pages = pdf.getAllPages();
for (std::vector<QPDFObjectHandle>::iterator iter =
pages.begin();
iter != pages.end(); ++iter)
{
(*iter).addContentTokenFilter(new TokenFilter);
}
QPDFWriter w(pdf, "a.pdf");
w.setQDFMode(true);
w.setStaticID(true);
w.write();
}
else
{
throw std::runtime_error(std::string("invalid test ") +