mirror of
https://github.com/qpdf/qpdf.git
synced 2024-05-29 08:20:53 +00:00
Implement TokenFilter and refactor Pl_QPDFTokenizer
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a general filter that passes data through a TokenFilter.
parent
b8723e97f4
commit
9910104442
43
ChangeLog
|
@ -107,6 +107,49 @@
|
||||||
applications that use page-level APIs in QPDFObjectHandle to be
|
applications that use page-level APIs in QPDFObjectHandle to be
|
||||||
more tolerant of certain types of damaged files.
|
more tolerant of certain types of damaged files.
|
||||||
|
|
||||||
|
* Add QPDFObjectHandle::TokenFilter class and methods to use it to
|
||||||
|
perform lexical filtering on content streams. You can call
|
||||||
|
QPDFObjectHandle::addTokenFilter on a stream object, or you can call
|
||||||
|
the higher level QPDFObjectHandle::addContentTokenFilter on a page
|
||||||
|
object to cause the stream's contents to be passed through a token
|
||||||
|
filter while being retrieved by QPDFWriter or any other consumer.
|
||||||
|
For details on using TokenFilter, please see comments in
|
||||||
|
QPDFObjectHandle.hh.
|
||||||
|
|
||||||
|
* Enhance the string, type QPDFTokenizer::Token constructor to
|
||||||
|
initialize a raw value in addition to a value. Tokens have a
|
||||||
|
value, which is a canonical representation, and a raw value. For
|
||||||
|
all tokens except strings and names, the raw value and the value
|
||||||
|
are the same. For strings, the value excludes the outer delimiters
|
||||||
|
and has non-printing characters normalized. For names, the value
|
||||||
|
resolves non-printing characters. In order to better facilitate
|
||||||
|
token filters that mostly preserve contents and to enable
|
||||||
|
developers to be mostly unconcerned about the nuances of token
|
||||||
|
values and raw values, creating string and name tokens now
|
||||||
|
properly handles this subtlety of values and raw values. When
|
||||||
|
constructing string tokens, take care to avoid passing in the
|
||||||
|
outer delimiters. This has always been the case, but it is now
|
||||||
|
clarified in the TokenFilter comments in QPDFObjectHandle.hh. This
|
||||||
|
has no impact on any existing code unless there's some code
|
||||||
|
somewhere that was relying on Token::getRawValue() returning an
|
||||||
|
empty string for a manually constructed token. The token class's
|
||||||
|
operator== method still only looks at type and value, not raw
|
||||||
|
value. For example, string tokens for <41> and (A) would still be
|
||||||
|
equal because both are representations of the string "A".
|
||||||
|
|
||||||
|
* Add QPDFObjectHandle::isDataModified method. This method just
|
||||||
|
returns true if addTokenFilter has been called on the stream. It
|
||||||
|
enables a caller to determine whether it is safe to optimize away
|
||||||
|
piping of stream data in cases where the input and output are
|
||||||
|
expected to be the same. QPDFWriter uses this internally to skip
|
||||||
|
the optimization of not re-compressing already compressed streams
|
||||||
|
if addTokenFilter has been called. Most developers will not have
|
||||||
|
to worry about this as it is used internally in the library in the
|
||||||
|
places that need it. If you are manually retrieving stream data
|
||||||
|
with QPDFObjectHandle::getStreamData or
|
||||||
|
QPDFObjectHandle::pipeStreamData, you don't need to worry about
|
||||||
|
this at all.
|
||||||
|
|
||||||
2018-02-04 Jay Berkenbilt <ejb@ql.org>
|
2018-02-04 Jay Berkenbilt <ejb@ql.org>
|
||||||
|
|
||||||
* Add QPDFWriter::setLinearizationPass1Filename method and
|
* Add QPDFWriter::setLinearizationPass1Filename method and
|
||||||
|
|
|
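The ChangeLog entry above says that token equality looks only at type and value, so the string tokens for <41> and (A) compare equal even though their raw values differ. A minimal sketch of that rule follows. SketchToken and its enum are hypothetical stand-ins written for illustration, not qpdf's QPDFTokenizer::Token, and only the fields mentioned in the entry are modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for QPDFTokenizer::Token; only the fields
// discussed in the ChangeLog entry are modeled.
enum sketch_token_type { st_bad, st_string, st_name, st_word };

struct SketchToken
{
    sketch_token_type type;
    std::string value;      // canonical representation
    std::string raw_value;  // bytes as they appeared in the input

    // raw_value is deliberately ignored, and a bad token never
    // compares equal to anything.
    bool operator==(SketchToken const& rhs) const
    {
        return (type != st_bad) && (type == rhs.type) && (value == rhs.value);
    }
};

// Both spellings decode to the one-character string "A".
inline bool hex_and_literal_match()
{
    SketchToken hex = { st_string, "A", "<41>" };
    SketchToken literal = { st_string, "A", "(A)" };
    assert(hex.raw_value != literal.raw_value);
    return hex == literal;
}
```

Under these assumptions hex_and_literal_match() returns true, while two tokens of type st_bad are never equal, whatever their values.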
@ -35,6 +35,7 @@
|
||||||
#include <qpdf/PointerHolder.hh>
|
#include <qpdf/PointerHolder.hh>
|
||||||
#include <qpdf/Buffer.hh>
|
#include <qpdf/Buffer.hh>
|
||||||
#include <qpdf/InputSource.hh>
|
#include <qpdf/InputSource.hh>
|
||||||
|
#include <qpdf/QPDFTokenizer.hh>
|
||||||
|
|
||||||
#include <qpdf/QPDFObject.hh>
|
#include <qpdf/QPDFObject.hh>
|
||||||
|
|
||||||
|
@ -76,6 +77,66 @@ class QPDFObjectHandle
|
||||||
Pipeline* pipeline) = 0;
|
Pipeline* pipeline) = 0;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
// The TokenFilter class provides a way to filter content streams
|
||||||
|
// in a lexically aware fashion. TokenFilters can be attached to
|
||||||
|
// streams using the addTokenFilter or addContentTokenFilter
|
||||||
|
// methods. The handleToken method is called for each token,
|
||||||
|
// including the eof token, and then handleEOF is called at the
|
||||||
|
// very end. Handlers may call write (or writeToken) to pass data
|
||||||
|
// downstream. The finish() method must be called exactly one time
|
||||||
|
// to ensure that any written data is flushed out. The default
|
||||||
|
// handleEOF calls finish. If you override handleEOF, you must
|
||||||
|
// ensure that finish() is called either there or in response to
|
||||||
|
// whatever event causes you to terminate creation of output.
|
||||||
|
// Failure to call finish() may result in some of the data you
|
||||||
|
// have written being lost. You should not rely on a destructor
|
||||||
|
// for calling finish() since the destructor call may occur later
|
||||||
|
// than you expect. Please see examples/token-filters.cc for
|
||||||
|
// examples of using TokenFilters.
|
||||||
|
//
|
||||||
|
// Please note that when you call token.getValue() on a token of
|
||||||
|
// type tt_string, you get the string value without any
|
||||||
|
// delimiters. token.getRawValue() will return something suitable
|
||||||
|
// for being written to output, or calling writeToken with a
|
||||||
|
// string token will also work. The correct way to construct a
|
||||||
|
// string token that would write the literal value (str) is
|
||||||
|
// QPDFTokenizer::Token(QPDFTokenizer::tt_string, "str").
|
||||||
|
class TokenFilter
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
QPDF_DLL
|
||||||
|
TokenFilter() : pipeline(0)
|
||||||
|
{
|
||||||
|
}
|
||||||
|
QPDF_DLL
|
||||||
|
virtual ~TokenFilter()
|
||||||
|
{
|
||||||
|
}
|
||||||
|
virtual void handleToken(QPDFTokenizer::Token const&) = 0;
|
||||||
|
virtual void handleEOF()
|
||||||
|
{
|
||||||
|
// If you override handleEOF, you must be sure to call
|
||||||
|
// finish().
|
||||||
|
finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
// This is called internally by the qpdf library.
|
||||||
|
void setPipeline(Pipeline*);
|
||||||
|
|
||||||
|
protected:
|
||||||
|
QPDF_DLL
|
||||||
|
void write(char const* data, size_t len);
|
||||||
|
QPDF_DLL
|
||||||
|
void write(std::string const& str);
|
||||||
|
QPDF_DLL
|
||||||
|
void writeToken(QPDFTokenizer::Token const&);
|
||||||
|
QPDF_DLL
|
||||||
|
void finish();
|
||||||
|
|
||||||
|
private:
|
||||||
|
Pipeline* pipeline;
|
||||||
|
};
|
||||||
|
|
||||||
// This class is used by parse to decrypt strings when reading an
|
// This class is used by parse to decrypt strings when reading an
|
||||||
// object that contains encrypted strings.
|
// object that contains encrypted strings.
|
||||||
class StringDecrypter
|
class StringDecrypter
|
||||||
|
@ -223,6 +284,23 @@ class QPDFObjectHandle
|
||||||
static void parseContentStream(QPDFObjectHandle stream_or_array,
|
static void parseContentStream(QPDFObjectHandle stream_or_array,
|
||||||
ParserCallbacks* callbacks);
|
ParserCallbacks* callbacks);
|
||||||
|
|
||||||
|
// Attach a token filter to a page's contents. If the page's
|
||||||
|
// contents is an array of streams, it is automatically coalesced.
|
||||||
|
// The token filter is applied to the page's contents as a single
|
||||||
|
// stream.
|
||||||
|
QPDF_DLL
|
||||||
|
void addContentTokenFilter(PointerHolder<TokenFilter> token_filter);
|
||||||
|
|
||||||
|
// As of qpdf 8, it is possible to add custom token filters to a
|
||||||
|
// stream. The tokenized stream data is passed through the token
|
||||||
|
// filter after all original filters but before content stream
|
||||||
|
// normalization if requested. This is a low-level interface to
|
||||||
|
// add it to a stream. You will usually want to call
|
||||||
|
// addContentTokenFilter instead, which can be applied to a page
|
||||||
|
// object, and which will automatically handle the case of pages
|
||||||
|
// whose contents are split across multiple streams.
|
||||||
|
void addTokenFilter(PointerHolder<TokenFilter> token_filter);
|
||||||
|
|
||||||
// Type-specific factories
|
// Type-specific factories
|
||||||
QPDF_DLL
|
QPDF_DLL
|
||||||
static QPDFObjectHandle newNull();
|
static QPDFObjectHandle newNull();
|
||||||
|
@ -414,6 +492,13 @@ class QPDFObjectHandle
|
||||||
QPDF_DLL
|
QPDF_DLL
|
||||||
QPDFObjectHandle getDict();
|
QPDFObjectHandle getDict();
|
||||||
|
|
||||||
|
// If addTokenFilter has been called for this stream, then the
|
||||||
|
// original data should be considered to be modified. This means we
|
||||||
|
// should avoid optimizations such as not filtering a stream that
|
||||||
|
// is already compressed.
|
||||||
|
QPDF_DLL
|
||||||
|
bool isDataModified();
|
||||||
|
|
||||||
// Returns filtered (uncompressed) stream data. Throws an
|
// Returns filtered (uncompressed) stream data. Throws an
|
||||||
// exception if the stream is filtered and we can't decode it.
|
// exception if the stream is filtered and we can't decode it.
|
||||||
QPDF_DLL
|
QPDF_DLL
|
||||||
|
@ -608,7 +693,7 @@ class QPDFObjectHandle
|
||||||
// stream or an array of streams. If this page's content is an
|
// stream or an array of streams. If this page's content is an
|
||||||
// array, concatenate the streams into a single stream. This can
|
// array, concatenate the streams into a single stream. This can
|
||||||
// be useful when working with files that split content streams in
|
// be useful when working with files that split content streams in
|
||||||
// arbitary spots, such as in the middle of a token, as that can
|
// arbitrary spots, such as in the middle of a token, as that can
|
||||||
// confuse some software. You could also call this after calling
|
// confuse some software. You could also call this after calling
|
||||||
// addPageContents.
|
// addPageContents.
|
||||||
QPDF_DLL
|
QPDF_DLL
|
||||||
|
|
|
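The header comments above describe the TokenFilter contract: handleToken is called per token, handlers pass data downstream with write, and finish must be called exactly once, either by the default handleEOF or by an override. The sketch below models that contract in a self-contained way. SketchSink, SketchTokenFilter, ReplaceWord and run_replace_word are hypothetical names; tokens are plain strings and the downstream is a string buffer rather than a qpdf Pipeline, so this illustrates the calling pattern only, not the library's classes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical downstream: collects output and counts finish() calls.
class SketchSink
{
  public:
    SketchSink() : finished(0) {}
    void write(std::string const& s) { data += s; }
    void finish() { ++finished; }
    std::string data;
    int finished;
};

// Hypothetical, simplified model of the TokenFilter contract.
class SketchTokenFilter
{
  public:
    SketchTokenFilter() : sink(0) {}
    virtual ~SketchTokenFilter() {}
    virtual void handleToken(std::string const& token) = 0;
    // Default behavior described in the header: handleEOF calls finish.
    virtual void handleEOF() { finish(); }
    void setSink(SketchSink* s) { sink = s; }

  protected:
    void write(std::string const& s)
    {
        if (! sink)
        {
            throw std::logic_error("write called before setSink");
        }
        sink->write(s);
    }
    void finish()
    {
        if (! sink)
        {
            throw std::logic_error("finish called before setSink");
        }
        sink->finish();
    }

  private:
    SketchSink* sink;
};

// Example filter: replaces the word "old" with "new" and appends a
// newline at end of input.
class ReplaceWord : public SketchTokenFilter
{
  public:
    virtual void handleToken(std::string const& token)
    {
        write(token == "old" ? std::string("new") : token);
    }
    virtual void handleEOF()
    {
        write("\n");
        finish();  // required because handleEOF is overridden
    }
};

inline std::string run_replace_word()
{
    SketchSink sink;
    ReplaceWord f;
    f.setSink(&sink);
    f.handleToken("old");
    f.handleToken(" ");
    f.handleToken("text");
    f.handleEOF();
    assert(sink.finished == 1);  // finish() reached the sink exactly once
    return sink.data;
}
```

The override of handleEOF is the case the header warns about: if it did not call finish(), the sink would never be told that output is complete.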
@ -62,13 +62,8 @@ class QPDFTokenizer
|
||||||
{
|
{
|
||||||
public:
|
public:
|
||||||
Token() : type(tt_bad) {}
|
Token() : type(tt_bad) {}
|
||||||
|
QPDF_DLL
|
||||||
Token(token_type_e type, std::string const& value) :
|
Token(token_type_e type, std::string const& value);
|
||||||
type(type),
|
|
||||||
value(value)
|
|
||||||
{
|
|
||||||
}
|
|
||||||
|
|
||||||
Token(token_type_e type, std::string const& value,
|
Token(token_type_e type, std::string const& value,
|
||||||
std::string raw_value, std::string error_message) :
|
std::string raw_value, std::string error_message) :
|
||||||
type(type),
|
type(type),
|
||||||
|
@ -93,7 +88,7 @@ class QPDFTokenizer
|
||||||
{
|
{
|
||||||
return this->error_message;
|
return this->error_message;
|
||||||
}
|
}
|
||||||
bool operator==(Token const& rhs)
|
bool operator==(Token const& rhs) const
|
||||||
{
|
{
|
||||||
// Ignore fields other than type and value
|
// Ignore fields other than type and value
|
||||||
return ((this->type != tt_bad) &&
|
return ((this->type != tt_bad) &&
|
||||||
|
|
77
libqpdf/ContentNormalizer.cc
Normal file
|
@ -0,0 +1,77 @@
|
||||||
|
#include <qpdf/ContentNormalizer.hh>
|
||||||
|
#include <qpdf/QUtil.hh>
|
||||||
|
|
||||||
|
ContentNormalizer::ContentNormalizer()
|
||||||
|
{
|
||||||
|
}
|
||||||
|
|
||||||
|
ContentNormalizer::~ContentNormalizer()
|
||||||
|
{
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
ContentNormalizer::handleToken(QPDFTokenizer::Token const& token)
|
||||||
|
{
|
||||||
|
std::string value = token.getRawValue();
|
||||||
|
QPDFTokenizer::token_type_e token_type = token.getType();
|
||||||
|
|
||||||
|
switch (token_type)
|
||||||
|
{
|
||||||
|
case QPDFTokenizer::tt_space:
|
||||||
|
{
|
||||||
|
size_t len = value.length();
|
||||||
|
for (size_t i = 0; i < len; ++i)
|
||||||
|
{
|
||||||
|
char ch = value.at(i);
|
||||||
|
if (ch == '\r')
|
||||||
|
{
|
||||||
|
if ((i + 1 < len) && (value.at(i + 1) == '\n'))
|
||||||
|
{
|
||||||
|
// ignore
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
write("\n");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
write(&ch, 1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
|
||||||
|
case QPDFTokenizer::tt_string:
|
||||||
|
// Replacing string and name tokens in this way normalizes
|
||||||
|
// their representation as this will automatically handle
|
||||||
|
// quoting of unprintable characters, etc.
|
||||||
|
writeToken(QPDFTokenizer::Token(
|
||||||
|
QPDFTokenizer::tt_string, token.getValue()));
|
||||||
|
break;
|
||||||
|
|
||||||
|
case QPDFTokenizer::tt_name:
|
||||||
|
writeToken(QPDFTokenizer::Token(
|
||||||
|
QPDFTokenizer::tt_name, token.getValue()));
|
||||||
|
break;
|
||||||
|
|
||||||
|
default:
|
||||||
|
writeToken(token);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
value = token.getRawValue();
|
||||||
|
if (((token_type == QPDFTokenizer::tt_string) ||
|
||||||
|
(token_type == QPDFTokenizer::tt_name)) &&
|
||||||
|
((value.find('\r') != std::string::npos) ||
|
||||||
|
(value.find('\n') != std::string::npos)))
|
||||||
|
{
|
||||||
|
write("\n");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
ContentNormalizer::handleEOF()
|
||||||
|
{
|
||||||
|
finish();
|
||||||
|
}
|
|
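The tt_space branch of ContentNormalizer::handleToken above converts carriage returns to newlines: a CR followed by LF is dropped so the pair collapses to a single LF, and a lone CR becomes LF. The same loop can be sketched as a pure function that returns a string instead of writing to a pipeline. The names normalize_space and check_normalize_space are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same loop as the tt_space case, collecting output in a string.
inline std::string normalize_space(std::string const& value)
{
    std::string out;
    std::size_t len = value.length();
    for (std::size_t i = 0; i < len; ++i)
    {
        char ch = value.at(i);
        if (ch == '\r')
        {
            if ((i + 1 < len) && (value.at(i + 1) == '\n'))
            {
                // Drop the CR; the LF is copied on the next iteration.
            }
            else
            {
                out += '\n';  // lone CR becomes LF
            }
        }
        else
        {
            out += ch;
        }
    }
    return out;
}

// Quick self-check of the three cases described above.
inline void check_normalize_space()
{
    assert(normalize_space("a\r\nb") == "a\nb");
    assert(normalize_space("a\rb") == "a\nb");
    assert(normalize_space(" \t\n") == " \t\n");
}
```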
@ -1,107 +1,51 @@
|
||||||
#include <qpdf/Pl_QPDFTokenizer.hh>
|
#include <qpdf/Pl_QPDFTokenizer.hh>
|
||||||
#include <qpdf/QPDF_String.hh>
|
|
||||||
#include <qpdf/QPDF_Name.hh>
|
|
||||||
#include <qpdf/QTC.hh>
|
#include <qpdf/QTC.hh>
|
||||||
#include <qpdf/QUtil.hh>
|
|
||||||
#include <stdexcept>
|
#include <stdexcept>
|
||||||
#include <string.h>
|
#include <string.h>
|
||||||
|
|
||||||
Pl_QPDFTokenizer::Pl_QPDFTokenizer(char const* identifier, Pipeline* next) :
|
Pl_QPDFTokenizer::Members::Members() :
|
||||||
Pipeline(identifier, next),
|
filter(0),
|
||||||
just_wrote_nl(false),
|
|
||||||
last_char_was_cr(false),
|
last_char_was_cr(false),
|
||||||
unread_char(false),
|
unread_char(false),
|
||||||
char_to_unread('\0')
|
char_to_unread('\0')
|
||||||
{
|
{
|
||||||
tokenizer.allowEOF();
|
}
|
||||||
tokenizer.includeIgnorable();
|
|
||||||
|
Pl_QPDFTokenizer::Members::~Members()
|
||||||
|
{
|
||||||
|
}
|
||||||
|
|
||||||
|
Pl_QPDFTokenizer::Pl_QPDFTokenizer(
|
||||||
|
char const* identifier,
|
||||||
|
QPDFObjectHandle::TokenFilter* filter)
|
||||||
|
:
|
||||||
|
Pipeline(identifier, 0),
|
||||||
|
m(new Members)
|
||||||
|
{
|
||||||
|
m->filter = filter;
|
||||||
|
m->tokenizer.allowEOF();
|
||||||
|
m->tokenizer.includeIgnorable();
|
||||||
}
|
}
|
||||||
|
|
||||||
Pl_QPDFTokenizer::~Pl_QPDFTokenizer()
|
Pl_QPDFTokenizer::~Pl_QPDFTokenizer()
|
||||||
{
|
{
|
||||||
}
|
}
|
||||||
|
|
||||||
void
|
|
||||||
Pl_QPDFTokenizer::writeNext(char const* buf, size_t len)
|
|
||||||
{
|
|
||||||
if (len)
|
|
||||||
{
|
|
||||||
getNext()->write(QUtil::unsigned_char_pointer(buf), len);
|
|
||||||
this->just_wrote_nl = (buf[len-1] == '\n');
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
void
|
|
||||||
Pl_QPDFTokenizer::writeToken(QPDFTokenizer::Token& token)
|
|
||||||
{
|
|
||||||
std::string value = token.getRawValue();
|
|
||||||
|
|
||||||
switch (token.getType())
|
|
||||||
{
|
|
||||||
case QPDFTokenizer::tt_space:
|
|
||||||
{
|
|
||||||
size_t len = value.length();
|
|
||||||
for (size_t i = 0; i < len; ++i)
|
|
||||||
{
|
|
||||||
char ch = value.at(i);
|
|
||||||
if (ch == '\r')
|
|
||||||
{
|
|
||||||
if ((i + 1 < len) && (value.at(i + 1) == '\n'))
|
|
||||||
{
|
|
||||||
// ignore
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
writeNext("\n", 1);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
writeNext(&ch, 1);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
value.clear();
|
|
||||||
break;
|
|
||||||
|
|
||||||
case QPDFTokenizer::tt_string:
|
|
||||||
value = QPDF_String(token.getValue()).unparse();
|
|
||||||
|
|
||||||
break;
|
|
||||||
|
|
||||||
case QPDFTokenizer::tt_name:
|
|
||||||
value = QPDF_Name(token.getValue()).unparse();
|
|
||||||
break;
|
|
||||||
|
|
||||||
default:
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
writeNext(value.c_str(), value.length());
|
|
||||||
}
|
|
||||||
|
|
||||||
void
|
void
|
||||||
Pl_QPDFTokenizer::processChar(char ch)
|
Pl_QPDFTokenizer::processChar(char ch)
|
||||||
{
|
{
|
||||||
tokenizer.presentCharacter(ch);
|
this->m->tokenizer.presentCharacter(ch);
|
||||||
QPDFTokenizer::Token token;
|
QPDFTokenizer::Token token;
|
||||||
if (tokenizer.getToken(token, this->unread_char, this->char_to_unread))
|
if (this->m->tokenizer.getToken(
|
||||||
|
token, this->m->unread_char, this->m->char_to_unread))
|
||||||
{
|
{
|
||||||
writeToken(token);
|
this->m->filter->handleToken(token);
|
||||||
std::string value = token.getRawValue();
|
if ((token.getType() == QPDFTokenizer::tt_word) &&
|
||||||
QPDFTokenizer::token_type_e token_type = token.getType();
|
(token.getValue() == "ID"))
|
||||||
if (((token_type == QPDFTokenizer::tt_string) ||
|
|
||||||
(token_type == QPDFTokenizer::tt_name)) &&
|
|
||||||
((value.find('\r') != std::string::npos) ||
|
|
||||||
(value.find('\n') != std::string::npos)))
|
|
||||||
{
|
{
|
||||||
writeNext("\n", 1);
|
|
||||||
}
|
|
||||||
if ((token.getType() == QPDFTokenizer::tt_word) &&
|
|
||||||
(token.getValue() == "ID"))
|
|
||||||
{
|
|
||||||
QTC::TC("qpdf", "Pl_QPDFTokenizer found ID");
|
QTC::TC("qpdf", "Pl_QPDFTokenizer found ID");
|
||||||
tokenizer.expectInlineImage();
|
this->m->tokenizer.expectInlineImage();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -109,10 +53,10 @@ Pl_QPDFTokenizer::processChar(char ch)
|
||||||
void
|
void
|
||||||
Pl_QPDFTokenizer::checkUnread()
|
Pl_QPDFTokenizer::checkUnread()
|
||||||
{
|
{
|
||||||
if (this->unread_char)
|
if (this->m->unread_char)
|
||||||
{
|
{
|
||||||
processChar(this->char_to_unread);
|
processChar(this->m->char_to_unread);
|
||||||
if (this->unread_char)
|
if (this->m->unread_char)
|
||||||
{
|
{
|
||||||
throw std::logic_error(
|
throw std::logic_error(
|
||||||
"INTERNAL ERROR: unread_char still true after processing "
|
"INTERNAL ERROR: unread_char still true after processing "
|
||||||
|
@ -135,20 +79,13 @@ Pl_QPDFTokenizer::write(unsigned char* buf, size_t len)
|
||||||
void
|
void
|
||||||
Pl_QPDFTokenizer::finish()
|
Pl_QPDFTokenizer::finish()
|
||||||
{
|
{
|
||||||
this->tokenizer.presentEOF();
|
this->m->tokenizer.presentEOF();
|
||||||
QPDFTokenizer::Token token;
|
QPDFTokenizer::Token token;
|
||||||
if (tokenizer.getToken(token, this->unread_char, this->char_to_unread))
|
if (this->m->tokenizer.getToken(
|
||||||
|
token, this->m->unread_char, this->m->char_to_unread))
|
||||||
{
|
{
|
||||||
writeToken(token);
|
this->m->filter->handleToken(token);
|
||||||
if (unread_char)
|
|
||||||
{
|
|
||||||
if (this->char_to_unread == '\r')
|
|
||||||
{
|
|
||||||
this->char_to_unread = '\n';
|
|
||||||
}
|
|
||||||
writeNext(&this->char_to_unread, 1);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
getNext()->finish();
|
this->m->filter->handleEOF();
|
||||||
}
|
}
|
||||||
|
|
|
@ -62,6 +62,50 @@ CoalesceProvider::provideStreamData(int, int, Pipeline* p)
|
||||||
concat.manualFinish();
|
concat.manualFinish();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
QPDFObjectHandle::TokenFilter::setPipeline(Pipeline* p)
|
||||||
|
{
|
||||||
|
this->pipeline = p;
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
QPDFObjectHandle::TokenFilter::write(char const* data, size_t len)
|
||||||
|
{
|
||||||
|
if (! this->pipeline)
|
||||||
|
{
|
||||||
|
throw std::logic_error(
|
||||||
|
"TokenFilter::write called before setPipeline");
|
||||||
|
}
|
||||||
|
if (len)
|
||||||
|
{
|
||||||
|
this->pipeline->write(QUtil::unsigned_char_pointer(data), len);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
QPDFObjectHandle::TokenFilter::write(std::string const& str)
|
||||||
|
{
|
||||||
|
write(str.c_str(), str.length());
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
QPDFObjectHandle::TokenFilter::writeToken(QPDFTokenizer::Token const& token)
|
||||||
|
{
|
||||||
|
std::string value = token.getRawValue();
|
||||||
|
write(value.c_str(), value.length());
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
QPDFObjectHandle::TokenFilter::finish()
|
||||||
|
{
|
||||||
|
if (! this->pipeline)
|
||||||
|
{
|
||||||
|
throw std::logic_error(
|
||||||
|
"TokenFilter::finish called before setPipeline");
|
||||||
|
}
|
||||||
|
this->pipeline->finish();
|
||||||
|
}
|
||||||
|
|
||||||
void
|
void
|
||||||
QPDFObjectHandle::ParserCallbacks::terminateParsing()
|
QPDFObjectHandle::ParserCallbacks::terminateParsing()
|
||||||
{
|
{
|
||||||
|
@ -508,6 +552,13 @@ QPDFObjectHandle::getDict()
|
||||||
return dynamic_cast<QPDF_Stream*>(obj.getPointer())->getDict();
|
return dynamic_cast<QPDF_Stream*>(obj.getPointer())->getDict();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
bool
|
||||||
|
QPDFObjectHandle::isDataModified()
|
||||||
|
{
|
||||||
|
assertStream();
|
||||||
|
return dynamic_cast<QPDF_Stream*>(obj.getPointer())->isDataModified();
|
||||||
|
}
|
||||||
|
|
||||||
void
|
void
|
||||||
QPDFObjectHandle::replaceDict(QPDFObjectHandle new_dict)
|
QPDFObjectHandle::replaceDict(QPDFObjectHandle new_dict)
|
||||||
{
|
{
|
||||||
|
@ -1033,6 +1084,21 @@ QPDFObjectHandle::parseContentStream_data(
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
QPDFObjectHandle::addContentTokenFilter(PointerHolder<TokenFilter> filter)
|
||||||
|
{
|
||||||
|
coalesceContentStreams();
|
||||||
|
this->getKey("/Contents").addTokenFilter(filter);
|
||||||
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
QPDFObjectHandle::addTokenFilter(PointerHolder<TokenFilter> filter)
|
||||||
|
{
|
||||||
|
assertStream();
|
||||||
|
return dynamic_cast<QPDF_Stream*>(
|
||||||
|
obj.getPointer())->addTokenFilter(filter);
|
||||||
|
}
|
||||||
|
|
||||||
QPDFObjectHandle
|
QPDFObjectHandle
|
||||||
QPDFObjectHandle::parse(PointerHolder<InputSource> input,
|
QPDFObjectHandle::parse(PointerHolder<InputSource> input,
|
||||||
std::string const& object_description,
|
std::string const& object_description,
|
||||||
|
|
|
@ -7,6 +7,7 @@
|
||||||
#include <qpdf/QTC.hh>
|
#include <qpdf/QTC.hh>
|
||||||
#include <qpdf/QPDFExc.hh>
|
#include <qpdf/QPDFExc.hh>
|
||||||
#include <qpdf/QUtil.hh>
|
#include <qpdf/QUtil.hh>
|
||||||
|
#include <qpdf/QPDFObjectHandle.hh>
|
||||||
|
|
||||||
#include <stdexcept>
|
#include <stdexcept>
|
||||||
#include <string.h>
|
#include <string.h>
|
||||||
|
@ -39,6 +40,23 @@ QPDFTokenizer::Members::~Members()
|
||||||
{
|
{
|
||||||
}
|
}
|
||||||
|
|
||||||
|
QPDFTokenizer::Token::Token(token_type_e type, std::string const& value) :
|
||||||
|
type(type),
|
||||||
|
value(value),
|
||||||
|
raw_value(value)
|
||||||
|
{
|
||||||
|
if (type == tt_string)
|
||||||
|
{
|
||||||
|
raw_value = QPDFObjectHandle::newString(value).unparse();
|
||||||
|
}
|
||||||
|
else if (type == tt_name)
|
||||||
|
{
|
||||||
|
raw_value = QPDFObjectHandle::newName(value).unparse();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
QPDFTokenizer::QPDFTokenizer() :
|
QPDFTokenizer::QPDFTokenizer() :
|
||||||
m(new Members())
|
m(new Members())
|
||||||
{
|
{
|
||||||
|
|
|
@ -1591,7 +1591,8 @@ QPDFWriter::unparseObject(QPDFObjectHandle object, int level,
|
||||||
{
|
{
|
||||||
is_metadata = true;
|
is_metadata = true;
|
||||||
}
|
}
|
||||||
bool filter = (this->m->compress_streams ||
|
bool filter = (object.isDataModified() ||
|
||||||
|
this->m->compress_streams ||
|
||||||
this->m->stream_decode_level);
|
this->m->stream_decode_level);
|
||||||
if (this->m->compress_streams)
|
if (this->m->compress_streams)
|
||||||
{
|
{
|
||||||
|
@ -1602,7 +1603,8 @@ QPDFWriter::unparseObject(QPDFObjectHandle object, int level,
|
||||||
// compressed with a lossy compression scheme, but we
|
// compressed with a lossy compression scheme, but we
|
||||||
// don't support any of those right now.
|
// don't support any of those right now.
|
||||||
QPDFObjectHandle filter_obj = stream_dict.getKey("/Filter");
|
QPDFObjectHandle filter_obj = stream_dict.getKey("/Filter");
|
||||||
if (filter_obj.isName() &&
|
if ((! object.isDataModified()) &&
|
||||||
|
filter_obj.isName() &&
|
||||||
((filter_obj.getName() == "/FlateDecode") ||
|
((filter_obj.getName() == "/FlateDecode") ||
|
||||||
(filter_obj.getName() == "/Fl")))
|
(filter_obj.getName() == "/Fl")))
|
||||||
{
|
{
|
||||||
|
|
|
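The two QPDFWriter hunks above add isDataModified() to the decision about whether a stream is passed through filters on output. The condensed sketch below shows how the two conditions interact. The function name and parameters are hypothetical, and the body of the inner "if" is not visible in the diff: the sketch assumes it turns filtering off for streams that are already Flate-compressed, which is what the ChangeLog entry suggests.

```cpp
#include <cassert>
#include <string>

// Hypothetical condensation of the logic visible in the hunks.
inline bool should_filter_stream(
    bool data_modified,              // result of isDataModified()
    bool compress_streams,
    int stream_decode_level,
    std::string const& filter_name)  // "" when /Filter is not a name
{
    assert(stream_decode_level >= 0);
    bool filter = (data_modified || compress_streams ||
                   (stream_decode_level != 0));
    if (compress_streams)
    {
        bool already_flate =
            (filter_name == "/FlateDecode") || (filter_name == "/Fl");
        if ((! data_modified) && already_flate)
        {
            // ASSUMED body: leave already compressed data untouched.
            filter = false;
        }
    }
    return filter;
}
```

With these assumptions, an unmodified Flate stream is left alone when compressing, but the same stream with a token filter attached is always decoded and re-encoded so the filter actually runs.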
@ -13,7 +13,7 @@
|
||||||
#include <qpdf/Pl_RunLength.hh>
|
#include <qpdf/Pl_RunLength.hh>
|
||||||
#include <qpdf/Pl_DCT.hh>
|
#include <qpdf/Pl_DCT.hh>
|
||||||
#include <qpdf/Pl_Count.hh>
|
#include <qpdf/Pl_Count.hh>
|
||||||
|
#include <qpdf/ContentNormalizer.hh>
|
||||||
#include <qpdf/QTC.hh>
|
#include <qpdf/QTC.hh>
|
||||||
#include <qpdf/QPDF.hh>
|
#include <qpdf/QPDF.hh>
|
||||||
#include <qpdf/QPDFExc.hh>
|
#include <qpdf/QPDFExc.hh>
|
||||||
|
@ -91,6 +91,12 @@ QPDF_Stream::getDict() const
|
||||||
return this->stream_dict;
|
return this->stream_dict;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
bool
|
||||||
|
QPDF_Stream::isDataModified() const
|
||||||
|
{
|
||||||
|
return (! this->token_filters.empty());
|
||||||
|
}
|
||||||
|
|
||||||
PointerHolder<Buffer>
|
PointerHolder<Buffer>
|
||||||
QPDF_Stream::getStreamData(qpdf_stream_decode_level_e decode_level)
|
QPDF_Stream::getStreamData(qpdf_stream_decode_level_e decode_level)
|
||||||
{
|
{
|
||||||
|
@ -440,21 +446,36 @@ QPDF_Stream::pipeStreamData(Pipeline* pipeline,
|
||||||
// create to be deleted when this function finishes.
|
// create to be deleted when this function finishes.
|
||||||
std::vector<PointerHolder<Pipeline> > to_delete;
|
std::vector<PointerHolder<Pipeline> > to_delete;
|
||||||
|
|
||||||
|
PointerHolder<ContentNormalizer> normalizer;
|
||||||
if (filter)
|
if (filter)
|
||||||
{
|
{
|
||||||
if (encode_flags & qpdf_ef_compress)
|
if (encode_flags & qpdf_ef_compress)
|
||||||
{
|
{
|
||||||
pipeline = new Pl_Flate("compress object stream", pipeline,
|
pipeline = new Pl_Flate("compress stream", pipeline,
|
||||||
Pl_Flate::a_deflate);
|
Pl_Flate::a_deflate);
|
||||||
to_delete.push_back(pipeline);
|
to_delete.push_back(pipeline);
|
||||||
}
|
}
|
||||||
|
|
||||||
if (encode_flags & qpdf_ef_normalize)
|
if (encode_flags & qpdf_ef_normalize)
|
||||||
{
|
{
|
||||||
pipeline = new Pl_QPDFTokenizer("normalizer", pipeline);
|
normalizer = new ContentNormalizer();
|
||||||
|
normalizer->setPipeline(pipeline);
|
||||||
|
pipeline = new Pl_QPDFTokenizer(
|
||||||
|
"normalizer", normalizer.getPointer());
|
||||||
to_delete.push_back(pipeline);
|
to_delete.push_back(pipeline);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
for (std::vector<PointerHolder<
|
||||||
|
QPDFObjectHandle::TokenFilter> >::reverse_iterator iter =
|
||||||
|
this->token_filters.rbegin();
|
||||||
|
iter != this->token_filters.rend(); ++iter)
|
||||||
|
{
|
||||||
|
(*iter)->setPipeline(pipeline);
|
||||||
|
pipeline = new Pl_QPDFTokenizer(
|
||||||
|
"token filter", (*iter).getPointer());
|
||||||
|
to_delete.push_back(pipeline);
|
||||||
|
}
|
||||||
|
|
||||||
for (std::vector<std::string>::reverse_iterator iter = filters.rbegin();
|
for (std::vector<std::string>::reverse_iterator iter = filters.rbegin();
|
||||||
iter != filters.rend(); ++iter)
|
iter != filters.rend(); ++iter)
|
||||||
{
|
{
|
||||||
|
@ -612,6 +633,13 @@ QPDF_Stream::replaceStreamData(
|
||||||
replaceFilterData(filter, decode_parms, 0);
|
replaceFilterData(filter, decode_parms, 0);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
void
|
||||||
|
QPDF_Stream::addTokenFilter(
|
||||||
|
PointerHolder<QPDFObjectHandle::TokenFilter> token_filter)
|
||||||
|
{
|
||||||
|
this->token_filters.push_back(token_filter);
|
||||||
|
}
|
||||||
|
|
||||||
void
|
void
|
||||||
QPDF_Stream::replaceFilterData(QPDFObjectHandle const& filter,
|
QPDF_Stream::replaceFilterData(QPDFObjectHandle const& filter,
|
||||||
QPDFObjectHandle const& decode_parms,
|
QPDFObjectHandle const& decode_parms,
|
||||||
|
|
|
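In the pipeStreamData hunk above, the chain is assembled back to front: each new stage wraps the current pipeline pointer, and the token filters are visited with a reverse iterator so that the first filter added is the first to see the data. The sketch below shows that construction order with a toy stage that appends a tag to whatever passes through it. Stage and run_in_added_order are hypothetical names and do not correspond to qpdf classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of a push-style pipeline stage: append a tag
// and forward to "next", or store the result if this is the last stage.
class Stage
{
  public:
    Stage(std::string const& tag, Stage* next) : tag(tag), next(next) {}
    void write(std::string const& data)
    {
        std::string out = data + tag;
        if (next)
        {
            next->write(out);
        }
        else
        {
            result = out;
        }
    }
    std::string result;  // only meaningful on the last stage

  private:
    std::string tag;
    Stage* next;
};

// Build the chain starting from the final consumer and wrap it with
// each filter in reverse order of addition.
inline std::string run_in_added_order(std::vector<std::string> const& tags)
{
    std::vector<Stage*> to_delete;
    Stage* sink = new Stage("", 0);
    to_delete.push_back(sink);
    Stage* head = sink;
    for (std::vector<std::string>::const_reverse_iterator iter = tags.rbegin();
         iter != tags.rend(); ++iter)
    {
        head = new Stage(*iter, head);
        to_delete.push_back(head);
    }
    assert(head != 0);
    head->write("");
    std::string result = sink->result;
    for (std::size_t i = 0; i < to_delete.size(); ++i)
    {
        delete to_delete[i];
    }
    return result;
}
```

Iterating forward instead would make the last filter added the head of the chain, reversing the order in which filters are applied.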
@ -9,6 +9,7 @@ SRCS_libqpdf = \
|
||||||
libqpdf/BitWriter.cc \
|
libqpdf/BitWriter.cc \
|
||||||
libqpdf/Buffer.cc \
|
libqpdf/Buffer.cc \
|
||||||
libqpdf/BufferInputSource.cc \
|
libqpdf/BufferInputSource.cc \
|
||||||
|
libqpdf/ContentNormalizer.cc \
|
||||||
libqpdf/FileInputSource.cc \
|
libqpdf/FileInputSource.cc \
|
||||||
libqpdf/InputSource.cc \
|
libqpdf/InputSource.cc \
|
||||||
libqpdf/InsecureRandomDataProvider.cc \
|
libqpdf/InsecureRandomDataProvider.cc \
|
||||||
|
|
15
libqpdf/qpdf/ContentNormalizer.hh
Normal file
|
@ -0,0 +1,15 @@
|
||||||
|
#ifndef __CONTENTNORMALIZER_HH__
|
||||||
|
#define __CONTENTNORMALIZER_HH__
|
||||||
|
|
||||||
|
#include <qpdf/QPDFObjectHandle.hh>
|
||||||
|
|
||||||
|
class ContentNormalizer: public QPDFObjectHandle::TokenFilter
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
ContentNormalizer();
|
||||||
|
virtual ~ContentNormalizer();
|
||||||
|
virtual void handleToken(QPDFTokenizer::Token const&);
|
||||||
|
virtual void handleEOF();
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif // __CONTENTNORMALIZER_HH__
|
|
@ -4,6 +4,8 @@
|
||||||
#include <qpdf/Pipeline.hh>
|
#include <qpdf/Pipeline.hh>
|
||||||
|
|
||||||
#include <qpdf/QPDFTokenizer.hh>
|
#include <qpdf/QPDFTokenizer.hh>
|
||||||
|
#include <qpdf/PointerHolder.hh>
|
||||||
|
#include <qpdf/QPDFObjectHandle.hh>
|
||||||
|
|
||||||
//
|
//
|
||||||
// Treat incoming text as a stream consisting of valid PDF tokens, but
|
// Treat incoming text as a stream consisting of valid PDF tokens, but
|
||||||
|
@ -16,7 +18,8 @@
|
||||||
class Pl_QPDFTokenizer: public Pipeline
|
class Pl_QPDFTokenizer: public Pipeline
|
||||||
{
|
{
|
||||||
public:
|
public:
|
||||||
Pl_QPDFTokenizer(char const* identifier, Pipeline* next);
|
Pl_QPDFTokenizer(char const* identifier,
|
||||||
|
QPDFObjectHandle::TokenFilter* filter);
|
||||||
virtual ~Pl_QPDFTokenizer();
|
virtual ~Pl_QPDFTokenizer();
|
||||||
virtual void write(unsigned char* buf, size_t len);
|
virtual void write(unsigned char* buf, size_t len);
|
||||||
virtual void finish();
|
virtual void finish();
|
||||||
|
@ -24,14 +27,25 @@ class Pl_QPDFTokenizer: public Pipeline
|
||||||
private:
|
private:
|
||||||
void processChar(char ch);
|
void processChar(char ch);
|
||||||
void checkUnread();
|
void checkUnread();
|
||||||
void writeNext(char const*, size_t len);
|
|
||||||
void writeToken(QPDFTokenizer::Token&);
|
|
||||||
|
|
||||||
QPDFTokenizer tokenizer;
|
class Members
|
||||||
bool just_wrote_nl;
|
{
|
||||||
bool last_char_was_cr;
|
friend class Pl_QPDFTokenizer;
|
||||||
bool unread_char;
|
|
||||||
char char_to_unread;
|
public:
|
||||||
|
~Members();
|
||||||
|
|
||||||
|
private:
|
||||||
|
Members();
|
||||||
|
Members(Members const&);
|
||||||
|
|
||||||
|
QPDFObjectHandle::TokenFilter* filter;
|
||||||
|
QPDFTokenizer tokenizer;
|
||||||
|
bool last_char_was_cr;
|
||||||
|
bool unread_char;
|
||||||
|
char char_to_unread;
|
||||||
|
};
|
||||||
|
PointerHolder<Members> m;
|
||||||
};
|
};
|
||||||
|
|
||||||
#endif // __PL_QPDFTOKENIZER_HH__
|
#endif // __PL_QPDFTOKENIZER_HH__
|
||||||
|
|
|
@ -20,6 +20,7 @@ class QPDF_Stream: public QPDFObject
|
||||||
virtual QPDFObject::object_type_e getTypeCode() const;
|
virtual QPDFObject::object_type_e getTypeCode() const;
|
||||||
virtual char const* getTypeName() const;
|
virtual char const* getTypeName() const;
|
||||||
QPDFObjectHandle getDict() const;
|
QPDFObjectHandle getDict() const;
|
||||||
|
bool isDataModified() const;
|
||||||
|
|
||||||
// See comments in QPDFObjectHandle.hh for these methods.
|
// See comments in QPDFObjectHandle.hh for these methods.
|
||||||
bool pipeStreamData(Pipeline*,
|
bool pipeStreamData(Pipeline*,
|
||||||
|
@ -35,6 +36,8 @@ class QPDF_Stream: public QPDFObject
|
||||||
PointerHolder<QPDFObjectHandle::StreamDataProvider> provider,
|
PointerHolder<QPDFObjectHandle::StreamDataProvider> provider,
|
||||||
QPDFObjectHandle const& filter,
|
QPDFObjectHandle const& filter,
|
||||||
QPDFObjectHandle const& decode_parms);
|
QPDFObjectHandle const& decode_parms);
|
||||||
|
void addTokenFilter(
|
||||||
|
PointerHolder<QPDFObjectHandle::TokenFilter> token_filter);
|
||||||
|
|
||||||
void replaceDict(QPDFObjectHandle new_dict);
|
void replaceDict(QPDFObjectHandle new_dict);
|
||||||
|
|
||||||
|
@ -72,6 +75,8 @@ class QPDF_Stream: public QPDFObject
|
||||||
size_t length;
|
size_t length;
|
||||||
PointerHolder<Buffer> stream_data;
|
PointerHolder<Buffer> stream_data;
|
||||||
PointerHolder<QPDFObjectHandle::StreamDataProvider> stream_provider;
|
PointerHolder<QPDFObjectHandle::StreamDataProvider> stream_provider;
|
||||||
|
std::vector<
|
||||||
|
PointerHolder<QPDFObjectHandle::TokenFilter> > token_filters;
|
||||||
};
|
};
|
||||||
|
|
||||||
#endif // __QPDF_STREAM_HH__
|
#endif // __QPDF_STREAM_HH__
|
||||||
|
|
|
@ -756,6 +756,19 @@ $td->runtest("check output",
|
||||||
{$td->FILE => "a.pdf"},
|
{$td->FILE => "a.pdf"},
|
||||||
{$td->FILE => "coalesce-out.pdf"});
|
{$td->FILE => "coalesce-out.pdf"});
|
||||||
|
|
||||||
|
show_ntests();
|
||||||
|
# ----------
|
||||||
|
$td->notify("--- Token filters ---");
|
||||||
|
$n_tests += 2;
|
||||||
|
|
||||||
|
$td->runtest("token filter",
|
||||||
|
{$td->COMMAND => "test_driver 41 coalesce.pdf"},
|
||||||
|
{$td->STRING => "test 41 done\n", $td->EXIT_STATUS => 0},
|
||||||
|
$td->NORMALIZE_NEWLINES);
|
||||||
|
$td->runtest("check output",
|
||||||
|
{$td->FILE => "a.pdf"},
|
||||||
|
{$td->FILE => "token-filters-out.pdf"});
|
||||||
|
|
||||||
show_ntests();
|
show_ntests();
|
||||||
# ----------
|
# ----------
|
||||||
$td->notify("--- Newline before endstream ---");
|
$td->notify("--- Newline before endstream ---");
|
||||||
|
|
171
qpdf/qtest/qpdf/token-filters-out.pdf
Normal file
|
@ -0,0 +1,171 @@
|
||||||
|
%PDF-1.3
|
||||||
|
%¿÷¢þ
|
||||||
|
%QDF-1.0
|
||||||
|
|
||||||
|
%% Original object ID: 1 0
|
||||||
|
1 0 obj
|
||||||
|
<<
|
||||||
|
/Pages 2 0 R
|
||||||
|
/Type /Catalog
|
||||||
|
>>
|
||||||
|
endobj
|
||||||
|
|
||||||
|
%% Original object ID: 2 0
|
||||||
|
2 0 obj
|
||||||
|
<<
|
||||||
|
/Count 2
|
||||||
|
/Kids [
|
||||||
|
3 0 R
|
||||||
|
4 0 R
|
||||||
|
]
|
||||||
|
/Type /Pages
|
||||||
|
>>
|
||||||
|
endobj
|
||||||
|
|
||||||
|
%% Page 1
|
||||||
|
%% Original object ID: 3 0
|
||||||
|
3 0 obj
|
||||||
|
<<
|
||||||
|
/Contents 5 0 R
|
||||||
|
/MediaBox [
|
||||||
|
0
|
||||||
|
0
|
||||||
|
612
|
||||||
|
792
|
||||||
|
]
|
||||||
|
/Parent 2 0 R
|
||||||
|
/Resources <<
|
||||||
|
/Font <<
|
||||||
|
/F1 7 0 R
|
||||||
|
>>
|
||||||
|
/ProcSet 8 0 R
|
||||||
|
>>
|
||||||
|
/Type /Page
|
||||||
|
>>
|
||||||
|
endobj
|
||||||
|
|
||||||
|
%% Page 2
|
||||||
|
%% Original object ID: 4 0
|
||||||
|
4 0 obj
|
||||||
|
<<
|
||||||
|
/Contents 9 0 R
|
||||||
|
/MediaBox [
|
||||||
|
0
|
||||||
|
0
|
||||||
|
612
|
||||||
|
792
|
||||||
|
]
|
||||||
|
/Parent 2 0 R
|
||||||
|
/Resources <<
|
||||||
|
/Font <<
|
||||||
|
/F1 11 0 R
|
||||||
|
>>
|
||||||
|
/ProcSet 12 0 R
|
||||||
|
>>
|
||||||
|
/Type /Page
|
||||||
|
>>
|
||||||
|
endobj
|
||||||
|
|
||||||
|
%% Contents for page 1
|
||||||
|
%% Original object ID: 19 0
|
||||||
|
5 0 obj
|
||||||
|
<<
|
||||||
|
/Length 6 0 R
|
||||||
|
>>
|
||||||
|
stream
|
||||||
|
BT
|
||||||
|
/F1 24 Tf
|
||||||
|
72 720 Td
|
||||||
|
(Salad) Tj
|
||||||
|
ET [ /array/split ] BI
|
||||||
|
/CS /G/W 66/H 47/BPC 8/F/Fl/DP<</Predictor 15/Columns 66>>
|
||||||
|
ID xœÅÖIà P|ÿC;UÈ`ÀÓ7‘Z©¦Ä˜Úæ<C39A>}Dðï_´øÉW©„œÄ-”ˆ>ÿ‡À<E280A1>>”^&®¡uâ]€"!‡•–*¬&<26>E|Sy® ðd-€<<3C>B0Bú@Nê+<hlèKÐî/56L ‰<C2A0>ã £–¹¦>0>Y<>ù!cì\YØ%Yð¥Ö8?& Öëˆ}j’ûè;«<>3<EFBFBD>ÂÖlpÛsHöûtúQØTt*hÌUúãwÍÕÐ%¨)p–³"•DiRj¹–DYNUÓÙAv’Fà&
<0A>ÍÔu#c•ÆW ô߉W“O
|
||||||
|
EI/bye
|
||||||
|
endstream
|
||||||
|
endobj
|
||||||
|
|
||||||
|
6 0 obj
|
||||||
|
375
|
||||||
|
endobj
|
||||||
|
|
||||||
|
%% Original object ID: 13 0
|
||||||
|
7 0 obj
|
||||||
|
<<
|
||||||
|
/BaseFont /Helvetica
|
||||||
|
/Encoding /WinAnsiEncoding
|
||||||
|
/Name /F1
|
||||||
|
/Subtype /Type1
|
||||||
|
/Type /Font
|
||||||
|
>>
|
||||||
|
endobj
|
||||||
|
|
||||||
|
%% Original object ID: 14 0
|
||||||
|
8 0 obj
|
||||||
|
[
|
||||||
|
/PDF
|
||||||
|
/Text
|
||||||
|
]
|
||||||
|
endobj
|
||||||
|
|
||||||
|
%% Contents for page 2
|
||||||
|
%% Original object ID: 15 0
|
||||||
|
9 0 obj
|
||||||
|
<<
|
||||||
|
/Length 10 0 R
|
||||||
|
>>
|
||||||
|
stream
|
||||||
|
BT
|
||||||
|
/F1 24 Tf
|
||||||
|
72 720 Td
|
||||||
|
(Salad) Tj
|
||||||
|
ET
|
||||||
|
/bye
|
||||||
|
endstream
|
||||||
|
endobj
|
||||||
|
|
||||||
|
10 0 obj
|
||||||
|
48
|
||||||
|
endobj
|
||||||
|
|
||||||
|
%% Original object ID: 17 0
|
||||||
|
11 0 obj
|
||||||
|
<<
|
||||||
|
/BaseFont /Helvetica
|
||||||
|
/Encoding /WinAnsiEncoding
|
||||||
|
/Name /F1
|
||||||
|
/Subtype /Type1
|
||||||
|
/Type /Font
|
||||||
|
>>
|
||||||
|
endobj
|
||||||
|
|
||||||
|
%% Original object ID: 18 0
|
||||||
|
12 0 obj
|
||||||
|
[
|
||||||
|
/PDF
|
||||||
|
/Text
|
||||||
|
]
|
||||||
|
endobj
|
||||||
|
|
||||||
|
xref
|
||||||
|
0 13
|
||||||
|
0000000000 65535 f
|
||||||
|
0000000052 00000 n
|
||||||
|
0000000133 00000 n
|
||||||
|
0000000252 00000 n
|
||||||
|
0000000481 00000 n
|
||||||
|
0000000726 00000 n
|
||||||
|
0000001156 00000 n
|
||||||
|
0000001204 00000 n
|
||||||
|
0000001350 00000 n
|
||||||
|
0000001436 00000 n
|
||||||
|
0000001540 00000 n
|
||||||
|
0000001588 00000 n
|
||||||
|
0000001735 00000 n
|
||||||
|
trailer <<
|
||||||
|
/Root 1 0 R
|
||||||
|
/Size 13
|
||||||
|
/ID [<fa46a90bcf56476b9904a2e7adb75024><31415926535897932384626433832795>]
|
||||||
|
>>
|
||||||
|
startxref
|
||||||
|
1771
|
||||||
|
%%EOF
|
|
@ -97,6 +97,36 @@ ParserCallbacks::handleEOF()
|
||||||
std::cout << "-EOF-" << std::endl;
|
std::cout << "-EOF-" << std::endl;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
class TokenFilter: public QPDFObjectHandle::TokenFilter
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
TokenFilter()
|
||||||
|
{
|
||||||
|
}
|
||||||
|
virtual ~TokenFilter()
|
||||||
|
{
|
||||||
|
}
|
||||||
|
virtual void handleToken(QPDFTokenizer::Token const& t)
|
||||||
|
{
|
||||||
|
if (t == QPDFTokenizer::Token(QPDFTokenizer::tt_string, "Potato"))
|
||||||
|
{
|
||||||
|
// Exercise unparsing of strings by token constructor
|
||||||
|
writeToken(
|
||||||
|
QPDFTokenizer::Token(QPDFTokenizer::tt_string, "Salad"));
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
writeToken(t);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
virtual void handleEOF()
|
||||||
|
{
|
||||||
|
writeToken(QPDFTokenizer::Token(QPDFTokenizer::tt_name, "/bye"));
|
||||||
|
write("\n");
|
||||||
|
finish();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
static std::string getPageContents(QPDFObjectHandle page)
|
static std::string getPageContents(QPDFObjectHandle page)
|
||||||
{
|
{
|
||||||
PointerHolder<Buffer> b1 =
|
PointerHolder<Buffer> b1 =
|
||||||
|
@ -1345,6 +1375,22 @@ void runtest(int n, char const* filename1, char const* arg2)
|
||||||
w.setStaticID(true);
|
w.setStaticID(true);
|
||||||
w.write();
|
w.write();
|
||||||
}
|
}
|
||||||
|
else if (n == 41)
|
||||||
|
{
|
||||||
|
// Apply a token filter. This test case is crafted to work
|
||||||
|
// with coalesce.pdf.
|
||||||
|
std::vector<QPDFObjectHandle> pages = pdf.getAllPages();
|
||||||
|
for (std::vector<QPDFObjectHandle>::iterator iter =
|
||||||
|
pages.begin();
|
||||||
|
iter != pages.end(); ++iter)
|
||||||
|
{
|
||||||
|
(*iter).addContentTokenFilter(new TokenFilter);
|
||||||
|
}
|
||||||
|
QPDFWriter w(pdf, "a.pdf");
|
||||||
|
w.setQDFMode(true);
|
||||||
|
w.setStaticID(true);
|
||||||
|
w.write();
|
||||||
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
throw std::runtime_error(std::string("invalid test ") +
|
throw std::runtime_error(std::string("invalid test ") +
|
||||||
|
|
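The test filter added in this diff (`TokenFilter::handleToken` and `handleEOF` in the test driver) can be read in isolation as the sketch below. It is a minimal illustration that does not use qpdf: `Token`, `TokenType`, `SketchFilter`, and `runSketch` are hypothetical stand-ins for `QPDFTokenizer::Token` and `QPDFObjectHandle::TokenFilter`. It assumes whitespace arrives as ordinary tokens and writes strings as `(value)` with no escaping, which is simpler than what the real library does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not qpdf types.
enum class TokenType { String, Name, Word };

struct Token
{
    TokenType type;
    std::string value;

    bool operator==(Token const& rhs) const
    {
        return type == rhs.type && value == rhs.value;
    }
};

class SketchFilter
{
  public:
    // Mirrors the test filter: replace the string "Potato" with "Salad"
    // and pass every other token through unchanged.
    void handleToken(Token const& t)
    {
        if (t == Token{TokenType::String, "Potato"})
        {
            writeToken(Token{TokenType::String, "Salad"});
        }
        else
        {
            writeToken(t);
        }
    }

    // Mirrors the test filter: append a trailing name token and a newline.
    void handleEOF()
    {
        writeToken(Token{TokenType::Name, "/bye"});
        out += "\n";
    }

    std::string const& output() const
    {
        return out;
    }

  private:
    // Simplified serialization (an assumption of this sketch): strings
    // get their outer delimiters back, everything else is written as-is.
    void writeToken(Token const& t)
    {
        if (t.type == TokenType::String)
        {
            out += "(" + t.value + ")";
        }
        else
        {
            out += t.value;
        }
    }

    std::string out;
};

// Feed a token sequence through the filter and return what it wrote.
std::string runSketch(std::vector<Token> const& tokens)
{
    SketchFilter f;
    for (auto const& t : tokens)
    {
        f.handleToken(t);
    }
    f.handleEOF();
    return f.output();
}
```

Passing the tokens for `(Potato) Tj` followed by a newline through `runSketch` yields the same shape as the `(Salad) Tj` and `/bye` lines in the expected output file above. In the real API the filter writes to the next pipeline rather than to a string.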