Implement TokenFilter and refactor Pl_QPDFTokenizer

Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a TokenFilter subclass called ContentNormalizer. Pl_QPDFTokenizer is now a general pipeline that tokenizes its input and passes each token to a TokenFilter.
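As a rough illustration of the filter contract this commit introduces, the standalone sketch below mimics the call sequence without linking against qpdf. `MockToken`, `MockFilter`, `ReplaceFilter`, and `run_filter` are hypothetical stand-ins, not qpdf classes. Only the flow is modeled on the commit: `handleToken` runs once per token, `handleEOF` runs once at the end, and an overriding `handleEOF` must call `finish` itself.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not qpdf types.
struct MockToken
{
    std::string type;   // e.g. "string", "word", "space"
    std::string value;
};

class MockFilter
{
  public:
    virtual ~MockFilter() {}
    virtual void handleToken(MockToken const&) = 0;
    // Mirrors the default described in the commit: handleEOF calls finish.
    virtual void handleEOF() { finish(); }
    std::string const& output() const { return out; }
    int finishCalls() const { return finish_calls; }
  protected:
    void write(std::string const& s) { out += s; }
    void finish() { ++finish_calls; }
  private:
    std::string out;
    int finish_calls = 0;
};

// Replaces one string token and appends a trailer at EOF, loosely
// following the test filter added to test_driver.cc.
class ReplaceFilter: public MockFilter
{
  public:
    void handleToken(MockToken const& t) override
    {
        if ((t.type == "string") && (t.value == "Potato"))
        {
            write("(Salad)");
        }
        else if (t.type == "string")
        {
            write("(" + t.value + ")");
        }
        else
        {
            write(t.value);
        }
    }
    void handleEOF() override
    {
        write("/bye\n");
        finish(); // overriding handleEOF means calling finish ourselves
    }
};

// Drives a filter the way a tokenizing pipeline would: one call per
// token, then a single handleEOF at the end.
std::string run_filter(MockFilter& f, std::vector<MockToken> const& tokens)
{
    for (MockToken const& t : tokens)
    {
        f.handleToken(t);
    }
    f.handleEOF();
    return f.output();
}
```

In the real library the output goes to a downstream Pipeline rather than an in-memory string, so the string buffer here is purely a simplification for the sketch.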
2024-05-29 08:20:53 +00:00 · 2018-02-02 18:21:34 -05:00 · 2018-02-02 18:21:34 -05:00 · 9910104442
commit 9910104442
parent b8723e97f4
16 changed files with 635 additions and 119 deletions
--- a/43
+++ b/43
@ -107,6 +107,49 @@
 	applications that use page-level APIs in QPDFObjectHandle to be
 	more tolerant of certain types of damaged files.
 	* Add QPDFObjectHandle::TokenFilter class and methods to use it to
 	perform lexical filtering on content streams. You can call
 	QPDFObjectHandle::addTokenFilter on a stream object, or you can call
 	the higher level QPDFObjectHandle::addContentTokenFilter on a page
 	object to cause the stream's contents to be passed through a token
 	filter while being retrieved by QPDFWriter or any other consumer.
 	For details on using TokenFilter, please see comments in
 	QPDFObjectHandle.hh.
 	* Enhance the (type, string) QPDFTokenizer::Token constructor to
 	initialize a raw value in addition to a value. Tokens have a
 	value, which is a canonical representation, and a raw value. For
 	all tokens except strings and names, the raw value and the value
 	are the same. For strings, the value excludes the outer delimiters
 	and has non-printing characters normalized. For names, the value
 	resolves non-printing characters. In order to better facilitate
 	token filters that mostly preserve contents and to enable
 	developers to be mostly unconcerned about the nuances of token
 	values and raw values, creating string and name tokens now
 	properly handles this subtlety of values and raw values. When
 	constructing string tokens, take care to avoid passing in the
 	outer delimiters. This has always been the case, but it is now
 	clarified in the TokenFilter comments in QPDFObjectHandle.hh. This
 	has no impact on any existing code unless there's some code
 	somewhere that was relying on Token::getRawValue() returning an
 	empty string for a manually constructed token. The token class's
 	operator== method still only looks at type and value, not raw
 	value. For example, string tokens for <41> and (A) would still be
 	equal because both are representations of the string "A".
 	* Add QPDFObjectHandle::isDataModified method. This method just
 	returns true if addTokenFilter has been called on the stream. It
 	enables a caller to determine whether it is safe to optimize away
 	piping of stream data in cases where the input and output are
 	expected to be the same. QPDFWriter uses this internally to skip
 	the optimization of not re-compressing already compressed streams
 	if addTokenFilter has been called. Most developers will not have
 	to worry about this as it is used internally in the library in the
 	places that need it. If you are manually retrieving stream data
 	with QPDFObjectHandle::getStreamData or
 	QPDFObjectHandle::pipeStreamData, you don't need to worry about
 	this at all.
 2018-02-04  Jay Berkenbilt  <ejb@ql.org>
 	* Add QPDFWriter::setLinearizationPass1Filename method and
--- a/include/qpdf/QPDFObjectHandle.hh
+++ b/include/qpdf/QPDFObjectHandle.hh
@ -35,6 +35,7 @@
 #include <qpdf/PointerHolder.hh>
 #include <qpdf/Buffer.hh>
 #include <qpdf/InputSource.hh>
 #include <qpdf/QPDFTokenizer.hh>
 #include <qpdf/QPDFObject.hh>
@ -76,6 +77,66 @@ class QPDFObjectHandle
 				       Pipeline* pipeline) = 0;
    };
    // The TokenFilter class provides a way to filter content streams
    // in a lexically aware fashion. TokenFilters can be attached to
    // streams using the addTokenFilter or addContentTokenFilter
    // methods. The handleToken method is called for each token,
    // including the eof token, and then handleEOF is called at the
    // very end. Handlers may call write (or writeToken) to pass data
    // downstream. The finish() method must be called exactly one time
    // to ensure that any written data is flushed out. The default
    // handleEOF calls finish. If you override handleEOF, you must
    // ensure that finish() is called either there or in response to
    // whatever event causes you to terminate creation of output.
    // Failure to call finish() may result in some of the data you
    // have written being lost. You should not rely on a destructor
    // for calling finish() since the destructor call may occur later
    // than you expect. Please see examples/token-filters.cc for
    // examples of using TokenFilters.
    //
    // Please note that when you call token.getValue() on a token of
    // type tt_string, you get the string value without any
    // delimiters. token.getRawValue() will return something suitable
    // for being written to output, or calling writeToken with a
    // string token will also work. The correct way to construct a
    // string token that would write the literal value (str) is
    // QPDFTokenizer::Token(QPDFTokenizer::tt_string, "str").
    class TokenFilter
    {
      public:
        QPDF_DLL
        TokenFilter()
        {
        }
        QPDF_DLL
        virtual ~TokenFilter()
        {
        }
        virtual void handleToken(QPDFTokenizer::Token const&) = 0;
        virtual void handleEOF()
        {
            // If you override handleEOF, you must be sure to call
            // finish().
            finish();
        }
        // This is called internally by the qpdf library.
        void setPipeline(Pipeline*);
      protected:
        QPDF_DLL
        void write(char const* data, size_t len);
        QPDF_DLL
        void write(std::string const& str);
        QPDF_DLL
        void writeToken(QPDFTokenizer::Token const&);
        QPDF_DLL
        void finish();
      private:
        Pipeline* pipeline;
    };
    // This class is used by parse to decrypt strings when reading an
    // object that contains encrypted strings.
    class StringDecrypter
@ -223,6 +284,23 @@ class QPDFObjectHandle
    static void parseContentStream(QPDFObjectHandle stream_or_array,
                                   ParserCallbacks* callbacks);
    // Attach a token filter to a page's contents. If the page's
    // contents is an array of streams, it is automatically coalesced.
    // The token filter is applied to the page's contents as a single
    // stream.
    QPDF_DLL
    void addContentTokenFilter(PointerHolder<TokenFilter> token_filter);
    // As of qpdf 8, it is possible to add custom token filters to a
    // stream. The tokenized stream data is passed through the token
    // filter after all original filters but before content stream
    // normalization if requested. This is a low-level interface to
    // add it to a stream. You will usually want to call
    // addContentTokenFilter instead, which can be applied to a page
    // object, and which will automatically handle the case of pages
    // whose contents are split across multiple streams.
    void addTokenFilter(PointerHolder<TokenFilter> token_filter);
    // Type-specific factories
    QPDF_DLL
    static QPDFObjectHandle newNull();
@ -414,6 +492,13 @@ class QPDFObjectHandle
    QPDF_DLL
    QPDFObjectHandle getDict();
    // If addTokenFilter has been called for this stream, then the
    // original data should be considered to be modified. This means we
    // should avoid optimizations such as not filtering a stream that
    // is already compressed.
    QPDF_DLL
    bool isDataModified();
    // Returns filtered (uncompressed) stream data.  Throws an
    // exception if the stream is filtered and we can't decode it.
    QPDF_DLL
@ -608,7 +693,7 @@ class QPDFObjectHandle
    // stream or an array of streams. If this page's content is an
    // array, concatenate the streams into a single stream. This can
    // be useful when working with files that split content streams in
-    // arbitary spots, such as in the middle of a token, as that can
+    // arbitrary spots, such as in the middle of a token, as that can
    // confuse some software. You could also call this after calling
    // addPageContents.
    QPDF_DLL
--- a/include/qpdf/QPDFTokenizer.hh
+++ b/include/qpdf/QPDFTokenizer.hh
@ -62,13 +62,8 @@ class QPDFTokenizer
    {
      public:
 	Token() : type(tt_bad) {}
-
+        QPDF_DLL
-	Token(token_type_e type, std::string const& value) :
+	Token(token_type_e type, std::string const& value);
 	    type(type),
 	    value(value)
 	{
 	}
 	Token(token_type_e type, std::string const& value,
 	      std::string raw_value, std::string error_message) :
 	    type(type),
@ -93,7 +88,7 @@ class QPDFTokenizer
 	{
 	    return this->error_message;
 	}
-	bool operator==(Token const& rhs)
+	bool operator==(Token const& rhs) const
 	{
 	    // Ignore fields other than type and value
 	    return ((this->type != tt_bad) &&
--- a/libqpdf/ContentNormalizer.cc
+++ b/libqpdf/ContentNormalizer.cc
@ -0,0 +1,77 @@
 #include <qpdf/ContentNormalizer.hh>
 #include <qpdf/QUtil.hh>
 ContentNormalizer::ContentNormalizer()
 {
 }
 ContentNormalizer::~ContentNormalizer()
 {
 }
 void
 ContentNormalizer::handleToken(QPDFTokenizer::Token const& token)
 {
    std::string value = token.getRawValue();
    QPDFTokenizer::token_type_e token_type = token.getType();
    switch (token_type)
    {
      case QPDFTokenizer::tt_space:
        {
            size_t len = value.length();
            for (size_t i = 0; i < len; ++i)
            {
                char ch = value.at(i);
                if (ch == '\r')
                {
                    if ((i + 1 < len) && (value.at(i + 1) == '\n'))
                    {
                        // ignore
                    }
                    else
                    {
                        write("\n");
                    }
                }
                else
                {
                    write(&ch, 1);
                }
            }
        }
        break;
      case QPDFTokenizer::tt_string:
        // Replacing string and name tokens in this way normalizes
        // their representation as this will automatically handle
        // quoting of unprintable characters, etc.
        writeToken(QPDFTokenizer::Token(
                       QPDFTokenizer::tt_string, token.getValue()));
 	break;
      case QPDFTokenizer::tt_name:
        writeToken(QPDFTokenizer::Token(
                       QPDFTokenizer::tt_name, token.getValue()));
 	break;
      default:
        writeToken(token);
 	break;
    }
    value = token.getRawValue();
    if (((token_type == QPDFTokenizer::tt_string) ||
         (token_type == QPDFTokenizer::tt_name)) &&
        ((value.find('\r') != std::string::npos) ||
         (value.find('\n') != std::string::npos)))
    {
        write("\n");
    }
 }
 void
 ContentNormalizer::handleEOF()
 {
    finish();
 }
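The whitespace handling in `ContentNormalizer::handleToken` can be tried in isolation. The sketch below restates the `tt_space` branch as a free function that returns a string instead of writing to a pipeline. The name `normalize_newlines` is hypothetical and does not exist in qpdf.

```cpp
#include <cstddef>
#include <string>

// Standalone restatement of the tt_space branch: "\r\n" collapses to
// "\n" and a lone "\r" becomes "\n"; everything else passes through.
std::string normalize_newlines(std::string const& value)
{
    std::string out;
    std::size_t len = value.length();
    for (std::size_t i = 0; i < len; ++i)
    {
        char ch = value.at(i);
        if (ch == '\r')
        {
            if ((i + 1 < len) && (value.at(i + 1) == '\n'))
            {
                // Drop the CR; the following LF is emitted on the
                // next iteration.
            }
            else
            {
                out += '\n';
            }
        }
        else
        {
            out += ch;
        }
    }
    return out;
}
```

Because the look-ahead only inspects the current token's text, a CR at the very end of one whitespace token is always turned into a newline, which matches the loop in the diff.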
--- a/libqpdf/Pl_QPDFTokenizer.cc
+++ b/libqpdf/Pl_QPDFTokenizer.cc
@ -1,107 +1,51 @@
 #include <qpdf/Pl_QPDFTokenizer.hh>
 #include <qpdf/QPDF_String.hh>
 #include <qpdf/QPDF_Name.hh>
 #include <qpdf/QTC.hh>
 #include <qpdf/QUtil.hh>
 #include <stdexcept>
 #include <string.h>
-Pl_QPDFTokenizer::Pl_QPDFTokenizer(char const* identifier, Pipeline* next) :
+Pl_QPDFTokenizer::Members::Members() :
-    Pipeline(identifier, next),
+    filter(0),
    just_wrote_nl(false),
    last_char_was_cr(false),
    unread_char(false),
    char_to_unread('\0')
 {
-    tokenizer.allowEOF();
+}
-    tokenizer.includeIgnorable();
+
 Pl_QPDFTokenizer::Members::~Members()
 {
 }
 Pl_QPDFTokenizer::Pl_QPDFTokenizer(
    char const* identifier,
    QPDFObjectHandle::TokenFilter* filter)
    :
    Pipeline(identifier, 0),
    m(new Members)
 {
    m->filter = filter;
    m->tokenizer.allowEOF();
    m->tokenizer.includeIgnorable();
 }
 Pl_QPDFTokenizer::~Pl_QPDFTokenizer()
 {
 }
 void
 Pl_QPDFTokenizer::writeNext(char const* buf, size_t len)
 {
    if (len)
    {
 	getNext()->write(QUtil::unsigned_char_pointer(buf), len);
 	this->just_wrote_nl = (buf[len-1] == '\n');
    }
 }
 void
 Pl_QPDFTokenizer::writeToken(QPDFTokenizer::Token& token)
 {
    std::string value = token.getRawValue();
    switch (token.getType())
    {
      case QPDFTokenizer::tt_space:
        {
            size_t len = value.length();
            for (size_t i = 0; i < len; ++i)
            {
                char ch = value.at(i);
                if (ch == '\r')
                {
                    if ((i + 1 < len) && (value.at(i + 1) == '\n'))
                    {
                        // ignore
                    }
                    else
                    {
                        writeNext("\n", 1);
                    }
                }
                else
                {
                    writeNext(&ch, 1);
                }
            }
        }
        value.clear();
        break;
      case QPDFTokenizer::tt_string:
 	value = QPDF_String(token.getValue()).unparse();
 	break;
      case QPDFTokenizer::tt_name:
 	value = QPDF_Name(token.getValue()).unparse();
 	break;
      default:
 	break;
    }
    writeNext(value.c_str(), value.length());
 }
 void
 Pl_QPDFTokenizer::processChar(char ch)
 {
-    tokenizer.presentCharacter(ch);
+    this->m->tokenizer.presentCharacter(ch);
    QPDFTokenizer::Token token;
-    if (tokenizer.getToken(token, this->unread_char, this->char_to_unread))
+    if (this->m->tokenizer.getToken(
            token, this->m->unread_char, this->m->char_to_unread))
    {
-	writeToken(token);
+	this->m->filter->handleToken(token);
-        std::string value = token.getRawValue();
+        if ((token.getType() == QPDFTokenizer::tt_word) &&
-        QPDFTokenizer::token_type_e token_type = token.getType();
+            (token.getValue() == "ID"))
        if (((token_type == QPDFTokenizer::tt_string) ||
             (token_type == QPDFTokenizer::tt_name)) &&
            ((value.find('\r') != std::string::npos) ||
             (value.find('\n') != std::string::npos)))
        {
 	    writeNext("\n", 1);
 	}
 	if ((token.getType() == QPDFTokenizer::tt_word) &&
 	    (token.getValue() == "ID"))
 	{
            QTC::TC("qpdf", "Pl_QPDFTokenizer found ID");
-            tokenizer.expectInlineImage();
+            this->m->tokenizer.expectInlineImage();
-	}
+        }
    }
 }
@ -109,10 +53,10 @@ Pl_QPDFTokenizer::processChar(char ch)
 void
 Pl_QPDFTokenizer::checkUnread()
 {
-    if (this->unread_char)
+    if (this->m->unread_char)
    {
-	processChar(this->char_to_unread);
+	processChar(this->m->char_to_unread);
-	if (this->unread_char)
+	if (this->m->unread_char)
 	{
 	    throw std::logic_error(
 		"INTERNAL ERROR: unread_char still true after processing "
@ -135,20 +79,13 @@ Pl_QPDFTokenizer::write(unsigned char* buf, size_t len)
 void
 Pl_QPDFTokenizer::finish()
 {
-    this->tokenizer.presentEOF();
+    this->m->tokenizer.presentEOF();
    QPDFTokenizer::Token token;
-    if (tokenizer.getToken(token, this->unread_char, this->char_to_unread))
+    if (this->m->tokenizer.getToken(
            token, this->m->unread_char, this->m->char_to_unread))
    {
-        writeToken(token);
+	this->m->filter->handleToken(token);
        if (unread_char)
        {
            if (this->char_to_unread == '\r')
            {
                this->char_to_unread = '\n';
            }
            writeNext(&this->char_to_unread, 1);
        }
    }
-    getNext()->finish();
+    this->m->filter->handleEOF();
 }
--- a/libqpdf/QPDFObjectHandle.cc
+++ b/libqpdf/QPDFObjectHandle.cc
@ -62,6 +62,50 @@ CoalesceProvider::provideStreamData(int, int, Pipeline* p)
    concat.manualFinish();
 }
 void
 QPDFObjectHandle::TokenFilter::setPipeline(Pipeline* p)
 {
    this->pipeline = p;
 }
 void
 QPDFObjectHandle::TokenFilter::write(char const* data, size_t len)
 {
    if (! this->pipeline)
    {
        throw std::logic_error(
            "TokenFilter::write called before setPipeline");
    }
    if (len)
    {
 	this->pipeline->write(QUtil::unsigned_char_pointer(data), len);
    }
 }
 void
 QPDFObjectHandle::TokenFilter::write(std::string const& str)
 {
    write(str.c_str(), str.length());
 }
 void
 QPDFObjectHandle::TokenFilter::writeToken(QPDFTokenizer::Token const& token)
 {
    std::string value = token.getRawValue();
    write(value.c_str(), value.length());
 }
 void
 QPDFObjectHandle::TokenFilter::finish()
 {
    if (! this->pipeline)
    {
        throw std::logic_error(
            "TokenFilter::finish called before setPipeline");
    }
    this->pipeline->finish();
 }
 void
 QPDFObjectHandle::ParserCallbacks::terminateParsing()
 {
@ -508,6 +552,13 @@ QPDFObjectHandle::getDict()
    return dynamic_cast<QPDF_Stream*>(obj.getPointer())->getDict();
 }
 bool
 QPDFObjectHandle::isDataModified()
 {
    assertStream();
    return dynamic_cast<QPDF_Stream*>(obj.getPointer())->isDataModified();
 }
 void
 QPDFObjectHandle::replaceDict(QPDFObjectHandle new_dict)
 {
@ -1033,6 +1084,21 @@ QPDFObjectHandle::parseContentStream_data(
    }
 }
 void
 QPDFObjectHandle::addContentTokenFilter(PointerHolder<TokenFilter> filter)
 {
    coalesceContentStreams();
    this->getKey("/Contents").addTokenFilter(filter);
 }
 void
 QPDFObjectHandle::addTokenFilter(PointerHolder<TokenFilter> filter)
 {
    assertStream();
    return dynamic_cast<QPDF_Stream*>(
        obj.getPointer())->addTokenFilter(filter);
 }
 QPDFObjectHandle
 QPDFObjectHandle::parse(PointerHolder<InputSource> input,
                        std::string const& object_description,
--- a/libqpdf/QPDFTokenizer.cc
+++ b/libqpdf/QPDFTokenizer.cc
@ -7,6 +7,7 @@
 #include <qpdf/QTC.hh>
 #include <qpdf/QPDFExc.hh>
 #include <qpdf/QUtil.hh>
 #include <qpdf/QPDFObjectHandle.hh>
 #include <stdexcept>
 #include <string.h>
@ -39,6 +40,23 @@ QPDFTokenizer::Members::~Members()
 {
 }
 QPDFTokenizer::Token::Token(token_type_e type, std::string const& value) :
    type(type),
    value(value),
    raw_value(value)
 {
    if (type == tt_string)
    {
        raw_value = QPDFObjectHandle::newString(value).unparse();
    }
    else if (type == tt_name)
    {
        raw_value = QPDFObjectHandle::newName(value).unparse();
    }
 }
 QPDFTokenizer::QPDFTokenizer() :
    m(new Members())
 {
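The distinction between a token's value and its raw value, and the fact that equality ignores the raw value, can be sketched with a miniature token type. `MiniToken` and `quote_string` are hypothetical and are not the qpdf `Token` class. The quoting is deliberately simplified, and the `tt_bad` check from the real `operator==` is omitted.

```cpp
#include <string>

// Hypothetical miniature of a token with a canonical value and a raw
// (as-written) value; not the qpdf Token class.
struct MiniToken
{
    enum Type { t_string, t_name, t_word };

    // Two-argument form: derive a raw value from the canonical value.
    MiniToken(Type type, std::string const& value) :
        type(type), value(value), raw_value(value)
    {
        if (type == t_string)
        {
            raw_value = quote_string(value);
        }
    }
    // Explicit form: the caller supplies the raw bytes as written.
    MiniToken(Type type, std::string const& value,
              std::string const& raw_value) :
        type(type), value(value), raw_value(raw_value)
    {
    }
    // As in the commit's operator==, only type and value take part.
    bool operator==(MiniToken const& rhs) const
    {
        return (type == rhs.type) && (value == rhs.value);
    }
    // Simplified quoting: wrap in parentheses and backslash-escape
    // parentheses and backslashes. Real PDF string unparsing does more.
    static std::string quote_string(std::string const& s)
    {
        std::string out = "(";
        for (char ch : s)
        {
            if ((ch == '(') || (ch == ')') || (ch == '\\'))
            {
                out += '\\';
            }
            out += ch;
        }
        out += ')';
        return out;
    }

    Type type;
    std::string value;
    std::string raw_value;
};
```

With this shape, a token read from the hex form `<41>` and one built from the value `A` compare equal even though their raw values differ, which is the behavior the ChangeLog entry describes.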
--- a/libqpdf/QPDFWriter.cc
+++ b/libqpdf/QPDFWriter.cc
@ -1591,7 +1591,8 @@ QPDFWriter::unparseObject(QPDFObjectHandle object, int level,
 	{
 	    is_metadata = true;
 	}
-	bool filter = (this->m->compress_streams ||
+	bool filter = (object.isDataModified() ||
                       this->m->compress_streams ||
                       this->m->stream_decode_level);
 	if (this->m->compress_streams)
 	{
@ -1602,7 +1603,8 @@ QPDFWriter::unparseObject(QPDFObjectHandle object, int level,
 	    // compressed with a lossy compression scheme, but we
 	    // don't support any of those right now.
 	    QPDFObjectHandle filter_obj = stream_dict.getKey("/Filter");
-	    if (filter_obj.isName() &&
+            if ((! object.isDataModified()) &&
                filter_obj.isName() &&
 		((filter_obj.getName() == "/FlateDecode") ||
 		 (filter_obj.getName() == "/Fl")))
 	    {
--- a/libqpdf/QPDF_Stream.cc
+++ b/libqpdf/QPDF_Stream.cc
@ -13,7 +13,7 @@
 #include <qpdf/Pl_RunLength.hh>
 #include <qpdf/Pl_DCT.hh>
 #include <qpdf/Pl_Count.hh>
-
+#include <qpdf/ContentNormalizer.hh>
 #include <qpdf/QTC.hh>
 #include <qpdf/QPDF.hh>
 #include <qpdf/QPDFExc.hh>
@ -91,6 +91,12 @@ QPDF_Stream::getDict() const
    return this->stream_dict;
 }
 bool
 QPDF_Stream::isDataModified() const
 {
    return (! this->token_filters.empty());
 }
 PointerHolder<Buffer>
 QPDF_Stream::getStreamData(qpdf_stream_decode_level_e decode_level)
 {
@ -440,21 +446,36 @@ QPDF_Stream::pipeStreamData(Pipeline* pipeline,
    // create to be deleted when this function finishes.
    std::vector<PointerHolder<Pipeline> > to_delete;
    PointerHolder<ContentNormalizer> normalizer;
    if (filter)
    {
 	if (encode_flags & qpdf_ef_compress)
 	{
-	    pipeline = new Pl_Flate("compress object stream", pipeline,
+	    pipeline = new Pl_Flate("compress stream", pipeline,
 				    Pl_Flate::a_deflate);
 	    to_delete.push_back(pipeline);
 	}
 	if (encode_flags & qpdf_ef_normalize)
 	{
-	    pipeline = new Pl_QPDFTokenizer("normalizer", pipeline);
+            normalizer = new ContentNormalizer();
            normalizer->setPipeline(pipeline);
 	    pipeline = new Pl_QPDFTokenizer(
                "normalizer", normalizer.getPointer());
 	    to_delete.push_back(pipeline);
 	}
        for (std::vector<PointerHolder<
                 QPDFObjectHandle::TokenFilter> >::reverse_iterator iter =
                 this->token_filters.rbegin();
             iter != this->token_filters.rend(); ++iter)
        {
            (*iter)->setPipeline(pipeline);
            pipeline = new Pl_QPDFTokenizer(
                "token filter", (*iter).getPointer());
            to_delete.push_back(pipeline);
        }
 	for (std::vector<std::string>::reverse_iterator iter = filters.rbegin();
 	     iter != filters.rend(); ++iter)
 	{
@ -612,6 +633,13 @@ QPDF_Stream::replaceStreamData(
    replaceFilterData(filter, decode_parms, 0);
 }
 void
 QPDF_Stream::addTokenFilter(
    PointerHolder<QPDFObjectHandle::TokenFilter> token_filter)
 {
    this->token_filters.push_back(token_filter);
 }
 void
 QPDF_Stream::replaceFilterData(QPDFObjectHandle const& filter,
 			       QPDFObjectHandle const& decode_parms,
--- a/libqpdf/build.mk
+++ b/libqpdf/build.mk
@ -9,6 +9,7 @@ SRCS_libqpdf = \
 	libqpdf/BitWriter.cc \
 	libqpdf/Buffer.cc \
 	libqpdf/BufferInputSource.cc \
 	libqpdf/ContentNormalizer.cc \
 	libqpdf/FileInputSource.cc \
 	libqpdf/InputSource.cc \
 	libqpdf/InsecureRandomDataProvider.cc \
--- a/libqpdf/qpdf/ContentNormalizer.hh
+++ b/libqpdf/qpdf/ContentNormalizer.hh
@ -0,0 +1,15 @@
 #ifndef __CONTENTNORMALIZER_HH__
 #define __CONTENTNORMALIZER_HH__
 #include <qpdf/QPDFObjectHandle.hh>
 class ContentNormalizer: public QPDFObjectHandle::TokenFilter
 {
  public:
    ContentNormalizer();
    virtual ~ContentNormalizer();
    virtual void handleToken(QPDFTokenizer::Token const&);
    virtual void handleEOF();
 };
 #endif // __CONTENTNORMALIZER_HH__
--- a/libqpdf/qpdf/Pl_QPDFTokenizer.hh
+++ b/libqpdf/qpdf/Pl_QPDFTokenizer.hh
@ -4,6 +4,8 @@
 #include <qpdf/Pipeline.hh>
 #include <qpdf/QPDFTokenizer.hh>
 #include <qpdf/PointerHolder.hh>
 #include <qpdf/QPDFObjectHandle.hh>
 //
 // Treat incoming text as a stream consisting of valid PDF tokens, but
@ -16,7 +18,8 @@
 class Pl_QPDFTokenizer: public Pipeline
 {
  public:
-    Pl_QPDFTokenizer(char const* identifier, Pipeline* next);
+    Pl_QPDFTokenizer(char const* identifier,
                     QPDFObjectHandle::TokenFilter* filter);
    virtual ~Pl_QPDFTokenizer();
    virtual void write(unsigned char* buf, size_t len);
    virtual void finish();
@ -24,14 +27,25 @@ class Pl_QPDFTokenizer: public Pipeline
  private:
    void processChar(char ch);
    void checkUnread();
    void writeNext(char const*, size_t len);
    void writeToken(QPDFTokenizer::Token&);
-    QPDFTokenizer tokenizer;
+    class Members
-    bool just_wrote_nl;
+    {
-    bool last_char_was_cr;
+        friend class Pl_QPDFTokenizer;
-    bool unread_char;
+
-    char char_to_unread;
+      public:
        ~Members();
      private:
        Members();
        Members(Members const&);
        QPDFObjectHandle::TokenFilter* filter;
        QPDFTokenizer tokenizer;
        bool last_char_was_cr;
        bool unread_char;
        char char_to_unread;
    };
    PointerHolder<Members> m;
 };
 #endif // __PL_QPDFTOKENIZER_HH__
--- a/libqpdf/qpdf/QPDF_Stream.hh
+++ b/libqpdf/qpdf/QPDF_Stream.hh
@ -20,6 +20,7 @@ class QPDF_Stream: public QPDFObject
    virtual QPDFObject::object_type_e getTypeCode() const;
    virtual char const* getTypeName() const;
    QPDFObjectHandle getDict() const;
    bool isDataModified() const;
    // See comments in QPDFObjectHandle.hh for these methods.
    bool pipeStreamData(Pipeline*,
@ -35,6 +36,8 @@ class QPDF_Stream: public QPDFObject
 	PointerHolder<QPDFObjectHandle::StreamDataProvider> provider,
 	QPDFObjectHandle const& filter,
 	QPDFObjectHandle const& decode_parms);
    void addTokenFilter(
        PointerHolder<QPDFObjectHandle::TokenFilter> token_filter);
    void replaceDict(QPDFObjectHandle new_dict);
@ -72,6 +75,8 @@ class QPDF_Stream: public QPDFObject
    size_t length;
    PointerHolder<Buffer> stream_data;
    PointerHolder<QPDFObjectHandle::StreamDataProvider> stream_provider;
    std::vector<
        PointerHolder<QPDFObjectHandle::TokenFilter> > token_filters;
 };
 #endif // __QPDF_STREAM_HH__
--- a/qpdf/qtest/qpdf.test
+++ b/qpdf/qtest/qpdf.test
@ -756,6 +756,19 @@ $td->runtest("check output",
             {$td->FILE => "a.pdf"},
             {$td->FILE => "coalesce-out.pdf"});
 show_ntests();
 # ----------
 $td->notify("--- Token filters ---");
 $n_tests += 2;
 $td->runtest("token filter",
             {$td->COMMAND => "test_driver 41 coalesce.pdf"},
             {$td->STRING => "test 41 done\n", $td->EXIT_STATUS => 0},
             $td->NORMALIZE_NEWLINES);
 $td->runtest("check output",
             {$td->FILE => "a.pdf"},
             {$td->FILE => "token-filters-out.pdf"});
 show_ntests();
 # ----------
 $td->notify("--- Newline before endstream ---");
--- a/qpdf/qtest/qpdf/token-filters-out.pdf
+++ b/qpdf/qtest/qpdf/token-filters-out.pdf
@ -0,0 +1,171 @@
 %PDF-1.3
 %¿÷¢þ
 %QDF-1.0
 %% Original object ID: 1 0
 1 0 obj
 <<
  /Pages 2 0 R
  /Type /Catalog
 >>
 endobj
 %% Original object ID: 2 0
 2 0 obj
 <<
  /Count 2
  /Kids [
    3 0 R
    4 0 R
  ]
  /Type /Pages
 >>
 endobj
 %% Page 1
 %% Original object ID: 3 0
 3 0 obj
 <<
  /Contents 5 0 R
  /MediaBox [
    0
    0
    612
    792
  ]
  /Parent 2 0 R
  /Resources <<
    /Font <<
      /F1 7 0 R
    >>
    /ProcSet 8 0 R
  >>
  /Type /Page
 >>
 endobj
 %% Page 2
 %% Original object ID: 4 0
 4 0 obj
 <<
  /Contents 9 0 R
  /MediaBox [
    0
    0
    612
    792
  ]
  /Parent 2 0 R
  /Resources <<
    /Font <<
      /F1 11 0 R
    >>
    /ProcSet 12 0 R
  >>
  /Type /Page
 >>
 endobj
 %% Contents for page 1
 %% Original object ID: 19 0
 5 0 obj
 <<
  /Length 6 0 R
 >>
 stream
 BT
  /F1 24 Tf
  72 720 Td
  (Salad) Tj
 ET [ /array/split ] BI
 /CS /G/W 66/H 47/BPC 8/F/Fl/DP<</Predictor 15/Columns 66>>
 ID xœÅÖIÃ P|ÿC;UÈ`ÀÓ7‘Z©¦Ä˜Úæ<C39A>}Dðï_´øÉW©„œÄ-”ˆ>ÿ‡À<E280A1>>”^&®¡uâ]€"!‡•–*¬&<26>E|Sy® ðd-€<<3C>B0Bú@Nê+<hlèKÐî/56L ‰<C2A0>ã £–¹¦>0>Y<>ù!cì\YØ%Yð¥Ö8?& Öëˆ}j’ûè;«<>3<EFBFBD>ÂÖlpÛsHöûtúQØTt*hÌUúãwÍÕÐ%¨)p–³"•DiRj¹–DYNUÓÙAv’Fà&
<0A>ÍÔu#c•ÆW	ôß‰W“O
 EI/bye
 endstream
 endobj
 6 0 obj
 375
 endobj
 %% Original object ID: 13 0
 7 0 obj
 <<
  /BaseFont /Helvetica
  /Encoding /WinAnsiEncoding
  /Name /F1
  /Subtype /Type1
  /Type /Font
 >>
 endobj
 %% Original object ID: 14 0
 8 0 obj
 [
  /PDF
  /Text
 ]
 endobj
 %% Contents for page 2
 %% Original object ID: 15 0
 9 0 obj
 <<
  /Length 10 0 R
 >>
 stream
 BT
  /F1 24 Tf
  72 720 Td
  (Salad) Tj
 ET
 /bye
 endstream
 endobj
 10 0 obj
 48
 endobj
 %% Original object ID: 17 0
 11 0 obj
 <<
  /BaseFont /Helvetica
  /Encoding /WinAnsiEncoding
  /Name /F1
  /Subtype /Type1
  /Type /Font
 >>
 endobj
 %% Original object ID: 18 0
 12 0 obj
 [
  /PDF
  /Text
 ]
 endobj
 xref
 0 13
 0000000000 65535 f 
 0000000052 00000 n 
 0000000133 00000 n 
 0000000252 00000 n 
 0000000481 00000 n 
 0000000726 00000 n 
 0000001156 00000 n 
 0000001204 00000 n 
 0000001350 00000 n 
 0000001436 00000 n 
 0000001540 00000 n 
 0000001588 00000 n 
 0000001735 00000 n 
 trailer <<
  /Root 1 0 R
  /Size 13
  /ID [<fa46a90bcf56476b9904a2e7adb75024><31415926535897932384626433832795>]
 >>
 startxref
 1771
 %%EOF
--- a/qpdf/test_driver.cc
+++ b/qpdf/test_driver.cc
@ -97,6 +97,36 @@ ParserCallbacks::handleEOF()
    std::cout << "-EOF-" << std::endl;
 }
 class TokenFilter: public QPDFObjectHandle::TokenFilter
 {
  public:
    TokenFilter()
    {
    }
    virtual ~TokenFilter()
    {
    }
    virtual void handleToken(QPDFTokenizer::Token const& t)
    {
        if (t == QPDFTokenizer::Token(QPDFTokenizer::tt_string, "Potato"))
        {
            // Exercise unparsing of strings by token constructor
            writeToken(
                QPDFTokenizer::Token(QPDFTokenizer::tt_string, "Salad"));
        }
        else
        {
            writeToken(t);
        }
    }
    virtual void handleEOF()
    {
        writeToken(QPDFTokenizer::Token(QPDFTokenizer::tt_name, "/bye"));
        write("\n");
        finish();
    }
 };
 static std::string getPageContents(QPDFObjectHandle page)
 {
    PointerHolder<Buffer> b1 =
@ -1345,6 +1375,22 @@ void runtest(int n, char const* filename1, char const* arg2)
        w.setStaticID(true);
        w.write();
    }
    else if (n == 41)
    {
        // Apply a token filter. This test case is crafted to work
        // with coalesce.pdf.
        std::vector<QPDFObjectHandle> pages = pdf.getAllPages();
        for (std::vector<QPDFObjectHandle>::iterator iter =
                 pages.begin();
             iter != pages.end(); ++iter)
        {
            (*iter).addContentTokenFilter(new TokenFilter);
        }
        QPDFWriter w(pdf, "a.pdf");
 	w.setQDFMode(true);
        w.setStaticID(true);
        w.write();
    }
    else
    {
 	throw std::runtime_error(std::string("invalid test ") +