Class AbstractStringFilter

    All Implemented Interfaces:
    IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler
    Direct Known Subclasses:
    RegexContentFilter, ScriptFilter, TextFilter

    public abstract class AbstractStringFilter
    extends AbstractCharStreamFilter

    Base class to facilitate creating filters based on text content, loading text into a StringBuilder for in-memory processing.

    Since 2.2.0, this class limits the memory used for content filtering by reading one section of text at a time. Each section is sent for filtering as soon as it is read, until a match is found. No two sections exist in memory at once, and subclasses should respect this approach. Each section has a maximum number of characters equal to the maximum read size defined using setMaxReadSize(int). When none is set, the default read size is defined by TextReader.DEFAULT_MAX_READ_SIZE.
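    The section-at-a-time approach can be sketched in plain Java as follows. This is a hypothetical illustration, not Norconex code: SectionedMatcher and anySectionMatches are made-up names, a Predicate<StringBuilder> stands in for a subclass's matching logic, and sections are cut wherever Reader.read stops rather than at the paragraph, sentence, or word boundaries TextReader looks for.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

// Hypothetical illustration of section-at-a-time filtering; these names
// are not part of the Norconex Importer API.
final class SectionedMatcher {

    private SectionedMatcher() {
    }

    // Reads up to maxReadSize characters per section into one reused
    // StringBuilder and stops reading as soon as a section matches.
    static boolean anySectionMatches(
            Reader reader, int maxReadSize, Predicate<StringBuilder> sectionMatches) {
        if (maxReadSize <= 0) {
            throw new IllegalArgumentException("maxReadSize must be positive");
        }
        char[] buffer = new char[maxReadSize];
        StringBuilder section = new StringBuilder(maxReadSize);
        try {
            int read;
            while ((read = reader.read(buffer, 0, maxReadSize)) != -1) {
                section.setLength(0);            // drop the previous section
                section.append(buffer, 0, read); // only one section is held at a time
                if (sectionMatches.test(section)) {
                    return true;                 // the remaining text is never read
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    With a read size of 5, the text "aaaa bbbb cccc" is seen as "aaaa " then "bbbb "; if the second section matches, "cccc" is never loaded.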

    An attempt is made to break sections cleanly after a paragraph, sentence, or word. When that is not possible, long text is cut at a size equal to the maximum read size.
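    One possible way to choose a section break is sketched below, assuming a blank line marks a paragraph and ".", "!", or "?" followed by whitespace marks a sentence end. SectionBreaker is a hypothetical helper; TextReader's actual heuristics may differ.

```java
// Hypothetical helper, not the TextReader implementation.
final class SectionBreaker {

    private SectionBreaker() {
    }

    // Returns the length of the next section starting at index 0, never
    // exceeding maxReadSize. Preference order: paragraph break (blank line),
    // sentence end, word boundary (whitespace), then a hard cut.
    static int nextSectionLength(CharSequence text, int maxReadSize) {
        if (text.length() <= maxReadSize) {
            return text.length(); // everything fits in one section
        }
        String window = text.subSequence(0, maxReadSize).toString();
        int paragraph = window.lastIndexOf("\n\n");
        if (paragraph > 0) {
            return paragraph + 2; // keep the blank line with this section
        }
        int sentence = lastSentenceEnd(window);
        if (sentence > 0) {
            return sentence;
        }
        for (int i = maxReadSize - 1; i > 0; i--) {
            if (Character.isWhitespace(window.charAt(i))) {
                return i + 1; // break after the last whitespace
            }
        }
        return maxReadSize; // no boundary found: cut at the maximum size
    }

    // Index just past the last ".", "!" or "?" that is followed by whitespace,
    // or -1 if the window has no such sentence end.
    private static int lastSentenceEnd(String window) {
        for (int i = window.length() - 2; i >= 0; i--) {
            char c = window.charAt(i);
            if ((c == '.' || c == '!' || c == '?')
                    && Character.isWhitespace(window.charAt(i + 1))) {
                return i + 2;
            }
        }
        return -1;
    }
}
```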

    Since 3.0.0, the isStringContentMatching(HandlerDoc, StringBuilder, ParseState, int) method is invoked at least once, even when there is no content. This gives subclasses a chance to act on metadata alone.
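    The "at least once" guarantee can be illustrated with a do-while loop, which always runs its body before testing the loop condition. In this hypothetical sketch, SectionCallback stands in for isStringContentMatching, its int parameter is assumed to be a zero-based section index, and HandlerDoc and ParseState are omitted.

```java
// Hypothetical sketch; AtLeastOnceFilter and SectionCallback are not Norconex types.
final class AtLeastOnceFilter {

    private AtLeastOnceFilter() {
    }

    interface SectionCallback {
        boolean isMatching(StringBuilder section, int sectionIndex);
    }

    // Splits content into fixed-size sections and calls the callback on each.
    // The do-while guarantees one call even when content is empty, so a
    // metadata-only check still runs.
    static boolean filter(String content, int maxReadSize, SectionCallback callback) {
        StringBuilder section = new StringBuilder();
        int start = 0;
        int index = 0;
        do {
            int end = Math.min(content.length(), start + maxReadSize);
            section.setLength(0);
            section.append(content, start, end);
            if (callback.isMatching(section, index)) {
                return true;
            }
            start = end;
            index++;
        } while (start < content.length());
        return false;
    }
}
```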

    Implementors should be mindful of memory usage when working with the string builder.
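    One way to stay memory-conscious is to scan the builder in place rather than copying it. Because java.util.regex.Pattern.matcher accepts any CharSequence, a StringBuilder section can be searched without calling toString(). InPlaceMatching is a hypothetical name used only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of checking a section without copying it.
final class InPlaceMatching {

    private InPlaceMatching() {
    }

    // Pattern.matcher(CharSequence) reads the builder directly, so no
    // String copy of the section is created.
    static boolean matches(StringBuilder section, Pattern pattern) {
        return pattern.matcher(section).find();
    }
}
```

    By contrast, section.toString().toLowerCase() would allocate two extra copies of each section; compiling the pattern with Pattern.CASE_INSENSITIVE avoids the lowercase copy.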

    XML configuration usage:

        maxReadSize="(max characters to read at once)"
        sourceCharset="(character encoding)"
        onMatch="[include|exclude]"

    Subclasses inherit the above IXMLConfigurable attributes, in addition to <restrictTo>.

    Author:
    Pascal Essiembre