public abstract class AbstractStringCondition extends AbstractCharStreamCondition
Base class to facilitate creating conditions based on text content,
loading text into a StringBuilder for in-memory processing.
This class limits the memory used for content filtering by reading one
section of text at a time. Each section is sent for filtering as soon as
it is read, until the condition is met.
No two sections exist in memory at once, and subclasses should
respect this approach. Each section has a maximum number of characters
equal to the maximum read size defined using setMaxReadSize(int).
When none is set, the default read size is defined by
TextReader.DEFAULT_MAX_READ_SIZE.
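As a rough illustration of the section-by-section approach, the standalone Java sketch below reads a Reader in chunks of at most maxReadSize characters, reuses a single buffer and StringBuilder, and stops as soon as a section satisfies the test. It does not use the Norconex API: SectionReaderSketch, SectionTest and testSections are hypothetical names, and the sketch cuts sections at fixed sizes rather than at paragraph, sentence, or word boundaries.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class SectionReaderSketch {

    /** Hypothetical per-section callback; returning true means the condition is met. */
    @FunctionalInterface
    interface SectionTest {
        boolean test(String section, int sectionIndex);
    }

    /**
     * Reads input in sections of at most maxReadSize characters and passes each
     * one to the test, stopping early once the test returns true. A single
     * buffer and StringBuilder are reused, so only one section is held at a time.
     */
    static boolean testSections(Reader input, int maxReadSize, SectionTest test) {
        if (maxReadSize < 1) {
            throw new IllegalArgumentException("maxReadSize must be positive");
        }
        char[] buffer = new char[maxReadSize];
        StringBuilder section = new StringBuilder(maxReadSize);
        int sectionIndex = 0;
        try {
            int read;
            while ((read = input.read(buffer, 0, maxReadSize)) != -1) {
                section.setLength(0);               // drop the previous section
                section.append(buffer, 0, read);
                if (test.test(section.toString(), sectionIndex++)) {
                    return true;                    // condition met: stop reading
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);      // keeps the sketch's signature simple
        }
        return false;
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("alpha beta gamma delta");
        // With 8-char sections, "ta gamma" (index 1) matches and reading stops there.
        System.out.println(testSections(reader, 8, (s, i) -> s.contains("gam"))); // true
    }
}
```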
An attempt is made to break sections nicely after a paragraph, sentence, or word. When not possible, long text will be cut at a size equal to the maximum read size.
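The boundary preference described above (paragraph, then sentence, then word, then a hard cut) could be approximated as follows. This is a sketch under stated assumptions, not the library's algorithm: it treats a blank line as a paragraph break and ". ", "! " or "? " as a sentence end, and it works on an in-memory String for brevity. SectionBreakSketch, breakIndex and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class SectionBreakSketch {

    /**
     * Returns how many leading characters of text form the next section,
     * preferring a paragraph break, then a sentence end, then whitespace,
     * and falling back to a hard cut at maxReadSize.
     */
    static int breakIndex(String text, int maxReadSize) {
        if (text.length() <= maxReadSize) {
            return text.length();                       // fits as-is
        }
        String window = text.substring(0, maxReadSize);
        int idx = window.lastIndexOf("\n\n");           // assumed paragraph marker
        if (idx > 0) {
            return idx + 2;
        }
        idx = Math.max(window.lastIndexOf(". "),        // assumed sentence ends
                Math.max(window.lastIndexOf("! "), window.lastIndexOf("? ")));
        if (idx > 0) {
            return idx + 2;
        }
        for (int i = maxReadSize - 1; i > 0; i--) {     // last word boundary
            if (Character.isWhitespace(window.charAt(i))) {
                return i + 1;
            }
        }
        return maxReadSize;                             // no natural break: hard cut
    }

    /** Splits text into non-empty sections using breakIndex. */
    static List<String> split(String text, int maxReadSize) {
        if (maxReadSize < 1) {
            throw new IllegalArgumentException("maxReadSize must be positive");
        }
        List<String> sections = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int len = breakIndex(text.substring(start), maxReadSize);
            sections.add(text.substring(start, start + len));
            start += len;
        }
        return sections;
    }
}
```

For example, split("aaaa bbbbbbbb", 8) yields "aaaa " and "bbbbbbbb": the first 8-character window has no sentence end, so the cut falls right after the last space.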
The testDocument(HandlerDoc, String, ParseState, int) method is invoked at
least once, even when there is no content. This gives subclasses a chance
to act on metadata alone.
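The at-least-once guarantee can be modeled like this: the per-section test runs once per section, and once with an empty string when there is no content, so a check based only on metadata still gets evaluated. AtLeastOnceSketch, SectionTest and testAll are hypothetical names, and a plain Map stands in for the document metadata.

```java
import java.util.List;
import java.util.Map;

public class AtLeastOnceSketch {

    /** Hypothetical per-section callback; returning true means the condition is met. */
    @FunctionalInterface
    interface SectionTest {
        boolean test(String section, int sectionIndex);
    }

    /**
     * Runs the test on each section until one matches. If there were no
     * sections at all, the test is still invoked once with an empty string.
     */
    static boolean testAll(Iterable<String> sections, SectionTest test) {
        int index = 0;
        for (String section : sections) {
            if (test.test(section, index++)) {
                return true;
            }
        }
        return index == 0 && test.test("", 0);  // no content: still call once
    }

    public static void main(String[] args) {
        Map<String, String> metadata = Map.of("Content-Type", "image/png");
        // Ignores the text entirely and decides from metadata alone.
        SectionTest isImage = (section, i) ->
                metadata.getOrDefault("Content-Type", "").startsWith("image/");
        System.out.println(testAll(List.of(), isImage)); // true
    }
}
```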
Implementors should be mindful of memory usage when working with the string builder.
maxReadSize="(max characters to read at once)"
sourceCharset="(character encoding)"
Subclasses inherit the above IXMLConfigurable attribute(s).
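To make the two inherited attributes concrete, the sketch below reads maxReadSize and sourceCharset from an element with the standard JDK DOM parser, keeping defaults when an attribute is absent. The <condition> element name, the ConditionConfigSketch class and the 10,000 default are assumptions for illustration only; the real default is whatever TextReader.DEFAULT_MAX_READ_SIZE defines, and the real class loads these attributes through loadFromXML.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

public class ConditionConfigSketch {

    // Placeholder value; the real default is TextReader.DEFAULT_MAX_READ_SIZE.
    static final int HYPOTHETICAL_DEFAULT_MAX_READ_SIZE = 10_000;

    int maxReadSize = HYPOTHETICAL_DEFAULT_MAX_READ_SIZE;
    Charset sourceCharset;  // null when not configured

    /** Parses the inherited attributes from a single XML element. */
    static ConditionConfigSketch fromXml(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid condition XML", e);
        }
        ConditionConfigSketch cfg = new ConditionConfigSketch();
        String size = root.getAttribute("maxReadSize");     // "" when absent
        if (!size.isBlank()) {
            cfg.maxReadSize = Integer.parseInt(size.trim());
        }
        String charset = root.getAttribute("sourceCharset");
        if (!charset.isBlank()) {
            cfg.sourceCharset = Charset.forName(charset.trim());
        }
        return cfg;
    }

    public static void main(String[] args) {
        ConditionConfigSketch cfg = fromXml(
                "<condition maxReadSize=\"5000\" sourceCharset=\"UTF-8\"/>");
        System.out.println(cfg.maxReadSize + " " + cfg.sourceCharset); // 5000 UTF-8
    }
}
```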
AbstractStringFilter
Constructor and Description |
---|
AbstractStringCondition() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
int | getMaxReadSize() - Gets the maximum number of characters to read for filtering at once. |
int | hashCode() |
protected void | loadCharStreamConditionFromXML(XML xml) - Loads configuration settings specific to the implementing class. |
protected abstract void | loadStringConditionFromXML(XML xml) - Loads configuration settings specific to the implementing class. |
protected void | saveCharStreamConditionToXML(XML xml) - Saves configuration settings specific to the implementing class. |
protected abstract void | saveStringConditionToXML(XML xml) - Saves configuration settings specific to the implementing class. |
void | setMaxReadSize(int maxReadSize) - Sets the maximum number of characters to read for filtering at once. |
protected boolean | testDocument(HandlerDoc doc, Reader input, ParseState parseState) |
protected abstract boolean | testDocument(HandlerDoc doc, String input, ParseState parseState, int sectionIndex) |
String | toString() |
Methods inherited from class AbstractCharStreamCondition: getSourceCharset, loadFromXML, saveToXML, setSourceCharset, testDocument
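Per the method details that follow, loadCharStreamConditionFromXML and saveCharStreamConditionToXML are final in this class, while loadStringConditionFromXML and saveStringConditionToXML are abstract. That suggests a template-method arrangement in which the base class handles its own settings and then calls the subclass hook. The sketch below models that arrangement without the Norconex API: a Map<String, String> stands in for the XML object, and XmlHookSketch, BaseCondition, KeywordCondition, load, save, loadSpecific and saveSpecific are hypothetical names. Which settings the real base class writes is an assumption here.

```java
import java.util.HashMap;
import java.util.Map;

public class XmlHookSketch {

    /** Hypothetical base: handles its own setting, then delegates to a hook. */
    abstract static class BaseCondition {
        private int maxReadSize = 10_000;  // placeholder default, not the real constant

        int getMaxReadSize() {
            return maxReadSize;
        }

        final void load(Map<String, String> xml) {
            String size = xml.get("maxReadSize");
            if (size != null) {
                maxReadSize = Integer.parseInt(size);
            }
            loadSpecific(xml);              // subclass reads its own settings
        }

        final void save(Map<String, String> xml) {
            xml.put("maxReadSize", Integer.toString(maxReadSize));
            saveSpecific(xml);              // subclass writes its own settings
        }

        protected abstract void loadSpecific(Map<String, String> xml);

        protected abstract void saveSpecific(Map<String, String> xml);
    }

    /** Example subclass with a single setting of its own. */
    static class KeywordCondition extends BaseCondition {
        String keyword = "";

        @Override
        protected void loadSpecific(Map<String, String> xml) {
            keyword = xml.getOrDefault("keyword", "");
        }

        @Override
        protected void saveSpecific(Map<String, String> xml) {
            xml.put("keyword", keyword);
        }
    }

    public static void main(String[] args) {
        Map<String, String> in = new HashMap<>(Map.of("maxReadSize", "500", "keyword", "invoice"));
        KeywordCondition condition = new KeywordCondition();
        condition.load(in);
        Map<String, String> out = new HashMap<>();
        condition.save(out);
        System.out.println(out.equals(in)); // true
    }
}
```

Making load and save final keeps the shared attribute handling in one place, so a subclass cannot skip it by accident.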
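public int getMaxReadSize()
Gets the maximum number of characters to read for filtering at once. Defaults to TextReader.DEFAULT_MAX_READ_SIZE.
public void setMaxReadSize(int maxReadSize)
Sets the maximum number of characters to read for filtering at once.
Parameters: maxReadSize - maximum read size
protected final boolean testDocument(HandlerDoc doc, Reader input, ParseState parseState) throws ImporterHandlerException
Specified by: testDocument in class AbstractCharStreamCondition
Throws: ImporterHandlerException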
protected abstract boolean testDocument(HandlerDoc doc, String input, ParseState parseState, int sectionIndex) throws ImporterHandlerException
Throws: ImporterHandlerException
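A subclass only has to implement the per-section testDocument method. The self-contained sketch below mimics that contract without the Norconex types: StringConditionSubclassSketch, StringConditionModel, testSections, testSection and RegexCondition are hypothetical names, sections arrive pre-split as a List to keep it short, and HandlerDoc and ParseState are omitted.

```java
import java.util.List;
import java.util.regex.Pattern;

public class StringConditionSubclassSketch {

    /** Hypothetical stand-in for the base class; sections arrive pre-split. */
    abstract static class StringConditionModel {

        /** Plays the role of the final Reader-based method. */
        final boolean testSections(List<String> sections) {
            if (sections.isEmpty()) {
                return testSection("", 0);       // invoked at least once
            }
            for (int i = 0; i < sections.size(); i++) {
                if (testSection(sections.get(i), i)) {
                    return true;                 // stop at the first match
                }
            }
            return false;
        }

        /** Plays the role of the abstract per-section method. */
        protected abstract boolean testSection(String input, int sectionIndex);
    }

    /** Example subclass: true when any section contains a regex match. */
    static class RegexCondition extends StringConditionModel {
        private final Pattern pattern;

        RegexCondition(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        protected boolean testSection(String input, int sectionIndex) {
            // Only the current section is visible; nothing is kept between calls.
            return pattern.matcher(input).find();
        }
    }

    public static void main(String[] args) {
        RegexCondition condition = new RegexCondition("gam+a");
        System.out.println(condition.testSections(List.of("alpha beta", "gamma"))); // true
    }
}
```

Because each call sees only one section, a pattern that spans a section boundary would not match in this sketch; breaking after a paragraph, sentence, or word presumably reduces that risk for short patterns but cannot rule it out for multi-word ones.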
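protected final void loadCharStreamConditionFromXML(XML xml)
Description copied from class: AbstractCharStreamCondition
Loads configuration settings specific to the implementing class.
Specified by: loadCharStreamConditionFromXML in class AbstractCharStreamCondition
Parameters: xml - XML configuration
protected final void saveCharStreamConditionToXML(XML xml)
Description copied from class: AbstractCharStreamCondition
Saves configuration settings specific to the implementing class.
Specified by: saveCharStreamConditionToXML in class AbstractCharStreamCondition
Parameters: xml - the XML
protected abstract void saveStringConditionToXML(XML xml)
Saves configuration settings specific to the implementing class.
Parameters: xml - the XML
protected abstract void loadStringConditionFromXML(XML xml)
Loads configuration settings specific to the implementing class.
Parameters: xml - XML configuration
public boolean equals(Object other)
Overrides: equals in class AbstractCharStreamCondition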
public int hashCode()
Overrides: hashCode in class AbstractCharStreamCondition
public String toString()
Overrides: toString in class AbstractCharStreamCondition
Copyright © 2009–2023 Norconex Inc. All rights reserved.