public abstract class AbstractStringFilter extends AbstractCharStreamFilter
Base class for filters that operate on text content. Text is loaded into a StringBuilder for in-memory processing.
Since 2.2.0, this class limits the memory used for content filtering by reading one section of text at a time. Each section is sent for filtering as soon as it is read, until a match is found. No two sections exist in memory at once, and subclasses should respect this approach. Each section has a maximum number of characters equal to the maximum read size defined using setMaxReadSize(int). When none is set, the default read size is defined by TextReader.DEFAULT_MAX_READ_SIZE.
An attempt is made to break sections nicely after a paragraph, sentence, or word. When not possible, long text will be cut at a size equal to the maximum read size.
Implementors should be mindful of memory use when working with the StringBuilder.
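The sectioning behavior described above can be illustrated with a small standalone sketch. It is not the library's TextReader and does not depend on Norconex classes; the class name SectionReaderSketch, its methods, and the exact break rules (blank line for a paragraph, punctuation followed by whitespace for a sentence, any whitespace for a word) are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of section-based reading; NOT the Norconex TextReader. */
final class SectionReaderSketch {
    private final Reader reader;
    private final int maxReadSize;
    // Text already read from the reader but not yet returned as a section.
    private final StringBuilder carry = new StringBuilder();

    SectionReaderSketch(Reader reader, int maxReadSize) {
        if (maxReadSize < 1) {
            throw new IllegalArgumentException("maxReadSize must be at least 1");
        }
        this.reader = reader;
        this.maxReadSize = maxReadSize;
    }

    /** Returns the next section (at most maxReadSize characters), or null at end of input. */
    String nextSection() throws IOException {
        char[] buf = new char[maxReadSize];
        // Top up the buffer until it holds maxReadSize characters or input ends.
        while (carry.length() < maxReadSize) {
            int n = reader.read(buf, 0, maxReadSize - carry.length());
            if (n == -1) {
                break;
            }
            carry.append(buf, 0, n);
        }
        if (carry.length() == 0) {
            return null;
        }
        // A partially filled buffer means end of input: return everything left.
        int cut = carry.length() < maxReadSize ? carry.length() : findBreak(carry);
        String section = carry.substring(0, cut);
        carry.delete(0, cut);
        return section;
    }

    /** Assumed break rules: paragraph, then sentence, then word, then a hard cut. */
    private static int findBreak(CharSequence text) {
        String s = text.toString();
        int paragraph = s.lastIndexOf("\n\n");
        if (paragraph > 0) {
            return paragraph + 2;
        }
        for (int i = s.length() - 2; i > 0; i--) {
            char c = s.charAt(i);
            if ((c == '.' || c == '!' || c == '?') && Character.isWhitespace(s.charAt(i + 1))) {
                return i + 2;
            }
        }
        for (int i = s.length() - 1; i > 0; i--) {
            if (Character.isWhitespace(s.charAt(i))) {
                return i + 1;
            }
        }
        return s.length(); // no nice break: cut at the maximum read size
    }

    /** Convenience helper that collects all sections of a string. */
    static List<String> readSections(String text, int maxReadSize) {
        SectionReaderSketch r = new SectionReaderSketch(new StringReader(text), maxReadSize);
        List<String> sections = new ArrayList<>();
        try {
            for (String s = r.nextSection(); s != null; s = r.nextSection()) {
                sections.add(s);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sections;
    }

    public static void main(String[] args) {
        for (String section : readSections("One two. Three four five", 12)) {
            System.out.println("[" + section + "]");
        }
    }
}
```

In this sketch, only the text not yet returned is held between calls, so memory use stays bounded by the maximum read size regardless of document length.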
Subclasses inherit this IXMLConfigurable configuration:
  <!-- parent tag has these attributes:
      maxReadSize="(max characters to read at once)"
      sourceCharset="(character encoding)"
      onMatch="[include|exclude]" -->
  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
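The "only one needs to match" rule for restrictTo tags can be sketched in standalone Java. This is an illustration and not the library's implementation: the class RestrictToSketch, the metadata representation as a map of field names to value lists, the requirement that the regular expression match the whole field value, and the treatment of an empty restriction list as "always applicable" are all assumptions.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration of restrictTo semantics; NOT Norconex code. */
final class RestrictToSketch {

    /** One restrictTo entry: a metadata field name plus a regular expression. */
    static final class Restriction {
        private final String field;
        private final Pattern pattern;

        Restriction(String field, String regex, boolean caseSensitive) {
            this.field = field;
            // caseSensitive="false" maps to a case-insensitive pattern.
            this.pattern = Pattern.compile(regex, caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
        }

        /** True when any value of the field matches the expression entirely (assumption). */
        boolean matches(Map<String, List<String>> metadata) {
            List<String> values = metadata.getOrDefault(field, Collections.<String>emptyList());
            for (String value : values) {
                if (pattern.matcher(value).matches()) {
                    return true;
                }
            }
            return false;
        }
    }

    /** A handler applies when it has no restrictions (assumption) or at least one matches. */
    static boolean isApplicable(List<Restriction> restrictions, Map<String, List<String>> metadata) {
        if (restrictions.isEmpty()) {
            return true;
        }
        for (Restriction restriction : restrictions) {
            if (restriction.matches(metadata)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Restriction> restrictions = Collections.singletonList(
                new Restriction("content-type", "text/.*", false));
        Map<String, List<String>> metadata = Collections.singletonMap(
                "content-type", Collections.singletonList("TEXT/html"));
        System.out.println(isApplicable(restrictions, metadata));
    }
}
```

The field name "content-type" in the usage example is a placeholder, not a field the library is known to set.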
Constructor and Description |
---|
AbstractStringFilter() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object obj) |
int | getMaxReadSize() Gets the maximum number of characters to read for filtering at once. |
int | hashCode() |
protected abstract boolean | isStringContentMatching(String reference, StringBuilder content, ImporterMetadata metadata, boolean parsed, int sectionIndex) |
protected boolean | isTextDocumentMatching(String reference, Reader input, ImporterMetadata metadata, boolean parsed) |
protected void | loadCharStreamFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
protected abstract void | loadStringFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
protected void | saveCharStreamFilterToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
protected abstract void | saveStringFilterToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
void | setMaxReadSize(int maxReadSize) Sets the maximum number of characters to read for filtering at once. |
String | toString() |
Methods inherited from class AbstractCharStreamFilter: getSourceCharset, isDocumentMatched, loadFilterFromXML, saveFilterToXML, setSourceCharset
Methods inherited from classes further up the hierarchy:
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
protected final boolean isTextDocumentMatching(String reference, Reader input, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Specified by: isTextDocumentMatching in class AbstractCharStreamFilter
Throws: ImporterHandlerException
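public int getMaxReadSize()
Gets the maximum number of characters to read for filtering at once. The default is TextReader.DEFAULT_MAX_READ_SIZE.
public void setMaxReadSize(int maxReadSize)
Sets the maximum number of characters to read for filtering at once.
Parameters: maxReadSize - maximum read size
protected abstract boolean isStringContentMatching(String reference, StringBuilder content, ImporterMetadata metadata, boolean parsed, int sectionIndex) throws ImporterHandlerException
Throws: ImporterHandlerException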
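How a per-section callback with a section index might be driven can be sketched as follows. This is a simplified standalone illustration, not the body of isTextDocumentMatching: the SectionMatcher interface is a hypothetical stand-in for isStringContentMatching (the reference, metadata, and parsed arguments are omitted), sections are cut at a fixed size rather than at paragraph or sentence boundaries, and zero-based section numbering is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical illustration of section-by-section matching; NOT Norconex code. */
final class SectionMatchSketch {

    /** Stand-in for the abstract isStringContentMatching callback. */
    interface SectionMatcher {
        boolean matches(StringBuilder content, int sectionIndex);
    }

    /**
     * Feeds one section at a time to the matcher and stops at the first match.
     * Returns the index of the matching section, or -1 when none matches.
     */
    static int firstMatchingSection(Reader input, int maxReadSize, SectionMatcher matcher) {
        if (maxReadSize < 1) {
            throw new IllegalArgumentException("maxReadSize must be at least 1");
        }
        // A single builder is reused so that only one section is in memory at once.
        StringBuilder content = new StringBuilder(maxReadSize);
        char[] buf = new char[maxReadSize];
        int sectionIndex = 0; // zero-based numbering is an assumption
        try {
            int n;
            // Note: a Reader may return fewer characters than requested.
            while ((n = input.read(buf, 0, maxReadSize)) != -1) {
                content.setLength(0); // discard the previous section
                content.append(buf, 0, n);
                if (matcher.matches(content, sectionIndex)) {
                    return sectionIndex; // later sections are never read
                }
                sectionIndex++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return -1;
    }

    public static void main(String[] args) {
        int index = firstMatchingSection(
                new StringReader("aaaabbbbcccc"), 4,
                (content, i) -> content.indexOf("bb") >= 0);
        System.out.println("First matching section: " + index);
    }
}
```

A matcher written this way should not keep a reference to the StringBuilder after returning, since its content is replaced when the next section is read.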
protected final void saveCharStreamFilterToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractCharStreamFilter
Saves configuration settings specific to the implementing class.
Specified by: saveCharStreamFilterToXML in class AbstractCharStreamFilter
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
protected abstract void saveStringFilterToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Saves configuration settings specific to the implementing class.
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
protected final void loadCharStreamFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractCharStreamFilter
Loads configuration settings specific to the implementing class.
Specified by: loadCharStreamFilterFromXML in class AbstractCharStreamFilter
Parameters: xml - xml configuration
Throws: IOException - could not load from XML
protected abstract void loadStringFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Loads configuration settings specific to the implementing class.
Parameters: xml - xml configuration
Throws: IOException - could not load from XML
public boolean equals(Object obj)
Overrides: equals in class AbstractCharStreamFilter
public int hashCode()
Overrides: hashCode in class AbstractCharStreamFilter
public String toString()
Overrides: toString in class AbstractCharStreamFilter
Copyright © 2009–2021 Norconex Inc. All rights reserved.