public abstract class AbstractCharStreamFilter extends AbstractDocumentFilter
Base class for filters dealing with the body of text documents only.
Subclasses can safely be used as either pre-parse or post-parse handlers
restricted to text documents only (see AbstractImporterHandler).
Since 2.5.0, when used as a pre-parse handler, this class attempts to detect
the content character encoding unless the character encoding was specified
using setSourceCharset(String). Since document parsing converts content to
UTF-8, UTF-8 is always assumed when used as a post-parse handler.
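The encoding rules above can be sketched with plain JDK types. This is a minimal illustration only, not the library's implementation: `CharsetChoiceSketch`, its `resolve` method, and the `detector` callback are hypothetical names, and the detector simply stands in for whatever detection the class performs.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Norconex API.
final class CharsetChoiceSketch {

    // Picks the encoding used to read the document body.
    static String resolve(String sourceCharset, boolean parsed, Supplier<String> detector) {
        if (parsed) {
            // Post-parse: parsing has already converted the content to UTF-8.
            return StandardCharsets.UTF_8.name();
        }
        if (sourceCharset != null && !sourceCharset.trim().isEmpty()) {
            // Pre-parse with an explicit setting: detection is skipped.
            return sourceCharset.trim();
        }
        // Pre-parse without a setting: fall back to detection (assumed callback).
        return detector.get();
    }

    public static void main(String[] args) {
        Supplier<String> detector = () -> "windows-1252"; // made-up detection result
        System.out.println(resolve("ISO-8859-1", false, detector));
        System.out.println(resolve(null, false, detector));
        System.out.println(resolve("ISO-8859-1", true, detector));
    }
}
```

Note that in this sketch an explicit source charset has no effect once `parsed` is true, which mirrors the statement that UTF-8 is always assumed for post-parse handlers.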
Subclasses inherit this IXMLConfigurable configuration:
<!-- parent tag has these attributes:
     sourceCharset="(character encoding)"
     onMatch="[include|exclude]" -->
<restrictTo caseSensitive="[false|true]"
        field="(name of header/metadata field name to match)">
    (regular expression of value to match)
</restrictTo>
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
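To make the `restrictTo` semantics concrete, here is a rough sketch using only JDK classes. `RestrictToSketch` is a hypothetical stand-in, the metadata is modelled as a plain map, and two behaviours are assumptions rather than documented facts: the regular expression must match the whole field value, and a filter with no restrictions applies to every document.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical model of one <restrictTo> entry, not the library's class.
final class RestrictToSketch {
    private final String field;
    private final Pattern pattern;

    RestrictToSketch(String field, String regex, boolean caseSensitive) {
        this.field = field;
        this.pattern = Pattern.compile(regex, caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
    }

    // True if any value of the field matches the expression (full match assumed).
    boolean matches(Map<String, List<String>> metadata) {
        for (String value : metadata.getOrDefault(field, Collections.<String>emptyList())) {
            if (pattern.matcher(value).matches()) {
                return true;
            }
        }
        return false;
    }

    // Only one restriction needs to match; an empty list is assumed to mean "no restriction".
    static boolean isApplicable(List<RestrictToSketch> restrictions,
                                Map<String, List<String>> metadata) {
        if (restrictions.isEmpty()) {
            return true;
        }
        for (RestrictToSketch restriction : restrictions) {
            if (restriction.matches(metadata)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        // Field name chosen for illustration only.
        metadata.put("document.contentType", Arrays.asList("text/html"));

        List<RestrictToSketch> restrictions = Arrays.asList(
                new RestrictToSketch("document.contentType", "TEXT/.*", false),
                new RestrictToSketch("title", "Draft.*", true));
        System.out.println(isApplicable(restrictions, metadata));
    }
}
```

The first restriction matches because `caseSensitive` is false, so the second one is never required to match.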
Constructor and Description |
---|
AbstractCharStreamFilter() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object obj) |
String | getSourceCharset() Gets the assumed source character encoding. |
int | hashCode() |
protected boolean | isDocumentMatched(String reference, InputStream input, ImporterMetadata metadata, boolean parsed) |
protected abstract boolean | isTextDocumentMatching(String reference, Reader input, ImporterMetadata metadata, boolean parsed) |
protected abstract void | loadCharStreamFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
protected void | loadFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) |
protected abstract void | saveCharStreamFilterToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
protected void | saveFilterToXML(EnhancedXMLStreamWriter writer) |
void | setSourceCharset(String sourceCharset) Sets the assumed source character encoding. |
String | toString() |
Methods inherited from class AbstractDocumentFilter:
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
Methods inherited from class AbstractImporterHandler:
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
public String getSourceCharset()

public void setSourceCharset(String sourceCharset)
Parameters:
sourceCharset - character encoding of the source to be transformed

protected final boolean isDocumentMatched(String reference, InputStream input, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
isDocumentMatched in class AbstractDocumentFilter
Throws:
ImporterHandlerException

protected abstract boolean isTextDocumentMatching(String reference, Reader input, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Throws:
ImporterHandlerException
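The signatures above suggest a template-method arrangement: the final, stream-based isDocumentMatched hands a character Reader to the abstract isTextDocumentMatching. The sketch below models that relationship with JDK types only. `CharStreamFilterSketch` and `KeywordFilterSketch` are hypothetical, the metadata argument and ImporterHandlerException are left out, and encoding detection is replaced by a plain UTF-8 default, so this should be read as an illustration of the pattern rather than the actual base class.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the base class: bytes in, characters out to the subclass.
abstract class CharStreamFilterSketch {
    private String sourceCharset;

    void setSourceCharset(String sourceCharset) {
        this.sourceCharset = sourceCharset;
    }

    // Final, like isDocumentMatched: subclasses cannot bypass the decoding step.
    final boolean isDocumentMatched(String reference, InputStream input, boolean parsed)
            throws IOException {
        // Simplification: UTF-8 when parsed or when no charset is set (no detection here).
        Charset charset = (parsed || sourceCharset == null)
                ? StandardCharsets.UTF_8
                : Charset.forName(sourceCharset);
        Reader reader = new InputStreamReader(input, charset);
        return isTextDocumentMatching(reference, reader, parsed);
    }

    protected abstract boolean isTextDocumentMatching(
            String reference, Reader input, boolean parsed) throws IOException;
}

// Example subclass: matches documents whose body contains a keyword.
final class KeywordFilterSketch extends CharStreamFilterSketch {
    private final String keyword;

    KeywordFilterSketch(String keyword) {
        this.keyword = keyword;
    }

    @Override
    protected boolean isTextDocumentMatching(String reference, Reader input, boolean parsed)
            throws IOException {
        BufferedReader lines = new BufferedReader(input);
        String line;
        while ((line = lines.readLine()) != null) {
            if (line.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = "caf\u00e9 latte".getBytes(StandardCharsets.ISO_8859_1);
        KeywordFilterSketch filter = new KeywordFilterSketch("caf\u00e9");
        filter.setSourceCharset("ISO-8859-1");
        // Pre-parse: the configured charset is used to decode the bytes.
        System.out.println(filter.isDocumentMatched("doc-1", new ByteArrayInputStream(latin1), false));
        // Post-parse: UTF-8 is assumed, so these Latin-1 bytes no longer decode to the keyword.
        System.out.println(filter.isDocumentMatched("doc-1", new ByteArrayInputStream(latin1), true));
    }
}
```

Keeping the byte-level method final means a subclass only ever deals with characters, which is why the encoding choice matters so much for pre-parse use.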
protected final void saveFilterToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
saveFilterToXML in class AbstractDocumentFilter
Throws:
XMLStreamException

protected abstract void saveCharStreamFilterToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Parameters:
writer - the XML writer
Throws:
XMLStreamException - could not save to XML

protected final void loadFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
loadFilterFromXML in class AbstractDocumentFilter
Throws:
IOException

protected abstract void loadCharStreamFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Parameters:
xml - XML configuration
Throws:
IOException - could not load from XML

public boolean equals(Object obj)
equals in class AbstractDocumentFilter

public int hashCode()
hashCode in class AbstractDocumentFilter

public String toString()
toString in class AbstractDocumentFilter
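The pairing of final saveFilterToXML/loadFilterFromXML with the abstract saveCharStreamFilterToXML/loadCharStreamFilterFromXML hooks can be pictured as follows. This is a sketch under stated assumptions: the JDK's StAX classes stand in for EnhancedXMLStreamWriter and XMLConfiguration, the `filter` and `keyword` element names are invented, and `XmlHookBaseSketch` and `KeywordXmlSketch` are hypothetical classes that only model the split between shared and subclass-specific settings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical base: handles the shared attribute, delegates the rest to hooks.
abstract class XmlHookBaseSketch {
    String sourceCharset;

    // Plays the role of the final save method: shared settings first, then the hook.
    final String save() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("filter");
        if (sourceCharset != null) {
            writer.writeAttribute("sourceCharset", sourceCharset);
        }
        saveSubclassSettings(writer);
        writer.writeEndElement();
        writer.close();
        return out.toString();
    }

    // Plays the role of the final load method: reads the attribute, forwards child elements.
    final void load(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                if ("filter".equals(reader.getLocalName())) {
                    sourceCharset = reader.getAttributeValue(null, "sourceCharset");
                } else {
                    loadSubclassSetting(reader.getLocalName(), reader.getElementText());
                }
            }
        }
        reader.close();
    }

    protected abstract void saveSubclassSettings(XMLStreamWriter writer) throws XMLStreamException;

    protected abstract void loadSubclassSetting(String name, String value);
}

// Example subclass that persists a single keyword setting.
final class KeywordXmlSketch extends XmlHookBaseSketch {
    String keyword;

    @Override
    protected void saveSubclassSettings(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartElement("keyword");
        writer.writeCharacters(keyword);
        writer.writeEndElement();
    }

    @Override
    protected void loadSubclassSetting(String name, String value) {
        if ("keyword".equals(name)) {
            keyword = value;
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        KeywordXmlSketch original = new KeywordXmlSketch();
        original.sourceCharset = "ISO-8859-1";
        original.keyword = "invoice";

        KeywordXmlSketch copy = new KeywordXmlSketch();
        copy.load(original.save());
        System.out.println(copy.sourceCharset + " / " + copy.keyword);
    }
}
```

Because the outer methods are final, a subclass cannot forget to persist the shared settings; it only contributes its own elements.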
Copyright © 2009–2021 Norconex Inc. All rights reserved.