Class AbstractCharStreamFilter
- java.lang.Object
- com.norconex.importer.handler.AbstractImporterHandler
- com.norconex.importer.handler.filter.AbstractDocumentFilter
- com.norconex.importer.handler.filter.AbstractCharStreamFilter
- All Implemented Interfaces:
IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler
- Direct Known Subclasses:
AbstractStringFilter
public abstract class AbstractCharStreamFilter extends AbstractDocumentFilter
Base class for filters dealing with the body of text documents only. Subclasses can safely be used as either pre-parse or post-parse handlers restricted to text documents only (see AbstractImporterHandler).
When used as a pre-parse handler, this class uses the detected or previously set content character encoding unless the character encoding was specified using setSourceCharset(String). Since document parsing converts content to UTF-8, UTF-8 is always assumed when used as a post-parse handler.
XML configuration usage:
sourceCharset="(character encoding)" onMatch="[include|exclude]"
Subclasses inherit the above IXMLConfigurable attribute(s), in addition to <restrictTo>.
- Since:
- 2.0.0
- Author:
- Pascal Essiembre
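The encoding rules in the class description can be sketched with plain JDK types. This is a minimal illustration, not the class's actual implementation: `CharsetChoiceSketch`, `resolve`, and its parameters are hypothetical names, and the detected encoding is assumed to be non-blank.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Norconex Importer. */
final class CharsetChoiceSketch {

    private CharsetChoiceSketch() {}

    /**
     * Picks the encoding used to read the document body.
     *
     * @param sourceCharset   explicit value, as set via setSourceCharset(String); may be null or blank
     * @param detectedCharset encoding detected or previously set on the document (assumed non-blank)
     * @param postParse       true when running as a post-parse handler
     */
    static Charset resolve(String sourceCharset, String detectedCharset, boolean postParse) {
        if (postParse) {
            // Parsing has already converted the content to UTF-8.
            return StandardCharsets.UTF_8;
        }
        if (sourceCharset != null && !sourceCharset.trim().isEmpty()) {
            // An explicitly configured encoding takes precedence.
            return Charset.forName(sourceCharset.trim());
        }
        // Otherwise fall back to the detected or previously set encoding.
        return Charset.forName(detectedCharset);
    }
}
```

Under these assumptions, an explicit `sourceCharset` only matters before parsing; once the handler runs post-parse, the result is UTF-8 regardless of the other two arguments.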
-
Constructor Summary
Constructor | Description
AbstractCharStreamFilter() |
-
Method Summary
Modifier and Type | Method | Description
boolean | equals(Object other) |
String | getSourceCharset() | Gets the assumed source character encoding.
int | hashCode() |
protected boolean | isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected abstract boolean | isTextDocumentMatching(HandlerDoc doc, Reader input, ParseState parseState) |
protected abstract void | loadCharStreamFilterFromXML(XML xml) | Loads configuration settings specific to the implementing class.
protected void | loadFilterFromXML(XML xml) |
protected abstract void | saveCharStreamFilterToXML(XML xml) | Saves configuration settings specific to the implementing class.
protected void | saveFilterToXML(XML xml) |
void | setSourceCharset(String sourceCharset) | Sets the assumed source character encoding.
String | toString() |
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractDocumentFilter
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
getSourceCharset
public String getSourceCharset()
Gets the assumed source character encoding.
- Returns:
- character encoding of the source to be transformed
- Since:
- 2.5.0
-
setSourceCharset
public void setSourceCharset(String sourceCharset)
Sets the assumed source character encoding.
- Parameters:
- sourceCharset - character encoding of the source to be transformed
- Since:
- 2.5.0
-
isDocumentMatched
protected final boolean isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
- Specified by:
isDocumentMatched in class AbstractDocumentFilter
- Throws:
ImporterHandlerException
-
isTextDocumentMatching
protected abstract boolean isTextDocumentMatching(HandlerDoc doc, Reader input, ParseState parseState) throws ImporterHandlerException
- Throws:
ImporterHandlerException
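The two signatures above suggest a template-method arrangement: the final, byte-level isDocumentMatched presumably decodes the InputStream into a Reader and hands it to the abstract, character-level isTextDocumentMatching. The page does not state this outright, so the following is only a sketch of that pattern using JDK types. `CharStreamFilterSketch` and `ContainsTermFilterSketch` are hypothetical stand-ins, HandlerDoc and ParseState are omitted, and an unchecked exception stands in for ImporterHandlerException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical stand-in for the base class; not the Norconex implementation. */
abstract class CharStreamFilterSketch {

    /** Byte-level entry point: decodes the stream, then delegates to the subclass. */
    final boolean isDocumentMatched(InputStream input, Charset charset) {
        Reader reader = new InputStreamReader(input, charset);
        try {
            return isTextDocumentMatching(reader);
        } catch (IOException e) {
            // Stand-in for ImporterHandlerException in this sketch.
            throw new UncheckedIOException(e);
        }
    }

    /** Character-level hook that subclasses implement. */
    protected abstract boolean isTextDocumentMatching(Reader input) throws IOException;
}

/** Hypothetical subclass: matches documents whose body contains a given term. */
final class ContainsTermFilterSketch extends CharStreamFilterSketch {

    private final String term;

    ContainsTermFilterSketch(String term) {
        this.term = term;
    }

    @Override
    protected boolean isTextDocumentMatching(Reader input) throws IOException {
        // Reads the whole body into memory; acceptable for a sketch only.
        StringBuilder body = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while ((read = input.read(buffer)) != -1) {
            body.append(buffer, 0, read);
        }
        return body.indexOf(term) >= 0;
    }
}
```

With this split, a subclass only ever deals with decoded characters; the choice of encoding stays in the base class.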
-
saveFilterToXML
protected final void saveFilterToXML(XML xml)
- Specified by:
saveFilterToXML in class AbstractDocumentFilter
-
saveCharStreamFilterToXML
protected abstract void saveCharStreamFilterToXML(XML xml)
Saves configuration settings specific to the implementing class. The parent tag and its "class" attribute are already written. Implementors must not close the writer.
- Parameters:
- xml - the XML
-
loadFilterFromXML
protected final void loadFilterFromXML(XML xml)
- Specified by:
loadFilterFromXML in class AbstractDocumentFilter
-
loadCharStreamFilterFromXML
protected abstract void loadCharStreamFilterFromXML(XML xml)
Loads configuration settings specific to the implementing class.
- Parameters:
- xml - XML configuration
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractDocumentFilter
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractDocumentFilter
-
toString
public String toString()
- Overrides:
toString in class AbstractDocumentFilter
-