Class AbstractCharStreamTagger
- java.lang.Object
- com.norconex.importer.handler.AbstractImporterHandler
- com.norconex.importer.handler.tagger.AbstractDocumentTagger
- com.norconex.importer.handler.tagger.AbstractCharStreamTagger
- All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTagger
- Direct Known Subclasses:
AbstractStringTagger, CountMatchesTagger, SplitTagger, TextStatisticsTagger, URLExtractorTagger
public abstract class AbstractCharStreamTagger extends AbstractDocumentTagger
Base class for taggers dealing with the body of text documents only. Subclasses can safely be used as either pre-parse or post-parse handlers restricted to text documents only (see AbstractImporterHandler).
Since 2.5.0, when used as a pre-parse handler, this class attempts to detect the content character encoding unless the character encoding was specified using setSourceCharset(String). Since document parsing converts content to UTF-8, UTF-8 is always assumed when used as a post-parse handler.
XML configuration usage:
sourceCharset="(character encoding)"
Subclasses inherit the above IXMLConfigurable attribute(s), in addition to <restrictTo>.
- Author:
- Pascal Essiembre
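The charset rules described above can be sketched in plain Java. This is a minimal illustration only, not the library's implementation: the class `CharsetSelectionSketch`, the `Phase` enum, the `detectedCharset` parameter (a placeholder for whatever detection yields), and the final UTF-8 fallback are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSelectionSketch {

    /** Hypothetical stand-in for the importer's parse state. */
    enum Phase { PRE_PARSE, POST_PARSE }

    /**
     * Picks the charset used to decode the document body.
     * Post-parse content is always treated as UTF-8. Pre-parse, an
     * explicitly configured charset wins over a detected one.
     */
    static Charset resolveCharset(
            String sourceCharset, String detectedCharset, Phase phase) {
        if (phase == Phase.POST_PARSE) {
            return StandardCharsets.UTF_8;
        }
        if (!isBlank(sourceCharset)) {
            return Charset.forName(sourceCharset.trim());
        }
        if (!isBlank(detectedCharset)) {
            return Charset.forName(detectedCharset.trim());
        }
        // Assumption for this sketch: default to UTF-8 when nothing is known.
        return StandardCharsets.UTF_8;
    }

    /** Decodes the whole stream into a String using the given charset. */
    static String readBody(InputStream input, Charset charset)
            throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(input, charset)) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        }
        return text.toString();
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) throws IOException {
        // "café" encoded as ISO-8859-1, decoded with the configured charset.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        Charset charset = resolveCharset("ISO-8859-1", null, Phase.PRE_PARSE);
        System.out.println(readBody(new ByteArrayInputStream(latin1), charset));
    }
}
```

In this sketch, the same configured value is ignored in the post-parse phase, which mirrors the statement that UTF-8 is always assumed after parsing.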
Constructor Summary
Constructor Description
AbstractCharStreamTagger()
-
Method Summary
Modifier and Type Method Description
boolean equals(Object other)
String getSourceCharset()
Gets the assumed source character encoding.
int hashCode()
protected abstract void loadCharStreamTaggerFromXML(XML xml)
Loads configuration settings specific to the implementing class.
protected void loadHandlerFromXML(XML xml)
Loads configuration settings specific to the implementing class.
protected abstract void saveCharStreamTaggerToXML(XML xml)
Saves configuration settings specific to the implementing class.
protected void saveHandlerToXML(XML xml)
Saves configuration settings specific to the implementing class.
void setSourceCharset(String sourceCharset)
Sets the assumed source character encoding.
protected void tagApplicableDocument(HandlerDoc doc, InputStream input, ParseState parseState)
protected abstract void tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState)
String toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Method Detail
-
getSourceCharset
public String getSourceCharset()
Gets the assumed source character encoding.
- Returns:
- character encoding of the source to be transformed
- Since:
- 2.5.0
-
setSourceCharset
public void setSourceCharset(String sourceCharset)
Sets the assumed source character encoding.
- Parameters:
sourceCharset - character encoding of the source to be transformed
- Since:
- 2.5.0
-
tagApplicableDocument
protected final void tagApplicableDocument(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
- Specified by:
tagApplicableDocument in class AbstractDocumentTagger
- Throws:
ImporterHandlerException
-
tagTextDocument
protected abstract void tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) throws ImporterHandlerException
- Throws:
ImporterHandlerException
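As an illustration of the kind of work a tagTextDocument implementation might do with the Reader it receives, here is a minimal, self-contained sketch that counts words and records the result as a metadata value. It does not use the Norconex classes: the `Map` stands in for the document's metadata, and `WordCountTaggerSketch`, `tagText`, and the field name `sketch.wordCount` are hypothetical names chosen for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WordCountTaggerSketch {

    /** Hypothetical metadata field name used by this sketch. */
    static final String FIELD = "sketch.wordCount";

    /**
     * Streams the text once, counting whitespace-separated words, then
     * adds the count to the metadata. The body is never held in memory
     * as a whole, which is the point of working on a Reader.
     */
    static void tagText(Map<String, List<String>> metadata, Reader input)
            throws IOException {
        BufferedReader reader = new BufferedReader(input);
        long words = 0;
        boolean inWord = false;
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                inWord = false;
            } else if (!inWord) {
                // First non-whitespace character after a gap starts a word.
                inWord = true;
                words++;
            }
        }
        metadata.computeIfAbsent(FIELD, k -> new ArrayList<>())
                .add(Long.toString(words));
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> metadata = new HashMap<>();
        tagText(metadata, new StringReader("one two  three\nfour"));
        System.out.println(metadata);
    }
}
```

A real subclass would perform the equivalent logic inside tagTextDocument and write to the document's own metadata rather than to a plain map.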
-
saveHandlerToXML
protected final void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by:
saveHandlerToXML in class AbstractImporterHandler
- Parameters:
xml - the XML
-
saveCharStreamTaggerToXML
protected abstract void saveCharStreamTaggerToXML(XML xml)
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Parameters:
xml - the XML
-
loadHandlerFromXML
protected final void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by:
loadHandlerFromXML in class AbstractImporterHandler
- Parameters:
xml - XML configuration
-
loadCharStreamTaggerFromXML
protected abstract void loadCharStreamTaggerFromXML(XML xml)
Loads configuration settings specific to the implementing class.
- Parameters:
xml - XML configuration
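The pairing of final saveHandlerToXML/loadHandlerFromXML methods with abstract saveCharStreamTaggerToXML/loadCharStreamTaggerFromXML hooks suggests a template-method arrangement, where the base class handles the shared sourceCharset attribute and then delegates to the subclass. The sketch below illustrates that pattern under stated assumptions: a plain `Map<String, String>` stands in for the XML object, and the class names, the `saveSpecific`/`loadSpecific` hooks, and the `separator` setting are all hypothetical. It is not the library's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Base class: owns the shared attribute, delegates the rest. */
abstract class CharStreamConfigSketch {

    private String sourceCharset;

    String getSourceCharset() { return sourceCharset; }
    void setSourceCharset(String sourceCharset) {
        this.sourceCharset = sourceCharset;
    }

    /** Final: subclasses cannot skip writing the shared attribute. */
    final void save(Map<String, String> attrs) {
        if (sourceCharset != null) {
            attrs.put("sourceCharset", sourceCharset);
        }
        saveSpecific(attrs);
    }

    /** Final: subclasses cannot skip reading the shared attribute. */
    final void load(Map<String, String> attrs) {
        sourceCharset = attrs.get("sourceCharset");
        loadSpecific(attrs);
    }

    /** Hook for settings specific to the implementing class. */
    protected abstract void saveSpecific(Map<String, String> attrs);

    /** Hook for settings specific to the implementing class. */
    protected abstract void loadSpecific(Map<String, String> attrs);
}

/** Hypothetical subclass with a single setting of its own. */
final class SeparatorConfigSketch extends CharStreamConfigSketch {

    private String separator = ",";

    String getSeparator() { return separator; }
    void setSeparator(String separator) { this.separator = separator; }

    @Override
    protected void saveSpecific(Map<String, String> attrs) {
        attrs.put("separator", separator);
    }

    @Override
    protected void loadSpecific(Map<String, String> attrs) {
        separator = attrs.getOrDefault("separator", ",");
    }

    public static void main(String[] args) {
        SeparatorConfigSketch original = new SeparatorConfigSketch();
        original.setSourceCharset("ISO-8859-1");
        original.setSeparator(";");

        // Round trip: save to the attribute map, load into a fresh copy.
        Map<String, String> attrs = new LinkedHashMap<>();
        original.save(attrs);
        SeparatorConfigSketch copy = new SeparatorConfigSketch();
        copy.load(attrs);
        System.out.println(copy.getSourceCharset() + " " + copy.getSeparator());
    }
}
```

Marking the outer methods final keeps the inherited attribute handling consistent across every subclass, while each subclass only has to deal with its own settings.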
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides:
toString in class AbstractImporterHandler