Class AbstractCharStreamTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.AbstractCharStreamTagger
- All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTagger
- Direct Known Subclasses:
AbstractStringTagger, CountMatchesTagger, SplitTagger, TextStatisticsTagger, URLExtractorTagger
public abstract class AbstractCharStreamTagger extends AbstractDocumentTagger
Base class for taggers dealing with the body of text documents only. Subclasses can safely be used as either pre-parse or post-parse handlers restricted to text documents only (see AbstractImporterHandler).

Since 2.5.0, when used as a pre-parse handler, this class attempts to detect the content character encoding unless the character encoding was specified using setSourceCharset(String). Since document parsing converts content to UTF-8, UTF-8 is always assumed when used as a post-parse handler.

XML configuration usage:
sourceCharset="(character encoding)"

Subclasses inherit the above IXMLConfigurable attribute(s), in addition to <restrictTo>.
- Author:
- Pascal Essiembre
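
The character encoding rules described above can be sketched as follows. This is only an illustration of the decision order, not the Norconex implementation: the class `CharsetResolutionSketch`, the method `resolveCharset`, its `preParse` flag, and the `detector` supplier are all hypothetical names introduced here, and the real detection logic is not shown on this page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Norconex API.
final class CharsetResolutionSketch {

    // Picks the encoding used to read the document body.
    // preParse:      true when the handler runs before parsing (assumed flag).
    // sourceCharset: value configured via setSourceCharset(String), may be blank.
    // detector:      stand-in for the library's charset detection.
    static Charset resolveCharset(
            boolean preParse, String sourceCharset, Supplier<Charset> detector) {
        if (!preParse) {
            // Parsing has already converted the content to UTF-8.
            return StandardCharsets.UTF_8;
        }
        if (sourceCharset != null && !sourceCharset.trim().isEmpty()) {
            // An explicitly configured encoding takes precedence over detection.
            return Charset.forName(sourceCharset.trim());
        }
        // Nothing configured: fall back to detection.
        return detector.get();
    }

    public static void main(String[] args) {
        // Fixed "detection" result, for demonstration only.
        Supplier<Charset> detector = () -> StandardCharsets.UTF_16;
        System.out.println(resolveCharset(true, "ISO-8859-1", detector));
        System.out.println(resolveCharset(true, "", detector));
        System.out.println(resolveCharset(false, "ISO-8859-1", detector));
    }
}
```

Passing the detector as a supplier keeps detection lazy, so it only runs when no encoding was configured for a pre-parse handler.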
-
-
Constructor Summary
- AbstractCharStreamTagger()
-
Method Summary
- boolean equals(Object other)
- String getSourceCharset(): Gets the assumed source character encoding.
- int hashCode()
- protected abstract void loadCharStreamTaggerFromXML(XML xml): Loads configuration settings specific to the implementing class.
- protected void loadHandlerFromXML(XML xml): Loads configuration settings specific to the implementing class.
- protected abstract void saveCharStreamTaggerToXML(XML xml): Saves configuration settings specific to the implementing class.
- protected void saveHandlerToXML(XML xml): Saves configuration settings specific to the implementing class.
- void setSourceCharset(String sourceCharset): Sets the assumed source character encoding.
- protected void tagApplicableDocument(HandlerDoc doc, InputStream input, ParseState parseState)
- protected abstract void tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState)
- String toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
getSourceCharset
public String getSourceCharset()
Gets the assumed source character encoding.
- Returns:
- character encoding of the source to be transformed
- Since:
- 2.5.0
-
setSourceCharset
public void setSourceCharset(String sourceCharset)
Sets the assumed source character encoding.
- Parameters:
sourceCharset - character encoding of the source to be transformed
- Since:
- 2.5.0
-
tagApplicableDocument
protected final void tagApplicableDocument(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
- Specified by:
tagApplicableDocument in class AbstractDocumentTagger
- Throws:
ImporterHandlerException
-
tagTextDocument
protected abstract void tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) throws ImporterHandlerException
- Throws:
ImporterHandlerException
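
A subclass receives the document body as a Reader and typically derives one or more tags from it. The sketch below shows that pattern using only the JDK: a plain `Map` stands in for the document metadata because the `HandlerDoc` API is not described on this page, and the class name `WordCountTaggerSketch` and the `wordCount` field name are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a tagTextDocument implementation.
final class WordCountTaggerSketch {

    // Streams the text line by line and stores a word count as a tag.
    // "metadata" is an assumed substitute for the document's metadata.
    static void tagTextDocument(Map<String, String> metadata, Reader input) {
        long words = 0;
        try {
            BufferedReader reader = new BufferedReader(input);
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    // Whitespace-separated tokens count as words.
                    words += trimmed.split("\\s+").length;
                }
            }
        } catch (IOException e) {
            // A real handler would likely wrap this in ImporterHandlerException.
            throw new UncheckedIOException(e);
        }
        metadata.put("wordCount", Long.toString(words));
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        tagTextDocument(metadata, new StringReader("one two\nthree  four five\n"));
        System.out.println(metadata);
    }
}
```

Reading from the Reader incrementally avoids loading the whole document into memory, which is the usual reason to work on a character stream rather than a String.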
-
saveHandlerToXML
protected final void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by:
saveHandlerToXML in class AbstractImporterHandler
- Parameters:
xml - the XML
-
saveCharStreamTaggerToXML
protected abstract void saveCharStreamTaggerToXML(XML xml)
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Parameters:
xml - the XML
-
loadHandlerFromXML
protected final void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by:
loadHandlerFromXML in class AbstractImporterHandler
- Parameters:
xml - XML configuration
-
loadCharStreamTaggerFromXML
protected abstract void loadCharStreamTaggerFromXML(XML xml)
Loads configuration settings specific to the implementing class.
- Parameters:
xml - XML configuration
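
The pairing of final saveHandlerToXML/loadHandlerFromXML methods with abstract saveCharStreamTaggerToXML/loadCharStreamTaggerFromXML methods suggests a template-method arrangement, where the base class handles its own `sourceCharset` setting and then delegates to the subclass. The sketch below illustrates that idea under that assumption; it is not the library's source. A `Map` replaces the `XML` type, and every class, method, and key name in it (other than `sourceCharset`) is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base class mirroring the assumed load/save delegation.
abstract class CharStreamConfigSketch {
    private String sourceCharset;

    // Final: the base class reads its own setting, then delegates.
    final void loadHandler(Map<String, String> xml) {
        sourceCharset = xml.get("sourceCharset");
        loadSubclassSettings(xml);
    }

    // Final: the base class writes its own setting, then delegates.
    final void saveHandler(Map<String, String> xml) {
        if (sourceCharset != null) {
            xml.put("sourceCharset", sourceCharset);
        }
        saveSubclassSettings(xml);
    }

    String getSourceCharset() {
        return sourceCharset;
    }

    // Subclasses only deal with their own settings.
    protected abstract void loadSubclassSettings(Map<String, String> xml);

    protected abstract void saveSubclassSettings(Map<String, String> xml);
}

// Hypothetical subclass with a single setting of its own.
final class SeparatorConfigSketch extends CharStreamConfigSketch {
    String separator = ",";

    @Override
    protected void loadSubclassSettings(Map<String, String> xml) {
        separator = xml.getOrDefault("separator", ",");
    }

    @Override
    protected void saveSubclassSettings(Map<String, String> xml) {
        xml.put("separator", separator);
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("sourceCharset", "ISO-8859-1");
        in.put("separator", ";");

        SeparatorConfigSketch tagger = new SeparatorConfigSketch();
        tagger.loadHandler(in);

        Map<String, String> out = new LinkedHashMap<>();
        tagger.saveHandler(out);
        System.out.println(out);
    }
}
```

Making the outer methods final means a subclass cannot accidentally skip the shared `sourceCharset` handling.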
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides:
toString in class AbstractImporterHandler
-
-