Class AbstractCharStreamTransformer
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.transformer.AbstractDocumentTransformer
      - com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
- All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTransformer
- Direct Known Subclasses:
AbstractStringTransformer, SubstringTransformer
public abstract class AbstractCharStreamTransformer extends AbstractDocumentTransformer
Base class for transformers dealing with text documents only. Subclasses can safely be used as either pre-parse or post-parse handlers restricted to text documents only (see AbstractImporterHandler).
Subclasses can restrict which documents this transformation applies to based on document metadata (see AbstractImporterHandler).
Since 2.5.0, when used as a pre-parse handler, this class attempts to detect the content character encoding unless the character encoding was specified using setSourceCharset(String). If the character set cannot be established, UTF-8 is assumed. Since document parsing converts content to UTF-8, UTF-8 is always assumed when used as a post-parse handler.
XML configuration usage:
  sourceCharset="(character encoding)"
Subclasses inherit the above IXMLConfigurable attribute(s), in addition to <restrictTo>.
- Author:
- Pascal Essiembre
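The charset rule described above can be summarized as a small decision function. This is a minimal sketch, not the Norconex implementation: the class `CharsetResolution`, its `resolve` method, and passing an already-detected charset name in as a plain string are all hypothetical. The real class performs detection itself (the inherited `detectCharsetIfBlank` appears to serve that purpose, but its signature is not documented here, so the sketch does not call it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper mirroring the documented charset rule.
 * Not part of the Norconex API.
 */
final class CharsetResolution {

    private CharsetResolution() {
    }

    /**
     * @param configured value passed to setSourceCharset(String); may be null or blank
     * @param detected   charset name found by detection (assumed input); may be null
     * @param preParse   true when running as a pre-parse handler
     * @return the charset the transformer should assume
     */
    static Charset resolve(String configured, String detected, boolean preParse) {
        // Post-parse: the parser has already converted content to UTF-8.
        if (!preParse) {
            return StandardCharsets.UTF_8;
        }
        // Pre-parse: an explicitly configured charset takes precedence.
        if (hasText(configured)) {
            return Charset.forName(configured.trim());
        }
        // Otherwise fall back to the detected charset, if any.
        if (hasText(detected)) {
            return Charset.forName(detected.trim());
        }
        // Nothing could be established: assume UTF-8.
        return StandardCharsets.UTF_8;
    }

    private static boolean hasText(String s) {
        return s != null && !s.trim().isEmpty();
    }
}
```

For example, `resolve("ISO-8859-1", "UTF-16", true)` yields ISO-8859-1 because the configured value wins, while any call with `preParse` set to `false` yields UTF-8.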
Constructor Summary
- AbstractCharStreamTransformer()
Method Summary
- boolean equals(Object other)
- String getSourceCharset(): Gets the assumed source character encoding.
- int hashCode()
- protected abstract void loadCharStreamTransformerFromXML(XML xml): Loads configuration settings specific to the implementing class.
- protected void loadHandlerFromXML(XML xml): Loads configuration settings specific to the implementing class.
- protected abstract void saveCharStreamTransformerToXML(XML xml): Saves configuration settings specific to the implementing class.
- protected void saveHandlerToXML(XML xml): Saves configuration settings specific to the implementing class.
- void setSourceCharset(String sourceCharset): Sets the assumed source character encoding.
- String toString()
- protected void transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState)
- protected abstract void transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState)
Methods inherited from class com.norconex.importer.handler.transformer.AbstractDocumentTransformer
- transformDocument
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
- addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Method Detail
getSourceCharset
public String getSourceCharset()
Gets the assumed source character encoding.
- Returns:
- character encoding of the source to be transformed
- Since:
- 2.5.0
-
setSourceCharset
public void setSourceCharset(String sourceCharset)
Sets the assumed source character encoding.
- Parameters:
sourceCharset
- character encoding of the source to be transformed
- Since:
- 2.5.0
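The assumed source charset matters because the same bytes decode to different text depending on the charset assumed. The sketch below uses only the JDK; `SourceCharsetDemo` and `decode` are hypothetical names for illustration, not part of the Norconex API.

```java
import java.nio.charset.Charset;

/** Hypothetical demo: decoding the same bytes under different assumed charsets. */
final class SourceCharsetDemo {

    private SourceCharsetDemo() {
    }

    /** Decodes raw bytes with the given assumed source charset name. */
    static String decode(byte[] raw, String assumedCharset) {
        // new String(byte[], Charset) replaces malformed input with U+FFFD
        // instead of throwing.
        return new String(raw, Charset.forName(assumedCharset));
    }
}
```

With the text `café` encoded as ISO-8859-1, the final byte `0xE9` decodes to `é` when ISO-8859-1 is assumed. When UTF-8 is assumed, that byte is an incomplete multi-byte sequence and is replaced with U+FFFD, silently corrupting the text.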
-
transformApplicableDocument
protected final void transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) throws ImporterHandlerException
- Specified by:
transformApplicableDocument in class AbstractDocumentTransformer
- Throws:
ImporterHandlerException
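`transformApplicableDocument` is final and receives byte streams, while subclasses implement `transformTextDocument` against a `Reader` and `Writer`. That suggests a template-method bridge along the lines of the sketch below. It is a hypothetical simplification: `CharStreamBridge`, `transformBytes`, and `UpperCaseBridge` are illustrative names, the `HandlerDoc` and `ParseState` parameters are dropped, and encoding the output with the same charset used for reading is an assumption of this sketch, not documented behavior.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

/** Hypothetical simplification of the byte-to-character bridge. */
abstract class CharStreamBridge {

    /** Final, like transformApplicableDocument: subclasses never see raw bytes. */
    final void transformBytes(InputStream input, OutputStream output, Charset charset)
            throws IOException {
        Reader reader = new InputStreamReader(input, charset);
        // Assumption: output is encoded with the same charset used for input.
        Writer writer = new OutputStreamWriter(output, charset);
        transformText(reader, writer);
        // Flush buffered characters to the byte stream; closing is left to the caller.
        writer.flush();
    }

    /** Counterpart of transformTextDocument. */
    protected abstract void transformText(Reader input, Writer output) throws IOException;
}

/** Hypothetical subclass that upper-cases every character. */
final class UpperCaseBridge extends CharStreamBridge {
    @Override
    protected void transformText(Reader input, Writer output) throws IOException {
        int c;
        while ((c = input.read()) != -1) {
            output.write(Character.toUpperCase((char) c));
        }
    }
}
```

Making the bridge method final keeps charset handling in one place, so each subclass only deals with characters.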
-
transformTextDocument
protected abstract void transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState) throws ImporterHandlerException
- Throws:
ImporterHandlerException
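An implementation of `transformTextDocument` can stream through the `Reader` in fixed-size chunks instead of buffering the whole document. The sketch below shows such a loop as a standalone static method. `NbspToSpace` is a hypothetical example (it replaces non-breaking spaces with regular spaces), and leaving both streams open for the caller to close is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

/** Hypothetical body for a transformTextDocument-style method. */
final class NbspToSpace {
    private static final int BUFFER_SIZE = 8192;

    private NbspToSpace() {
    }

    static void transform(Reader input, Writer output) throws IOException {
        char[] buffer = new char[BUFFER_SIZE];
        int read;
        // Process one chunk at a time so memory use stays bounded.
        while ((read = input.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\u00A0') {
                    buffer[i] = ' ';
                }
            }
            output.write(buffer, 0, read);
        }
        // Streams are not closed here (assumption: the caller owns them).
    }
}
```

A per-character mapping like this is safe to apply chunk by chunk; replacements that span several characters would need extra care at chunk boundaries.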
-
saveHandlerToXML
protected final void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by:
saveHandlerToXML in class AbstractImporterHandler
- Parameters:
xml
- the XML
-
saveCharStreamTransformerToXML
protected abstract void saveCharStreamTransformerToXML(XML xml)
Saves configuration settings specific to the implementing class. The parent tag, along with its "class" attribute, is already written. Implementors must not close the writer.
- Parameters:
xml
- the XML
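The split between the final `saveHandlerToXML` and the abstract `saveCharStreamTransformerToXML` is a template method: presumably the base class writes the `sourceCharset` attribute on the already-open parent tag, then hands the same tag to the subclass. The sketch below mirrors that shape using `org.w3c.dom.Element` as a stand-in for Norconex's `XML` class, whose API is not shown here. `CharStreamConfigBase`, `ExampleTransformer`, and its `maxLength` setting are hypothetical.

```java
import org.w3c.dom.Element;

/** Hypothetical stand-in: org.w3c.dom.Element replaces Norconex's XML class. */
abstract class CharStreamConfigBase {
    private String sourceCharset;

    String getSourceCharset() {
        return sourceCharset;
    }

    void setSourceCharset(String sourceCharset) {
        this.sourceCharset = sourceCharset;
    }

    /** Final: write the shared attribute, then let the subclass add its own settings. */
    final void saveHandlerToXML(Element xml) {
        if (sourceCharset != null) {
            xml.setAttribute("sourceCharset", sourceCharset);
        }
        saveCharStreamTransformerToXML(xml);
    }

    /** The parent tag already exists; implementors only add to it. */
    protected abstract void saveCharStreamTransformerToXML(Element xml);
}

/** Hypothetical subclass with a single setting of its own. */
final class ExampleTransformer extends CharStreamConfigBase {
    private int maxLength = 100;

    @Override
    protected void saveCharStreamTransformerToXML(Element xml) {
        Element child = xml.getOwnerDocument().createElement("maxLength");
        child.setTextContent(Integer.toString(maxLength));
        xml.appendChild(child);
    }
}
```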
-
loadHandlerFromXML
protected final void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by:
loadHandlerFromXML in class AbstractImporterHandler
- Parameters:
xml
- XML configuration
-
loadCharStreamTransformerFromXML
protected abstract void loadCharStreamTransformerFromXML(XML xml)
Loads configuration settings specific to the implementing class.
- Parameters:
xml
- XML configuration
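Loading presumably follows the same pattern in reverse. In the sketch below (again using `org.w3c.dom.Element` in place of Norconex's `XML` class, with hypothetical class names), the final method reads the `sourceCharset` attribute and then delegates to the subclass hook. Keeping the previous value when the attribute is absent is a choice made for this sketch, not documented behavior.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical stand-in: org.w3c.dom.Element replaces Norconex's XML class. */
abstract class CharStreamConfigLoader {
    private String sourceCharset;

    String getSourceCharset() {
        return sourceCharset;
    }

    /** Final: read the shared attribute, then let the subclass read its own settings. */
    final void loadHandlerFromXML(Element xml) {
        // getAttribute returns "" when the attribute is absent; keep the current value then.
        String value = xml.getAttribute("sourceCharset");
        if (!value.isEmpty()) {
            sourceCharset = value;
        }
        loadCharStreamTransformerFromXML(xml);
    }

    protected abstract void loadCharStreamTransformerFromXML(Element xml);

    /** Helper for the example: parses an XML snippet and returns its root element. */
    static Element parse(String snippet)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(snippet.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }
}

/** Hypothetical subclass reading a single setting of its own. */
final class ExampleLoader extends CharStreamConfigLoader {
    private int maxLength = 100;

    int getMaxLength() {
        return maxLength;
    }

    @Override
    protected void loadCharStreamTransformerFromXML(Element xml) {
        NodeList nodes = xml.getElementsByTagName("maxLength");
        if (nodes.getLength() > 0) {
            maxLength = Integer.parseInt(nodes.item(0).getTextContent().trim());
        }
    }
}
```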
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides:
toString in class AbstractImporterHandler