public abstract class AbstractCharStreamTransformer extends AbstractDocumentTransformer
Base class for transformers that deal with text documents only.
Subclasses can safely be used as either pre-parse or post-parse handlers restricted to text documents only (see AbstractImporterHandler).
Subclasses can restrict which documents this transformation applies to, based on document metadata (see AbstractImporterHandler).
Since 2.5.0, when used as a pre-parse handler, this class attempts to detect the content character encoding unless the character encoding was specified using setSourceCharset(String). If the character set cannot be established, UTF-8 is assumed.
Since document parsing converts content to UTF-8, UTF-8 is always assumed when used as a post-parse handler.
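The fallback order described above can be sketched with the JDK alone. This is an illustration of the documented rules, not the class's actual implementation: the class name CharsetResolutionSketch, the resolve method, its parameters, and the idea of receiving the detected encoding as a plain string are all assumptions made for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the Norconex Importer API.
class CharsetResolutionSketch {

    /**
     * Picks the encoding used to read a document's content.
     *
     * @param configured value passed to setSourceCharset(String); may be null or blank
     * @param detected   encoding name reported by some detector; may be null (assumption)
     * @param preParse   true when the handler runs before document parsing
     */
    static Charset resolve(String configured, String detected, boolean preParse) {
        // Post-parse content has already been converted to UTF-8.
        if (!preParse) {
            return StandardCharsets.UTF_8;
        }
        // An explicitly configured encoding takes precedence over detection.
        if (configured != null && !configured.trim().isEmpty()) {
            return Charset.forName(configured.trim());
        }
        // Otherwise use the detected encoding when it names a usable charset.
        if (detected != null && !detected.trim().isEmpty()) {
            try {
                return Charset.forName(detected.trim());
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: fall through to the default.
            }
        }
        // Nothing could be established, so UTF-8 is assumed.
        return StandardCharsets.UTF_8;
    }
}
```

With this sketch, resolve(null, null, true) falls back to UTF-8, and resolve("ISO-8859-1", null, false) also yields UTF-8 because the post-parse case ignores the configured value.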
sourceCharset="(character encoding)"
Subclasses inherit the above IXMLConfigurable attribute(s), in addition to <restrictTo>.
Constructor and Description |
---|
AbstractCharStreamTransformer() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
String | getSourceCharset() Gets the assumed source character encoding. |
int | hashCode() |
protected abstract void | loadCharStreamTransformerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | loadHandlerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected abstract void | saveCharStreamTransformerToXML(XML xml) Saves configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setSourceCharset(String sourceCharset) Sets the assumed source character encoding. |
String | toString() |
protected void | transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) |
protected abstract void | transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState) |
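Methods inherited from class AbstractDocumentTransformer: transformDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML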
public String getSourceCharset()
public void setSourceCharset(String sourceCharset)
sourceCharset - character encoding of the source to be transformed
protected final void transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) throws ImporterHandlerException
transformApplicableDocument in class AbstractDocumentTransformer
ImporterHandlerException
protected abstract void transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState) throws ImporterHandlerException
ImporterHandlerException
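The two signatures above suggest a division of labour: the final, byte-oriented transformApplicableDocument receives streams, while subclasses only implement the character-oriented transformTextDocument. The following JDK-only sketch shows one way such a bridge can work. It is not the library's code: the class and method names are hypothetical stand-ins, the HandlerDoc and ParseState arguments are left out, and writing the output in the same encoding as the input is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical stand-in classes and methods; not the Norconex Importer API.
class CharStreamBridgeSketch {

    /** Plays the role of transformTextDocument: upper-cases text as it streams through. */
    static void transformText(Reader input, Writer output) throws IOException {
        char[] buffer = new char[4096];
        int read;
        while ((read = input.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                buffer[i] = Character.toUpperCase(buffer[i]);
            }
            output.write(buffer, 0, read);
        }
    }

    /** Plays the role of transformApplicableDocument: wraps byte streams in a Reader and Writer. */
    static void transformBytes(InputStream input, OutputStream output, Charset charset)
            throws IOException {
        Reader reader = new InputStreamReader(input, charset);
        // Assumption: the output is encoded with the same charset as the input.
        Writer writer = new OutputStreamWriter(output, charset);
        transformText(reader, writer);
        // Flush rather than close: the caller owns the underlying streams.
        writer.flush();
    }

    /** Convenience wrapper over in-memory content, used to try the sketch out. */
    static byte[] transform(byte[] content, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transformBytes(new ByteArrayInputStream(content), out, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Decoding with the right charset before handing a Reader to the text-level method is what lets that method stay free of encoding concerns; a real subclass would only supply the equivalent of transformText.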
protected final void saveHandlerToXML(XML xml)
saveHandlerToXML in class AbstractImporterHandler
xml - the XML
protected abstract void saveCharStreamTransformerToXML(XML xml)
xml - the XML
protected final void loadHandlerFromXML(XML xml)
loadHandlerFromXML in class AbstractImporterHandler
xml - XML configuration
protected abstract void loadCharStreamTransformerFromXML(XML xml)
xml - XML configuration
public boolean equals(Object other)
equals in class AbstractImporterHandler
public int hashCode()
hashCode in class AbstractImporterHandler
public String toString()
toString in class AbstractImporterHandler
Copyright © 2009–2023 Norconex Inc. All rights reserved.