public abstract class AbstractCharStreamTransformer extends AbstractDocumentTransformer
Base class for transformers dealing with text documents only.
Subclasses can safely be used as either pre-parse or post-parse handlers restricted to text documents only (see AbstractImporterHandler).
Subclasses can restrict which documents this transformation applies to, based on document metadata (see AbstractImporterHandler).
Since 2.5.0, when used as a pre-parse handler, this class attempts to detect the content character encoding unless the character encoding was specified using setSourceCharset(String). If the character set cannot be established, UTF-8 is assumed.
Since document parsing converts content to UTF-8, UTF-8 is always assumed when used as a post-parse handler.
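The fallback order described above can be sketched as follows. This is only an illustration under stated assumptions, not the class's actual implementation: `CharsetFallbackSketch`, `resolve`, and the `detectedCharset` argument (standing in for whatever the detection step returns) are hypothetical names, and falling through to detection when the explicit name is unsupported is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper illustrating the documented fallback order; not part of the Importer API. */
final class CharsetFallbackSketch {

    /**
     * @param sourceCharset   value given to setSourceCharset(String), may be null or blank
     * @param detectedCharset result of a detection attempt, may be null or blank (assumed input)
     * @param parsed          true when running as a post-parse handler
     */
    static Charset resolve(String sourceCharset, String detectedCharset, boolean parsed) {
        if (parsed) {
            // Parsing already converted the content to UTF-8.
            return StandardCharsets.UTF_8;
        }
        if (isUsable(sourceCharset)) {
            // An explicitly configured encoding wins over detection.
            return Charset.forName(sourceCharset.trim());
        }
        if (isUsable(detectedCharset)) {
            return Charset.forName(detectedCharset.trim());
        }
        // Nothing could be established: assume UTF-8.
        return StandardCharsets.UTF_8;
    }

    /** Assumption: an unknown or malformed charset name is treated as "not established". */
    private static boolean isUsable(String name) {
        if (name == null || name.trim().isEmpty()) {
            return false;
        }
        try {
            return Charset.isSupported(name.trim());
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException for malformed names.
            return false;
        }
    }
}
```

For example, `resolve(null, null, false)` yields UTF-8, while `resolve("ISO-8859-1", null, true)` also yields UTF-8 because the explicit setting is not consulted after parsing.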
Subclasses implementing IXMLConfigurable should allow this inner configuration:
    <!-- parent tag has these attributes:
         sourceCharset="(character encoding)" -->
    <restrictTo caseSensitive="[false|true]"
            field="(name of header/metadata field name to match)">
        (regular expression of value to match)
    </restrictTo>
    <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
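To make the "only one needs to match" semantics of restrictTo concrete, here is a minimal, self-contained sketch. It does not use the Importer classes: `RestrictToSketch`, `Restriction`, and the map-based metadata are hypothetical stand-ins. Two points are assumptions rather than documented behavior: the regular expression is applied to the whole field value, and a handler with no restrictions applies to every document.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration of "restrictTo" matching; not the Importer implementation. */
final class RestrictToSketch {

    /** One restrictTo entry: a metadata field name plus a regular expression. */
    static final class Restriction {
        final String field;
        final Pattern pattern;

        Restriction(String field, String regex, boolean caseSensitive) {
            this.field = field;
            this.pattern = Pattern.compile(
                    regex, caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
        }

        /** True if any value of the field matches (assumption: whole-value match). */
        boolean matches(Map<String, List<String>> metadata) {
            List<String> values =
                    metadata.getOrDefault(field, Collections.emptyList());
            for (String value : values) {
                if (pattern.matcher(value).matches()) {
                    return true;
                }
            }
            return false;
        }
    }

    /** A document is applicable as soon as one restriction matches. */
    static boolean isApplicable(
            List<Restriction> restrictions, Map<String, List<String>> metadata) {
        if (restrictions.isEmpty()) {
            return true; // assumption: no restriction means "apply to all"
        }
        for (Restriction restriction : restrictions) {
            if (restriction.matches(metadata)) {
                return true;
            }
        }
        return false;
    }
}
```

With caseSensitive="false", a pattern such as `TEXT/.*` would accept a field value of `text/html`; with caseSensitive="true" it would not.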
Constructor and Description |
---|
AbstractCharStreamTransformer() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object obj) |
String | getSourceCharset() Gets the assumed source character encoding. |
int | hashCode() |
protected abstract void | loadCharStreamTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
protected void | loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
protected abstract void | saveCharStreamTransformerToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
void | setSourceCharset(String sourceCharset) Sets the assumed source character encoding. |
String | toString() |
protected void | transformApplicableDocument(String reference, InputStream input, OutputStream output, ImporterMetadata metadata, boolean parsed) |
protected abstract void | transformTextDocument(String reference, Reader input, Writer output, ImporterMetadata metadata, boolean parsed) |
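Methods inherited from class AbstractDocumentTransformer: transformDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML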
public String getSourceCharset()
public void setSourceCharset(String sourceCharset)
Parameters: sourceCharset - character encoding of the source to be transformed
protected final void transformApplicableDocument(String reference, InputStream input, OutputStream output, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
transformApplicableDocument in class AbstractDocumentTransformer
Throws: ImporterHandlerException
protected abstract void transformTextDocument(String reference, Reader input, Writer output, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Throws: ImporterHandlerException
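The two signatures above suggest that the byte-stream method wraps its streams in a Reader and a Writer before handing them to transformTextDocument. The sketch below shows one way such a bridge could look using only the JDK. It is not the class's actual code: `CharStreamBridgeSketch`, `TextTransform`, and `upperCase` are hypothetical names, and writing the output as UTF-8 is an assumption.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical byte-stream to character-stream bridge; not the Importer implementation. */
final class CharStreamBridgeSketch {

    /** Stand-in for the text-level work a subclass would do in transformTextDocument. */
    interface TextTransform {
        void apply(Reader input, Writer output) throws IOException;
    }

    /** Decodes the input with the given charset, runs the transform, writes UTF-8 (assumed). */
    static void transform(InputStream input, OutputStream output,
            Charset inputCharset, TextTransform transform) throws IOException {
        Reader reader = new BufferedReader(
                new InputStreamReader(input, inputCharset));
        Writer writer = new BufferedWriter(
                new OutputStreamWriter(output, StandardCharsets.UTF_8));
        transform.apply(reader, writer);
        // Flush but do not close: the caller owns the underlying streams.
        writer.flush();
    }

    /** Example text transform: upper-cases everything it reads. */
    static void upperCase(Reader input, Writer output) throws IOException {
        char[] buffer = new char[4096];
        int count;
        while ((count = input.read(buffer)) != -1) {
            output.write(new String(buffer, 0, count).toUpperCase(Locale.ROOT));
        }
    }
}
```

A call such as `CharStreamBridgeSketch.transform(in, out, StandardCharsets.ISO_8859_1, CharStreamBridgeSketch::upperCase)` decodes ISO-8859-1 input and emits the upper-cased text as UTF-8 bytes.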
protected final void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
saveHandlerToXML in class AbstractImporterHandler
Parameters: writer - the XML writer
Throws: XMLStreamException - could not save to XML
protected abstract void saveCharStreamTransformerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Parameters: writer - the XML writer
Throws: XMLStreamException - could not save to XML
protected final void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - XML configuration
Throws: IOException - could not load from XML
protected abstract void loadCharStreamTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Parameters: xml - XML configuration
Throws: IOException - could not load from XML
public boolean equals(Object obj)
equals in class AbstractImporterHandler
public int hashCode()
hashCode in class AbstractImporterHandler
public String toString()
toString in class AbstractImporterHandler
Copyright © 2009–2021 Norconex Inc. All rights reserved.