public class SubstringTransformer extends AbstractCharStreamTransformer implements IXMLConfigurable
Keeps the substring of the content found between a begin and an end character index. Useful when you have to truncate long content, or when you know precisely where the text to extract is located in some files.
The "begin" value is inclusive, while the "end" value is exclusive. Both are optional. When an index is not specified (or is negative), it is assumed to be the beginning or the end of the content, respectively.
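The index rules above can be illustrated with a minimal, self-contained sketch. This is not the class's actual implementation: `SubstringSketch`, `copyRange`, and `substring` are hypothetical names introduced here for illustration only. The text does not say what happens when "end" is smaller than "begin"; this sketch simply writes nothing in that case.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class SubstringSketch {

    // Copies the characters in [begin, end) from input to output.
    // A negative begin means "start of content"; a negative end means
    // "end of content" (assumed equivalent to leaving the value unset).
    public static void copyRange(Reader input, Writer output, long begin, long end)
            throws IOException {
        long from = Math.max(begin, 0);
        long index = 0;
        int c;
        while ((c = input.read()) != -1) {
            if (end >= 0 && index >= end) {
                break; // reached the exclusive end index
            }
            if (index >= from) {
                output.write(c); // inside the inclusive begin bound
            }
            index++;
        }
    }

    // Convenience wrapper to try the rules on plain strings.
    public static String substring(String text, long begin, long end) {
        StringWriter out = new StringWriter();
        try {
            copyRange(new StringReader(text), out, begin, end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substring("Hello, world", 7, -1)); // world
        System.out.println(substring("Hello, world", -1, 5)); // Hello
        System.out.println(substring("Hello, world", 7, 9));  // wo
    }
}
```

Reading from a `Reader` one character at a time keeps memory use flat for long content, which matches the truncation use case described above.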
This class can be used as a pre-parsing (text content-types only) or post-parsing handler.
<transformer class="com.norconex.importer.handler.transformer.impl.SubstringTransformer"
    begin="(number)" end="(number)">
  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
</transformer>
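The "restrictTo" rule in the configuration above (several restrictions allowed, only one needs to match) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `RestrictToSketch` and `Restriction` are hypothetical names, the regular expression is assumed to match the whole field value, field names are compared exactly, and a handler without any restriction is assumed to apply to every document.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RestrictToSketch {

    // Hypothetical stand-in for one <restrictTo> entry.
    public static final class Restriction {
        private final String field;
        private final Pattern pattern;

        public Restriction(String field, String regex, boolean caseSensitive) {
            this.field = field;
            this.pattern = Pattern.compile(
                    regex, caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
        }

        // True if any value of the field matches the expression
        // (whole-value match is an assumption of this sketch).
        public boolean matches(Map<String, List<String>> metadata) {
            List<String> values =
                    metadata.getOrDefault(field, Collections.emptyList());
            for (String value : values) {
                if (pattern.matcher(value).matches()) {
                    return true;
                }
            }
            return false;
        }
    }

    // Only one restriction needs to match; no restriction means "always applies"
    // (the latter is an assumption of this sketch).
    public static boolean isApplicable(
            List<Restriction> restrictions, Map<String, List<String>> metadata) {
        if (restrictions.isEmpty()) {
            return true;
        }
        for (Restriction restriction : restrictions) {
            if (restriction.matches(metadata)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with a metadata field named `contentType` (an illustrative name) holding `text/plain`, a case-insensitive restriction on `TEXT/.*` would make the handler applicable even if a second restriction on another field does not match.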
The following truncates long text to a maximum of 10,000 characters.
<transformer class="com.norconex.importer.handler.transformer.impl.SubstringTransformer" end="10000"/>
Constructor and Description |
---|
SubstringTransformer() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
long | getBegin() |
long | getEnd() |
int | hashCode() |
protected void | loadCharStreamTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
protected void | saveCharStreamTransformerToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
void | setBegin(long beginIndex) Sets the beginning index (inclusive). |
void | setEnd(long endIndex) Sets the end index (exclusive). |
String | toString() |
protected void | transformTextDocument(String reference, Reader input, Writer output, ImporterMetadata metadata, boolean parsed) |
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
transformDocument
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
loadFromXML, saveToXML
public long getBegin()

public void setBegin(long beginIndex)
Parameters:
beginIndex - beginning index

public long getEnd()

public void setEnd(long endIndex)
Parameters:
endIndex - end index

protected void transformTextDocument(String reference, Reader input, Writer output, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
transformTextDocument in class AbstractCharStreamTransformer
Throws:
ImporterHandlerException

protected void saveCharStreamTransformerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractCharStreamTransformer
Saves configuration settings specific to the implementing class.
saveCharStreamTransformerToXML in class AbstractCharStreamTransformer
Parameters:
writer - the xml writer
Throws:
XMLStreamException - could not save to XML

protected void loadCharStreamTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractCharStreamTransformer
Loads configuration settings specific to the implementing class.
loadCharStreamTransformerFromXML in class AbstractCharStreamTransformer
Parameters:
xml - xml configuration
Throws:
IOException - could not load from XML

public int hashCode()
hashCode in class AbstractCharStreamTransformer

public boolean equals(Object other)
equals in class AbstractCharStreamTransformer

public String toString()
toString in class AbstractCharStreamTransformer

Copyright © 2009–2021 Norconex Inc. All rights reserved.