Class SubstringTransformer
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.transformer.AbstractDocumentTransformer
      - com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
        - com.norconex.importer.handler.transformer.impl.SubstringTransformer
- All Implemented Interfaces:
  IXMLConfigurable, IImporterHandler, IDocumentTransformer
public class SubstringTransformer extends AbstractCharStreamTransformer implements IXMLConfigurable
Keeps the substring of the content located between a begin and an end character index. Useful when you have to truncate long content, or when you know precisely where the text to extract is located in some files.
The "begin" value is inclusive, while the "end" value is exclusive. Both are optional. When an index is not specified (or is negative), it defaults to the beginning or the end of the content, respectively.
This class can be used as a pre-parsing (text content-types only) or post-parsing handler.
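The index rules described above can be illustrated with a small standalone sketch. It does not use the Norconex API: the class name SubstringSemanticsSketch and the helper keep are hypothetical, and two behaviors are assumptions not stated in this documentation, namely that an "end" beyond the content length is clamped to the content length and that a "begin" at or past "end" yields empty content.

```java
public class SubstringSemanticsSketch {

    // Hypothetical helper (not part of the Norconex API) mimicking the
    // documented rules: "begin" is inclusive, "end" is exclusive, and a
    // negative value means "start of content" or "end of content".
    static String keep(String content, long begin, long end) {
        // A negative begin is treated the same as zero.
        long from = Math.max(begin, 0);
        // A negative end is treated as the content end.
        // Assumption: an end past the content length is clamped.
        long to = end < 0 ? content.length() : Math.min(end, content.length());
        // Assumption: an empty or inverted range yields empty content.
        if (from >= to) {
            return "";
        }
        return content.substring((int) from, (int) to);
    }

    public static void main(String[] args) {
        String text = "Hello, world";
        System.out.println(keep(text, 0, 5));   // characters 0 to 4
        System.out.println(keep(text, 7, -1));  // character 7 to the end
        System.out.println(keep(text, -1, -1)); // whole content
    }
}
```

With both values left negative the content is returned unchanged, which corresponds to configuring the handler without "begin" and "end" attributes.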
XML configuration usage:
<handler class="com.norconex.importer.handler.transformer.impl.SubstringTransformer"
    sourceCharset="(character encoding)"
    begin="(number)"
    end="(number)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
</handler>
XML usage example:
<handler class="SubstringTransformer" end="10000"/>
The above example truncates long text to be 10,000 characters maximum.
- Since: 2.7.0
- Author: Pascal Essiembre
-
Constructor Summary
Constructor             Description
SubstringTransformer()
-
Method Summary
Modifier and Type  Method                                        Description
boolean            equals(Object other)
long               getBegin()
long               getEnd()
int                hashCode()
protected void     loadCharStreamTransformerFromXML(XML xml)     Loads configuration settings specific to the implementing class.
protected void     saveCharStreamTransformerToXML(XML xml)       Saves configuration settings specific to the implementing class.
void               setBegin(long beginIndex)                     Sets the beginning index (inclusive).
void               setEnd(long endIndex)                         Sets the end index (exclusive).
String             toString()
protected void     transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState)
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractDocumentTransformer
transformDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface com.norconex.commons.lang.xml.IXMLConfigurable
loadFromXML, saveToXML
-
Method Detail
-
getBegin
public long getBegin()
-
setBegin
public void setBegin(long beginIndex)
Sets the beginning index (inclusive). A negative value is treated the same as zero.
- Parameters:
beginIndex - beginning index
-
getEnd
public long getEnd()
-
setEnd
public void setEnd(long endIndex)
Sets the end index (exclusive). A negative value is treated as the content end.
- Parameters:
endIndex - end index
-
transformTextDocument
protected void transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState) throws ImporterHandlerException
- Specified by:
transformTextDocument in class AbstractCharStreamTransformer
- Throws:
ImporterHandlerException
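Because this method receives a Reader and a Writer rather than a full string, the substring has to be taken from a character stream. The following is a minimal sketch of one way to do that with plain java.io classes. It is not the actual implementation: the class StreamSubstringSketch and its methods copyRange and apply are hypothetical, and the HandlerDoc and ParseState arguments are left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class StreamSubstringSketch {

    // Hypothetical helper: copies the characters in [begin, end) from
    // input to output. Negative values follow the documented rules.
    static void copyRange(Reader input, Writer output, long begin, long end)
            throws IOException {
        long from = Math.max(begin, 0); // negative begin behaves like zero
        long pos = 0;                   // index of the next character read
        // A negative end means "read until the stream is exhausted".
        while (end < 0 || pos < end) {
            int c = input.read();
            if (c == -1) {
                break;                  // content shorter than the end index
            }
            if (pos >= from) {
                output.write(c);        // inside the range: keep the character
            }
            pos++;
        }
    }

    // Convenience wrapper around copyRange for in-memory strings.
    static String apply(String content, long begin, long end) {
        StringWriter out = new StringWriter();
        try {
            copyRange(new StringReader(content), out, begin, end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Truncate to the first five characters, as end="5" would.
        System.out.println(apply("Hello, world", -1, 5));
    }
}
```

Reading stops as soon as the end index is reached, so the rest of a long document is never consumed. For large inputs, wrapping the Reader in a BufferedReader would avoid per-character I/O overhead.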
-
saveCharStreamTransformerToXML
protected void saveCharStreamTransformerToXML(XML xml)
Description copied from class: AbstractCharStreamTransformer
Saves configuration settings specific to the implementing class. The parent tag and its "class" attribute are already written. Implementors must not close the writer.
- Specified by:
saveCharStreamTransformerToXML in class AbstractCharStreamTransformer
- Parameters:
xml - the XML
-
loadCharStreamTransformerFromXML
protected void loadCharStreamTransformerFromXML(XML xml)
Description copied from class: AbstractCharStreamTransformer
Loads configuration settings specific to the implementing class.
- Specified by:
loadCharStreamTransformerFromXML in class AbstractCharStreamTransformer
- Parameters:
xml - XML configuration
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractCharStreamTransformer
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractCharStreamTransformer
-
toString
public String toString()
- Overrides:
toString in class AbstractCharStreamTransformer
-
-