public class SubstringTransformer extends AbstractCharStreamTransformer implements IXMLConfigurable
Keeps the substring of the content that lies between a begin and an end character index. Useful when you have to truncate long content, or when you know precisely where the text to extract is located in a file.
The "begin" value is inclusive, while the "end" value is exclusive. Both are optional. When a value is not specified (or is negative), it defaults to the beginning or the end of the content, respectively.
This class can be used as a pre-parsing (text content-types only) or post-parsing handler.
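The index rules described above can be illustrated with a minimal sketch on a plain String. The class SubstringSketch and its substring helper are hypothetical names used only for illustration; they are not part of the Norconex API. Clamping an index that exceeds the content length, and returning an empty string when "end" precedes "begin", are assumptions rather than documented behavior. Indexes here count Java chars (UTF-16 code units).

```java
final class SubstringSketch {

    private SubstringSketch() {
    }

    // Hypothetical helper (not part of the Norconex API) mirroring the
    // documented rules: "begin" is inclusive, "end" is exclusive, and a
    // negative value means "not specified".
    static String substring(String content, long begin, long end) {
        int length = content.length();
        // Negative begin: start from the beginning of the content.
        // Assumption: a begin past the end is clamped to the length.
        long from = begin < 0 ? 0 : Math.min(begin, length);
        // Negative end: keep everything up to the end of the content.
        // Assumption: an end past the end is clamped to the length.
        long to = end < 0 ? length : Math.min(end, length);
        if (to < from) {
            // Assumption: an end before begin yields no content.
            return "";
        }
        return content.substring((int) from, (int) to);
    }

    public static void main(String[] args) {
        String text = "Hello, world";
        System.out.println(substring(text, 0, 5));   // Hello
        System.out.println(substring(text, 7, -1));  // world
        System.out.println(substring(text, -1, -1)); // Hello, world
    }
}
```

With both values left negative the content is returned unchanged, which matches the "both are optional" rule.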
<handler
    class="com.norconex.importer.handler.transformer.impl.SubstringTransformer"
    sourceCharset="(character encoding)"
    begin="(number)"
    end="(number)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
</handler>
<handler
    class="SubstringTransformer"
    end="10000"/>
The above example truncates long text to a maximum of 10,000 characters.
Constructor and Description |
---|
SubstringTransformer() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
long | getBegin() |
long | getEnd() |
int | hashCode() |
protected void | loadCharStreamTransformerFromXML(XML xml): Loads configuration settings specific to the implementing class. |
protected void | saveCharStreamTransformerToXML(XML xml): Saves configuration settings specific to the implementing class. |
void | setBegin(long beginIndex): Sets the beginning index (inclusive). |
void | setEnd(long endIndex): Sets the end index (exclusive). |
String | toString() |
protected void | transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState) |
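Methods inherited from class AbstractCharStreamTransformer: getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
Methods inherited from class AbstractDocumentTransformer: transformDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface IXMLConfigurable: loadFromXML, saveToXML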
public long getBegin()

public void setBegin(long beginIndex)
Sets the beginning index (inclusive).
Parameters: beginIndex - beginning index

public long getEnd()

public void setEnd(long endIndex)
Sets the end index (exclusive).
Parameters: endIndex - end index

protected void transformTextDocument(HandlerDoc doc, Reader input, Writer output, ParseState parseState) throws ImporterHandlerException
Specified by: transformTextDocument in class AbstractCharStreamTransformer
Throws: ImporterHandlerException

protected void saveCharStreamTransformerToXML(XML xml)
Description copied from class: AbstractCharStreamTransformer
Saves configuration settings specific to the implementing class.
Specified by: saveCharStreamTransformerToXML in class AbstractCharStreamTransformer
Parameters: xml - the XML

protected void loadCharStreamTransformerFromXML(XML xml)
Description copied from class: AbstractCharStreamTransformer
Loads configuration settings specific to the implementing class.
Specified by: loadCharStreamTransformerFromXML in class AbstractCharStreamTransformer
Parameters: xml - XML configuration

public boolean equals(Object other)
Overrides: equals in class AbstractCharStreamTransformer

public int hashCode()
Overrides: hashCode in class AbstractCharStreamTransformer

public String toString()
Overrides: toString in class AbstractCharStreamTransformer
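
transformTextDocument receives the content as a Reader and emits the result to a Writer. The following is a minimal sketch of how a character range could be copied between the two without buffering the whole content. It is not the library's actual implementation: StreamingSubstringSketch, copyRange, and apply are hypothetical names, and the sketch counts Java chars (UTF-16 code units) one at a time for clarity rather than speed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class StreamingSubstringSketch {

    private StreamingSubstringSketch() {
    }

    // Hypothetical helper: copies the chars in [begin, end) from input to
    // output. A negative begin means "from the start" and a negative end
    // means "until the input is exhausted".
    static void copyRange(Reader input, Writer output, long begin, long end)
            throws IOException {
        long from = Math.max(begin, 0);
        long position = 0;
        int c;
        // Stop reading as soon as the exclusive end index is reached.
        while ((end < 0 || position < end) && (c = input.read()) != -1) {
            if (position >= from) {
                output.write(c);
            }
            position++;
        }
    }

    // Convenience wrapper for trying the sketch on in-memory strings.
    static String apply(String content, long begin, long end) {
        StringWriter out = new StringWriter();
        try (Reader in = new StringReader(content)) {
            copyRange(in, out, begin, end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("0123456789", 2, 5));  // 234
        System.out.println(apply("0123456789", -1, 3)); // 012
    }
}
```

Because reading stops once the end index is reached, truncating a very long input to its first few thousand characters never touches the remainder of the stream.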
Copyright © 2009–2023 Norconex Inc. All rights reserved.