Class ReduceConsecutivesTransformer
java.lang.Object
com.norconex.importer.handler.AbstractImporterHandler
com.norconex.importer.handler.transformer.AbstractDocumentTransformer
com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
com.norconex.importer.handler.transformer.AbstractStringTransformer
com.norconex.importer.handler.transformer.impl.ReduceConsecutivesTransformer
- All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTransformer
Reduces specified consecutive characters or strings to only one instance (document content only). When reducing duplicate words, you usually have to add a space at the beginning or end of the word.
This class can be used as a pre-parsing (text content-types only) or post-parsing handler.
For more advanced replacement needs, consider using
ReplaceTransformer instead.
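The reduction described above can be approximated with the JDK regex classes alone. The following is a minimal sketch, not this class's actual implementation: the `ReduceSketch` class and its `reduce` helper are hypothetical names, and keeping the first copy of each run is an assumption. It also shows why a duplicate word usually needs a leading or trailing space to be matched as a consecutive run.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
class ReduceSketch {

    // Collapses each run of two or more back-to-back copies of a
    // reduction string into a single copy.
    static String reduce(String content, String... reductions) {
        String result = content;
        for (String reduction : reductions) {
            if (reduction.isEmpty()) {
                continue; // nothing to match
            }
            // Quote the string so regex metacharacters are treated literally,
            // then match one copy followed by one or more repeats of it.
            String regex = "(" + Pattern.quote(reduction) + ")\\1+";
            result = Pattern.compile(regex).matcher(result).replaceAll("$1");
        }
        return result;
    }

    public static void main(String[] args) {
        // Runs of spaces collapse to a single space.
        System.out.println(reduce("a   b  c", " "));
        // "hello " (with trailing space) repeats back to back, so it is reduced.
        System.out.println(reduce("hello hello hello world", "hello "));
        // "hello" alone never repeats back to back (a space separates the
        // copies), so the text is left unchanged.
        System.out.println(reduce("hello hello hello world", "hello"));
    }
}
```

In this sketch the word is only reduced when the separating space is part of the reduction string, which matches the advice above about adding a space to the word.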
XML configuration usage:
<handler
class="com.norconex.importer.handler.transformer.impl.ReduceConsecutivesTransformer"
ignoreCase="[false|true]"
maxReadSize="(max characters to read at once)"
sourceCharset="(character encoding)">
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher>(field-matching expression)</fieldMatcher>
<valueMatcher>(value-matching expression)</valueMatcher>
</restrictTo>
<!-- multiple reduce tags allowed -->
<reduce>(character or string to reduce)</reduce>
</handler>
In addition to regular characters, you can specify these special characters in your XML:
- \r (carriage return)
- \n (line feed)
- \t (tab)
- \s (space)
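How the two-character tokens listed above map to actual characters can be sketched as follows. This is an illustration under assumptions: the `SpecialCharsSketch` class and its `resolve` helper are hypothetical, and the source does not say at which point the library performs this substitution.

```java
// Hypothetical illustration only; not part of the Norconex Importer API.
class SpecialCharsSketch {

    // Replaces the literal two-character tokens (backslash + letter),
    // as they would be typed in the XML, with the characters they name.
    static String resolve(String value) {
        return value
                .replace("\\s", " ")   // \s becomes a space
                .replace("\\t", "\t")  // \t becomes a tab
                .replace("\\n", "\n")  // \n becomes a line feed
                .replace("\\r", "\r"); // \r becomes a carriage return
    }

    public static void main(String[] args) {
        // The XML text \r\n becomes a carriage return followed by a line feed.
        String resolved = resolve("\\r\\n");
        System.out.println(resolved.length());
    }
}
```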
XML usage example:
<handler
class="ReduceConsecutivesTransformer">
<reduce>\s</reduce>
</handler>
The above example reduces multiple spaces into a single one.
- Since:
- 1.2.0
- Author:
- Pascal Essiembre
Constructor Summary
Constructors
ReduceConsecutivesTransformer()
Method Summary
Modifier and Type | Method | Description
void | addReductions(String... reductions) |
boolean | equals |
 | getReductions |
int | hashCode() |
boolean | isCaseSensitive | Deprecated. Since 3.0.0, use isIgnoreCase().
boolean | isIgnoreCase() | Gets whether to ignore case sensitivity.
protected void | loadStringTransformerFromXML | Loads configuration settings specific to the implementing class.
protected void | saveStringTransformerToXML | Saves configuration settings specific to the implementing class.
void | setCaseSensitive(boolean caseSensitive) | Deprecated. Since 3.0.0, use setIgnoreCase(boolean).
void | setIgnoreCase(boolean ignoreCase) | Sets whether to ignore case sensitivity.
void | setReductions(String... reductions) |
 | toString() |
protected void | transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
Methods inherited from class com.norconex.importer.handler.transformer.AbstractStringTransformer
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument
Methods inherited from class com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
Methods inherited from class com.norconex.importer.handler.transformer.AbstractDocumentTransformer
transformDocument
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Constructor Details
-
ReduceConsecutivesTransformer
public ReduceConsecutivesTransformer()
-
-
Method Details
-
transformStringContent
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
- Specified by:
transformStringContent in class AbstractStringTransformer
-
getReductions
-
setReductions
-
addReductions
-
isCaseSensitive
Deprecated. Since 3.0.0, use isIgnoreCase().
Gets whether character matching should be case sensitive or not.
- Returns:
true if case sensitive.
-
setCaseSensitive
Deprecated. Since 3.0.0, use setIgnoreCase(boolean).
Sets whether character case is considered when matching the characters or strings to reduce.
- Parameters:
caseSensitive - true to consider character case
-
isIgnoreCase
public boolean isIgnoreCase()
Gets whether to ignore case sensitivity.
- Returns:
true if ignoring character case
- Since:
- 3.0.0
-
setIgnoreCase
public void setIgnoreCase(boolean ignoreCase)
Sets whether to ignore case sensitivity.
- Parameters:
ignoreCase - true if ignoring character case
- Since:
- 3.0.0
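The effect of the ignoreCase setting can be illustrated with a small JDK-only sketch. The names `IgnoreCaseSketch` and `reduce` are hypothetical, the code is not the library's implementation, and which copy of a mixed-case run survives (here, the first one) is an assumption the source does not confirm.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
class IgnoreCaseSketch {

    // Reduces back-to-back copies of one string, optionally ignoring case.
    static String reduce(String content, String reduction, boolean ignoreCase) {
        int flags = ignoreCase
                ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                : 0;
        // With CASE_INSENSITIVE set, both the quoted literal and the
        // back-reference match regardless of character case.
        Pattern pattern = Pattern.compile(
                "(" + Pattern.quote(reduction) + ")\\1+", flags);
        // "$1" keeps the first copy of each run as it appeared in the text.
        return pattern.matcher(content).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "Hello hello HELLO world";
        // Ignoring case, the three differently cased copies form one run.
        System.out.println(reduce(text, "hello ", true));
        // Respecting case, no two adjacent copies are identical.
        System.out.println(reduce(text, "hello ", false));
    }
}
```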
-
loadStringTransformerFromXML
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
- Specified by:
loadStringTransformerFromXML in class AbstractStringTransformer
- Parameters:
xml - XML configuration
-
saveStringTransformerToXML
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Specified by:
saveStringTransformerToXML in class AbstractStringTransformer
- Parameters:
xml - the XML
-
equals
- Overrides:
equals in class AbstractStringTransformer
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractStringTransformer
-
toString
- Overrides:
toString in class AbstractStringTransformer