Class ReduceConsecutivesTransformer
java.lang.Object
  com.norconex.importer.handler.AbstractImporterHandler
    com.norconex.importer.handler.transformer.AbstractDocumentTransformer
      com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
        com.norconex.importer.handler.transformer.AbstractStringTransformer
          com.norconex.importer.handler.transformer.impl.ReduceConsecutivesTransformer
All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTransformer
public class ReduceConsecutivesTransformer extends AbstractStringTransformer
Reduces specified consecutive characters or strings to only one instance (document content only). If reducing duplicate words, you usually have to add a space at the beginning or end of the word.
This class can be used as a pre-parsing (text content-types only) or post-parsing handler.
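The advice about adding a space when reducing duplicate words can be illustrated with a small standalone sketch. It does not use the transformer itself: the class name DuplicateWordSketch and the method collapse are hypothetical, and the regular expression is only an assumed approximation of "reduce consecutive instances to one", not the library's implementation.

```java
import java.util.regex.Pattern;

class DuplicateWordSketch {

    // Collapses back-to-back repeats of `token` into its first occurrence.
    // Hypothetical helper for illustration; not the library's code.
    static String collapse(String text, String token) {
        String q = Pattern.quote(token);
        return text.replaceAll("(" + q + ")(?:" + q + ")+", "$1");
    }

    public static void main(String[] args) {
        String text = "the the the cat";
        // "the" alone never repeats back-to-back: a space separates the copies.
        System.out.println(collapse(text, "the"));   // prints "the the the cat"
        // With a trailing space, "the " repeats consecutively and is collapsed.
        System.out.println(collapse(text, "the "));  // prints "the cat"
    }
}
```

The same reasoning suggests using a value such as "the " (with the space) in a reduce entry when the goal is to remove repeated words.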
For more advanced replacement needs, consider using ReplaceTransformer instead.
XML configuration usage:
<handler class="com.norconex.importer.handler.transformer.impl.ReduceConsecutivesTransformer"
    ignoreCase="[false|true]"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <!-- multiple reduce tags allowed -->
  <reduce>(character or string to strip)</reduce>
</handler>
In addition to regular characters, you can specify these special characters in your XML:
- \r (carriage return)
- \n (line feed)
- \t (tab)
- \s (space)
XML usage example:
<handler class="ReduceConsecutivesTransformer">
  <reduce>\s</reduce>
</handler>
The above example reduces multiple spaces into a single one.
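To make the reduction semantics concrete, here is a minimal standalone sketch of what reducing consecutive instances might look like, including the special characters listed above and the ignoreCase option. The class ReduceConsecutivesSketch and its reduce method are hypothetical. Keeping the first occurrence's casing when ignoring case is an assumption of this sketch, not documented behavior of the transformer.

```java
import java.util.regex.Pattern;

class ReduceConsecutivesSketch {

    // Collapses back-to-back repeats of each reduction into a single instance.
    // Hypothetical illustration; the real transformer may differ in details.
    static String reduce(String text, boolean ignoreCase, String... reductions) {
        int flags = ignoreCase
                ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                : 0;
        String result = text;
        for (String reduction : reductions) {
            // Resolve the special escapes described in the documentation.
            String literal = reduction
                    .replace("\\r", "\r")
                    .replace("\\n", "\n")
                    .replace("\\t", "\t")
                    .replace("\\s", " ");
            String q = Pattern.quote(literal);
            // Group 1 is the first occurrence; the rest of the run is dropped.
            Pattern p = Pattern.compile("(" + q + ")(?:" + q + ")+", flags);
            result = p.matcher(result).replaceAll("$1");
        }
        return result;
    }

    public static void main(String[] args) {
        // Equivalent in spirit to <reduce>\s</reduce>.
        System.out.println(reduce("too    many   spaces", false, "\\s"));
        // prints "too many spaces"

        // Case-insensitive reduction of a repeated word (note the trailing space).
        System.out.println(reduce("Ha hA ha!", true, "ha "));
        // prints "Ha ha!"
    }
}
```

Each reduction is applied in turn, so several reduce entries behave like successive passes over the same content.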
Since:
1.2.0
Author:
Pascal Essiembre
See Also:
ReplaceTransformer
Constructor Summary
Constructor                        Description
ReduceConsecutivesTransformer()
-
Method Summary
Modifier and Type    Method
                     Description
void                 addReductions(String... reductions)
boolean              equals(Object other)
List<String>         getReductions()
int                  hashCode()
boolean              isCaseSensitive()
                     Deprecated. Since 3.0.0, use isIgnoreCase().
boolean              isIgnoreCase()
                     Gets whether to ignore case sensitivity.
protected void       loadStringTransformerFromXML(XML xml)
                     Loads configuration settings specific to the implementing class.
protected void       saveStringTransformerToXML(XML xml)
                     Saves configuration settings specific to the implementing class.
void                 setCaseSensitive(boolean caseSensitive)
                     Deprecated. Since 3.0.0, use setIgnoreCase(boolean).
void                 setIgnoreCase(boolean ignoreCase)
                     Sets whether to ignore case sensitivity.
void                 setReductions(String... reductions)
String               toString()
protected void       transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractStringTransformer
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractDocumentTransformer
transformDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Method Detail
-
transformStringContent
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
Specified by:
transformStringContent in class AbstractStringTransformer
-
setReductions
public void setReductions(String... reductions)
-
addReductions
public void addReductions(String... reductions)
-
isCaseSensitive
@Deprecated
public boolean isCaseSensitive()
Deprecated. Since 3.0.0, use isIgnoreCase().
Gets whether character matching should be case sensitive or not.
Returns:
true if case sensitive.
-
setCaseSensitive
@Deprecated
public void setCaseSensitive(boolean caseSensitive)
Deprecated. Since 3.0.0, use setIgnoreCase(boolean).
Sets whether to consider character case when matching the characters or strings to reduce.
Parameters:
caseSensitive - true to consider character case
-
isIgnoreCase
public boolean isIgnoreCase()
Gets whether to ignore case sensitivity.
Returns:
true if ignoring character case
Since:
3.0.0
-
setIgnoreCase
public void setIgnoreCase(boolean ignoreCase)
Sets whether to ignore case sensitivity.
Parameters:
ignoreCase - true if ignoring character case
Since:
3.0.0
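For code that still calls the deprecated case-sensitivity accessors, the following standalone sketch shows one plausible relationship between the two flags. It assumes the deprecated methods are simply the logical inverse of the ignore-case setting, and that case-sensitive matching is the default. Neither point is confirmed by this page, and CaseFlagSketch is a hypothetical stand-in, not the transformer class.

```java
class CaseFlagSketch {

    // Assumed default: false, meaning matching is case sensitive.
    private boolean ignoreCase;

    boolean isIgnoreCase() {
        return ignoreCase;
    }

    void setIgnoreCase(boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
    }

    // Assumption: the legacy getter reports the inverse of ignoreCase.
    @Deprecated
    boolean isCaseSensitive() {
        return !ignoreCase;
    }

    // Assumption: the legacy setter stores the inverse of its argument.
    @Deprecated
    void setCaseSensitive(boolean caseSensitive) {
        this.ignoreCase = !caseSensitive;
    }

    public static void main(String[] args) {
        CaseFlagSketch flags = new CaseFlagSketch();
        flags.setCaseSensitive(false);            // legacy call
        System.out.println(flags.isIgnoreCase()); // prints "true"
    }
}
```

Under that assumption, migrating means replacing setCaseSensitive(x) with setIgnoreCase(!x), and isCaseSensitive() with !isIgnoreCase().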
-
loadStringTransformerFromXML
protected void loadStringTransformerFromXML(XML xml)
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
Specified by:
loadStringTransformerFromXML in class AbstractStringTransformer
Parameters:
xml - XML configuration
-
saveStringTransformerToXML
protected void saveStringTransformerToXML(XML xml)
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
Specified by:
saveStringTransformerToXML in class AbstractStringTransformer
Parameters:
xml - the XML
-
equals
public boolean equals(Object other)
Overrides:
equals in class AbstractStringTransformer
-
hashCode
public int hashCode()
Overrides:
hashCode in class AbstractStringTransformer
-
toString
public String toString()
Overrides:
toString in class AbstractStringTransformer