public class ReduceConsecutivesTransformer extends AbstractStringTransformer
Reduces specified consecutive characters or strings to a single instance (document content only). When reducing duplicate words, you usually have to add a space at the beginning or end of the word.
This class can be used as a pre-parsing (text content-types only) or post-parsing handler.
For more advanced replacement needs, consider using ReplaceTransformer instead.
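The note about duplicate words can be illustrated with a small stand-alone sketch. It is not the transformer's actual implementation: the class `ReduceSketch` and its `reduce` helper are hypothetical names that only approximate the described behavior with `java.util.regex`, to show why the added space matters.

```java
import java.util.regex.Pattern;

public class ReduceSketch {

    // Hypothetical helper (not part of the Norconex API): collapses
    // back-to-back repeats of "token" into a single occurrence.
    static String reduce(String text, String token, boolean ignoreCase) {
        String quoted = Pattern.quote(token);
        int flags = ignoreCase
                ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                : 0;
        // The first occurrence is captured; the repeats that follow are dropped.
        Pattern p = Pattern.compile(
                "(" + quoted + ")(?:" + quoted + ")+", flags);
        return p.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Without the space, the words are never strictly consecutive.
        System.out.println(reduce("the the the cat", "the", false));
        // With a trailing space, the repeated word is reduced.
        System.out.println(reduce("the the the cat", "the ", false));
        // Ignoring case keeps the first occurrence as written.
        System.out.println(reduce("The the cat", "the ", true));
    }
}
```

In this sketch the first call leaves the text unchanged because each "the" is separated from the next by a space, which is the situation the tip above addresses.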
<handler
    class="com.norconex.importer.handler.transformer.impl.ReduceConsecutivesTransformer"
    ignoreCase="[false|true]"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <!-- multiple reduce tags allowed -->
  <reduce>(character or string to strip)</reduce>
</handler>
In addition to regular characters, you can specify special characters in your XML, such as \s (space):
<handler
    class="ReduceConsecutivesTransformer">
  <reduce>\s</reduce>
</handler>
The above example reduces multiple spaces into a single one.
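As a rough illustration of the effect of that configuration on content, here is a minimal single-pass sketch. It assumes only what the example states, namely that `\s` stands for a space; the class `SpaceReduceSketch` and the helpers `resolve` and `reduce` are hypothetical and do not reflect the transformer's internals.

```java
public class SpaceReduceSketch {

    // Hypothetical helper: maps the "\s" notation to a literal space and
    // leaves any other configured value untouched.
    static String resolve(String configured) {
        return "\\s".equals(configured) ? " " : configured;
    }

    // Collapses runs of "token" into one occurrence in a single pass.
    static String reduce(String text, String token) {
        if (token.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith(token, i)) {
                out.append(token);
                // Skip the token and every immediate repeat of it.
                do {
                    i += token.length();
                } while (text.startsWith(token, i));
            } else {
                out.append(text.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "too many spaces"
        System.out.println(reduce("too    many   spaces", resolve("\\s")));
    }
}
```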
See Also: ReplaceTransformer
Constructor and Description |
---|
ReduceConsecutivesTransformer() |
Modifier and Type | Method and Description |
---|---|
void | addReductions(String... reductions) |
boolean | equals(Object other) |
List<String> | getReductions() |
int | hashCode() |
boolean | isCaseSensitive() Deprecated. Since 3.0.0, use isIgnoreCase(). |
boolean | isIgnoreCase() Gets whether to ignore case sensitivity. |
protected void | loadStringTransformerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveStringTransformerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setCaseSensitive(boolean caseSensitive) Deprecated. Since 3.0.0, use setIgnoreCase(boolean). |
void | setIgnoreCase(boolean ignoreCase) Sets whether to ignore case sensitivity. |
void | setReductions(String... reductions) |
String | toString() |
protected void | transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
transformDocument
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
Specified by: transformStringContent in class AbstractStringTransformer

public void setReductions(String... reductions)

public void addReductions(String... reductions)

@Deprecated public boolean isCaseSensitive()
Deprecated. Since 3.0.0, use isIgnoreCase().
Returns: true if case sensitive.

@Deprecated public void setCaseSensitive(boolean caseSensitive)
Deprecated. Since 3.0.0, use setIgnoreCase(boolean).
Parameters: caseSensitive - true to consider character case

public boolean isIgnoreCase()
Gets whether to ignore case sensitivity.
Returns: true if ignoring character case

public void setIgnoreCase(boolean ignoreCase)
Sets whether to ignore case sensitivity.
Parameters: ignoreCase - true if ignoring character case

protected void loadStringTransformerFromXML(XML xml)
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
Specified by: loadStringTransformerFromXML in class AbstractStringTransformer
Parameters: xml - XML configuration

protected void saveStringTransformerToXML(XML xml)
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class.
Specified by: saveStringTransformerToXML in class AbstractStringTransformer
Parameters: xml - the XML

public boolean equals(Object other)
Overrides: equals in class AbstractStringTransformer

public int hashCode()
Overrides: hashCode in class AbstractStringTransformer

public String toString()
Overrides: toString in class AbstractStringTransformer
Copyright © 2009–2023 Norconex Inc. All rights reserved.