Class ReduceConsecutivesTransformer

All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTransformer

public class ReduceConsecutivesTransformer extends AbstractStringTransformer

Reduces specified consecutive characters or strings to a single instance (document content only). When reducing duplicate words, you usually have to add a space at the beginning or end of the word.

This class can be used as a pre-parsing (text content-types only) or post-parsing handler.

For more advanced replacement needs, consider using ReplaceTransformer instead.
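
The reduction described above can be illustrated with a small standalone sketch. It is not the class's actual implementation: the class name ReduceSketch and the method reduce are hypothetical, and the regex-based approach is an assumption chosen for brevity. It also shows why a duplicate word usually needs a leading or trailing space to be matched as a consecutive run.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
final class ReduceSketch {

    // Collapses each run of two or more back-to-back occurrences of a
    // reduction string into a single occurrence (the first one matched).
    static String reduce(String text, boolean ignoreCase, String... reductions) {
        int flags = ignoreCase
                ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                : 0;
        String result = text;
        for (String reduction : reductions) {
            if (reduction == null || reduction.isEmpty()) {
                continue; // nothing to reduce
            }
            // Pattern.quote treats the reduction as a literal string.
            String literal = Pattern.quote(reduction);
            // Group 1 is the first occurrence; the rest of the run follows it.
            Pattern run = Pattern.compile(
                    "(" + literal + ")(?:" + literal + ")+", flags);
            result = run.matcher(result).replaceAll("$1");
        }
        return result;
    }

    public static void main(String[] args) {
        // "hello" alone is never consecutive here: a space separates the words.
        System.out.println(reduce("hello hello world", false, "hello"));
        // With a trailing space, "hello hello " is a run of two occurrences.
        System.out.println(reduce("hello hello world", false, "hello "));
    }
}
```

In this sketch the first call leaves the text unchanged, while the second yields "hello world", which is the reason for the advice about adding a space when reducing duplicate words.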

XML configuration usage:


<handler
    class="com.norconex.importer.handler.transformer.impl.ReduceConsecutivesTransformer"
    ignoreCase="[false|true]"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <!-- multiple reduce tags allowed -->
  <reduce>(character or string to reduce)</reduce>
</handler>

In addition to regular characters, you can specify these special characters in your XML:

  • \r (carriage return)
  • \n (line feed)
  • \t (tab)
  • \s (space)
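
As a rough sketch of how the special characters listed above might be turned into the literal characters they stand for, consider the following. The class name ReduceTokenSketch and the method toLiteral are hypothetical, and the simple sequential replacement is an assumption, not necessarily what the transformer does when it loads its XML configuration.

```java
// Hypothetical illustration only; not part of the Norconex Importer API.
final class ReduceTokenSketch {

    // Replaces the two-character tokens \r, \n, \t and \s found in a
    // configured value with the actual characters they represent.
    // This simple version does not support escaping the backslash itself.
    static String toLiteral(String configured) {
        return configured
                .replace("\\r", "\r")  // carriage return
                .replace("\\n", "\n")  // line feed
                .replace("\\t", "\t")  // tab
                .replace("\\s", " ");  // space
    }

    public static void main(String[] args) {
        // The configured value is backslash + 's'; the result is one space.
        String literal = toLiteral("\\s");
        System.out.println("[" + literal + "]");
    }
}
```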

XML usage example:


<handler
    class="ReduceConsecutivesTransformer">
  <reduce>\s</reduce>
</handler>

The above example reduces consecutive spaces to a single space.

Since:
1.2.0
Author:
Pascal Essiembre
  • Constructor Details

    • ReduceConsecutivesTransformer

      public ReduceConsecutivesTransformer()
  • Method Details

    • transformStringContent

      protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
      Specified by:
      transformStringContent in class AbstractStringTransformer
    • getReductions

      public List<String> getReductions()
    • setReductions

      public void setReductions(String... reductions)
    • addReductions

      public void addReductions(String... reductions)
    • isCaseSensitive

      @Deprecated public boolean isCaseSensitive()
      Deprecated.
      Since 3.0.0, use isIgnoreCase().
      Gets whether character matching should be case sensitive or not.
      Returns:
      true if case sensitive.
    • setCaseSensitive

      @Deprecated public void setCaseSensitive(boolean caseSensitive)
      Deprecated.
      Since 3.0.0, use setIgnoreCase(boolean).
      Sets whether character matching should be case sensitive when matching characters or strings to reduce.
      Parameters:
      caseSensitive - true to consider character case
    • isIgnoreCase

      public boolean isIgnoreCase()
      Gets whether character case is ignored when matching.
      Returns:
      true if ignoring character case
      Since:
      3.0.0
    • setIgnoreCase

      public void setIgnoreCase(boolean ignoreCase)
      Sets whether character case is ignored when matching.
      Parameters:
      ignoreCase - true if ignoring character case
      Since:
      3.0.0
    • loadStringTransformerFromXML

      protected void loadStringTransformerFromXML(XML xml)
      Description copied from class: AbstractStringTransformer
      Loads configuration settings specific to the implementing class.
      Specified by:
      loadStringTransformerFromXML in class AbstractStringTransformer
      Parameters:
      xml - XML configuration
    • saveStringTransformerToXML

      protected void saveStringTransformerToXML(XML xml)
      Description copied from class: AbstractStringTransformer
      Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
      Specified by:
      saveStringTransformerToXML in class AbstractStringTransformer
      Parameters:
      xml - the XML
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class AbstractStringTransformer
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class AbstractStringTransformer
    • toString

      public String toString()
      Overrides:
      toString in class AbstractStringTransformer