Class ReduceConsecutivesTransformer

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTransformer

    public class ReduceConsecutivesTransformer
    extends AbstractStringTransformer

    Reduces consecutive occurrences of the specified characters or strings to a single instance (document content only). When reducing duplicate words, you usually have to include a space at the beginning or end of the word, since repeated words are normally separated by a space.
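    To illustrate the effect described above, here is a minimal sketch built on java.util.regex. The class ReduceSketch and its reduce helper are hypothetical names made up for this example and are not part of the Norconex API; the transformer's actual implementation may differ.

```java
import java.util.regex.Pattern;

class ReduceSketch {

    // Hypothetical helper (not the Norconex API): collapses back-to-back
    // repetitions of "token" in "text" into a single occurrence.
    static String reduce(String text, String token) {
        // Pattern.quote makes the token match literally; "(...)+" matches one
        // or more adjacent copies, and "$1" writes a single copy back.
        Pattern p = Pattern.compile("(" + Pattern.quote(token) + ")+");
        return p.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Consecutive spaces collapse to one space.
        System.out.println(reduce("a    b", " "));                       // a b
        // With a trailing space in the token, the repeated words are adjacent.
        System.out.println(reduce("hello hello hello world", "hello ")); // hello world
        // Without the space, the words are not adjacent and nothing changes.
        System.out.println(reduce("hello hello hello world", "hello"));  // hello hello hello world
    }
}
```

    The last two calls show why the space matters: "hello" alone never appears twice in a row, because a space always sits between the occurrences.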

    This class can be used as a pre-parsing (text content-types only) or post-parsing handler.

    For more advanced replacement needs, consider using ReplaceTransformer instead.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.transformer.impl.ReduceConsecutivesTransformer"
        ignoreCase="[false|true]"
        maxReadSize="(max characters to read at once)"
        sourceCharset="(character encoding)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <!-- multiple reduce tags allowed -->
      <reduce>(character or string to reduce)</reduce>
    </handler>

    In addition to regular characters, you can specify these special characters in your XML:

    • \r (carriage return)
    • \n (line feed)
    • \t (tab)
    • \s (space)
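
    As a rough illustration of what these escapes stand for, the sketch below maps each two-character sequence to the character it represents. SpecialCharSketch and resolve are hypothetical names used only for this example; how the transformer resolves these sequences internally is not shown on this page.

```java
class SpecialCharSketch {

    // Hypothetical helper (not the Norconex API): turns the two-character
    // escapes written in the XML into the characters they stand for.
    static String resolve(String value) {
        return value
                .replace("\\r", "\r")   // carriage return
                .replace("\\n", "\n")   // line feed
                .replace("\\t", "\t")   // tab
                .replace("\\s", " ");   // space
    }

    public static void main(String[] args) {
        System.out.println(resolve("\\s").equals(" "));       // true
        System.out.println(resolve("a\\tb").equals("a\tb"));  // true
    }
}
```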

    XML usage example:

    
    <handler
        class="ReduceConsecutivesTransformer">
      <reduce>\s</reduce>
    </handler>

    The above example reduces consecutive spaces to a single space.

    Since:
    1.2.0
    Author:
    Pascal Essiembre
    See Also:
    ReplaceTransformer
    • Constructor Detail

      • ReduceConsecutivesTransformer

        public ReduceConsecutivesTransformer()
    • Method Detail

      • getReductions

        public List<String> getReductions()
      • setReductions

        public void setReductions(String... reductions)
      • addReductions

        public void addReductions(String... reductions)
      • isCaseSensitive

        @Deprecated
        public boolean isCaseSensitive()
        Deprecated.
        Since 3.0.0, use isIgnoreCase().
        Gets whether character matching should be case sensitive or not.
        Returns:
        true if case sensitive.
      • setCaseSensitive

        @Deprecated
        public void setCaseSensitive(boolean caseSensitive)
        Deprecated.
        Since 3.0.0, use setIgnoreCase(boolean).
        Sets whether matching of the characters or strings to reduce should be case sensitive.
        Parameters:
        caseSensitive - true to consider character case
      • isIgnoreCase

        public boolean isIgnoreCase()
        Gets whether character case is ignored when matching.
        Returns:
        true if ignoring character case
        Since:
        3.0.0
      • setIgnoreCase

        public void setIgnoreCase(boolean ignoreCase)
        Sets whether character case is ignored when matching.
        Parameters:
        ignoreCase - true if ignoring character case
        Since:
        3.0.0
      • saveStringTransformerToXML

        protected void saveStringTransformerToXML(XML xml)
        Description copied from class: AbstractStringTransformer
        Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
        Specified by:
        saveStringTransformerToXML in class AbstractStringTransformer
        Parameters:
        xml - the XML
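
    The effect of the ignoreCase setting documented above (isIgnoreCase / setIgnoreCase) can be sketched with a case-insensitive regular expression. IgnoreCaseSketch and its reduce helper are hypothetical and not part of the Norconex API. Keeping the first occurrence of a run is a choice made for this sketch; which occurrence the transformer itself keeps is not stated on this page.

```java
import java.util.regex.Pattern;

class IgnoreCaseSketch {

    // Hypothetical helper (not the Norconex API): collapses adjacent
    // repetitions of "token", optionally ignoring character case.
    static String reduce(String text, String token, boolean ignoreCase) {
        String quoted = Pattern.quote(token);
        int flags = ignoreCase
                ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                : 0;
        // Group 1 captures the first copy; the non-capturing group consumes
        // any adjacent repeats, so "$1" keeps the first copy as written.
        Pattern p = Pattern.compile("(" + quoted + ")(?:" + quoted + ")*", flags);
        return p.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Ignoring case: all three variants count as repeats.
        System.out.println(reduce("Hello hello HELLO world", "hello ", true));  // Hello world
        // Case sensitive: only the lowercase copy matches, so nothing collapses.
        System.out.println(reduce("Hello hello HELLO world", "hello ", false)); // Hello hello HELLO world
    }
}
```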