Class SubstringTransformer

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTransformer

    public class SubstringTransformer
    extends AbstractCharStreamTransformer
    implements IXMLConfigurable

    Keeps the substring of the content found between a begin and an end character index. Useful when you have to truncate long content, or when you know precisely where the text to extract is located in some files.

    The "begin" value is inclusive, while the "end" value is exclusive. Both are optional. When a value is not specified (or is negative), it defaults to the beginning or the end of the content, respectively.
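
    The index rules above can be illustrated with a minimal, standalone sketch. It is not this class's implementation: SubstringSketch and keep are hypothetical names, and clamping out-of-range indexes and returning an empty string for an empty range are assumptions that this documentation does not confirm.

```java
public final class SubstringSketch {

    private SubstringSketch() {
    }

    // Hypothetical helper that mimics the documented "begin"/"end" semantics.
    public static String keep(String content, long begin, long end) {
        int length = content.length();
        // A negative "begin" is treated as the beginning of the content.
        // Assumption: a value past the content length is clamped to it.
        long from = begin < 0 ? 0 : Math.min(begin, length);
        // A negative "end" is treated as the end of the content.
        // Assumption: a value past the content length is clamped to it.
        long to = end < 0 ? length : Math.min(end, length);
        // Assumption: an empty or inverted range yields an empty string.
        if (from >= to) {
            return "";
        }
        // "from" is inclusive and "to" is exclusive, as with String.substring.
        return content.substring((int) from, (int) to);
    }

    public static void main(String[] args) {
        String text = "0123456789";
        System.out.println(keep(text, 2, 5));   // 234
        System.out.println(keep(text, -1, 4));  // 0123
        System.out.println(keep(text, 7, -1));  // 789
        System.out.println(keep(text, -1, -1)); // 0123456789
    }
}
```

    With begin="2" and end="5", the characters at indexes 2, 3, and 4 are kept; the character at index 5 is excluded.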

    This class can be used as a pre-parsing handler (text content types only) or as a post-parsing handler.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.transformer.impl.SubstringTransformer"
        sourceCharset="(character encoding)"
        begin="(number)"
        end="(number)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
    </handler>

    XML usage example:

    
    <handler
        class="SubstringTransformer"
        end="10000"/>

    The above example truncates long text to a maximum of 10,000 characters.

    Since:
    2.7.0
    Author:
    Pascal Essiembre