Class StripBetweenTransformer

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTransformer

    public class StripBetweenTransformer
    extends AbstractStringTransformer
    implements IXMLConfigurable

    Strips any content found between matching start and end strings. The matching strings are defined in pairs, and multiple pairs can be specified at once.

    This class can be used as a pre-parsing handler (text content types only) or as a post-parsing handler.
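    As a rough illustration of the pairing behavior described above, the following standalone Java sketch strips text between each start match and the nearest end match that follows it, applying the pairs one after another. It is not the Norconex implementation. Treating the expressions as Java regular expressions (compiled with DOTALL), pairing each start with the nearest following end, and leaving an unmatched start delimiter untouched are all assumptions. `StripBetweenSketch`, `Delimiters` and `stripBetween` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripBetweenSketch {

    /** Hypothetical start/end pair; not a Norconex type. */
    static final class Delimiters {
        final Pattern start;
        final Pattern end;
        final boolean inclusive;

        Delimiters(String startRegex, String endRegex, boolean inclusive) {
            // Assumption: expressions are Java regexes; DOTALL lets "." span line breaks.
            this.start = Pattern.compile(startRegex, Pattern.DOTALL);
            this.end = Pattern.compile(endRegex, Pattern.DOTALL);
            this.inclusive = inclusive;
        }
    }

    /** Applies each delimiter pair in turn, feeding the result of one into the next. */
    static String stripBetween(String text, List<Delimiters> pairs) {
        String result = text;
        for (Delimiters d : pairs) {
            result = stripPair(result, d);
        }
        return result;
    }

    /** Removes the content between each start match and the nearest end match after it. */
    private static String stripPair(String text, Delimiters d) {
        StringBuilder out = new StringBuilder();
        Matcher startMatch = d.start.matcher(text);
        Matcher endMatch = d.end.matcher(text);
        int copyFrom = 0;   // first character not yet copied to the output
        int searchFrom = 0; // where to look for the next start delimiter
        while (searchFrom <= text.length() && startMatch.find(searchFrom)) {
            if (!endMatch.find(startMatch.end())) {
                break; // assumption: an unmatched start delimiter is left as-is
            }
            // inclusive=true also removes the delimiters; false keeps them.
            int cutStart = d.inclusive ? startMatch.start() : startMatch.end();
            int cutEnd = d.inclusive ? endMatch.end() : endMatch.start();
            out.append(text, copyFrom, cutStart);
            copyFrom = cutEnd;
            // Resume after the end delimiter; the +1 guards against empty matches looping.
            searchFrom = Math.max(endMatch.end(), startMatch.start() + 1);
        }
        out.append(text, copyFrom, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "a[x]b{y}c[z]d";
        String output = stripBetween(input, List.of(
                new Delimiters("\\[", "\\]", true),    // strip brackets and their content
                new Delimiters("\\{", "\\}", false))); // strip content, keep the braces
        System.out.println(output); // prints ab{}cd
    }
}
```

    Applying the pairs sequentially keeps each pair independent of the others; whether the real transformer does the same is not stated here.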

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.transformer.impl.StripBetweenTransformer"
        maxReadSize="(max characters to read at once)"
        sourceCharset="(character encoding)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <!-- multiple stripBetween tags allowed -->
      <stripBetween
          inclusive="[false|true]">
        <startMatcher>(expression matching "left" delimiter)</startMatcher>
        <endMatcher>(expression matching "right" delimiter)</endMatcher>
      </stripBetween>
    </handler>
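    The restrictTo rule in the configuration above (several tags allowed, only one needs to match) can be sketched in plain Java as follows. This is a hypothetical illustration, not Norconex code. It assumes both matchers are full-match Java regular expressions, that document metadata is a map of field names to value lists, and that a handler without any restrictTo applies to every document. `RestrictToSketch`, `Restriction`, `isApplicable` and the `document.contentType` field name are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RestrictToSketch {

    /** Hypothetical field/value matcher pair mirroring one restrictTo tag. */
    static final class Restriction {
        final Pattern field;
        final Pattern value;

        Restriction(String fieldRegex, String valueRegex) {
            // Assumption: both expressions are Java regexes that must match fully.
            this.field = Pattern.compile(fieldRegex);
            this.value = Pattern.compile(valueRegex);
        }
    }

    /**
     * Returns true when the handler should run: either no restrictions exist,
     * or at least one restriction finds a matching field with a matching value.
     */
    static boolean isApplicable(Map<String, List<String>> metadata,
                                List<Restriction> restrictions) {
        if (restrictions.isEmpty()) {
            return true; // assumption: no restrictTo means "apply to every document"
        }
        for (Restriction r : restrictions) {
            for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
                if (!r.field.matcher(entry.getKey()).matches()) {
                    continue; // field name does not match this restriction
                }
                for (String v : entry.getValue()) {
                    if (r.value.matcher(v).matches()) {
                        return true; // one matching restriction is enough
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta =
                Map.of("document.contentType", List.of("text/html"));
        List<Restriction> rules = List.of(
                new Restriction("document\\.contentType", "application/pdf"),
                new Restriction("document\\.contentType", "text/.*"));
        System.out.println(isApplicable(meta, rules)); // prints true
    }
}
```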

    XML usage example:

    
    <handler
        class="StripBetweenTransformer">
      <stripBetween
          inclusive="true">
        <startMatcher>
          <![CDATA[<!-- SIDENAV_START -->]]>
        </startMatcher>
        <endMatcher>
          <![CDATA[<!-- SIDENAV_END -->]]>
        </endMatcher>
      </stripBetween>
    </handler>

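    The above example strips all text between (and including) these two HTML comments: <!-- SIDENAV_START --> and <!-- SIDENAV_END -->. Because inclusive is set to "true", the comments themselves are removed along with the text between them.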
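    To make the effect of the inclusive flag concrete, this hypothetical sketch treats the two comments from the example as literal strings and strips them with and without inclusive. It is not the Norconex implementation; `SideNavExample` and `stripLiteral` are illustrative names, and literal (non-regex) matching is an assumption made to keep the sketch short.

```java
public class SideNavExample {

    // Delimiters taken from the XML usage example.
    static final String START = "<!-- SIDENAV_START -->";
    static final String END = "<!-- SIDENAV_END -->";

    /**
     * Hypothetical helper: strips text between literal start/end strings.
     * inclusive=true removes the delimiters as well; false keeps them.
     */
    static String stripLiteral(String text, String start, String end, boolean inclusive) {
        if (start.isEmpty() || end.isEmpty()) {
            throw new IllegalArgumentException("delimiters must not be empty");
        }
        StringBuilder out = new StringBuilder();
        int copyFrom = 0;   // first character not yet copied to the output
        int searchFrom = 0; // where to look for the next start delimiter
        while (true) {
            int s = text.indexOf(start, searchFrom);
            if (s < 0) {
                break; // no more start delimiters
            }
            int e = text.indexOf(end, s + start.length());
            if (e < 0) {
                break; // no closing delimiter: keep the remainder unchanged
            }
            int cutStart = inclusive ? s : s + start.length();
            int cutEnd = inclusive ? e + end.length() : e;
            out.append(text, copyFrom, cutStart);
            copyFrom = cutEnd;
            searchFrom = e + end.length();
        }
        out.append(text, copyFrom, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Intro</p>" + START + "<ul><li>Menu</li></ul>" + END + "<p>Body</p>";
        // prints <p>Intro</p><p>Body</p>
        System.out.println(stripLiteral(html, START, END, true));
        // prints <p>Intro</p><!-- SIDENAV_START --><!-- SIDENAV_END --><p>Body</p>
        System.out.println(stripLiteral(html, START, END, false));
    }
}
```

    Only the inclusive variant removes the comments together with the navigation markup; the non-inclusive variant leaves an empty comment pair behind.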

    Author:
    Pascal Essiembre