Class TextFilter

  • All Implemented Interfaces:
    IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler

    public class TextFilter
    extends AbstractStringFilter

    Filters a document based on a text pattern found in the document content (the default) or in the specified fields. On very large content, pattern matching may be performed in chunks, so results are sometimes not as expected (for instance, a match that spans two chunks can be missed). If this is a concern, consider using AbstractCharStreamFilter. Refer to AbstractDocumentFilter for the inclusion/exclusion logic.
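    The chunking caveat can be illustrated with a minimal, self-contained Java sketch. It is not the TextFilter implementation: the class ChunkedMatchDemo and its methods are hypothetical, and the 15-character chunk size is an arbitrary stand-in for maxReadSize, chosen only to make the chunk-boundary problem visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the Norconex Importer API.
public class ChunkedMatchDemo {

    // Splits content into consecutive chunks of at most maxReadSize characters,
    // mimicking a filter that reads large content piece by piece.
    static List<String> chunk(String content, int maxReadSize) {
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < content.length(); i += maxReadSize) {
            chunks.add(content.substring(i, Math.min(content.length(), i + maxReadSize)));
        }
        return chunks;
    }

    // Returns true if the pattern is found within at least one chunk.
    // A match that straddles a chunk boundary is never seen.
    static boolean matchesAnyChunk(String content, Pattern pattern, int maxReadSize) {
        for (String c : chunk(content, maxReadSize)) {
            if (pattern.matcher(c).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String content = "I like green apples.";
        Pattern pattern = Pattern.compile("apple");
        // Searching the whole content finds "apple".
        System.out.println(pattern.matcher(content).find());
        // With 15-character chunks the text becomes "I like green ap" | "ples.",
        // so neither chunk contains "apple".
        System.out.println(matchesAnyChunk(content, pattern, 15));
    }
}
```

    Running main prints true for the whole-content search and false for the chunked search, because the word is split across the chunk boundary.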

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.filter.impl.TextFilter"
        maxReadSize="(max characters to read at once)"
        sourceCharset="(character encoding)"
        onMatch="[include|exclude]">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <fieldMatcher>
        (Optional expression of field to match. Omit to use document content.)
      </fieldMatcher>
      <valueMatcher>(expression of value to match)</valueMatcher>
    </handler>
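    The comment on restrictTo states that only one entry needs to match, which is OR logic across entries. The following Java sketch shows that logic under stated assumptions: metadata is modelled as a map of field names to value lists, both matchers are treated as full Java regular expressions, and a handler with no restrictTo entries applies to every document. The RestrictToDemo class and its members are hypothetical and do not reflect the Importer's actual API.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the Norconex Importer API.
public class RestrictToDemo {

    // One hypothetical <restrictTo> entry: a field-name pattern and a value pattern.
    record RestrictTo(Pattern field, Pattern value) {}

    // Returns true if the handler should apply: either no entries are configured,
    // or at least one entry matches some field name and one of its values.
    static boolean isApplicable(Map<String, List<String>> metadata, List<RestrictTo> rules) {
        if (rules.isEmpty()) {
            return true; // assumption: no restriction means "applies to all"
        }
        for (RestrictTo rule : rules) {
            for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
                if (!rule.field().matcher(e.getKey()).matches()) {
                    continue;
                }
                for (String v : e.getValue()) {
                    if (rule.value().matcher(v).matches()) {
                        return true; // one matching entry is enough
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = Map.of(
                "contentType", List.of("text/html"),
                "title", List.of("Apple pie"));
        List<RestrictTo> rules = List.of(
                new RestrictTo(Pattern.compile("contentType"), Pattern.compile("application/pdf")),
                new RestrictTo(Pattern.compile("title"), Pattern.compile(".*pie.*")));
        // The first entry does not match, the second does, so the handler applies.
        System.out.println(isApplicable(metadata, rules));
    }
}
```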

    XML usage example:

    
    <handler
        class="TextFilter"
        onMatch="include">
      <valueMatcher>apple</valueMatcher>
    </handler>

    Since:
    3.0.0
    Author:
    Pascal Essiembre