Class EmptyFilter

  • All Implemented Interfaces:
    IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler

    public class EmptyFilter
    extends AbstractDocumentFilter

    Accepts or rejects a document based on whether its content (default) or any of the specified metadata fields is empty. For metadata fields, control characters and spaces (char <= 32) are removed before their values are evaluated for emptiness.

    Filtering on multiple fields:

    When your field matcher expression matches more than one field, only one of the matched fields needs to be empty to trigger a match. If no fields are matched at all, the result is also considered empty. Fields you expect to be present but that are absent are not evaluated, and are therefore not considered empty. To make sure multiple fields are each tested properly, use multiple instances of EmptyFilter, with a field matcher that matches only one field in each.
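
    The one-field-per-instance advice above can be sketched as two handlers, each testing a single field. This is only an illustration: the field names "title" and "description" are hypothetical, and the sketch assumes the field matcher's default method treats the expression as a literal field name. The handlers go wherever your importer configuration already declares its handlers.

    ```xml
    <!-- Hypothetical field names; each instance tests exactly one field. -->
    <handler
        class="com.norconex.importer.handler.filter.impl.EmptyFilter"
        onMatch="exclude">
      <fieldMatcher>title</fieldMatcher>
    </handler>
    <handler
        class="com.norconex.importer.handler.filter.impl.EmptyFilter"
        onMatch="exclude">
      <fieldMatcher>description</fieldMatcher>
    </handler>
    ```

    With this layout a document should be rejected when either field is empty or absent, because each instance evaluates its own field independently instead of stopping at whichever fields a combined expression happens to match.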

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.filter.impl.EmptyFilter"
        onMatch="[include|exclude]">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <fieldMatcher>
        (optional expression matching fields we want to test for emptiness)
      </fieldMatcher>
    </handler>

    XML usage example:

    
    <handler
        class="EmptyFilter"
        onMatch="exclude">
      <fieldMatcher
          method="regex">
        (title|dc:title)
      </fieldMatcher>
    </handler>

    The above example excludes documents without titles.
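
    For the default content-based check, a minimal sketch is to omit the optional fieldMatcher altogether. This assumes, per the class description, that the filter then evaluates the document content instead of metadata fields.

    ```xml
    <!-- No fieldMatcher: the document content itself is tested for emptiness. -->
    <handler
        class="com.norconex.importer.handler.filter.impl.EmptyFilter"
        onMatch="exclude"/>
    ```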

    Since:
    3.0.0
    Author:
    Pascal Essiembre