Class KeepOnlyTagger

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTagger

    public class KeepOnlyTagger
    extends AbstractDocumentTagger

    Keeps only the specified metadata fields and deletes all others. Fields to keep can be given as exact field names (case-insensitive) or as a regular expression that matches one or more fields (since 2.1.0).

    Note: Unless you have good reasons for doing otherwise, it is recommended to run this handler as one of the last ones. Doing so ensures that all metadata fields remain available to other handlers that may need them, even if those fields are not otherwise required.

    Can be used as either a pre-parse or a post-parse handler.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.tagger.impl.KeepOnlyTagger">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <fieldMatcher>(one or more matching fields to keep)</fieldMatcher>
    </handler>

    XML usage example:

    
    <handler
        class="KeepOnlyTagger">
      <fieldMatcher
          method="regex">
        (title|description)
      </fieldMatcher>
    </handler>

    The above example keeps only the title and description fields from all extracted fields.
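
    To make the effect of the example concrete, here is a minimal, self-contained sketch of the "keep only matching fields" idea using only the JDK. It is not the KeepOnlyTagger implementation: the class name KeepOnlySketch, the keepOnly helper, and the sample field names are hypothetical, and the sketch assumes the regular expression must match the whole field name (case-sensitively).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustration only: a hypothetical stand-in for the tagger's behavior,
// not code taken from the Norconex Importer library.
public class KeepOnlySketch {

    // Returns a copy of the metadata that holds only the fields whose
    // names fully match the given regular expression (assumption: whole-name,
    // case-sensitive matching). The input map is left untouched.
    public static Map<String, List<String>> keepOnly(
            Map<String, List<String>> metadata, String fieldRegex) {
        Pattern pattern = Pattern.compile(fieldRegex);
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                kept.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample metadata with made-up values.
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", List.of("Sample page"));
        metadata.put("description", List.of("A short summary"));
        metadata.put("Content-Type", List.of("text/html"));
        metadata.put("keywords", List.of("a", "b"));

        // Same expression as in the XML example above.
        System.out.println(keepOnly(metadata, "(title|description)").keySet());
        // prints [title, description]
    }
}
```

    In the sketch, Content-Type and keywords are dropped because their names do not match the expression, which mirrors what the XML example describes for title and description.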

    Author:
    Pascal Essiembre
    See Also:
    Pattern