Class MergeTagger

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTagger

    public class MergeTagger
    extends AbstractDocumentTagger

    Merge multiple metadata fields into a single one.

    Use fromFields to list all fields to merge, separated by commas. Use fromFieldsRegex to match fields to merge using a regular expression. Both fromFields and fromFieldsRegex can be used together. Matching fields from both will be combined, in the order provided/matched, starting with fromFields entries.

    Unless singleValue is set to true, each value will be added to the target field, making it a multi-value field. If singleValue is set to true, all values will be combined into one string, optionally separated by the singleValueSeparator. If no separator is specified, the values are concatenated without one.
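
    As a hedged illustration of the two modes, the sketch that follows defines two merges over the same source fields. The field names (author, creator, authors, authorsText) are hypothetical, and the fieldMatcher syntax is taken from the configuration template on this page.

    ```xml
    <handler
        class="com.norconex.importer.handler.tagger.impl.MergeTagger">
      <!-- singleValue="false": every source value becomes its own entry
           in the (multi-value) target field. -->
      <merge
          toField="authors"
          deleteFromFields="false"
          singleValue="false">
        <fieldMatcher method="regex">(author|creator)</fieldMatcher>
      </merge>
      <!-- singleValue="true": all source values are joined into one
           string, using the given separator. -->
      <merge
          toField="authorsText"
          deleteFromFields="false"
          singleValue="true"
          singleValueSeparator="; ">
        <fieldMatcher method="regex">(author|creator)</fieldMatcher>
      </merge>
    </handler>
    ```

    Assuming a document holding author=Alice and creator=Bob, the first merge would be expected to leave "authors" with two separate values, while the second would leave "authorsText" with a single string in which the two values are joined by "; ".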

    You can optionally delete the source fields after they are merged by setting deleteFromFields to true.

    The target field can be one of the "from" fields. In that case, its content will be replaced with the result of the merge (it will not be deleted even if deleteFromFields is true).

    If only a single source field is specified or found, it will be copied to the target field, and its multiple values will still be merged into a single one if configured to do so. In such cases, this class can become an alternative to using ForceSingleValueTagger with a "mergeWith" action.
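
    A minimal sketch of that single-field case, assuming a hypothetical multi-value field named "keywords" that should be collapsed in place. Because the target is also the only source field, its content would be replaced by the merged string instead of being deleted.

    ```xml
    <handler
        class="com.norconex.importer.handler.tagger.impl.MergeTagger">
      <!-- One source field, merged onto itself: its multiple values are
           joined into a single comma-separated value. -->
      <merge
          toField="keywords"
          singleValue="true"
          singleValueSeparator=", ">
        <fieldMatcher method="regex">keywords</fieldMatcher>
      </merge>
    </handler>
    ```

    The attribute and element names follow the configuration template on this page; only the field name and the separator are illustrative choices.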

    Can be used as either a pre-parse or a post-parse handler.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.tagger.impl.MergeTagger">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <!-- multiple merge tags allowed -->
      <merge
          toField="(name of target field for merged values)"
          deleteFromFields="[false|true]"
          singleValue="[false|true]"
          singleValueSeparator="(text joining multiple-values)">
        <fieldMatcher>(one or more matching fields to merge)</fieldMatcher>
      </merge>
    </handler>

    XML usage example:

    
    <handler
        class="MergeTagger">
      <merge
          toField="title"
          deleteFromFields="true"
          singleValue="true"
          singleValueSeparator=",">
        <fieldMatcher
            method="regex">
          (title|dc.title|dc:title|doctitle)
        </fieldMatcher>
      </merge>
    </handler>

    The above example merges several title fields into one, joining multiple occurrences with a comma and deleting the original fields.

    Since:
    2.7.0
    Author:
    Pascal Essiembre