Class HierarchyTagger

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTagger

    public class HierarchyTagger
    extends AbstractDocumentTagger

    Given a separator, splits a field value into multiple segments, each representing a node of a hierarchical branch. This is useful for faceting, where you need to know how many documents fall under each node of a hierarchy. For example, take this hierarchical string:

       /vegetable/potato/sweet
     

    Specifying a slash (/) as the separator produces the following entries in the target document metadata field:

       /vegetable
       /vegetable/potato
       /vegetable/potato/sweet
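
    The expansion shown above can be sketched in plain Java. This is only an illustration of the idea, not the HierarchyTagger implementation: the class name HierarchyExpansionSketch and the method expand are hypothetical, the separator is treated as literal text, and an empty segment (such as the one before a leading separator) is assumed to produce no entry of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
class HierarchyExpansionSketch {

    // Returns one entry per non-empty node, each holding the branch from the root.
    static List<String> expand(String value, String separator) {
        List<String> nodes = new ArrayList<>();
        StringBuilder branch = new StringBuilder();
        // Quote the separator so it is matched literally; -1 keeps empty segments.
        String[] segments = value.split(Pattern.quote(separator), -1);
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                branch.append(separator);
            }
            branch.append(segments[i]);
            // Assumption: an empty segment yields no node of its own.
            if (!segments[i].isEmpty()) {
                nodes.add(branch.toString());
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Prints /vegetable, /vegetable/potato and /vegetable/potato/sweet, one per line.
        expand("/vegetable/potato/sweet", "/").forEach(System.out::println);
    }
}
```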
     

    If no target field (toField) is specified, the source field (fromField) is used to store the resulting values. Likewise, if no target separator (toSeparator) is specified, the source separator (fromSeparator) is used.

    You can set "keepEmptySegments" to retain empty segments, and set "regex" to indicate that "fromSeparator" is a regular expression. When a regular expression is used without a "toSeparator", the text matching the expression is kept as is and can therefore differ from one segment to the next.
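
    To illustrate the regular expression case, here is a minimal Java sketch, assuming the separator is found with java.util.regex. The class RegexHierarchySketch and its expand method are hypothetical and are not the tagger's actual code. A null toSeparator stands for "not specified", in which case the matched text is kept between segments. Empty segments are simply skipped here; "keepEmptySegments" is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
class RegexHierarchySketch {

    // toSeparator may be null, meaning the matched separator text is kept as is.
    static List<String> expand(
            String value, String fromSeparatorRegex, String toSeparator) {
        List<String> nodes = new ArrayList<>();
        StringBuilder branch = new StringBuilder();
        Matcher m = Pattern.compile(fromSeparatorRegex).matcher(value);
        int segmentStart = 0;
        while (m.find()) {
            // Emit a node only when there is text before this separator.
            if (m.start() > segmentStart) {
                branch.append(value, segmentStart, m.start());
                nodes.add(branch.toString());
            }
            // Keep the matched text unless a replacement separator is given.
            branch.append(toSeparator != null ? toSeparator : m.group());
            segmentStart = m.end();
        }
        // Remaining text after the last separator is the final node.
        if (segmentStart < value.length()) {
            branch.append(value, segmentStart, value.length());
            nodes.add(branch.toString());
        }
        return nodes;
    }
}
```

    With the made-up value "fruit-apple_red" and the expression "[-_]", this sketch yields "fruit", "fruit-apple" and "fruit-apple_red" when toSeparator is null, and "fruit", "fruit/apple" and "fruit/apple/red" when toSeparator is "/".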

    Storing values in an existing field

    If a target field with the same name already exists for a document, values will be added to the end of the existing value list. It is possible to change this default behavior by supplying a PropertySetter.
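
    The difference between the default append behavior and a replacing strategy can be pictured with the following sketch. It models document metadata as a plain Map and uses a hypothetical StoreMode enum as a stand-in; it does not use or reproduce the actual PropertySetter type, whose options should be checked in its own documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Norconex Importer API.
class FieldStoreSketch {

    // Stand-in for a store strategy; an assumption made for this example.
    enum StoreMode { APPEND, REPLACE }

    static void store(Map<String, List<String>> metadata, String field,
            List<String> values, StoreMode mode) {
        // Create the value list if the field does not exist yet.
        List<String> target =
                metadata.computeIfAbsent(field, k -> new ArrayList<>());
        if (mode == StoreMode.REPLACE) {
            // Discard existing values before storing the new ones.
            target.clear();
        }
        // APPEND (the default described above) adds to the end of the list.
        target.addAll(values);
    }
}
```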

    Can be used as either a pre-parse or a post-parse handler.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.tagger.impl.HierarchyTagger">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <!-- multiple hierarchy tags allowed -->
      <hierarchy
          fromField="(from field)"
          toField="(optional to field)"
          fromSeparator="(original separator)"
          toSeparator="(optional new separator)"
          regex="[false|true]"
          keepEmptySegments="[false|true]"/>
    </handler>

    XML usage example:

    
    <handler
        class="HierarchyTagger">
      <hierarchy
          fromField="vegetable"
          toField="vegetableHierarchy"
          fromSeparator="/"/>
    </handler>

    The above will expand a slash-separated vegetable hierarchy found in a "vegetable" field into a "vegetableHierarchy" field.

    Since:
    1.3.0
    Author:
    Pascal Essiembre