Class TruncateTagger

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTagger

    public class TruncateTagger
    extends AbstractDocumentTagger

    Truncates the value(s) of matching fields and optionally replaces the truncated portion with a hash value to help ensure uniqueness (collisions remain possible). If a field to truncate has multiple values, every value is subject to truncation. The resulting value(s), truncated or not, can be stored in another target field.

    Storing values in an existing field

    If a target field with the same name already exists for a document, values are added to the end of the existing value list. This default behavior can be changed by supplying a PropertySetter (see setOnSet(PropertySetter)).
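
    The default append behavior, and a replace alternative of the kind a PropertySetter might select, can be pictured with plain collections. This is only a sketch: the SetMode enum and the store method are hypothetical stand-ins and are not part of the Norconex API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TargetFieldSketch {

    // Hypothetical stand-in for two possible behaviors; NOT the PropertySetter API.
    enum SetMode { APPEND, REPLACE }

    // Stores values in the target field, either after existing values or instead of them.
    static void store(Map<String, List<String>> fields, String toField,
            List<String> values, SetMode mode) {
        List<String> target = fields.computeIfAbsent(toField, k -> new ArrayList<>());
        if (mode == SetMode.REPLACE) {
            target.clear();
        }
        target.addAll(values);
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = new HashMap<>();
        fields.put("title", new ArrayList<>(List.of("existing")));

        store(fields, "title", List.of("truncated"), SetMode.APPEND);
        System.out.println(fields.get("title")); // [existing, truncated]

        store(fields, "title", List.of("truncated"), SetMode.REPLACE);
        System.out.println(fields.get("title")); // [truncated]
    }
}
```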

    The maxLength is guaranteed to be respected. This means any appended hash code and suffix will fit within the maxLength.

    Can be used as either a pre-parse or a post-parse handler.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.tagger.impl.TruncateTagger"
        maxLength="(maximum length)"
        toField="(optional target field where to store the truncated value)"
        appendHash="[false|true]"
        suffix="(value to append after truncation; placed before the hash, if any)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <fieldMatcher>
        (one or more matching fields to have their values truncated)
      </fieldMatcher>
    </handler>

    XML usage example:

    
    <handler
        class="TruncateTagger"
        maxLength="50"
        appendHash="true"
        suffix="!">
      <fieldMatcher>myField</fieldMatcher>
    </handler>

    Assuming this "myField" value...

        Please truncate me before you start thinking I am too long.

    ...the above example will truncate it to...

        Please truncate me before you start thi!0996700004
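
    A rough sketch of how the maxLength guarantee in the example above might be met: room for the suffix and a fixed-width hash is reserved before the value is cut. The class and method names, the 10-digit zero-padded hash width (inferred from the sample output), and the use of String.hashCode() on the removed portion are all assumptions, so the hash digits produced here will not necessarily match the ones shown above.

```java
import java.util.Locale;

class TruncateSketch {

    // Assumed width of the zero-padded hash (the sample output shows 10 digits).
    static final int HASH_WIDTH = 10;

    static String truncate(String value, int maxLength, boolean appendHash, String suffix) {
        if (value == null || value.length() <= maxLength) {
            return value; // short enough already: returned unchanged
        }
        String tail = suffix == null ? "" : suffix;
        // Reserve room for the suffix and, if requested, the hash.
        int reserved = tail.length() + (appendHash ? HASH_WIDTH : 0);
        if (reserved > maxLength) {
            throw new IllegalArgumentException("maxLength too small for suffix and hash");
        }
        int cut = maxLength - reserved;
        StringBuilder out = new StringBuilder(value.substring(0, cut)).append(tail);
        if (appendHash) {
            // Hash only the removed portion; widen to long so abs() cannot overflow.
            long hash = Math.abs((long) value.substring(cut).hashCode());
            out.append(String.format(Locale.ROOT, "%010d", hash));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Please truncate me before you start thinking I am too long.";
        String result = truncate(text, 50, true, "!");
        System.out.println(result + " (" + result.length() + " chars)");
    }
}
```

    With these assumptions, the result is always exactly maxLength characters long whenever truncation occurs, and the kept prefix shrinks as the suffix grows.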
    Since:
    2.8.0
    Author:
    Pascal Essiembre
    • Constructor Detail

      • TruncateTagger

        public TruncateTagger()
      • TruncateTagger

        public TruncateTagger(TextMatcher fieldMatcher,
                              int maxLength)
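        Creates a tagger that truncates the values of matching fields to the given maximum length.
        Parameters:
        fieldMatcher - matcher for the fields to truncate
        maxLength - maximum length of a value after truncation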
        Since:
        3.0.0
    • Method Detail

      • getToField

        public String getToField()
      • setToField

        public void setToField(String keepToField)
      • isOverwrite

        @Deprecated
        public boolean isOverwrite()
        Deprecated.
        Since 3.0.0, use getOnSet().
        Gets whether existing values for the same field should be overwritten.
        Returns:
        true if overwriting existing value.
      • setOverwrite

        @Deprecated
        public void setOverwrite(boolean overwrite)
        Deprecated.
        Since 3.0.0, use setOnSet(PropertySetter).
        Sets whether existing values for the same field should be overwritten.
        Parameters:
        overwrite - true if overwriting existing value.
      • getOnSet

        public PropertySetter getOnSet()
        Gets the property setter to use when a value is set.
        Returns:
        property setter
        Since:
        3.0.0
      • setOnSet

        public void setOnSet(PropertySetter onSet)
        Sets the property setter to use when a value is set.
        Parameters:
        onSet - property setter
        Since:
        3.0.0
      • isAppendHash

        public boolean isAppendHash()
      • setAppendHash

        public void setAppendHash(boolean appendHash)
      • getSuffix

        public String getSuffix()
      • setSuffix

        public void setSuffix(String suffix)
      • getMaxLength

        public int getMaxLength()
      • setMaxLength

        public void setMaxLength(int maxLength)
      • getFieldMatcher

        public TextMatcher getFieldMatcher()
        Gets field matcher for fields to truncate.
        Returns:
        field matcher
        Since:
        3.0.0
      • setFieldMatcher

        public void setFieldMatcher(TextMatcher fieldMatcher)
        Sets the field matcher for fields to truncate.
        Parameters:
        fieldMatcher - field matcher
        Since:
        3.0.0