Class GenericMetadataChecksummer

  • All Implemented Interfaces:
    IMetadataChecksummer, IXMLConfigurable

    public class GenericMetadataChecksummer
    extends AbstractMetadataChecksummer

    Generic implementation of IMetadataChecksummer that creates a checksum from specified metadata field names and their values. The names and values are used as is, joined in this format: fieldName=fieldValue;fieldName=fieldValue;....
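The joined format described above can be illustrated with a minimal stand-alone sketch. It is not the Norconex implementation: the class name ChecksumFormatSketch, the method buildChecksum, and the use of a plain Map for metadata are assumptions made for illustration. How multi-valued fields are ordered and whether a trailing separator is emitted are not specified here, so the sketch simply lists every value and joins the pairs with ";".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not the Norconex implementation.
class ChecksumFormatSketch {

    // Builds "fieldName=fieldValue;fieldName=fieldValue" from the selected fields.
    // Names and values are used as is (no hashing, no escaping).
    static String buildChecksum(
            Map<String, List<String>> metadata, List<String> fieldNames) {
        List<String> pairs = new ArrayList<>();
        for (String name : fieldNames) {
            List<String> values = metadata.get(name);
            if (values == null) {
                continue; // field absent from this document: skip it
            }
            for (String value : values) {
                pairs.add(name + "=" + value);
            }
        }
        // Assumption: pairs are separated by ";" with no trailing separator.
        return String.join(";", pairs);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("docLastModified", List.of("2020-01-01T00:00:00Z"));
        metadata.put("docSize", List.of("1024"));
        metadata.put("title", List.of("not part of the checksum"));
        System.out.println(buildChecksum(
                metadata, List.of("docLastModified", "docSize")));
    }
}
```

Because the values are used verbatim, a change in any selected field yields a different checksum string, which is what lets a crawler detect a modified document from its metadata alone.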

    You can optionally keep the checksum as a document metadata field. When AbstractMetadataChecksummer.setKeep(boolean) is set to true, the checksum is stored in the specified target field. If no target field is specified, it is stored under the metadata field name CrawlDocMetadata.CHECKSUM_METADATA.

    XML configuration usage:

    
    <metadataChecksummer
        class="com.norconex.collector.core.checksum.impl.GenericMetadataChecksummer"
        keep="[false|true]"
        toField="(optional field to store the checksum)">
      <fieldMatcher>
        (expression matching fields used to create the checksum)
      </fieldMatcher>
    </metadataChecksummer>

    toField is ignored unless the keep attribute is set to true.
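The interaction between keep and toField can be sketched as follows. This is only an illustration under stated assumptions: KeepChecksumSketch, storeChecksum, and DEFAULT_CHECKSUM_FIELD are hypothetical names, the default field value is a placeholder (the actual value of CrawlDocMetadata.CHECKSUM_METADATA is not given here), and treating a blank toField as "not specified" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not the Norconex implementation.
class KeepChecksumSketch {

    // Placeholder only: stands in for CrawlDocMetadata.CHECKSUM_METADATA,
    // whose real value is not shown in this documentation.
    static final String DEFAULT_CHECKSUM_FIELD = "checksum-metadata-placeholder";

    // Stores the checksum in the metadata only when keep is true.
    static void storeChecksum(Map<String, String> metadata,
            String checksum, boolean keep, String toField) {
        if (!keep) {
            return; // toField is ignored unless keep is true
        }
        // Assumption: a null or blank toField means "use the default field".
        String target = (toField == null || toField.trim().isEmpty())
                ? DEFAULT_CHECKSUM_FIELD
                : toField;
        metadata.put(target, checksum);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        storeChecksum(metadata, "docSize=1024", true, "myChecksum");
        System.out.println(metadata);
    }
}
```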

    XML usage example:

    
    <metadataChecksummer
        class="GenericMetadataChecksummer">
      <fieldMatcher
          method="csv">
        docLastModified,docSize
      </fieldMatcher>
    </metadataChecksummer>

    The above example uses a combination of two (fictitious) fields called "docLastModified" and "docSize" to make the checksum.
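As a rough illustration of how a comma-separated list such as the one above could select metadata fields, here is a stand-alone sketch. It does not use the real TextMatcher class: CsvFieldMatchSketch and matchFields are hypothetical names, and exact, case-sensitive matching with whitespace trimmed around each name is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; not the Norconex TextMatcher.
class CsvFieldMatchSketch {

    // Returns the available field names that appear in the comma-separated list.
    // Result order follows the order of availableFields.
    static List<String> matchFields(
            String csv, Collection<String> availableFields) {
        Set<String> wanted = new LinkedHashSet<>();
        for (String token : csv.split(",")) {
            String name = token.trim(); // XML content often carries whitespace
            if (!name.isEmpty()) {
                wanted.add(name);
            }
        }
        List<String> matched = new ArrayList<>();
        for (String field : availableFields) {
            // Assumption: exact, case-sensitive comparison.
            if (wanted.contains(field)) {
                matched.add(field);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        System.out.println(matchFields(
                "\n  docLastModified,docSize\n",
                List.of("title", "docSize", "docLastModified")));
    }
}
```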

    Since 2.0.0, a self-closing <metadataChecksummer/> tag without any attributes is used to disable checksum generation.

    Since:
    1.2.0
    Author:
    Pascal Essiembre
    • Constructor Detail

      • GenericMetadataChecksummer

        public GenericMetadataChecksummer()
    • Method Detail

      • getFieldMatcher

        public TextMatcher getFieldMatcher()
        Gets the field matcher.
        Returns:
        field matcher
        Since:
        2.0.0
      • setFieldMatcher

        public void setFieldMatcher​(TextMatcher fieldMatcher)
        Sets the field matcher.
        Parameters:
        fieldMatcher - field matcher
        Since:
        2.0.0
      • getSourceFields

        @Deprecated
        public List<String> getSourceFields()
        Deprecated.
        Since 2.0.0, use getFieldMatcher().
        Gets the metadata fields used to construct a checksum.
        Returns:
        fields to use for checksum
      • setSourceFields

        @Deprecated
        public void setSourceFields​(String... sourceFields)
        Deprecated.
        Sets the metadata header fields used to construct a checksum.
        Parameters:
        sourceFields - fields to use for checksum
      • setSourceFields

        @Deprecated
        public void setSourceFields​(List<String> sourceFields)
        Deprecated.
        Sets the metadata header fields used to construct a checksum.
        Parameters:
        sourceFields - fields to use for checksum
      • getSourceFieldsRegex

        @Deprecated
        public String getSourceFieldsRegex()
        Deprecated.
        Since 2.0.0, use getFieldMatcher().
        Gets the regular expression matching metadata fields used to construct a checksum.
        Returns:
        regular expression
        Since:
        1.9.0
      • setSourceFieldsRegex

        @Deprecated
        public void setSourceFieldsRegex​(String sourceFieldsRegex)
        Deprecated.
        Sets the regular expression matching metadata fields used to construct a checksum.
        Parameters:
        sourceFieldsRegex - regular expression
        Since:
        1.9.0
      • isDisabled

        @Deprecated
        public boolean isDisabled()
        Deprecated.
        Since 2.0.0, not having a checksummer defined or setting one explicitly to null effectively disables it.
        Returns:
        always false
      • setDisabled

        @Deprecated
        public void setDisabled​(boolean disabled)
        Deprecated.
        Since 2.0.0, not having a checksummer defined or setting one explicitly to null effectively disables it.
        Invoking this method has no effect.
        Parameters:
        disabled - argument is ignored