Class LastModifiedMetadataChecksummer

  • All Implemented Interfaces:
    IMetadataChecksummer, IXMLConfigurable

    public class LastModifiedMetadataChecksummer
    extends AbstractMetadataChecksummer

    Default implementation of IMetadataChecksummer for the Norconex HTTP Collector. It simply returns the exact value of the "Last-Modified" HTTP header field, or null if the header is not present.

    You can optionally keep the checksum as a document metadata field. When AbstractMetadataChecksummer.setKeep(boolean) is set to true, the checksum is stored in the specified target field. If no target field is specified, the checksum is stored under the metadata field name CrawlDocMetadata.CHECKSUM_METADATA.
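
    The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the Norconex implementation: the class name, the plain Map used as metadata, and the DEFAULT_FIELD placeholder are assumptions made for illustration only. The sketch also assumes an exact-case header lookup and assumes that nothing is stored when the header is absent.

```java
import java.util.Map;

// Hypothetical stand-in, NOT the Norconex class. It only mirrors the
// behavior described in the text: return the raw "Last-Modified" value
// (or null) and optionally keep it in a metadata field.
final class LastModifiedChecksumSketch {

    // Placeholder default field name (assumption). The real default is the
    // value of CrawlDocMetadata.CHECKSUM_METADATA, not reproduced here.
    static final String DEFAULT_FIELD = "sketch.checksum-metadata";

    private final boolean keep;
    private final String toField;

    LastModifiedChecksumSketch(boolean keep, String toField) {
        this.keep = keep;
        this.toField = toField;
    }

    // Returns the exact "Last-Modified" value, or null when it is absent.
    // Simplification: the header name is matched with exact case.
    String createChecksum(Map<String, String> metadata) {
        String checksum = metadata.get("Last-Modified");
        if (keep && checksum != null) {
            // Fall back to the default field when no target field is given.
            String field = (toField == null || toField.isEmpty())
                    ? DEFAULT_FIELD
                    : toField;
            metadata.put(field, checksum);
        }
        return checksum;
    }
}
```

    With keep set to true and a target field of "metaChecksum", calling createChecksum on metadata containing a "Last-Modified" entry returns that value unchanged and copies it into "metaChecksum". With no "Last-Modified" entry, the method returns null.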

    To build a checksum from one or several other fields, use GenericMetadataChecksummer instead.

    XML configuration usage:

    
    <metadataChecksummer
        class="com.norconex.collector.http.checksum.impl.LastModifiedMetadataChecksummer"
        keep="[false|true]"
        toField="(field to store checksum)"/>

    XML usage example:

    
    <metadataChecksummer
        keep="true"
        toField="metaChecksum"/>

    The above example stores the "Last-Modified" value used as the checksum in a field called "metaChecksum".
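
    To make the meaning of the two attributes concrete, the sketch below reads keep and toField from such an XML fragment using only the JDK's DOM parser. It is an illustration under stated assumptions, not how the Norconex HTTP Collector loads its configuration: the ChecksummerConfigSketch class and its parse method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical holder for the two attributes shown in the XML usage.
// This is NOT part of the Norconex API.
final class ChecksummerConfigSketch {
    final boolean keep;
    final String toField; // null when the attribute is absent

    ChecksummerConfigSketch(boolean keep, String toField) {
        this.keep = keep;
        this.toField = toField;
    }

    // Parses a single <metadataChecksummer .../> element with the JDK parser.
    static ChecksummerConfigSketch parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            // getAttribute returns "" when missing, so keep defaults to false.
            boolean keep = Boolean.parseBoolean(root.getAttribute("keep"));
            String toField = root.hasAttribute("toField")
                    ? root.getAttribute("toField")
                    : null;
            return new ChecksummerConfigSketch(keep, toField);
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid checksummer XML", e);
        }
    }
}
```

    Parsing the usage example above yields keep equal to true and toField equal to "metaChecksum". When both attributes are omitted, the sketch yields keep equal to false and a null toField.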

    Since 2.0.0, a self-closing <metadataChecksummer/> tag without any attributes is used to disable checksum generation.

    Since:
    2.2.0
    Author:
    Pascal Essiembre
    See Also:
    GenericMetadataChecksummer