public class MD5DocumentChecksummer extends AbstractDocumentChecksummer
Implementation of IDocumentChecksummer which returns an MD5 checksum of
the extracted document content unless one or more source fields are
specified, in which case the MD5 checksum is computed from those fields.
The checksum is normally computed right after the document has been
imported.
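As a rough illustration of what "an MD5 checksum of the document content" means, the sketch below hashes a string with the JDK's `MessageDigest` and renders the digest as lowercase hexadecimal. The class name `Md5ContentSketch`, the UTF-8 encoding, and the hexadecimal rendering are assumptions made for this example; it is not the Norconex implementation and may differ from it in encoding and output format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only; hypothetical helper, not part of the Norconex API. */
final class Md5ContentSketch {

    /** Returns the MD5 digest of the given text as 32 lowercase hex characters. */
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumption: the content is hashed as UTF-8 bytes.
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Under these assumptions, `Md5ContentSketch.md5Hex("abc")` yields `900150983cd24fb0d6963f7d28e17f72`, the standard MD5 test vector for that input.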
You have the option to keep the checksum as a document metadata field.
When AbstractDocumentChecksummer.setKeep(boolean) is true, the checksum
will be stored in the target field name specified. If you do not specify
any, it is stored under the metadata field name
CollectorMetadata.COLLECTOR_CHECKSUM_METADATA.
Since 1.9.0, it is possible to use regular expressions to match fields.
Use sourceFields to list all fields to use, separated by commas.
Use sourceFieldsRegex to match fields using a regular expression.
Both sourceFields and sourceFieldsRegex can be used together. Matching
fields from both will be combined, in the order provided/matched,
starting with sourceFields entries.
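The ordering rule described above (explicit sourceFields entries first, then regex matches) can be sketched as follows. This is a minimal illustration, not the library's code: the class `SourceFieldSelectionSketch` and its method are hypothetical. Three behaviours are assumptions made for the example: the regular expression must match the whole field name, explicit entries absent from the document are skipped, and duplicates are dropped.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Illustrative only; hypothetical helper, not part of the Norconex API. */
final class SourceFieldSelectionSketch {

    /**
     * Picks the metadata field names that would feed the checksum.
     *
     * @param documentFields    field names present on the document
     * @param sourceFields      explicit field names (may be null)
     * @param sourceFieldsRegex regular expression on field names (may be null)
     */
    static List<String> selectFields(
            List<String> documentFields,
            List<String> sourceFields,
            String sourceFieldsRegex) {
        // LinkedHashSet keeps insertion order and drops duplicates (assumption).
        Set<String> selected = new LinkedHashSet<>();

        // 1. Explicit sourceFields entries come first, in the order provided.
        if (sourceFields != null) {
            for (String field : sourceFields) {
                if (documentFields.contains(field)) {
                    selected.add(field);
                }
            }
        }
        // 2. Then fields matched by the regex, in the order found on the document.
        if (sourceFieldsRegex != null && !sourceFieldsRegex.isEmpty()) {
            Pattern pattern = Pattern.compile(sourceFieldsRegex);
            for (String field : documentFields) {
                // Assumption: the whole field name must match.
                if (pattern.matcher(field).matches()) {
                    selected.add(field);
                }
            }
        }
        return new ArrayList<>(selected);
    }
}
```

For a document with fields `dc.date`, `title`, `dc.author`, and `size`, the explicit list `title` combined with the regex `dc\..*` selects `title`, `dc.date`, `dc.author`, in that order.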
Since 1.9.0, it is possible to use a combination of document content
and fields to create the checksum by setting combineFieldsAndContent
to true. If you combine fields and content but do not define any source
fields, it is equivalent to adding all fields. If you do not combine the
two, specifying one or more source fields will ignore the content, while
specifying none will use only the content.
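The four cases described above can be summarized as a small decision table. The sketch below encodes it; the enum and method names are hypothetical and exist only for this illustration. "Has source fields" is assumed to mean that sourceFields, sourceFieldsRegex, or both are set.

```java
/** Illustrative only; hypothetical helper, not part of the Norconex API. */
final class ChecksumInputSketch {

    /** What the checksum is computed from. */
    enum Input {
        CONTENT_ONLY,
        SELECTED_FIELDS_ONLY,
        SELECTED_FIELDS_AND_CONTENT,
        ALL_FIELDS_AND_CONTENT
    }

    /**
     * @param combineFieldsAndContent value of the combineFieldsAndContent setting
     * @param hasSourceFields         true if any source field or regex is defined
     */
    static Input resolve(boolean combineFieldsAndContent, boolean hasSourceFields) {
        if (combineFieldsAndContent) {
            // Combining without any source fields is equivalent to adding all fields.
            return hasSourceFields
                    ? Input.SELECTED_FIELDS_AND_CONTENT
                    : Input.ALL_FIELDS_AND_CONTENT;
        }
        // Not combining: source fields replace the content; none means content only.
        return hasSourceFields ? Input.SELECTED_FIELDS_ONLY : Input.CONTENT_ONLY;
    }
}
```

With both settings left at their defaults (no source fields, no combining), `resolve(false, false)` returns `CONTENT_ONLY`, which corresponds to the plain document-body checksum.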
<documentChecksummer
    class="com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer"
    disabled="[false|true]"
    combineFieldsAndContent="[false|true]"
    keep="[false|true]"
    targetField="(optional metadata field to store the checksum)">
  <sourceFields>
    (optional comma-separated list of fields used to create checksum)
  </sourceFields>
  <sourceFieldsRegex>
    (regular expression matching fields used to create checksum)
  </sourceFieldsRegex>
</documentChecksummer>
targetField is ignored unless the keep attribute is set to true.
This implementation can be disabled in your configuration by specifying
disabled="true". When disabled, the checksum returned is always null.
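The interaction between disabled, keep, and targetField can be sketched as below. This is a hedged illustration only: the class `KeepChecksumSketch`, its method, and the plain `Map` standing in for document metadata are hypothetical. The `DEFAULT_FIELD` value is a placeholder, because the actual value of CollectorMetadata.COLLECTOR_CHECKSUM_METADATA is not given here. Skipping storage when disabled is also an assumption.

```java
import java.util.Map;

/** Illustrative only; hypothetical helper, not part of the Norconex API. */
final class KeepChecksumSketch {

    // Placeholder: stands in for CollectorMetadata.COLLECTOR_CHECKSUM_METADATA,
    // whose real value is not shown in this documentation.
    static final String DEFAULT_FIELD = "placeholder.default.checksum.field";

    /**
     * Applies the disabled/keep/targetField rules to an already computed checksum.
     *
     * @return the checksum, or null when the checksummer is disabled
     */
    static String finish(
            String checksum,
            boolean disabled,
            boolean keep,
            String targetField,
            Map<String, String> metadata) {
        if (disabled) {
            // When disabled, the checksum returned is always null.
            return null;
        }
        if (keep) {
            // targetField only matters when keep is true; fall back to the default.
            boolean noTarget = targetField == null || targetField.trim().isEmpty();
            metadata.put(noTarget ? DEFAULT_FIELD : targetField, checksum);
        }
        return checksum;
    }
}
```

With `keep` set to false, the metadata map is left untouched regardless of `targetField`, which mirrors the statement that targetField is ignored unless keep is true.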
The following uses the document body (default) to make the checksum.
<documentChecksummer class="com.norconex.collector.core.checksum.impl.MD5DocumentChecksummer" />
Constructor and Description |
---|
MD5DocumentChecksummer() |
Modifier and Type | Method and Description |
---|---|
String | doCreateDocumentChecksum(ImporterDocument document) |
boolean | equals(Object other) |
String[] | getSourceFields() Gets the fields used to construct an MD5 checksum. |
String | getSourceFieldsRegex() Gets the regular expression matching metadata fields used to construct a checksum. |
int | hashCode() |
boolean | isCombineFieldsAndContent() Gets whether we are combining the fields and content checksums. |
boolean | isDisabled() Whether this checksummer is disabled or not. |
protected void | loadChecksummerFromXML(XMLConfiguration xml) |
protected void | saveChecksummerToXML(EnhancedXMLStreamWriter writer) |
void | setCombineFieldsAndContent(boolean combineFieldsAndContent) Sets whether to combine the fields and content checksums. |
void | setDisabled(boolean disabled) Sets whether this checksummer is disabled or not. |
void | setSourceFields(String... fields) Sets the fields used to construct an MD5 checksum. |
void | setSourceFieldsRegex(String sourceFieldsRegex) Sets the regular expression matching metadata fields used to construct a checksum. |
String | toString() |
Methods inherited from class AbstractDocumentChecksummer: createDocumentChecksum, getTargetField, isKeep, loadFromXML, saveToXML, setKeep, setTargetField
public String doCreateDocumentChecksum(ImporterDocument document)
doCreateDocumentChecksum in class AbstractDocumentChecksummer

public String[] getSourceFields()

public void setSourceFields(String... fields)
Parameters: fields - fields to use to construct the checksum

public String getSourceFieldsRegex()

public void setSourceFieldsRegex(String sourceFieldsRegex)
Parameters: sourceFieldsRegex - regular expression

public boolean isDisabled()
Whether this checksummer is disabled or not (when disabled, the checksum returned is always null).
Returns: true if disabled

public void setDisabled(boolean disabled)
Sets whether this checksummer is disabled or not (when disabled, the checksum returned is always null).
Parameters: disabled - true if disabled

public boolean isCombineFieldsAndContent()
Returns: true if combining fields and content checksums

public void setCombineFieldsAndContent(boolean combineFieldsAndContent)
Parameters: combineFieldsAndContent - true if combining fields and content checksums

protected void loadChecksummerFromXML(XMLConfiguration xml)
loadChecksummerFromXML in class AbstractDocumentChecksummer

protected void saveChecksummerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
saveChecksummerToXML in class AbstractDocumentChecksummer
Throws: XMLStreamException

public boolean equals(Object other)
equals in class AbstractDocumentChecksummer

public int hashCode()
hashCode in class AbstractDocumentChecksummer

public String toString()
toString in class AbstractDocumentChecksummer
Copyright © 2014–2021 Norconex Inc. All rights reserved.