public class TruncateTagger extends AbstractDocumentTagger
Truncates a field's value(s) and optionally replaces the truncated portion with a hash value to help ensure uniqueness (not 100% guaranteed to be collision-free). If the field to truncate has multiple values, all values are subject to truncation. You can store the value(s), truncated or not, in another field.
When storing the truncated values in a field that already has one or more values, the truncated values are added to the list of existing values, unless "overwrite" is set to true.
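The storage rules above can be sketched in plain Java. This is a hypothetical illustration, not the Norconex implementation: metadata is modeled as a `Map<String, List<String>>` instead of `ImporterMetadata`, `store` and `cut` are made-up helper names, truncation is a naive cut without suffix or hash, and the rule that an empty `toField` means "replace the source values in place" is an assumption not stated on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of TruncateTagger's storage rules (not the library code).
// Metadata is modeled as a plain Map instead of ImporterMetadata.
public class TruncateStorageSketch {

    // Naive cut (no suffix, no hash) to keep the sketch focused on storage.
    static String cut(String value, int maxLength) {
        return value.length() <= maxLength ? value : value.substring(0, maxLength);
    }

    public static void store(Map<String, List<String>> metadata, String fromField,
            String toField, int maxLength, boolean overwrite) {
        // All values of the source field are subject to truncation.
        List<String> truncated = new ArrayList<>();
        for (String value : metadata.getOrDefault(fromField, List.of())) {
            truncated.add(cut(value, maxLength));
        }

        // Assumption: without a toField, the source values are replaced in place.
        boolean inPlace = toField == null || toField.isEmpty();
        String target = inPlace ? fromField : toField;

        if (inPlace || overwrite) {
            metadata.put(target, truncated);
        } else {
            // Append to whatever the target field already holds.
            List<String> merged = new ArrayList<>(metadata.getOrDefault(target, List.of()));
            merged.addAll(truncated);
            metadata.put(target, merged);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = new HashMap<>();
        meta.put("title", List.of("A fairly long title", "Short"));
        meta.put("shortTitle", List.of("existing"));

        store(meta, "title", "shortTitle", 8, false);
        System.out.println(meta.get("shortTitle")); // [existing, A fairly, Short]
    }
}
```

With "overwrite" set to true, the same call would leave only the two truncated values in `shortTitle`.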
The maxLength is guaranteed to be respected. This means any appended hash code and suffix will fit within the maxLength.
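A minimal sketch of how a truncation routine can honor maxLength while appending a suffix and a hash. It assumes the hash is computed from the full original value and rendered as the unsigned `String.hashCode()` zero-padded to 10 digits; both choices are assumptions made only to resemble the example below, and the actual TruncateTagger hash may be computed differently, so the digits will not necessarily match. `truncate` and `hash` are hypothetical names.

```java
import java.util.Objects;

// Hypothetical sketch: truncation that reserves room for suffix + hash
// so the result never exceeds maxLength. Not the Norconex implementation.
public class TruncateWithHashSketch {

    // Assumed hash format: unsigned String.hashCode(), zero-padded to 10 digits.
    static String hash(String value) {
        return String.format("%010d", Integer.toUnsignedLong(value.hashCode()));
    }

    public static String truncate(String value, int maxLength, String suffix, boolean appendHash) {
        Objects.requireNonNull(value, "value");
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        if (value.length() <= maxLength) {
            return value; // Already short enough: left untouched.
        }
        // Suffix goes before the hash, if any.
        String tail = (suffix == null ? "" : suffix) + (appendHash ? hash(value) : "");
        int keep = Math.max(0, maxLength - tail.length());
        String result = value.substring(0, keep) + tail;
        // Edge case: suffix + hash alone exceed maxLength, so cut them as well.
        return result.length() <= maxLength ? result : result.substring(0, maxLength);
    }

    public static void main(String[] args) {
        String text = "Please truncate me before you start thinking I am too long.";
        String out = truncate(text, 50, "!", true);
        // Keeps the first 39 characters, then "!" and a 10-digit hash.
        System.out.println(out + " (" + out.length() + " chars)");
    }
}
```

Reserving room for the suffix and hash before cutting is what keeps the result within maxLength. The final cut only matters when the suffix and hash alone are longer than maxLength; how the real tagger handles that case is not documented on this page.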
Can be used either as a pre-parse or post-parse handler.
<tagger class="com.norconex.importer.handler.tagger.impl.TruncateTagger"
    fromField="(field holding one or more values to truncate)"
    maxLength="(maximum length)"
    toField="(optional field where to store the truncated value)"
    overwrite="[false|true]"
    appendHash="[false|true]"
    suffix="(value to append after truncation; goes before the hash, if any)" >

  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
</tagger>
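The "only one needs to match" rule for `restrictTo` can be pictured as an any-match over all restrictions. This is a hypothetical sketch, not the AbstractImporterHandler code: it assumes the regular expression must match the whole value (`Matcher.matches()`), that a handler with no restrictions always applies, and it models metadata as a `Map<String, List<String>>`. `Restriction` and `anyRestrictionMatches` are illustrative names.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of "restrictTo" matching (not the library code).
public class RestrictToSketch {

    // One <restrictTo> entry: a field name, a regex, and case sensitivity.
    public static final class Restriction {
        final String field;
        final Pattern pattern;

        public Restriction(String field, String regex, boolean caseSensitive) {
            this.field = field;
            this.pattern = caseSensitive
                    ? Pattern.compile(regex)
                    : Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        }
    }

    public static boolean anyRestrictionMatches(Map<String, List<String>> metadata,
            List<Restriction> restrictions) {
        // Assumption: no restrictions means the handler always applies.
        if (restrictions.isEmpty()) {
            return true;
        }
        // Only one restriction needs to match one value of its field.
        for (Restriction r : restrictions) {
            for (String value : metadata.getOrDefault(r.field, List.of())) {
                if (r.pattern.matcher(value).matches()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = Map.of("contentType", List.of("text/html"));
        List<Restriction> rs = List.of(
                new Restriction("contentType", "application/pdf", false),
                new Restriction("contentType", "TEXT/.*", false));
        System.out.println(anyRestrictionMatches(meta, rs)); // true
    }
}
```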
To truncate this "myField" value...
Please truncate me before you start thinking I am too long.
...to become this...
Please truncate me before you start thi!0996700004
...you would set a max length of 50, with a "!" suffix, and append a hash. Like this:
<tagger class="com.norconex.importer.handler.tagger.impl.TruncateTagger" fromField="myField" maxLength="50" appendHash="true" suffix="!" />
Constructor and Description |
---|
TruncateTagger() |
TruncateTagger(String fromField, int maxLength) |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
String | getFromField() |
int | getMaxLength() |
String | getSuffix() |
String | getToField() |
int | hashCode() |
boolean | isAppendHash() |
boolean | isOverwrite() |
protected void | loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml): loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(EnhancedXMLStreamWriter writer): saves configuration settings specific to the implementing class. |
void | setAppendHash(boolean appendHash) |
void | setFromField(String fromField) |
void | setMaxLength(int maxLength) |
void | setOverwrite(boolean keepOverwrite) |
void | setSuffix(String suffix) |
void | setToField(String keepToField) |
void | tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) |
String | toString() |
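Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML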
public TruncateTagger()
public TruncateTagger(String fromField, int maxLength)
public String getToField()
public void setToField(String keepToField)
public boolean isOverwrite()
public void setOverwrite(boolean keepOverwrite)
public boolean isAppendHash()
public void setAppendHash(boolean appendHash)
public String getSuffix()
public void setSuffix(String suffix)
public int getMaxLength()
public void setMaxLength(int maxLength)
public String getFromField()
public void setFromField(String fromField)
public void tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Specified by: tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException
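protected void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - xml configuration
Throws: IOException - could not load from XML
protected void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
public String toString()
Overrides: toString in class AbstractImporterHandler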
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
Copyright © 2009–2021 Norconex Inc. All rights reserved.