public class TruncateTagger extends AbstractDocumentTagger
Truncates the value(s) of a fromField and optionally replaces the truncated
portion with a hash value to help ensure uniqueness (not 100% guaranteed to
be collision-free). If the field to truncate has multiple values, all
values are subject to truncation. You can store the value(s), truncated
or not, in another target field.
If a target field with the same name already exists for a document,
values will be added to the end of the existing value list.
It is possible to change this default behavior by supplying a PropertySetter.
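The default append behavior, and the alternatives named by the onSet attribute, can be pictured with a minimal sketch. It does not use the real PropertySetter class: OnSetSketch, its OnSet enum, and the set method are hypothetical stand-ins, and the meaning given to "optional" here (write only when the field has no value yet) is an assumption rather than a statement of the library's behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OnSetSketch {

    /** Hypothetical stand-in for the four onSet strategies. */
    public enum OnSet { APPEND, PREPEND, REPLACE, OPTIONAL }

    /** Stores newValues under the target field using the given strategy. */
    public static void set(Map<String, List<String>> metadata,
            String toField, List<String> newValues, OnSet onSet) {
        List<String> existing =
                metadata.computeIfAbsent(toField, k -> new ArrayList<>());
        switch (onSet) {
            case PREPEND:
                // New values go before the existing ones.
                existing.addAll(0, newValues);
                break;
            case REPLACE:
                // Existing values are discarded.
                existing.clear();
                existing.addAll(newValues);
                break;
            case OPTIONAL:
                // Assumption: only write when the field has no value yet.
                if (existing.isEmpty()) {
                    existing.addAll(newValues);
                }
                break;
            case APPEND:
            default:
                // Default behavior: add to the end of the existing list.
                existing.addAll(newValues);
                break;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", new ArrayList<>(Arrays.asList("existing")));
        set(metadata, "title", Arrays.asList("truncated"), OnSet.APPEND);
        System.out.println(metadata); // prints {title=[existing, truncated]}
    }
}
```

Swapping OnSet.APPEND for OnSet.REPLACE in the usage above would leave only the new value in the list.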
The maxLength is guaranteed to be respected. This means any
appended hash code and suffix will fit within the maxLength.
Can be used as either a pre-parse or post-parse handler.
<handler
    class="com.norconex.importer.handler.tagger.impl.TruncateTagger"
    maxLength="(maximum length)"
    toField="(optional target field where to store the truncated value)"
    onSet="[append|prepend|replace|optional]"
    appendHash="[false|true]"
    suffix="(value to append after truncation. Goes before hash if one.)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <fieldMatcher
      method="[basic|csv|wildcard|regex]"
      ignoreCase="[false|true]"
      ignoreDiacritic="[false|true]"
      partial="[false|true]">
    (one or more matching fields to have their values truncated)
  </fieldMatcher>
</handler>
<handler
    class="TruncateTagger"
    maxLength="50"
    appendHash="true"
    suffix="!">
  <fieldMatcher>myField</fieldMatcher>
</handler>
Assuming this "myField" value...
Please truncate me before you start thinking I am too long.
...the above example will truncate it to...
Please truncate me before you start thi!0996700004
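The arithmetic behind this example can be sketched as follows: with a maxLength of 50, a one-character suffix, and a 10-digit hash, 39 characters of the original value are kept. This is only an illustration under stated assumptions. TruncateSketch and its truncate method are hypothetical, the 10-digit zero-padded width is inferred from the example output, and the hash function used here (String.hashCode over the removed portion) is a stand-in, so the digits it produces are not expected to match 0996700004.

```java
import java.util.Locale;

public class TruncateSketch {

    /** Width reserved for the zero-padded hash (assumed from the example). */
    private static final int HASH_LENGTH = 10;

    /**
     * Truncates value so the result, including the suffix and the optional
     * hash, never exceeds maxLength.
     */
    public static String truncate(
            String value, int maxLength, boolean appendHash, String suffix) {
        if (value == null || value.length() <= maxLength) {
            return value; // nothing to truncate
        }
        String sfx = suffix == null ? "" : suffix;
        int reserved = sfx.length() + (appendHash ? HASH_LENGTH : 0);
        if (reserved > maxLength) {
            throw new IllegalArgumentException(
                    "maxLength is too small for the suffix and hash.");
        }
        // Keep only as many characters as leave room for suffix and hash.
        int cut = maxLength - reserved;
        StringBuilder result = new StringBuilder(value.substring(0, cut));
        result.append(sfx); // the suffix goes before the hash
        if (appendHash) {
            // Stand-in hash of the removed portion; an unsigned 32-bit
            // value always fits in 10 decimal digits.
            long hash = Integer.toUnsignedLong(value.substring(cut).hashCode());
            result.append(String.format(Locale.ROOT, "%010d", hash));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String text =
                "Please truncate me before you start thinking I am too long.";
        String truncated = truncate(text, 50, true, "!");
        System.out.println(truncated);
        System.out.println(truncated.length()); // prints 50
    }
}
```

Because the room for the suffix and hash is subtracted before cutting, the result stays within maxLength whether or not a hash is appended.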
Constructor and Description |
---|
TruncateTagger() |
TruncateTagger(String fromField, int maxLength) Deprecated. Since 3.0.0, use TruncateTagger(TextMatcher, int) |
TruncateTagger(TextMatcher fieldMatcher, int maxLength) Constructor. |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
TextMatcher | getFieldMatcher() Gets field matcher for fields to truncate. |
String | getFromField() Deprecated. Since 3.0.0, use getFieldMatcher() instead |
int | getMaxLength() |
PropertySetter | getOnSet() Gets the property setter to use when a value is set. |
String | getSuffix() |
String | getToField() |
int | hashCode() |
boolean | isAppendHash() |
boolean | isOverwrite() Deprecated. Since 3.0.0, use getOnSet(). |
protected void | loadHandlerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | setAppendHash(boolean appendHash) |
void | setFieldMatcher(TextMatcher fieldMatcher) Sets the field matcher for fields to truncate. |
void | setFromField(String fromField) Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher) instead |
void | setMaxLength(int maxLength) |
void | setOnSet(PropertySetter onSet) Sets the property setter to use when a value is set. |
void | setOverwrite(boolean overwrite) Deprecated. Since 3.0.0, use setOnSet(PropertySetter). |
void | setSuffix(String suffix) |
void | setToField(String keepToField) |
void | tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
String | toString() |
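Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML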
public TruncateTagger()
@Deprecated public TruncateTagger(String fromField, int maxLength)
Deprecated. Since 3.0.0, use TruncateTagger(TextMatcher, int)
Parameters:
fromField - field to truncate
maxLength - truncation length
public TruncateTagger(TextMatcher fieldMatcher, int maxLength)
Constructor.
Parameters:
fieldMatcher - field matcher
maxLength - truncation length
public String getToField()
public void setToField(String keepToField)
@Deprecated public boolean isOverwrite()
Deprecated. Since 3.0.0, use getOnSet().
Returns:
true if overwriting existing value.
@Deprecated public void setOverwrite(boolean overwrite)
Deprecated. Since 3.0.0, use setOnSet(PropertySetter).
Parameters:
overwrite - true if overwriting existing value.
public PropertySetter getOnSet()
public void setOnSet(PropertySetter onSet)
Parameters:
onSet - property setter
public boolean isAppendHash()
public void setAppendHash(boolean appendHash)
public String getSuffix()
public void setSuffix(String suffix)
public int getMaxLength()
public void setMaxLength(int maxLength)
public TextMatcher getFieldMatcher()
public void setFieldMatcher(TextMatcher fieldMatcher)
Parameters:
fieldMatcher - field matcher
@Deprecated public String getFromField()
Deprecated. Since 3.0.0, use getFieldMatcher() instead
@Deprecated public void setFromField(String fromField)
Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher) instead
Parameters:
fromField - from field.
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
tagApplicableDocument in class AbstractDocumentTagger
Throws:
ImporterHandlerException
protected void loadHandlerFromXML(XML xml)
Description copied from class AbstractImporterHandler: Loads configuration settings specific to the implementing class.
loadHandlerFromXML in class AbstractImporterHandler
Parameters:
xml - XML configuration
protected void saveHandlerToXML(XML xml)
Description copied from class AbstractImporterHandler: Saves configuration settings specific to the implementing class.
saveHandlerToXML in class AbstractImporterHandler
Parameters:
xml - the XML
public boolean equals(Object other)
equals in class AbstractImporterHandler
public int hashCode()
hashCode in class AbstractImporterHandler
public String toString()
toString in class AbstractImporterHandler
Copyright © 2009–2023 Norconex Inc. All rights reserved.