Class TruncateTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.impl.TruncateTagger
- All Implemented Interfaces:
- IXMLConfigurable, IImporterHandler, IDocumentTagger
public class TruncateTagger extends AbstractDocumentTagger
Truncates the value(s) of a fromField and optionally replaces the truncated portion with a hash value to help ensure uniqueness (not 100% guaranteed to be collision-free). If the field to truncate has multiple values, all values will be subject to truncation. You can store the value(s), truncated or not, in another target field.
Storing values in an existing field
If a target field with the same name already exists for a document, values will be added to the end of the existing value list. It is possible to change this default behavior by supplying a PropertySetter.
The maxLength is guaranteed to be respected. This means any appended hash code and suffix will fit within the maxLength.
Can be used as either a pre-parse or a post-parse handler.
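The maxLength guarantee can be illustrated with a minimal, standalone sketch. It is not the TruncateTagger implementation: the class TruncateSketch, its methods, and the hash function (a zero-padded, 10-digit CRC32 of the removed text) are assumptions made for illustration only. The point it shows is that room for the suffix and hash is reserved before the value is cut, so the result never exceeds maxLength.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch; not part of the Norconex Importer API.
class TruncateSketch {

    // Assumed fixed hash width, so the space to reserve is known up front.
    static final int HASH_WIDTH = 10;

    // Assumption: a zero-padded CRC32 of the removed text stands in for
    // whatever hash the real handler computes.
    static String hashOf(String removed) {
        CRC32 crc = new CRC32();
        crc.update(removed.getBytes(StandardCharsets.UTF_8));
        return String.format("%010d", crc.getValue());
    }

    static String truncate(
            String value, int maxLength, String suffix, boolean appendHash) {
        if (value == null || value.length() <= maxLength) {
            return value; // already short enough, returned unchanged
        }
        String sfx = suffix == null ? "" : suffix;
        // Reserve room for the suffix and the hash before cutting.
        int keep = maxLength - sfx.length() - (appendHash ? HASH_WIDTH : 0);
        if (keep < 0) {
            throw new IllegalArgumentException(
                    "maxLength is too small for the suffix and hash");
        }
        String kept = value.substring(0, keep);
        String removed = value.substring(keep);
        // Suffix goes before the hash, as in the documented configuration.
        return kept + sfx + (appendHash ? hashOf(removed) : "");
    }
}
```

Called with a maxLength of 50, a suffix of "!" and appendHash set to true, this sketch keeps the first 39 characters of a longer value, then adds the suffix and ten hash digits for a total of exactly 50 characters. The hash digits need not match those produced by TruncateTagger, since the hash function here is assumed.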
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.TruncateTagger"
    maxLength="(maximum length)"
    toField="(optional target field where to store the truncated value)"
    appendHash="[false|true]"
    suffix="(value to append after truncation. Goes before hash if one.)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <fieldMatcher>
    (one or more matching fields to have their values truncated)
  </fieldMatcher>
</handler>
XML usage example:
<handler class="TruncateTagger"
    maxLength="50"
    appendHash="true"
    suffix="!">
  <fieldMatcher>myField</fieldMatcher>
</handler>
Assuming this "myField" value...
Please truncate me before you start thinking I am too long.
...the above example will truncate it to...
Please truncate me before you start thi!0996700004
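The length budget in this example can be checked by hand: the result splits into the kept text, the "!" suffix, and the hash digits, which together fill the maxLength of 50. The short program below only decomposes the two strings quoted above; the class name TruncateExampleCheck and its helper methods are hypothetical and exist purely for this check.

```java
// Hypothetical helper that decomposes the documented example output.
class TruncateExampleCheck {

    static final String ORIGINAL =
            "Please truncate me before you start thinking I am too long.";
    static final String TRUNCATED =
            "Please truncate me before you start thi!0996700004";
    static final String SUFFIX = "!";
    static final int MAX_LENGTH = 50;

    // Text preserved from the start of the original value.
    static String keptText() {
        return TRUNCATED.substring(0, TRUNCATED.lastIndexOf(SUFFIX));
    }

    // Hash digits that follow the suffix.
    static String hashText() {
        return TRUNCATED.substring(TRUNCATED.lastIndexOf(SUFFIX) + 1);
    }

    public static void main(String[] args) {
        System.out.println(keptText().length() + " + " + SUFFIX.length()
                + " + " + hashText().length() + " = " + TRUNCATED.length());
        // prints "39 + 1 + 10 = 50"
    }
}
```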
- Since:
- 2.8.0
- Author:
- Pascal Essiembre
-
-
Constructor Summary
Constructors
- TruncateTagger()
- TruncateTagger(TextMatcher fieldMatcher, int maxLength): Constructor.
- TruncateTagger(String fromField, int maxLength): Deprecated. Since 3.0.0, use TruncateTagger(TextMatcher, int).
-
Method Summary
- boolean equals(Object other)
- TextMatcher getFieldMatcher(): Gets field matcher for fields to truncate.
- String getFromField(): Deprecated. Since 3.0.0, use getFieldMatcher() instead.
- int getMaxLength()
- PropertySetter getOnSet(): Gets the property setter to use when a value is set.
- String getSuffix()
- String getToField()
- int hashCode()
- boolean isAppendHash()
- boolean isOverwrite(): Deprecated. Since 3.0.0, use getOnSet().
- protected void loadHandlerFromXML(XML xml): Loads configuration settings specific to the implementing class.
- protected void saveHandlerToXML(XML xml): Saves configuration settings specific to the implementing class.
- void setAppendHash(boolean appendHash)
- void setFieldMatcher(TextMatcher fieldMatcher): Sets the field matcher for fields to truncate.
- void setFromField(String fromField): Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher) instead.
- void setMaxLength(int maxLength)
- void setOnSet(PropertySetter onSet): Sets the property setter to use when a value is set.
- void setOverwrite(boolean overwrite): Deprecated. Since 3.0.0, use setOnSet(PropertySetter).
- void setSuffix(String suffix)
- void setToField(String keepToField)
- void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState)
- String toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
-
-
-
Constructor Detail
-
TruncateTagger
public TruncateTagger()
-
TruncateTagger
@Deprecated public TruncateTagger(String fromField, int maxLength)
Deprecated. Since 3.0.0, use TruncateTagger(TextMatcher, int).
Constructor.
- Parameters:
- fromField - field to truncate
- maxLength - truncation length
-
TruncateTagger
public TruncateTagger(TextMatcher fieldMatcher, int maxLength)
Constructor.
- Parameters:
- fieldMatcher - field matcher
- maxLength - truncation length
- Since:
- 3.0.0
-
-
Method Detail
-
getToField
public String getToField()
-
setToField
public void setToField(String keepToField)
-
isOverwrite
@Deprecated public boolean isOverwrite()
Deprecated. Since 3.0.0, use getOnSet().
Gets whether an existing value for the same field should be overwritten.
- Returns:
- true if overwriting existing value.
-
setOverwrite
@Deprecated public void setOverwrite(boolean overwrite)
Deprecated. Since 3.0.0, use setOnSet(PropertySetter).
Sets whether an existing value for the same field should be overwritten.
- Parameters:
- overwrite - true if overwriting existing value.
-
getOnSet
public PropertySetter getOnSet()
Gets the property setter to use when a value is set.
- Returns:
- property setter
- Since:
- 3.0.0
-
setOnSet
public void setOnSet(PropertySetter onSet)
Sets the property setter to use when a value is set.
- Parameters:
- onSet - property setter
- Since:
- 3.0.0
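The effect of the property setter on an existing target field can be pictured with a simplified model. The sketch below does not use the real PropertySetter type: the OnSetSketch class, its Mode enum, and the store method are hypothetical stand-ins. It contrasts the documented default (values added to the end of the existing value list) with a replacing strategy. The real PropertySetter may offer other strategies that are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for property-setter behavior.
class OnSetSketch {

    // Assumed strategies for illustration only.
    enum Mode { APPEND, REPLACE }

    // Stores a value in the target field according to the chosen mode.
    static void store(Map<String, List<String>> fields,
            String toField, String value, Mode mode) {
        List<String> values =
                fields.computeIfAbsent(toField, k -> new ArrayList<>());
        if (mode == Mode.REPLACE) {
            values.clear(); // discard whatever the field already held
        }
        values.add(value); // APPEND keeps existing values and adds to the end
    }
}
```

With a target field already holding "old", storing "new" in APPEND mode leaves the list as "old", "new", while REPLACE mode leaves only "new".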
-
isAppendHash
public boolean isAppendHash()
-
setAppendHash
public void setAppendHash(boolean appendHash)
-
getSuffix
public String getSuffix()
-
setSuffix
public void setSuffix(String suffix)
-
getMaxLength
public int getMaxLength()
-
setMaxLength
public void setMaxLength(int maxLength)
-
getFieldMatcher
public TextMatcher getFieldMatcher()
Gets field matcher for fields to truncate.
- Returns:
- field matcher
- Since:
- 3.0.0
-
setFieldMatcher
public void setFieldMatcher(TextMatcher fieldMatcher)
Sets the field matcher for fields to truncate.
- Parameters:
- fieldMatcher - field matcher
- Since:
- 3.0.0
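Because a field matcher can select more than one field, and every value of a matching field is subject to truncation, the overall effect can be sketched as below. This is an illustration only: a java.util.regex.Pattern stands in for the real TextMatcher, the class FieldMatcherSketch and its method are hypothetical, and plain truncation is used without suffix or hash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; a regex Pattern stands in for TextMatcher.
class FieldMatcherSketch {

    // Truncates all values of every field whose name matches the pattern.
    static void truncateMatching(Map<String, List<String>> fields,
            Pattern fieldPattern, int maxLength) {
        for (Map.Entry<String, List<String>> entry : fields.entrySet()) {
            if (!fieldPattern.matcher(entry.getKey()).matches()) {
                continue; // field name not selected, values left untouched
            }
            List<String> truncated = new ArrayList<>();
            for (String value : entry.getValue()) {
                truncated.add(value.length() > maxLength
                        ? value.substring(0, maxLength)
                        : value);
            }
            entry.setValue(truncated); // replaces values, not the key set
        }
    }
}
```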
-
getFromField
@Deprecated public String getFromField()
Deprecated. Since 3.0.0, use getFieldMatcher() instead.
Gets the from field.
- Returns:
- from field
-
setFromField
@Deprecated public void setFromField(String fromField)
Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher) instead.
Sets the from field.
- Parameters:
- fromField - from field.
-
tagApplicableDocument
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
- Specified by:
- tagApplicableDocument in class AbstractDocumentTagger
- Throws:
- ImporterHandlerException
-
loadHandlerFromXML
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by:
- loadHandlerFromXML in class AbstractImporterHandler
- Parameters:
- xml - XML configuration
-
saveHandlerToXML
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by:
- saveHandlerToXML in class AbstractImporterHandler
- Parameters:
- xml - the XML
-
equals
public boolean equals(Object other)
- Overrides:
- equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides:
- hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides:
- toString in class AbstractImporterHandler
-
-