Class TruncateTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.impl.TruncateTagger
- All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTagger
public class TruncateTagger extends AbstractDocumentTagger
Truncates fromField value(s) and optionally replaces the truncated portion with a hash value to help ensure uniqueness (not 100% guaranteed to be collision-free). If the field to truncate has multiple values, all values are subject to truncation. You can store the value(s), truncated or not, in another target field.

Storing values in an existing field

If a target field with the same name already exists for a document, values are added to the end of the existing value list. You can change this default behavior by supplying a PropertySetter.

The maxLength is guaranteed to be respected: any appended hash code and suffix will fit within maxLength.

Can be used as either a pre-parse or post-parse handler.
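The default "add to the end of the existing value list" behavior, and one possible alternative, can be illustrated with a minimal, self-contained sketch. It does not use the Norconex API: the class TargetFieldSketch, its methods, and the map standing in for document metadata are all hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TargetFieldSketch {

    // Default behavior described above: add the new value after any
    // values already stored under the target field.
    static void append(Map<String, List<String>> metadata, String field, String value) {
        metadata.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    // One possible alternative (hypothetical here): discard existing
    // values and keep only the new one.
    static void replace(Map<String, List<String>> metadata, String field, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        metadata.put(field, values);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a document's metadata.
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        append(metadata, "summary", "existing value");
        append(metadata, "summary", "truncated value");
        System.out.println(metadata.get("summary")); // two values, in insertion order

        replace(metadata, "summary", "truncated value");
        System.out.println(metadata.get("summary")); // only the new value remains
    }
}
```

In the real handler, the choice between such behaviors is made through setOnSet(PropertySetter) rather than through separate methods.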
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.TruncateTagger"
    maxLength="(maximum length)"
    toField="(optional target field where to store the truncated value)"
    appendHash="[false|true]"
    suffix="(value to append after truncation. Goes before hash if one.)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <fieldMatcher>
    (one or more matching fields to have their values truncated)
  </fieldMatcher>
</handler>

XML usage example:

<handler class="TruncateTagger" maxLength="50" appendHash="true" suffix="!">
  <fieldMatcher>myField</fieldMatcher>
</handler>

Assuming this "myField" value...
Please truncate me before you start thinking I am too long.
...the above example will truncate it to...
Please truncate me before you start thi!0996700004
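The length arithmetic behind this example can be sketched in plain Java. This is not the library's implementation: the class TruncateSketch is a hypothetical name, and both the hash function (String.hashCode of the removed text) and the fixed 10-digit zero-padded width are assumptions inferred only from the shape of the example output, so the hash digits it produces may differ from those shown above. What it does show is how space for the suffix and hash is reserved so the result never exceeds maxLength.

```java
public class TruncateSketch {

    // Assumption: a fixed-width, zero-padded 10-digit hash, inferred only
    // from the width of the hash in the example above.
    static final int HASH_WIDTH = 10;

    static String truncate(String text, int maxLength, String suffix, boolean appendHash) {
        if (text == null || text.length() <= maxLength) {
            return text; // already short enough, nothing to truncate
        }
        String tail = suffix == null ? "" : suffix;
        // Reserve room for the suffix and, if requested, the hash.
        int reserved = tail.length() + (appendHash ? HASH_WIDTH : 0);
        int keep = Math.max(0, maxLength - reserved);
        if (appendHash) {
            // Assumption: hash the portion being cut off. The cast to long
            // keeps Math.abs safe for Integer.MIN_VALUE.
            String removed = text.substring(keep);
            long hash = Math.abs((long) removed.hashCode());
            tail += String.format("%010d", hash);
        }
        String result = text.substring(0, keep) + tail;
        // Final guard for the case where suffix plus hash exceed maxLength.
        return result.length() > maxLength ? result.substring(0, maxLength) : result;
    }

    public static void main(String[] args) {
        String value = "Please truncate me before you start thinking I am too long.";
        String truncated = truncate(value, 50, "!", true);
        System.out.println(truncated);
        System.out.println(truncated.length());
    }
}
```

With maxLength="50", a one-character suffix, and a 10-character hash, 39 characters of the original value are kept, which matches the "Please truncate me before you start thi" prefix in the example.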
- Since:
- 2.8.0
- Author:
- Pascal Essiembre
-
-
Constructor Summary
Constructors (constructor: description)
- TruncateTagger()
- TruncateTagger(TextMatcher fieldMatcher, int maxLength): Constructor.
- TruncateTagger(String fromField, int maxLength): Deprecated. Since 3.0.0, use TruncateTagger(TextMatcher, int).
-
Method Summary
Methods (modifier and type, method: description)
- boolean equals(Object other)
- TextMatcher getFieldMatcher(): Gets field matcher for fields to truncate.
- String getFromField(): Deprecated. Since 3.0.0, use getFieldMatcher() instead.
- int getMaxLength()
- PropertySetter getOnSet(): Gets the property setter to use when a value is set.
- String getSuffix()
- String getToField()
- int hashCode()
- boolean isAppendHash()
- boolean isOverwrite(): Deprecated. Since 3.0.0, use getOnSet().
- protected void loadHandlerFromXML(XML xml): Loads configuration settings specific to the implementing class.
- protected void saveHandlerToXML(XML xml): Saves configuration settings specific to the implementing class.
- void setAppendHash(boolean appendHash)
- void setFieldMatcher(TextMatcher fieldMatcher): Sets the field matcher for fields to truncate.
- void setFromField(String fromField): Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher) instead.
- void setMaxLength(int maxLength)
- void setOnSet(PropertySetter onSet): Sets the property setter to use when a value is set.
- void setOverwrite(boolean overwrite): Deprecated. Since 3.0.0, use setOnSet(PropertySetter).
- void setSuffix(String suffix)
- void setToField(String keepToField)
- void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState)
- String toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Constructor Detail
-
TruncateTagger
public TruncateTagger()
-
TruncateTagger
@Deprecated public TruncateTagger(String fromField, int maxLength)
Deprecated. Since 3.0.0, use TruncateTagger(TextMatcher, int).
Constructor.
- Parameters:
fromField - field to truncate
maxLength - truncation length
-
TruncateTagger
public TruncateTagger(TextMatcher fieldMatcher, int maxLength)
Constructor.
- Parameters:
fieldMatcher - field matcher
maxLength - truncation length
- Since:
- 3.0.0
-
-
Method Detail
-
getToField
public String getToField()
-
setToField
public void setToField(String keepToField)
-
isOverwrite
@Deprecated public boolean isOverwrite()
Deprecated. Since 3.0.0, use getOnSet().
Gets whether existing value for the same field should be overwritten.
- Returns:
true if overwriting existing value.
-
setOverwrite
@Deprecated public void setOverwrite(boolean overwrite)
Deprecated. Since 3.0.0, use setOnSet(PropertySetter).
Sets whether existing value for the same field should be overwritten.
- Parameters:
overwrite - true if overwriting existing value.
-
getOnSet
public PropertySetter getOnSet()
Gets the property setter to use when a value is set.
- Returns:
- property setter
- Since:
- 3.0.0
-
setOnSet
public void setOnSet(PropertySetter onSet)
Sets the property setter to use when a value is set.
- Parameters:
onSet - property setter
- Since:
- 3.0.0
-
isAppendHash
public boolean isAppendHash()
-
setAppendHash
public void setAppendHash(boolean appendHash)
-
getSuffix
public String getSuffix()
-
setSuffix
public void setSuffix(String suffix)
-
getMaxLength
public int getMaxLength()
-
setMaxLength
public void setMaxLength(int maxLength)
-
getFieldMatcher
public TextMatcher getFieldMatcher()
Gets field matcher for fields to truncate.
- Returns:
- field matcher
- Since:
- 3.0.0
-
setFieldMatcher
public void setFieldMatcher(TextMatcher fieldMatcher)
Sets the field matcher for fields to truncate.
- Parameters:
fieldMatcher - field matcher
- Since:
- 3.0.0
-
getFromField
@Deprecated public String getFromField()
Deprecated. Since 3.0.0, use getFieldMatcher() instead.
Gets the from field.
- Returns:
- from field
-
setFromField
@Deprecated public void setFromField(String fromField)
Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher) instead.
Sets the from field.
- Parameters:
fromField - from field.
-
tagApplicableDocument
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
- Specified by:
tagApplicableDocument in class AbstractDocumentTagger
- Throws:
ImporterHandlerException
-
loadHandlerFromXML
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by:
loadHandlerFromXML in class AbstractImporterHandler
- Parameters:
xml - XML configuration
-
saveHandlerToXML
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by:
saveHandlerToXML in class AbstractImporterHandler
- Parameters:
xml - the XML
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides:
toString in class AbstractImporterHandler
-
-