Class SplitTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.AbstractCharStreamTagger
        - com.norconex.importer.handler.tagger.impl.SplitTagger
- All Implemented Interfaces: IXMLConfigurable, IImporterHandler, IDocumentTagger
public class SplitTagger extends AbstractCharStreamTagger
Splits an existing metadata value into multiple values based on a given value separator (the separator is discarded). The "toField" argument is optional: if no "toField" is specified, the same field is used to store the splits. Duplicates are removed.
Can be used as either a pre-parse (metadata or text content) or post-parse handler.
If no "fieldMatcher" expression is specified, the document content will be used. If the "fieldMatcher" matches more than one field, they will all be split and stored in the same multi-value metadata field.
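The splitting behavior described above can be sketched with the plain JDK. This is an illustration only, not the library's implementation: the class name SplitSketch and its split method are hypothetical, and the handling of a literal separator (quoting it as a regular expression) is an assumption. It shows values from several matched fields being split, the separator discarded, and duplicates removed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Norconex Importer API.
class SplitSketch {

    /**
     * Splits each value on the separator and returns the distinct pieces
     * in first-seen order. The separator itself is discarded.
     */
    public static List<String> split(
            List<String> values, String separator, boolean regex) {
        // Assumption: a literal separator is quoted so that regex
        // metacharacters such as "|" are matched verbatim.
        Pattern pattern = Pattern.compile(
                regex ? separator : Pattern.quote(separator));
        Set<String> distinct = new LinkedHashSet<>();
        for (String value : values) {
            for (String piece : pattern.split(value)) {
                distinct.add(piece);
            }
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        // Values from two matched fields, literal separator.
        System.out.println(split(List.of("a|b|a", "b|c"), "|", false));
        // A single value, regular-expression separator.
        System.out.println(split(List.of("x , y,z"), "\\s*,\\s*", true));
    }
}
```

With the regular-expression separator, whitespace around each comma is consumed as part of the separator, so the resulting values carry no leading or trailing spaces.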
Storing values in an existing field
If a target field with the same name already exists for a document, values will be added to the end of the existing value list. It is possible to change this default behavior by supplying a PropertySetter.
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.SplitTagger"
    sourceCharset="(character encoding)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <!-- multiple split tags allowed -->
  <split toField="targetFieldName">
    <fieldMatcher>(one or more matching fields to split)</fieldMatcher>
    <separator regex="[false|true]">
      (separator value)
    </separator>
  </split>
</handler>
XML usage example:
<handler class="SplitTagger">
  <split>
    <fieldMatcher>myField</fieldMatcher>
    <separator regex="true">
      \s*,\s*
    </separator>
  </split>
</handler>
The above example splits a single value field holding a comma-separated list into multiple values.
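As a rough illustration of what this example does to a document's metadata, the sketch below applies the same separator expression using only the JDK. It is not the library's code: the AppendSketch class, the splitInto method, the Map standing in for document metadata, and the "tags" target field are all hypothetical. It assumes a target field distinct from the source field so that the default behavior described under "Storing values in an existing field" (adding to the end of the existing value list) is visible; whether duplicates are also checked against values already in the target field is not covered here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for document metadata handling.
class AppendSketch {

    /** Splits every value of fromField and appends the pieces to toField. */
    public static void splitInto(Map<String, List<String>> metadata,
            String fromField, String toField, String separatorRegex) {
        Pattern pattern = Pattern.compile(separatorRegex);
        List<String> pieces = new ArrayList<>();
        for (String value : metadata.getOrDefault(fromField, List.of())) {
            for (String piece : pattern.split(value)) {
                // Duplicates among the newly split values are dropped.
                if (!pieces.contains(piece)) {
                    pieces.add(piece);
                }
            }
        }
        // Default behavior: add to the end of any existing value list.
        metadata.computeIfAbsent(toField, k -> new ArrayList<>())
                .addAll(pieces);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        metadata.put("myField",
                new ArrayList<>(List.of("apple, banana ,cherry")));
        metadata.put("tags", new ArrayList<>(List.of("existing")));

        splitInto(metadata, "myField", "tags", "\\s*,\\s*");
        System.out.println(metadata.get("tags"));
    }
}
```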
- Since: 1.3.0
- Author: Pascal Essiembre
-
Nested Class Summary
Modifier and Type | Class | Description
static class | SplitTagger.SplitDetails |
-
Constructor Summary
Constructor | Description
SplitTagger() |
-
Method Summary
Modifier and Type | Method | Description
void | addSplit(String fromField, String separator, boolean regex) | Deprecated.
void | addSplit(String fromField, String toField, String separator, boolean regex) | Deprecated.
void | addSplitDetails(SplitTagger.SplitDetails sd) |
boolean | equals(Object other) |
List<SplitTagger.SplitDetails> | getSplitDetailsList() |
List<SplitTagger.SplitDetails> | getSplits() | Deprecated.
int | hashCode() |
protected void | loadCharStreamTaggerFromXML(XML xml) | Loads configuration settings specific to the implementing class.
void | removeSplit(String fromField) | Deprecated.
void | removeSplitDetails(String fromField) |
protected void | saveCharStreamTaggerToXML(XML xml) | Saves configuration settings specific to the implementing class.
protected void | tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) |
String | toString() |
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractCharStreamTagger
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, tagApplicableDocument
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
tagTextDocument
protected void tagTextDocument(HandlerDoc doc, Reader input, ParseState parseState) throws ImporterHandlerException
- Specified by: tagTextDocument in class AbstractCharStreamTagger
- Throws: ImporterHandlerException
-
getSplitDetailsList
public List<SplitTagger.SplitDetails> getSplitDetailsList()
-
getSplits
@Deprecated public List<SplitTagger.SplitDetails> getSplits()
Deprecated.
-
removeSplitDetails
public void removeSplitDetails(String fromField)
-
removeSplit
@Deprecated public void removeSplit(String fromField)
Deprecated.
-
addSplitDetails
public void addSplitDetails(SplitTagger.SplitDetails sd)
-
addSplit
@Deprecated public void addSplit(String fromField, String separator, boolean regex)
Deprecated.
-
addSplit
@Deprecated public void addSplit(String fromField, String toField, String separator, boolean regex)
Deprecated.
-
loadCharStreamTaggerFromXML
protected void loadCharStreamTaggerFromXML(XML xml)
Description copied from class: AbstractCharStreamTagger
Loads configuration settings specific to the implementing class.
- Specified by: loadCharStreamTaggerFromXML in class AbstractCharStreamTagger
- Parameters: xml - XML configuration
-
saveCharStreamTaggerToXML
protected void saveCharStreamTaggerToXML(XML xml)
Description copied from class: AbstractCharStreamTagger
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Specified by: saveCharStreamTaggerToXML in class AbstractCharStreamTagger
- Parameters: xml - the XML
-
equals
public boolean equals(Object other)
- Overrides: equals in class AbstractCharStreamTagger
-
hashCode
public int hashCode()
- Overrides: hashCode in class AbstractCharStreamTagger
-
toString
public String toString()
- Overrides: toString in class AbstractCharStreamTagger
-
-