Class MergeTagger
- java.lang.Object
- com.norconex.importer.handler.AbstractImporterHandler
- com.norconex.importer.handler.tagger.AbstractDocumentTagger
- com.norconex.importer.handler.tagger.impl.MergeTagger
- All Implemented Interfaces: IXMLConfigurable, IImporterHandler, IDocumentTagger
public class MergeTagger extends AbstractDocumentTagger
Merge multiple metadata fields into a single one.
Use fromFields to list all fields to merge, separated by commas. Use fromFieldsRegex to match fields to merge using a regular expression. Both fromFields and fromFieldsRegex can be used together. Matching fields from both will be combined, in the order provided/matched, starting with fromFields entries.
Unless singleValue is set to true, each value is added to the target field, making it a multi-value field. If singleValue is set to true, all values are combined into one string, optionally separated by the singleValueSeparator. If no separator is specified, the values are joined without one.
You can optionally delete the source fields after they are merged by setting deleteFromFields to true.
The target field can be one of the "from" fields. In that case its content is replaced with the result of the merge (it is not deleted even if deleteFromFields is true).
If only a single source field is specified or found, it is copied to the target field, and its multiple values are still merged into a single one if configured to do so. In such cases, this class can be an alternative to using ForceSingleValueTagger with a "mergeWith" action.
Can be used as either a pre-parse or a post-parse handler.
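The merge rules described above can be illustrated with a small, self-contained sketch. It does not use the Norconex API: the class MergeSketch, its merge method, and the plain Map used as metadata are hypothetical stand-ins, and overwriting an existing target field is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
class MergeSketch {

    static void merge(Map<String, List<String>> metadata,
            List<String> fromFields, String fromFieldsRegex, String toField,
            boolean deleteFromFields, boolean singleValue, String separator) {

        // Explicitly listed fields come first, then regex matches.
        List<String> sources = new ArrayList<>();
        for (String name : fromFields) {
            if (metadata.containsKey(name) && !sources.contains(name)) {
                sources.add(name);
            }
        }
        if (fromFieldsRegex != null) {
            Pattern pattern = Pattern.compile(fromFieldsRegex);
            for (String name : metadata.keySet()) {
                if (pattern.matcher(name).matches()
                        && !sources.contains(name)) {
                    sources.add(name);
                }
            }
        }
        if (sources.isEmpty()) {
            return; // assumption: nothing matched, leave metadata untouched
        }

        // Gather all values in source order.
        List<String> values = new ArrayList<>();
        for (String name : sources) {
            values.addAll(metadata.get(name));
        }

        // Optionally drop the source fields before writing the target.
        if (deleteFromFields) {
            for (String name : sources) {
                metadata.remove(name);
            }
        }

        // Single value: join everything; no separator means plain concatenation.
        List<String> merged = singleValue
                ? List.of(String.join(separator == null ? "" : separator, values))
                : values;

        // Assumption: the target is overwritten, so a target that was also a
        // source survives even when deleteFromFields is true.
        metadata.put(toField, new ArrayList<>(merged));
    }
}
```

Calling MergeSketch.merge with fromFields "title", the regex dc\..*, target "title", deleteFromFields and singleValue both true, and "," as separator turns title=[A] and dc.title=[B, C] into a single title field holding "A,B,C".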
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.MergeTagger">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <!-- multiple merge tags allowed -->
  <merge toField="(name of target field for merged values)"
      deleteFromFields="[false|true]"
      singleValue="[false|true]"
      singleValueSeparator="(text joining multiple-values)">
    <fieldMatcher>(one or more matching fields to merge)</fieldMatcher>
  </merge>
</handler>
XML usage example:
<handler class="MergeTagger">
  <merge toField="title" deleteFromFields="true"
      singleValue="true" singleValueSeparator=",">
    <fieldMatcher method="regex">
      (title|dc.title|dc:title|doctitle)
    </fieldMatcher>
  </merge>
</handler>
The example above merges several title fields into one, joining multiple occurrences with a comma and deleting the original fields.
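To see which field names the expression in the example would select, the following sketch applies the same pattern with plain java.util.regex. It assumes the field matcher compares the expression against the whole field name; the class TitleFieldRegexSketch and its method are hypothetical and not part of the Norconex API.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of the Norconex Importer API.
class TitleFieldRegexSketch {

    // Same expression as in the XML usage example.
    private static final Pattern TITLE_FIELDS =
            Pattern.compile("(title|dc.title|dc:title|doctitle)");

    // Returns the field names that match the expression in full.
    static List<String> matchingFields(List<String> fieldNames) {
        return fieldNames.stream()
                .filter(name -> TITLE_FIELDS.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```

Under that assumption, "subtitle" is not selected because only whole names match. The unescaped dot in dc.title matches any single character, so a name such as "dcXtitle" would also be selected; writing dc\.title restricts the match to a literal dot.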
- Since: 2.7.0
- Author: Pascal Essiembre
-
Nested Class Summary
- static class MergeTagger.Merge
-
Constructor Summary
- MergeTagger()
-
Method Summary
- void addMerge(MergeTagger.Merge merge)
- boolean equals(Object other)
- List<MergeTagger.Merge> getMerges()
- int hashCode()
- protected void loadHandlerFromXML(XML xml): Loads configuration settings specific to the implementing class.
- protected void saveHandlerToXML(XML xml): Saves configuration settings specific to the implementing class.
- void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState)
- String toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
tagApplicableDocument
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
- Specified by: tagApplicableDocument in class AbstractDocumentTagger
- Throws: ImporterHandlerException
-
getMerges
public List<MergeTagger.Merge> getMerges()
-
addMerge
public void addMerge(MergeTagger.Merge merge)
-
loadHandlerFromXML
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by: loadHandlerFromXML in class AbstractImporterHandler
- Parameters: xml - XML configuration
-
saveHandlerToXML
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by: saveHandlerToXML in class AbstractImporterHandler
- Parameters: xml - the XML
-
equals
public boolean equals(Object other)
- Overrides: equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides: hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides: toString in class AbstractImporterHandler