public class MergeTagger extends AbstractDocumentTagger
Merge multiple metadata fields into a single one. Use fromFields to list all fields to merge, separated by commas. Use fromFieldsRegex to match fields to merge using a regular expression. Both fromFields and fromFieldsRegex can be used together. Matching fields from both will be combined, in the order provided/matched, starting with fromFields entries.
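The ordering rule above can be illustrated with a minimal, library-independent sketch. It is not the MergeTagger implementation: the class `FieldSelectionSketch` and its `selectFields` method are hypothetical names, and two behaviours are assumptions made for illustration only (the regular expression must match the whole field name, and a field selected twice is kept once).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
final class FieldSelectionSketch {

    // Returns the source fields to merge: explicitly listed fields first,
    // then fields matched by the regular expression.
    static List<String> selectFields(
            List<String> documentFields,
            List<String> fromFields,
            String fromFieldsRegex) {
        // LinkedHashSet preserves insertion order and drops duplicates
        // (de-duplication is an assumption of this sketch).
        Set<String> selected = new LinkedHashSet<>();

        // 1. Explicit entries, in the order provided, when present.
        for (String name : fromFields) {
            if (documentFields.contains(name)) {
                selected.add(name);
            }
        }

        // 2. Regex matches, in the order the document lists its fields.
        if (fromFieldsRegex != null) {
            Pattern pattern = Pattern.compile(fromFieldsRegex);
            for (String name : documentFields) {
                if (pattern.matcher(name).matches()) {
                    selected.add(name);
                }
            }
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        List<String> fields = List.of("dc.title", "author", "title", "doctitle");
        // prints [title, dc.title, doctitle]
        System.out.println(
                selectFields(fields, List.of("title"), "(dc\\.title|doctitle)"));
    }
}
```

The explicit entry `title` comes first even though the document lists `dc.title` before it, which mirrors the "starting with fromFields entries" rule.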
Unless singleValue is set to true, each value will be added to the target field, making it a multi-value field. If singleValue is set to true, all values will be combined into one string, optionally separated by the singleValueSeparator. Single values will be constructed without any separator if none is specified.
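As a rough illustration of the two modes described above, the following sketch combines values from several source fields. `MergeValuesSketch` and `mergeValues` are hypothetical names, the metadata is modelled as a plain `Map`, and returning an empty list when there is nothing to merge is an assumption of the sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Norconex Importer API.
final class MergeValuesSketch {

    // Collects the values of all source fields, then either keeps them as
    // separate values or joins them into one string.
    static List<String> mergeValues(
            Map<String, List<String>> metadata,
            List<String> fromFields,
            boolean singleValue,
            String singleValueSeparator) {
        List<String> merged = new ArrayList<>();
        for (String field : fromFields) {
            merged.addAll(metadata.getOrDefault(field, List.of()));
        }
        // Multi-value mode (or nothing to join): return values as they are.
        if (!singleValue || merged.isEmpty()) {
            return merged;
        }
        // Single-value mode: no separator specified means plain concatenation.
        String separator =
                singleValueSeparator == null ? "" : singleValueSeparator;
        return List.of(String.join(separator, merged));
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = Map.of(
                "title", List.of("A", "B"),
                "dc.title", List.of("C"));
        List<String> from = List.of("title", "dc.title");

        System.out.println(mergeValues(metadata, from, false, null)); // [A, B, C]
        System.out.println(mergeValues(metadata, from, true, "|"));   // [A|B|C]
        System.out.println(mergeValues(metadata, from, true, null));  // [ABC]
    }
}
```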
You can optionally delete source fields after they have been merged by setting deleteFromFields to true.
The target field can be one of the "from" fields. In that case, its content will be replaced with the result of the merge (it will not be deleted even if deleteFromFields is true).
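The interaction between deleteFromFields and a target that is also a source field might look like the sketch below. `DeleteFromFieldsSketch` and its `merge` method are hypothetical, and the sketch always overwrites the target with the merged values, which matches the documented behaviour only for the case shown (target is one of the "from" fields); treat the other case as an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Norconex Importer API.
final class DeleteFromFieldsSketch {

    // Merges the source fields into toField, optionally deleting the sources.
    static void merge(
            Map<String, List<String>> metadata,
            List<String> fromFields,
            String toField,
            boolean deleteFromFields) {
        // Gather values before anything is removed.
        List<String> merged = new ArrayList<>();
        for (String field : fromFields) {
            merged.addAll(metadata.getOrDefault(field, List.of()));
        }
        // Delete source fields, but never the target itself.
        if (deleteFromFields) {
            for (String field : fromFields) {
                if (!field.equals(toField)) {
                    metadata.remove(field);
                }
            }
        }
        // Replace the target content with the merge result.
        metadata.put(toField, merged);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", List.of("A"));
        metadata.put("dc.title", List.of("B"));
        metadata.put("author", List.of("X"));

        merge(metadata, List.of("title", "dc.title"), "title", true);
        // prints {title=[A, B], author=[X]}
        System.out.println(metadata);
    }
}
```

Here `title` is both a source and the target: `dc.title` is removed, while `title` survives with the merged values.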
If only a single source field is specified or found, it will be copied to the target field, and its multiple values will still be merged into a single one if configured to do so. In such cases, this class can become an alternative to using ForceSingleValueTagger with a "mergeWith" action.
Can be used as either a pre-parse or post-parse handler.
<handler
    class="com.norconex.importer.handler.tagger.impl.MergeTagger">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <!-- multiple merge tags allowed -->
  <merge
      toField="(name of target field for merged values)"
      deleteFromFields="[false|true]"
      singleValue="[false|true]"
      singleValueSeparator="(text joining multiple-values)">
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (one or more matching fields to merge)
    </fieldMatcher>
  </merge>
</handler>
<handler
    class="MergeTagger">
  <merge
      toField="title"
      deleteFromFields="true"
      singleValue="true"
      singleValueSeparator=",">
    <fieldMatcher
        method="regex">
      (title|dc.title|dc:title|doctitle)
    </fieldMatcher>
  </merge>
</handler>
The example above merges several title fields into one, joining multiple occurrences with a comma and deleting the original fields.
Modifier and Type | Class and Description |
---|---|
static class | MergeTagger.Merge |
Constructor and Description |
---|
MergeTagger() |
Modifier and Type | Method and Description |
---|---|
void | addMerge(MergeTagger.Merge merge) |
boolean | equals(Object other) |
List<MergeTagger.Merge> | getMerges() |
int | hashCode() |
protected void | loadHandlerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
String | toString() |
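Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML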
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException
public List<MergeTagger.Merge> getMerges()
public void addMerge(MergeTagger.Merge merge)
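protected void loadHandlerFromXML(XML xml)
Loads configuration settings specific to the implementing class. (Description copied from class AbstractImporterHandler.)
loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - XML configuration
protected void saveHandlerToXML(XML xml)
Saves configuration settings specific to the implementing class. (Description copied from class AbstractImporterHandler.)
saveHandlerToXML in class AbstractImporterHandler
Parameters: xml - the XML
public boolean equals(Object other)
equals in class AbstractImporterHandler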
public int hashCode()
hashCode in class AbstractImporterHandler
public String toString()
toString in class AbstractImporterHandler
Copyright © 2009–2023 Norconex Inc. All rights reserved.