public class MergeTagger extends AbstractDocumentTagger
Merge multiple metadata fields into a single one.
Use fromFields to list all fields to merge, separated by commas. Use fromFieldsRegex to match fields to merge using a regular expression. Both fromFields and fromFieldsRegex can be used together. Matching fields from both will be combined, in the order provided/matched, starting with fromFields entries.
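The selection order can be illustrated with a small, self-contained sketch. It does not use the Norconex API: FieldSelectionSketch and selectFields are hypothetical names, and a plain map stands in for the document metadata. It also assumes that the regular expression must match the whole field name and that a field matched by both settings is listed only once; this page confirms neither detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the MergeTagger implementation.
public class FieldSelectionSketch {

    // Returns the names of the source fields to merge: explicit
    // fromFields entries first, then fields matching fromFieldsRegex.
    static List<String> selectFields(
            Map<String, List<String>> metadata,
            List<String> fromFields, String fromFieldsRegex) {
        // A LinkedHashSet keeps insertion order and drops duplicates
        // (assumption: a field is merged at most once).
        Set<String> selected = new LinkedHashSet<>();
        for (String field : fromFields) {
            if (metadata.containsKey(field)) {
                selected.add(field);
            }
        }
        if (fromFieldsRegex != null) {
            Pattern pattern = Pattern.compile(fromFieldsRegex);
            for (String field : metadata.keySet()) {
                // Assumption: the whole field name must match.
                if (pattern.matcher(field).matches()) {
                    selected.add(field);
                }
            }
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("dc.title", Arrays.asList("DC title"));
        metadata.put("doctitle", Arrays.asList("Doc title"));
        metadata.put("title", Arrays.asList("Plain title"));

        List<String> fields = selectFields(
                metadata, Arrays.asList("title"), "dc\\..*|doc.*");
        System.out.println(fields); // [title, dc.title, doctitle]
    }
}
```

Here "title" comes first because it is listed in fromFields, even though the other two fields appear earlier in the metadata.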
Unless singleValue is set to true, each value will be added to the target field, making it a multi-value field. If singleValue is set to true, all values will be combined into one string, optionally separated by the singleValueSeparator. Single values will be constructed without any separator if none is specified.
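As a rough illustration of the difference between the two modes, the sketch below combines values with plain Java collections. ValueCombinationSketch and combine are hypothetical names chosen for this example; the code mirrors the behavior described above and is not taken from the MergeTagger source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the MergeTagger implementation.
public class ValueCombinationSketch {

    // Combines the values of all source fields, in order.
    static List<String> combine(List<List<String>> sourceValues,
            boolean singleValue, String singleValueSeparator) {
        List<String> all = new ArrayList<>();
        for (List<String> values : sourceValues) {
            all.addAll(values);
        }
        if (!singleValue) {
            // Multi-value target: every value is kept separately.
            return all;
        }
        // Single-value target: join everything into one string,
        // with no separator when none is specified.
        String separator =
                singleValueSeparator == null ? "" : singleValueSeparator;
        return Collections.singletonList(String.join(separator, all));
    }

    public static void main(String[] args) {
        List<List<String>> values = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("C"));
        System.out.println(combine(values, false, null)); // [A, B, C]
        System.out.println(combine(values, true, ","));   // [A,B,C]
        System.out.println(combine(values, true, null));  // [ABC]
    }
}
```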
You can optionally delete source fields after they are merged by setting deleteFromFields to true.
The target field can be one of the "from" fields. In that case its content will be replaced with the result of the merge (it will not be deleted even if deleteFromFields is true).
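The interaction between the target field and source deletion can be sketched as follows. This is a simplified model, assuming a plain map in place of the document metadata and a multi-value target; DeleteSourcesSketch and merge are hypothetical names, not part of the Norconex API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the MergeTagger implementation.
public class DeleteSourcesSketch {

    // Merges the values of fromFields into toField, optionally
    // deleting the source fields afterwards.
    static void merge(Map<String, List<String>> metadata,
            List<String> fromFields, String toField,
            boolean deleteFromFields) {
        List<String> merged = new ArrayList<>();
        for (String field : fromFields) {
            List<String> values = metadata.get(field);
            if (values != null) {
                merged.addAll(values);
            }
        }
        if (deleteFromFields) {
            for (String field : fromFields) {
                // The target is never deleted, even when it is
                // also one of the source fields.
                if (!field.equals(toField)) {
                    metadata.remove(field);
                }
            }
        }
        // The previous content of the target is replaced.
        metadata.put(toField, merged);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", new ArrayList<>(Arrays.asList("Old")));
        metadata.put("dc.title", new ArrayList<>(Arrays.asList("New")));

        merge(metadata, Arrays.asList("title", "dc.title"), "title", true);
        System.out.println(metadata); // {title=[Old, New]}
    }
}
```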
If only a single source field is specified or found, it will be copied to the target field, and its multiple values will still be merged into a single one if configured to do so. In such cases, this class can be an alternative to using ForceSingleValueTagger with a "mergeWith" action.
Can be used as either a pre-parse or a post-parse handler.
<tagger class="com.norconex.importer.handler.tagger.impl.MergeTagger">
  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <merge toField="(name of target field for merged values)"
      deleteFromFields="[false|true]"
      singleValue="[false|true]"
      singleValueSeparator="(text joining multiple values)">
    <fromFields>(comma-separated list of fields to merge)</fromFields>
    <fromFieldsRegex>(regular expression matching fields to merge)</fromFieldsRegex>
  </merge>
  <!-- multiple merge tags allowed -->
</tagger>
The following merges several title fields into one, joining multiple occurrences with a comma, and deleting the original fields.
<tagger class="com.norconex.importer.handler.tagger.impl.MergeTagger">
  <merge toField="title" deleteFromFields="true"
      singleValue="true" singleValueSeparator=",">
    <fromFields>title,dc.title,dc:title,doctitle</fromFields>
  </merge>
</tagger>
Modifier and Type | Class and Description |
---|---|
static class | MergeTagger.Merge |
Constructor and Description |
---|
MergeTagger() |
Modifier and Type | Method and Description |
---|---|
void | addMerge(MergeTagger.Merge merge) |
boolean | equals(Object other) |
List<MergeTagger.Merge> | getMerges() |
int | hashCode() |
protected void | loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
void | tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) |
String | toString() |
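Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML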
public void tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Specified by: tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException
public List<MergeTagger.Merge> getMerges()
public void addMerge(MergeTagger.Merge merge)
protected void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - xml configuration
Throws: IOException - could not load from XML
protected void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
public String toString()
Overrides: toString in class AbstractImporterHandler
Copyright © 2009–2021 Norconex Inc. All rights reserved.