public class HierarchyTagger extends AbstractDocumentTagger
Given a separator, split a field string into multiple segments representing each node of a hierarchical branch. This is useful when faceting, to find out how many documents fall under each node of a hierarchy. For example, take this hierarchical string:
/vegetable/potato/sweet
Specifying a slash (/) separator produces the following entries in the specified document metadata field:
/vegetable
/vegetable/potato
/vegetable/potato/sweet
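The expansion above can be sketched in plain Java. This is only an illustration of the idea, not the tagger's actual implementation: the class name `HierarchySketch` and the method `expand` are hypothetical, and dropping empty segments is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HierarchySketch {

    // Hypothetical helper: returns one entry per node of the branch.
    // toSeparator may be null or empty, in which case fromSeparator is reused.
    static List<String> expand(
            String value, String fromSeparator, String toSeparator) {
        String sep = (toSeparator == null || toSeparator.isEmpty())
                ? fromSeparator : toSeparator;
        List<String> nodes = new ArrayList<>();
        StringBuilder branch = new StringBuilder();
        // Pattern.quote treats the separator literally; the -1 limit keeps
        // leading and trailing empty segments so positions stay meaningful.
        String[] segments = value.split(Pattern.quote(fromSeparator), -1);
        for (int i = 0; i < segments.length; i++) {
            if (segments[i].isEmpty()) {
                continue; // assumption: empty segments are dropped
            }
            if (i > 0) {
                branch.append(sep);
            }
            branch.append(segments[i]);
            nodes.add(branch.toString());
        }
        return nodes;
    }

    public static void main(String[] args) {
        // prints [/vegetable, /vegetable/potato, /vegetable/potato/sweet]
        System.out.println(expand("/vegetable/potato/sweet", "/", null));
    }
}
```

Each returned entry is a prefix of the full branch, which is what lets a facet count documents at every level of the hierarchy.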
If no target field is specified (toField), the source field (fromField) will be used to store the resulting values. The same applies to the source and target hierarchy separators (fromSeparator and toSeparator).
Since 2.10.0, you can enable "keepEmptySegments" and specify whether "fromSeparator" is a regular expression. When a regular expression is used without a "toSeparator", the text matching the expression is kept as is and can therefore differ for each segment.
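To illustrate the regular-expression case, here is a rough sketch of how the matched separator text can be carried over unchanged into each node when no target separator is given. The class `RegexHierarchySketch`, the method `expand`, and the way empty segments are handled are assumptions made for illustration; the tagger's actual behavior may differ in edge cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexHierarchySketch {

    // Hypothetical helper: splits on a regex and re-inserts whatever text
    // the regex matched, so separators may differ from node to node.
    static List<String> expand(
            String value, String separatorRegex, boolean keepEmptySegments) {
        List<String> nodes = new ArrayList<>();
        StringBuilder branch = new StringBuilder();
        Matcher m = Pattern.compile(separatorRegex).matcher(value);
        int start = 0;
        String pendingSep = ""; // separator text preceding the next segment
        boolean more = true;
        while (more) {
            more = m.find();
            int end = more ? m.start() : value.length();
            String segment = value.substring(start, end);
            if (!segment.isEmpty() || keepEmptySegments) {
                branch.append(pendingSep).append(segment);
                if (branch.length() > 0) {
                    nodes.add(branch.toString());
                }
            }
            if (more) {
                pendingSep = m.group(); // keep the matched text as is
                start = m.end();
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // prints [vegetable, vegetable > potato, vegetable > potato/sweet]
        System.out.println(
                expand("vegetable > potato/sweet", "\\s*[>/]\\s*", false));
    }
}
```

In this sketch, `expand("a//b", "/", true)` yields `a`, `a/`, and `a//b`, while passing `false` yields only `a` and `a/b`.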
Can be used as either a pre-parse or a post-parse handler.
<tagger class="com.norconex.importer.handler.tagger.impl.HierarchyTagger">
  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <hierarchy fromField="(from field)"
      toField="(optional to field)"
      fromSeparator="(original separator)"
      toSeparator="(optional new separator)"
      overwrite="[false|true]"
      regex="[false|true]"
      keepEmptySegments="[false|true]" />
  <!-- multiple hierarchy tags allowed -->
</tagger>
The following will expand a slash-separated vegetable hierarchy found in a "vegetable" field into a "vegetableHierarchy" field.
<tagger class="com.norconex.importer.handler.tagger.impl.HierarchyTagger">
  <hierarchy fromField="vegetable" toField="vegetableHierarchy"
      fromSeparator="/"/>
</tagger>
Modifier and Type | Class and Description |
---|---|
static class | HierarchyTagger.HierarchyDetails |
Constructor and Description |
---|
HierarchyTagger() |
Modifier and Type | Method and Description |
---|---|
void | addHierarcyDetails(HierarchyTagger.HierarchyDetails details): Adds hierarchy instructions. |
void | addHierarcyDetails(String fromField, String toField, String fromSeparator, String toSeparator, boolean overwrite): Deprecated. Since 2.10.0, use addHierarcyDetails(HierarchyDetails) instead. |
boolean | equals(Object other) |
List<HierarchyTagger.HierarchyDetails> | getHierarchyDetails() |
int | hashCode() |
protected void | loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml): Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(EnhancedXMLStreamWriter writer): Saves configuration settings specific to the implementing class. |
void | tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) |
String | toString() |
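Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML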
public void tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
tagApplicableDocument in class AbstractDocumentTagger
Throws:
ImporterHandlerException

@Deprecated public void addHierarcyDetails(String fromField, String toField, String fromSeparator, String toSeparator, boolean overwrite)
Deprecated. Since 2.10.0, use addHierarcyDetails(HierarchyDetails) instead.
Parameters:
fromField - source field name
toField - optional target field name
fromSeparator - source separator
toSeparator - optional target separator
overwrite - whether to overwrite target field if it exists

public void addHierarcyDetails(HierarchyTagger.HierarchyDetails details)
Adds hierarchy instructions.
Parameters:
details - hierarchy details

public List<HierarchyTagger.HierarchyDetails> getHierarchyDetails()

protected void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class AbstractImporterHandler: Loads configuration settings specific to the implementing class.
loadHandlerFromXML in class AbstractImporterHandler
Parameters:
xml - xml configuration
Throws:
IOException - could not load from XML

protected void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class AbstractImporterHandler: Saves configuration settings specific to the implementing class.
saveHandlerToXML in class AbstractImporterHandler
Parameters:
writer - the xml writer
Throws:
XMLStreamException - could not save to XML

public boolean equals(Object other)
equals in class AbstractImporterHandler

public int hashCode()
hashCode in class AbstractImporterHandler

public String toString()
toString in class AbstractImporterHandler
Copyright © 2009–2021 Norconex Inc. All rights reserved.