public class HierarchyTagger extends AbstractDocumentTagger
Given a separator, split a field string into multiple segments representing each node of a hierarchical branch. This is useful when faceting, to find out how many documents fall under each node of a hierarchy. For example, take this hierarchical string:
/vegetable/potato/sweet
We specify a slash (/) separator and it will produce the following entries in the specified document metadata field:
/vegetable
/vegetable/potato
/vegetable/potato/sweet
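The expansion described above can be sketched in plain Java. This is a minimal illustration only, not the tagger's actual implementation: the class `HierarchySplitSketch` and its `expand` method are hypothetical names, the separator is treated as literal text, and empty segments are assumed to be skipped.

```java
import java.util.ArrayList;
import java.util.List;

class HierarchySplitSketch {

    // Returns one entry per hierarchy node, shortest path first.
    // Each entry is the prefix of the original value up to that node,
    // so separators are copied verbatim. Empty segments produce no entry.
    static List<String> expand(String value, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> paths = new ArrayList<>();
        int from = 0;
        while (from <= value.length()) {
            int idx = value.indexOf(separator, from);
            int end = idx < 0 ? value.length() : idx;
            if (end > from) {
                paths.add(value.substring(0, end));
            }
            if (idx < 0) {
                break; // no more separators: the last segment was handled
            }
            from = idx + separator.length();
        }
        return paths;
    }

    public static void main(String[] args) {
        // Prints: [/vegetable, /vegetable/potato, /vegetable/potato/sweet]
        System.out.println(expand("/vegetable/potato/sweet", "/"));
    }
}
```

Each returned entry corresponds to one node of the branch, which is what lets a facet count documents at every level of the hierarchy.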
If no target field is specified (toField), the source field (fromField) will be used to store the resulting values. The same applies to the source and target hierarchy separators (fromSeparator and toSeparator).
You can enable "keepEmptySegments" to retain empty segments, and you can specify whether "fromSeparator" is a regular expression. When using a regular expression without a "toSeparator", the text matching the expression is kept as is and can therefore differ from one segment to the next.
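To illustrate the regular-expression case, here is a small standalone sketch. It is an approximation under stated assumptions, not the library's code: `RegexHierarchySketch` and `expand` are hypothetical names, a `null` third argument stands in for an unspecified "toSeparator", and empty segments are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexHierarchySketch {

    // Expands a value whose separators match fromRegex.
    // When toSeparator is null, each matched separator is copied as is;
    // otherwise every separator is replaced by toSeparator.
    // Assumption: empty segments are dropped.
    static List<String> expand(String value, String fromRegex, String toSeparator) {
        List<String> paths = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        String pendingSeparator = "";
        Matcher m = Pattern.compile(fromRegex).matcher(value);
        int segmentStart = 0;
        while (true) {
            boolean found = m.find();
            int segmentEnd = found ? m.start() : value.length();
            String segment = value.substring(segmentStart, segmentEnd);
            if (!segment.isEmpty()) {
                path.append(pendingSeparator).append(segment);
                paths.add(path.toString());
            }
            if (!found) {
                break;
            }
            pendingSeparator = toSeparator != null ? toSeparator : m.group();
            segmentStart = m.end();
        }
        return paths;
    }

    public static void main(String[] args) {
        // Matched separators kept as is:
        // [vegetable, vegetable/potato, vegetable/potato|sweet]
        System.out.println(expand("vegetable/potato|sweet", "[/|]", null));
        // Separators normalized to ">":
        // [vegetable, vegetable>potato, vegetable>potato>sweet]
        System.out.println(expand("vegetable/potato|sweet", "[/|]", ">"));
    }
}
```

The first call shows why segments can carry different separators when no target separator is given; the second shows how a target separator makes the output uniform.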
If a target field with the same name already exists for a document, values will be added to the end of the existing value list. It is possible to change this default behavior by supplying a PropertySetter.
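The following sketch models how the four "onSet" options listed in the XML usage (append, prepend, replace, optional) might combine new values with existing ones. It does not use the real PropertySetter class: the `Strategy` enum and `apply` method are hypothetical, and the meaning given to each option, in particular "optional" as "set only when the field has no value yet", is an assumption inferred from the option names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OnSetSketch {

    // Hypothetical stand-in for the onSet options named in the XML usage.
    enum Strategy { APPEND, PREPEND, REPLACE, OPTIONAL }

    // Merges newValues into the given metadata field according to strategy.
    static void apply(Map<String, List<String>> metadata, String field,
            List<String> newValues, Strategy strategy) {
        List<String> existing = metadata.getOrDefault(field, new ArrayList<>());
        List<String> result = new ArrayList<>();
        switch (strategy) {
            case APPEND:   // default described above: add after existing values
                result.addAll(existing);
                result.addAll(newValues);
                break;
            case PREPEND:  // assumed: add before existing values
                result.addAll(newValues);
                result.addAll(existing);
                break;
            case REPLACE:  // assumed: discard existing values
                result.addAll(newValues);
                break;
            case OPTIONAL: // assumed: keep existing values if there are any
                result.addAll(existing.isEmpty() ? newValues : existing);
                break;
        }
        metadata.put(field, result);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        metadata.put("vegetableHierarchy", new ArrayList<>(Arrays.asList("/fruit")));
        apply(metadata, "vegetableHierarchy",
                Arrays.asList("/vegetable", "/vegetable/potato"), Strategy.APPEND);
        // Prints: [/fruit, /vegetable, /vegetable/potato]
        System.out.println(metadata.get("vegetableHierarchy"));
    }
}
```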
Can be used as either a pre-parse or a post-parse handler.
<handler
    class="com.norconex.importer.handler.tagger.impl.HierarchyTagger">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <!-- multiple hierarchy tags allowed -->
  <hierarchy
      fromField="(from field)"
      toField="(optional to field)"
      fromSeparator="(original separator)"
      toSeparator="(optional new separator)"
      onSet="[append|prepend|replace|optional]"
      regex="[false|true]"
      keepEmptySegments="[false|true]"/>
</handler>
<handler
    class="HierarchyTagger">
  <hierarchy
      fromField="vegetable"
      toField="vegetableHierarchy"
      fromSeparator="/"/>
</handler>
The above will expand a slash-separated vegetable hierarchy found in a "vegetable" field into a "vegetableHierarchy" field.
Modifier and Type | Class and Description |
---|---|
static class | HierarchyTagger.HierarchyDetails |
Constructor and Description |
---|
HierarchyTagger() |
Modifier and Type | Method and Description |
---|---|
void | addHierarcyDetails(HierarchyTagger.HierarchyDetails details) Adds hierarchy instructions. |
void | addHierarcyDetails(String fromField, String toField, String fromSeparator, String toSeparator, boolean overwrite) Deprecated. Since 2.10.0, use addHierarcyDetails(HierarchyDetails) instead. |
boolean | equals(Object other) |
List<HierarchyTagger.HierarchyDetails> | getHierarchyDetails() |
int | hashCode() |
protected void | loadHandlerFromXML(XML xml) Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(XML xml) Saves configuration settings specific to the implementing class. |
void | tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
String | toString() |
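Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML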
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
Specified by: tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException

@Deprecated public void addHierarcyDetails(String fromField, String toField, String fromSeparator, String toSeparator, boolean overwrite)
Deprecated. Since 2.10.0, use addHierarcyDetails(HierarchyDetails) instead.
Parameters:
fromField - source field name
toField - optional target field name
fromSeparator - source separator
toSeparator - optional target separator
overwrite - whether to overwrite the target field if it exists

public void addHierarcyDetails(HierarchyTagger.HierarchyDetails details)
Parameters:
details - hierarchy details

public List<HierarchyTagger.HierarchyDetails> getHierarchyDetails()

protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters:
xml - XML configuration

protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters:
xml - the XML

public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler

public int hashCode()
Overrides: hashCode in class AbstractImporterHandler

public String toString()
Overrides: toString in class AbstractImporterHandler
Copyright © 2009–2023 Norconex Inc. All rights reserved.