Class HierarchyTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.impl.HierarchyTagger
- All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTagger
public class HierarchyTagger extends AbstractDocumentTagger
Given a separator, split a field string into multiple segments representing each node of a hierarchical branch. This is useful when faceting, to find out how many documents fall under each node of a hierarchy. For example, take this hierarchical string:
/vegetable/potato/sweet
We specify a slash (/) separator and it will produce the following entries in the specified document metadata field:
/vegetable
/vegetable/potato
/vegetable/potato/sweet
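The expansion above can be sketched in plain Java. This is a minimal illustration of the documented behavior, not the library's implementation: the class `HierarchySketch` and its `expand` method are hypothetical names, the separator is treated as a literal string, and empty segments are always dropped (as if "keepEmptySegments" were false).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HierarchySketch {

    // Hypothetical helper (not part of the Norconex API): returns one entry
    // per node of the hierarchy, from the root down to the full path.
    // Assumption: a null toSep means "reuse fromSep", mirroring the
    // documented default for toSeparator.
    static List<String> expand(String value, String fromSep, String toSep) {
        String outSep = (toSep == null) ? fromSep : toSep;
        boolean leading = value.startsWith(fromSep);
        List<String> nodes = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        // Pattern.quote makes the separator a literal, not a regex.
        for (String segment : value.split(Pattern.quote(fromSep))) {
            if (segment.isEmpty()) {
                continue; // simplification: empty segments are skipped
            }
            // Keep a leading separator if the source value had one.
            if (leading || path.length() > 0) {
                path.append(outSep);
            }
            path.append(segment);
            nodes.add(path.toString());
        }
        return nodes;
    }

    public static void main(String[] args) {
        // prints [/vegetable, /vegetable/potato, /vegetable/potato/sweet]
        System.out.println(expand("/vegetable/potato/sweet", "/", null));
    }
}
```

Each entry is a prefix of the next, which is what lets a faceting engine count documents under every node of the branch.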
If no target field is specified (toField), the source field (fromField) will be used to store the resulting values. The same applies to the source and target hierarchy separators (fromSeparator and toSeparator). You can "keepEmptySegments", as well as specify whether the "fromSeparator" is a regular expression. When using a regular expression without a "toSeparator", the text matching the expression is kept as is and thus can be different for each segment.
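To illustrate the regular-expression case without a "toSeparator", here is a rough sketch using only `java.util.regex`. The class `RegexHierarchySketch` and its `expand` method are hypothetical and only approximate the documented behavior: each node is a prefix of the original value, so whatever text matched the separator expression is carried over unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexHierarchySketch {

    // Hypothetical helper (not part of the Norconex API): the separator is a
    // regular expression and no replacement separator is supplied, so each
    // node keeps the exact text that matched.
    static List<String> expand(String value, String separatorRegex) {
        List<String> nodes = new ArrayList<>();
        Matcher m = Pattern.compile(separatorRegex).matcher(value);
        int segmentStart = 0;
        while (m.find()) {
            // Emit a node only if there is text between the previous
            // separator and this one (no node for an empty segment).
            if (m.start() > segmentStart) {
                nodes.add(value.substring(0, m.start()));
            }
            segmentStart = m.end();
        }
        // The last segment, if any, yields the full value.
        if (segmentStart < value.length()) {
            nodes.add(value);
        }
        return nodes;
    }

    public static void main(String[] args) {
        // prints [a, a-b, a-b_c]
        System.out.println(expand("a-b_c", "[-_]"));
    }
}
```

Here the first node boundary is a hyphen and the second an underscore, so the separators differ between segments exactly as they did in the source value.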
Storing values in an existing field
If a target field with the same name already exists for a document, values will be added to the end of the existing value list. It is possible to change this default behavior by supplying a PropertySetter.
Can be used as either a pre-parse or post-parse handler.
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.HierarchyTagger">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <!-- multiple hierarchy tags allowed -->
  <hierarchy fromField="(from field)"
      toField="(optional to field)"
      fromSeparator="(original separator)"
      toSeparator="(optional new separator)"
      regex="[false|true]"
      keepEmptySegments="[false|true]"/>
</handler>
XML usage example:
<handler class="HierarchyTagger">
  <hierarchy fromField="vegetable" toField="vegetableHierarchy" fromSeparator="/"/>
</handler>
The above will expand a slash-separated vegetable hierarchy found in a "vegetable" field into a "vegetableHierarchy" field.
- Since:
- 1.3.0
- Author:
- Pascal Essiembre
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class                               Description
static class         HierarchyTagger.HierarchyDetails
-
Constructor Summary
Constructors
Constructor          Description
HierarchyTagger()
-
Method Summary
Modifier and Type, Method, Description
void addHierarcyDetails(HierarchyTagger.HierarchyDetails details)
    Adds hierarchy instructions.
void addHierarcyDetails(String fromField, String toField, String fromSeparator, String toSeparator, boolean overwrite)
    Deprecated. Since 2.10.0, use addHierarcyDetails(HierarchyDetails) instead.
boolean equals(Object other)
List<HierarchyTagger.HierarchyDetails> getHierarchyDetails()
int hashCode()
protected void loadHandlerFromXML(XML xml)
    Loads configuration settings specific to the implementing class.
protected void saveHandlerToXML(XML xml)
    Saves configuration settings specific to the implementing class.
void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState)
String toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
tagApplicableDocument
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
- Specified by:
tagApplicableDocument in class AbstractDocumentTagger
- Throws:
ImporterHandlerException
-
addHierarcyDetails
@Deprecated public void addHierarcyDetails(String fromField, String toField, String fromSeparator, String toSeparator, boolean overwrite)
Deprecated. Since 2.10.0, use addHierarcyDetails(HierarchyDetails) instead.
Adds hierarchy instructions.
- Parameters:
fromField - source field name
toField - optional target field name
fromSeparator - source separator
toSeparator - optional target separator
overwrite - whether to overwrite target field if it exists
-
addHierarcyDetails
public void addHierarcyDetails(HierarchyTagger.HierarchyDetails details)
Adds hierarchy instructions.
- Parameters:
details - hierarchy details
-
getHierarchyDetails
public List<HierarchyTagger.HierarchyDetails> getHierarchyDetails()
-
loadHandlerFromXML
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by:
loadHandlerFromXML in class AbstractImporterHandler
- Parameters:
xml - XML configuration
-
saveHandlerToXML
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by:
saveHandlerToXML in class AbstractImporterHandler
- Parameters:
xml - the XML
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides:
toString in class AbstractImporterHandler
-
-