Class KeepOnlyTagger
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.tagger.AbstractDocumentTagger
      - com.norconex.importer.handler.tagger.impl.KeepOnlyTagger
- All Implemented Interfaces: IXMLConfigurable, IImporterHandler, IDocumentTagger
public class KeepOnlyTagger extends AbstractDocumentTagger
Keeps only the specified metadata fields and deletes all others. The fields to keep can be given as exact field names (case-insensitive) or as a regular expression that matches one or more fields (since 2.1.0).
Note: Unless you have a good reason to do otherwise, run this handler as one of the last ones. Doing so ensures that all metadata fields remain available to other handlers that may need them, even if those fields are not kept in the end.
Can be used as either a pre-parse or a post-parse handler.
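The ordering note above can be illustrated with a minimal, self-contained sketch. It does not use the Norconex API: the `HandlerOrderSketch` class, the `keepOnly` and `addByline` stand-in handlers, and the `byline` field are all hypothetical, and metadata is reduced to a plain `Map`. It only shows why a handler that depends on a field should run before the fields are pruned.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

public class HandlerOrderSketch {

    // Stand-in for a keep-only handler: drops every field not in the given set.
    static Consumer<Map<String, String>> keepOnly(Set<String> fields) {
        return metadata -> metadata.keySet().retainAll(fields);
    }

    // Hypothetical handler that derives a "byline" field from "author".
    static Consumer<Map<String, String>> addByline() {
        return metadata -> {
            String author = metadata.get("author");
            if (author != null) {
                metadata.put("byline", "By " + author);
            }
        };
    }

    // Runs the handlers in order against a small sample document.
    static Map<String, String> run(List<Consumer<Map<String, String>>> handlers) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("title", "Annual report");
        metadata.put("author", "Jane Doe");
        handlers.forEach(handler -> handler.accept(metadata));
        return metadata;
    }

    public static void main(String[] args) {
        Set<String> keep = Set.of("title", "byline");

        // Pruning first removes "author", so the byline is never created.
        System.out.println(
                run(List.of(keepOnly(keep), addByline())).containsKey("byline")); // false

        // Pruning last lets the byline handler see "author" first.
        System.out.println(
                run(List.of(addByline(), keepOnly(keep))).containsKey("byline")); // true
    }
}
```

In the second ordering, `author` is still removed at the end, but only after the handler that needed it has run.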
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.KeepOnlyTagger">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <fieldMatcher>(one or more matching fields to keep)</fieldMatcher>
</handler>

XML usage example:

<handler class="KeepOnlyTagger">
  <fieldMatcher method="regex">
    (title|description)
  </fieldMatcher>
</handler>

The above example keeps only the title and description fields from all extracted fields.
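The effect of the XML example can be approximated in plain Java. This is a minimal sketch of the keep-only semantics, not the handler's actual implementation: the `KeepOnlySketch` class and its `keepOnly` helper are hypothetical, metadata is modeled as a `Map` of field names to values, and the sketch assumes the expression must match the whole field name, ignoring case.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class KeepOnlySketch {

    // Removes every entry whose field name does not fully match the regex.
    // Case-insensitive, full-name matching is an assumption of this sketch.
    static void keepOnly(Map<String, List<String>> metadata, String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        metadata.keySet().removeIf(name -> !pattern.matcher(name).matches());
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", List.of("Annual report"));
        metadata.put("description", List.of("Summary of the year"));
        metadata.put("author", List.of("Jane Doe"));
        metadata.put("Content-Type", List.of("text/html"));

        keepOnly(metadata, "(title|description)");

        System.out.println(metadata.keySet()); // [title, description]
    }
}
```

Because the whole name must match, a field such as `subtitle` would be dropped by this sketch even though it contains `title`.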
- Author: Pascal Essiembre
- See Also: Pattern
-
Constructor Summary
Constructor | Description
KeepOnlyTagger() |
-
Method Summary
Modifier and Type | Method | Description
void | addField(String field) | Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher)
boolean | equals(Object other) |
TextMatcher | getFieldMatcher() | Gets field matcher.
List<String> | getFields() | Deprecated. Since 3.0.0, use getFieldMatcher()
String | getFieldsRegex() | Deprecated. Since 3.0.0, use getFieldMatcher()
int | hashCode() |
protected void | loadHandlerFromXML(XML xml) | Loads configuration settings specific to the implementing class.
void | removeField(String field) | Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher)
protected void | saveHandlerToXML(XML xml) | Saves configuration settings specific to the implementing class.
void | setFieldMatcher(TextMatcher fieldMatcher) | Sets field matcher.
void | setFieldsRegex(String fieldsRegex) | Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher)
void | tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) |
String | toString() |
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
tagApplicableDocument
public void tagApplicableDocument(HandlerDoc doc, InputStream document, ParseState parseState) throws ImporterHandlerException
- Specified by: tagApplicableDocument in class AbstractDocumentTagger
- Throws: ImporterHandlerException
-
getFieldMatcher
public TextMatcher getFieldMatcher()
Gets field matcher.
- Returns: field matcher
- Since: 3.0.0
-
setFieldMatcher
public void setFieldMatcher(TextMatcher fieldMatcher)
Sets field matcher.
- Parameters: fieldMatcher - field matcher
- Since: 3.0.0
-
getFields
@Deprecated public List<String> getFields()
Deprecated. Since 3.0.0, use getFieldMatcher()
Gets the pattern for fields to keep, as the first element of the returned list.
- Returns: fields to keep
-
addField
@Deprecated public void addField(String field)
Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher)
Adds the pattern for fields to keep.
- Parameters: field - fields to add
-
removeField
@Deprecated public void removeField(String field)
Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher)
Does nothing.
- Parameters: field - field to keep
-
getFieldsRegex
@Deprecated public String getFieldsRegex()
Deprecated. Since 3.0.0, use getFieldMatcher()
Gets field matcher pattern.
- Returns: field matcher pattern
-
setFieldsRegex
@Deprecated public void setFieldsRegex(String fieldsRegex)
Deprecated. Since 3.0.0, use setFieldMatcher(TextMatcher)
Sets field matcher pattern.
- Parameters: fieldsRegex - field matcher pattern
-
loadHandlerFromXML
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by: loadHandlerFromXML in class AbstractImporterHandler
- Parameters: xml - XML configuration
-
saveHandlerToXML
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by: saveHandlerToXML in class AbstractImporterHandler
- Parameters: xml - the XML
-
equals
public boolean equals(Object other)
- Overrides: equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides: hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides: toString in class AbstractImporterHandler