public class KeepOnlyTagger extends AbstractDocumentTagger
Keeps only the metadata fields provided and deletes all others. You can specify exact field names (case-insensitive) to keep, a regular expression matching one or more field names (since 2.1.0), or both.
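The matching rule described above can be sketched in plain Java. This is a minimal illustration, not the tagger's actual implementation. It assumes metadata is modeled as a Map<String, List<String>>, that exact names are compared with equalsIgnoreCase, and that the regular expression must match the whole field name (Matcher.matches()). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class KeepOnlySketch {

    // Hypothetical helper: returns a copy of the metadata that retains only
    // fields named in keepNames (case-insensitive) or fully matching keepRegex.
    static Map<String, List<String>> keepOnly(Map<String, List<String>> metadata,
                                              List<String> keepNames,
                                              String keepRegex) {
        Pattern pattern = (keepRegex == null || keepRegex.isEmpty())
                ? null
                : Pattern.compile(keepRegex);
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String name = entry.getKey();
            boolean byName = keepNames.stream().anyMatch(name::equalsIgnoreCase);
            boolean byRegex = pattern != null && pattern.matcher(name).matches();
            if (byName || byRegex) {
                kept.put(name, new ArrayList<>(entry.getValue()));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("Title", List.of("Hello"));
        metadata.put("description", List.of("A sample page"));
        metadata.put("X-Custom-A", List.of("1"));
        metadata.put("Content-Type", List.of("text/html"));

        Map<String, List<String>> result =
                keepOnly(metadata, List.of("title", "description"), "X-Custom-.*");
        System.out.println(result.keySet()); // prints [Title, description, X-Custom-A]
    }
}
```

"Title" survives because name matching ignores case. "X-Custom-A" survives through the regular expression, and "Content-Type" matches neither rule, so it is dropped.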
Note: Unless you have good reasons to do otherwise, make this handler one of the last ones to execute. That way all metadata fields remain available to any other handlers that need them, including fields you do not otherwise want to keep.
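A toy pipeline can show why the order matters. In this hedged sketch, handlers are modeled as hypothetical UnaryOperator functions over a Map<String, String>. DERIVE_SUMMARY stands in for any handler that reads a field (here "description") that the keep-only step would delete. None of these names come from the Norconex API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class HandlerOrderSketch {

    // Hypothetical handler: derives "summary" from "description" when present.
    static final UnaryOperator<Map<String, String>> DERIVE_SUMMARY = meta -> {
        Map<String, String> out = new LinkedHashMap<>(meta);
        String desc = meta.get("description");
        if (desc != null) {
            out.put("summary", desc.length() > 10 ? desc.substring(0, 10) : desc);
        }
        return out;
    };

    // Hypothetical keep-only step that retains only "title" and "summary".
    static final UnaryOperator<Map<String, String>> KEEP_ONLY = meta -> {
        Map<String, String> out = new LinkedHashMap<>();
        for (String field : List.of("title", "summary")) {
            if (meta.containsKey(field)) {
                out.put(field, meta.get(field));
            }
        }
        return out;
    };

    // Applies the handlers in the given order, feeding each one's output to the next.
    static Map<String, String> run(Map<String, String> meta,
                                   List<UnaryOperator<Map<String, String>>> handlers) {
        Map<String, String> current = meta;
        for (UnaryOperator<Map<String, String>> handler : handlers) {
            current = handler.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> meta = Map.of("title", "Doc", "description", "A long description");
        System.out.println(run(meta, List.of(KEEP_ONLY, DERIVE_SUMMARY))); // prints {title=Doc}
        System.out.println(run(meta, List.of(DERIVE_SUMMARY, KEEP_ONLY))); // prints {title=Doc, summary=A long des}
    }
}
```

When the keep-only step runs first, "description" is already gone and no summary can be derived. Running it last keeps the final output just as narrow while still letting earlier handlers use every field.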
Can be used as either a pre-parse or a post-parse handler.
<tagger class="com.norconex.importer.handler.tagger.impl.KeepOnlyTagger">
    <restrictTo caseSensitive="[false|true]"
            field="(name of header/metadata field to match)">
        (regular expression of value to match)
    </restrictTo>
    <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
    <fields>(comma-separated list of fields to keep)</fields>
    <fieldsRegex>(regular expression matching fields to keep)</fieldsRegex>
</tagger>
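The "only one needs to match" rule for multiple restrictTo elements can be sketched as an any-of check. This is a hypothetical model, not the library's code. It assumes that caseSensitive applies to matching the value regular expression, that the expression must match a whole value, that field names are looked up exactly, and that a handler with no restrictions applies to every document.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RestrictToSketch {

    // Hypothetical stand-in for one <restrictTo> element.
    static final class Restriction {
        final String field;
        final Pattern valuePattern;

        Restriction(String field, String valueRegex, boolean caseSensitive) {
            this.field = field;
            // Assumption: caseSensitive controls how the value regex is matched.
            int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
            this.valuePattern = Pattern.compile(valueRegex, flags);
        }

        // True if any value of the field fully matches the regex.
        boolean matches(Map<String, List<String>> metadata) {
            return metadata.getOrDefault(field, List.of()).stream()
                    .anyMatch(value -> valuePattern.matcher(value).matches());
        }
    }

    // Applicable when no restriction is configured, or when at least one matches.
    static boolean isApplicable(Map<String, List<String>> metadata,
                                List<Restriction> restrictions) {
        return restrictions.isEmpty()
                || restrictions.stream().anyMatch(r -> r.matches(metadata));
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = Map.of("Content-Type", List.of("text/html"));
        List<Restriction> restrictions = List.of(
                new Restriction("Content-Type", "application/pdf", false),
                new Restriction("Content-Type", "TEXT/HTML", false));
        System.out.println(isApplicable(metadata, restrictions)); // prints true
    }
}
```

The first restriction fails, but the second matches when case is ignored, so the handler applies.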
The following example keeps only the "title" and "description" fields and deletes all other extracted fields.
<tagger class="com.norconex.importer.handler.tagger.impl.KeepOnlyTagger">
    <fields>title, description</fields>
</tagger>
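The example's value "title, description" contains a space after the comma, so whatever reads it has to trim each name. Below is a minimal sketch that reads the fields element with the JDK's DOM parser and splits the list. The trimming and the skipping of empty entries are assumptions about how the list should be interpreted, and the class is hypothetical. It is not the tagger's own loadHandlerFromXML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class FieldsConfigSketch {

    // Splits a comma-separated <fields> value into trimmed, non-empty names.
    static List<String> parseFields(String csv) {
        List<String> names = new ArrayList<>();
        if (csv == null) {
            return names;
        }
        for (String part : csv.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Reads the first <fields> element from a tagger XML snippet (JDK DOM parser).
    static List<String> readFields(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("fields");
        if (nodes.getLength() == 0) {
            return new ArrayList<>();
        }
        return parseFields(nodes.item(0).getTextContent());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tagger class=\"com.norconex.importer.handler.tagger.impl.KeepOnlyTagger\">"
                + "<fields>title, description</fields>"
                + "</tagger>";
        System.out.println(readFields(xml)); // prints [title, description]
    }
}
```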
See Also: Pattern
| Constructor and Description |
|---|
| KeepOnlyTagger() |
| Modifier and Type | Method and Description |
|---|---|
| void | addField(String field) |
| boolean | equals(Object other) |
| List<String> | getFields() |
| String | getFieldsRegex() |
| int | hashCode() |
| protected void | loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml): Loads configuration settings specific to the implementing class. |
| void | removeField(String field) |
| protected void | saveHandlerToXML(EnhancedXMLStreamWriter writer): Saves configuration settings specific to the implementing class. |
| void | setFieldsRegex(String fieldsRegex) |
| void | tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) |
| String | toString() |
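Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML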
public void tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Specified by: tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException
public void addField(String field)
public void removeField(String field)
public String getFieldsRegex()
public void setFieldsRegex(String fieldsRegex)
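The accessors above, together with the equals and hashCode methods listed in the method summary, suggest a simple value-style configuration holder. The class below is a hypothetical stand-in written for illustration, not Norconex's source. It assumes that addField appends a name, that removeField removes the first matching name, and that equality compares both the field list and the regular expression. It does not model the restrictTo settings inherited from AbstractImporterHandler.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class KeepOnlyConfigSketch {

    private final List<String> fields = new ArrayList<>();
    private String fieldsRegex;

    // Adds one exact field name to keep.
    public void addField(String field) {
        fields.add(field);
    }

    // Removes the first occurrence of the given field name, if present.
    public void removeField(String field) {
        fields.remove(field);
    }

    // Returns a read-only view of the configured field names.
    public List<String> getFields() {
        return Collections.unmodifiableList(fields);
    }

    public String getFieldsRegex() {
        return fieldsRegex;
    }

    public void setFieldsRegex(String fieldsRegex) {
        this.fieldsRegex = fieldsRegex;
    }

    // Value-based equality over both settings.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof KeepOnlyConfigSketch)) {
            return false;
        }
        KeepOnlyConfigSketch that = (KeepOnlyConfigSketch) other;
        return fields.equals(that.fields) && Objects.equals(fieldsRegex, that.fieldsRegex);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fields, fieldsRegex);
    }

    public static void main(String[] args) {
        KeepOnlyConfigSketch config = new KeepOnlyConfigSketch();
        config.addField("title");
        config.addField("description");
        config.addField("keywords");
        config.removeField("keywords");
        config.setFieldsRegex("X-Custom-.*");
        System.out.println(config.getFields());      // prints [title, description]
        System.out.println(config.getFieldsRegex()); // prints X-Custom-.*
    }
}
```

Returning an unmodifiable view from getFields keeps callers from bypassing addField and removeField. This is a design choice of the sketch, not something the original page documents.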
protected void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - XML configuration
Throws: IOException - could not load from XML
protected void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
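Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: writer - the XML writer
Throws: XMLStreamException - could not save to XML
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler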
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
public String toString()
Overrides: toString in class AbstractImporterHandler
Copyright © 2009–2021 Norconex Inc. All rights reserved.