public abstract class AbstractDocumentTagger extends AbstractImporterHandler implements IDocumentTagger
Base class for taggers.
Subclasses inherit this IXMLConfigurable configuration:
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(field-matching expression)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(value-matching expression)
</valueMatcher>
</restrictTo>
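The "only one needs to match" rule for multiple restrictTo entries can be illustrated with a minimal, self-contained sketch. It does not use the Norconex classes: `RestrictToSketch`, `Restriction`, and this `isApplicable` signature are hypothetical stand-ins, only the regex method is modeled, and treating an empty restriction list as "applies to every document" is an assumption. The field name `document.contentType` is used purely for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RestrictToSketch {

    /** Hypothetical stand-in for one restrictTo entry (regex method only). */
    static class Restriction {
        private final Pattern field;
        private final Pattern value;

        Restriction(String fieldRegex, String valueRegex) {
            this.field = Pattern.compile(fieldRegex);
            this.value = Pattern.compile(valueRegex);
        }

        /** True if some field name matches and one of its values matches. */
        boolean matches(Map<String, List<String>> metadata) {
            return metadata.entrySet().stream().anyMatch(e ->
                    field.matcher(e.getKey()).matches()
                    && e.getValue().stream()
                            .anyMatch(v -> value.matcher(v).matches()));
        }
    }

    /** Assumption: no restrictions means every document is applicable. */
    static boolean isApplicable(
            List<Restriction> restrictions,
            Map<String, List<String>> metadata) {
        return restrictions.isEmpty()
                || restrictions.stream().anyMatch(r -> r.matches(metadata));
    }

    public static void main(String[] args) {
        Map<String, List<String>> html = Collections.singletonMap(
                "document.contentType", Arrays.asList("text/html"));
        // Two restrictions: the first does not match, the second does.
        List<Restriction> restrictions = Arrays.asList(
                new Restriction("document\\.contentType", "application/pdf"),
                new Restriction("document\\.contentType", "text/.*"));
        System.out.println(isApplicable(restrictions, html)); // prints "true"
    }
}
```

Here the document is accepted because the second restriction matches, even though the first one does not.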
Constructor and Description |
---|
AbstractDocumentTagger() |
Modifier and Type | Method and Description |
---|---|
protected abstract void | tagApplicableDocument(HandlerDoc doc, InputStream input, ParseState parseState) |
void | tagDocument(HandlerDoc doc, InputStream input, ParseState parseState) Tags a document with extra metadata information. |
Methods inherited from class AbstractImporterHandler:
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, equals, getRestrictions, hashCode, isApplicable, loadFromXML, loadHandlerFromXML, removeRestriction, removeRestriction, saveHandlerToXML, saveToXML, toString
public final void tagDocument(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
Specified by:
tagDocument in interface IDocumentTagger
Parameters:
doc - document
input - document content
parseState - whether the document has been parsed already or not (a parsed document should normally be text-based)
Throws:
ImporterHandlerException - problem tagging the document

protected abstract void tagApplicableDocument(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
Throws:
ImporterHandlerException
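The pairing of a final tagDocument with an abstract tagApplicableDocument suggests a template-method arrangement. The sketch below illustrates that pattern with hypothetical stand-in types (`TaggerSketch`, `Doc`, `BaseTagger`, `HtmlTagger`) rather than the real Norconex classes. It assumes that the final method checks applicability before delegating, and it omits the ParseState argument and the checked ImporterHandlerException for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class TaggerSketch {

    /** Hypothetical stand-in for HandlerDoc. */
    static class Doc {
        final String reference;
        final List<String> tags = new ArrayList<>();

        Doc(String reference) {
            this.reference = reference;
        }
    }

    /** Hypothetical stand-in for AbstractDocumentTagger. */
    abstract static class BaseTagger {

        /** Stand-in for the inherited restriction check. */
        protected boolean isApplicable(Doc doc) {
            return true;
        }

        /** Final entry point: skip non-applicable documents, then delegate. */
        public final void tagDocument(Doc doc, InputStream input) {
            if (isApplicable(doc)) {
                tagApplicableDocument(doc, input);
            }
        }

        /** Subclasses put their tagging logic here. */
        protected abstract void tagApplicableDocument(Doc doc, InputStream input);
    }

    /** Example subclass: tags only references ending in ".html". */
    static class HtmlTagger extends BaseTagger {
        @Override
        protected boolean isApplicable(Doc doc) {
            return doc.reference.endsWith(".html");
        }

        @Override
        protected void tagApplicableDocument(Doc doc, InputStream input) {
            doc.tags.add("html");
        }
    }

    public static void main(String[] args) {
        BaseTagger tagger = new HtmlTagger();
        Doc page = new Doc("index.html");
        Doc image = new Doc("logo.png");
        tagger.tagDocument(page, new ByteArrayInputStream(new byte[0]));
        tagger.tagDocument(image, new ByteArrayInputStream(new byte[0]));
        System.out.println(page.tags + " " + image.tags); // prints "[html] []"
    }
}
```

Under this arrangement a subclass only writes the tagging logic and cannot bypass the applicability check, since the public entry point is final.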
Copyright © 2009–2023 Norconex Inc. All rights reserved.