Class AbstractDocumentFilter
- java.lang.Object
-
- com.norconex.importer.handler.AbstractImporterHandler
-
- com.norconex.importer.handler.filter.AbstractDocumentFilter
-
- All Implemented Interfaces:
IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler
- Direct Known Subclasses:
AbstractCharStreamFilter, DateMetadataFilter, DOMContentFilter, DOMFilter, EmptyFilter, EmptyMetadataFilter, NumericMetadataFilter, ReferenceFilter, RegexMetadataFilter, RegexReferenceFilter
public abstract class AbstractDocumentFilter extends AbstractImporterHandler implements IDocumentFilter, IOnMatchFilter
Base class for document filters. Subclasses can be configured with an attribute called "onMatch". This class handles the logic of whether to include or exclude a document upon matching it. Subclasses only need to decide whether the document is matched, by implementing the
isDocumentMatched(HandlerDoc, InputStream, ParseState)
method.
Inclusion/exclusion logic:
The logic for accepting or rejecting documents when a subclass condition is met ("matches") is as follows:
Matches?   On match   Expected behavior
yes        exclude    Document is rejected.
yes        include    Document is accepted.
no         exclude    Document is accepted.
no         include    Document is accepted if it was accepted by at least one filter with onMatch="include". If no other one exists or if none matched, the document is rejected.
When multiple filters are defined and a combination of both "include" and "exclude" is possible, the "exclude" will always take precedence. In other words, it only takes one matching "exclude" to reject a document, no matter how many matching "include" were triggered.
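The inclusion/exclusion rules above can be illustrated with a small, self-contained sketch. It is not the Norconex Importer implementation: OnMatchAction, FilterOutcome, and FilterChainSketch are hypothetical stand-ins introduced only for this example, and accepting a document when no filter is configured at all is an assumption the description does not address.

```java
import java.util.List;

// Hypothetical stand-in for the on-match setting; not the library's OnMatch type.
enum OnMatchAction { INCLUDE, EXCLUDE }

// Hypothetical record of one filter's configured action and whether it matched.
final class FilterOutcome {
    final OnMatchAction onMatch;
    final boolean matched;

    FilterOutcome(OnMatchAction onMatch, boolean matched) {
        this.onMatch = onMatch;
        this.matched = matched;
    }
}

final class FilterChainSketch {

    // Combines the outcomes of all filters according to the documented rules.
    static boolean isAccepted(List<FilterOutcome> outcomes) {
        boolean hasInclude = false;
        boolean includeMatched = false;
        for (FilterOutcome outcome : outcomes) {
            if (outcome.onMatch == OnMatchAction.EXCLUDE) {
                if (outcome.matched) {
                    // A single matching "exclude" rejects the document.
                    return false;
                }
            } else {
                hasInclude = true;
                includeMatched |= outcome.matched;
            }
        }
        // With "include" filters present, at least one of them must match.
        // Assumption: with no "include" filter at all, the document is accepted.
        return !hasInclude || includeMatched;
    }

    public static void main(String[] args) {
        List<FilterOutcome> mixed = List.of(
                new FilterOutcome(OnMatchAction.INCLUDE, true),
                new FilterOutcome(OnMatchAction.EXCLUDE, true));
        System.out.println("include + exclude both match: " + isAccepted(mixed));

        List<FilterOutcome> includesOnly = List.of(
                new FilterOutcome(OnMatchAction.INCLUDE, false),
                new FilterOutcome(OnMatchAction.INCLUDE, true));
        System.out.println("one of two includes matches: " + isAccepted(includesOnly));
    }
}
```

The early return on a matching "exclude" mirrors the precedence rule: no number of matching "include" filters can override it.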
XML configuration usage:
onMatch="[include|exclude]"
Subclasses inherit the above
IXMLConfigurable
attribute(s), in addition to <restrictTo>.
- Since:
- 2.0.0
- Author:
- Pascal Essiembre
-
-
Constructor Summary
Constructor, Description
AbstractDocumentFilter()
-
Method Summary
Modifier and Type, Method, Description
boolean
acceptDocument(HandlerDoc doc, InputStream input, ParseState parseState)
Whether to accept a document.
boolean
equals(Object other)
OnMatch
getOnMatch()
Gets the on-match action (exclude or include).
int
hashCode()
protected abstract boolean
isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState)
protected abstract void
loadFilterFromXML(XML xml)
protected void
loadHandlerFromXML(XML xml)
Loads configuration settings specific to the implementing class.
protected abstract void
saveFilterToXML(XML xml)
protected void
saveHandlerToXML(XML xml)
Saves configuration settings specific to the implementing class.
void
setOnMatch(OnMatch onMatch)
String
toString()
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
-
-
-
Method Detail
-
getOnMatch
public OnMatch getOnMatch()
Description copied from interface: IOnMatchFilter
Gets the on-match action (exclude or include).
- Specified by:
getOnMatch
in interface IOnMatchFilter
- Returns:
- on match (exclude or include)
-
setOnMatch
public final void setOnMatch(OnMatch onMatch)
-
acceptDocument
public boolean acceptDocument(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
Description copied from interface: IDocumentFilter
Whether to accept a document.
- Specified by:
acceptDocument
in interface IDocumentFilter
- Parameters:
doc
- the document to evaluate
input
- document content
parseState
- whether the document has been parsed already or not (a parsed document should normally be text-based)
- Returns:
true
if document is accepted
- Throws:
ImporterHandlerException
- problem reading the document
-
isDocumentMatched
protected abstract boolean isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
- Throws:
ImporterHandlerException
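The split between acceptDocument and isDocumentMatched follows a template-method shape: the base class applies onMatch, and the subclass only reports a match. The sketch below models that shape with hypothetical stand-in types (MatchAction, SketchDocumentFilter, ContainsTextFilter) so that it compiles without the Norconex Importer library. The simplified (String, InputStream) signature, the INCLUDE default, and the unchecked exception are assumptions of this sketch, and the cross-filter "include" rule applied by the caller is not modeled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the on-match setting.
enum MatchAction { INCLUDE, EXCLUDE }

// Hypothetical stand-in for a base filter; models only the onMatch handling.
abstract class SketchDocumentFilter {
    // Assumption: INCLUDE is used as the default in this sketch.
    private MatchAction onMatch = MatchAction.INCLUDE;

    final void setOnMatch(MatchAction onMatch) {
        this.onMatch = onMatch;
    }

    final MatchAction getOnMatch() {
        return onMatch;
    }

    // Per-filter decision: a match accepts on INCLUDE and rejects on EXCLUDE;
    // a non-match does the opposite.
    final boolean acceptDocument(String reference, InputStream input) {
        boolean matched = isDocumentMatched(reference, input);
        return matched == (onMatch == MatchAction.INCLUDE);
    }

    // Subclasses only answer: does this document match?
    protected abstract boolean isDocumentMatched(String reference, InputStream input);
}

// Example subclass: matches when the content contains a given text.
final class ContainsTextFilter extends SketchDocumentFilter {
    private final String needle;

    ContainsTextFilter(String needle) {
        this.needle = needle;
    }

    @Override
    protected boolean isDocumentMatched(String reference, InputStream input) {
        try {
            String text = new String(input.readAllBytes(), StandardCharsets.UTF_8);
            return text.contains(needle);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ContainsTextFilter filter = new ContainsTextFilter("draft");
        filter.setOnMatch(MatchAction.EXCLUDE);
        InputStream content = new ByteArrayInputStream(
                "a draft memo".getBytes(StandardCharsets.UTF_8));
        System.out.println("accepted: " + filter.acceptDocument("doc-1", content));
    }
}
```

Keeping acceptDocument final in the sketch means a subclass cannot accidentally bypass the onMatch handling; it can only change what counts as a match.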
-
saveHandlerToXML
protected final void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by:
saveHandlerToXML
in class AbstractImporterHandler
- Parameters:
xml
- the XML
-
saveFilterToXML
protected abstract void saveFilterToXML(XML xml)
-
loadHandlerFromXML
protected final void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by:
loadHandlerFromXML
in class AbstractImporterHandler
- Parameters:
xml
- XML configuration
-
loadFilterFromXML
protected abstract void loadFilterFromXML(XML xml)
-
equals
public boolean equals(Object other)
- Overrides:
equals
in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides:
hashCode
in class AbstractImporterHandler
-
toString
public String toString()
- Overrides:
toString
in class AbstractImporterHandler
-
-