public abstract class AbstractDocumentFilter extends AbstractImporterHandler implements IDocumentFilter, IOnMatchFilter
Base class for document filters. Subclasses can be configured with an
attribute called "onMatch". This class handles the logic of whether to
include or exclude a document when it matches. Subclasses only need to
determine whether the document is matched, by implementing the
isDocumentMatched(String, InputStream, ImporterMetadata, boolean)
method.
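The division of labor described above follows the template-method pattern. The sketch below is a minimal, self-contained model of that pattern and not the Norconex source: `SketchOnMatch`, `SketchFilter`, and `ExtensionFilter` are hypothetical stand-ins, the method signatures are reduced to a single reference string, and defaulting to include when no action is set is an assumption.

```java
// Hypothetical stand-ins that model the pattern; not the Norconex classes.
enum SketchOnMatch { INCLUDE, EXCLUDE }

abstract class SketchFilter {
    // Assumed default; the real default is not stated on this page.
    private SketchOnMatch onMatch = SketchOnMatch.INCLUDE;

    public SketchOnMatch getOnMatch() { return onMatch; }

    public void setOnMatch(SketchOnMatch onMatch) { this.onMatch = onMatch; }

    // The base class turns "matched or not" into "accepted or not".
    public final boolean acceptDocument(String reference) {
        boolean matched = isDocumentMatched(reference);
        return matched ? onMatch == SketchOnMatch.INCLUDE
                       : onMatch == SketchOnMatch.EXCLUDE;
    }

    // The only decision a subclass has to make.
    protected abstract boolean isDocumentMatched(String reference);
}

// Example subclass: matches references that end with a given extension.
class ExtensionFilter extends SketchFilter {
    private final String extension;

    ExtensionFilter(String extension) { this.extension = extension; }

    @Override
    protected boolean isDocumentMatched(String reference) {
        return reference != null && reference.endsWith(extension);
    }
}
```

In this model the subclass never looks at the onMatch action. A non-matching "include" filter simply reports that it does not accept the document, and whether some other "include" filter accepted it is left to the caller.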
The logic for accepting or rejecting documents, depending on whether a subclass condition is met ("matches"), is as follows:
Matches? | On match | Expected behavior |
---|---|---|
yes | exclude | Document is rejected. |
yes | include | Document is accepted. |
no | exclude | Document is accepted. |
no | include | Document is accepted only if it was accepted by at least one other filter with onMatch="include". If no such filter exists, or if none matched, the document is rejected. |
When multiple filters are defined and both "include" and "exclude" filters can match, "exclude" always takes precedence. In other words, it takes only one matching "exclude" to reject a document, no matter how many matching "include" filters were triggered.
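The decision table and the precedence rule can be combined into a single function over the outcomes of all filters. The following is a sketch under stated assumptions rather than the library's implementation: `MatchAction`, `FilterOutcome`, and `FilterDecision` are hypothetical names, and accepting a document when no filter is defined at all is an assumption.

```java
import java.util.List;

// Hypothetical types for illustration only; not part of the Norconex API.
enum MatchAction { INCLUDE, EXCLUDE }

// What one filter reported for a document: its action and whether it matched.
final class FilterOutcome {
    final MatchAction onMatch;
    final boolean matched;

    FilterOutcome(MatchAction onMatch, boolean matched) {
        this.onMatch = onMatch;
        this.matched = matched;
    }
}

final class FilterDecision {
    private FilterDecision() {}

    static boolean isAccepted(List<FilterOutcome> outcomes) {
        boolean hasInclude = false;
        boolean includeMatched = false;
        for (FilterOutcome outcome : outcomes) {
            if (outcome.onMatch == MatchAction.EXCLUDE) {
                if (outcome.matched) {
                    return false; // one matching "exclude" always rejects
                }
            } else {
                hasInclude = true;
                includeMatched |= outcome.matched;
            }
        }
        // When "include" filters exist, at least one of them must have matched.
        // Assumption: with no "include" filter at all, the document is accepted.
        return !hasInclude || includeMatched;
    }
}
```

For example, a document that matches one "include" filter and one "exclude" filter is rejected, while a document that misses one "include" filter but matches another is accepted.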
Subclasses inherit this IXMLConfigurable
configuration:
<!-- main tag supports onMatch="[include|exclude]" attribute -->
<restrictTo caseSensitive="[false|true]"
    field="(name of header/metadata field name to match)">
  (regular expression of value to match)
</restrictTo>
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
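The "restrictTo" semantics (a regular expression tested against a named field, optional case sensitivity, and only one restriction needing to match) can be modeled as below. This is an illustrative sketch, not the Norconex code: `RestrictionSketch` is a hypothetical class, metadata is simplified to one value per field, and both whole-value matching and treating a handler with no restrictions as always applicable are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical model of one <restrictTo> entry; not the Norconex class.
final class RestrictionSketch {
    private final String field;
    private final Pattern pattern;

    RestrictionSketch(String field, String regex, boolean caseSensitive) {
        this.field = field;
        this.pattern = Pattern.compile(
                regex, caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
    }

    // Assumption: the expression must match the whole field value.
    boolean matches(Map<String, String> metadata) {
        String value = metadata.get(field);
        return value != null && pattern.matcher(value).matches();
    }

    // Assumption: a handler without restrictions applies to every document.
    static boolean isApplicable(
            List<RestrictionSketch> restrictions, Map<String, String> metadata) {
        if (restrictions.isEmpty()) {
            return true;
        }
        for (RestrictionSketch restriction : restrictions) {
            if (restriction.matches(metadata)) {
                return true; // only one restriction needs to match
            }
        }
        return false;
    }
}
```

With caseSensitive set to false, a pattern such as `application/pdf` would also accept a field value written as `Application/PDF`.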
See Also: AbstractOnMatchFilter
Constructor and Description |
---|
AbstractDocumentFilter() |
Modifier and Type | Method and Description |
---|---|
boolean |
acceptDocument(String reference,
InputStream input,
ImporterMetadata metadata,
boolean parsed)
Whether to accept a document.
|
boolean |
equals(Object other) |
OnMatch |
getOnMatch()
Gets the on-match action (exclude or include).
|
int |
hashCode() |
protected abstract boolean |
isDocumentMatched(String reference,
InputStream input,
ImporterMetadata metadata,
boolean parsed) |
protected abstract void |
loadFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) |
protected void |
loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml)
Loads configuration settings specific to the implementing class.
|
protected abstract void |
saveFilterToXML(EnhancedXMLStreamWriter writer) |
protected void |
saveHandlerToXML(EnhancedXMLStreamWriter writer)
Saves configuration settings specific to the implementing class.
|
void |
setOnMatch(OnMatch onMatch) |
String |
toString() |
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
public OnMatch getOnMatch()
IOnMatchFilter
getOnMatch
in interface IOnMatchFilter
public final void setOnMatch(OnMatch onMatch)
public boolean acceptDocument(String reference, InputStream input, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
IDocumentFilter
acceptDocument
in interface IDocumentFilter
reference - document reference
input - the document to evaluate
metadata - document metadata
parsed - whether the document has been parsed already or not (a parsed document should normally be text-based)
Returns true if the document is accepted
ImporterHandlerException - problem reading the document
protected abstract boolean isDocumentMatched(String reference, InputStream input, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
ImporterHandlerException
protected final void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
AbstractImporterHandler
saveHandlerToXML
in class AbstractImporterHandler
writer - the xml writer
XMLStreamException - could not save to XML
protected abstract void saveFilterToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
XMLStreamException
protected final void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
AbstractImporterHandler
loadHandlerFromXML
in class AbstractImporterHandler
xml - xml configuration
IOException - could not load from XML
protected abstract void loadFilterFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
IOException
public String toString()
toString
in class AbstractImporterHandler
public boolean equals(Object other)
equals
in class AbstractImporterHandler
public int hashCode()
hashCode
in class AbstractImporterHandler
Copyright © 2009–2021 Norconex Inc. All rights reserved.