Class EmptyFilter
java.lang.Object
  com.norconex.importer.handler.AbstractImporterHandler
    com.norconex.importer.handler.filter.AbstractDocumentFilter
      com.norconex.importer.handler.filter.impl.EmptyFilter
All Implemented Interfaces:
IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler
public class EmptyFilter extends AbstractDocumentFilter
Accepts or rejects a document based on whether its content (default) or any of the specified metadata fields are empty or not. For metadata fields, control characters (char <= 32) are removed before evaluating whether their values are empty.
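The emptiness rule for metadata values can be illustrated with a minimal, library-free sketch. It is not the EmptyFilter source: the class name EmptyValueSketch and its helper methods are hypothetical, and treating a null value as empty is an assumption made only for this illustration.

```java
// Hypothetical sketch, not part of the Norconex Importer API.
final class EmptyValueSketch {

    // Drops every character whose code is <= 32 (space and control characters),
    // mirroring the rule stated in the class description.
    static String stripControlChars(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 32) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumption: a null value counts as empty.
    static boolean isEmptyValue(String value) {
        return value == null || stripControlChars(value).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isEmptyValue(" \t\r\n"));     // prints "true"
        System.out.println(isEmptyValue("  A title "));  // prints "false"
    }
}
```

Under this rule, a value made only of spaces, tabs, or line breaks is treated the same as a missing value.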
Filtering on multiple fields:
When your field matcher expression matches more than one field, only one of the matched fields needs to be empty to trigger a match. If no fields are matched, this is also treated as empty. Fields you expect to be present but that are missing are not evaluated, and are therefore not considered empty. To make sure multiple fields are tested properly, use multiple instances of EmptyFilter, with the field matcher matching only one field in each.
XML configuration usage:
<handler class="com.norconex.importer.handler.filter.impl.EmptyFilter"
    onMatch="[include|exclude]">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <fieldMatcher>
    (optional expression matching fields we want to test for emptiness)
  </fieldMatcher>
</handler>
XML usage example:
<handler class="EmptyFilter" onMatch="exclude">
  <fieldMatcher method="regex">
    (title|dc:title)
  </fieldMatcher>
</handler>
The above example excludes documents without titles.
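The multi-field rule described under "Filtering on multiple fields" can be sketched in plain Java, applied to the same (title|dc:title) expression. This is a simplified model, not the EmptyFilter implementation: the class MultiFieldEmptySketch, its methods, and the use of a Map for metadata are hypothetical, and treating a multi-valued field as empty only when all of its values are blank is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical model of the matching rule, not the Norconex implementation.
final class MultiFieldEmptySketch {

    // A value is blank when nothing remains after dropping chars <= 32.
    static boolean isBlank(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 32) {
                return false;
            }
        }
        return true;
    }

    // Assumption: a field is empty when it has no values or only blank values.
    static boolean isFieldEmpty(List<String> values) {
        for (String value : values) {
            if (!isBlank(value)) {
                return false;
            }
        }
        return true;
    }

    // True when any matched field is empty, or when no field matches at all.
    static boolean matchesEmpty(
            Map<String, List<String>> metadata, Pattern fieldPattern) {
        boolean anyFieldMatched = false;
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            if (!fieldPattern.matcher(e.getKey()).matches()) {
                continue; // fields outside the expression are ignored
            }
            anyFieldMatched = true;
            if (isFieldEmpty(e.getValue())) {
                return true; // one empty matched field is enough
            }
        }
        return !anyFieldMatched;
    }

    public static void main(String[] args) {
        Pattern titles = Pattern.compile("title|dc:title");
        // One matched field is empty: prints "true"
        System.out.println(matchesEmpty(
                Map.of("title", List.of(""), "dc:title", List.of("Foo")), titles));
        // "dc:title" is absent, so it is not evaluated: prints "false"
        System.out.println(matchesEmpty(
                Map.of("title", List.of("Foo")), titles));
        // No field matches the expression: prints "true"
        System.out.println(matchesEmpty(
                Map.of("author", List.of("Jane")), titles));
    }
}
```

The second case shows why one filter instance per field is the safer configuration: a missing dc:title goes unnoticed as long as title is present and has a value.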
Since: 3.0.0
Author: Pascal Essiembre
Constructor Summary
Constructor      Description
EmptyFilter()
Method Summary
Modifier and Type    Method                                                                        Description
boolean              equals(Object other)
TextMatcher          getFieldMatcher()
int                  hashCode()
protected boolean    isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState)
protected void       loadFilterFromXML(XML xml)
protected void       saveFilterToXML(XML xml)
void                 setFieldMatcher(TextMatcher fieldMatcher)
String               toString()
Methods inherited from class com.norconex.importer.handler.filter.AbstractDocumentFilter
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Method Detail
getFieldMatcher
public TextMatcher getFieldMatcher()

setFieldMatcher
public void setFieldMatcher(TextMatcher fieldMatcher)

isDocumentMatched
protected boolean isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
Specified by: isDocumentMatched in class AbstractDocumentFilter
Throws: ImporterHandlerException

loadFilterFromXML
protected void loadFilterFromXML(XML xml)
Specified by: loadFilterFromXML in class AbstractDocumentFilter

saveFilterToXML
protected void saveFilterToXML(XML xml)
Specified by: saveFilterToXML in class AbstractDocumentFilter

equals
public boolean equals(Object other)
Overrides: equals in class AbstractDocumentFilter

hashCode
public int hashCode()
Overrides: hashCode in class AbstractDocumentFilter

toString
public String toString()
Overrides: toString in class AbstractDocumentFilter