public class EmptyFilter extends AbstractDocumentFilter
Accepts or rejects a document based on whether its content (default) or any of the specified metadata fields are empty or not. For metadata fields, control characters (char <= 32) are removed before evaluating whether their values are empty.
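The emptiness rule for a single metadata value can be illustrated with a small stand-alone sketch. It is not the filter's actual source: the class and method names (`EmptyValueSketch`, `stripControlChars`, `isEmptyValue`) are hypothetical, and treating a `null` value as empty is an assumption. The sketch removes every character whose code is 32 or lower, then checks whether anything is left.

```java
// Illustrative sketch only; names are hypothetical, not part of the Norconex API.
final class EmptyValueSketch {

    // Drops every character with a code of 32 or lower (space and control characters).
    static String stripControlChars(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 32) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumption: a null value is treated the same as an empty one.
    static boolean isEmptyValue(String value) {
        return value == null || stripControlChars(value).isEmpty();
    }
}
```

For the emptiness test alone, it makes no difference whether such characters are removed everywhere or only at both ends of the value: in either case the value is empty exactly when it contains no character above 32.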
Note that when your field matcher expression matches more than one field,
only one of the matched fields needs to be empty to trigger a match.
If no fields are matched, the document is also considered empty.
Fields you expect to be present but that are missing are not evaluated,
and are therefore not considered empty. To make sure multiple fields
are each tested properly, use multiple instances of EmptyFilter, with the
field matcher matching only one field in each.
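The multi-field rules described above can be sketched in plain Java as follows. This is a simplified model, not the filter's implementation: metadata is represented as a `Map` of field names to value lists, field matching is reduced to a full regular-expression match, and the names `EmptyFieldMatchSketch`, `isFieldEmpty`, and `isMatched` are hypothetical. It also assumes a multi-valued field is empty only when all of its values are blank.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only; names are hypothetical, not part of the Norconex API.
final class EmptyFieldMatchSketch {

    // trim() strips leading and trailing characters <= 32, so the result is
    // empty exactly when the value holds no character above 32.
    static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    // Assumption: a field is empty when it has no values or only blank values.
    static boolean isFieldEmpty(List<String> values) {
        if (values == null) {
            return true;
        }
        for (String v : values) {
            if (!isBlank(v)) {
                return false;
            }
        }
        return true;
    }

    static boolean isMatched(Map<String, List<String>> metadata, Pattern fieldPattern) {
        boolean anyFieldMatched = false;
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            if (!fieldPattern.matcher(entry.getKey()).matches()) {
                continue; // field not selected by the expression: not evaluated
            }
            anyFieldMatched = true;
            if (isFieldEmpty(entry.getValue())) {
                return true; // one empty matched field is enough
            }
        }
        // No field matched at all: considered empty.
        return !anyFieldMatched;
    }
}
```

With the pattern `(title|dc:title)`, a document holding only a non-empty `dc:title` is not matched in this model, because the absent `title` field is never evaluated. This is why one filter instance per field is the safer choice when every field must be checked.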
XML configuration usage:
<handler
    class="com.norconex.importer.handler.filter.impl.EmptyFilter"
    onMatch="[include|exclude]">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <fieldMatcher
      method="[basic|csv|wildcard|regex]"
      ignoreCase="[false|true]"
      ignoreDiacritic="[false|true]"
      partial="[false|true]">
    (optional expression matching fields we want to test for emptiness)
  </fieldMatcher>
</handler>
XML usage example:
<handler
    class="EmptyFilter"
    onMatch="exclude">
  <fieldMatcher
      method="regex">
    (title|dc:title)
  </fieldMatcher>
</handler>
The above example excludes documents without titles.
Constructor and Description |
---|
EmptyFilter() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
TextMatcher | getFieldMatcher() |
int | hashCode() |
protected boolean | isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) |
protected void | loadFilterFromXML(XML xml) |
protected void | saveFilterToXML(XML xml) |
void | setFieldMatcher(TextMatcher fieldMatcher) |
String | toString() |
Methods inherited from class AbstractDocumentFilter: acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
public TextMatcher getFieldMatcher()
public void setFieldMatcher(TextMatcher fieldMatcher)
protected boolean isDocumentMatched(HandlerDoc doc, InputStream input, ParseState parseState) throws ImporterHandlerException
Specified by: isDocumentMatched in class AbstractDocumentFilter
Throws: ImporterHandlerException
protected void loadFilterFromXML(XML xml)
Specified by: loadFilterFromXML in class AbstractDocumentFilter
protected void saveFilterToXML(XML xml)
Specified by: saveFilterToXML in class AbstractDocumentFilter
public boolean equals(Object other)
Overrides: equals in class AbstractDocumentFilter
public int hashCode()
Overrides: hashCode in class AbstractDocumentFilter
public String toString()
Overrides: toString in class AbstractDocumentFilter
Copyright © 2009–2023 Norconex Inc. All rights reserved.