public class TextFilter extends AbstractStringFilter
Filters a document based on a text pattern found in the document content
(default) or in the values of matching fields, when a field matcher is specified.
When used on very large content, the pattern matching may be done in
chunks, which sometimes does not achieve the expected results. Consider
using AbstractCharStreamFilter
if this is a concern.
Refer to AbstractDocumentFilter
for the inclusion/exclusion logic.
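The chunking caveat above can be illustrated with a small, self-contained sketch. It does not use the Norconex API and is not the library's implementation: the class `ChunkedMatchDemo` and its two helper methods are hypothetical names, and the fixed-size split is an assumption made only for illustration (the importer may split content differently). It shows how a pattern that straddles a chunk boundary can be missed when each chunk is matched on its own.

```java
import java.util.regex.Pattern;

public class ChunkedMatchDemo {

    /** Returns true if the pattern is found anywhere in the whole text. */
    public static boolean matchesWhole(String text, Pattern pattern) {
        return pattern.matcher(text).find();
    }

    /**
     * Returns true if the pattern is found inside at least one chunk when the
     * text is cut into consecutive chunks of at most maxReadSize characters.
     * Each chunk is matched independently, so a match that spans two chunks
     * is not detected.
     */
    public static boolean matchesInChunks(
            String text, Pattern pattern, int maxReadSize) {
        for (int start = 0; start < text.length(); start += maxReadSize) {
            int end = Math.min(start + maxReadSize, text.length());
            if (pattern.matcher(text.substring(start, end)).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("apple");
        String text = "green apple pie";

        // Matching the whole text finds "apple".
        System.out.println(matchesWhole(text, pattern));

        // With 8-character chunks the text splits into "green ap" and
        // "ple pie", so "apple" is cut in two and neither chunk matches.
        System.out.println(matchesInChunks(text, pattern, 8));

        // A chunk size larger than the text behaves like a whole-text match.
        System.out.println(matchesInChunks(text, pattern, 64));
    }
}
```

In this model, raising the chunk size (the counterpart of `maxReadSize` in the configuration) makes a split match less likely but does not rule it out for content larger than one chunk.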
<handler
class="com.norconex.importer.handler.filter.impl.TextFilter"
maxReadSize="(max characters to read at once)"
sourceCharset="(character encoding)"
onMatch="[include|exclude]">
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(field-matching expression)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(value-matching expression)
</valueMatcher>
</restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(Optional expression of field to match. Omit to use document content.)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(expression of value to match)
</valueMatcher>
</handler>
<handler
class="TextFilter"
onMatch="include">
<valueMatcher>apple</valueMatcher>
</handler>
Constructor and Description |
---|
TextFilter() |
TextFilter(TextMatcher valueMatcher) |
TextFilter(TextMatcher valueMatcher, OnMatch onMatch) |
TextFilter(TextMatcher fieldMatcher, TextMatcher valueMatcher) |
TextFilter(TextMatcher fieldMatcher, TextMatcher valueMatcher, OnMatch onMatch) |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
TextMatcher | getFieldMatcher(): Gets the text matcher of field names. |
TextMatcher | getValueMatcher(): Gets the text matcher for field values. |
int | hashCode() |
protected boolean | isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) |
protected void | loadStringFilterFromXML(XML xml): Loads configuration settings specific to the implementing class. |
protected void | saveStringFilterToXML(XML xml): Saves configuration settings specific to the implementing class. |
void | setFieldMatcher(TextMatcher fieldMatcher): Sets the text matcher of field names. |
void | setValueMatcher(TextMatcher valueMatcher): Sets the text matcher for field values. |
String | toString() |
getMaxReadSize, isTextDocumentMatching, loadCharStreamFilterFromXML, saveCharStreamFilterToXML, setMaxReadSize
getSourceCharset, isDocumentMatched, loadFilterFromXML, saveFilterToXML, setSourceCharset
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
public TextFilter()
public TextFilter(TextMatcher valueMatcher)
public TextFilter(TextMatcher valueMatcher, OnMatch onMatch)
public TextFilter(TextMatcher fieldMatcher, TextMatcher valueMatcher)
public TextFilter(TextMatcher fieldMatcher, TextMatcher valueMatcher, OnMatch onMatch)
public TextMatcher getValueMatcher()
public void setValueMatcher(TextMatcher valueMatcher)
Parameters: valueMatcher - text matcher
public TextMatcher getFieldMatcher()
public void setFieldMatcher(TextMatcher fieldMatcher)
Parameters: fieldMatcher - text matcher
protected boolean isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) throws ImporterHandlerException
Specified by: isStringContentMatching in class AbstractStringFilter
Throws: ImporterHandlerException
protected void saveStringFilterToXML(XML xml)
Description copied from class: AbstractStringFilter
Specified by: saveStringFilterToXML in class AbstractStringFilter
Parameters: xml - the XML
protected void loadStringFilterFromXML(XML xml)
Description copied from class: AbstractStringFilter
Specified by: loadStringFilterFromXML in class AbstractStringFilter
Parameters: xml - XML configuration
public boolean equals(Object other)
Overrides: equals in class AbstractStringFilter
public int hashCode()
Overrides: hashCode in class AbstractStringFilter
public String toString()
Overrides: toString in class AbstractStringFilter
Copyright © 2009–2023 Norconex Inc. All rights reserved.