Class TextFilter
-
- All Implemented Interfaces:
IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler
public class TextFilter extends AbstractStringFilter
Filters a document based on a text pattern found in the document content (default) or in the values of matching fields, when a field matcher is specified. On very large content, pattern matching may be done in chunks, which can produce unexpected results (for example, a match spanning two chunks may be missed). Consider using AbstractCharStreamFilter if this is a concern. Refer to AbstractDocumentFilter for the inclusion/exclusion logic.
XML configuration usage:
<handler class="com.norconex.importer.handler.filter.impl.TextFilter"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)"
    onMatch="[include|exclude]">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <fieldMatcher>
    (Optional expression of field to match. Omit to use document content.)
  </fieldMatcher>
  <valueMatcher>(expression of value to match)</valueMatcher>
</handler>
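The chunking caveat tied to maxReadSize can be illustrated with a minimal, self-contained sketch. ChunkedMatchDemo and its methods are hypothetical and are not part of the Norconex Importer API; the fixed-size splitting shown here is an assumption, and the library may split content differently.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the Norconex Importer API.
final class ChunkedMatchDemo {

    // Returns true if the pattern is found anywhere in the full text.
    static boolean matchesWhole(String text, Pattern pattern) {
        return pattern.matcher(text).find();
    }

    // Returns true if the pattern is found within any single chunk of at
    // most maxReadSize characters. A match that straddles a chunk boundary
    // is missed. Fixed-size splitting is an assumption made for illustration.
    static boolean matchesInChunks(String text, Pattern pattern, int maxReadSize) {
        for (int start = 0; start < text.length(); start += maxReadSize) {
            int end = Math.min(start + maxReadSize, text.length());
            if (pattern.matcher(text.substring(start, end)).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("apple pie");
        String text = "I like apple pie a lot";
        System.out.println(matchesWhole(text, pattern));         // true
        System.out.println(matchesInChunks(text, pattern, 10));  // false
        System.out.println(matchesInChunks(text, pattern, 100)); // true
    }
}
```

With a chunk size of 10 the phrase is cut between "app" and "le pie", so no single chunk contains it. A larger read size, or a stream-based filter, avoids the problem in this toy case.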
XML usage example:
<handler class="TextFilter" onMatch="include">
  <valueMatcher>apple</valueMatcher>
</handler>
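The effect of the onMatch attribute in the example above can be sketched as follows. This is a simplified model, not the library's code: OnMatchDemo and its nested OnMatch enum are hypothetical stand-ins, a plain substring test replaces the value matcher (real matching depends on how the TextMatcher is configured), and the decision rule for a single filter is an assumption based on the include/exclude naming.

```java
// Hypothetical stand-ins; not the Norconex classes themselves.
final class OnMatchDemo {

    enum OnMatch { INCLUDE, EXCLUDE }

    // Assumed rule for a single filter: with INCLUDE a document is accepted
    // only when it matches; with EXCLUDE it is accepted only when it does not.
    static boolean accept(String content, String needle, OnMatch onMatch) {
        boolean matched = content.contains(needle); // stand-in for the value matcher
        return matched == (onMatch == OnMatch.INCLUDE);
    }

    public static void main(String[] args) {
        System.out.println(accept("an apple a day", "apple", OnMatch.INCLUDE)); // true
        System.out.println(accept("a pear a day", "apple", OnMatch.INCLUDE));   // false
        System.out.println(accept("an apple a day", "apple", OnMatch.EXCLUDE)); // false
    }
}
```

When several filters are combined, the overall inclusion/exclusion outcome is decided by the logic documented in AbstractDocumentFilter, which this sketch does not attempt to reproduce.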
- Since: 3.0.0
- Author: Pascal Essiembre
-
-
Constructor Summary
Constructors
TextFilter()
TextFilter(TextMatcher valueMatcher)
TextFilter(TextMatcher fieldMatcher, TextMatcher valueMatcher)
TextFilter(TextMatcher fieldMatcher, TextMatcher valueMatcher, OnMatch onMatch)
TextFilter(TextMatcher valueMatcher, OnMatch onMatch)
-
Method Summary
Modifier and Type, Method, Description
boolean equals(Object other)
TextMatcher getFieldMatcher()
  Gets the text matcher of field names.
TextMatcher getValueMatcher()
  Gets the text matcher for field values.
int hashCode()
protected boolean isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
protected void loadStringFilterFromXML(XML xml)
  Loads configuration settings specific to the implementing class.
protected void saveStringFilterToXML(XML xml)
  Saves configuration settings specific to the implementing class.
void setFieldMatcher(TextMatcher fieldMatcher)
  Sets the text matcher of field names.
void setValueMatcher(TextMatcher valueMatcher)
  Sets the text matcher for field values.
String toString()
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractStringFilter
getMaxReadSize, isTextDocumentMatching, loadCharStreamFilterFromXML, saveCharStreamFilterToXML, setMaxReadSize
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractCharStreamFilter
getSourceCharset, isDocumentMatched, loadFilterFromXML, saveFilterToXML, setSourceCharset
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractDocumentFilter
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
-
-
-
Constructor Detail
-
TextFilter
public TextFilter()
-
TextFilter
public TextFilter(TextMatcher valueMatcher)
-
TextFilter
public TextFilter(TextMatcher valueMatcher, OnMatch onMatch)
-
TextFilter
public TextFilter(TextMatcher fieldMatcher, TextMatcher valueMatcher)
-
TextFilter
public TextFilter(TextMatcher fieldMatcher, TextMatcher valueMatcher, OnMatch onMatch)
-
-
Method Detail
-
getValueMatcher
public TextMatcher getValueMatcher()
Gets the text matcher for field values.
- Returns: text matcher
-
setValueMatcher
public void setValueMatcher(TextMatcher valueMatcher)
Sets the text matcher for field values. The given matcher is copied.
- Parameters: valueMatcher - text matcher
-
getFieldMatcher
public TextMatcher getFieldMatcher()
Gets the text matcher of field names.
- Returns: field matcher
-
setFieldMatcher
public void setFieldMatcher(TextMatcher fieldMatcher)
Sets the text matcher of field names. The given matcher is copied.
- Parameters: fieldMatcher - text matcher
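How the field matcher and value matcher might interact can be sketched in a few lines. This is a simplified model under stated assumptions, not the library's implementation: FieldOrContentDemo is hypothetical, java.util.function.Predicate stands in for TextMatcher, and accepting a document when any value of any matching field matches is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model; Predicate<String> stands in for TextMatcher.
final class FieldOrContentDemo {

    // With no field matcher, the value matcher is applied to the content.
    // Otherwise it is applied to the values of fields whose names match
    // (assumption: one matching value is enough).
    static boolean matches(Map<String, List<String>> fields, String content,
            Predicate<String> fieldMatcher, Predicate<String> valueMatcher) {
        if (fieldMatcher == null) {
            return valueMatcher.test(content);
        }
        for (Map.Entry<String, List<String>> entry : fields.entrySet()) {
            if (!fieldMatcher.test(entry.getKey())) {
                continue;
            }
            for (String value : entry.getValue()) {
                if (valueMatcher.test(value)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = Map.of("title", List.of("apple tart"));
        Predicate<String> hasApple = v -> v.contains("apple");
        // Field matcher set: the "title" field value is tested.
        System.out.println(matches(fields, "plain body", "title"::equals, hasApple)); // true
        // Field matcher omitted: the document content is tested instead.
        System.out.println(matches(fields, "plain body", null, hasApple)); // false
    }
}
```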
-
isStringContentMatching
protected boolean isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) throws ImporterHandlerException
- Specified by: isStringContentMatching in class AbstractStringFilter
- Throws: ImporterHandlerException
-
saveStringFilterToXML
protected void saveStringFilterToXML(XML xml)
Description copied from class: AbstractStringFilter
Saves configuration settings specific to the implementing class. The parent tag, along with the "class" attribute, is already written. Implementors must not close the writer.
- Specified by: saveStringFilterToXML in class AbstractStringFilter
- Parameters: xml - the XML
-
loadStringFilterFromXML
protected void loadStringFilterFromXML(XML xml)
Description copied from class: AbstractStringFilter
Loads configuration settings specific to the implementing class.
- Specified by: loadStringFilterFromXML in class AbstractStringFilter
- Parameters: xml - XML configuration
-
equals
public boolean equals(Object other)
- Overrides: equals in class AbstractStringFilter
-
hashCode
public int hashCode()
- Overrides: hashCode in class AbstractStringFilter
-
toString
public String toString()
- Overrides: toString in class AbstractStringFilter
-
-