Class RegexContentFilter
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.filter.AbstractDocumentFilter
      - com.norconex.importer.handler.filter.AbstractCharStreamFilter
        - com.norconex.importer.handler.filter.AbstractStringFilter
          - com.norconex.importer.handler.filter.impl.RegexContentFilter
- All Implemented Interfaces:
IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler
@Deprecated public class RegexContentFilter extends AbstractStringFilter
Deprecated. Since 3.0.0, use TextFilter instead.
Filters a document based on a regular expression matched against its content. Depending on document size, the content may be matched in chunks, which can yield unexpected results (for example, when a match would span two chunks). Consider using AbstractCharStreamFilter if this is a concern. Refer to AbstractDocumentFilter for the inclusion/exclusion logic.
Since 2.2.0, the following regular expression flags are always active: Pattern.MULTILINE and Pattern.DOTALL.
XML configuration usage:
<handler class="com.norconex.importer.handler.filter.impl.RegexContentFilter"
    onMatch="[include|exclude]"
    caseSensitive="[false|true]"
    sourceCharset="(character encoding)"
    maxReadSize="(max characters to read at once)" >
  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <regex>(regular expression of value to match)</regex>
</handler>
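The maxReadSize attribute bounds how many characters are read at once, which is why the class description warns that chunked matching may not give the expected result. The following is a minimal sketch of that effect using only java.util.regex. The class ChunkedMatchDemo and the method matchesInChunks are hypothetical names, and the naive fixed-size split is an assumption made for illustration; it is not the library's actual chunking strategy.

```java
import java.util.regex.Pattern;

public class ChunkedMatchDemo {

    /**
     * Hypothetical helper: returns true if the pattern is found in any
     * chunk of at most maxReadSize characters. The fixed-size split is an
     * assumption for illustration only. maxReadSize must be positive.
     */
    static boolean matchesInChunks(
            String content, Pattern pattern, int maxReadSize) {
        for (int start = 0; start < content.length(); start += maxReadSize) {
            int end = Math.min(content.length(), start + maxReadSize);
            // Each chunk is matched on its own, so text that straddles
            // a chunk boundary is never seen as a whole.
            if (pattern.matcher(content.substring(start, end)).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Pattern apple = Pattern.compile("apple");
        String content = "I ate an apple today";

        // Chunks are "I ate an a" and "pple today": the word is split.
        System.out.println(matchesInChunks(content, apple, 10));  // false

        // A single chunk holds the whole content, so the word is found.
        System.out.println(matchesInChunks(content, apple, 100)); // true
    }
}
```

When a pattern must be evaluated against the whole content, a read size large enough to hold the document avoids the split in this sketch, at the cost of more memory.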
Usage example:
This example will accept only documents containing the word "apple".
<handler class="RegexContentFilter" onMatch="include">
  <regex>.*apple.*</regex>
</handler>
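To see what the always-active Pattern.MULTILINE and Pattern.DOTALL flags change for an expression such as the one above, the following sketch uses only java.util.regex. The class RegexFlagsDemo and its compile helper are hypothetical, and whether the filter applies matches() or find() internally is not stated here, so both are shown without assuming either.

```java
import java.util.regex.Pattern;

public class RegexFlagsDemo {

    // Flags the class description says are always active since 2.2.0.
    static final int ALWAYS_ON = Pattern.MULTILINE | Pattern.DOTALL;

    /**
     * Hypothetical helper: compiles a regex with the always-on flags,
     * adding case insensitivity when caseSensitive is false.
     */
    static Pattern compile(String regex, boolean caseSensitive) {
        int flags = ALWAYS_ON;
        if (!caseSensitive) {
            flags |= Pattern.CASE_INSENSITIVE;
        }
        return Pattern.compile(regex, flags);
    }

    public static void main(String[] args) {
        String content = "first line\nan apple a day\nlast line";

        // Without DOTALL, "." does not match line breaks, so a full
        // match over multi-line content fails.
        System.out.println(
                Pattern.compile(".*apple.*").matcher(content).matches()); // false

        // With DOTALL, "." also matches line terminators.
        System.out.println(
                compile(".*apple.*", true).matcher(content).matches());   // true

        // With MULTILINE, "^" matches at the start of every line.
        System.out.println(
                compile("^an apple", true).matcher(content).find());      // true
    }
}
```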
- Since:
- 2.0.0
- Author:
- Pascal Essiembre
-
Constructor Summary
Constructors
Constructor Description
RegexContentFilter()
Deprecated.
RegexContentFilter(String regex)
Deprecated.
RegexContentFilter(String regex, OnMatch onMatch)
Deprecated.
RegexContentFilter(String regex, OnMatch onMatch, boolean caseSensitive)
Deprecated.
-
Method Summary
Modifier and Type Method Description
boolean equals(Object other)
Deprecated.
String getRegex()
Deprecated.
int hashCode()
Deprecated.
boolean isCaseSensitive()
Deprecated.
protected boolean isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
Deprecated.
protected void loadStringFilterFromXML(XML xml)
Deprecated. Loads configuration settings specific to the implementing class.
protected void saveStringFilterToXML(XML xml)
Deprecated. Saves configuration settings specific to the implementing class.
void setCaseSensitive(boolean caseSensitive)
Deprecated.
void setRegex(String regex)
Deprecated.
String toString()
Deprecated.
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractStringFilter
getMaxReadSize, isTextDocumentMatching, loadCharStreamFilterFromXML, saveCharStreamFilterToXML, setMaxReadSize
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractCharStreamFilter
getSourceCharset, isDocumentMatched, loadFilterFromXML, saveFilterToXML, setSourceCharset
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractDocumentFilter
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Constructor Detail
-
RegexContentFilter
public RegexContentFilter()
Deprecated.
-
RegexContentFilter
public RegexContentFilter(String regex)
Deprecated.
-
Method Detail
-
getRegex
public String getRegex()
Deprecated.
-
setRegex
public final void setRegex(String regex)
Deprecated.
-
isCaseSensitive
public boolean isCaseSensitive()
Deprecated.
-
setCaseSensitive
public void setCaseSensitive(boolean caseSensitive)
Deprecated.
-
isStringContentMatching
protected boolean isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) throws ImporterHandlerException
Deprecated.
- Specified by:
isStringContentMatching in class AbstractStringFilter
- Throws:
ImporterHandlerException
-
saveStringFilterToXML
protected void saveStringFilterToXML(XML xml)
Deprecated.
Description copied from class: AbstractStringFilter
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Specified by:
saveStringFilterToXML in class AbstractStringFilter
- Parameters:
xml - the XML
-
loadStringFilterFromXML
protected void loadStringFilterFromXML(XML xml)
Deprecated.
Description copied from class: AbstractStringFilter
Loads configuration settings specific to the implementing class.
- Specified by:
loadStringFilterFromXML in class AbstractStringFilter
- Parameters:
xml - XML configuration
-
equals
public boolean equals(Object other)
Deprecated.
- Overrides:
equals in class AbstractStringFilter
-
hashCode
public int hashCode()
Deprecated.
- Overrides:
hashCode in class AbstractStringFilter
-
toString
public String toString()
Deprecated.
- Overrides:
toString in class AbstractStringFilter
-