Class ScriptFilter
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.filter.AbstractDocumentFilter
      - com.norconex.importer.handler.filter.AbstractCharStreamFilter
        - com.norconex.importer.handler.filter.AbstractStringFilter
          - com.norconex.importer.handler.filter.impl.ScriptFilter
- All Implemented Interfaces: IXMLConfigurable, IDocumentFilter, IOnMatchFilter, IImporterHandler
public class ScriptFilter extends AbstractStringFilter
Filter incoming documents using a scripting language. The default script engine is JavaScript. Refer to ScriptRunner for more information on using a scripting language with Norconex Importer.
How to filter documents with scripting:
The following are variables made available to your script for each document:
- reference: Document unique reference as a string.
- content: Document content, as a string (of maxReadSize length).
- metadata: Document metadata as a Properties object.
- parsed: Whether the document was already parsed, as a boolean.
- sectionIndex: Content section index if it had to be split, as an integer.
The expected return value from your script is a boolean indicating whether the document was matched or not.
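The interplay of content, maxReadSize and sectionIndex can be sketched in plain Java. This is a minimal illustration, not the Importer's implementation: it assumes naive fixed-size splitting and that the first matching section decides the outcome. The class SectionedMatchSketch and its methods are hypothetical names, and the BiPredicate stands in for your script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch; not part of the Norconex Importer API. */
final class SectionedMatchSketch {

    private SectionedMatchSketch() {}

    /** Splits content into consecutive sections of at most maxReadSize characters. */
    static List<String> split(String content, int maxReadSize) {
        if (maxReadSize <= 0) {
            throw new IllegalArgumentException("maxReadSize must be positive");
        }
        List<String> sections = new ArrayList<>();
        for (int start = 0; start < content.length(); start += maxReadSize) {
            int end = Math.min(content.length(), start + maxReadSize);
            sections.add(content.substring(start, end));
        }
        return sections;
    }

    /**
     * Evaluates the "script" once per section, passing the section text and
     * its index. Assumption: the first section that matches decides the
     * result. Empty content is never evaluated in this sketch.
     */
    static boolean anySectionMatches(
            String content, int maxReadSize, BiPredicate<String, Integer> script) {
        List<String> sections = split(content, maxReadSize);
        for (int sectionIndex = 0; sectionIndex < sections.size(); sectionIndex++) {
            if (script.test(sections.get(sectionIndex), sectionIndex)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BiPredicate<String, Integer> script =
                (content, sectionIndex) -> content.contains("Apple");
        System.out.println(split("abcdefgh", 3));                        // [abc, def, gh]
        System.out.println(anySectionMatches("xxApplexx", 100, script)); // true
        System.out.println(anySectionMatches("xxApplexx", 4, script));   // false
    }
}
```

In the last call the word "Apple" straddles a section boundary, so a per-section test misses it. That is worth keeping in mind when choosing maxReadSize for scripts that search the content.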
XML configuration usage:
<handler class="com.norconex.importer.handler.filter.impl.ScriptFilter"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)"
    onMatch="[include|exclude]"
    engineName="(script engine name)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <script>(your script)</script>
</handler>
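How the script's boolean result combines with the onMatch attribute can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes "include" keeps matched documents and rejects the rest, "exclude" does the opposite, and an unset onMatch behaves like "include". OnMatchDecisionSketch and its nested enum are hypothetical names.

```java
/** Hypothetical sketch; not part of the Norconex Importer API. */
final class OnMatchDecisionSketch {

    /** Stand-in for the two values accepted by the onMatch attribute. */
    enum OnMatch { INCLUDE, EXCLUDE }

    private OnMatchDecisionSketch() {}

    /**
     * Decides whether a document is kept, given the script's return value.
     * Assumption: a null onMatch is treated as INCLUDE.
     */
    static boolean accept(boolean matched, OnMatch onMatch) {
        OnMatch effective = (onMatch == null) ? OnMatch.INCLUDE : onMatch;
        return matched
                ? effective == OnMatch.INCLUDE
                : effective == OnMatch.EXCLUDE;
    }

    public static void main(String[] args) {
        System.out.println(accept(true, OnMatch.INCLUDE));  // true
        System.out.println(accept(true, OnMatch.EXCLUDE));  // false
        System.out.println(accept(false, OnMatch.INCLUDE)); // false
        System.out.println(accept(false, OnMatch.EXCLUDE)); // true
    }
}
```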
Usage example:
JavaScript:
<handler class="ScriptFilter">
  <script>
    <![CDATA[
      var isAppleDoc = metadata.getString('fruit') == 'apple'
          || content.indexOf('Apple') > -1;
      /*return*/ isAppleDoc;
    ]]>
  </script>
</handler>
Lua:
<handler class="ScriptFilter" engineName="lua">
  <script>
    <![CDATA[
      local isAppleDoc = metadata:getString('fruit') == 'apple'
          and content:find('Apple') ~= nil;
      return isAppleDoc;
    ]]>
  </script>
</handler>
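For comparison, the two example scripts above can be restated as plain Java predicates. This is only a sketch: metadata is modelled as a Map<String, String> rather than the Importer's Properties object, and the class and method names are hypothetical. Note that, as written, the JavaScript example joins its two conditions with "||" while the Lua example uses "and", so the two do not accept the same documents.

```java
import java.util.Map;

/** Hypothetical sketch; not part of the Norconex Importer API. */
final class AppleDocPredicates {

    private AppleDocPredicates() {}

    /** Mirrors the JavaScript example: either condition is enough. */
    static boolean jsExample(Map<String, String> metadata, String content) {
        return "apple".equals(metadata.get("fruit")) || content.contains("Apple");
    }

    /** Mirrors the Lua example: both conditions must hold. */
    static boolean luaExample(Map<String, String> metadata, String content) {
        return "apple".equals(metadata.get("fruit")) && content.contains("Apple");
    }

    public static void main(String[] args) {
        Map<String, String> metadata = Map.of("fruit", "pear");
        String content = "Apple pie recipe";
        System.out.println(jsExample(metadata, content));  // true
        System.out.println(luaExample(metadata, content)); // false
    }
}
```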
- Since: 2.4.0
- Author: Pascal Essiembre
- See Also: ScriptRunner
Constructor Summary
- ScriptFilter()
Method Summary
- boolean equals(Object other)
- String getEngineName()
- String getScript()
- int hashCode()
- protected boolean isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
- protected void loadStringFilterFromXML(XML xml): Loads configuration settings specific to the implementing class.
- protected void saveStringFilterToXML(XML xml): Saves configuration settings specific to the implementing class.
- void setEngineName(String engineName)
- void setScript(String script)
- String toString()
Methods inherited from class com.norconex.importer.handler.filter.AbstractStringFilter
getMaxReadSize, isTextDocumentMatching, loadCharStreamFilterFromXML, saveCharStreamFilterToXML, setMaxReadSize
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractCharStreamFilter
getSourceCharset, isDocumentMatched, loadFilterFromXML, saveFilterToXML, setSourceCharset
-
Methods inherited from class com.norconex.importer.handler.filter.AbstractDocumentFilter
acceptDocument, getOnMatch, loadHandlerFromXML, saveHandlerToXML, setOnMatch
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
getEngineName
public String getEngineName()
-
setEngineName
public void setEngineName(String engineName)
-
getScript
public String getScript()
-
setScript
public void setScript(String script)
-
isStringContentMatching
protected boolean isStringContentMatching(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) throws ImporterHandlerException
- Specified by: isStringContentMatching in class AbstractStringFilter
- Throws: ImporterHandlerException
-
saveStringFilterToXML
protected void saveStringFilterToXML(XML xml)
Description copied from class: AbstractStringFilter
Saves configuration settings specific to the implementing class. The parent tag, along with the "class" attribute, is already written. Implementors must not close the writer.
- Specified by: saveStringFilterToXML in class AbstractStringFilter
- Parameters: xml - the XML
-
loadStringFilterFromXML
protected void loadStringFilterFromXML(XML xml)
Description copied from class: AbstractStringFilter
Loads configuration settings specific to the implementing class.
- Specified by: loadStringFilterFromXML in class AbstractStringFilter
- Parameters: xml - XML configuration
-
equals
public boolean equals(Object other)
- Overrides: equals in class AbstractStringFilter
-
hashCode
public int hashCode()
- Overrides: hashCode in class AbstractStringFilter
-
toString
public String toString()
- Overrides: toString in class AbstractStringFilter