Class ScriptTagger
java.lang.Object
  com.norconex.importer.handler.AbstractImporterHandler
    com.norconex.importer.handler.tagger.AbstractDocumentTagger
      com.norconex.importer.handler.tagger.AbstractCharStreamTagger
        com.norconex.importer.handler.tagger.AbstractStringTagger
          com.norconex.importer.handler.tagger.impl.ScriptTagger
All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTagger
public class ScriptTagger extends AbstractStringTagger
Tag incoming documents using a scripting language. The default script engine is JavaScript. Refer to ScriptRunner for more information on using a scripting language with Norconex Importer.
How to tag documents with scripting:
The following variables are made available to your script for each document:
- reference: Document unique reference, as a string.
- content: Document content, as a string (of maxReadSize length).
- metadata: Document metadata, as a Properties object.
- parsed: Whether the document was already parsed, as a boolean.
- sectionIndex: Content section index if the content had to be split, as an integer.
No return value is expected from your script. Returning one has no effect.
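To make the shape of the script variables more concrete, here is a minimal Java sketch that gathers the five names into a javax.script.Bindings object. It is only an illustration: the class ScriptVariablesSketch and its method are hypothetical, a plain Map stands in for the Norconex Properties object, and nothing here claims to reflect how ScriptTagger prepares its variables internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.script.Bindings;
import javax.script.SimpleBindings;

// Illustrative only: this class is hypothetical and not part of Norconex Importer.
class ScriptVariablesSketch {

    // Collects the five documented variable names for one document
    // (or for one content section when the content had to be split).
    static Bindings variablesFor(String reference, String content,
            Map<String, List<String>> metadata, boolean parsed,
            int sectionIndex) {
        Bindings vars = new SimpleBindings();
        vars.put("reference", reference);       // unique document reference
        vars.put("content", content);           // at most maxReadSize characters
        vars.put("metadata", metadata);         // assumption: Map stands in for Properties
        vars.put("parsed", parsed);             // whether the document was already parsed
        vars.put("sectionIndex", sectionIndex); // index of the content section
        return vars;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", new ArrayList<>(List.of("Apple pie recipe")));

        Bindings vars = variablesFor("https://example.com/pie.html",
                "Peel the apples...", metadata, true, 0);

        // Print each variable name with its value (iteration order is unspecified).
        vars.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A script engine obtained through javax.script could evaluate a script against such bindings, but which engines are installed depends on the Java runtime in use.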
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.ScriptTagger"
    engineName="(script engine name)"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <script>(your script)</script>
</handler>
Usage example:
The following examples add a new metadata field indicating which fruit a document is about.
JavaScript:
<handler class="ScriptTagger">
  <script>
    <![CDATA[
      metadata.add('fruit', 'apple');
    ]]>
  </script>
</handler>
Lua:
<handler class="ScriptTagger" engineName="lua">
  <script>
    <![CDATA[
      metadata:addString('fruit', {'apple'});
    ]]>
  </script>
</handler>
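When handler configurations like the ones above are generated by a program rather than written by hand, the script has to be embedded so that it cannot break the surrounding XML. The following Java sketch shows one way to do that with plain string handling. The class HandlerXmlSketch and its methods are hypothetical helpers, not part of Norconex Importer, and only the engineName attribute and script element from the configuration usage are covered.

```java
// Illustrative only: this class is hypothetical and not part of Norconex Importer.
class HandlerXmlSketch {

    // Wraps a script in a CDATA section. A literal "]]>" inside the script
    // would end the section early, so it is split across two sections.
    static String cdata(String script) {
        return "<![CDATA[" + script.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Escapes the characters that are unsafe inside a double-quoted attribute.
    static String escapeAttr(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }

    // Builds a ScriptTagger handler element. A null or empty engineName is
    // omitted so that the default engine applies.
    static String handlerXml(String engineName, String script) {
        StringBuilder sb = new StringBuilder();
        sb.append("<handler class=\"")
          .append("com.norconex.importer.handler.tagger.impl.ScriptTagger\"");
        if (engineName != null && !engineName.isEmpty()) {
            sb.append(" engineName=\"").append(escapeAttr(engineName)).append('"');
        }
        sb.append(">\n  <script>").append(cdata(script)).append("</script>\n");
        sb.append("</handler>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reproduces the Lua example above.
        System.out.println(
                handlerXml("lua", "metadata:addString('fruit', {'apple'});"));
    }
}
```

Splitting "]]>" across two CDATA sections keeps the script text intact once the XML is parsed, which matters for scripts that index nested arrays or compare values with ">".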
- Since:
- 2.4.0
- Author:
- Pascal Essiembre
- See Also:
ScriptRunner
Constructor Summary
Constructor       Description
ScriptTagger()
-
Method Summary
Modifier and Type   Method   Description
boolean   equals(Object other)
String   getEngineName()
String   getScript()
int   hashCode()
protected void   loadStringTaggerFromXML(XML xml)   Loads configuration settings specific to the implementing class.
protected void   saveStringTaggerToXML(XML xml)   Saves configuration settings specific to the implementing class.
void   setEngineName(String engineName)
void   setScript(String script)
protected void   tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
String   toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractStringTagger
getMaxReadSize, loadCharStreamTaggerFromXML, saveCharStreamTaggerToXML, setMaxReadSize, tagTextDocument
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractCharStreamTagger
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, tagApplicableDocument
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
Method Detail
-
getEngineName
public String getEngineName()
-
setEngineName
public void setEngineName(String engineName)
-
getScript
public String getScript()
-
setScript
public void setScript(String script)
-
tagStringContent
protected void tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) throws ImporterHandlerException
- Specified by:
tagStringContent in class AbstractStringTagger
- Throws:
ImporterHandlerException
-
saveStringTaggerToXML
protected void saveStringTaggerToXML(XML xml)
Description copied from class: AbstractStringTagger
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Specified by:
saveStringTaggerToXML in class AbstractStringTagger
- Parameters:
xml - the XML
-
loadStringTaggerFromXML
protected void loadStringTaggerFromXML(XML xml)
Description copied from class: AbstractStringTagger
Loads configuration settings specific to the implementing class.
- Specified by:
loadStringTaggerFromXML in class AbstractStringTagger
- Parameters:
xml - xml configuration
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractStringTagger
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractStringTagger
-
toString
public String toString()
- Overrides:
toString in class AbstractStringTagger