Class ScriptTagger
java.lang.Object
  com.norconex.importer.handler.AbstractImporterHandler
    com.norconex.importer.handler.tagger.AbstractDocumentTagger
      com.norconex.importer.handler.tagger.AbstractCharStreamTagger
        com.norconex.importer.handler.tagger.AbstractStringTagger
          com.norconex.importer.handler.tagger.impl.ScriptTagger

All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTagger
public class ScriptTagger extends AbstractStringTagger
Tag incoming documents using a scripting language. The default script engine is
JavaScript. Refer to ScriptRunner for more information on using a scripting
language with Norconex Importer.

How to tag documents with scripting:
The following are variables made available to your script for each document:
- reference: Document unique reference as a string.
- content: Document content, as a string (of maxReadSize length).
- metadata: Document metadata, as a Properties object.
- parsed: Whether the document was already parsed, as a boolean.
- sectionIndex: Content section index if it had to be split, as an integer.
There is no expected return value from your script. Returning one has no effect.
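As an illustration of how these variables can be combined, the following configuration sketch adds a field only when the document content mentions a given word. It assumes the default JavaScript engine, that content can be treated as a plain string in that engine, and that metadata.add accepts a field name and a value as in the usage examples on this page. The word "apple" and the field name "fruit" are illustrative only.

```xml
<handler class="ScriptTagger">
  <script><![CDATA[
    // Assumption: "content" exposes indexOf, as a plain string would.
    // Add the field only when the text mentions the word.
    if (content.indexOf('apple') >= 0) {
      metadata.add('fruit', 'apple');
    }
  ]]></script>
</handler>
```

Since large content may be split into sections of at most maxReadSize characters, a check like this would presumably run once per section. The sectionIndex variable is available if the script needs to tell sections apart.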
XML configuration usage:
<handler class="com.norconex.importer.handler.tagger.impl.ScriptTagger"
    engineName="(script engine name)"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <script>(your script)</script>
</handler>

Usage example:
The following examples add a new metadata field indicating which fruit a document is about.
JavaScript:
<handler class="ScriptTagger">
  <script>
    <![CDATA[
      metadata.add('fruit', 'apple');
    ]]>
  </script>
</handler>

Lua:
<handler class="ScriptTagger" engineName="lua">
  <script>
    <![CDATA[
      metadata:addString('fruit', {'apple'});
    ]]>
  </script>
</handler>

- Since:
2.4.0
- Author:
Pascal Essiembre
- See Also:
ScriptRunner
-
Constructor Summary
Constructors
Constructor      Description
ScriptTagger()
-
Method Summary
Modifier and Type  Method                             Description
boolean            equals(Object other)
String             getEngineName()
String             getScript()
int                hashCode()
protected void     loadStringTaggerFromXML(XML xml)   Loads configuration settings specific to the implementing class.
protected void     saveStringTaggerToXML(XML xml)     Saves configuration settings specific to the implementing class.
void               setEngineName(String engineName)
void               setScript(String script)
protected void     tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
String             toString()
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractStringTagger
getMaxReadSize, loadCharStreamTaggerFromXML, saveCharStreamTaggerToXML, setMaxReadSize, tagTextDocument
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractCharStreamTagger
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, tagApplicableDocument
-
Methods inherited from class com.norconex.importer.handler.tagger.AbstractDocumentTagger
tagDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Method Detail
-
getEngineName
public String getEngineName()
-
setEngineName
public void setEngineName(String engineName)
-
getScript
public String getScript()
-
setScript
public void setScript(String script)
-
tagStringContent
protected void tagStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) throws ImporterHandlerException
- Specified by:
tagStringContent in class AbstractStringTagger
- Throws:
ImporterHandlerException
-
saveStringTaggerToXML
protected void saveStringTaggerToXML(XML xml)
Description copied from class: AbstractStringTagger
Saves configuration settings specific to the implementing class. The parent tag, along with the "class" attribute, is already written. Implementors must not close the writer.
- Specified by:
saveStringTaggerToXML in class AbstractStringTagger
- Parameters:
xml - the XML
-
loadStringTaggerFromXML
protected void loadStringTaggerFromXML(XML xml)
Description copied from class: AbstractStringTagger
Loads configuration settings specific to the implementing class.
- Specified by:
loadStringTaggerFromXML in class AbstractStringTagger
- Parameters:
xml - XML configuration
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractStringTagger
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractStringTagger
-
toString
public String toString()
- Overrides:
toString in class AbstractStringTagger
-
-