Class ScriptTransformer
- java.lang.Object
-
- com.norconex.importer.handler.AbstractImporterHandler
-
- com.norconex.importer.handler.transformer.AbstractDocumentTransformer
-
- com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
-
- com.norconex.importer.handler.transformer.AbstractStringTransformer
-
- com.norconex.importer.handler.transformer.impl.ScriptTransformer
-
- All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTransformer
public class ScriptTransformer extends AbstractStringTransformer implements IXMLConfigurable
Transform incoming documents using a scripting language. The default script engine is JavaScript.
Refer to ScriptRunner for more information on using a scripting language with Norconex Importer.
How to transform documents with scripting:
The following are variables made available to your script for each document:
- reference: Document unique reference as a string.
- content: Document content, as a string (of maxReadSize length).
- metadata: Document metadata as a Properties object.
- parsed: Whether the document was already parsed, as a boolean.
- sectionIndex: Content section index if it had to be split, as an integer.
The expected return value from your script is a string holding the modified content.
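The variable list above can be pictured with the standard javax.script API. This is only a sketch under stated assumptions, not how ScriptRunner is implemented: the class and method names are hypothetical, a plain Map stands in for the Norconex Properties object, and falling back to the unmodified content when no engine is installed is a choice made for this sketch.

```java
import java.util.List;
import java.util.Map;
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleBindings;

// Hypothetical illustration class; not part of Norconex Importer.
public class ScriptBindingsSketch {

    // Exposes the five documented variables under their documented names.
    // The Map used for "metadata" is a stand-in for Properties (assumption).
    static Bindings bindingsFor(String reference, String content,
            Map<String, List<String>> metadata, boolean parsed,
            int sectionIndex) {
        Bindings b = new SimpleBindings();
        b.put("reference", reference);
        b.put("content", content);
        b.put("metadata", metadata);
        b.put("parsed", parsed);
        b.put("sectionIndex", sectionIndex);
        return b;
    }

    // Evaluates the script and returns its result as a string. If no engine
    // with that name is installed, or the script yields no value, the
    // original content is returned unchanged (a choice made for this sketch).
    static String run(String engineName, String script, Bindings b) {
        ScriptEngine engine =
                new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return (String) b.get("content");
        }
        try {
            Object result = engine.eval(script, b);
            return result == null
                    ? (String) b.get("content") : result.toString();
        } catch (ScriptException e) {
            throw new IllegalStateException("Script failed", e);
        }
    }
}
```

Whether a JavaScript engine is available depends on the Java runtime in use, so the sketch checks for a missing engine before evaluating anything.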
XML configuration usage:
<handler class="com.norconex.importer.handler.transformer.impl.ScriptTransformer"
    engineName="(script engine name)"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <script>(your script)</script>
</handler>
Usage example:
The following example replaces all occurrences of "Alice" with "Roger" in the document content.
JavaScript:
<handler class="ScriptTransformer">
  <script>
    <![CDATA[
      modifiedContent = content.replace(/Alice/g, 'Roger');
      /*return*/ modifiedContent;
    ]]>
  </script>
</handler>
Lua:
<handler class="ScriptTransformer" engineName="lua">
  <script>
    <![CDATA[
      modifiedContent = content:gsub('Alice', 'Roger');
      return modifiedContent;
    ]]>
  </script>
</handler>
- Since:
- 2.4.0
- Author:
- Pascal Essiembre
- See Also:
ScriptRunner
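For comparison with the "Alice" to "Roger" usage example, the same replacement can be sketched in plain Java on a StringBuilder, the type that transformStringContent receives as its content argument. The helper below is hypothetical and not part of Norconex Importer; writing the result back into the builder in place is an assumption made for this sketch.

```java
// Hypothetical illustration class; not part of Norconex Importer.
public class ReplaceSketch {

    // Replaces every literal occurrence of target in content, in place.
    static void replaceAll(StringBuilder content, String target,
            String replacement) {
        // String.replace(CharSequence, CharSequence) replaces all
        // occurrences, like the /g flag in the JavaScript example.
        String modified = content.toString().replace(target, replacement);
        content.setLength(0);      // clear the existing content
        content.append(modified);  // write the modified content back
    }
}
```

Calling ReplaceSketch.replaceAll(content, "Alice", "Roger") on a builder holding "Alice met Alice" leaves it holding "Roger met Roger".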
-
-
Constructor Summary
Constructors
Constructor Description
ScriptTransformer()
-
Method Summary
Modifier and Type, Method, Description
boolean equals(Object other)
String getEngineName()
String getScript()
int hashCode()
protected void loadStringTransformerFromXML(XML xml): Loads configuration settings specific to the implementing class.
protected void saveStringTransformerToXML(XML xml): Saves configuration settings specific to the implementing class.
void setEngineName(String engineName)
void setScript(String script)
String toString()
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractStringTransformer
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractDocumentTransformer
transformDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface com.norconex.commons.lang.xml.IXMLConfigurable
loadFromXML, saveToXML
-
-
-
-
Method Detail
-
getEngineName
public String getEngineName()
-
setEngineName
public void setEngineName(String engineName)
-
getScript
public String getScript()
-
setScript
public void setScript(String script)
-
transformStringContent
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) throws ImporterHandlerException
- Specified by:
transformStringContent in class AbstractStringTransformer
- Throws:
ImporterHandlerException
-
saveStringTransformerToXML
protected void saveStringTransformerToXML(XML xml)
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class. The parent tag along with the "class" attribute are already written. Implementors must not close the writer.
- Specified by:
saveStringTransformerToXML in class AbstractStringTransformer
- Parameters:
xml - the XML
-
loadStringTransformerFromXML
protected void loadStringTransformerFromXML(XML xml)
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
- Specified by:
loadStringTransformerFromXML in class AbstractStringTransformer
- Parameters:
xml - XML configuration
-
equals
public boolean equals(Object other)
- Overrides:
equals in class AbstractStringTransformer
-
hashCode
public int hashCode()
- Overrides:
hashCode in class AbstractStringTransformer
-
toString
public String toString()
- Overrides:
toString in class AbstractStringTransformer
-
-