Class ScriptTransformer
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.transformer.AbstractDocumentTransformer
      - com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
        - com.norconex.importer.handler.transformer.AbstractStringTransformer
          - com.norconex.importer.handler.transformer.impl.ScriptTransformer
- All Implemented Interfaces:
  IXMLConfigurable, IImporterHandler, IDocumentTransformer
public class ScriptTransformer extends AbstractStringTransformer implements IXMLConfigurable
Transforms incoming documents using a scripting language. The default script engine is JavaScript. Refer to ScriptRunner for more information on using a scripting language with Norconex Importer.

How to transform documents with scripting:
The following variables are made available to your script for each document:
- reference: Document unique reference, as a string.
- content: Document content, as a string (of at most maxReadSize characters).
- metadata: Document metadata, as a Properties object.
- parsed: Whether the document was already parsed, as a boolean.
- sectionIndex: Index of the content section being processed (when the content had to be split), as an integer.
The expected return value from your script is a string holding the modified content.
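To make this contract concrete, the following Java sketch models what a script receives and what it must return. It is not the library's implementation: the `SectionScript` interface and the `applyInSections` helper are hypothetical, `java.util.Properties` stands in for the Properties class the `metadata` variable actually uses, and the naive fixed-size split is a simplification (the library's reader may choose different section boundaries). It only illustrates that the script sees one section of at most `maxReadSize` characters at a time, along with its `sectionIndex`, and that the string it returns replaces that section.

```java
import java.util.Properties;

public class ScriptContractSketch {

    /** Hypothetical stand-in for a user script: receives the variables listed above, returns new content. */
    @FunctionalInterface
    interface SectionScript {
        String run(String reference, String content, Properties metadata,
                   boolean parsed, int sectionIndex);
    }

    /**
     * Hypothetical driver: splits the content into sections of at most maxReadSize
     * characters (naive fixed-size split) and concatenates what the script returns
     * for each section, in order.
     */
    static String applyInSections(String reference, String content, Properties metadata,
                                  boolean parsed, int maxReadSize, SectionScript script) {
        StringBuilder out = new StringBuilder();
        int sectionIndex = 0;
        for (int start = 0; start < content.length(); start += maxReadSize) {
            int end = Math.min(content.length(), start + maxReadSize);
            String section = content.substring(start, end);
            out.append(script.run(reference, section, metadata, parsed, sectionIndex));
            sectionIndex++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Properties metadata = new Properties();
        metadata.setProperty("title", "Example");
        // Java equivalent of this page's usage example: replace every "Alice" with "Roger".
        SectionScript script = (ref, text, meta, p, idx) -> text.replace("Alice", "Roger");
        String result = applyInSections("doc-1", "Alice met Alice.", metadata, true, 1000, script);
        System.out.println(result); // prints "Roger met Roger."
    }
}
```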
XML configuration usage:
<handler class="com.norconex.importer.handler.transformer.impl.ScriptTransformer"
    engineName="(script engine name)"
    maxReadSize="(max characters to read at once)"
    sourceCharset="(character encoding)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <script>(your script)</script>
</handler>
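The configuration comment says several `restrictTo` tags are allowed and only one needs to match. A minimal Java sketch of that any-match rule follows, under stated assumptions: field and value expressions are treated as plain regular expressions (the real matchers may support other matching modes), and a handler with no restrictions applies to every document. `Restriction`, `matchesAny`, and the `Content-Type` field name are illustrative, not the library's classes or guaranteed field names.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RestrictToSketch {

    /** Hypothetical model of one <restrictTo>: a field regex and a value regex. */
    static final class Restriction {
        final Pattern field;
        final Pattern value;

        Restriction(String fieldRegex, String valueRegex) {
            this.field = Pattern.compile(fieldRegex);
            this.value = Pattern.compile(valueRegex);
        }

        /** True if any metadata field matching the field regex has a value matching the value regex. */
        boolean matches(Map<String, List<String>> metadata) {
            for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
                if (!field.matcher(e.getKey()).matches()) {
                    continue;
                }
                for (String v : e.getValue()) {
                    if (value.matcher(v).matches()) {
                        return true;
                    }
                }
            }
            return false;
        }
    }

    /** Assumed rule: no restrictions means "apply to all"; otherwise one match is enough. */
    static boolean matchesAny(List<Restriction> restrictions, Map<String, List<String>> metadata) {
        if (restrictions.isEmpty()) {
            return true;
        }
        for (Restriction r : restrictions) {
            if (r.matches(metadata)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = Map.of("Content-Type", List.of("text/html"));
        List<Restriction> restrictions = List.of(
                new Restriction("Content-Type", "application/pdf"), // does not match
                new Restriction("Content-Type", "text/.*"));         // matches
        System.out.println(matchesAny(restrictions, metadata)); // prints "true"
    }
}
```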
Usage example:
The following example replaces all occurrences of "Alice" with "Roger" in a document's content.
JavaScript:
<handler class="ScriptTransformer">
  <script><![CDATA[
    modifiedContent = content.replace(/Alice/g, 'Roger');
    /*return*/ modifiedContent;
  ]]></script>
</handler>
Lua:
<handler class="ScriptTransformer" engineName="lua">
  <script><![CDATA[
    modifiedContent = content:gsub('Alice', 'Roger');
    return modifiedContent;
  ]]></script>
</handler>
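The `engineName` attribute selects the scripting engine (`lua` in the second example). Assuming engine names are resolved through the standard Java Scripting API (JSR 223), the sketch below checks which engines a given JVM can actually provide before you rely on one. This matters for the JavaScript default: the Nashorn engine was removed in JDK 15, so `getEngineByName("JavaScript")` returns null there unless some other JavaScript engine is on the classpath. The class and method names are illustrative.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class EngineCheck {

    /** Returns true if a JSR 223 engine is registered under the given name. */
    static boolean isEngineAvailable(String engineName) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        return engine != null;
    }

    public static void main(String[] args) {
        // List every engine discovered on the classpath, with the names it answers to.
        for (ScriptEngineFactory f : new ScriptEngineManager().getEngineFactories()) {
            System.out.println(f.getEngineName() + " -> " + f.getNames());
        }
        // Check the two engine names used in the examples on this page.
        for (String name : new String[] {"JavaScript", "lua"}) {
            System.out.println(name + " available: " + isEngineAvailable(name));
        }
    }
}
```

Output depends entirely on the JDK and classpath, which is the point: run it in the same environment as the importer.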
- Since:
  2.4.0
- Author:
  Pascal Essiembre
- See Also:
  ScriptRunner
Constructor Summary
- ScriptTransformer()

Method Summary
- boolean equals(Object other)
- String getEngineName()
- String getScript()
- int hashCode()
- protected void loadStringTransformerFromXML(XML xml): Loads configuration settings specific to the implementing class.
- protected void saveStringTransformerToXML(XML xml): Saves configuration settings specific to the implementing class.
- void setEngineName(String engineName)
- void setScript(String script)
- String toString()
- protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex)

Methods inherited from class com.norconex.importer.handler.transformer.AbstractStringTransformer
getMaxReadSize, loadCharStreamTransformerFromXML, saveCharStreamTransformerToXML, setMaxReadSize, transformTextDocument

Methods inherited from class com.norconex.importer.handler.transformer.AbstractCharStreamTransformer
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument

Methods inherited from class com.norconex.importer.handler.transformer.AbstractDocumentTransformer
transformDocument

Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface com.norconex.commons.lang.xml.IXMLConfigurable
loadFromXML, saveToXML

Method Detail
getEngineName
public String getEngineName()

setEngineName
public void setEngineName(String engineName)

getScript
public String getScript()

setScript
public void setScript(String script)

transformStringContent
protected void transformStringContent(HandlerDoc doc, StringBuilder content, ParseState parseState, int sectionIndex) throws ImporterHandlerException
- Specified by:
  transformStringContent in class AbstractStringTransformer
- Throws:
  ImporterHandlerException
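Because `transformStringContent` returns `void`, the transformed text has to be written back into the `content` builder it receives. A minimal sketch of that write-back pattern, with a `UnaryOperator<String>` standing in for the evaluated script; `writeBack` is a hypothetical helper, and keeping the original text when the script returns null is a choice made in this sketch, not documented library behavior.

```java
import java.util.function.UnaryOperator;

public class WriteBackSketch {

    /** Replaces the builder's contents in place with the script's result. */
    static void writeBack(StringBuilder content, UnaryOperator<String> script) {
        String result = script.apply(content.toString());
        if (result == null) {
            return; // this sketch keeps the original text if the script returns nothing
        }
        content.setLength(0);   // clear the existing section text
        content.append(result); // the caller reads the same builder afterwards
    }

    public static void main(String[] args) {
        StringBuilder section = new StringBuilder("Hello Alice");
        writeBack(section, s -> s.replace("Alice", "Roger"));
        System.out.println(section); // prints "Hello Roger"
    }
}
```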

saveStringTransformerToXML
protected void saveStringTransformerToXML(XML xml)
Description copied from class: AbstractStringTransformer
Saves configuration settings specific to the implementing class. The parent tag, along with its "class" attribute, is already written. Implementors must not close the writer.
- Specified by:
  saveStringTransformerToXML in class AbstractStringTransformer
- Parameters:
  xml - the XML

loadStringTransformerFromXML
protected void loadStringTransformerFromXML(XML xml)
Description copied from class: AbstractStringTransformer
Loads configuration settings specific to the implementing class.
- Specified by:
  loadStringTransformerFromXML in class AbstractStringTransformer
- Parameters:
  xml - XML configuration
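Judging from the XML configuration usage on this page, the settings specific to this class appear as an `engineName` attribute on `<handler>` and a `<script>` child element. The round trip can be sketched with the standard `org.w3c.dom` and `javax.xml.transform` APIs. This is an illustration under that assumption only: the real methods work through Norconex's own `XML` class, and `save`/`load` here are hypothetical helpers. Wrapping the script in a CDATA section, as the usage examples do, lets characters such as `<` and `&` pass through unescaped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlRoundTripSketch {

    /** Writes engineName as an attribute and the script as a CDATA <script> child. */
    static String save(String engineName, String script) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element handler = doc.createElement("handler");
        handler.setAttribute("class", "ScriptTransformer");
        handler.setAttribute("engineName", engineName);
        Element scriptEl = doc.createElement("script");
        scriptEl.appendChild(doc.createCDATASection(script));
        handler.appendChild(scriptEl);
        doc.appendChild(handler);

        // Identity transform serializes the DOM tree to a string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Reads back {engineName, script} from XML produced by save(). */
    static String[] load(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element handler = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        Element scriptEl = (Element) handler.getElementsByTagName("script").item(0);
        return new String[] {handler.getAttribute("engineName"), scriptEl.getTextContent()};
    }

    public static void main(String[] args) throws Exception {
        String xml = save("lua", "modifiedContent = content:gsub('Alice', 'Roger'); return modifiedContent;");
        System.out.println(xml);
        String[] loaded = load(xml);
        System.out.println(loaded[0] + " / " + loaded[1]);
    }
}
```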

equals
public boolean equals(Object other)
- Overrides:
  equals in class AbstractStringTransformer

hashCode
public int hashCode()
- Overrides:
  hashCode in class AbstractStringTransformer

toString
public String toString()
- Overrides:
  toString in class AbstractStringTransformer