Class ScriptTransformer

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentTransformer

    public class ScriptTransformer
    extends AbstractStringTransformer
    implements IXMLConfigurable

    Transform incoming documents using a scripting language. The default script engine is JavaScript.

    Refer to ScriptRunner for more information on using a scripting language with Norconex Importer.

    How to transform documents with scripting:

    The following are variables made available to your script for each document:

    • reference: Document unique reference as a string.
    • content: Document content, as a string (at most maxReadSize characters at a time).
    • metadata: Document metadata, as a Properties object.
    • parsed: Whether the document was already parsed, as a boolean.
    • sectionIndex: Index of the current content section, if the content had to be split, as an integer.

    The expected return value from your script is a string holding the modified content.
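
    As a minimal sketch of how a script can use these variables, the JavaScript below wraps a script body in a function so it can be run outside the Importer. The function name transformSection and the sample values are hypothetical and exist only for illustration. In a real configuration the handler supplies reference, content and parsed, and only the body of the function would go inside the script element.

```javascript
// Hypothetical wrapper: makes the script body callable outside the Importer.
// The parameters stand in for the variables the handler makes available.
function transformSection(reference, content, parsed) {
    // Leave the content untouched when the document has not been parsed yet.
    if (!parsed) {
        return content;
    }
    // Replace every occurrence of "Alice" and return the modified content.
    var modifiedContent = content.replace(/Alice/g, 'Roger');
    return modifiedContent;
}

// Sample values (assumptions for illustration, not produced by the Importer).
var result = transformSection('http://example.com/page.html', 'Alice met Alice.', true);
console.log(result); // prints "Roger met Roger."
```

    Guarding on parsed is a design choice for this sketch only. Whether a script should act before or after parsing depends on where the handler is placed in your configuration.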

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.transformer.impl.ScriptTransformer"
        engineName="(script engine name)"
        maxReadSize="(max characters to read at once)"
        sourceCharset="(character encoding)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <script>(your script)</script>
    </handler>

    Usage example:

    The following example replaces all occurrences of "Alice" with "Roger" in a document's content.

    JavaScript:
    
    <handler
        class="ScriptTransformer">
      <script>
        <![CDATA[
           modifiedContent = content.replace(/Alice/g, 'Roger');
           /*return*/ modifiedContent;
       ]]>
      </script>
    </handler>

    Lua:
    
    <handler
        class="ScriptTransformer"
        engineName="lua">
      <script>
        <![CDATA[
           modifiedContent = content:gsub('Alice', 'Roger');
           return modifiedContent;
       ]]>
      </script>
    </handler>

    Since:
    2.4.0
    Author:
    Pascal Essiembre
    See Also:
    ScriptRunner