Norconex Importer

Configuration

Importer Options

The following are the available XML configuration options.

<importer>
The Importer is responsible for extracting raw text from documents, in addition to transforming, decorating, and filtering content.
Maximum number of bytes used for memory caching of all files being processed by the Importer. Only applicable when invoking the Importer from the command line or programmatically via Importer#importDocument(ImporterRequest).
Default: 1 GB
Maximum number of bytes used for memory caching of a single file being processed by the Importer. Only applicable when invoking the Importer from the command line or programmatically via Importer#importDocument(ImporterRequest).
Default: 100 MB
Directory where temporary files are written. Only applicable when invoking the Importer from the command line or programmatically via Importer#importDocument(ImporterRequest).
Default: System temporary directory
Directory where files that generate parsing errors are saved.
Default: None (not saved)
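As a rough illustration of the four options above, the fragment below sets each one explicitly. The element names (maxMemoryPool, maxMemoryInstance, tempDir, parseErrorsSaveDir) do not appear in the descriptions above and are assumptions; the byte counts and paths are hypothetical values, not recommendations.

```xml
<importer>
  <!-- Assumed element name: bytes of memory cache shared by all files. -->
  <maxMemoryPool>500000000</maxMemoryPool>
  <!-- Assumed element name: bytes of memory cache for a single file. -->
  <maxMemoryInstance>50000000</maxMemoryInstance>
  <!-- Assumed element name; hypothetical path for temporary files. -->
  <tempDir>/path/to/importer/tmp</tempDir>
  <!-- Assumed element name; hypothetical path where files that
       generate parsing errors are kept for later inspection. -->
  <parseErrorsSaveDir>/path/to/importer/parse-errors</parseErrorsSaveDir>
</importer>
```

Omitting any of these elements should leave the corresponding documented default in effect.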
One or a series of <handler> elements applied to imported documents in their original format BEFORE they are parsed. Can be mixed with XML-based condition wrappers (<if>, <ifNot>) to create a processing "flow".
Repeatable
Used to conditionally execute one or more <handler>s if a condition (or group of conditions) returns true. Must contain exactly one of <conditions> or <condition> as a direct child element, followed by exactly one <then>, and optionally one <else>.
Used to group multiple <condition> or <conditions> together.
Repeatable
More documentation:
Refer to previously documented <condition> (under <if>) for available options.
Repeatable
More documentation:
Refer to previously documented <conditions> (under <if>) for available options.
Wrapper around handlers executed when the condition is met. Can also contain nested <if> and <ifNot>.
Repeatable
More documentation:
Refer to previously documented <handler> (under <preParseHandlers>) for available options.
Repeatable
More documentation:
Refer to previously documented <if> (under <preParseHandlers>) for available options.
Repeatable
More documentation:
Refer to previously documented <ifNot> (under <preParseHandlers>) for available options.
Wrapper around handlers executed when the condition is not met. Can also contain nested <if> and <ifNot>.
Repeatable
More documentation:
Refer to previously documented <handler> (under <preParseHandlers>) for available options.
Repeatable
More documentation:
Refer to previously documented <if> (under <preParseHandlers>) for available options.
Repeatable
More documentation:
Refer to previously documented <ifNot> (under <preParseHandlers>) for available options.
Repeatable
Used to conditionally execute one or more <handler>s if a condition (or group of conditions) returns false.
More documentation:
Supports the same options as <if>. Refer to previously documented <if> (under <preParseHandlers>) for available options.
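The conditional flow described above can be sketched as follows. This is a minimal outline, not a working configuration: every class value is left as "..." because no handler or condition classes are named here, and the class attribute on <handler> and <condition> is assumed by analogy with <responseProcessor>.

```xml
<preParseHandlers>
  <!-- Applied to every document, outside any condition. -->
  <handler class="..."/>
  <if>
    <!-- Exactly one <condition> or <conditions> comes first. -->
    <conditions>
      <condition class="..."/>
      <condition class="..."/>
    </conditions>
    <!-- Executed when the condition group returns true. -->
    <then>
      <handler class="..."/>
    </then>
    <!-- Optional; executed when the condition group returns false. -->
    <else>
      <!-- <then> and <else> may contain nested <if> and <ifNot>. -->
      <ifNot>
        <condition class="..."/>
        <then>
          <handler class="..."/>
        </then>
      </ifNot>
    </else>
  </if>
</preParseHandlers>
```

How the conditions inside a <conditions> group are combined is not stated above, so the sketch makes no assumption about it.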
Factory used to select and configure the document parser for each content type encountered.
Default: GenericDocumentParserFactory
One or a series of <handler> elements applied to imported documents AFTER they have been parsed and their raw text extracted.
More documentation:
Supports the same options as <preParseHandlers>. Refer to previously documented <preParseHandlers> (under <importer>) for available options.
<responseProcessor
class="..."
Required
>
Repeatable
One or more optional custom classes that process an Importer response to modify it or perform other actions as required before it is returned.
</responseProcessor>
</importer>
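Putting the pieces together, an overall skeleton might look like the sketch below. The <postParseHandlers> element name is an assumption (only its description appears above), all class values are placeholders, and the placement of <responseProcessor> directly under <importer> simply mirrors the listing above; depending on the Importer version, it may need a wrapping parent element.

```xml
<importer>
  <!-- Handlers applied to documents in their original format. -->
  <preParseHandlers>
    <handler class="..."/>
  </preParseHandlers>
  <!-- Assumed element name: handlers applied to the extracted text. -->
  <postParseHandlers>
    <handler class="..."/>
  </postParseHandlers>
  <!-- Repeatable; the class attribute is required. -->
  <responseProcessor class="..."/>
  <responseProcessor class="..."/>
</importer>
```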