Norconex SQL Committer

Configuration

When used with a Norconex Crawler, add the following XML to the <committer> section of your crawler configuration to commit documents to a SQL database:

<committer class="com.norconex.committer.sql.SQLCommitter">
    <!-- Mandatory settings -->
    <driverClass>...</driverClass>
    <connectionUrl>...</connectionUrl>
    <tableName>...</tableName>
 
    <!-- Other settings -->
    <driverPath>...</driverPath>
    <properties>
        <property key="...">...</property>
        ...
    </properties>
    <createTableSQL>...</createTableSQL>
    <createFieldSQL>...</createFieldSQL>
    <multiValuesJoiner>...</multiValuesJoiner>
 
    <fixFieldNames>[false|true]</fixFieldNames>
    <fixFieldValues>[false|true]</fixFieldValues>
 
    <!-- Use the following if authentication is required. -->
    <username>...</username>
    <password>...</password>
 
    <!-- Use the following if password is encrypted. -->
    <passwordKey>...</passwordKey>
    <passwordKeySource>[key|file|environment|property]</passwordKeySource>
 
    <sourceReferenceField keep="[false|true]">...</sourceReferenceField>
    <targetReferenceField>...</targetReferenceField>
    <sourceContentField keep="[false|true]">...</sourceContentField>
    <targetContentField>...</targetContentField>
    <queueDir>...</queueDir>
    <queueSize>...</queueSize>
    <commitBatchSize>...</commitBatchSize>
    <maxRetries>...</maxRetries>
    <maxRetryWait>...</maxRetryWait>
</committer>
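
As an illustration, a minimal configuration might look like the sketch below. It assumes an H2 database whose driver JAR is already on the classpath. The connection URL, table name, and credentials are hypothetical placeholders, and the table is assumed to exist already.

```xml
<committer class="com.norconex.committer.sql.SQLCommitter">
    <!-- Mandatory settings (H2 is an assumption; use your own driver and URL) -->
    <driverClass>org.h2.Driver</driverClass>
    <connectionUrl>jdbc:h2:file:./data/crawldb</connectionUrl>
    <tableName>documents</tableName>

    <!-- Hypothetical credentials, needed only if the database requires authentication -->
    <username>crawler</username>
    <password>changeme</password>
</committer>
```

Every setting left out here falls back to its documented default, so the document reference is written to the id field and the document body to the content field.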

Tag descriptions:

Tag Description
driverClass Class name of the JDBC driver to use.
connectionUrl JDBC connection URL.
tableName The target database table name where documents will be committed.
driverPath Path to JDBC driver. Not required if already in classpath.
properties Key/value database properties.
createTableSQL Optional CREATE statement used to create a table if it does not already exist. Default assumes the database table already exists. The following variables are expected and will be replaced with the configuration options of the same name: ${tableName}, ${targetReferenceField} and ${targetContentField}. Example:
  CREATE TABLE ${tableName} (
      ${targetReferenceField} VARCHAR(32672) NOT NULL, 
      ${targetContentField}  CLOB, 
      PRIMARY KEY ( ${targetReferenceField} ),
      title VARCHAR(256)
  )
createFieldSQL Optional ALTER statement used to create missing table fields. Default assumes all database fields are already present. The ${tableName} variable will be replaced with the configuration option of the same name. The ${fieldName} variable will be replaced by newly encountered field names. Example:
  ALTER TABLE ${tableName} ADD ${fieldName} VARCHAR(32672)     
multiValuesJoiner One or more characters to join multi-value fields. Default is a vertical bar ("|").
fixFieldNames Set to true to attempt to prevent insertion errors by converting characters that are not underscores or alphanumeric to underscores. Also removes any non-alphabetic characters that begin a field name.
fixFieldValues Set to true to attempt to prevent insertion errors by truncating values that are longer than their defined maximum field length.
username Database user name.
password Database password.
passwordKey Reference to password key (or actual key) for encrypted passwords. See the API Documentation for encryption instructions.
passwordKeySource Source of password key for encrypted passwords. See the API Documentation for encryption instructions.
sourceReferenceField Name of source field that will be mapped to the SQL id field. Default is the document reference the Committer stores as document.reference. The metadata source field is deleted, unless keep is set to true.
targetReferenceField Name of target id field. Default is id.
sourceContentField Source field name for a document content/body. Default is not a field, but rather the document body content. Once re-mapped, the metadata source field is deleted, unless keep is set to true.
targetContentField Target field name for a document content/body. Default is: content.
queueDir Optional path where files are queued before being sent to SQL. Default is: ./committer-queue.
queueSize Optional maximum queue size before sending documents to SQL. Default is: 1000.
commitBatchSize Optional maximum number of documents to send to SQL at once. Default is: 100. Maximum is 1000.
maxRetries Maximum retries upon commit failures. Default is 0 (no retry).
maxRetryWait Maximum delay (in milliseconds) between retries. Default is 0 (no delay).
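
The optional settings described above can be combined as in the following sketch, which lets the Committer create the table and any missing fields on its own. It is an illustration only: the H2 driver, connection URL, table name, column sizes, target field names, and all numeric values are assumptions chosen for the example, not recommended settings.

```xml
<committer class="com.norconex.committer.sql.SQLCommitter">
    <!-- Mandatory settings (hypothetical H2 database) -->
    <driverClass>org.h2.Driver</driverClass>
    <connectionUrl>jdbc:h2:file:./data/crawldb</connectionUrl>
    <tableName>documents</tableName>

    <!-- Create the table on first use; variables are replaced by the options of the same name -->
    <createTableSQL>
        CREATE TABLE ${tableName} (
            ${targetReferenceField} VARCHAR(2048) NOT NULL,
            ${targetContentField} CLOB,
            title VARCHAR(256),
            PRIMARY KEY ( ${targetReferenceField} )
        )
    </createTableSQL>

    <!-- Add a column whenever a new field name is encountered -->
    <createFieldSQL>
        ALTER TABLE ${tableName} ADD ${fieldName} VARCHAR(2048)
    </createFieldSQL>

    <!-- Join multi-value fields with a semicolon instead of the default "|" -->
    <multiValuesJoiner>;</multiValuesJoiner>

    <!-- Sanitize field names and truncate over-long values -->
    <fixFieldNames>true</fixFieldNames>
    <fixFieldValues>true</fixFieldValues>

    <!-- Store the reference and body under custom column names (hypothetical names) -->
    <targetReferenceField>url</targetReferenceField>
    <targetContentField>body</targetContentField>

    <!-- Queue up to 500 documents, send 50 at a time, retry up to 3 times, waiting at most 5 seconds -->
    <queueDir>./committer-queue</queueDir>
    <queueSize>500</queueSize>
    <commitBatchSize>50</commitBatchSize>
    <maxRetries>3</maxRetries>
    <maxRetryWait>5000</maxRetryWait>
</committer>
```

With fixFieldValues enabled, values longer than the 2048-character columns in this sketch would be truncated rather than rejected, so choose column sizes that match the data you expect.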