When used with a Norconex Crawler, add the following XML to the <committer>
section of your crawler configuration to commit documents to a SQL database:

    <committer class="com.norconex.committer.sql.SQLCommitter">

      <!-- Mandatory settings -->
      <driverClass>...</driverClass>
      <connectionUrl>...</connectionUrl>
      <tableName>...</tableName>

      <!-- Other settings -->
      <driverPath>...</driverPath>
      <properties>
        <property key="...">...</property>
        ...
      </properties>
      <createTableSQL>...</createTableSQL>
      <createFieldSQL>...</createFieldSQL>
      <multiValuesJoiner>...</multiValuesJoiner>
      <fixFieldNames>[false|true]</fixFieldNames>
      <fixFieldValues>[false|true]</fixFieldValues>

      <!-- Use the following if authentication is required. -->
      <username>..</username>
      <password>...</password>

      <!-- Use the following if password is encrypted. -->
      <passwordKey>...</passwordKey>
      <passwordKeySource>[key|file|environment|property]</passwordKeySource>

      <sourceReferenceField keep="[false|true]">...</sourceReferenceField>
      <targetReferenceField>...</targetReferenceField>
      <sourceContentField keep="[false|true]">...</sourceContentField>
      <targetContentField>...</targetContentField>
      <queueDir>...</queueDir>
      <queueSize>...</queueSize>
      <commitBatchSize>...</commitBatchSize>
      <maxRetries>...</maxRetries>
      <maxRetryWait>...</maxRetryWait>
    </committer>

Tag descriptions:
Tag | Description |
---|---|
driverClass | Class name of the JDBC driver to use. |
connectionUrl | JDBC connection URL. |
tableName | The target database table name where documents will be committed. |
driverPath | Path to JDBC driver. Not required if already in classpath. |
properties | Key/value database properties. |
createTableSQL | Optional CREATE statement used to create the table if it does not already exist. By default, the table is assumed to exist. The variables `${tableName}`, `${targetReferenceField}` and `${targetContentField}` are expected and are replaced with the configuration options of the same name. Example: `CREATE TABLE ${tableName} (${targetReferenceField} VARCHAR(32672) NOT NULL, ${targetContentField} CLOB, PRIMARY KEY (${targetReferenceField}), title VARCHAR(256))` |
createFieldSQL | Optional ALTER statement used to create missing table fields. By default, all database fields are assumed to be present. The `${tableName}` variable is replaced with the configuration option of the same name. The `${fieldName}` variable is replaced with each newly encountered field name. Example: `ALTER TABLE ${tableName} ADD ${fieldName} VARCHAR(32672)` |
multiValuesJoiner | One or more characters used to join multi-value fields. Default is a vertical bar ("\|"). |
fixFieldNames | Set to `true` to attempt to prevent insertion errors by converting characters that are neither alphanumeric nor underscores to underscores. Also removes any non-alphabetic characters at the beginning of a field name. |
fixFieldValues | Set to `true` to attempt to prevent insertion errors by truncating values that exceed the defined maximum length of their field. |
username | Database user name. |
password | Database password. |
passwordKey | Reference to password key (or actual key) for encrypted passwords. See the API Documentation for encryption instructions. |
passwordKeySource | Source of password key for encrypted passwords. See the API Documentation for encryption instructions. |
sourceReferenceField | Name of the source field that will be mapped to the SQL id field. Default is the document reference, which the Committer stores as `document.reference`. Once mapped, the source metadata field is deleted unless `keep` is set to `true`. |
targetReferenceField | Name of the target id field. Default is `id`. |
sourceContentField | Name of the source field holding the document content/body. By default no field is used and the document body content is taken instead. Once re-mapped, the source metadata field is deleted unless `keep` is set to `true`. |
targetContentField | Name of the target field for the document content/body. Default is `content`. |
queueDir | Optional path where files are queued before being sent to SQL. Default is `./committer-queue`. |
queueSize | Optional maximum queue size before documents are sent to SQL. Default is `1000`. |
commitBatchSize | Optional maximum number of documents to send to SQL at once. Default is `100`. Maximum is `1000`. |
maxRetries | Maximum retries upon commit failures. Default is 0 (no retry). |
maxRetryWait | Maximum delay, in milliseconds, between retries. Default is 0 (no delay). |
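
To show how the tags above fit together, here is a minimal filled-in sketch. The specific values are assumptions made for illustration and do not come from this documentation: an embedded H2 database (`org.h2.Driver` with a file-based URL), a hypothetical table name `crawled_docs`, and arbitrary retry settings. The SQL statements are adapted from the examples in the table and may need adjusting for your database dialect.

```xml
<committer class="com.norconex.committer.sql.SQLCommitter">

  <!-- Mandatory settings (assumed values: embedded H2 database, hypothetical table name) -->
  <driverClass>org.h2.Driver</driverClass>
  <connectionUrl>jdbc:h2:file:./crawl-db</connectionUrl>
  <tableName>crawled_docs</tableName>

  <!-- Create the table if missing; variables are replaced with the options of the same name -->
  <createTableSQL>
    CREATE TABLE ${tableName} (
      ${targetReferenceField} VARCHAR(32672) NOT NULL,
      ${targetContentField} CLOB,
      PRIMARY KEY (${targetReferenceField})
    )
  </createTableSQL>

  <!-- Add a column for each newly encountered field name -->
  <createFieldSQL>
    ALTER TABLE ${tableName} ADD ${fieldName} VARCHAR(32672)
  </createFieldSQL>

  <!-- Sanitize field names and truncate oversized values to reduce insertion errors -->
  <fixFieldNames>true</fixFieldNames>
  <fixFieldValues>true</fixFieldValues>

  <!-- Documented defaults, written out explicitly -->
  <targetReferenceField>id</targetReferenceField>
  <targetContentField>content</targetContentField>
  <queueSize>1000</queueSize>
  <commitBatchSize>100</commitBatchSize>

  <!-- Illustrative values: retry twice, waiting up to 5 seconds between attempts -->
  <maxRetries>2</maxRetries>
  <maxRetryWait>5000</maxRetryWait>
</committer>
```

With `createTableSQL` and `createFieldSQL` both set, the table and its columns would not need to exist before the first crawl. Omit both if you prefer to manage the schema yourself, in which case the Committer assumes the table and all fields are already present.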