To use the SQL Committer with a Norconex Crawler, configure it in the <committer> section of your crawler configuration using the following XML:
<committer class="com.norconex.committer.sql.SQLCommitter">

  <!-- Mandatory settings -->
  <driverClass>...</driverClass>
  <connectionUrl>...</connectionUrl>
  <tableName>...</tableName>

  <!-- Other settings -->
  <driverPath>...</driverPath>
  <properties>
    <property key="...">...</property>
    ...
  </properties>
  <toUppercase>[false|true]</toUppercase>
  <createMissing>[false|true]</createMissing>
  <createTableSQL>...</createTableSQL>
  <multiValuesJoiner>...</multiValuesJoiner>

  <!-- Use the following if authentication is required. -->
  <username>...</username>
  <password>...</password>

  <!-- Use the following if password is encrypted. -->
  <passwordKey>...</passwordKey>
  <passwordKeySource>[key|file|environment|property]</passwordKeySource>

  <sourceReferenceField keep="[false|true]">...</sourceReferenceField>
  <targetReferenceField>...</targetReferenceField>
  <sourceContentField keep="[false|true]">...</sourceContentField>
  <targetContentField>...</targetContentField>
  <queueDir>...</queueDir>
  <queueSize>...</queueSize>
  <commitBatchSize>...</commitBatchSize>
  <maxRetries>...</maxRetries>
  <maxRetryWait>...</maxRetryWait>
</committer>
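As a minimal sketch, a configuration that sets only the three mandatory settings might look like the following. The MySQL driver class, connection URL, and table name are illustrative assumptions, not values taken from this documentation; substitute the ones for your own database.

```xml
<committer class="com.norconex.committer.sql.SQLCommitter">
  <!-- Assumed: MySQL Connector/J driver, already on the classpath. -->
  <driverClass>com.mysql.cj.jdbc.Driver</driverClass>
  <!-- Assumed: a local MySQL server with a hypothetical "crawldb" database. -->
  <connectionUrl>jdbc:mysql://localhost:3306/crawldb</connectionUrl>
  <!-- Hypothetical table name. -->
  <tableName>crawled_docs</tableName>
</committer>
```

If the driver is not on the classpath, driverPath can point to it instead.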
Tag descriptions:
Tag | Description |
---|---|
driverClass | Class name of the JDBC driver to use. |
connectionUrl | JDBC connection URL. |
tableName | The target database table name where documents will be committed. |
driverPath | Path to JDBC driver. Not required if already in classpath. |
properties | Key/value database properties. |
toUppercase | Whether to send table and field names as uppercase. By default, they are sent as lowercase. Default is false. |
createMissing | Whether to create the table if it does not exist. Default is false. |
createTableSQL | Optional SQL for creating the table when it is missing. By default, a predefined SQL statement is used. |
multiValuesJoiner | One or more characters to join multi-value fields. Default is a vertical bar ("\|"). |
username | Database user name. |
password | Database password. |
passwordKey | Reference to password key (or actual key) for encrypted passwords. |
passwordKeySource | Source of the password key for encrypted passwords: key, file, environment, or property. |
sourceReferenceField | Name of the source field that will be mapped to the SQL id field. Default is the document reference, which the Committer stores as document.reference. The metadata source field is deleted unless keep is set to true. |
targetReferenceField | Name of target id field. Default is id . |
sourceContentField | Name of the source field holding the document content/body. By default, no field is used and the document body content is taken instead. Once re-mapped, the metadata source field is deleted unless keep is set to true. |
targetContentField | Name of the target field for the document content/body. Default is content. |
queueDir | Optional path to the directory where files are queued before being sent to the SQL database. Default is ./committer-queue. |
queueSize | Optional maximum number of queued documents before they are sent to the SQL database. Default is 1000. |
commitBatchSize | Optional maximum number of documents to send to the SQL database at once. Default is 100. Maximum is 1000. |
maxRetries | Maximum retries upon commit failures. Default is 0 (no retry). |
maxRetryWait | Maximum delay, in milliseconds, between retries. Default is 0 (no delay). |
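
To illustrate how several of the optional tags described above can be combined, here is a hedged sketch rather than a recommended setup. The driver, URL, table, user, column, and environment variable names are all hypothetical, and the encrypted password is deliberately left elided.

```xml
<committer class="com.norconex.committer.sql.SQLCommitter">
  <!-- Assumed values for illustration only. -->
  <driverClass>com.mysql.cj.jdbc.Driver</driverClass>
  <connectionUrl>jdbc:mysql://localhost:3306/crawldb</connectionUrl>
  <tableName>crawled_docs</tableName>

  <!-- Let the Committer create the table with its predefined SQL. -->
  <createMissing>true</createMissing>

  <!-- Hypothetical account; the encrypted password is left elided. -->
  <username>crawler</username>
  <password>...</password>
  <!-- Assumes the key is read from an environment variable with this hypothetical name. -->
  <passwordKey>SQL_COMMITTER_KEY</passwordKey>
  <passwordKeySource>environment</passwordKeySource>

  <!-- Map the document reference to a hypothetical doc_id column and keep the source field. -->
  <sourceReferenceField keep="true">document.reference</sourceReferenceField>
  <targetReferenceField>doc_id</targetReferenceField>

  <!-- Smaller queue and batch sizes than the defaults; the batch size stays under the 1000 maximum. -->
  <queueSize>500</queueSize>
  <commitBatchSize>50</commitBatchSize>

  <!-- Retry a failed commit up to 3 times, waiting at most 5000 milliseconds between attempts. -->
  <maxRetries>3</maxRetries>
  <maxRetryWait>5000</maxRetryWait>
</committer>
```

Tags omitted here, such as toUppercase or multiValuesJoiner, fall back to the defaults listed in the table.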