When used with a Norconex Crawler, use the following XML to configure Micro Focus IDOL as the <committer> section of your crawler configuration:

    <committer class="com.norconex.committer.idol.IdolCommitter">

      <!-- To commit documents to IDOL or DIH: -->
      <host>(IDOL/DIH host name or IP)</host>
      <indexPort>(IDOL/DIH index port)</indexPort>
      <databaseName>(Optional IDOL Database Name where to store documents)</databaseName>
      <dreAddDataParams>
        <param name="(parameter name)">(parameter value)</param>
      </dreAddDataParams>
      <dreDeleteRefParams>
        <param name="(parameter name)">(parameter value)</param>
      </dreDeleteRefParams>

      <!-- To commit documents to CFS: -->
      <host>(CFS host name or IP)</host>
      <cfsPort>(CFS Server/Ingest port)</cfsPort>

      <!-- Common settings: -->
      <sourceReferenceField keep="[false|true]">
        (Optional name of field that contains the document reference, when
        the default document reference is not used. The reference value
        will be mapped to the IDOL "DREREFERENCE" field, or the
        "targetReferenceField" specified. Once re-mapped, this metadata
        source field is deleted, unless "keep" is set to true.)
      </sourceReferenceField>
      <targetReferenceField>
        (Optional name of IDOL target field where to store the source
        reference. If not specified, default is "DREREFERENCE".)
      </targetReferenceField>
      <sourceContentField keep="[false|true]">
        (If you wish to use a metadata field to act as the document
        "content", you can specify that field here. Default does not
        take a metadata field but rather the document content.
        Once re-mapped, the metadata source field is deleted,
        unless "keep" is set to true.)
      </sourceContentField>
      <targetContentField>
        (IDOL target field name for a document content/body.
        Default is: DRECONTENT)
      </targetContentField>
      <commitBatchSize>
        (max number of docs to send IDOL at once)
      </commitBatchSize>
      <queueDir>(optional path where to queue files)</queueDir>
      <queueSize>(max queue size before committing)</queueSize>
      <maxRetries>(max retries upon commit failures)</maxRetries>
      <maxRetryWait>(max delay between retries)</maxRetryWait>
    </committer>

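As an illustration of how the template might be filled in for direct IDOL (or DIH) indexing, here is a minimal sketch. Every value is an assumption chosen for the example: the host name, the index port (9001 is a common IDOL default, but check your own installation), the database name, and the `KillDuplicates` parameter passed along with DREADDDATA requests. Substitute the values that match your environment.

```xml
<committer class="com.norconex.committer.idol.IdolCommitter">
  <!-- Hypothetical IDOL host and index port; replace with your own. -->
  <host>idol.example.com</host>
  <indexPort>9001</indexPort>

  <!-- Hypothetical database name. -->
  <databaseName>News</databaseName>

  <!-- Illustrative parameter appended to each DREADDDATA request. -->
  <dreAddDataParams>
    <param name="KillDuplicates">REFERENCE</param>
  </dreAddDataParams>

  <!-- Send at most 100 add/delete commands per request (example value). -->
  <commitBatchSize>100</commitBatchSize>
</committer>
```

Because `cfsPort` is absent, this configuration targets the IDOL index port only. The reference and content field settings are omitted, so their documented defaults ("DREREFERENCE" and "DRECONTENT") apply.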
Tag descriptions:
Tag | Description |
---|---|
host | IDOL Server or DIH host name. |
indexPort | IDOL indexing port. Only one of indexPort or cfsPort can be specified. |
cfsPort | CFS port. Only one of indexPort or cfsPort can be specified. |
databaseName | Optional IDOL database name where documents are stored. |
dreAddDataParams/param | IDOL URL parameter to be appended to DREADDDATA requests. |
dreDeleteRefParams/param | IDOL URL parameter to be appended to DREDELETEREF requests. |
sourceReferenceField | Optional name of the source field that will be mapped to the IDOL "DREREFERENCE" field, or to the field specified in "targetReferenceField". Defaults to document.reference. |
targetReferenceField | Optional name of the IDOL target field in which to store the document's unique identifier (the value of sourceReferenceField). Defaults to "DREREFERENCE". |
sourceContentField | Optional metadata field to act as the document "content". By default, the document content itself is used. Once re-mapped, the metadata source field is deleted, unless "keep" is set to true. |
targetContentField | Optional IDOL target field name for the document content/body. Defaults to "DRECONTENT". |
queueDir | Optional path where to queue files. |
queueSize | Max queue size before committing. |
commitBatchSize | Maximum number of document addition/deletion commands to send at once to IDOL. |
maxRetries | Maximum retries upon commit failures. Default is 0 (no retry). |
maxRetryWait | Maximum delay, in milliseconds, between retries. Default is 0 (no delay). |
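Since only one of indexPort or cfsPort can be specified, a CFS setup uses a separate, simpler host/port pair. The following is a rough sketch of such a configuration with retries enabled. All values are assumptions made for illustration: the host name, the port (7000 is often used by CFS, but verify it against your installation), the queue directory, and the retry figures.

```xml
<committer class="com.norconex.committer.idol.IdolCommitter">
  <!-- Hypothetical CFS host and ingest port; indexPort must not be set. -->
  <host>cfs.example.com</host>
  <cfsPort>7000</cfsPort>

  <!-- Hypothetical queue location and sizes. -->
  <queueDir>./committer-queue</queueDir>
  <queueSize>1000</queueSize>
  <commitBatchSize>100</commitBatchSize>

  <!-- Retry up to 3 times, waiting at most 5000 ms between attempts.
       Both settings default to 0 (no retry, no delay). -->
  <maxRetries>3</maxRetries>
  <maxRetryWait>5000</maxRetryWait>
</committer>
```

Setting maxRetries above its default of 0 is what enables retrying at all. maxRetryWait then caps the pause between attempts and is expressed in milliseconds.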