Norconex Azure Search Committer

Configuration

When used with a Norconex Crawler, use the following XML as the <committer> section of your crawler configuration to commit documents to Azure Search:

<committer class="com.norconex.committer.azuresearch.AzureSearchCommitter">
    <endpoint>...</endpoint>
    <apiVersion>...</apiVersion>
    <apiKey>...</apiKey>
    <indexName>...</indexName>
    <disableReferenceEncoding>[false|true]</disableReferenceEncoding>
    <ignoreValidationErrors>[false|true]</ignoreValidationErrors>
    <ignoreResponseErrors>[false|true]</ignoreResponseErrors>
    <useWindowsAuth>[false|true]</useWindowsAuth>
    <proxyHost>...</proxyHost>
    <proxyPort>...</proxyPort>
    <proxyRealm>...</proxyRealm>
    <proxyScheme>...</proxyScheme>
    <proxyUsername>...</proxyUsername>
    <proxyPassword>...</proxyPassword>
    <proxyPasswordKey>...</proxyPasswordKey>
    <proxyPasswordKeySource>[key|file|environment|property]</proxyPasswordKeySource>
    <sourceReferenceField keep="[false|true]">...</sourceReferenceField>
    <targetReferenceField>...</targetReferenceField>
    <sourceContentField keep="[false|true]">...</sourceContentField>
    <targetContentField>...</targetContentField>
    <queueDir>...</queueDir>
    <queueSize>...</queueSize>
    <commitBatchSize>...</commitBatchSize>
    <maxRetries>...</maxRetries>
    <maxRetryWait>...</maxRetryWait>
</committer>
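
As a minimal sketch, a configuration that sets only the connection values and leaves every other tag at its default might look like the following. The service name, index name, and key shown here are hypothetical placeholders, not values from a real deployment:

<committer class="com.norconex.committer.azuresearch.AzureSearchCommitter">
    <!-- Hypothetical service name; replace with your own Azure Search endpoint. -->
    <endpoint>https://example-service.search.windows.net</endpoint>
    <!-- Placeholder; use the admin key of your Azure Search service. -->
    <apiKey>YOUR_ADMIN_API_KEY</apiKey>
    <!-- Hypothetical index name. -->
    <indexName>example-index</indexName>
</committer>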

Tag descriptions:

Tag Description
endpoint Azure Search endpoint (https://[service name].search.windows.net).
indexName Index name to use when committing documents to Azure Search.
apiKey Azure Search API admin key.
apiVersion Optional Azure Search API version to use.
disableReferenceEncoding Disable URL-safe Base64 encoding of document references. Default is false.
ignoreValidationErrors When true, validation errors detected by the committer are logged instead of thrown as exceptions. Default is false.
ignoreResponseErrors When true, errors returned by Azure Search are logged instead of thrown as exceptions. Default is false.
useWindowsAuth Whether to use Windows Authentication. Default is false.
proxyHost Optional proxy host.
proxyPort Optional proxy port.
proxyRealm Optional proxy realm.
proxyScheme Optional proxy scheme.
proxyUsername Optional proxy username.
proxyPassword Optional proxy password.
proxyPasswordKey Optional proxy password key, needed only if the proxy password is encrypted. Refer to the API Documentation for more details.
proxyPasswordKeySource Optional password encryption key source. One of key, file, environment, or property. Refer to the API Documentation for more details.
sourceReferenceField Name of the source field that will be mapped to the Azure Search id field. Default is the document reference, which the Committer stores as document.reference. Once re-mapped, the metadata source field is deleted unless keep is set to true.
targetReferenceField Name of target id field. Default is id.
sourceContentField Source field name for the document content/body. Default is not a field, but rather the document body content. Once re-mapped, the metadata source field is deleted unless keep is set to true.
targetContentField Target field name for a document content/body. Default is: content.
queueDir Optional path where files are queued before being sent to Azure Search. Default is: ./committer-queue.
queueSize Optional maximum queue size before sending documents to Azure Search. Default is: 1000.
commitBatchSize Optional maximum number of documents to send to Azure Search at once. Default is: 100. Maximum is 1000.
maxRetries Maximum retries upon commit failures. Default is 0 (no retry).
maxRetryWait Maximum delay (in milliseconds) between retries. Default is 0 (no delay).
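
To illustrate how the field-mapping, queueing, and retry tags described above fit together, here is a sketch of a more tuned configuration. All values are illustrative assumptions: the endpoint, key, and index name are placeholders, "summary" and "body" are hypothetical field names, and the sizes and delays are arbitrary choices within the documented limits rather than recommendations:

<committer class="com.norconex.committer.azuresearch.AzureSearchCommitter">
    <!-- Placeholder connection values. -->
    <endpoint>https://example-service.search.windows.net</endpoint>
    <apiKey>YOUR_ADMIN_API_KEY</apiKey>
    <indexName>example-index</indexName>

    <!-- Log errors returned by Azure Search instead of throwing exceptions. -->
    <ignoreResponseErrors>true</ignoreResponseErrors>

    <!-- Use the hypothetical "summary" metadata field as the document content,
         keep that source field, and store the content in a hypothetical
         "body" field of the index. -->
    <sourceContentField keep="true">summary</sourceContentField>
    <targetContentField>body</targetContentField>

    <!-- Queue up to 2000 documents and send them in batches of 500
         (the documented maximum batch size is 1000). -->
    <queueSize>2000</queueSize>
    <commitBatchSize>500</commitBatchSize>

    <!-- Retry a failed commit up to 3 times, waiting at most 5000 ms between retries. -->
    <maxRetries>3</maxRetries>
    <maxRetryWait>5000</maxRetryWait>
</committer>

Tags omitted here, such as the proxy settings and sourceReferenceField, keep their defaults, so document references are still mapped to the id field.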