Norconex Amazon CloudSearch Committer

Configuration

When used with a Norconex Crawler, you can configure Amazon CloudSearch as your Committer by adding the following XML as the <committer> section of your crawler configuration:

  <committer class="com.norconex.committer.cloudsearch.CloudSearchCommitter">
 
      <!-- Mandatory: -->
      <documentEndpoint>...</documentEndpoint>
 
      <!-- Mandatory if not configured elsewhere: -->
      <accessKey>...</accessKey>
      <secretKey>...</secretKey>
 
      <!-- Optional settings: -->
      <fixBadIds>[false|true]</fixBadIds>
 
      <!-- Proxy (since 1.4.0) -->
      <proxyHost>...</proxyHost>
      <proxyPort>...</proxyPort>
      <proxyUsername>...</proxyUsername>
      <proxyPassword>...</proxyPassword>
      <!-- Use the following if password is encrypted. -->
      <proxyPasswordKey>...</proxyPasswordKey>
      <proxyPasswordKeySource>...</proxyPasswordKeySource>
 
      <sourceReferenceField keep="[false|true]">...</sourceReferenceField>
      <sourceContentField keep="[false|true]">...</sourceContentField>
      <targetContentField>...</targetContentField>
      <commitBatchSize>...</commitBatchSize>
      <queueDir>...</queueDir>
      <queueSize>...</queueSize>
      <maxRetries>...</maxRetries>
      <maxRetryWait>...</maxRetryWait>
  </committer>
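As an illustration, a minimal configuration needs little more than the document endpoint. The endpoint below is a hypothetical hostname shaped like the document endpoints the AWS console lists for a search domain, and the key values are placeholders. Substitute your own values, or omit accessKey and secretKey to have them taken from the environment.

```xml
<committer class="com.norconex.committer.cloudsearch.CloudSearchCommitter">
    <!-- Hypothetical endpoint: use the document endpoint of your own search domain. -->
    <documentEndpoint>doc-example-domain-abc123.us-east-1.cloudsearch.amazonaws.com</documentEndpoint>
    <!-- Placeholder credentials: leave both tags out to take them from the environment. -->
    <accessKey>YOUR_ACCESS_KEY</accessKey>
    <secretKey>YOUR_SECRET_KEY</secretKey>
</committer>
```

All other tags keep their default values.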

Tag descriptions:

Tag                     Description
documentEndpoint        CloudSearch document endpoint (where documents are sent for indexing).
accessKey               CloudSearch access key. Optional: taken from the environment when blank.
secretKey               CloudSearch secret key. Optional: taken from the environment when blank.
fixBadIds               Whether to fix document IDs that do not meet CloudSearch ID restrictions.
proxyHost               Optional proxy host.
proxyPort               Optional proxy port.
proxyUsername           Optional proxy username.
proxyPassword           Optional proxy password.
proxyPasswordKey        Optional key for the proxy password, if the password is encrypted. Refer to the API Documentation for more details.
proxyPasswordKeySource  Optional source of the password encryption key: one of key, file, environment, or property. Refer to the API Documentation for more details.
sourceReferenceField    Name of the source field mapped to the CloudSearch target ID field. Default is the document reference, which the Committer stores as committer.reference. Once re-mapped, the source metadata field is deleted unless keep is set to true.
targetReferenceField    Name of the target ID field. Default is: id.
sourceContentField      Name of the source field holding a document's content/body. Default is not a field, but the document body content itself. Once re-mapped, the source metadata field is deleted unless keep is set to true.
targetContentField      CloudSearch target field name for a document's content/body. Default is: content.
queueDir                Directory where documents are queued before being sent to CloudSearch. Default is: ./committer-queue
queueSize               Number of documents or deletions to queue before sending them to CloudSearch. Default is: 1000.
commitBatchSize         Maximum number of documents to send to CloudSearch at once. Default is: 100.
maxRetries              Maximum number of retries upon commit failure. Default is: 0 (no retry).
maxRetryWait            Delay between retries. Default is: 0 (no delay).
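The sketch below combines several of the optional tags described above. The endpoint, the field names my_id_field and my_body_field, and all numeric values are hypothetical. The maxRetryWait value assumes milliseconds, a unit the table does not state. The sketch also assumes the target field body is already defined in the CloudSearch domain's indexing options, since CloudSearch generally rejects documents containing fields the domain does not define.

```xml
<committer class="com.norconex.committer.cloudsearch.CloudSearchCommitter">
    <!-- Hypothetical endpoint: use the document endpoint of your own search domain. -->
    <documentEndpoint>doc-example-domain-abc123.us-east-1.cloudsearch.amazonaws.com</documentEndpoint>

    <!-- Fix document IDs that do not meet CloudSearch ID restrictions. -->
    <fixBadIds>true</fixBadIds>

    <!-- Hypothetical metadata field whose value becomes the CloudSearch id.
         With keep="false", the source field is deleted after mapping. -->
    <sourceReferenceField keep="false">my_id_field</sourceReferenceField>

    <!-- Hypothetical metadata field sent as content instead of the document body,
         stored in the CloudSearch field "body" rather than the default "content". -->
    <sourceContentField keep="false">my_body_field</sourceContentField>
    <targetContentField>body</targetContentField>

    <!-- Queue up to 500 documents or deletions, then send them 50 at a time. -->
    <queueSize>500</queueSize>
    <commitBatchSize>50</commitBatchSize>

    <!-- Retry a failed commit up to 3 times (5000 assumes milliseconds). -->
    <maxRetries>3</maxRetries>
    <maxRetryWait>5000</maxRetryWait>
</committer>
```

Smaller batches mean more requests to CloudSearch but less data to resend when a single batch fails.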