CloudSearchCommitter (Norconex Committer CloudSearch 2.0.0 API)

java.lang.Object
- com.norconex.committer.core3.AbstractCommitter
  - com.norconex.committer.core3.batch.AbstractBatchCommitter
    - com.norconex.committer.cloudsearch.CloudSearchCommitter

All Implemented Interfaces: IBatchConsumer, ICommitter, IXMLConfigurable, AutoCloseable

public class CloudSearchCommitter
extends AbstractBatchCommitter

Commits documents to Amazon CloudSearch.

Authentication:

An access key and a secret key are required to connect to and interact with CloudSearch. For enhanced security, it is best to use one of the methods described in DefaultAWSCredentialsProviderChain for setting them (environment variables, system properties, profile file, etc.). Do not explicitly set "accessKey" and "secretKey" on this class if you want to rely on safer methods.

CloudSearch ID limitations:

As of this writing, CloudSearch limits its "id" field to 128 characters. In addition, certain characters are not allowed. By default, submitting a document with an invalid ID results in an error. You can get around this by setting setFixBadIds(boolean) to true. It truncates references that are too long and appends a hash code to keep them unique. It also converts invalid characters to underscores. This approach is not 100% collision-free, but it should safely cover the vast majority of cases.
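The ID-fixing behavior described above can be illustrated with a minimal, self-contained sketch. It is not the committer's actual code: the class name `IdFixSketch`, the set of allowed characters, and the fixed-width hexadecimal suffix are all assumptions made for illustration. Only the 128-character limit, the truncation plus hash, and the underscore replacement come from the text.

```java
import java.util.regex.Pattern;

final class IdFixSketch {
    /** CloudSearch "id" length limit stated in this documentation. */
    static final int MAX_ID_LENGTH = 128;

    /** Assumption: anything outside this set counts as an invalid ID character. */
    static final Pattern INVALID_ID_CHARS =
            Pattern.compile("[^A-Za-z0-9_\\-=#;:/?@&]");

    /** Assumption: the appended suffix is "_" plus 8 hexadecimal digits. */
    static final int SUFFIX_LENGTH = 9;

    static String fixBadId(String reference) {
        // Replace each disallowed character with an underscore.
        String id = INVALID_ID_CHARS.matcher(reference).replaceAll("_");
        if (id.length() <= MAX_ID_LENGTH) {
            return id;
        }
        // Keep the head, then append a hash of the part being cut off so that
        // two long references sharing the same head still get different IDs
        // in most cases (hash collisions remain possible).
        int keep = MAX_ID_LENGTH - SUFFIX_LENGTH;
        String truncatedPart = id.substring(keep);
        return id.substring(0, keep)
                + "_" + String.format("%08x", truncatedPart.hashCode());
    }

    public static void main(String[] args) {
        // The space is not in the allowed set, so it becomes an underscore.
        System.out.println(fixBadId("http://example.com/some page.html"));
    }
}
```

Reserving a fixed-width suffix keeps the result at exactly 128 characters for any over-long input, which is why the hash is zero-padded here.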

Password encryption:

Passwords can be encrypted using EncryptionUtil (or command-line "encrypt.bat" or "encrypt.sh" if those are available to you). In order for the password to be decrypted properly, you need to specify the encryption key used to encrypt it. The key can be obtained from a few supported locations. The combination of the password key "value" and "source" is used to properly locate the key. The supported sources are:

`key`	The actual encryption key.
`file`	Path to a file containing the encryption key.
`environment`	Name of an environment variable containing the key.
`property`	Name of a JVM system property containing the key.
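
To make the "value" plus "source" combination concrete, here is a hedged sketch of how a key could be located from the four sources listed above. The class `KeySourceSketch`, its `Source` enum, and the trimming of file content are hypothetical; the library's own EncryptionUtil and key classes are not used or reproduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class KeySourceSketch {
    /** Mirrors the four documented sources: key, file, environment, property. */
    enum Source { KEY, FILE, ENVIRONMENT, PROPERTY }

    /** Returns the encryption key that "value" refers to, or null if not found. */
    static String resolveKey(String value, Source source) {
        switch (source) {
            case KEY:
                // The value is the key itself.
                return value;
            case FILE:
                // The value is a path to a file holding the key.
                // Assumption: surrounding whitespace in the file is ignored.
                try {
                    byte[] bytes = Files.readAllBytes(Paths.get(value));
                    return new String(bytes, StandardCharsets.UTF_8).trim();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            case ENVIRONMENT:
                // The value is the name of an environment variable.
                return System.getenv(value);
            case PROPERTY:
                // The value is the name of a JVM system property.
                return System.getProperty(value);
            default:
                throw new IllegalArgumentException("Unsupported source: " + source);
        }
    }

    public static void main(String[] args) {
        System.setProperty("sketch.enc.key", "my-secret-key");
        System.out.println(resolveKey("sketch.enc.key", Source.PROPERTY));
    }
}
```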

Restricting committer to specific documents

Optionally apply a committer only to certain types of documents. Documents are restricted based on their metadata field names and values. This option can be used to perform document routing when you have multiple committers defined.
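
As a rough illustration of the routing idea, the sketch below accepts a document when at least one restriction matches one of its metadata fields and values (the XML template notes that only one "restrictTo" needs to match). The class `RestrictToSketch`, the regex-only matching, and the rule that an empty restriction list accepts everything are assumptions, not the committer's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class RestrictToSketch {
    /** One restriction: a field-name pattern paired with a value pattern. */
    static final class Restriction {
        final Pattern field;
        final Pattern value;

        Restriction(String fieldRegex, String valueRegex) {
            this.field = Pattern.compile(fieldRegex);
            this.value = Pattern.compile(valueRegex);
        }

        /** True if any matching field holds at least one matching value. */
        boolean matches(Map<String, List<String>> metadata) {
            for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
                if (!field.matcher(entry.getKey()).matches()) {
                    continue;
                }
                for (String v : entry.getValue()) {
                    if (value.matcher(v).matches()) {
                        return true;
                    }
                }
            }
            return false;
        }
    }

    /** Assumption: no restrictions means every document is accepted. */
    static boolean accept(List<Restriction> restrictions,
                          Map<String, List<String>> metadata) {
        if (restrictions.isEmpty()) {
            return true;
        }
        for (Restriction r : restrictions) {
            if (r.matches(metadata)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = Collections.singletonMap(
                "document.contentType", Arrays.asList("application/pdf"));
        List<Restriction> pdfOnly = Arrays.asList(
                new Restriction("document\\.contentType", "application/pdf"));
        System.out.println(accept(pdfOnly, metadata));
    }
}
```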

Field mappings

By default, this abstract class applies field mappings for metadata fields, but leaves the document reference and content (input stream) for concrete implementations to handle. In other words, field mappings only apply to a committer request's metadata. They are applied to committer requests before upserts and deletes are actually performed.
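
The effect of a field mapping on request metadata can be sketched as a simple rename of map keys. This is an illustration only: the class `FieldMappingSketch` is hypothetical, and the choice to merge values when the target field already exists is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldMappingSketch {
    /**
     * Returns a copy of the metadata in which every field listed in
     * "mappings" (fromField to toField) is stored under its target name.
     * Unmapped fields keep their original name.
     */
    static Map<String, List<String>> applyFieldMappings(
            Map<String, List<String>> metadata, Map<String, String> mappings) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String target = mappings.getOrDefault(entry.getKey(), entry.getKey());
            // Assumption: values are merged if the target field already exists.
            result.computeIfAbsent(target, k -> new ArrayList<>())
                    .addAll(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", Arrays.asList("Annual report"));
        metadata.put("lang", Arrays.asList("en"));
        // Equivalent in spirit to <mapping fromField="title" toField="doc_title"/>.
        System.out.println(applyFieldMappings(
                metadata, Collections.singletonMap("title", "doc_title")));
    }
}
```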

XML configuration usage:


<committer
    class="com.norconex.committer.cloudsearch.CloudSearchCommitter">
  <!-- Mandatory: -->
  <serviceEndpoint>(CloudSearch service endpoint)</serviceEndpoint>
  <!-- Mandatory if not configured elsewhere: -->
  <accessKey>
    (Optional CloudSearch access key. Will be taken from environment
     when blank.)
  </accessKey>
  <secretKey>
    (Optional CloudSearch secret key. Will be taken from environment
     when blank.)
  </secretKey>
  <!-- Optional settings: -->
  <fixBadIds>
    [false|true](Forces references to fit into a CloudSearch id field.)
  </fixBadIds>
  <signingRegion>(CloudSearch signing region)</signingRegion>
  <proxySettings>
    <host>
      <name>(host name)</name>
      <port>(host port)</port>
    </host>
    <scheme>(Default is "http")</scheme>
    <realm>(Authentication realm. Default is any.)</realm>
    <credentials>
      <username>(the username)</username>
      <password>(the optionally encrypted password)</password>
      <passwordKey>
        <value>
          (The actual password encryption key or a reference to it.)
        </value>
        <source>[key|file|environment|property]</source>
        <size>(Size in bits of encryption key. Default is 128.)</size>
      </passwordKey>
    </credentials>
  </proxySettings>
  <sourceIdField>
    (Optional document field name containing the value that will be stored
    in CloudSearch "id" field. Default is the document reference.)
  </sourceIdField>
  <targetContentField>
    (Optional CloudSearch field name to store the document
    content/body. Default is "content".)
  </targetContentField>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <fieldMappings>
    <!-- Add as many field mappings as needed -->
    <mapping
        fromField="(source field name)"
        toField="(target field name)"/>
  </fieldMappings>
  <!-- Settings for default queue implementation ("class" is optional): -->
  <queue
      class="com.norconex.committer.core3.batch.queue.impl.FSQueue">
    <batchSize>
      (Optional number of documents queued after which we process a batch.
       Default is 20.)
    </batchSize>
    <maxPerFolder>
      (Optional maximum number of files or directories that can be queued
       in a single folder before a new one gets created. Default is 500.)
    </maxPerFolder>
    <commitLeftoversOnInit>
      (Optionally forces committing any leftover documents from a previous
       execution, e.g., one that ended prematurely. Default is "false".)
    </commitLeftoversOnInit>
    <onCommitFailure>
      <splitBatch>[OFF|HALF|ONE]</splitBatch>
      <maxRetries>(Max retries upon commit failures. Default is 0.)</maxRetries>
      <retryDelay>
        (Delay in milliseconds between retries. Default is 0.)
      </retryDelay>
      <ignoreErrors>
        [false|true]
        (When true, non-critical exceptions raised while interacting with
         the target repository are logged instead of thrown, so that
         execution can continue with the other files to be committed.
         In both cases the failing batch/files are moved to an
         "error" folder. Other types of exceptions may still be thrown.)
      </ignoreErrors>
    </onCommitFailure>
  </queue>
</committer>
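
The "maxRetries" and "retryDelay" settings in the template above suggest a retry loop along the following lines. This is a sketch under assumptions, not the queue's real logic: the class `RetrySketch` is hypothetical, batch splitting ("splitBatch") and "ignoreErrors" handling are left out, and "maxRetries" is read as the number of extra attempts after the first failure.

```java
final class RetrySketch {
    /**
     * Runs the commit action, retrying up to maxRetries times after a failure
     * and waiting retryDelayMillis between attempts.
     *
     * @return the number of attempts that were needed
     */
    static int commitWithRetries(
            Runnable commit, int maxRetries, long retryDelayMillis) {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                commit.run();
                return attempt;
            } catch (RuntimeException e) {
                // Out of retries: let the failure propagate to the caller.
                if (attempt > maxRetries) {
                    throw e;
                }
                try {
                    Thread.sleep(retryDelayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting", ie);
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical commit action that fails on its first call only.
        Runnable flakyCommit = () -> {
            if (++calls[0] < 2) {
                throw new IllegalStateException("temporary failure");
            }
        };
        System.out.println("attempts: " + commitWithRetries(flakyCommit, 3, 10));
    }
}
```

With the documented defaults (0 retries, 0 delay), this loop reduces to a single attempt whose failure is reported immediately.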

XML configuration entries expecting millisecond durations can be provided in human-readable format (English only), as per DurationParser (e.g., "5 minutes and 30 seconds" or "5m30s").
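
To show what such human-readable values amount to, here is a small stand-alone sketch that converts strings like the two examples above into milliseconds. It does not call DurationParser and covers far fewer formats; the class `DurationSketch` and its list of accepted unit names are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DurationSketch {
    /** A number followed by a unit, e.g. "5m" or "30 seconds". */
    private static final Pattern PART = Pattern.compile("(\\d+)\\s*([A-Za-z]+)");

    static long parseToMillis(String text) {
        String trimmed = text.trim();
        // A plain number is taken to be milliseconds already.
        if (trimmed.matches("\\d+")) {
            return Long.parseLong(trimmed);
        }
        long total = 0;
        boolean found = false;
        Matcher m = PART.matcher(trimmed);
        // Filler words such as "and" have no leading number and are skipped.
        while (m.find()) {
            found = true;
            total += Long.parseLong(m.group(1)) * unitMillis(m.group(2));
        }
        if (!found) {
            throw new IllegalArgumentException("Not a duration: " + text);
        }
        return total;
    }

    /** Assumption: only these English unit names are recognized. */
    private static long unitMillis(String unit) {
        switch (unit.toLowerCase(Locale.ROOT)) {
            case "ms": case "millisecond": case "milliseconds": return 1L;
            case "s": case "second": case "seconds": return 1_000L;
            case "m": case "minute": case "minutes": return 60_000L;
            case "h": case "hour": case "hours": return 3_600_000L;
            case "d": case "day": case "days": return 86_400_000L;
            default:
                throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseToMillis("5m30s"));                    // 330000
        System.out.println(parseToMillis("5 minutes and 30 seconds")); // 330000
    }
}
```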

XML usage example:


<committer
    class="com.norconex.committer.cloudsearch.CloudSearchCommitter">
  <serviceEndpoint>
    search-example-xyz.some-region.cloudsearch.amazonaws.com
  </serviceEndpoint>
</committer>

The above example uses the minimum required settings (relying on environment variables for AWS keys).

Author: Pascal Essiembre

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`COULDSEARCH_ID_FIELD` CloudSearch mandatory ID field
`static String`	`DEFAULT_COULDSEARCH_CONTENT_FIELD` Default CloudSearch content field
`static Pattern`	`FIELD_PATTERN` CloudSearch mandatory field pattern.

Constructor Summary

Constructors
Constructor and Description
`CloudSearchCommitter()`
`CloudSearchCommitter(String serviceEndpoint)`
`CloudSearchCommitter(String serviceEndpoint, String signingRegion)`

Method Summary

Modifier and Type	Method and Description
`protected void`	`closeBatchCommitter()`
`protected void`	`commitBatch(Iterator<ICommitterRequest> it)`
`boolean`	`equals(Object other)`
`String`	`getAccessKey()` Gets the CloudSearch access key.
`ProxySettings`	`getProxySettings()`
`String`	`getSecretKey()` Gets the CloudSearch secret key.
`String`	`getServiceEndpoint()` Gets AWS service endpoint.
`String`	`getSigningRegion()` Gets the AWS signing region.
`String`	`getSourceIdField()` Gets the document field name containing the value to be stored in CloudSearch "id" field.
`String`	`getTargetContentField()` Gets the name of the CloudSearch field where content will be stored.
`int`	`hashCode()`
`protected void`	`initBatchCommitter()`
`boolean`	`isFixBadIds()` Gets whether to fix IDs that are too long for CloudSearch ID limitation (128 characters max).
`protected void`	`loadBatchCommitterFromXML(XML xml)`
`protected void`	`saveBatchCommitterToXML(XML xml)`
`void`	`setAccessKey(String accessKey)` Sets the CloudSearch access key.
`void`	`setFixBadIds(boolean fixBadIds)` Sets whether to fix IDs that are too long for CloudSearch ID limitation (128 characters max).
`void`	`setProxySettings(ProxySettings proxy)`
`void`	`setSecretKey(String secretKey)` Sets the CloudSearch secret key.
`void`	`setServiceEndpoint(String serviceEndpoint)` Sets AWS service endpoint.
`void`	`setSigningRegion(String signingRegion)` Sets the AWS signing region.
`void`	`setSourceIdField(String sourceIdField)` Sets the document field name containing the value to be stored in CloudSearch "id" field.
`void`	`setTargetContentField(String targetContentField)` Sets the name of the CloudSearch field where content will be stored.
`String`	`toString()`

Methods inherited from class com.norconex.committer.core3.batch.AbstractBatchCommitter
consume, doClean, doClose, doDelete, doInit, doUpsert, getCommitterQueue, loadCommitterFromXML, saveCommitterToXML, setCommitterQueue

Methods inherited from class com.norconex.committer.core3.AbstractCommitter
accept, addRestriction, addRestrictions, applyFieldMappings, clean, clearFieldMappings, clearRestrictions, close, delete, fireDebug, fireDebug, fireError, fireError, fireInfo, fireInfo, getCommitterContext, getFieldMappings, getRestrictions, init, loadFromXML, removeFieldMapping, removeRestriction, removeRestriction, saveToXML, setFieldMapping, setFieldMappings, upsert

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface com.norconex.commons.lang.xml.IXMLConfigurable
loadFromXML, saveToXML

- Field Detail
  - FIELD_PATTERN
```
public static final Pattern FIELD_PATTERN
```
    CloudSearch mandatory field pattern. Characters not matching the pattern will be replaced by an underscore.
  - COULDSEARCH_ID_FIELD
```
public static final String COULDSEARCH_ID_FIELD
```
    CloudSearch mandatory ID field
    
    See Also:
    
    Constant Field Values
  - DEFAULT_COULDSEARCH_CONTENT_FIELD
```
public static final String DEFAULT_COULDSEARCH_CONTENT_FIELD
```
    Default CloudSearch content field
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - CloudSearchCommitter
```
public CloudSearchCommitter()
```
  - CloudSearchCommitter
```
public CloudSearchCommitter(String serviceEndpoint)
```
  - CloudSearchCommitter
```
public CloudSearchCommitter(String serviceEndpoint,
                            String signingRegion)
```
- Method Detail
  - getServiceEndpoint
```
public String getServiceEndpoint()
```
    Gets AWS service endpoint.
    
    Returns:
    
    AWS service endpoint
  - setServiceEndpoint
```
public void setServiceEndpoint(String serviceEndpoint)
```
    Sets AWS service endpoint.
    
    Parameters:
    
    serviceEndpoint - AWS service endpoint
  - getSigningRegion
```
public String getSigningRegion()
```
    Gets the AWS signing region.
    
    Returns:
    
    the AWS signing region
  - setSigningRegion
```
public void setSigningRegion(String signingRegion)
```
    Sets the AWS signing region.
    
    Parameters:
    
    signingRegion - the AWS signing region
  - getAccessKey
```
public String getAccessKey()
```
    Gets the CloudSearch access key. If null, the access key will be obtained from the environment, as detailed in DefaultAWSCredentialsProviderChain.
    
    Returns:
    
    the access key
  - setAccessKey
```
public void setAccessKey(String accessKey)
```
    Sets the CloudSearch access key. If null, the access key will be obtained from the environment, as detailed in DefaultAWSCredentialsProviderChain.
    
    Parameters:
    
    accessKey - the access key
  - getSecretKey
```
public String getSecretKey()
```
    Gets the CloudSearch secret key. If null, the secret key will be obtained from the environment, as detailed in DefaultAWSCredentialsProviderChain.
    
    Returns:
    
    the secret key
  - setSecretKey
```
public void setSecretKey(String secretKey)
```
    Sets the CloudSearch secret key. If null, the secret key will be obtained from the environment, as detailed in DefaultAWSCredentialsProviderChain.
    
    Parameters:
    
    secretKey - the secret key
  - getTargetContentField
```
public String getTargetContentField()
```
    Gets the name of the CloudSearch field where content will be stored. Default is "content".
    
    Returns:
    
    field name
  - setTargetContentField
```
public void setTargetContentField(String targetContentField)
```
    Sets the name of the CloudSearch field where content will be stored. Specifying a null value will disable storing the content.
    
    Parameters:
    
    targetContentField - field name
  - getSourceIdField
```
public String getSourceIdField()
```
    Gets the document field name containing the value to be stored in CloudSearch "id" field. Default is not a field, but rather the document reference.
    
    Returns:
    
    name of field containing id value
  - setSourceIdField
```
public void setSourceIdField(String sourceIdField)
```
    Sets the document field name containing the value to be stored in CloudSearch "id" field. Set null to use the document reference instead of a field (default).
    
    Parameters:
    
    sourceIdField - name of field containing id value, or null
  - isFixBadIds
```
public boolean isFixBadIds()
```
    Gets whether to fix IDs that are too long for CloudSearch ID limitation (128 characters max). If true, long IDs will be truncated and a hash code representing the truncated part will be appended.
    
    Returns:
    
    true to fix IDs that are too long
  - setFixBadIds
```
public void setFixBadIds(boolean fixBadIds)
```
    Sets whether to fix IDs that are too long for CloudSearch ID limitation (128 characters max). If true, long IDs will be truncated and a hash code representing the truncated part will be appended.
    
    Parameters:
    
    fixBadIds - true to fix IDs that are too long
  - getProxySettings
```
public ProxySettings getProxySettings()
```
  - setProxySettings
```
public void setProxySettings(ProxySettings proxy)
```
  - initBatchCommitter
```
protected void initBatchCommitter()
                           throws CommitterException
```
    Overrides:
    
    initBatchCommitter in class AbstractBatchCommitter
    
    Throws:
    
    CommitterException
  - commitBatch
```
protected void commitBatch(Iterator<ICommitterRequest> it)
                    throws CommitterException
```
    Specified by:
    
    commitBatch in class AbstractBatchCommitter
    
    Throws:
    
    CommitterException
  - closeBatchCommitter
```
protected void closeBatchCommitter()
                            throws CommitterException
```
    Overrides:
    
    closeBatchCommitter in class AbstractBatchCommitter
    
    Throws:
    
    CommitterException
  - saveBatchCommitterToXML
```
protected void saveBatchCommitterToXML(XML xml)
```
    Specified by:
    
    saveBatchCommitterToXML in class AbstractBatchCommitter
  - loadBatchCommitterFromXML
```
protected void loadBatchCommitterFromXML(XML xml)
```
    Specified by:
    
    loadBatchCommitterFromXML in class AbstractBatchCommitter
  - equals
```
public boolean equals(Object other)
```
    Overrides:
    
    equals in class AbstractBatchCommitter
  - hashCode
```
public int hashCode()
```
    Overrides:
    
    hashCode in class AbstractBatchCommitter
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class AbstractBatchCommitter
