public class CloudSearchCommitter extends AbstractBatchCommitter
Commits documents to Amazon CloudSearch.
An access key and secret key are required to connect to and interact with
CloudSearch. For enhanced security, it is best to set them using one of the
methods described in DefaultAWSCredentialsProviderChain
(environment variables, system properties, profile file, etc.).
Do not explicitly set "accessKey" and "secretKey" on this class if you
want to rely on these safer methods.
As of this writing, CloudSearch limits its "id" field to 128 characters
and disallows certain characters. By default, submitting a document
with an invalid ID results in an error. You can get around this by
setting setFixBadIds(boolean) to true. References that are too long
are then truncated, with a hash code appended to preserve uniqueness,
and invalid characters are converted to underscores. This approach is
not 100% collision-free, but it should safely cover the vast
majority of cases.
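The ID-fixing behavior described above can be sketched roughly as follows. This is an illustration only, not the committer's actual implementation: the class name CloudSearchIdSketch, the set of characters treated as valid, and the hash format are all assumptions made for the example.

```java
import java.util.regex.Pattern;

final class CloudSearchIdSketch {
    // Length limit stated in the documentation above.
    static final int MAX_ID_LENGTH = 128;

    // ASSUMPTION: characters considered valid in an ID. Check the
    // CloudSearch documentation for the authoritative list.
    static final Pattern INVALID_CHARS =
            Pattern.compile("[^A-Za-z0-9_\\-=#;:/?@&]");

    // Suffix is "_" plus 8 hex digits, so 9 characters are reserved.
    static final int SUFFIX_LENGTH = 9;

    static String fixId(String reference) {
        // Replace every invalid character with an underscore.
        String id = INVALID_CHARS.matcher(reference).replaceAll("_");
        if (id.length() <= MAX_ID_LENGTH) {
            return id;
        }
        // Keep a prefix and append a hash of the part that is cut off,
        // so long references sharing a prefix usually stay distinct.
        int keep = MAX_ID_LENGTH - SUFFIX_LENGTH;
        String truncatedPart = id.substring(keep);
        return id.substring(0, keep)
                + "_" + String.format("%08x", truncatedPart.hashCode());
    }

    public static void main(String[] args) {
        System.out.println(fixId("http://example.com/some page.html"));
    }
}
```

Because the hash is only 32 bits, two different references can still end up with the same ID, which is why the approach is described as not fully collision-free.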
Passwords can be encrypted using EncryptionUtil (or the command-line
"encrypt.bat" or "encrypt.sh" scripts, if those are available to you).
For the password to be decrypted properly, you need to specify the
encryption key used to encrypt it. The key can be obtained from a few
supported locations. The combination of the password key
"value" and "source" is used to locate the key.
The supported sources are:
Source | Description |
---|---|
key | The actual encryption key. |
file | Path to a file containing the encryption key. |
environment | Name of an environment variable containing the key. |
property | Name of a JVM system property containing the key. |
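To illustrate how a "value" and "source" pair could be turned into an actual key, here is a minimal sketch. It does not use EncryptionUtil or any Norconex class: EncryptionKeySketch, its Source enum, and resolveKey are hypothetical names, and trimming the file content is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class EncryptionKeySketch {
    // Mirrors the four sources listed above.
    enum Source { KEY, FILE, ENVIRONMENT, PROPERTY }

    // Returns the key that "value" designates, or null when the
    // environment variable or system property is not set.
    static String resolveKey(String value, Source source) {
        switch (source) {
            case KEY:
                return value; // the value is the key itself
            case FILE:
                try {
                    byte[] bytes = Files.readAllBytes(Paths.get(value));
                    // ASSUMPTION: surrounding whitespace is not part of the key.
                    return new String(bytes, StandardCharsets.UTF_8).trim();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            case ENVIRONMENT:
                return System.getenv(value);
            case PROPERTY:
                return System.getProperty(value);
            default:
                throw new IllegalArgumentException("Unknown source: " + source);
        }
    }

    public static void main(String[] args) {
        System.setProperty("proxy.key", "my-secret-key");
        System.out.println(resolveKey("proxy.key", Source.PROPERTY));
    }
}
```

Any source other than "key" keeps the key itself out of the configuration file, which is the usual reason to prefer them.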
Optionally apply a committer only to certain types of documents. Documents are restricted based on their metadata field names and values. This option can be used to perform document routing when you have multiple committers defined.
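The restriction logic can be pictured with the sketch below. It is a simplified model, not the library's code: RestrictionSketch is a hypothetical class, only regular-expression matching is shown, and accepting every document when no restriction is defined is an assumption drawn from the option being optional.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class RestrictionSketch {
    private final Pattern fieldPattern;
    private final Pattern valuePattern;

    RestrictionSketch(String fieldRegex, String valueRegex) {
        this.fieldPattern = Pattern.compile(fieldRegex);
        this.valuePattern = Pattern.compile(valueRegex);
    }

    // True when some field name matches and one of its values matches.
    boolean matches(Map<String, List<String>> metadata) {
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            if (!fieldPattern.matcher(entry.getKey()).matches()) {
                continue;
            }
            for (String value : entry.getValue()) {
                if (valuePattern.matcher(value).matches()) {
                    return true;
                }
            }
        }
        return false;
    }

    // A document is accepted when at least one restriction matches.
    // ASSUMPTION: with no restrictions, every document is accepted.
    static boolean accept(List<RestrictionSketch> restrictions,
            Map<String, List<String>> metadata) {
        if (restrictions.isEmpty()) {
            return true;
        }
        for (RestrictionSketch restriction : restrictions) {
            if (restriction.matches(metadata)) {
                return true;
            }
        }
        return false;
    }
}
```

With two committers restricted to different content types, each document would be routed to whichever committer accepts it.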
By default, this abstract class applies field mappings for metadata fields, but leaves the document reference and content (input stream) for concrete implementations to handle. In other words, field mappings only apply to a committer request's metadata. They are performed on committer requests before upserts and deletes are actually performed.
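As a rough picture of what a field mapping does to request metadata, the sketch below renames metadata fields according to a "fromField" to "toField" table. FieldMappingSketch and applyMappings are hypothetical names, and merging values when two fields map to the same target is an assumption of this example rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldMappingSketch {
    // Returns a copy of the metadata in which every mapped field is
    // renamed to its target. Unmapped fields are kept unchanged.
    static Map<String, List<String>> applyMappings(
            Map<String, List<String>> metadata,
            Map<String, String> mappings) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String target =
                    mappings.getOrDefault(entry.getKey(), entry.getKey());
            // ASSUMPTION: values are merged if the target already exists.
            result.computeIfAbsent(target, k -> new ArrayList<>())
                    .addAll(entry.getValue());
        }
        return result;
    }
}
```

Only the metadata map is touched here, which mirrors the point above that the document reference and content are left to concrete implementations.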
<committer
    class="com.norconex.committer.cloudsearch.CloudSearchCommitter">
  <!-- Mandatory: -->
  <serviceEndpoint>(CloudSearch service endpoint)</serviceEndpoint>
  <!-- Mandatory if not configured elsewhere: -->
  <accessKey>
    (Optional CloudSearch access key. Will be taken from environment
    when blank.)
  </accessKey>
  <secretKey>
    (Optional CloudSearch secret key. Will be taken from environment
    when blank.)
  </secretKey>
  <!-- Optional settings: -->
  <fixBadIds>
    [false|true](Forces references to fit into a CloudSearch id field.)
  </fixBadIds>
  <signingRegion>(CloudSearch signing region)</signingRegion>
  <proxySettings>
    <host>
      <name>(host name)</name>
      <port>(host port)</port>
    </host>
    <scheme>(Default is "http")</scheme>
    <realm>(Authentication realm. Default is any.)</realm>
    <credentials>
      <username>(the username)</username>
      <password>(the optionally encrypted password)</password>
      <passwordKey>
        <value>
          (The actual password encryption key or a reference to it.)
        </value>
        <source>[key|file|environment|property]</source>
        <size>(Size in bits of encryption key. Default is 128.)</size>
      </passwordKey>
    </credentials>
  </proxySettings>
  <sourceIdField>
    (Optional document field name containing the value that will be stored
    in CloudSearch "id" field. Default is the document reference.)
  </sourceIdField>
  <targetContentField>
    (Optional CloudSearch field name to store the document
    content/body. Default is "content".)
  </targetContentField>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (field-matching expression)
    </fieldMatcher>
    <valueMatcher
        method="[basic|csv|wildcard|regex]"
        ignoreCase="[false|true]"
        ignoreDiacritic="[false|true]"
        partial="[false|true]">
      (value-matching expression)
    </valueMatcher>
  </restrictTo>
  <fieldMappings>
    <!-- Add as many field mappings as needed -->
    <mapping
        fromField="(source field name)"
        toField="(target field name)"/>
  </fieldMappings>
  <!-- Settings for default queue implementation ("class" is optional): -->
  <queue
      class="com.norconex.committer.core3.batch.queue.impl.FSQueue">
    <batchSize>
      (Optional number of documents queued after which we process a batch.
      Default is 20.)
    </batchSize>
    <maxPerFolder>
      (Optional maximum number of files or directories that can be queued
      in a single folder before a new one gets created. Default is 500.)
    </maxPerFolder>
    <commitLeftoversOnInit>
      (Optionally force to commit any leftover documents from a previous
      execution. E.g., prematurely ended. Default is "false".)
    </commitLeftoversOnInit>
    <onCommitFailure>
      <splitBatch>[OFF|HALF|ONE]</splitBatch>
      <maxRetries>(Max retries upon commit failures. Default is 0.)</maxRetries>
      <retryDelay>
        (Delay in milliseconds between retries. Default is 0.)
      </retryDelay>
      <ignoreErrors>
        [false|true]
        (When true, non-critical exceptions when interacting with the target
        repository won't be thrown, in an attempt to continue the execution
        with other files to be committed. Instead, errors will be logged.
        In both cases the failing batch/files are moved to an
        "error" folder. Other types of exceptions may still be thrown.)
      </ignoreErrors>
    </onCommitFailure>
  </queue>
</committer>
XML configuration entries expecting millisecond durations
can be provided in human-readable format (English only), as per
DurationParser
(e.g., "5 minutes and 30 seconds" or "5m30s").
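To make the compact notation concrete, the sketch below converts strings such as "5m30s" into milliseconds. It is a hypothetical stand-in written for illustration and is not the DurationParser class: DurationSketch and toMillis are assumed names, and only the "h", "m", and "s" unit letters are handled (no spelled-out forms such as "5 minutes and 30 seconds").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DurationSketch {
    // One amount followed by a single unit letter, e.g. "30s".
    private static final Pattern TOKEN = Pattern.compile("(\\d+)([hms])");

    static long toMillis(String text) {
        String trimmed = text.trim();
        Matcher m = TOKEN.matcher(trimmed);
        long total = 0;
        int end = 0;
        while (m.find()) {
            // Tokens must follow each other with nothing in between.
            if (m.start() != end) {
                throw new IllegalArgumentException("Bad duration: " + text);
            }
            long amount = Long.parseLong(m.group(1));
            switch (m.group(2).charAt(0)) {
                case 'h': total += amount * 3_600_000L; break;
                case 'm': total += amount * 60_000L; break;
                default:  total += amount * 1_000L; break; // 's'
            }
            end = m.end();
        }
        // Reject empty input and trailing characters.
        if (end == 0 || end != trimmed.length()) {
            throw new IllegalArgumentException("Bad duration: " + text);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(toMillis("5m30s")); // prints 330000
    }
}
```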
<committer
    class="com.norconex.committer.cloudsearch.CloudSearchCommitter">
  <serviceEndpoint>
    search-example-xyz.some-region.cloudsearch.amazonaws.com
  </serviceEndpoint>
</committer>
The above example uses the minimum required settings (relying on environment variables for AWS keys).
Modifier and Type | Field and Description |
---|---|
static String | COULDSEARCH_ID_FIELD: CloudSearch mandatory ID field. |
static String | DEFAULT_COULDSEARCH_CONTENT_FIELD: Default CloudSearch content field. |
static Pattern | FIELD_PATTERN: CloudSearch mandatory field pattern. |
Constructor and Description |
---|
CloudSearchCommitter() |
CloudSearchCommitter(String serviceEndpoint) |
CloudSearchCommitter(String serviceEndpoint, String signingRegion) |
Modifier and Type | Method and Description |
---|---|
protected void | closeBatchCommitter() |
protected void | commitBatch(Iterator<ICommitterRequest> it) |
boolean | equals(Object other) |
String | getAccessKey(): Gets the CloudSearch access key. |
ProxySettings | getProxySettings() |
String | getSecretKey(): Gets the CloudSearch secret key. |
String | getServiceEndpoint(): Gets AWS service endpoint. |
String | getSigningRegion(): Gets the AWS signing region. |
String | getSourceIdField(): Gets the document field name containing the value to be stored in CloudSearch "id" field. |
String | getTargetContentField(): Gets the name of the CloudSearch field where content will be stored. |
int | hashCode() |
protected void | initBatchCommitter() |
boolean | isFixBadIds(): Gets whether to fix IDs that are too long for CloudSearch ID limitation (128 characters max). |
protected void | loadBatchCommitterFromXML(XML xml) |
protected void | saveBatchCommitterToXML(XML xml) |
void | setAccessKey(String accessKey): Sets the CloudSearch access key. |
void | setFixBadIds(boolean fixBadIds): Sets whether to fix IDs that are too long for CloudSearch ID limitation (128 characters max). |
void | setProxySettings(ProxySettings proxy) |
void | setSecretKey(String secretKey): Sets the CloudSearch secret key. |
void | setServiceEndpoint(String serviceEndpoint): Sets AWS service endpoint. |
void | setSigningRegion(String signingRegion): Sets the AWS signing region. |
void | setSourceIdField(String sourceIdField): Sets the document field name containing the value to be stored in CloudSearch "id" field. |
void | setTargetContentField(String targetContentField): Sets the name of the CloudSearch field where content will be stored. |
String | toString() |
consume, doClean, doClose, doDelete, doInit, doUpsert, getCommitterQueue, loadCommitterFromXML, saveCommitterToXML, setCommitterQueue
accept, addRestriction, addRestrictions, applyFieldMappings, clean, clearFieldMappings, clearRestrictions, close, delete, fireDebug, fireDebug, fireError, fireError, fireInfo, fireInfo, getCommitterContext, getFieldMappings, getRestrictions, init, loadFromXML, removeFieldMapping, removeRestriction, removeRestriction, saveToXML, setFieldMapping, setFieldMappings, upsert
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
loadFromXML, saveToXML
public static final Pattern FIELD_PATTERN
public static final String COULDSEARCH_ID_FIELD
public static final String DEFAULT_COULDSEARCH_CONTENT_FIELD
public CloudSearchCommitter()
public CloudSearchCommitter(String serviceEndpoint)
public String getServiceEndpoint()
public void setServiceEndpoint(String serviceEndpoint)
serviceEndpoint - AWS service endpoint
public String getSigningRegion()
public void setSigningRegion(String signingRegion)
signingRegion - the AWS signing region
public String getAccessKey()
If null, the access key will be obtained from the environment, as detailed in DefaultAWSCredentialsProviderChain.
public void setAccessKey(String accessKey)
If null, the access key will be obtained from the environment, as detailed in DefaultAWSCredentialsProviderChain.
accessKey - the access key
public String getSecretKey()
If null, the secret key will be obtained from the environment, as detailed in DefaultAWSCredentialsProviderChain.
public void setSecretKey(String secretKey)
If null, the secret key will be obtained from the environment, as detailed in DefaultAWSCredentialsProviderChain.
secretKey - the secret key
public String getTargetContentField()
public void setTargetContentField(String targetContentField)
A null value will disable storing the content.
targetContentField - field name
public String getSourceIdField()
public void setSourceIdField(String sourceIdField)
Set to null to use the document reference instead of a field (default).
sourceIdField - name of field containing id value, or null
public boolean isFixBadIds()
If true, long IDs will be truncated and a hash code representing the truncated part will be appended.
true to fix IDs that are too long
public void setFixBadIds(boolean fixBadIds)
If true, long IDs will be truncated and a hash code representing the truncated part will be appended.
fixBadIds - true to fix IDs that are too long
public ProxySettings getProxySettings()
public void setProxySettings(ProxySettings proxy)
protected void initBatchCommitter() throws CommitterException
initBatchCommitter in class AbstractBatchCommitter
Throws: CommitterException
protected void commitBatch(Iterator<ICommitterRequest> it) throws CommitterException
commitBatch in class AbstractBatchCommitter
Throws: CommitterException
protected void closeBatchCommitter() throws CommitterException
closeBatchCommitter in class AbstractBatchCommitter
Throws: CommitterException
protected void saveBatchCommitterToXML(XML xml)
saveBatchCommitterToXML in class AbstractBatchCommitter
protected void loadBatchCommitterFromXML(XML xml)
loadBatchCommitterFromXML in class AbstractBatchCommitter
public boolean equals(Object other)
equals in class AbstractBatchCommitter
public int hashCode()
hashCode in class AbstractBatchCommitter
public String toString()
toString in class AbstractBatchCommitter
Copyright © 2009–2022 Norconex Inc. All rights reserved.