public class AzureSearchCommitter extends AbstractBatchCommitter
Commits documents to Microsoft Azure Search.
By default, the document reference (the Azure Search document key) is
encoded using URL-safe Base64 encoding. This is the approach recommended
by Azure Search when a document's unique id can contain special characters
(e.g., a URL). If you know your document references are safe
(e.g., a sequence number), you can set
AzureSearchCommitterConfig.setDisableDocKeyEncoding(boolean)
to true.
To otherwise keep an un-encoded copy of the reference value, you can
additionally store it in a field other than your reference ("id") field.
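For illustration, the default URL-safe Base64 encoding of a document reference can be reproduced with the JDK alone. This is a sketch of the documented behavior, not the committer's actual internal code (for instance, whether Base64 padding is stripped is an implementation detail not confirmed here):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DocKeyEncodingDemo {

    // Encodes a document reference into a URL-safe string suitable as an
    // Azure Search document key (letters, digits, '-' and '_' only).
    static String encodeDocKey(String reference) {
        return Base64.getUrlEncoder()
                .withoutPadding()
                .encodeToString(reference.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String ref = "https://example.com/page?id=1";
        String key = encodeDocKey(ref);
        System.out.println(key);
        // Decoding restores the original reference.
        String decoded = new String(
                Base64.getUrlDecoder().decode(key), StandardCharsets.UTF_8);
        System.out.println(decoded.equals(ref));
    }
}
```

The round trip shows why encoding is safe even for URLs: the encoded key never contains characters such as `/` or `?` that Azure Search rejects in document keys.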
Fields with a single value are sent as such, while multi-valued fields
are sent as arrays. If a field is defined as an array in Azure Search,
sending it a single value may cause an error.
You can force values to always be sent as arrays for specific fields
using AzureSearchCommitterConfig.setArrayFields(String).
It expects either a comma-separated list of field names
or a regular expression, depending on the value you set with
AzureSearchCommitterConfig.setArrayFieldsRegex(boolean).
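To illustrate the two interpretations of the configured value, the decision of which fields get forced into arrays might look like the following sketch. The helper `isArrayField` is hypothetical and not part of the committer's API; it only mirrors the documented CSV-versus-regex semantics of setArrayFields/setArrayFieldsRegex:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class ArrayFieldsDemo {

    // Returns true if "field" should always be sent as a JSON array.
    // "arrayFields" is either a CSV list of field names or a regular
    // expression, depending on the "regex" flag.
    static boolean isArrayField(String field, String arrayFields, boolean regex) {
        if (regex) {
            return field.matches(arrayFields);
        }
        Set<String> names = Arrays.stream(arrayFields.split(","))
                .map(String::trim)
                .collect(Collectors.toSet());
        return names.contains(field);
    }

    public static void main(String[] args) {
        System.out.println(isArrayField("tags", "tags, authors", false));  // true
        System.out.println(isArrayField("tag_01", "tag_\\d+", true));      // true
        System.out.println(isArrayField("title", "tags, authors", false)); // false
    }
}
```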
Azure Search will produce an error if any document in a submitted
batch contains one or more fields with invalid characters. To avoid
sending those in vain, the committer validates your fields
and throws an exception upon encountering an invalid one.
To have those errors logged instead of thrown, set
AzureSearchCommitterConfig.setIgnoreValidationErrors(boolean)
to true.
An exception is also thrown for errors returned by Azure Search
(e.g., a field is not defined in your
Azure Search schema). To also log those errors instead of throwing an
exception, set
AzureSearchCommitterConfig.setIgnoreResponseErrors(boolean)
to true.
Field names are validated against the naming rules mandated by Azure Search (in force for Azure Search version 2016-09-01).
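For illustration only, a simplified validator is sketched below. It assumes the commonly documented Azure Search rules (first character a letter; remaining characters letters, digits, or underscores; at most 128 characters) and is not the committer's actual validation logic:

```java
public class FieldNameValidator {

    // Simplified field-name check based on commonly documented Azure Search
    // naming rules (an assumption for illustration, not the exact rule set):
    // starts with a letter, then letters/digits/underscores, max 128 chars.
    static boolean isValidFieldName(String name) {
        return name != null
                && name.length() <= 128
                && name.matches("[A-Za-z][A-Za-z0-9_]*");
    }

    public static void main(String[] args) {
        System.out.println(isValidFieldName("content"));   // true
        System.out.println(isValidFieldName("1stField"));  // false
        System.out.println(isValidFieldName("my-field"));  // false
    }
}
```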
Passwords can be encrypted using EncryptionUtil (or the
command-line "encrypt.bat" or "encrypt.sh" scripts, if available to you).
For the password to be decrypted properly, you need
to specify the encryption key used to encrypt it. The key can be obtained
from a few supported locations. The combination of the password key
"value" and "source" is used to properly locate the key.
The supported sources are:

| Source | Description |
|---|---|
| key | The actual encryption key. |
| file | Path to a file containing the encryption key. |
| environment | Name of an environment variable containing the key. |
| property | Name of a JVM system property containing the key. |
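The lookup described by the table can be sketched as follows. This is an illustration of the documented "value"/"source" resolution, not Norconex's actual implementation:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PasswordKeyResolver {

    // Resolves the encryption key from one of the four supported sources.
    static String resolveKey(String value, String source) {
        switch (source) {
            case "key":
                return value; // "value" is the key itself
            case "file":
                try {
                    return new String(Files.readAllBytes(Paths.get(value)),
                            StandardCharsets.UTF_8).trim();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            case "environment":
                return System.getenv(value);
            case "property":
                return System.getProperty(value);
            default:
                throw new IllegalArgumentException("Unknown source: " + source);
        }
    }

    public static void main(String[] args) {
        System.setProperty("demo.encryption.key", "s3cret");
        System.out.println(resolveKey("s3cret", "key"));                   // s3cret
        System.out.println(resolveKey("demo.encryption.key", "property")); // s3cret
    }
}
```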
Optionally, a committer can be applied only to certain types of documents. Documents are restricted based on their metadata field names and values. This option can be used to perform document routing when you have multiple committers defined.
By default, this abstract class applies field mappings to metadata fields, but leaves the document reference and content (input stream) for concrete implementations to handle. In other words, field mappings only apply to a committer request's metadata. Field mappings are performed on committer requests before upserts and deletes are actually executed.
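A simplified sketch of what a fromField-to-toField mapping pass does to request metadata (the real committer operates on multi-valued metadata, not a flat Map; this is for illustration only):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldMappingDemo {

    // Renames metadata fields according to fromField -> toField mappings.
    // Unmapped fields keep their original names.
    static Map<String, String> applyMappings(
            Map<String, String> metadata, Map<String, String> mappings) {
        Map<String, String> result = new LinkedHashMap<>();
        metadata.forEach((field, value) ->
                result.put(mappings.getOrDefault(field, field), value));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("dc.title", "Sample");
        meta.put("author", "Jane");
        Map<String, String> mappings = Map.of("dc.title", "title");
        System.out.println(applyMappings(meta, mappings));
        // {title=Sample, author=Jane}
    }
}
```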
<committer
class="com.norconex.committer.azuresearch.AzureSearchCommitter">
<endpoint>(Azure Search endpoint)</endpoint>
<apiVersion>(Optional Azure Search API version to use)</apiVersion>
<apiKey>(Azure Search API admin key)</apiKey>
<indexName>(Name of the index to use)</indexName>
<disableDocKeyEncoding>[false|true]</disableDocKeyEncoding>
<ignoreValidationErrors>[false|true]</ignoreValidationErrors>
<ignoreResponseErrors>[false|true]</ignoreResponseErrors>
<useWindowsAuth>[false|true]</useWindowsAuth>
<arrayFields
regex="[false|true]">
(Optional fields to be forcefully sent as array, even if single
value. Unless "regex" is true, expects a CSV list of field names.)
</arrayFields>
<proxySettings>
<host>
<name>(host name)</name>
<port>(host port)</port>
</host>
<scheme>(Default is "http")</scheme>
<realm>(Authentication realm. Default is any.)</realm>
<credentials>
<username>(the username)</username>
<password>(the optionally encrypted password)</password>
<passwordKey>
<value>
(The actual password encryption key or a reference to it.)
</value>
<source>[key|file|environment|property]</source>
<size>(Size in bits of encryption key. Default is 128.)</size>
</passwordKey>
</credentials>
</proxySettings>
<sourceKeyField>
(Optional document field name containing the value that will be stored
in the Azure Search target document key field. Default is the document
reference.)
</sourceKeyField>
<targetKeyField>
    (Optional name of the Azure Search document field in which to store the
     document's unique key identifier (the sourceKeyField value).
     Default is "id".)
</targetKeyField>
<targetContentField>
(Optional Azure Search document field name to store document
content/body. Default is "content".)
</targetContentField>
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(field-matching expression)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(value-matching expression)
</valueMatcher>
</restrictTo>
<fieldMappings>
<!-- Add as many field mappings as needed -->
<mapping
fromField="(source field name)"
toField="(target field name)"/>
</fieldMappings>
<!-- Settings for default queue implementation ("class" is optional): -->
<queue
class="com.norconex.committer.core3.batch.queue.impl.FSQueue">
<batchSize>
(Optional number of documents queued after which we process a batch.
Default is 20.)
</batchSize>
<maxPerFolder>
(Optional maximum number of files or directories that can be queued
in a single folder before a new one gets created. Default is 500.)
</maxPerFolder>
<commitLeftoversOnInit>
    (Optionally force committing any leftover documents from a previous,
     possibly prematurely ended, execution. Default is "false".)
<onCommitFailure>
<splitBatch>[OFF|HALF|ONE]</splitBatch>
<maxRetries>(Max retries upon commit failures. Default is 0.)</maxRetries>
<retryDelay>
(Delay in milliseconds between retries. Default is 0.)
</retryDelay>
<ignoreErrors>
[false|true]
      (When true, non-critical exceptions encountered while interacting
       with the target repository are logged instead of thrown, so execution
       can continue with the remaining files to be committed.
       In both cases the failing batch/files are moved to an
       "error" folder. Other types of exceptions may still be thrown.)
</ignoreErrors>
</onCommitFailure>
</queue>
</committer>
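The "HALF" option of &lt;splitBatch&gt; can be pictured with the following sketch: on failure a batch is split in two and each half is retried recursively until batches succeed or a single failing document is isolated. This illustrates the documented idea only and is not the FSQueue implementation (BatchSender and commitWithHalving are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSplitDemo {

    interface BatchSender {
        void send(List<String> batch) throws Exception;
    }

    // Recursively halves a failing batch; documents that still fail on
    // their own are collected in "failed" (akin to the "error" folder).
    static void commitWithHalving(
            List<String> batch, BatchSender sender, List<String> failed) {
        try {
            sender.send(batch);
        } catch (Exception e) {
            if (batch.size() <= 1) {
                failed.addAll(batch); // cannot split further
                return;
            }
            int mid = batch.size() / 2;
            commitWithHalving(batch.subList(0, mid), sender, failed);
            commitWithHalving(batch.subList(mid, batch.size()), sender, failed);
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        commitWithHalving(Arrays.asList("a", "b", "bad", "c"),
                b -> {
                    if (b.contains("bad")) {
                        throw new Exception("simulated failure");
                    }
                    sent.addAll(b);
                },
                failed);
        System.out.println("sent=" + sent + " failed=" + failed);
    }
}
```

Splitting lets the good documents in a failing batch get committed while narrowing the failure down to the offending ones.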
XML configuration entries expecting millisecond durations
can be provided in human-readable format (English only), as per
DurationParser
(e.g., "5 minutes and 30 seconds" or "5m30s").
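As an illustration of the compact notation, a minimal parser for strings like "5m30s" could look like the toy sketch below. It is not Norconex's DurationParser, which understands much richer English phrases such as "5 minutes and 30 seconds":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationDemo {

    // Unit tokens; "ms" must come before "m" and "s" in the alternation
    // so it is matched as a whole.
    private static final Pattern PART =
            Pattern.compile("(\\d+)\\s*(ms|h|m|s)");

    // Parses compact durations such as "5m30s" into milliseconds.
    static long parseMillis(String text) {
        Matcher m = PART.matcher(text.toLowerCase());
        long millis = 0;
        while (m.find()) {
            long n = Long.parseLong(m.group(1));
            switch (m.group(2)) {
                case "h":  millis += n * 3_600_000; break;
                case "m":  millis += n * 60_000;    break;
                case "s":  millis += n * 1_000;     break;
                case "ms": millis += n;             break;
            }
        }
        return millis;
    }

    public static void main(String[] args) {
        System.out.println(parseMillis("5m30s")); // 330000
    }
}
```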
<committer
class="com.norconex.committer.azuresearch.AzureSearchCommitter">
<endpoint>https://example.search.windows.net</endpoint>
<apiKey>1234567890ABCDEF1234567890ABCDEF</apiKey>
<indexName>sample-index</indexName>
</committer>
The above example uses the minimum required settings.
| Constructor and Description |
|---|
| AzureSearchCommitter() |
| AzureSearchCommitter(AzureSearchCommitterConfig config) |

| Modifier and Type | Method and Description |
|---|---|
| protected void | closeBatchCommitter() |
| protected void | commitBatch(Iterator&lt;ICommitterRequest&gt; it) |
| boolean | equals(Object other) |
| AzureSearchCommitterConfig | getConfig() |
| int | hashCode() |
| protected void | initBatchCommitter() |
| protected void | loadBatchCommitterFromXML(XML xml) |
| protected void | saveBatchCommitterToXML(XML xml) |
| String | toString() |
Methods inherited from class AbstractBatchCommitter: consume, doClean, doClose, doDelete, doInit, doUpsert, getCommitterQueue, loadCommitterFromXML, saveCommitterToXML, setCommitterQueue
Methods inherited from class AbstractCommitter: accept, addRestriction, addRestrictions, applyFieldMappings, clean, clearFieldMappings, clearRestrictions, close, delete, fireDebug, fireDebug, fireError, fireError, fireInfo, fireInfo, getCommitterContext, getFieldMappings, getRestrictions, init, loadFromXML, removeFieldMapping, removeRestriction, removeRestriction, saveToXML, setFieldMapping, setFieldMappings, upsert
Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface IXMLConfigurable: loadFromXML, saveToXML
public AzureSearchCommitter()
public AzureSearchCommitter(AzureSearchCommitterConfig config)
protected void initBatchCommitter() throws CommitterException
Overrides: initBatchCommitter in class AbstractBatchCommitter
Throws: CommitterException

protected void commitBatch(Iterator&lt;ICommitterRequest&gt; it) throws CommitterException
Specified by: commitBatch in class AbstractBatchCommitter
Throws: CommitterException

protected void closeBatchCommitter() throws CommitterException
Overrides: closeBatchCommitter in class AbstractBatchCommitter
Throws: CommitterException

public AzureSearchCommitterConfig getConfig()

protected void loadBatchCommitterFromXML(XML xml)
Overrides: loadBatchCommitterFromXML in class AbstractBatchCommitter

protected void saveBatchCommitterToXML(XML xml)
Overrides: saveBatchCommitterToXML in class AbstractBatchCommitter

public boolean equals(Object other)
Overrides: equals in class AbstractBatchCommitter

public int hashCode()
Overrides: hashCode in class AbstractBatchCommitter

public String toString()
Overrides: toString in class AbstractBatchCommitter
Copyright © 2017–2022 Norconex Inc. All rights reserved.