public class AzureSearchCommitter extends AbstractMappedCommitter
Commits documents to Microsoft Azure Search.
By default, the document reference (Azure Search Document Key) is encoded using URL-safe Base64 encoding. This is the approach Azure Search recommends when a document unique id can contain special characters (e.g., a URL). If you know your document references to be safe (e.g., a sequence number), you can set setDisableReferenceEncoding(boolean) to true.
To otherwise store a reference value un-encoded, you can additionally
store it in a field other than your reference ("id") field.
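As an illustration of what URL-safe Base64 encoding does to a reference, here is a minimal sketch using the JDK's java.util.Base64. The class and method names (ReferenceEncodingSketch, encode, decode) are hypothetical, and the committer's own encoding (padding, charset) may differ from what is assumed here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the committer API.
final class ReferenceEncodingSketch {

    // Encodes a reference with the URL-safe Base64 alphabet
    // ('-' and '_' replace '+' and '/'). UTF-8 and padding are assumptions.
    static String encode(String reference) {
        return Base64.getUrlEncoder()
                .encodeToString(reference.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encode(), recovering the original reference.
    static String decode(String encoded) {
        return new String(
                Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String reference = "https://example.com/page?id=1";
        String key = encode(reference);
        System.out.println(key);          // contains no ':', '/', '?' or '+'
        System.out.println(decode(key));  // prints the original URL
    }
}
```

The encoded key contains only letters, digits, '-', '_' and '=' padding, which is why a URL can safely be used as a document key once encoded.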
Fields with a single value are sent as such, while multi-value fields are sent as arrays. If a field is defined as an array in Azure Search, sending a single value may cause an error.
Since 1.2.0, it is possible to force values to always be sent as arrays for specific fields, using setArrayFields(String). It expects a comma-separated list of field names or a regular expression, depending on the value you set for setArrayFieldsRegex(boolean).
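The single-value versus array behavior can be sketched as follows. This is an illustration under assumptions, not the committer's actual code: ArrayFieldsSketch, forceArray, jsonValue and quote are hypothetical names, and the JSON escaping is deliberately minimal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the committer API.
final class ArrayFieldsSketch {

    // True when "field" is listed in arrayFields, which is read either as a
    // CSV list of names or as a regular expression, depending on "regex".
    static boolean forceArray(String field, String arrayFields, boolean regex) {
        if (arrayFields == null || arrayFields.trim().isEmpty()) {
            return false;
        }
        if (regex) {
            return Pattern.matches(arrayFields, field);
        }
        for (String name : arrayFields.split(",")) {
            if (name.trim().equals(field)) {
                return true;
            }
        }
        return false;
    }

    // A lone value is written as a JSON string unless an array is forced;
    // any other number of values is always written as a JSON array.
    static String jsonValue(List<String> values, boolean forceArray) {
        if (!forceArray && values.size() == 1) {
            return quote(values.get(0));
        }
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append(quote(values.get(i)));
        }
        return json.append(']').toString();
    }

    // Minimal escaping (backslash and double quote only), for illustration.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        boolean force = forceArray("tags", "tags, keywords", false);
        System.out.println(jsonValue(Arrays.asList("news"), force));
    }
}
```

With a CSV list of "tags, keywords", a single "tags" value is written as a one-element array, while a field not in the list keeps its plain string form.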
Azure Search will produce an error if any of the documents in a submitted
batch contains one or more fields with invalid characters. To prevent
sending those in vain, the committer will validate your fields
and throw an exception upon encountering an invalid one.
To prevent exceptions from being thrown, you can set setIgnoreValidationErrors(boolean) to true to log those errors instead.
An exception will also be thrown for errors returned by Azure Search (e.g., a field is not defined in your Azure Search schema). To also log those errors instead of throwing an exception, you can set setIgnoreResponseErrors(boolean) to true.
Field names are validated against the naming rules mandated by Azure Search (those in force for Azure Search version 2016-09-01).
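The throw-or-log behavior of field validation can be sketched like this. The naming rule used below is an assumption for illustration only (a leading letter, then letters, digits or underscores, at most 128 characters, not starting with "azureSearch"); check the Azure Search documentation for the authoritative rules. FieldValidationSketch and its methods are hypothetical names, and a plain list stands in for a logger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the committer API.
final class FieldValidationSketch {

    // ASSUMPTION: illustrative rule, not copied from the committer.
    private static final Pattern VALID_NAME =
            Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,127}");

    static boolean isValid(String name) {
        return name != null
                && VALID_NAME.matcher(name).matches()
                && !name.startsWith("azureSearch");
    }

    // Returns the field names that pass validation. An invalid name either
    // raises an exception or is recorded in "log" and skipped, depending on
    // ignoreValidationErrors.
    static List<String> validate(
            List<String> fields, boolean ignoreValidationErrors,
            List<String> log) {
        List<String> accepted = new ArrayList<>();
        for (String field : fields) {
            if (isValid(field)) {
                accepted.add(field);
            } else if (ignoreValidationErrors) {
                log.add("Invalid field name: " + field);
            } else {
                throw new IllegalArgumentException(
                        "Invalid field name: " + field);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(validate(Arrays.asList("title", "bad-name"), true, log));
        System.out.println(log);
    }
}
```

Validating before sending avoids a round trip to Azure Search for a batch that would be rejected anyway.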
The proxyPassword can take a password that has been encrypted using EncryptionUtil (or command-line encrypt.[bat|sh]). In order for the password to be decrypted properly by the crawler, you need to specify the encryption key used to encrypt it. The key can be stored in a few supported locations, and a combination of proxyPasswordKey and proxyPasswordKeySource must be specified to properly locate the key. The supported sources are:
proxyPasswordKeySource | proxyPasswordKey |
---|---|
key | The actual encryption key. |
file | Path to a file containing the encryption key. |
environment | Name of an environment variable containing the key. |
property | Name of a JVM system property containing the key. |
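How the four key sources map to a lookup can be sketched as below. This is not the crawler's actual resolution code: KeySourceSketch and resolveKey are hypothetical names, and trimming the file content and reading it as UTF-8 are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the committer API.
final class KeySourceSketch {

    // Resolves the encryption key from the proxyPasswordKey value, read
    // according to the proxyPasswordKeySource value.
    static String resolveKey(String keyValue, String keySource) {
        switch (keySource) {
            case "key":
                return keyValue;                      // the key itself
            case "file":
                try {                                 // path to a key file
                    byte[] bytes = Files.readAllBytes(Paths.get(keyValue));
                    return new String(bytes, StandardCharsets.UTF_8).trim();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            case "environment":
                return System.getenv(keyValue);       // env variable name
            case "property":
                return System.getProperty(keyValue);  // JVM property name
            default:
                throw new IllegalArgumentException(
                        "Unsupported key source: " + keySource);
        }
    }

    public static void main(String[] args) {
        System.setProperty("sketch.proxy.key", "my-secret-key");
        System.out.println(resolveKey("sketch.proxy.key", "property"));
    }
}
```

Only the "key" source puts the key itself in the configuration; the other three keep it out of the XML file.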
<committer class="com.norconex.committer.azuresearch.AzureSearchCommitter">
  <endpoint>(Azure Search endpoint)</endpoint>
  <apiVersion>(Optional Azure Search API version to use)</apiVersion>
  <apiKey>(Azure Search API admin key)</apiKey>
  <indexName>(Name of the index to use)</indexName>
  <disableReferenceEncoding>[false|true]</disableReferenceEncoding>
  <ignoreValidationErrors>[false|true]</ignoreValidationErrors>
  <ignoreResponseErrors>[false|true]</ignoreResponseErrors>
  <useWindowsAuth>[false|true]</useWindowsAuth>
  <arrayFields regex="[false|true]">
    (Optional fields to be forcefully sent as array, even if single
    value. Unless "regex" is true, expects a CSV list of field names.)
  </arrayFields>
  <proxyHost>...</proxyHost>
  <proxyPort>...</proxyPort>
  <proxyRealm>...</proxyRealm>
  <proxyScheme>...</proxyScheme>
  <proxyUsername>...</proxyUsername>
  <proxyPassword>...</proxyPassword>
  <!-- Use the following if password is encrypted. -->
  <proxyPasswordKey>(the encryption key or a reference to it)</proxyPasswordKey>
  <proxyPasswordKeySource>[key|file|environment|property]</proxyPasswordKeySource>
  <sourceReferenceField keep="[false|true]">
    (Optional name of field that contains the document reference, when
    the default document reference is not used. The reference value
    will be mapped to the Azure Search ID field. Once re-mapped, this
    metadata source field is deleted, unless "keep" is set to true.)
  </sourceReferenceField>
  <targetReferenceField>
    (Name of the Azure Search target field where to store a document
    unique identifier (sourceReferenceField). If not specified,
    default is "id".)
  </targetReferenceField>
  <sourceContentField keep="[false|true]">
    (If you wish to use a metadata field to act as the document
    "content", you can specify that field here. Default does not take
    a metadata field but rather the document content. Once re-mapped,
    the metadata source field is deleted, unless "keep" is set to true.)
  </sourceContentField>
  <targetContentField>
    (Target repository field name for a document content/body.
    Default is "content".)
  </targetContentField>
  <commitBatchSize>
    (Max number of documents to send to Azure Search at once.
    Maximum is 1000.)
  </commitBatchSize>
  <queueDir>(optional path where to queue files)</queueDir>
  <queueSize>(max queue size before committing)</queueSize>
  <maxRetries>(max retries upon commit failures)</maxRetries>
  <maxRetryWait>(max delay in milliseconds between retries)</maxRetryWait>
</committer>
XML configuration entries expecting millisecond durations can be provided in human-readable format (English only), as per DurationParser (e.g., "5 minutes and 30 seconds" or "5m30s").
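To give a feel for how such human-readable values translate to milliseconds, here is a small stand-in parser. It is not DurationParser and supports only a handful of assumed units; DurationSketch and toMillis are hypothetical names, and the parsing is deliberately lenient (words without a leading number, such as "and", are ignored).

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the actual DurationParser.
final class DurationSketch {

    // A number followed by a unit word or abbreviation, e.g. "5m", "30 seconds".
    private static final Pattern PART = Pattern.compile("(\\d+)\\s*([a-z]+)");

    static long toMillis(String text) {
        String s = text.trim().toLowerCase(Locale.ENGLISH);
        if (s.matches("\\d+")) {
            return Long.parseLong(s);  // a bare number is already milliseconds
        }
        Matcher m = PART.matcher(s);
        long total = 0;
        boolean found = false;
        while (m.find()) {
            total += Long.parseLong(m.group(1)) * unitMillis(m.group(2));
            found = true;
        }
        if (!found) {
            throw new IllegalArgumentException("Not a duration: " + text);
        }
        return total;
    }

    private static long unitMillis(String unit) {
        switch (unit) {
            case "ms": case "millisecond": case "milliseconds":
                return 1L;
            case "s": case "second": case "seconds":
                return 1_000L;
            case "m": case "minute": case "minutes":
                return 60_000L;
            case "h": case "hour": case "hours":
                return 3_600_000L;
            default:
                throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(toMillis("5 minutes and 30 seconds"));
        System.out.println(toMillis("5m30s"));
    }
}
```

Both sample strings from the text resolve to the same value in this sketch, 330000 milliseconds, which is what a setting such as maxRetryWait would then receive.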
The following example uses the minimum required settings:
<committer class="com.norconex.committer.azuresearch.AzureSearchCommitter">
  <endpoint>https://example.search.windows.net</endpoint>
  <apiKey>1234567890ABCDEF1234567890ABCDEF</apiKey>
  <indexName>sample-index</indexName>
</committer>
Modifier and Type | Field and Description |
---|---|
static String | DEFAULT_API_VERSION - Default Azure Search API version |
static String | DEFAULT_AZURE_CONTENT_FIELD - Default Azure Search content field |
static String | DEFAULT_AZURE_ID_FIELD - Default Azure Search document key field |
Fields inherited from superclasses:
DEFAULT_COMMIT_BATCH_SIZE
DEFAULT_QUEUE_DIR, filesCommitting
DEFAULT_QUEUE_SIZE, queueSize
Constructor and Description |
---|
AzureSearchCommitter() - Constructor. |
Modifier and Type | Method and Description |
---|---|
protected void | append(StringBuilder json, String field, List<String> values) |
protected void | buildHttpClient(org.apache.http.impl.client.HttpClientBuilder builder) |
protected void | close() |
void | commit() |
protected void | commitBatch(List<ICommitOperation> batch) |
boolean | equals(Object other) |
String | getApiKey() - Gets the Azure API admin key. |
String | getApiVersion() - Gets the Azure API version. |
String | getArrayFields() - Gets the fields whose values should always be treated as arrays. |
String | getEndpoint() - Gets the Azure Search endpoint (https://[service name].search.windows.net). |
String | getIndexName() - Gets the index name. |
ProxySettings | getProxySettings() - Gets the proxy settings. |
int | hashCode() |
boolean | isArrayFieldsRegex() - Gets whether the list of fields to always treat as arrays is expressed as a regular expression. |
boolean | isDisableReferenceEncoding() - Whether to disable document reference encoding. |
boolean | isIgnoreResponseErrors() - Whether to ignore response errors. |
boolean | isIgnoreValidationErrors() - Whether to ignore validation errors. |
boolean | isUseWindowsAuth() - Whether to use integrated Windows Authentication (if applicable). |
protected void | loadFromXml(XMLConfiguration xml) |
protected void | saveToXML(XMLStreamWriter writer) |
void | setApiKey(String apiKey) - Sets the Azure API admin key. |
void | setApiVersion(String apiVersion) - Sets the Azure API version. |
void | setArrayFields(String arrayFields) - Sets the fields whose values should always be treated as arrays. |
void | setArrayFieldsRegex(boolean arrayFieldsRegex) - Sets whether the list of fields to always treat as arrays is expressed as a regular expression. |
void | setDisableReferenceEncoding(boolean disableReferenceEncoding) - Sets whether to disable document reference encoding. |
void | setEndpoint(String endpoint) - Sets the Azure Search endpoint (https://[service name].search.windows.net). |
void | setIgnoreResponseErrors(boolean ignoreResponseErrors) - Sets whether to ignore response errors. |
void | setIgnoreValidationErrors(boolean ignoreValidationErrors) - Sets whether to ignore validation errors. |
void | setIndexName(String indexName) - Sets the index name. |
void | setUseWindowsAuth(boolean useWindowsAuth) - Sets whether to use integrated Windows Authentication (if applicable). |
String | toString() |
Methods inherited from superclasses:
getSourceContentField, getSourceReferenceField, getTargetContentField, getTargetReferenceField, isKeepSourceContentField, isKeepSourceReferenceField, loadFromXML, prepareCommitAddition, saveToXML, setKeepSourceContentField, setKeepSourceReferenceField, setSourceContentField, setSourceReferenceField, setTargetContentField, setTargetReferenceField
commitAddition, commitComplete, commitDeletion, getCommitBatchSize, getMaxRetries, getMaxRetryWait, setCommitBatchSize, setMaxRetries, setMaxRetryWait
getInitialQueueDocCount, getQueueDir, prepareCommitDeletion, queueAddition, queueRemoval, setQueueDir
add, getQueueSize, remove, setQueueSize
public static final String DEFAULT_API_VERSION
public static final String DEFAULT_AZURE_ID_FIELD
public static final String DEFAULT_AZURE_CONTENT_FIELD
public String getIndexName()
public void setIndexName(String indexName)
Parameters: indexName - the index name
public String getEndpoint()
public void setEndpoint(String endpoint)
Parameters: endpoint - Azure Search endpoint
public String getApiVersion()
Defaults to DEFAULT_API_VERSION.
public void setApiVersion(String apiVersion)
Parameters: apiVersion - Azure API version
public String getApiKey()
public void setApiKey(String apiKey)
Parameters: apiKey - Azure API admin key
public boolean isDisableReferenceEncoding()
When true, document references will be sent as is if they pass validation.
Returns: true if disabling reference encoding
public void setDisableReferenceEncoding(boolean disableReferenceEncoding)
When false, references are encoded using a URL-safe Base64 encoding. When true, document references will be sent as is if they pass validation.
Parameters: disableReferenceEncoding - true if disabling reference encoding
public boolean isIgnoreValidationErrors()
When true, the validation errors are logged instead and the faulty field or document is not committed.
Returns: true when ignoring validation errors
public void setIgnoreValidationErrors(boolean ignoreValidationErrors)
When false, an exception is thrown if a document contains a field that Azure Search will reject. When true, the validation errors are logged instead and the faulty field or document is not committed.
Parameters: ignoreValidationErrors - true when ignoring validation errors
public boolean isIgnoreResponseErrors()
When true, the errors are logged instead.
Returns: true when ignoring response errors
public void setIgnoreResponseErrors(boolean ignoreResponseErrors)
When false, an exception is thrown if the Azure Search response contains an error. When true, the errors are logged instead.
Parameters: ignoreResponseErrors - true when ignoring response errors
public ProxySettings getProxySettings()
null.
public boolean isUseWindowsAuth()
Returns: true if using Windows Authentication
public void setUseWindowsAuth(boolean useWindowsAuth)
Parameters: useWindowsAuth - true if using Windows Authentication
public String getArrayFields()
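The value is a comma-separated list of field names or a regular expression, depending on isArrayFieldsRegex().
See Also: isArrayFieldsRegex()
public void setArrayFields(String arrayFields)
The value is a comma-separated list of field names or a regular expression, depending on isArrayFieldsRegex().
Parameters: arrayFields - list of fields or regular expression matching fields
See Also: setArrayFieldsRegex(boolean)
public boolean isArrayFieldsRegex()
Returns: true if regular expression
See Also: getArrayFields()
public void setArrayFieldsRegex(boolean arrayFieldsRegex)
Parameters: arrayFieldsRegex - true if regular expression
See Also: setArrayFields(String)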
public void commit()
commit in interface ICommitter
commit in class AbstractFileQueueCommitter
protected void close()
protected void commitBatch(List<ICommitOperation> batch)
commitBatch in class AbstractBatchCommitter
protected void append(StringBuilder json, String field, List<String> values)
protected void buildHttpClient(org.apache.http.impl.client.HttpClientBuilder builder)
protected void saveToXML(XMLStreamWriter writer) throws XMLStreamException
saveToXML in class AbstractMappedCommitter
Throws: XMLStreamException
protected void loadFromXml(XMLConfiguration xml)
loadFromXml in class AbstractMappedCommitter
public boolean equals(Object other)
equals in class AbstractMappedCommitter
public int hashCode()
hashCode in class AbstractMappedCommitter
public String toString()
toString in class AbstractMappedCommitter
Copyright © 2017–2021 Norconex Inc. All rights reserved.