public class TranslatorSplitter extends AbstractDocumentSplitter
Translates documents using one of the supported translation APIs. The supported APIs, along with the authentication properties or settings each one requires, are shown in the XML configuration usage below.
For example, the Microsoft Translation API requires a client ID and a client secret, both obtained on Microsoft Azure Marketplace with your Microsoft account.
Translated documents will have the original document language stored in a field "document.translatedFrom".
This class is not a document "splitter" per se, but like regular splitters, it creates one child document for each translation performed. The parent document always remains the original document, while the children are always the translations.
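The parent/child behavior described above can be sketched without the library. This is a minimal illustration, not the actual TranslatorSplitter implementation: `SimpleDoc`, `translate`, and the lambda-based translator are hypothetical stand-ins, and only the "document.translatedFrom" field name comes from this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

class TranslationChildrenSketch {

    // Field name taken from the class description on this page.
    static final String TRANSLATED_FROM = "document.translatedFrom";

    /** Hypothetical stand-in for a document: text content plus metadata. */
    static final class SimpleDoc {
        final String content;
        final Map<String, String> metadata = new LinkedHashMap<>();

        SimpleDoc(String content) {
            this.content = content;
        }
    }

    /**
     * Creates one child per target language and leaves the parent untouched.
     * The translator argument, (text, targetLanguage) -> translatedText, is a
     * placeholder for a call to a real translation API.
     */
    static List<SimpleDoc> translate(SimpleDoc parent, String sourceLanguage,
            List<String> targetLanguages, BinaryOperator<String> translator) {
        List<SimpleDoc> children = new ArrayList<>();
        for (String target : targetLanguages) {
            SimpleDoc child =
                    new SimpleDoc(translator.apply(parent.content, target));
            // Record the language the child was translated from.
            child.metadata.put(TRANSLATED_FROM, sourceLanguage);
            children.add(child);
        }
        return children;
    }

    public static void main(String[] args) {
        SimpleDoc parent = new SimpleDoc("Hello");
        // Fake translator: only tags the text with the target language.
        List<SimpleDoc> children = translate(parent, "en",
                List.of("fr", "es"), (text, lang) -> "[" + lang + "] " + text);
        for (SimpleDoc child : children) {
            System.out.println(child.content + " <- "
                    + child.metadata.get(TRANSLATED_FROM));
        }
    }
}
```

In this sketch the parent is never modified, which mirrors the statement that the original document always remains the parent.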
<handler
class="com.norconex.importer.handler.splitter.impl.TranslatorSplitter"
api="(microsoft|google|lingo24|moses|yandex)">
<!-- multiple "restrictTo" tags allowed (only one needs to match) -->
<restrictTo>
<fieldMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(field-matching expression)
</fieldMatcher>
<valueMatcher
method="[basic|csv|wildcard|regex]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
partial="[false|true]">
(value-matching expression)
</valueMatcher>
</restrictTo>
<ignoreContent>(false|true)</ignoreContent>
<ignoreNonTranslatedFields>(false|true)</ignoreNonTranslatedFields>
<fieldsToTranslate>(comma-separated list of fields)</fieldsToTranslate>
<sourceLanguageField>(field containing language)</sourceLanguageField>
<sourceLanguage>(language when no source language field)</sourceLanguage>
<targetLanguages>(comma-separated list of languages)</targetLanguages>
<!-- Microsoft -->
<clientId>...</clientId>
<clientSecret>...</clientSecret>
<!-- Google -->
<apiKey>...</apiKey>
<!-- Lingo24 -->
<userKey>...</userKey>
<!-- Moses -->
<smtPath>...</smtPath>
<scriptPath>...</scriptPath>
<!-- Yandex -->
<apiKey>...</apiKey>
</handler>
<handler
class="TranslatorSplitter"
api="google">
<sourceLanguageField>langField</sourceLanguageField>
<targetLanguages>fr</targetLanguages>
<apiKey>...MYKEYHERE...</apiKey>
</handler>
The above example uses the Google translation API to translate documents into French, taking the source document language from a field called "langField".
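As a rough illustration of how the two source-language settings could interact, the sketch below reads the language from a metadata field and falls back to a fixed language when that field is missing or blank. This fallback order is an assumption inferred from the usage template ("language when no source language field"), and `resolveSourceLanguage` is a hypothetical helper, not a method of this class.

```java
import java.util.Map;

class SourceLanguageSketch {

    /**
     * Returns the language stored in the given metadata field, or the
     * default language when the field is not configured, absent, or blank.
     * Assumed behavior for illustration only.
     */
    static String resolveSourceLanguage(Map<String, String> metadata,
            String sourceLanguageField, String defaultSourceLanguage) {
        if (sourceLanguageField != null) {
            String value = metadata.get(sourceLanguageField);
            if (value != null && !value.isBlank()) {
                return value.trim();
            }
        }
        return defaultSourceLanguage;
    }

    public static void main(String[] args) {
        // Document that carries its language in "langField".
        System.out.println(resolveSourceLanguage(
                Map.of("langField", "de"), "langField", "en"));
        // Document without the field: the default language is used.
        System.out.println(resolveSourceLanguage(
                Map.of(), "langField", "en"));
    }
}
```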
Modifier and Type | Field and Description |
---|---|
static String | API_GOOGLE |
static String | API_LINGO24 |
static String | API_MICROSOFT |
static String | API_MOSES |
static String | API_YANDEX |
Constructor and Description |
---|
TranslatorSplitter() Constructor. |
splitDocument
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
public static final String API_MICROSOFT
public static final String API_GOOGLE
public static final String API_LINGO24
public static final String API_MOSES
public static final String API_YANDEX
protected List<Doc> splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) throws ImporterHandlerException
splitApplicableDocument
in class AbstractDocumentSplitter
ImporterHandlerException
public boolean isIgnoreContent()
public void setIgnoreContent(boolean ignoreContent)
public void setFieldsToTranslate(String... fieldsToTranslate)
public boolean isIgnoreNonTranslatedFields()
public void setIgnoreNonTranslatedFields(boolean ignoreNonTranslatedFields)
public String getSourceLanguageField()
public void setSourceLanguageField(String sourceLanguageField)
public String getSourceLanguage()
public void setSourceLanguage(String sourceLanguage)
public void setTargetLanguages(String... targetLanguages)
public String getApiKey()
public void setApiKey(String apiKey)
public String getUserKey()
public void setUserKey(String userKey)
public String getSmtPath()
public void setSmtPath(String smtPath)
public String getScriptPath()
public void setScriptPath(String scriptPath)
public String getClientId()
public void setClientId(String clientId)
public String getClientSecret()
public void setClientSecret(String clientSecret)
public String getApi()
public void setApi(String api)
public static void main(String[] args) throws ImporterHandlerException
ImporterHandlerException
protected void loadHandlerFromXML(XML xml)
AbstractImporterHandler
loadHandlerFromXML
in class AbstractImporterHandler
xml - XML configuration
protected void saveHandlerToXML(XML xml)
AbstractImporterHandler
saveHandlerToXML
in class AbstractImporterHandler
xml - the XML
public boolean equals(Object other)
equals
in class AbstractImporterHandler
public int hashCode()
hashCode
in class AbstractImporterHandler
public String toString()
toString
in class AbstractImporterHandler
Copyright © 2009–2023 Norconex Inc. All rights reserved.