Class TranslatorSplitter
- java.lang.Object
  - com.norconex.importer.handler.AbstractImporterHandler
    - com.norconex.importer.handler.splitter.AbstractDocumentSplitter
      - com.norconex.importer.handler.splitter.impl.TranslatorSplitter
- All Implemented Interfaces:
  IXMLConfigurable, IImporterHandler, IDocumentSplitter
public class TranslatorSplitter extends AbstractDocumentSplitter
Translates documents using one of the supported translation APIs. The following lists the supported APIs, along with the authentication properties or settings each one requires:
- microsoft
  - clientId
  - clientSecret
- google
  - apiKey
- lingo24
  - userKey
- moses
  - smtPath
  - scriptPath
- yandex
  - apiKey
For example, the Microsoft Translation API requires a client ID and a client secret, both obtained on Microsoft Azure Marketplace with your Microsoft account.
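The per-API requirements listed above can be checked before a translation is attempted. The following is a minimal sketch only: `TranslatorAuthCheck` and its `missing` method are hypothetical names that are not part of Norconex Importer, and the property table is simply transcribed from the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; NOT part of Norconex Importer.
final class TranslatorAuthCheck {

    // Required properties per API, transcribed from the documented list.
    static final Map<String, List<String>> REQUIRED = Map.of(
            "microsoft", List.of("clientId", "clientSecret"),
            "google", List.of("apiKey"),
            "lingo24", List.of("userKey"),
            "moses", List.of("smtPath", "scriptPath"),
            "yandex", List.of("apiKey"));

    private TranslatorAuthCheck() {
    }

    // Returns the required property names that are absent or blank
    // in the supplied settings, in the documented order.
    static List<String> missing(String api, Map<String, String> settings) {
        if (api == null || !REQUIRED.containsKey(api)) {
            throw new IllegalArgumentException("Unsupported API: " + api);
        }
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED.get(api)) {
            String value = settings.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Calling `TranslatorAuthCheck.missing("microsoft", Map.of("clientId", "abc"))` reports `clientSecret` as the one property still to be supplied.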
Translated documents will have the original document language stored in a field "document.translatedFrom".
This class is not a document "splitter" per se, but like regular splitters, it creates one child document for each translation performed. The parent document always remains the original document, while the children are always the translations.
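The parent/child relationship described above can be pictured with a small, self-contained model. This is a sketch under stated assumptions, not the class's actual implementation: `SimpleDoc` and `Translator` are hypothetical stand-ins for the importer's document type and for a remote translation API, and only the "document.translatedFrom" field name comes from this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; these types are NOT Norconex Importer classes.
final class TranslationChildrenSketch {

    // Hypothetical minimal document: text content plus metadata fields.
    static final class SimpleDoc {
        final String content;
        final Map<String, String> fields = new HashMap<>();

        SimpleDoc(String content) {
            this.content = content;
        }
    }

    // Hypothetical stand-in for a call to a translation API.
    interface Translator {
        String translate(String text, String from, String to);
    }

    private TranslationChildrenSketch() {
    }

    // Creates one child per target language. The parent is never modified,
    // and each child records the original language in its metadata.
    static List<SimpleDoc> translate(SimpleDoc parent, String sourceLang,
            List<String> targetLangs, Translator translator) {
        List<SimpleDoc> children = new ArrayList<>();
        for (String target : targetLangs) {
            SimpleDoc child = new SimpleDoc(
                    translator.translate(parent.content, sourceLang, target));
            child.fields.put("document.translatedFrom", sourceLang);
            children.add(child);
        }
        return children;
    }
}
```

With two target languages, the sketch returns two children and leaves the parent's content untouched, which mirrors the behavior described in the paragraph above.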
XML configuration usage:
<handler class="com.norconex.importer.handler.splitter.impl.TranslatorSplitter"
    api="(microsoft|google|lingo24|moses|yandex)">

  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>

  <ignoreContent>(false|true)</ignoreContent>
  <ignoreNonTranslatedFields>(false|true)</ignoreNonTranslatedFields>
  <fieldsToTranslate>(comma-separated list of fields)</fieldsToTranslate>
  <sourceLanguageField>(field containing language)</sourceLanguageField>
  <sourceLanguage>(language when no source language field)</sourceLanguage>
  <targetLanguages>(comma-separated list of languages)</targetLanguages>

  <!-- Microsoft -->
  <clientId>...</clientId>
  <clientSecret>...</clientSecret>

  <!-- Google -->
  <apiKey>...</apiKey>

  <!-- Lingo24 -->
  <userKey>...</userKey>

  <!-- Moses -->
  <smtPath>...</smtPath>
  <scriptPath>...</scriptPath>

  <!-- Yandex -->
  <apiKey>...</apiKey>

</handler>
XML usage example:
<handler class="TranslatorSplitter" api="google">
  <sourceLanguageField>langField</sourceLanguageField>
  <targetLanguages>fr</targetLanguages>
  <apiKey>...MYKEYHERE...</apiKey>
</handler>
The above example uses the Google translation API to translate documents into French, taking the source document language from a field called "langField".
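The configuration offers two ways to state the source language: `sourceLanguageField` names a document field, and `sourceLanguage` is described as the language to use "when no source language field" applies. A minimal sketch of that resolution follows. The exact fallback rule (a missing or blank field value falls back to `sourceLanguage`) is an assumption inferred from the configuration comment, and `SourceLanguageSketch` is a hypothetical class, not part of Norconex Importer.

```java
import java.util.Map;

// Hypothetical illustration; NOT part of Norconex Importer.
final class SourceLanguageSketch {

    private SourceLanguageSketch() {
    }

    // Returns the language found in the named metadata field when it is
    // present and non-blank; otherwise returns the configured default.
    // Assumption: this fallback order matches the configuration comment.
    static String resolve(Map<String, String> metadata,
            String sourceLanguageField, String sourceLanguage) {
        if (sourceLanguageField != null) {
            String fromField = metadata.get(sourceLanguageField);
            if (fromField != null && !fromField.trim().isEmpty()) {
                return fromField.trim();
            }
        }
        return sourceLanguage;
    }
}
```

For the example above, a document whose "langField" holds "en" resolves to "en"; a document without that field resolves to whatever `sourceLanguage` was configured, or to `null` if none was.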
- Since: 2.1.0
- Author: Pascal Essiembre
Field Summary
Fields
Modifier and Type    Field
static String        API_GOOGLE
static String        API_LINGO24
static String        API_MICROSOFT
static String        API_MOSES
static String        API_YANDEX
-
Constructor Summary
Constructors
Constructor             Description
TranslatorSplitter()    Constructor.
-
Method Summary
-
Methods inherited from class com.norconex.importer.handler.splitter.AbstractDocumentSplitter
splitDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
-
-
-
Field Detail
-
API_MICROSOFT
public static final String API_MICROSOFT
- See Also:
- Constant Field Values
-
API_GOOGLE
public static final String API_GOOGLE
- See Also:
- Constant Field Values
-
API_LINGO24
public static final String API_LINGO24
- See Also:
- Constant Field Values
-
API_MOSES
public static final String API_MOSES
- See Also:
- Constant Field Values
-
API_YANDEX
public static final String API_YANDEX
- See Also:
- Constant Field Values
-
-
Method Detail
-
splitApplicableDocument
protected List<Doc> splitApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) throws ImporterHandlerException
- Specified by: splitApplicableDocument in class AbstractDocumentSplitter
- Throws: ImporterHandlerException
-
isIgnoreContent
public boolean isIgnoreContent()
-
setIgnoreContent
public void setIgnoreContent(boolean ignoreContent)
-
setFieldsToTranslate
public void setFieldsToTranslate(String... fieldsToTranslate)
-
isIgnoreNonTranslatedFields
public boolean isIgnoreNonTranslatedFields()
-
setIgnoreNonTranslatedFields
public void setIgnoreNonTranslatedFields(boolean ignoreNonTranslatedFields)
-
getSourceLanguageField
public String getSourceLanguageField()
-
setSourceLanguageField
public void setSourceLanguageField(String sourceLanguageField)
-
getSourceLanguage
public String getSourceLanguage()
-
setSourceLanguage
public void setSourceLanguage(String sourceLanguage)
-
setTargetLanguages
public void setTargetLanguages(String... targetLanguages)
-
getApiKey
public String getApiKey()
-
setApiKey
public void setApiKey(String apiKey)
-
getUserKey
public String getUserKey()
-
setUserKey
public void setUserKey(String userKey)
-
getSmtPath
public String getSmtPath()
-
setSmtPath
public void setSmtPath(String smtPath)
-
getScriptPath
public String getScriptPath()
-
setScriptPath
public void setScriptPath(String scriptPath)
-
getClientId
public String getClientId()
-
setClientId
public void setClientId(String clientId)
-
getClientSecret
public String getClientSecret()
-
setClientSecret
public void setClientSecret(String clientSecret)
-
getApi
public String getApi()
-
setApi
public void setApi(String api)
-
main
public static void main(String[] args) throws ImporterHandlerException
- Throws:
ImporterHandlerException
-
loadHandlerFromXML
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by: loadHandlerFromXML in class AbstractImporterHandler
- Parameters: xml - XML configuration
-
saveHandlerToXML
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by: saveHandlerToXML in class AbstractImporterHandler
- Parameters: xml - the XML
-
equals
public boolean equals(Object other)
- Overrides: equals in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides: hashCode in class AbstractImporterHandler
-
toString
public String toString()
- Overrides: toString in class AbstractImporterHandler
-
-