Class TranslatorSplitter

  • All Implemented Interfaces:
    IXMLConfigurable, IImporterHandler, IDocumentSplitter

    public class TranslatorSplitter
    extends AbstractDocumentSplitter

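    Translates documents using one of the supported translation APIs. The supported APIs, along with the authentication properties or settings each requires, are:

      • Microsoft: clientId, clientSecret
      • Google: apiKey
      • Lingo24: userKey
      • Moses: smtPath, scriptPath
      • Yandex: apiKey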

    For example, the Microsoft Translation API requires a client ID and a client secret, both obtained from the Microsoft Azure Marketplace using your Microsoft account.
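
    As a minimal sketch of the Microsoft case, a configuration might look like the following. The credential values are left as placeholders, and the language codes "en" and "fr" are assumptions about what the API accepts.

    ```xml
    <handler
        class="com.norconex.importer.handler.splitter.impl.TranslatorSplitter"
        api="microsoft">
      <!-- Assumption: source documents are in English. -->
      <sourceLanguage>en</sourceLanguage>
      <targetLanguages>fr</targetLanguages>
      <!-- Placeholders: supply the values issued for your own Microsoft account. -->
      <clientId>...</clientId>
      <clientSecret>...</clientSecret>
    </handler>
    ```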

    Translated documents will have the original document language stored in a field "document.translatedFrom".

    This class is not a document "splitter" per se, but like regular splitters, it creates a child document for each translation performed. The parent document always remains the original document, while the children are always the translations.

    XML configuration usage:

    
    <handler
        class="com.norconex.importer.handler.splitter.impl.TranslatorSplitter"
        api="(microsoft|google|lingo24|moses|yandex)">
      <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
      <restrictTo>
        <fieldMatcher>(field-matching expression)</fieldMatcher>
        <valueMatcher>(value-matching expression)</valueMatcher>
      </restrictTo>
      <ignoreContent>(false|true)</ignoreContent>
      <ignoreNonTranslatedFields>(false|true)</ignoreNonTranslatedFields>
      <fieldsToTranslate>(comma-separated list of fields)</fieldsToTranslate>
      <sourceLanguageField>(field containing language)</sourceLanguageField>
      <sourceLanguage>(language when no source language field)</sourceLanguage>
      <targetLanguages>(comma-separated list of languages)</targetLanguages>
      <!-- Microsoft -->
      <clientId>...</clientId>
      <clientSecret>...</clientSecret>
      <!-- Google -->
      <apiKey>...</apiKey>
      <!-- Lingo24 -->
      <userKey>...</userKey>
      <!-- Moses -->
      <smtPath>...</smtPath>
      <scriptPath>...</scriptPath>
      <!-- Yandex -->
      <apiKey>...</apiKey>
    </handler>

    XML usage example:

    
    <handler
        class="TranslatorSplitter"
        api="google">
      <sourceLanguageField>langField</sourceLanguageField>
      <targetLanguages>fr</targetLanguages>
      <apiKey>...MYKEYHERE...</apiKey>
    </handler>

    The above example uses the Google translation API to translate documents into French, taking the source document language from a field called "langField".
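
    A variation on this example can be sketched under a few assumptions: every source document is treated as English, only two hypothetical fields ("title" and "description") are listed for translation, and translations are requested in both French and Spanish. The option semantics here are inferred from the tag names in the configuration usage, not from confirmed behavior.

    ```xml
    <handler
        class="TranslatorSplitter"
        api="yandex">
      <!-- Hypothetical field names: replace with fields present in your documents. -->
      <fieldsToTranslate>title,description</fieldsToTranslate>
      <!-- Assumption: no language field is available, so a fixed source language is given. -->
      <sourceLanguage>en</sourceLanguage>
      <!-- Two target languages, so two translations per document. -->
      <targetLanguages>fr,es</targetLanguages>
      <apiKey>...</apiKey>
    </handler>
    ```

    Because one child document is created per translation, this configuration would be expected to yield two children for each source document, with the original remaining as the parent.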

    Since:
    2.1.0
    Author:
    Pascal Essiembre