How to improve the accuracy of translations

To improve the accuracy of translations:

  • Specify the source language. Some words are written the same in different languages but have different meanings. If the model detects the wrong source language, such words may be translated incorrectly.
  • Specify your translation glossary. One word can be translated in different ways. For example, the English word oil can be translated into Russian as масло or нефть. You can use a glossary to define the correct translation of a word or phrase. Learn more about glossaries.

Before getting started

To try the examples in this section:

  1. On the billing page, make sure that the payment account has the ACTIVE or TRIAL_ACTIVE status. If you don't have a payment account, create one.
  2. Make sure the cURL utility used in the examples is installed.
  3. Get the ID of any folder for which your account has the editor role or higher.
  4. Get an IAM token for your Yandex account.

To perform these operations on behalf of a service account:

  1. Assign the editor role or a higher role to the service account for the folder where it was created.
  2. Do not specify the folder ID in the request: the service uses the folder where the service account was created.
  3. Choose an authentication method: get an IAM token or an API key.
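If you authenticate with an API key instead of an IAM token, the Authorization header uses the Api-Key scheme rather than Bearer. The sketch below picks the header from whichever of the IAM_TOKEN or API_KEY shell variables is set. The API_KEY variable and the auth_header function are names assumed for this example only; they are not part of the Translate API.

```shell
# Hypothetical helper: prints an Authorization header for curl.
# Prefers IAM_TOKEN (Bearer scheme); falls back to API_KEY (Api-Key scheme).
auth_header() {
  if [ -n "${IAM_TOKEN:-}" ]; then
    printf 'Authorization: Bearer %s\n' "$IAM_TOKEN"
  elif [ -n "${API_KEY:-}" ]; then
    printf 'Authorization: Api-Key %s\n' "$API_KEY"
  else
    echo "Set IAM_TOKEN or API_KEY first" >&2
    return 1
  fi
}

# Usage: pass the header to the same curl commands used in this section, e.g.
# curl -X POST \
#     -H "Content-Type: application/json" \
#     -H "$(auth_header)" \
#     -d '@body.json' \
#     "https://translate.api.cloud.yandex.net/translate/v2/translate"
```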

Specify the source language

Words are sometimes written the same in different languages, but are translated differently. For example, the word angel in English means a spiritual being, while in German it means a fishing rod. If the text you pass contains such words, Translate may detect the wrong source language.

To avoid mistakes, specify the source language in the sourceLanguageCode field:

{
    "folderId": "b1gvmob95yysaplct532",
    "texts": ["angel"],
    "targetLanguageCode": "ru",
    "sourceLanguageCode": "de"
}

Save the request body to a file (for example, body.json) and send it to the translate method:

$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${IAM_TOKEN}" \
    -d '@body.json' \
    "https://translate.api.cloud.yandex.net/translate/v2/translate"
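To try other words or language pairs, you can generate body.json from the command line instead of editing it by hand, then run the same curl command again. This is a minimal sketch: make_body is a hypothetical helper, not part of the Translate API or any Yandex Cloud tool, and it inserts the text verbatim, so the text must not contain double quotes or backslashes.

```shell
# Hypothetical helper (not part of any Yandex Cloud tool): writes a translate
# request body with an explicit source language to body.json.
# Arguments: folder ID, source language code, target language code, text.
# The text is inserted verbatim, so it must not contain " or \ characters.
make_body() {
  local folder_id="$1" source_lang="$2" target_lang="$3" text="$4"
  cat > body.json <<EOF
{
    "folderId": "${folder_id}",
    "texts": ["${text}"],
    "targetLanguageCode": "${target_lang}",
    "sourceLanguageCode": "${source_lang}"
}
EOF
}

# Recreate the request from this section: translate "angel" from German into Russian.
make_body "b1gvmob95yysaplct532" de ru angel
```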

The response contains the translation from the specified source language:

{
    "translations": [
        {
            "text": "удочка"
        }
    ]
}
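To use the result in a script, you can print only the translated strings. The sketch below uses grep and sed and assumes the translated strings contain no escaped quotes; extract_texts is a hypothetical helper name. If jq is installed, jq -r '.translations[].text' does the same job more robustly.

```shell
# Hypothetical helper: reads a translate response from stdin and prints the
# value of every "text" field, one per line. Works for both pretty-printed
# and compact JSON, but not for values containing escaped quotes.
extract_texts() {
  grep -o '"text": *"[^"]*"' | sed 's/^"text": *"\(.*\)"$/\1/'
}

# Usage: add -s to suppress curl's progress output, then pipe the response:
# curl -s -X POST ... "https://translate.api.cloud.yandex.net/translate/v2/translate" | extract_texts
```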

Specify your translation glossary

One word can be translated in different ways. For example, the English word oil can be translated into Russian as масло or нефть. To improve the accuracy of translations, use a glossary of your terms and phrases, each with a single translation.

Specify the glossary in the glossaryConfig field. Currently, you can only pass a glossary as an array of text pairs.

Specify the source language in the sourceLanguageCode field. This field is required when you use a glossary:

{
    "sourceLanguageCode": "tr",
    "targetLanguageCode": "ru",
    "texts": [
        "cırtlı çocuk spor ayakkabı"
    ],
    "folderId": "b1gvmob95yysaplct532",
    "glossaryConfig": {
        "glossaryData": {
            "glossaryPairs": [
                {
                    "sourceText": "spor ayakkabı",
                    "translatedText": "кроссовки"
                }
            ]
        }
    }
}

Save the request body to a file (for example, body.json) and send it to the translate method:

$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${IAM_TOKEN}" \
    -d '@body.json' \
    "https://translate.api.cloud.yandex.net/translate/v2/translate"
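A glossary usually holds more than one pair. The sketch below builds the same kind of request body from pairs given as source=translation arguments. The make_glossary_body name and the source=translation argument format are assumptions for this example, not part of the API; as with any hand-built JSON, the texts must not contain double quotes, backslashes, or (in the source part) an equals sign.

```shell
# Hypothetical helper: writes a translate request body with a glossary to body.json.
# Arguments: folder ID, source language, target language, text,
# then any number of "source=translation" glossary pairs.
make_glossary_body() {
  local folder_id="$1" source_lang="$2" target_lang="$3" text="$4"
  shift 4
  local pairs="" sep="" pair src dst
  for pair in "$@"; do
    src="${pair%%=*}"   # part before the first "="
    dst="${pair#*=}"    # part after the first "="
    pairs="${pairs}${sep}{\"sourceText\": \"${src}\", \"translatedText\": \"${dst}\"}"
    sep=", "
  done
  cat > body.json <<EOF
{
    "sourceLanguageCode": "${source_lang}",
    "targetLanguageCode": "${target_lang}",
    "texts": ["${text}"],
    "folderId": "${folder_id}",
    "glossaryConfig": {
        "glossaryData": {
            "glossaryPairs": [${pairs}]
        }
    }
}
EOF
}

# Recreate the request from this section.
make_glossary_body "b1gvmob95yysaplct532" tr ru "cırtlı çocuk spor ayakkabı" "spor ayakkabı=кроссовки"
```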

The response contains a translation that uses the terms from your glossary:

{
    "translations": [
        {
            "text": "Детские кроссовки с липучкой"
        }
    ]
}

Without the glossary, the translation would be:

{
    "translations": [
        {
            "text": "детская спортивная обувь с липучкой"
        }
    ]
}