Detect the language of a text

To detect the language of a text, use the detectLanguage method.

Examples

To try the examples in this section:

  1. On the billing page, make sure that the payment account has the ACTIVE or TRIAL_ACTIVE status. If you don't have a payment account, create one.
  2. Make sure you have installed the cURL utility that is used in the examples.
  3. Get the ID of any folder for which your account has the editor role or higher.
  4. Get an IAM token for your Yandex account.

To perform these operations on behalf of a service account:

  1. Assign the editor role or a higher role to the service account for the folder where it was created.
  2. Do not specify the folder ID in the request: the service uses the folder where the service account was created.
  3. Choose the authentication method: get an IAM token or API key.
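
As a rough sketch of how a request made as a service account might differ from the examples below, the following snippet omits folder_id from the request body and picks the Authorization header by authentication method. The helper names auth_header and detect_as_service_account are hypothetical. The Api-Key header scheme is an assumption based on the usual Yandex Cloud convention for API keys, so check it against the authentication documentation before relying on it.

```shell
# Hypothetical helper: print the Authorization header for the chosen method.
# $1 is "iam" or "api-key", $2 is the IAM token or the API key.
auth_header() {
    case "$1" in
        iam)     printf 'Authorization: Bearer %s\n' "$2" ;;
        api-key) printf 'Authorization: Api-Key %s\n' "$2" ;;   # assumed header scheme
        *)       echo "unknown authentication method: $1" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: detect the language of the text in $3 as a service account.
# The body carries no folder_id. The text is assumed to contain no quotes or backslashes.
detect_as_service_account() {
    header=$(auth_header "$1" "$2") || return 1
    curl -X POST \
        -H "Content-Type: application/json" \
        -H "$header" \
        -d "{\"text\": \"$3\"}" \
        "https://translate.api.cloud.yandex.net/translate/v2/detect"
}
```

For example, detect_as_service_account api-key "$API_KEY" "Hello, world" would send the request with an API key, where API_KEY is a placeholder variable holding the key.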

Hello, world

In this example, we will detect the language that the phrase Hello, world is written in.

  1. To detect the language of the text, pass it in the detectLanguage request body:

    $ export FOLDER_ID=b1gvmob95yysaplct532
    $ export IAM_TOKEN=CggaATEVAgA...
    $ export TEXT="Hello, world"
    $ curl -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -d "{\"folder_id\": \"${FOLDER_ID}\", \"text\": \"${TEXT}\"}" \
        "https://translate.api.cloud.yandex.net/translate/v2/detect"
    

    The service responds with the language code of the source text:

    {
        "languageCode": "en"
    }
    
  2. To find out what language corresponds to the received code, use the listLanguages method:

    $ curl -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -d "{\"folder_id\": \"${FOLDER_ID}\"}" \
        "https://translate.api.cloud.yandex.net/translate/v2/languages"
    

    The response contains a list of languages, with each name written in its own language:

    {
        "languages": [
            {
                "code": "az",
                "name": "azərbaycan"
            },
            {
                "code": "sq",
                "name": "shqip"
            },
            {
                "code": "am",
                "name": "አማርኛ"
            },
            {
                "code": "en",
                "name": "English"
            },
            ...
        ]
    }
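
The two steps above can be strung together in a script. The following is a minimal sketch, not part of the service tooling. The helpers json_escape, extract_language_code, and detect_language_code are hypothetical. The escaping covers only backslashes and double quotes (not newlines or other control characters), and the sed-based extraction is a stand-in for a real JSON parser.

```shell
# Hypothetical helper: escape backslashes and double quotes for use inside a JSON string.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: read a detectLanguage response on stdin and print the languageCode value.
extract_language_code() {
    sed -n 's/.*"languageCode"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Step 1 as a function: send the text in $1 and print the detected language code.
# Relies on FOLDER_ID and IAM_TOKEN being exported as in the example above.
detect_language_code() {
    curl -s -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -d "{\"folder_id\": \"${FOLDER_ID}\", \"text\": \"$(json_escape "$1")\"}" \
        "https://translate.api.cloud.yandex.net/translate/v2/detect" |
        extract_language_code
}
```

Calling detect_language_code "Hello, world" would then print only the code returned by the service, which can be matched against the code fields in the listLanguages response.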
    

Specify the most likely languages

Some words are spelled the same in different languages. For example, the English word hand is also written as hand in German, Swedish, and Dutch. If the text you send contains such words, Translate may detect the source language incorrectly.

To avoid mistakes, you can use the languageCodeHints field to specify which languages should be given priority when determining the language of the text:

{
    "folder_id": "b1gvmob95yysaplct532",
    "languageCodeHints":["ru", "de"],
    "text": "hand"
}

Save the request body to a file (for example, body.json) and pass it in a detectLanguage request:

$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${IAM_TOKEN}" \
    -d @body.json \
    "https://translate.api.cloud.yandex.net/translate/v2/detect"

The service responds with the language code of the source text:

{
    "languageCode": "de"
}
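
If the list of hints changes between requests, the body file can be generated instead of edited by hand. This is a sketch only: build_hints_body is a hypothetical helper, and it assumes that the folder ID, the text, and the language codes contain no quotes or backslashes.

```shell
# Hypothetical helper: print a detectLanguage request body with languageCodeHints.
# $1 is the folder ID, $2 is the text, the remaining arguments are language codes.
build_hints_body() {
    folder_id=$1
    text=$2
    shift 2
    hints=""
    for code in "$@"; do
        # Join the quoted codes with ", "
        hints="${hints:+$hints, }\"$code\""
    done
    printf '{"folder_id": "%s", "languageCodeHints": [%s], "text": "%s"}\n' \
        "$folder_id" "$hints" "$text"
}
```

Redirecting the output, as in build_hints_body b1gvmob95yysaplct532 hand ru de > body.json, would produce a file equivalent to the request body shown above.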