Getting started with SpeechKit

If you want to see how the service synthesizes or recognizes speech, use the demo on the service page.

In this section, you'll learn how to use the SpeechKit API. First, you'll create an audio file from text, and then you'll recognize the speech in that file.

Before you start

To try the examples in this section:

  1. On the billing page, make sure that the payment account has the ACTIVE or TRIAL_ACTIVE status. If you don't have a payment account, create one.
  2. Make sure you have installed the cURL utility that is used in the examples.
  3. Get the ID of any folder for which your account has the editor role or higher.
  4. Get an IAM token for your Yandex account.
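Before sending requests, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of the service tooling: the helper names require_var and require_curl are hypothetical, and it assumes that the folder ID and IAM token are stored in the FOLDER_ID and IAM_TOKEN variables, as in the examples in this section.

```shell
# Return non-zero if the named environment variable is empty or unset.
# require_var is a hypothetical helper name used only for illustration.
require_var() {
  eval "value=\${$1:-}"
  if [ -z "${value}" ]; then
    echo "Missing required variable: $1" >&2
    return 1
  fi
}

# Return non-zero if cURL is not available on the PATH.
require_curl() {
  command -v curl >/dev/null 2>&1 || {
    echo "cURL is not installed" >&2
    return 1
  }
}
```

For example, running require_curl && require_var FOLDER_ID && require_var IAM_TOKEN succeeds only when all three prerequisites are met.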

To perform these operations on behalf of a service account:

  1. Assign the editor role or a higher role to the service account for the folder where it was created.
  2. Do not specify the folder ID in the request: the service uses the folder where the service account was created.
  3. Choose the authentication method: get an IAM token or an API key.
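The two authentication methods differ only in the Authorization header. The sketch below builds the header from whichever credential is set. It assumes that API keys are passed with the Api-Key scheme and IAM tokens with the Bearer scheme; API_KEY and auth_header are hypothetical names chosen for this example.

```shell
# Print the Authorization header for whichever credential is available.
# API_KEY is an assumed variable name; IAM_TOKEN matches the other examples.
auth_header() {
  if [ -n "${API_KEY:-}" ]; then
    printf 'Authorization: Api-Key %s' "${API_KEY}"
  elif [ -n "${IAM_TOKEN:-}" ]; then
    printf 'Authorization: Bearer %s' "${IAM_TOKEN}"
  else
    echo "Set API_KEY or IAM_TOKEN first" >&2
    return 1
  fi
}
```

You can then pass the result to cURL as -H "$(auth_header)". When authenticating as a service account, omit the folderId parameter from the request.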

Text-to-speech

With speech synthesis, you can convert text to speech and save it to an audio file.

The service supports the following languages:

  • ru-RU — Russian
  • en-US — English
  • tr-TR — Turkish

Pass the text in the text field of the request body using URL encoding. Set the text language in the lang parameter. In the folderId parameter, specify the folder ID that you obtained in the Before you start section. Write the response to a file:

$ export TEXT="Hello world!"
$ export FOLDER_ID=b1gvmob95yysaplct532
$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
     -H "Authorization: Bearer ${IAM_TOKEN}" \
     --data-urlencode "text=${TEXT}" \
     -d "lang=en-US&folderId=${FOLDER_ID}" \
     "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg

The synthesized speech will be written to the speech.ogg file in the directory that you executed this command from.

By default, audio is created in the OggOpus format. You can play the file in a browser, such as Yandex Browser or Mozilla Firefox.
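The command above writes whatever the server returns to speech.ogg, so a failed request would leave an error message in the file instead of audio. A minimal sketch of a wrapper that checks the HTTP status is shown below. The function name synthesize is hypothetical, and the sketch assumes that a successful request returns status 200 and that IAM_TOKEN and FOLDER_ID are set as in the example above.

```shell
# Synthesize speech from text and fail loudly on a non-200 response.
# Usage: synthesize "some text" output.ogg
synthesize() {
  text="$1"
  out="$2"
  # -o writes the response body to the file; -w prints only the status code.
  status="$(curl -s -X POST \
    -H "Authorization: Bearer ${IAM_TOKEN}" \
    --data-urlencode "text=${text}" \
    -d "lang=en-US&folderId=${FOLDER_ID}" \
    -o "${out}" -w '%{http_code}' \
    "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize")"
  if [ "${status}" != "200" ]; then
    echo "Synthesis failed with HTTP ${status}; see ${out} for the response body" >&2
    return 1
  fi
}
```

For example, synthesize "Hello world!" speech.ogg produces the same file as the command above, but returns a non-zero exit code if the request is rejected.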

Read more about the format of a speech synthesis request.

Speech recognition

The service can recognize speech in three different ways. This section describes recognition of short audio files.

Pass the binary content of your audio file in the request message body. In the query parameters, specify the recognition language (lang) and the folder ID (folderId). The service responds with the recognized text:

$ curl -X POST \
     -H "Authorization: Bearer ${IAM_TOKEN}" \
     --data-binary "@speech.ogg" \
     "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?lang=en-US&folderId=${FOLDER_ID}"

{"result":"Hello world"}
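To use the recognized text in a script, you can extract the result field from the JSON response. The sketch below uses sed and assumes a single-line response shaped like the one above, with no escaped quotation marks inside the text; the function name extract_result is hypothetical. A dedicated JSON parser is a more robust choice for anything beyond a quick test.

```shell
# Read a JSON response on stdin and print the value of its "result" field.
# Naive pattern match: does not handle escaped quotes inside the value.
extract_result() {
  sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
}
```

For example, append | extract_result to the recognition command above to print only the recognized text.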

What's next