
Getting started with SpeechKit

  • Before you start
  • Text-to-speech
  • Speech recognition

If you want to see how the service synthesizes or recognizes speech, use the demo on the service page.

In this section, you'll learn how to use the SpeechKit API. First you will create an audio file from text and then try to recognize the audio.

Before you start

To use the examples, install cURL and get the authorization data for your account. The steps depend on your account type:

User's account on Yandex

  1. On the billing page, make sure that your billing account status is ACTIVE or TRIAL_ACTIVE. If you don't have a billing account, create one.
  2. Get an IAM token required for authentication.
  3. Get the ID of any folder for which your account has the editor role or higher.

Service accounts

  1. Select the authentication method:

    • Get an IAM token to use in the examples.

    • Create an API key. Pass the API key in the Authorization header in the following format:

      Authorization: Api-Key <API key>

  2. Assign the editor role or a higher role to the service account for the folder where it was created.

    Don't specify the folder ID in your requests: the service uses the folder where the service account was created.

Federated account

  1. Authenticate with the CLI as a federated user.

  2. Use the CLI to get an IAM token required for authentication:

    $ yc iam create-token

  3. Get the ID of any folder for which your account has the editor role or higher.
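Whichever method you choose, the credential ends up in the Authorization header: Bearer followed by an IAM token (as in the cURL examples in this section) or Api-Key followed by an API key. The following is a minimal sketch of a helper that prints the right header; the function name auth_header and its iam/api-key arguments are hypothetical and are not part of SpeechKit or the yc CLI.

```shell
# Hypothetical helper (not part of SpeechKit): print the Authorization
# header for the chosen authentication method.
#   auth_header iam <IAM token>
#   auth_header api-key <API key>
auth_header() {
  case $1 in
    iam)     printf 'Authorization: Bearer %s\n' "$2" ;;
    api-key) printf 'Authorization: Api-Key %s\n' "$2" ;;
    *)       printf 'unknown authentication method: %s\n' "$1" >&2
             return 1 ;;
  esac
}
```

For example, you could pass -H "$(auth_header iam "${IAM_TOKEN}")" to curl instead of typing the header by hand.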

Text-to-speech

With speech synthesis, you can convert text to speech and save it to an audio file.

The service supports the following languages:

  • ru-RU — Russian
  • en-US — English
  • tr-TR — Turkish
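If a script takes the language code as input, it can reject unsupported values before calling the API. Below is a minimal sketch, assuming the three codes listed above are the only valid ones (the service itself remains the authority, and its list may change); check_lang is a hypothetical helper name.

```shell
# Hypothetical client-side guard: succeed only for the language codes
# listed in this section (ru-RU, en-US, tr-TR).
check_lang() {
  case $1 in
    ru-RU|en-US|tr-TR) return 0 ;;
    *) printf 'unsupported language: %s\n' "$1" >&2
       return 1 ;;
  esac
}
```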

Pass the text in the text field of the request body, URL-encoded. In the lang parameter, set the language of the text. In the folderId parameter, specify the folder ID you got before you started. Write the response to a file:

$ export TEXT='Hello world!'
$ export FOLDER_ID=b1gvmob95yysaplct532
$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
     -H "Authorization: Bearer ${IAM_TOKEN}" \
     --data-urlencode "text=${TEXT}" \
     -d "lang=en-US&folderId=${FOLDER_ID}" \
     "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg

The synthesized speech is written to the speech.ogg file in the directory where you ran the command.

By default, audio is created in the OggOpus format. You can listen to the created file in a browser like Yandex Browser or Mozilla Firefox.
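The --data-urlencode option makes curl percent-encode the text before sending it, so the request body carries text=Hello%20world%21 rather than the raw string. To illustrate what that encoding does, here is a minimal POSIX shell sketch; the urlencode function is hypothetical, it is intended for ASCII text only, and in practice you can keep relying on curl.

```shell
# Hypothetical helper: percent-encode an ASCII string, keeping only the
# RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) as-is.
# The function body runs in a subshell so its variables stay local.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything except the first character
    c=${s%"$rest"}       # the first character
    s=$rest
    case $c in
      [a-zA-Z0-9._~-]) out=$out$c ;;
      # "'$c" makes printf use the character's numeric code
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
)
```

For example, urlencode 'Hello world!' prints Hello%20world%21, which is the value curl sends in the text field.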

Read more about the format of a speech synthesis request.

Speech recognition

The service can recognize speech in three ways: short audio recognition, recognition of long audio fragments, and data streaming recognition. This section describes recognition of short audio files.

Pass the binary content of your audio file in the request body. In the query string, specify the recognition language (lang) and the folder ID (folderId). The service responds with the recognized text:

$ curl -X POST \
     -H "Authorization: Bearer ${IAM_TOKEN}" \
     --data-binary "@speech.ogg" \
     "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?lang=en-US&folderId=${FOLDER_ID}"

{"result":"Hello world"}
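When you script recognition, it can help to build the request URL in one place and pull the text out of the JSON response. The sketch below assumes that folderId is appended only when a folder ID is given (service accounts omit it, as described in Before you start) and that the response is a compact single-line JSON object like the one above, with no escaped quotes inside result. Both function names are hypothetical; for general JSON parsing, a dedicated tool such as jq is more robust than sed.

```shell
# Hypothetical helper: build the short-audio recognition URL.
#   stt_url <lang> [<folder ID>]
# The folder ID is optional because service-account requests omit it.
stt_url() (
  url="https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?lang=$1"
  if [ -n "${2:-}" ]; then
    url="$url&folderId=$2"
  fi
  printf '%s\n' "$url"
)

# Hypothetical helper: read a response such as {"result":"Hello world"}
# from stdin and print the value of the result field. Assumes compact
# JSON without escaped quotes in the recognized text.
extract_result() {
  sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
}

# Example:
#   curl -X POST -H "Authorization: Bearer ${IAM_TOKEN}" \
#        --data-binary "@speech.ogg" \
#        "$(stt_url en-US "${FOLDER_ID}")" | extract_result
```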

What's next

  • Read more about speech synthesis
  • Read more about speech recognition
  • Learn about API authentication methods
  • Learn about IVR integration
© 2021 Yandex.Cloud LLC