© 2022 Yandex.Cloud LLC

Getting started with SpeechKit

Written by
Yandex Cloud
  • Before you start
  • Text-to-speech
  • Speech recognition

If you want to see how the service synthesizes or recognizes speech, use the demo on the service page.

In this section, you'll learn how to use the SpeechKit API. First you will create an audio file from text and then try to recognize the audio.

For information about SpeechKit usage costs, see Pricing for SpeechKit.

Before you start

To use the examples, install cURL and get the authorization data for your account:

User's account on Yandex

  1. On the billing page, make sure that your billing account status is ACTIVE or TRIAL_ACTIVE. If you don't have a billing account, create one.
  2. Get an IAM token required for authentication.
  3. Get the ID of any folder that your account is granted the editor role or higher for.

Service accounts

  1. Select the authentication method:

    • Get an IAM token used in the examples.

    • Create an API key. Pass the API key in the Authorization header in the following format:

      Authorization: Api-Key <API key>

  2. Assign the editor role or a higher role to the service account for the folder where it was created.

    Don't specify the folder ID in your requests: the service uses the folder where the service account was created.

Federated account

  1. Authenticate with the CLI as a federated user.

  2. Use the CLI to get an IAM token required for authentication:

    yc iam create-token

  3. Get the ID of any folder that your account is granted the editor role or higher for.
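
The two credential types above differ only in the Authorization header value. As a minimal sketch, the header can be chosen in one place so the later curl commands work with either credential. The helper name auth_header and the API_KEY variable are assumptions made for illustration; they are not part of SpeechKit or the yc CLI.

```shell
# auth_header is a hypothetical helper, not part of SpeechKit or the yc CLI.
# It prints the Authorization header for whichever credential is set.
# API_KEY (service account API key) takes precedence over IAM_TOKEN.
auth_header() {
  if [ -n "${API_KEY:-}" ]; then
    printf 'Authorization: Api-Key %s\n' "${API_KEY}"
  elif [ -n "${IAM_TOKEN:-}" ]; then
    printf 'Authorization: Bearer %s\n' "${IAM_TOKEN}"
  else
    echo "Set API_KEY or IAM_TOKEN first" >&2
    return 1
  fi
}

# Possible usage with curl (not executed here):
#   curl -H "$(auth_header)" ...
```

With this helper, the -H option in the examples that follow could be written as -H "$(auth_header)" instead of hard-coding the Bearer form.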

Text-to-speech

With speech synthesis, you can convert text to speech and save it to an audio file.

Send the request to convert text to speech:

read -r -d '' TEXT << EOM
I'm Yandex Speech+Kit.
I can turn any text into speech.
Now yo+u can, too!
EOM
export FOLDER_ID=<folder ID>
export IAM_TOKEN=<IAM token>
curl -X POST \
   -H "Authorization: Bearer ${IAM_TOKEN}" \
   --data-urlencode "text=${TEXT}" \
   -d "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
   "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg

Where:

  • TEXT: Text to synthesize. The --data-urlencode option applies URL encoding to it.
  • FOLDER_ID: Folder ID received before starting.
  • IAM_TOKEN: IAM token received before starting.
  • lang: Language of the text.
  • voice: Voice for speech synthesis.
  • speech.ogg: The file to which the response will be written.

Note

For homographs, use + before the stressed vowel. For example, +import, im+port. To mark a pause between words, use -. Maximum string length: 500 characters.
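
The length limit in the note can be checked locally before sending a request. The following is a rough sketch: check_text_length and MAX_CHARS are hypothetical names, and whether the service counts the + and - markers toward the limit is not stated here, so the check simply counts every character.

```shell
# MAX_CHARS mirrors the 500-character limit stated in the note above.
MAX_CHARS=500

# check_text_length is a hypothetical helper: it succeeds only if the text
# fits within MAX_CHARS. In bash, ${#text} counts characters according to
# the current locale (bytes in the C locale).
check_text_length() {
  local text="$1"
  if [ "${#text}" -gt "${MAX_CHARS}" ]; then
    echo "Text is ${#text} characters; the limit is ${MAX_CHARS}" >&2
    return 1
  fi
}

# Possible usage before calling curl (not executed here):
#   check_text_length "${TEXT}" || exit 1
```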

The synthesized speech will be written to the speech.ogg file in the directory that you executed this command from.

By default, audio is created in the OggOpus format. You can listen to the created file in a browser like Yandex Browser or Mozilla Firefox.
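
Because curl writes whatever body the server returns, a failed request would leave an error message in speech.ogg rather than audio. One way to tell the two apart, sketched below, is to look at the first four bytes: Ogg files start with the characters OggS. The helper name is_ogg is an assumption for illustration.

```shell
# is_ogg is a hypothetical helper: it succeeds if the file begins with the
# Ogg capture pattern "OggS", and fails otherwise (including a missing file).
is_ogg() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "OggS" ]
}

# Possible usage after the synthesis request (not executed here):
#   if is_ogg speech.ogg; then
#     echo "speech.ogg contains Ogg audio"
#   else
#     echo "Unexpected response:" >&2
#     cat speech.ogg >&2
#   fi
```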

Read more about the request format for speech synthesis.

Speech recognition

SpeechKit offers several recognition modes: streaming, synchronous, and asynchronous. This section uses synchronous recognition.

Pass the binary content of your audio file in the request body and specify the following query parameters:

  • lang: Desired recognition language.
  • folderId: Folder ID received before starting.

curl -X POST \
   -H "Authorization: Bearer ${IAM_TOKEN}" \
   --data-binary "@speech.ogg" \
   "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?folderId=${FOLDER_ID}&lang=ru-RU"

The service responds with the recognized text:

{"result":"I'm Yandex SpeechKit. I can turn any text into speech. Now you can, too!"}
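
To use the recognized text in a script, the result field can be pulled out of the response. The sketch below uses sed so that it needs no extra tools. The helper name extract_result is an assumption, and the approach is a plain-text match rather than a JSON parser: it does not unescape characters and assumes result is the only string field in the response.

```shell
# extract_result is a hypothetical helper. It reads a JSON response on stdin
# and prints the text between the quotes of the "result" field.
# Plain-text match only: no unescaping, and it assumes a single string field.
extract_result() {
  sed -n 's/.*"result":"\(.*\)".*/\1/p'
}

# Demo with the sample response shown above (no network call).
extract_result <<'EOF'
{"result":"I'm Yandex SpeechKit. I can turn any text into speech. Now you can, too!"}
EOF
```

In practice, the output of the curl command from this section could be piped into extract_result. For anything beyond a quick check, a real JSON parser is the safer choice.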

What's next

  • Read more about speech synthesis
  • Read more about speech recognition
  • Learn about API authentication methods
  • Learn how to integrate IVR
