How to synthesize speech in SpeechKit API v1

Updated at December 26, 2023

Speech synthesis converts text to speech and saves it to an audio file. In this section, you will learn how to synthesize speech from text using the SpeechKit API v1 (REST).

Send a request to convert text to speech:

read -r -d '' TEXT << EOM
I'm Yandex Speech+Kit.
I can turn any text into speech.
Now y+ou can, too!
EOM
export FOLDER_ID=<folder_ID>
export IAM_TOKEN=<IAM_token>
curl -X POST \
   -H "Authorization: Bearer ${IAM_TOKEN}" \
   --data-urlencode "text=${TEXT}" \
   -d "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
  "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg

Where:

FOLDER_ID: Folder ID received before starting.
IAM_TOKEN: IAM token received before starting.
TEXT: Text to synthesize. curl URL-encodes it through the --data-urlencode option.
lang: Language of the text.
voice: Voice for speech synthesis.
speech.ogg: The file where the response will be written.
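
The request above writes whatever the server returns into speech.ogg, including an error response. A minimal sketch of a wrapper that fails instead, assuming the same endpoint and parameters as the request above; the function name synthesize and its two arguments are hypothetical and not part of SpeechKit, while --fail, --silent, --show-error, and --output are standard curl options:

```shell
# synthesize TEXT OUTFILE
# Hypothetical helper, not part of SpeechKit.
# Expects IAM_TOKEN and FOLDER_ID to be set in the environment.
# Usage: synthesize "${TEXT}" speech.ogg || echo "Synthesis failed" >&2
synthesize() {
  # --fail makes curl return a non-zero exit code on HTTP errors (4xx/5xx)
  # instead of saving the error body as if it were audio.
  curl --fail --silent --show-error -X POST \
    -H "Authorization: Bearer ${IAM_TOKEN}" \
    --data-urlencode "text=$1" \
    -d "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
    --output "$2" \
    "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"
}
```

With this wrapper, an expired IAM token or a wrong folder ID shows up as a non-zero exit code and a message on stderr rather than as an unplayable file.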

Note

For homographs, use + before the stressed vowel. For example, +import, im+port. To mark a pause between words, use -. Maximum string length: 5,000 characters.
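
The length limit from the note can be checked locally before sending a request. This is a sketch only: the function name check_tts_length and the MAX_TTS_CHARS variable are hypothetical, and it assumes the service counts characters the same way the shell does, which may not hold exactly for the stress and pause marks.

```shell
# Limit stated in the note above.
MAX_TTS_CHARS=5000

# check_tts_length TEXT
# Hypothetical pre-flight check: returns 0 if TEXT fits the limit, 1 otherwise.
check_tts_length() {
  # ${#1} is the length of the first argument; bash counts characters
  # according to the current locale.
  if [ "${#1}" -gt "${MAX_TTS_CHARS}" ]; then
    echo "Text is ${#1} characters long; the limit is ${MAX_TTS_CHARS}." >&2
    return 1
  fi
}
```

For example, check_tts_length "${TEXT}" can guard the curl call so that an oversized text is rejected before any request is sent.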

The synthesized speech will be written to the speech.ogg file in the directory that you executed this command from.

By default, audio is created in OggOpus format. You can listen to the file you created in your browser, e.g., Yandex Browser or Mozilla Firefox.
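
Because curl saves the response body regardless of the HTTP status, speech.ogg may contain an error message instead of audio. A quick sanity check, sketched under the assumption that the default OggOpus output is a regular Ogg stream, which begins with the 4-byte capture pattern OggS; the function name is_ogg is hypothetical:

```shell
# is_ogg FILE
# Hypothetical helper: succeeds if FILE begins with the Ogg capture
# pattern "OggS", fails otherwise (including when FILE is missing).
is_ogg() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "OggS" ]
}
```

If is_ogg speech.ogg fails, printing the file with cat usually shows the error text returned by the service.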

See the description of the request format for speech synthesis.
