How to synthesize speech in SpeechKit API v1
Updated at December 26, 2023
Speech synthesis converts text to speech and saves it to an audio file. In this section, you will learn how to synthesize speech from text using the SpeechKit API v1 (REST).
Send a request to convert text to speech:
read -r -d '' TEXT << EOM
I'm Yandex Speech+Kit.
I can turn any text into speech.
Now y+ou can, too!
EOM
export FOLDER_ID=<folder_ID>
export IAM_TOKEN=<IAM_token>
curl -X POST \
-H "Authorization: Bearer ${IAM_TOKEN}" \
--data-urlencode "text=${TEXT}" \
-d "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
"https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg
Where:
FOLDER_ID: Folder ID received before starting.
IAM_TOKEN: IAM token received before starting.
TEXT: Text to be synthesized, with URL encoding applied.
lang: Language of the text.
voice: Voice for speech synthesis.
speech.ogg: The file where the response will be written.
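Because the command above redirects whatever the server returns into speech.ogg, a failed request would leave an error message in that file instead of audio. The following is a minimal sketch of one way to guard against that, assuming the service answers a successful request with HTTP status 200 and reports failures with another status and a text body. The function name synthesize_to_file is hypothetical and not part of SpeechKit; --output and --write-out are standard curl options.

```shell
# Sketch: keep the output file only when the request returns HTTP 200.
# synthesize_to_file is a hypothetical helper name, not part of SpeechKit.
# It expects IAM_TOKEN and FOLDER_ID to be exported as in the example above.
synthesize_to_file() {
  local text="$1" out="$2" status tmp
  tmp="$(mktemp)" || return 1
  # --output stores the response body; --write-out prints only the status code.
  status="$(curl --silent --show-error \
    --request POST \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --data-urlencode "text=${text}" \
    --data "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
    --output "${tmp}" \
    --write-out '%{http_code}' \
    "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize")" || status="000"
  if [ "${status}" = "200" ]; then
    mv "${tmp}" "${out}"
  else
    # Assumption: a non-200 response carries a readable error body.
    echo "Synthesis failed with HTTP status ${status}:" >&2
    cat "${tmp}" >&2
    echo >&2
    rm -f "${tmp}"
    return 1
  fi
}

# Usage (needs network access and valid credentials):
# synthesize_to_file "${TEXT}" speech.ogg
```

Writing to a temporary file first means an existing speech.ogg is not overwritten by a failed request.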
Note
For homographs, use + before the stressed vowel. For example, +import, im+port. To mark a pause between words, use -. Maximum string length: 5,000 characters.
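The length limit in the note can be checked locally before a request is sent. This is a minimal sketch with a hypothetical helper name, check_tts_text. It counts every character in the string, including the + and - marks. Whether the service counts them the same way is an assumption.

```shell
# Sketch: reject text that exceeds the 5,000-character limit from the note.
# check_tts_text and MAX_TTS_CHARS are hypothetical names for illustration.
MAX_TTS_CHARS=5000

check_tts_text() {
  local text="$1"
  # ${#text} counts characters in a multibyte-aware shell and locale
  # (for example, bash with a UTF-8 locale); otherwise it counts bytes.
  if [ "${#text}" -gt "${MAX_TTS_CHARS}" ]; then
    echo "Text is ${#text} characters long; the limit is ${MAX_TTS_CHARS}." >&2
    return 1
  fi
}

# Usage:
# check_tts_text "${TEXT}" && echo "Text length is within the limit."
```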
The synthesized speech will be written to the speech.ogg file in the directory where you ran the command.
By default, audio is created in OggOpus format.
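OggOpus audio is stored in an Ogg container, and every Ogg stream begins with the four bytes "OggS". A quick local check of those bytes can show whether speech.ogg holds audio or something else, such as an error message. This is a minimal sketch with a hypothetical helper name, is_ogg_file. It only inspects the container signature and does not validate the audio itself.

```shell
# Sketch: succeed only if the file starts with the Ogg capture pattern "OggS".
# is_ogg_file is a hypothetical helper name for illustration.
is_ogg_file() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "OggS" ]
}

# Usage:
# is_ogg_file speech.ogg && echo "speech.ogg looks like an Ogg file."
```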
For details, see the description of the request format for speech synthesis.