Getting started with SpeechKit
If you want to see how the service synthesizes or recognizes speech, use the demo on the service page.
In this section, you'll learn how to use the SpeechKit API. First you will create an audio file from text and then try to recognize the audio.
For information about SpeechKit usage costs, see Pricing for SpeechKit.
Before you start
To use the examples, install cURL and get the authorization data for your account:
- On the billing page, make sure that your billing account status is ACTIVE or TRIAL_ACTIVE. If you don't have a billing account, create one.
- Get an IAM token required for authentication.
- Get the ID of any folder that your account is granted the editor role or higher for.
- Select the authentication method:
  - Get an IAM token used in the examples.
  - Create an API key. Pass the API key in the Authorization header in the following format: Authorization: Api-Key <API key>
- Assign the editor role or a higher role to the service account for the folder where it was created. Don't specify the folder ID in your requests: the service uses the folder where the service account was created.
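The two authentication methods above differ only in the value of the Authorization header. A minimal sketch, assuming the credential is stored in an environment variable: IAM_TOKEN matches the examples in this section, while API_KEY and the auth_header helper are names chosen for illustration and are not part of any tool.

```shell
# Hypothetical helper: prints the Authorization header for whichever
# credential is set. IAM_TOKEN takes precedence over API_KEY.
auth_header() {
  if [ -n "${IAM_TOKEN:-}" ]; then
    printf 'Authorization: Bearer %s\n' "$IAM_TOKEN"
  elif [ -n "${API_KEY:-}" ]; then
    printf 'Authorization: Api-Key %s\n' "$API_KEY"
  else
    echo "Set IAM_TOKEN or API_KEY first" >&2
    return 1
  fi
}
```

The output can then be passed to cURL as -H "$(auth_header)".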
- Use the CLI to get an IAM token required for authentication:
  yc iam create-token
- Get the ID of any folder that your account is granted the editor role or higher for.
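As a minimal sketch, the token returned by the CLI can be captured directly into the IAM_TOKEN variable that the examples in this section use. The require_cmd helper is a hypothetical name introduced here; it only checks that a tool is available on PATH.

```shell
# Hypothetical helper: succeeds only if the named command is installed.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "Missing required tool: $1" >&2
    return 1
  }
}

# Capture a fresh IAM token when both tools are available.
# Assumes `yc iam create-token` prints only the token to stdout.
if require_cmd curl && require_cmd yc; then
  IAM_TOKEN="$(yc iam create-token)"
  export IAM_TOKEN
fi
```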
Text-to-speech
With speech synthesis, you can convert text to speech and save it to an audio file.
Send a request to convert text to speech:
# Text to synthesize; "+" marks the stressed vowel
read -r -d '' TEXT << EOM
I'm Yandex Speech+Kit.
I can turn any text into speech.
Now yo+u can, too!
EOM
export FOLDER_ID=<folder ID>
export IAM_TOKEN=<IAM token>
# --data-urlencode applies URL encoding to the text; the audio goes to speech.ogg
curl -X POST \
  -H "Authorization: Bearer ${IAM_TOKEN}" \
  --data-urlencode "text=${TEXT}" \
  -d "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
  "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg
Where:
- TEXT: Text to be synthesized, with URL encoding applied.
- FOLDER_ID: Folder ID received before starting.
- IAM_TOKEN: IAM token received before starting.
- lang: Language of the text.
- voice: Voice for speech synthesis.
- speech.ogg: The file to which the response will be written.
Note
For homographs, use + before the stressed vowel. For example, +import, im+port. To mark a pause between words, use -. Maximum string length: 500 characters.
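The length limit from the note can be checked locally before a request is sent. A minimal sketch, assuming the limit counts characters the way the shell counts them in the current locale; check_tts_length and MAX_TTS_CHARS are illustrative names, not part of the API.

```shell
MAX_TTS_CHARS=500  # limit stated in the note above

# Hypothetical helper: fails if the text passed as $1 exceeds the limit.
check_tts_length() {
  if [ "${#1}" -gt "$MAX_TTS_CHARS" ]; then
    echo "Text is ${#1} characters; the limit is ${MAX_TTS_CHARS}" >&2
    return 1
  fi
}
```

For example, check_tts_length "$TEXT" && curl ... sends the request only when the text fits.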
The synthesized speech will be written to the speech.ogg file in the directory where you ran the command.
By default, audio is created in the OggOpus format. You can listen to the created file in a browser like Yandex Browser or Mozilla Firefox.
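Before opening the file, it can be worth confirming that the response was audio rather than an error message. A minimal sketch, relying on the fact that Ogg containers begin with the four bytes OggS; is_ogg is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the file starts with the Ogg magic bytes.
is_ogg() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "OggS" ]
}

if [ -f speech.ogg ]; then
  if is_ogg speech.ogg; then
    echo "speech.ogg looks like Ogg audio"
  else
    # Assumption: on failure the service returns a text error body instead of audio.
    echo "speech.ogg is not Ogg audio; inspect it with: cat speech.ogg" >&2
  fi
fi
```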
Read more about the request format for speech synthesis.
Speech recognition
The service can recognize speech in different ways. In this section, synchronous recognition is used.
Pass the binary content of your audio file in the request body, specifying the following in its parameters:
- lang: Desired recognition language.
- folderId: Folder ID received before starting.
curl -X POST \
-H "Authorization: Bearer ${IAM_TOKEN}" \
--data-binary "@speech.ogg" \
"https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?folderId=${FOLDER_ID}&lang=ru-RU"
The service responds with the recognized text:
{"result":"I'm Yandex SpeechKit. I can turn any text into speech. Now you can, too!"}
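To use the recognized text in a script, the result field can be pulled out of the JSON response. A minimal sketch: extract_result is a hypothetical helper that uses jq when it is installed and otherwise falls back to a sed pattern. The fallback only handles a flat, single-line response like the one above, with no escaped quotes inside the value.

```shell
# Hypothetical helper: reads a JSON response on stdin and prints the "result" value.
extract_result() {
  if command -v jq >/dev/null 2>&1; then
    jq -r '.result'
  else
    # Fallback: assumes one line and no escaped quotes inside the value.
    sed -n 's/^.*"result"[[:space:]]*:[[:space:]]*"\(.*\)".*$/\1/p'
  fi
}
```

Appending | extract_result to the recognition request prints only the recognized text.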