Getting started with SpeechKit
If you want to see how the service synthesizes or recognizes speech, use the demo on the service page.
In this section, you'll learn how to use the SpeechKit API. First you will create an audio file from text and then try to recognize the audio.
Before you start
To use the examples, install cURL and get the authorization data for your account:
- On the billing page, make sure that your billing account status is ACTIVE or TRIAL_ACTIVE. If you don't have a billing account, create one.
- Get an IAM token required for authentication.
- Get the ID of any folder that your account is granted the editor role or higher for.
- Select the authentication method:
  - Get an IAM token used in the examples.
  - Create an API key. Pass the API key in the Authorization header in the following format: Authorization: Api-Key <API key>
- Assign the editor role or a higher role to the service account for the folder where it was created. Don't specify the folder ID in your requests: the service uses the folder where the service account was created.

- Use the CLI to get an IAM token required for authentication:
  $ yc iam create-token
- Get the ID of any folder that your account is granted the editor role or higher for.
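The two authentication methods above differ only in the value of the Authorization header. As a minimal sketch, the helper below picks the header from whichever credential is set in the environment. The function name auth_header and the variable name API_KEY are assumptions made for this example; they are not part of SpeechKit or the yc CLI.

```shell
#!/bin/sh
# Hypothetical helper: prints the Authorization header for the
# credentials found in the environment.
# API_KEY (an assumed variable name) takes precedence over IAM_TOKEN.
auth_header() {
  if [ -n "${API_KEY:-}" ]; then
    printf 'Authorization: Api-Key %s\n' "$API_KEY"
  elif [ -n "${IAM_TOKEN:-}" ]; then
    printf 'Authorization: Bearer %s\n' "$IAM_TOKEN"
  else
    echo "Set API_KEY or IAM_TOKEN first" >&2
    return 1
  fi
}
```

With this in place, a request could pass the header as -H "$(auth_header)" regardless of which method you chose.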
Text-to-speech
With speech synthesis, you can convert text to speech and save it to an audio file.
The service supports the following languages:
- ru-RU: Russian
- en-US: English
- tr-TR: Turkish
Pass the text in the text field as the request message body using URL encoding. In the lang parameter, set the text language. In the folderId parameter, specify the folder ID obtained before you started. Write the response to the file:
$ export TEXT="Hello world!"
$ export FOLDER_ID=b1gvmob95yysaplct532
$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
    -H "Authorization: Bearer ${IAM_TOKEN}" \
    --data-urlencode "text=${TEXT}" \
    -d "lang=en-US&folderId=${FOLDER_ID}" \
    "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg
The synthesized speech will be written to the speech.ogg file in the directory that you executed this command from.
By default, audio is created in the OggOpus format. You can listen to the created file in a browser like Yandex Browser or Mozilla Firefox.
Read more about the format of a speech synthesis request.
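For repeated use, the synthesis request can be wrapped in a small shell function. This is a sketch, not an official client: the function name synthesize and its argument order are hypothetical, and it assumes IAM_TOKEN and FOLDER_ID are exported as in the example above. The --fail, --silent, and --show-error options are standard cURL flags; with --fail, an HTTP error makes cURL return a non-zero exit code instead of saving the error response as if it were audio.

```shell
#!/bin/sh
# Hypothetical wrapper around the synthesis request.
# Usage: synthesize TEXT LANG OUTPUT_FILE
# Assumes IAM_TOKEN and FOLDER_ID are already exported.
synthesize() {
  text=$1
  lang=$2
  out=$3
  curl --fail --silent --show-error -X POST \
    -H "Authorization: Bearer ${IAM_TOKEN}" \
    --data-urlencode "text=${text}" \
    -d "lang=${lang}&folderId=${FOLDER_ID}" \
    -o "${out}" \
    "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"
}
```

For example, synthesize "Hello world!" en-US speech.ogg would send the same request as the command in this section.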
Speech recognition
The service can recognize speech in three different ways. This section describes recognition of short audio files.
Pass the binary content of your audio file in the request message body. In the query parameters, specify the recognition language (lang) and the folder ID (folderId). The service responds with the recognized text:
$ curl -X POST \
    -H "Authorization: Bearer ${IAM_TOKEN}" \
    --data-binary "@speech.ogg" \
    "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?lang=en-US&folderId=${FOLDER_ID}"
{"result":"Hello world"}
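In a script, you usually want the recognized text rather than the raw JSON. The sketch below pulls the result field out of a response shaped like the one above using sed. It is deliberately simplistic: the helper name extract_result is hypothetical, and the pattern assumes a single-line response whose text contains no escaped quotation marks. A real JSON parser is the more robust choice.

```shell
#!/bin/sh
# Hypothetical helper: reads a JSON response on stdin and prints the
# value of its "result" field.
# Assumes a one-line response with no escaped quotes in the text.
extract_result() {
  sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
}

# Example with the response from this section; prints: Hello world
printf '%s\n' '{"result":"Hello world"}' | extract_result
```

The output of the recognition request could be piped into extract_result in the same way.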