Asynchronously recognizing audio files in LPCM format
Below is an example of asynchronous recognition of speech from an audio file using the SpeechKit API v2. This example uses the following parameters:
- Language: Russian
- Language model: general:rc
- Format of the submitted audio: LPCM with a sampling rate of 8000 Hz
- Number of audio channels: 1 (default)
- Other parameters: default values.
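These parameters map one-to-one onto the fields of the request body used later in this article. As a minimal sketch, assuming a POSIX shell, the hypothetical helper make_body_json below writes that body to a file and takes the audio link as an argument:

```shell
# Minimal sketch: write the request body for asynchronous LPCM recognition.
# make_body_json is a hypothetical helper; the field names and values are
# the ones from the example request in this article.
make_body_json() {
  local uri="$1"                # link to the audio file in Object Storage
  local out="${2:-body.json}"   # output file, body.json by default
  cat > "$out" <<EOF
{
  "config": {
    "specification": {
      "languageCode": "ru-RU",
      "model": "general:rc",
      "audioEncoding": "LINEAR16_PCM",
      "sampleRateHertz": 8000,
      "audioChannelCount": 1
    }
  },
  "audio": {
    "uri": "${uri}"
  }
}
EOF
}
```

For example, make_body_json "https://storage.yandexcloud.net/speechkit/speech.pcm" creates body.json in the current directory. The link is inserted as-is without JSON escaping, which is fine for ordinary URLs but not for values containing quotes or backslashes.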
You can generate and send a speech recognition request using cURL.
An IAM token is used to authenticate the service account. Learn more about authentication in the SpeechKit API.
Getting started
- Create a bucket and upload the audio file you want to recognize to it.
- Create a service account.
  Warning
  You can only recognize audio files asynchronously under a service account. Do not use other Yandex Cloud accounts for this.
- Assign the service account the storage.uploader and ai.speechkit-stt.user roles for the folder where you created the bucket.
- Get an IAM token for the service account you created.
If you do not have an LPCM audio file, you can download a sample file.
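Because a raw LPCM file has no header, its duration can be derived from its size. The sketch below assumes 16-bit (2-byte) samples, which is what the LINEAR16_PCM encoding name suggests, plus the 8000 Hz mono settings of this example, so one second of audio takes 8000 × 2 × 1 = 16,000 bytes. It also turns the figure given later in this article (about 10 seconds of processing per minute of single-channel audio) into a rough wait estimate. Both function names are hypothetical.

```shell
# Minimal sketch: estimate LPCM duration and expected recognition time.
# Assumption: 16-bit (2-byte) samples, as implied by LINEAR16_PCM.
# lpcm_duration_seconds and estimate_wait_seconds are hypothetical helpers.
lpcm_duration_seconds() {
  local file="$1"
  local rate="${2:-8000}"     # sampleRateHertz
  local channels="${3:-1}"    # audioChannelCount
  local bytes
  # tr strips the padding some wc implementations add
  bytes=$(wc -c < "$file" | tr -d ' ')
  # bytes per second = rate * channels * 2 bytes per sample;
  # integer division drops any partial second
  echo $(( bytes / (rate * channels * 2) ))
}

estimate_wait_seconds() {
  local duration="$1"
  # About 10 s of processing per 60 s of mono audio, rounded up
  echo $(( (duration * 10 + 59) / 60 ))
}
```

For instance, a 960,000-byte file at these settings is 60 seconds long, so recognition should take roughly 10 seconds. If the computed duration does not match what you expect, the file may have a different sample rate, channel count, or a WAV header.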
Perform speech recognition via the API
- Get a link to the audio file in Object Storage.
- Create a file, e.g., body.json, and paste the following code into it:

    {
      "config": {
        "specification": {
          "languageCode": "ru-RU",
          "model": "general:rc",
          "audioEncoding": "LINEAR16_PCM",
          "sampleRateHertz": 8000,
          "audioChannelCount": 1
        }
      },
      "audio": {
        "uri": "<link_to_audio_file>"
      }
    }

  Where:
  - languageCode: Recognition language.
  - model: Language model.
  - audioEncoding: Format of the submitted audio.
  - sampleRateHertz: Audio file sampling rate, in Hz.
  - audioChannelCount: Number of audio channels.
  - uri: Link to the audio file in Object Storage, e.g., https://storage.yandexcloud.net/speechkit/speech.pcm
    For buckets with restricted access, the link contains additional query parameters (after ?). You do not need to pass these parameters to SpeechKit, as it ignores them.
- Send the request using the created file:

    export IAM_TOKEN=<service_account_IAM_token> && \
    curl -X POST \
      -H "Authorization: Bearer ${IAM_TOKEN}" \
      -d "@body.json" \
      https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize

  Where IAM_TOKEN is the IAM token of the service account.

  Result example:

    {
      "done": false,
      "id": "e03sup6d5h1q********",
      "createdAt": "2019-04-21T22:49:29Z",
      "createdBy": "ajes08feato8********",
      "modifiedAt": "2019-04-21T22:49:29Z"
    }

  Save the recognition operation id that you received in the response.
- Wait for recognition to complete. Recognizing one minute of single-channel audio takes about 10 seconds.
- Send a request to get information about the operation:

    curl -H "Authorization: Bearer ${IAM_TOKEN}" \
      https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>

  Result example:

    {
      "done": true,
      "response": {
        "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
        "chunks": [
          {
            "alternatives": [
              {
                "text": "hello world",
                "confidence": 1
              }
            ],
            "channelTag": "1"
          }
        ]
      },
      "id": "e03sup6d5h1q********",
      "createdAt": "2019-04-21T22:49:29Z",
      "createdBy": "ajes08feato8********",
      "modifiedAt": "2019-04-21T22:49:36Z"
    }
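The send, wait, and poll steps above can be chained into one script. This is a minimal sketch, assuming a POSIX shell with curl, grep, and sed; it polls every 10 seconds, based on the timing given above, and prints each recognized text. The helper names and the 60-poll limit are hypothetical choices, not part of the SpeechKit API, and the responses are parsed with grep and sed rather than a JSON parser.

```shell
# Minimal sketch: submit body.json, poll the operation, print the text.
# All function names are hypothetical. grep/sed parsing only handles
# responses shaped like the examples above (no escaped quotes in values).

STT_URL="https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize"
OPS_URL="https://operation.api.cloud.yandex.net/operations"

# Print the value of the "id" field from an operation response.
extract_operation_id() {
  printf '%s' "$1" | sed -n 's/.*"id": *"\([^"]*\)".*/\1/p'
}

# Succeed (exit status 0) if the response contains "done": true.
operation_done() {
  printf '%s' "$1" | grep -q '"done": *true'
}

# Print every recognized "text" value, one per line.
extract_texts() {
  printf '%s' "$1" | grep -o '"text": *"[^"]*"' | sed 's/"text": *"\(.*\)"/\1/'
}

# Submit the request, then poll every 10 seconds until the operation is
# done or max_polls is reached. Requires IAM_TOKEN to be exported.
# Assumption: an operation that finishes with an error instead of a
# response is not detected separately; it simply prints no text.
recognize() {
  local max_polls="${1:-60}"
  local response id i
  response=$(curl -s -X POST \
    -H "Authorization: Bearer ${IAM_TOKEN}" \
    -d "@body.json" \
    "$STT_URL") || return 1
  id=$(extract_operation_id "$response")
  if [ -z "$id" ]; then
    echo "No operation ID in response: $response" >&2
    return 1
  fi
  i=0
  while [ "$i" -lt "$max_polls" ]; do
    sleep 10
    response=$(curl -s -H "Authorization: Bearer ${IAM_TOKEN}" "$OPS_URL/$id") || return 1
    if operation_done "$response"; then
      extract_texts "$response"
      return 0
    fi
    i=$((i + 1))
  done
  echo "Operation $id is not done after $max_polls polls" >&2
  return 1
}
```

To use it, export IAM_TOKEN, create body.json in the current directory, and run recognize. Keeping the parsing helpers separate from the network calls makes them easy to check against the example responses without contacting the API.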