Asynchronously recognizing audio files in LPCM format

Updated on February 16, 2024

Below is an example of asynchronous recognition of speech from an audio file using the SpeechKit API v2. This example uses the following parameters:

Language: Russian
Language model: general:rc
Format of the submitted audio: LPCM with a sampling rate of 8000 Hz
Number of audio channels: 1 (default)
Other parameters: left at their default values.

You can generate and send a speech recognition request using cURL.

An IAM token is used to authenticate the service account. Learn more about authentication in the SpeechKit API.

Getting started

1. Create a bucket and upload the audio file you want to recognize to it.
2. Create a service account.

   Warning

   You can only recognize audio files asynchronously under a service account. Do not use any other Yandex Cloud accounts for this.
3. Assign the service account the storage.uploader and ai.speechkit-stt.user roles for the folder where you created the bucket.
4. Get an IAM token for the created service account.

If you do not have an LPCM audio file, you can download a sample file.

Perform speech recognition via the API

cURL

Get a link to an audio file in Object Storage.
Create a file, e.g., body.json, and paste the following code into it:
```
{
   "config": {
      "specification": {
         "languageCode": "ru-RU",
         "model": "general:rc",
         "audioEncoding": "LINEAR16_PCM",
         "sampleRateHertz": 8000,
         "audioChannelCount": 1
      }
   },
   "audio": {
      "uri": "<link_to_audio_file>"
   }
}
```
Where:
- languageCode: Recognition language
- model: Language model
- audioEncoding: Format of the submitted audio
- sampleRateHertz: Audio file sampling rate in Hz
- audioChannelCount: Number of audio channels
- uri: Link to the audio file in Object Storage, e.g., https://storage.yandexcloud.net/speechkit/speech.pcm
  
  For buckets with restricted access, the link contains additional query parameters (after ?). You do not need to provide these parameters in SpeechKit, as they are ignored.
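The request body above can also be generated from a link rather than edited by hand. The following is a minimal sketch, not part of the original instructions: strip_query and make_body are hypothetical helper names, and the sketch assumes the link contains no double quotes or backslashes that would need JSON escaping. Since the query parameters are ignored, the sketch simply drops them.

```shell
#!/bin/sh
# strip_query (hypothetical helper): print the link without its query string,
# i.e. without the first "?" and everything after it.
strip_query() {
    printf '%s\n' "${1%%[?]*}"
}

# make_body (hypothetical helper): print the request body for the given link,
# using the same parameters as the body.json example above.
make_body() {
    uri=$(strip_query "$1")
    cat <<EOF
{
   "config": {
      "specification": {
         "languageCode": "ru-RU",
         "model": "general:rc",
         "audioEncoding": "LINEAR16_PCM",
         "sampleRateHertz": 8000,
         "audioChannelCount": 1
      }
   },
   "audio": {
      "uri": "${uri}"
   }
}
EOF
}

# Usage (assumed file name body.json, as in the step above):
#   make_body "https://storage.yandexcloud.net/speechkit/speech.pcm" > body.json
```

Writing the output to body.json keeps the rest of the procedure unchanged.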

Send the recognition request, passing the created file as the request body:

export IAM_TOKEN=<service_account_IAM_token> && \
curl -X POST \
     -H "Authorization: Bearer ${IAM_TOKEN}" \
     -d "@body.json" \
     https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize

Where IAM_TOKEN is the IAM token of the service account.

Result example:

{
   "done": false,
   "id": "e03sup6d5h1q********",
   "createdAt": "2019-04-21T22:49:29Z",
   "createdBy": "ajes08feato8********",
   "modifiedAt": "2019-04-21T22:49:29Z"
}

Save the recognition operation ID (the id field) that you received in the response.
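If you script this step, the ID can be pulled out of the response text. The sketch below is one possible approach under stated assumptions: extract_operation_id is a hypothetical helper that uses plain text matching rather than a real JSON parser, so it assumes the response has a single "id" field with a string value, as in the result example above. If jq is installed, `jq -r .id` is a more robust alternative.

```shell
#!/bin/sh
# extract_operation_id (hypothetical helper): read a JSON response on stdin
# and print the string value of its "id" field.
# This is text matching with sed, not JSON parsing.
extract_operation_id() {
    sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (response.json is an assumed file holding the saved response):
#   OPERATION_ID=$(extract_operation_id < response.json)
```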

Wait for the recognition to complete. It takes about 10 seconds to recognize one minute of a single-channel audio file.
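The rule of thumb above can be turned into a rough wait estimate from the file size. This is a sketch only: estimate_wait_seconds is a hypothetical helper, and it assumes a headerless LPCM file with 16-bit (2-byte) samples, so the duration in seconds is the size in bytes divided by sample rate × channels × 2. The 10 seconds per minute figure is the one quoted above for single-channel audio.

```shell
#!/bin/sh
# estimate_wait_seconds SIZE_BYTES [SAMPLE_RATE] [CHANNELS] (hypothetical helper)
# Prints an approximate recognition time in seconds.
estimate_wait_seconds() {
    size_bytes=$1
    sample_rate=${2:-8000}   # Hz, matches the example in this article
    channels=${3:-1}
    # Assumption: 2 bytes per sample (16-bit LPCM), no file header.
    bytes_per_second=$((sample_rate * channels * 2))
    # Audio duration in seconds, rounded up.
    duration=$(( (size_bytes + bytes_per_second - 1) / bytes_per_second ))
    # About 10 seconds of processing per 60 seconds of audio, rounded up.
    echo $(( (duration * 10 + 59) / 60 ))
}

# Usage (speech.pcm is an assumed local copy of the audio file):
#   estimate_wait_seconds "$(wc -c < speech.pcm)"
```

For example, a 960,000-byte mono file at 8000 Hz holds 60 seconds of audio, which gives an estimate of about 10 seconds.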

Send a request to get information about the operation:

curl -H "Authorization: Bearer ${IAM_TOKEN}" \
   https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>

Result example:

{
   "done": true,
   "response": {
      "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
      "chunks": [
         {
            "alternatives": [
               {
                  "text": "hello world",
                  "confidence": 1
               }
            ],
            "channelTag": "1"
         }
      ]
   },
   "id": "e03sup6d5h1q********",
   "createdAt": "2019-04-21T22:49:29Z",
   "createdBy": "ajes08feato8********",
   "modifiedAt": "2019-04-21T22:49:36Z"
}
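Rather than waiting a fixed time, you can repeat the operation request until done becomes true and then print the recognized text. The following is a minimal sketch under stated assumptions: is_done, extract_texts, and wait_for_operation are hypothetical helpers, the parsing is plain text matching (it does not handle escaped quotes inside the text), error responses are not handled, and IAM_TOKEN is expected to be exported as in the request above.

```shell
#!/bin/sh
# is_done (hypothetical helper): succeed if the JSON on stdin has "done": true.
is_done() {
    grep -q '"done"[[:space:]]*:[[:space:]]*true'
}

# extract_texts (hypothetical helper): print every "text" value from the
# JSON on stdin, one per line.
extract_texts() {
    grep -o '"text"[[:space:]]*:[[:space:]]*"[^"]*"' |
        sed 's/^"text"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# wait_for_operation OPERATION_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Polls the operation until it is done, then prints the final JSON.
# Returns 1 if curl fails or the attempt limit is reached.
wait_for_operation() {
    attempts=0
    while [ "$attempts" -lt "${3:-60}" ]; do
        result=$(curl -s -H "Authorization: Bearer ${IAM_TOKEN}" \
            "https://operation.api.cloud.yandex.net/operations/$1") || return 1
        if printf '%s\n' "$result" | is_done; then
            printf '%s\n' "$result"
            return 0
        fi
        attempts=$((attempts + 1))
        sleep "${2:-10}"
    done
    return 1
}

# Usage:
#   wait_for_operation "<recognition_operation_ID>" | extract_texts
```

The default interval of 10 seconds and the limit of 60 attempts are arbitrary choices for the sketch; adjust them to the length of your audio.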
