Short audio recognition

Short audio recognition ensures fast response times and is suitable for small single-channel audio fragments.

To recognize speech continuously over a single connection, use streaming mode. Streaming mode can also return intermediate recognition results.

Audio requirements

The audio you send must meet the following requirements:

  1. Maximum file size — {{ stt-short—fileSize }}.
  2. Maximum length — 1 minute.
  3. Maximum number of audio channels — 1.

If your file is larger, longer, or has more audio channels, use long audio recognition.
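
A client-side check along these lines can catch unsuitable audio before a request is sent. This is only a sketch under stated assumptions: it covers headerless LPCM data and assumes 16-bit samples, which this section does not state. The names `lpcm_duration_seconds`, `fits_short_audio`, and `max_bytes` are hypothetical, and the size limit is passed in by the caller rather than hard-coded.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit LPCM samples
MAX_SECONDS = 60      # maximum length of 1 minute, from the requirements above
MAX_CHANNELS = 1      # maximum number of channels, from the requirements above


def lpcm_duration_seconds(num_bytes, sample_rate_hertz, channels=1):
    """Return the duration of raw LPCM audio in seconds."""
    return num_bytes / (sample_rate_hertz * BYTES_PER_SAMPLE * channels)


def fits_short_audio(num_bytes, sample_rate_hertz, channels, max_bytes):
    """Check raw LPCM audio against the size, length, and channel limits.

    max_bytes is supplied by the caller; take it from the maximum file
    size given in the requirements.
    """
    if channels > MAX_CHANNELS:
        return False
    if num_bytes > max_bytes:
        return False
    duration = lpcm_duration_seconds(num_bytes, sample_rate_hertz, channels)
    return duration <= MAX_SECONDS
```

If the check fails, long audio recognition is the appropriate mode.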

HTTP request

POST https://stt.api.cloud.yandex.net/speech/v1/stt:recognize

Use the "Transfer-Encoding: chunked" header for data streaming.
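
One possible way to stream the body from Python is to pass an iterable of byte chunks as the request data. The sketch below assumes Python 3.6 or later, where `urllib.request` falls back to chunked transfer encoding when the body is an iterable and no Content-Length header is set. The helper names and the chunk size are hypothetical and not part of the API.

```python
import urllib.request

CHUNK_SIZE = 4096  # hypothetical chunk size, not a service requirement


def iter_chunks(fileobj, chunk_size=CHUNK_SIZE):
    """Yield successive byte chunks from a binary file object."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def build_streaming_request(url, fileobj, iam_token):
    """Build a POST request whose body is read chunk by chunk.

    No Content-Length header is set, so urllib.request (Python 3.6+)
    sends the iterable body with Transfer-Encoding: chunked.
    """
    request = urllib.request.Request(
        url, data=iter_chunks(fileobj), method="POST"
    )
    request.add_header("Authorization", "Bearer %s" % iam_token)
    return request
```

The request would then be passed to `urllib.request.urlopen` while the file is still open, since the chunks are read lazily during sending.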

Query parameters

Parameter Description
lang The language for speech recognition.
Acceptable values:
  • ru-RU (default) — Russian.
  • en-US — English.
  • tr-TR — Turkish.
topic The language model to be used for recognition.
The more closely the model matches the subject of the audio, the better the recognition result. You can only specify one model per request.
Acceptable values depend on the selected language. Default parameter value: general.
profanityFilter This parameter controls the profanity filter in recognized speech.
Acceptable values:
  • false (default) — Profanity is not excluded from recognition results.
  • true — Profanity is excluded from recognition results.
format The format of the submitted audio.
Acceptable values:
  • lpcm
  • oggopus (default)
sampleRateHertz The sampling frequency of the submitted audio.
Used if format is set to lpcm. Acceptable values:
  • 48000 (default) — Sampling rate of 48 kHz.
  • 16000 — Sampling rate of 16 kHz.
  • 8000 — Sampling rate of 8 kHz.
folderId ID of the folder that you have access to. Required for authorization with a user account (see the UserAccount resource). Don't specify this field if you make a request on behalf of a service account.
Maximum string length: 50 characters.
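
The query parameters above could be assembled as in the following sketch. The function `build_recognize_url` and its argument names are hypothetical. The defaults mirror the ones listed in the table, and optional parameters are left out of the URL when they are not given.

```python
from urllib.parse import urlencode

ENDPOINT = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_recognize_url(folder_id=None, lang="ru-RU", topic="general",
                        profanity_filter=False, audio_format=None,
                        sample_rate_hertz=None):
    """Return the request URL with the query parameters described above."""
    params = {
        "lang": lang,
        "topic": topic,
        # The table lists lowercase true/false, not Python's True/False.
        "profanityFilter": "true" if profanity_filter else "false",
    }
    if audio_format is not None:
        params["format"] = audio_format
    if sample_rate_hertz is not None:
        # sampleRateHertz is only used when format is lpcm.
        params["sampleRateHertz"] = sample_rate_hertz
    if folder_id is not None:
        # Omit folderId when acting on behalf of a service account.
        params["folderId"] = folder_id
    return "%s?%s" % (ENDPOINT, urlencode(params))
```

For example, `build_recognize_url(audio_format="lpcm", sample_rate_hertz=8000)` yields a URL for 8 kHz LPCM audio sent on behalf of a service account.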

Parameters in the request body

The request body has to contain the binary content of an audio file.

Response

The recognized text is returned in the response in the result field.

{
  "result": <recognized text>
}
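
Reading the result field might look like the sketch below. The `RecognitionError` class and the `extract_result` function are hypothetical. The `error_code` and `error_message` field names are taken from the PHP sample in the Examples section and are treated here as an assumption about the error format.

```python
import json


class RecognitionError(Exception):
    """Raised when the response body carries no result field."""


def extract_result(body):
    """Return the recognized text from a JSON response body (bytes or str)."""
    decoded = json.loads(body)
    if "result" not in decoded:
        # Assumed error fields: error_code and error_message.
        raise RecognitionError(
            "%s: %s" % (decoded.get("error_code"), decoded.get("error_message"))
        )
    return decoded["result"]
```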

Examples

To recognize speech in Russian, send an audio fragment (for example, speech.ogg) to the service.

Sample request

POST /speech/v1/stt:recognize?topic=general&lang=ru-RU&folderId={folder ID} HTTP/1.1
Host: stt.api.cloud.yandex.net
Authorization: Bearer <IAM-TOKEN>

... (binary content of an audio file)

cURL

$ export FOLDER_ID=b1gvmob95yysaplct532
$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
     -H "Authorization: Bearer ${IAM_TOKEN}" \
     -H "Transfer-Encoding: chunked" \
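     --data-binary "@speech.ogg" \
     "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?topic=general&folderId=${FOLDER_ID}"

Python

import json
import urllib.error
import urllib.parse
import urllib.request

FOLDER_ID = "b1gvmob95yysaplct532"  # ID of the folder
IAM_TOKEN = "CggaATEVAgA..."  # IAM token

with open("speech.ogg", "rb") as f:
    data = f.read()

# URL-encode the query parameters instead of joining them by hand.
params = urllib.parse.urlencode({
    "topic": "general",
    "folderId": FOLDER_ID,
    "lang": "ru-RU",
})

# Passing data makes this a POST request.
request = urllib.request.Request(
    "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?%s" % params,
    data=data,
)
request.add_header("Authorization", "Bearer %s" % IAM_TOKEN)

try:
    with urllib.request.urlopen(request) as response:
        response_data = response.read().decode("UTF-8")
except urllib.error.HTTPError as error:
    # urlopen raises on 4xx and 5xx statuses; the error body is expected to be JSON.
    response_data = error.read().decode("UTF-8")

decoded_data = json.loads(response_data)

if decoded_data.get("error_code") is None:
    print(decoded_data.get("result"))
else:
    print("Error code: %s" % decoded_data.get("error_code"))
    print("Error message: %s" % decoded_data.get("error_message"))

PHP

<?php
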
$token = 'CggaATEVAgA...'; // IAM token
$folderId = "b1gvmob95yysaplct532"; // ID of the folder
$audioFileName = "speech.ogg";

$file = fopen($audioFileName, 'rb');
if ($file === false) {
    exit("Cannot open " . $audioFileName . "\r\n");
}

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?lang=ru-RU&folderId={$folderId}&format=oggopus");
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: Bearer ' . $token, 'Transfer-Encoding: chunked'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Keep TLS certificate verification enabled so the IAM token is only sent to the real service.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

// Read the request body from the open file handle.
curl_setopt($ch, CURLOPT_INFILE, $file);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize($audioFileName));
$res = curl_exec($ch);
$curlError = curl_error($ch);
curl_close($ch);
fclose($file);

if ($res === false) {
    exit("Request failed: " . $curlError . "\r\n");
}

$decodedResponse = json_decode($res, true);
if (isset($decodedResponse["result"])) {
    echo $decodedResponse["result"];
} else {
    echo "Error code: " . $decodedResponse["error_code"] . "\r\n";
    echo "Error message: " . $decodedResponse["error_message"] . "\r\n";
}

Sample response

HTTP/1.1 200 OK
YaCloud-Billing-Units: 15

{
  "result": "your number is 212-85-06"
}