API method description

Converts speech from an audio file into text.

HTTP request

POST https://stt.api.cloud.yandex.net/speech/v1/stt:recognize

Use the "Transfer-Encoding: chunked" header for data streaming.
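As a rough illustration of streaming, the sketch below sends the audio in blocks using Python's standard http.client module. The helper names iter_chunks and recognize_streaming and the 4096-byte block size are assumptions made for this example; the API description does not prescribe a chunk size.

```python
import http.client
import io
from urllib.parse import urlencode

CHUNK_SIZE = 4096  # assumed block size; not specified by the API description


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive byte blocks from a binary stream until it is exhausted."""
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield block


def recognize_streaming(stream, folder_id, iam_token):
    """POST the audio with chunked transfer encoding and return the raw response body."""
    conn = http.client.HTTPSConnection("stt.api.cloud.yandex.net")
    try:
        conn.request(
            "POST",
            "/speech/v1/stt:recognize?" + urlencode({"folderId": folder_id}),
            body=iter_chunks(stream),
            headers={
                "Authorization": "Bearer %s" % iam_token,
                "Transfer-Encoding": "chunked",
            },
            encode_chunked=True,  # http.client frames each yielded block as an HTTP chunk
        )
        return conn.getresponse().read()
    finally:
        conn.close()
```

A call such as recognize_streaming(open("speech.ogg", "rb"), folder_id, iam_token) would then send the file block by block instead of loading it into memory first.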

Query parameters

lang
  The language for speech recognition.
  Acceptable values:
    • ru-RU (default): Russian.
    • en-US: English.
    • tr-TR: Turkish.

topic
  The language model to use for recognition.
  The more closely the model matches the subject of the speech, the better the recognition result. You can specify only one model per request.
  Acceptable values depend on the selected language. Default value: general.

profanityFilter
  Controls whether profanity is filtered out of the recognized speech.
  Acceptable values:
    • false (default): Profanity is not excluded from recognition results.
    • true: Profanity is excluded from recognition results.

format
  The format of the submitted audio.
  Acceptable values:
    • lpcm: Audio in the LPCM format with no WAV header. Audio characteristics:
        • Sampling rate: 8, 16, or 48 kHz, depending on the sampleRateHertz parameter value.
        • Bit depth: 16-bit.
        • Byte order: little-endian.
        • Sample encoding: signed integers.
    • oggopus (default): Data encoded with the Opus audio codec and packaged in an Ogg container (OggOpus).

sampleRateHertz
  The sampling rate of the submitted audio.
  Used only if format is set to lpcm. Acceptable values:
    • 48000 (default): 48 kHz.
    • 16000: 16 kHz.
    • 8000: 8 kHz.

folderId
  Required. The ID of your folder.
  For this API method, folderId is passed as a query parameter rather than in the request body.
  For more information about how to find the folder ID, see the Authorization in the API section.
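The query parameters can be assembled and checked before a request is sent. The following is a minimal sketch; build_url and its argument names are hypothetical, and the allowed values are taken from the parameter descriptions in this section. Values for topic are not validated because they depend on the selected language.

```python
from urllib.parse import urlencode

ENDPOINT = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"

# Allowed values as listed in the parameter descriptions.
LANGS = {"ru-RU", "en-US", "tr-TR"}
FORMATS = {"lpcm", "oggopus"}
SAMPLE_RATES = {48000, 16000, 8000}


def build_url(folder_id, lang="ru-RU", topic="general", profanity_filter=False,
              audio_format="oggopus", sample_rate_hertz=48000):
    """Return the request URL, raising ValueError for values outside the documented sets."""
    if not folder_id:
        raise ValueError("folderId is required")
    if lang not in LANGS:
        raise ValueError("unsupported lang: %r" % lang)
    if audio_format not in FORMATS:
        raise ValueError("unsupported format: %r" % audio_format)

    params = {
        "folderId": folder_id,
        "lang": lang,
        "topic": topic,
        "profanityFilter": "true" if profanity_filter else "false",
        "format": audio_format,
    }
    # sampleRateHertz is only used together with format=lpcm.
    if audio_format == "lpcm":
        if sample_rate_hertz not in SAMPLE_RATES:
            raise ValueError("unsupported sampleRateHertz: %r" % sample_rate_hertz)
        params["sampleRateHertz"] = sample_rate_hertz
    return ENDPOINT + "?" + urlencode(params)
```

For OggOpus audio, build_url("b1gvmob95yysaplct532") is enough, because every other parameter falls back to its documented default.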

Parameters in the request body

The request body must pass the binary content of an audio file that meets the following requirements:

  1. Maximum size: 1 MB.
  2. Maximum duration: 1 minute.
  3. Number of audio channels: 1.
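For raw LPCM audio, the limits above can be checked from the byte count alone, since the duration equals the size divided by the sampling rate times two bytes per 16-bit sample times one channel. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "1 MB" means 1,048,576 bytes, which this description does not state.

```python
MAX_BYTES = 1024 * 1024  # assumption: "1 MB" is read as 1,048,576 bytes
MAX_SECONDS = 60         # maximum duration: 1 minute
BYTES_PER_SAMPLE = 2     # 16-bit LPCM
CHANNELS = 1             # mono audio only


def lpcm_duration_seconds(num_bytes, sample_rate_hertz):
    """Duration of headerless 16-bit mono LPCM audio of the given size."""
    return num_bytes / (sample_rate_hertz * BYTES_PER_SAMPLE * CHANNELS)


def check_lpcm(num_bytes, sample_rate_hertz):
    """Return a list of violated limits; an empty list means the audio fits."""
    problems = []
    if num_bytes > MAX_BYTES:
        problems.append("size exceeds 1 MB")
    if lpcm_duration_seconds(num_bytes, sample_rate_hertz) > MAX_SECONDS:
        problems.append("duration exceeds 1 minute")
    return problems
```

Under that size assumption, the size limit is reached first at 48 kHz (after roughly 10.9 seconds of audio), while at 8 kHz the one-minute duration limit is reached first.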

Response

The response contains a recognition hypothesis.

The recognition hypothesis is what the recognition system assumes has been said.

The recognized text is processed before it is sent back: numbers are converted to digits, certain punctuation marks (such as hyphens) are added, and so on. The converted text is the final recognition result that is sent in the response body.

The response is returned in JSON format.

{
  "result": <recognition hypothesis>
}
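A client might unpack this JSON along the lines of the sketch below. The names RecognitionError and parse_response are hypothetical. The error_code and error_message fields are taken from the samples later in this section, not from a documented error schema, so treat them as an assumption.

```python
import json


class RecognitionError(Exception):
    """Raised when the response body reports an error instead of a result."""


def parse_response(body):
    """Return the recognition hypothesis from a JSON response body (str or bytes)."""
    data = json.loads(body)
    # Assumed error shape, based on the fields read by the samples in this section.
    if "error_code" in data:
        raise RecognitionError(
            "%s: %s" % (data["error_code"], data.get("error_message"))
        )
    return data.get("result", "")
```

Raising an exception keeps the error path separate from the normal path, so callers cannot mistake an error payload for an empty hypothesis.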

Examples

To use the SpeechKit API for speech recognition in Russian, send a small audio fragment (for example, speech.ogg) via a POST request.

Sample request

POST /speech/v1/stt:recognize?topic=general&lang=ru-RU&folderId={folder ID} HTTP/1.1
Host: stt.api.cloud.yandex.net
Authorization: Bearer <IAM-TOKEN>

... (binary content of an audio file)

cURL

$ export FOLDER_ID=b1gvmob95yysaplct532
$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
     -H "Authorization: Bearer ${IAM_TOKEN}" \
     -H "Transfer-Encoding: chunked" \
     --data-binary "@speech.ogg" \
     "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?topic=general&folderId=${FOLDER_ID}"

Python

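import json
import urllib.error
import urllib.parse
import urllib.request

FOLDER_ID = "b1gvmob95yysaplct532"  # ID of the folder
IAM_TOKEN = "CggaATEVAgA..."  # IAM token

# Read the whole audio file; it must fit within the request size limit.
with open("speech.ogg", "rb") as f:
    data = f.read()

# Build the query string; urlencode escapes the values where needed.
params = urllib.parse.urlencode({
    "topic": "general",
    "folderId": FOLDER_ID,
    "lang": "ru-RU",
})

request = urllib.request.Request(
    "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?%s" % params,
    data=data,  # a request that carries a body is sent as POST
)
request.add_header("Authorization", "Bearer %s" % IAM_TOKEN)

try:
    with urllib.request.urlopen(request) as response:
        response_data = response.read().decode("UTF-8")
except urllib.error.HTTPError as e:
    # urlopen raises on non-2xx statuses; the body may still carry error details.
    response_data = e.read().decode("UTF-8")

decoded_data = json.loads(response_data)

if decoded_data.get("error_code") is None:
    print(decoded_data.get("result"))
else:
    print("Error: %s" % decoded_data.get("error_message"))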

PHP

<?php

$token = 'CggaATEVAgA...'; # IAM token
$folderId = "b1gvmob95yysaplct532"; # ID of the folder
$audioFileName = "speech.ogg";

$file = fopen($audioFileName, 'rb');

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?lang=ru-RU&folderId={$folderId}&format=oggopus");
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: Bearer ' . $token, 'Transfer-Encoding: chunked'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true); # keep TLS certificate verification enabled
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

curl_setopt($ch, CURLOPT_INFILE, $file);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize($audioFileName));
$res = curl_exec($ch);
curl_close($ch);
$decodedResponse = json_decode($res, true);
if (isset($decodedResponse["result"])) {
    echo $decodedResponse["result"];
} else {
    echo "Error code: " . $decodedResponse["error_code"] . "\r\n";
    echo "Error message: " . $decodedResponse["error_message"] . "\r\n";
}

fclose($file);

Sample response

HTTP/1.1 200 OK
YaCloud-Billing-Units: 15

{
  "result": "your number is 212-85-06"
}