Yandex SpeechKit technology overview
Updated at October 9, 2023
Yandex SpeechKit voice technologies cover a wide range of tasks related to human speech. SpeechKit can recognize speech both in real time and from pre-recorded audio files, automatically detecting the speaker's language. It can also synthesize speech from pattern-based phrases and long texts using SpeechKit standard voices.
SpeechKit is available through its API. Depending on the task at hand, you can use either the gRPC or the REST interface. For more information about API implementations in Yandex Cloud, see Yandex Cloud API concepts.
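As a rough illustration of the REST option, the sketch below prepares an HTTP request for recognizing a short audio clip without sending it. The endpoint URL, the `folderId` and `lang` query parameters, and the `Bearer` IAM token header reflect our understanding of the v1 recognition API and should be treated as assumptions; `build_recognition_request` is a hypothetical helper name. Check the API reference before relying on any of these details.

```python
import urllib.parse
import urllib.request

# Assumed endpoint of the SpeechKit v1 synchronous recognition REST API;
# verify it against the current API reference before use.
STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_recognition_request(audio: bytes, iam_token: str, folder_id: str,
                              lang: str = "en-US") -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying a short audio clip.

    The parameter names `folderId` and `lang` are assumptions of this sketch.
    """
    query = urllib.parse.urlencode({"folderId": folder_id, "lang": lang})
    return urllib.request.Request(
        url=f"{STT_URL}?{query}",
        data=audio,  # raw audio bytes go in the request body
        headers={"Authorization": f"Bearer {iam_token}"},
        method="POST",
    )
```

To actually send the request, you could pass the returned object to `urllib.request.urlopen` and parse the JSON response. Streaming recognition needs a long-lived bidirectional connection, which is why it relies on gRPC rather than a one-shot request like this.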
The table below lists the most common SpeechKit use cases so that you can choose the appropriate technologies and configure them to meet your needs.
Use case | Description | Recommended technologies | Features and settings |
---|---|---|---|
Voice robot | Full or partial automation of telephone communications with customers. | To input a user's request: Streaming recognition. For a system response: Speech synthesis using standard voices and a Brand Voice specifically designed for you. | |
Voice analysis / Operator performance quality control | Transcribing and further analysis of audio recordings of dialogs between customers and call center operators or robots. | To recognize pre-recorded audio files: Asynchronous recognition of audio files. | |
Voice control in apps and smart devices / Voice assistant | The user requests an action or a search by voice, and the service responds with an action accompanied by a voice comment or an image. | To input a user's request: Streaming recognition. For a system response: Speech synthesis using standard voices and a Brand Voice. | |
Service adaptation for people with visual impairments | Voice control, voice hints, and comments for visually impaired users. | To input a user's request: Streaming recognition. For a system response: Speech synthesis using standard voices and a Brand Voice. | |
Recognizing audio recordings made during a meeting | Transcribing the audio recordings after the meeting is completed. | To recognize pre-recorded audio files: Asynchronous recognition of audio files. | |
Voicing books and videos | Voicing a book or video with no human speaker involved. | Speech synthesis using standard voices and a Brand Voice. | |
Recording the minutes of a meeting | Transcribing the meeting minutes in real time. | To recognize the participants' speech: Streaming recognition. | |
Video subtitles | Creating subtitles for recorded videos. | To recognize an audio track: Asynchronous recognition of audio files. | |
Broadcast subtitles | Transcribing broadcasts in real time. | To recognize the broadcast speech: Streaming recognition. | |
Transcribing voice messages | Converting short voice messages to text in messengers. | To recognize audio files: Synchronous recognition. | Recognition result settings. |
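
The recommendations in the table follow a simple pattern: real-time input calls for streaming recognition, short pre-recorded messages for synchronous recognition, and other pre-recorded files for asynchronous recognition. The sketch below encodes that pattern as a small helper. The `Recognition` enum and the `choose_recognition` function are hypothetical names used for illustration only and are not part of the SpeechKit API.

```python
from enum import Enum


class Recognition(Enum):
    """Recognition modes named in the use case table."""
    STREAMING = "Streaming recognition"
    ASYNCHRONOUS = "Asynchronous recognition of audio files"
    SYNCHRONOUS = "Synchronous recognition"


def choose_recognition(real_time: bool, short_clip: bool = False) -> Recognition:
    """Pick a recognition mode following the table's recommendations."""
    if real_time:
        # Voice robots, assistants, live minutes, broadcast subtitles.
        return Recognition.STREAMING
    if short_clip:
        # Short voice messages in messengers.
        return Recognition.SYNCHRONOUS
    # Call recordings, meeting recordings, video subtitles.
    return Recognition.ASYNCHRONOUS
```

For example, `choose_recognition(real_time=False)` corresponds to the video subtitles row, while `choose_recognition(real_time=False, short_clip=True)` corresponds to transcribing voice messages.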