Yandex SpeechKit

Recognition models


A recognition model is trained to recognize speech in a specific language. The models are trained on datasets generated by Yandex services and applications, which lets us continually improve speech recognition quality.

Supported recognition models

Models are available for Russian, English, and Turkish.

Russian:

  • general: The main supported model for Russian. It recognizes speech on any topic, including short and long utterances, names, addresses, dates, and numbers.

Tags and version availability

We continuously improve the quality of the model and release new versions:

  • The current main version of the model is available under the general tag.
  • The experimental version is available under the general:rc tag.
  • The previous version is available under the general:deprecated tag.

Note

The version available under the general:deprecated tag stops being supported when new models are released. After the version under the general tag is updated, we guarantee support for the previous version for two weeks.
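The two-week guarantee is simple date arithmetic. The sketch below computes the date until which a version under general:deprecated is guaranteed to stay supported. It assumes the two weeks are counted in calendar days from the day the general tag is updated (the exact counting rule is not stated), and the helper name is hypothetical.

```python
from datetime import date, timedelta

# Minimum support period for the previous version after the general tag is
# updated. Counting it in calendar days from the update day is an assumption.
DEPRECATED_SUPPORT = timedelta(weeks=2)


def deprecated_supported_until(general_updated_on: date) -> date:
    """Return the date until which the general:deprecated version is
    guaranteed to be supported (hypothetical helper)."""
    return general_updated_on + DEPRECATED_SUPPORT


# Release 24.11.20 moved Zeno to the general tag, so Anaximander
# (now under general:deprecated) is guaranteed support until at least:
print(deprecated_supported_until(date(2020, 11, 24)))  # 2020-12-08
```

The guarantee is a minimum: the release notes still list Anaximander under general:deprecated on 14.12.20.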

We recommend testing new model versions under the general:rc tag and sharing your feedback with our support team.
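In short audio recognition, the tag is what selects the model version. The sketch below builds a request URL for a given tag. It assumes that the v1 short audio endpoint stt:recognize takes the tag in its topic query parameter, together with lang and folderId; check these names against the API method description before relying on them. The folder ID is a placeholder, and the helper is hypothetical.

```python
from urllib.parse import urlencode

# Assumed v1 short audio recognition endpoint; verify before use.
STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"

# Tags for streaming and short audio recognition described in this section.
# hqa and deferred-general are long audio only, so they are rejected here.
VALID_TAGS = {"general", "general:rc", "general:deprecated"}


def build_recognize_url(tag: str, folder_id: str, lang: str = "ru-RU") -> str:
    """Build a request URL that selects the model version by tag.

    Assumes the tag is passed in the `topic` query parameter.
    """
    if tag not in VALID_TAGS:
        raise ValueError(f"unknown model tag: {tag!r}")
    # urlencode percent-encodes ':' in "general:rc" as %3A.
    query = urlencode({"topic": tag, "lang": lang, "folderId": folder_id})
    return f"{STT_URL}?{query}"


print(build_recognize_url("general:rc", "example-folder-id"))
# https://stt.api.cloud.yandex.net/speech/v1/stt:recognize?topic=general%3Arc&lang=ru-RU&folderId=example-folder-id
```

Because only the tag changes, the same audio can be sent once with general and once with general:rc to compare the released and experimental versions.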

For long audio fragment recognition, the following models are also available:

  • The experimental improved model is available under the hqa tag.
  • The deferred transcription model is available at a special price under the deferred-general tag. With this model, audio is queued at a lower priority and processed when load is lowest, but within 24 hours.

Note

Recognition under the deferred-general tag uses the same model version that is available under the general tag.
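For long audio, the model is selected the same way, with hqa and deferred-general as additional options. The sketch below builds a request body for a given tag and computes the latest time a deferred-general job should finish. The body layout (config.specification.model, audio.uri) is an assumption modeled on the v2 long audio recognition API, the helper names are hypothetical, and the bucket URL is a placeholder.

```python
from datetime import datetime, timedelta
from typing import Optional

# Tags available for long audio recognition, per this section.
LONG_AUDIO_TAGS = {"general", "general:rc", "general:deprecated",
                   "hqa", "deferred-general"}

# deferred-general audio is processed within 24 hours.
DEFERRED_MAX_DELAY = timedelta(hours=24)


def build_long_audio_request(tag: str, audio_uri: str,
                             language_code: str = "ru-RU") -> dict:
    """Build a request body for long audio recognition.

    The field layout is an assumption, not a confirmed API schema.
    """
    if tag not in LONG_AUDIO_TAGS:
        raise ValueError(f"unknown model tag: {tag!r}")
    return {
        "config": {"specification": {"languageCode": language_code,
                                     "model": tag}},
        "audio": {"uri": audio_uri},
    }


def latest_result_time(tag: str, submitted_at: datetime) -> Optional[datetime]:
    """Upper bound on completion for deferred-general jobs.

    Returns None for other tags, since no such bound is stated for them.
    """
    if tag == "deferred-general":
        return submitted_at + DEFERRED_MAX_DELAY
    return None


body = build_long_audio_request(
    "deferred-general", "https://storage.yandexcloud.net/example-bucket/speech.ogg")
print(body["config"]["specification"]["model"])  # deferred-general
```

A client that uses deferred-general can poll less often and alert only when the 24-hour bound has passed.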

Releases

Release 14.12.20

A new version of the hqa model, Amati, is now available in transcription. It fixes issues where speech was recognized as silence. Recognition of speech on news and medical topics has also been improved.

The following models and versions are now available under these tags:

In transcription:

  • hqa: The Amati version.
  • general:rc: The Zeno version.
  • general: The Zeno version.
  • general:deprecated: The Anaximander version.

In streaming and short audio recognition:

  • general:rc: The Zeno version.
  • general: The Zeno version.
  • general:deprecated: The Anaximander version.

Release 01.12.20

A new version of the model, Galen, is now available under the general:rc tag in streaming, transcription, and short audio recognition. It provides significantly better baseline recognition quality and improved recognition of COVID-19-related words.

We invite you to test this version. We'll be grateful for your feedback and comments.

The following models and versions are now available under these tags:

In transcription:

  • hqa: The Stradivarius version.
  • general:rc: The Galen version.
  • general: The Zeno version.
  • general:deprecated: The Anaximander version.

In streaming and short audio recognition:

  • general:rc: The Galen version.
  • general: The Zeno version.
  • general:deprecated: The Anaximander version.

Release 24.11.20

After successful testing, Zeno has become the main released version of the general model in streaming, transcription, and short audio recognition.

The following models and versions are now available under these tags:

In transcription:

  • hqa: The Stradivarius version.
  • general:rc: The Zeno version.
  • general: The Zeno version.
  • general:deprecated: The Anaximander version.

In streaming and short audio recognition:

  • general:rc: The Zeno version.
  • general: The Zeno version.
  • general:deprecated: The Anaximander version.

Release 17.11.20

Speech synthesis update: a new version of the premium voice alena is available under the alena tag.

The main change in this release is numerous corrections to the pronunciation of individual words.

Release 26.10.20

This release introduces a new generation of recognition models. The new model, hqa, is so far available only in transcription. Its vocabulary is noticeably richer, so recognition results are much better and easier for readers to understand. The difference is especially noticeable in long audio recognition.

The following versions are now available in transcription under these tags:

  • hqa: The Stradivarius version.
  • general:rc: The Zeno version.
  • general: The Anaximander version.
  • general:deprecated: The Marcus Aurelius version.

The set of tags and versions for streaming and short audio recognition is unchanged:

  • general:rc: The Zeno version.
  • general: The Anaximander version.
  • general:deprecated: The Marcus Aurelius version.

Release 12.10.20

A new version of the general model is now available for testing in streaming, transcription, and short audio recognition. It provides significantly better baseline recognition quality.

The following versions are now available under these tags:

  • general:rc: The Zeno version.
  • general: The Anaximander version.
  • general:deprecated: The Marcus Aurelius version.

Release 18.08.20

Minor update for transcription in the Anaximander version:

  • Improved handling of dense speech with no detectable pauses for more than 30 seconds.
  • Fixed timing issues.
  • Fixed an error with partial recognition results arriving after the final result.

The acoustic and language properties of the model have not changed.

The list of available versions of the general model:

  • general:rc: The Anaximander version (updated).
  • general: The Anaximander version.
  • general:deprecated: The Marcus Aurelius version.

These versions are available for streaming recognition, transcription, and short audio recognition.

Release 21.07.20

After successful testing, Anaximander becomes the main version for streaming recognition, transcription, and short audio recognition.

The list of available versions of the general model:

  • general:rc: The Anaximander version.
  • general: The Anaximander version.
  • general:deprecated: The Marcus Aurelius version.

Release 27.05.20

New versions of the general model are now available in transcription and short audio recognition. For these recognition types, the following versions are available under these tags:

  • general:rc: The Anaximander version.
  • general: The Marcus Aurelius version.
  • general:deprecated: The Marcus Aurelius version.

Three versions of the model are available for streaming speech recognition:

  • general:rc: The Anaximander version.
  • general: The Marcus Aurelius version.
  • general:deprecated: The Diogenes version.

Release 15.05.20

For streaming speech recognition, a new version, Anaximander, is now available under the general:rc tag.

We invite you to test this version. We'll be grateful for your feedback and comments.

The list of available versions of the general model:

  • general:rc: The Anaximander version.
  • general: The Marcus Aurelius version.
  • general:deprecated: The Diogenes version.

The versions for short and long audio recognition remain unchanged.

Release 16.04.20

For streaming speech recognition, a new version, Marcus Aurelius, is now available under the general tag.

The list of available versions of the general model:

  • general:rc: The Marcus Aurelius version.
  • general: The Marcus Aurelius version.
  • general:deprecated: The Diogenes version.

The versions for short and long audio recognition available under the general tag remain unchanged.
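The release notes can be read as a time series of tag-to-version mappings. The hypothetical lookup below encodes the streaming and short audio mappings from 21.07.20 onward (from that release on, both recognition types share one list) and answers which version a tag pointed to on a given day. It assumes each release takes effect on its release date, and it leaves out hqa and the transcription-only lists.

```python
from bisect import bisect_right
from datetime import date

# Tag-to-version mappings for streaming and short audio recognition, copied
# from the release notes (only releases that changed the mapping).
RELEASES = [
    (date(2020, 7, 21), {"general:rc": "Anaximander", "general": "Anaximander",
                         "general:deprecated": "Marcus Aurelius"}),
    (date(2020, 10, 12), {"general:rc": "Zeno", "general": "Anaximander",
                          "general:deprecated": "Marcus Aurelius"}),
    (date(2020, 11, 24), {"general:rc": "Zeno", "general": "Zeno",
                          "general:deprecated": "Anaximander"}),
    (date(2020, 12, 1), {"general:rc": "Galen", "general": "Zeno",
                         "general:deprecated": "Anaximander"}),
    (date(2020, 12, 14), {"general:rc": "Zeno", "general": "Zeno",
                          "general:deprecated": "Anaximander"}),
]
_DATES = [released for released, _ in RELEASES]


def version_for(tag: str, on: date) -> str:
    """Return the version a tag resolved to on the given date."""
    # Index of the latest release on or before `on`.
    i = bisect_right(_DATES, on) - 1
    if i < 0:
        raise LookupError(f"no release data before {on}")
    return RELEASES[i][1][tag]


print(version_for("general:rc", date(2020, 12, 5)))  # Galen
```

Pinning a date this way helps explain why recognition results for the same audio changed between two runs.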

Versions

Anaximander version

  • Improved recognition quality due to advances in acoustic processing and language model updates. For example, the word "coronavirus" is now recognized better.
  • Improved model stability on "empty" recordings.

Marcus Aurelius version

  • Improved recognition quality, especially for short audio.
  • The endOfUtterance flag is now set more accurately for audio of any length. For audio that contains only silence, this flag is no longer returned.
  • If you enabled the partialResults flag and use partial recognition results, you may notice that they are now more accurate and arrive faster. By default, they are now returned in unnormalized form.
  • Starting with the Marcus Aurelius version, additional recognition settings are supported:
    • If your audio has a lot of background noise, use the reduce sensitivity flag (see Experimental additional recognition settings).
    • If you enabled the partialResults flag and need partial recognition results in normalized form, set the appropriate flag (see Experimental additional recognition settings).
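These changes matter for client code that consumes streaming results. The sketch below separates partial results from final ones and records whether endOfUtterance was seen, without ever waiting for that flag, since it is not returned for silence-only audio. The chunk structure (a dict with text, final, and endOfUtterance keys) is a simplified, hypothetical stand-in for real API responses, and the function name is hypothetical too.

```python
from typing import Iterable, List, Tuple


def split_results(chunks: Iterable[dict]) -> Tuple[List[str], List[str], bool]:
    """Separate partial and final texts from recognition chunks.

    Each chunk is a hypothetical, simplified dict with the keys
    "text", "final", and "endOfUtterance".
    Returns (partial_texts, final_texts, saw_end_of_utterance).
    """
    partial: List[str] = []
    final: List[str] = []
    saw_eou = False
    for chunk in chunks:
        if chunk.get("final", False):
            final.append(chunk["text"])
        else:
            # Partial results arrive only when partialResults is enabled
            # and are unnormalized by default.
            partial.append(chunk["text"])
        # Record the flag but never block on it: silence-only audio
        # produces no endOfUtterance at all.
        saw_eou = saw_eou or bool(chunk.get("endOfUtterance", False))
    return partial, final, saw_eou


# Illustrative texts: unnormalized partials, then a normalized final result.
stream = [
    {"text": "twenty", "final": False},
    {"text": "twenty five", "final": False},
    {"text": "25", "final": True, "endOfUtterance": True},
]
print(split_results(stream))  # (['twenty', 'twenty five'], ['25'], True)
```

Ending the stream on the transport's own end-of-stream signal, rather than on endOfUtterance, keeps the client from hanging on silent input.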

Diogenes version

  • Improved phone call recognition quality.
  • Reduced the number of cases where noise is recognized as words. The vocabulary of the model is still extensive: you can use this model to recognize speech on any topic, just like the previous version of the general model.
Recognition models for English and Turkish

English:

  • general: Short utterances of 3-5 words on various topics, including search engine and website queries:
    • connect me to the sales department
    • another cup of coffee and two soft French rolls
  • maps: Addresses and names of companies or geographical features:
    • go to Abbey Road

Turkish:

  • general: Short utterances of 3-5 words on various topics, including search engine and website queries:
    • satış departmanıyla görüşmek istiyorum (I'd like to talk to the sales department)
    • bir kahve daha ve iki küçük kurabiye (one more coffee and two small cookies)
  • maps: Addresses and names of companies or geographical features:
    • Atatürk Bulvarı'na git (go to Atatürk Boulevard)
© 2021 Yandex.Cloud LLC