Speech synthesis

Speech synthesis (text-to-speech, TTS) is the process of generating speech from written text. SpeechKit can convert any text into speech in several languages. You can choose the voice type (male or female) and the tone.

Supported languages:
  • Russian
  • English
  • Turkish

Quality of speech synthesis

The quality of synthesized speech is characterized by its similarity to a human voice and its capacity to convey emotion through intonation.

A distinguishing feature of Yandex speech technology is that we do not stitch fragments of recorded speech together. Instead, we train an acoustic model on the speaker's voice, using a statistical approach based on recurrent neural networks. Speech synthesized this way may sound somewhat artificial, but it is smooth and has natural intonation.

The statistical approach also lets us change the parameters of existing voices, so you can choose the intonation with which your text is pronounced.
