Speech synthesis (text-to-speech, TTS) is the process of generating speech from typed text. SpeechKit can convert any text in several languages into speech. You can choose the voice type (male or female) and tone.
Quality of speech synthesis
The quality of synthesized speech is characterized by its similarity to a human voice and its capacity to convey emotion through intonation.
A distinguishing feature of Yandex speech technology is that we do not stitch fragments of recorded speech together. Instead, we train our acoustic model on the speaker's voice, using a statistical approach based on recurrent neural networks. This method can give the voice a somewhat artificial tone, but the speech sounds smooth and has natural intonation.
The statistical approach also allows us to change the parameters of existing voices, so you can choose the intonation used to pronounce your text.
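As a rough illustration of how the voice and intonation choices described above might be passed to a synthesis service, the sketch below builds a request URL. The endpoint `https://tts.example.com/generate`, the function `build_tts_url`, and the parameter names `text`, `lang`, `voice`, and `intonation` are all hypothetical placeholders chosen for this example; they are not taken from the SpeechKit API, so check the API reference for the real names and values.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a placeholder, not a real SpeechKit URL.
BASE_URL = "https://tts.example.com/generate"


def build_tts_url(text, lang="en-US", voice="female", intonation="neutral"):
    """Build a request URL for a hypothetical TTS service.

    All parameter names are assumptions made for illustration only.
    """
    if not text:
        raise ValueError("text must not be empty")
    params = {
        "text": text,              # the text to synthesize
        "lang": lang,              # language of the text
        "voice": voice,            # voice type, e.g. male or female
        "intonation": intonation,  # tone used to pronounce the text
    }
    # urlencode percent-encodes the values so the text is safe in a URL.
    return BASE_URL + "?" + urlencode(params)


url = build_tts_url("Hello, world", voice="male", intonation="friendly")
print(url)
```

The sketch only constructs the URL and sends no request, so it can be run without credentials or network access.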