Yandex SpeechKit release notes: Speech synthesis
SpeechKit provides updates based on the system model and version.
For more information about voice models, see About technology.
Current version
Release on 09/04/24
In API v1, `marina` is now the default voice.
Previous versions
Release on 03/04/24
Changed the default voice in API v3. All synthesis projects without an explicitly specified voice will now use the `marina` voice.
Release on 20/02/24
Improved the voices: `masha`, `marina`, `anton`, `alexander`, `dasha`, `julia`.
Release on 06/02/24
Added REST API v3 support.
Release on 10/01/24
- Added support for cardinal number normalization (English). Normalization works for positive integers only. Ordinal numbers are not supported.
- Added DurationHint to the API, which you can use to specify the minimum and maximum time spent on synthesizing the text.
- Added the `text_chunk`, `start_ms`, and `length_ms` fields to the UtteranceSynthesisResponse message. These fields store the text of the fragment, as well as the start time and length of the audio that came with the fragment.
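As a rough illustration of how the three new fields relate, the sketch below derives an end time for each fragment from `start_ms` and `length_ms`. The `Chunk` class, the `timeline` helper, and the sample values are hypothetical stand-ins for data read from UtteranceSynthesisResponse messages, not part of any SpeechKit SDK. It also assumes that `length_ms` holds the duration of the fragment's audio.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical container for the three fields named above."""
    text_chunk: str
    start_ms: int
    length_ms: int

    @property
    def end_ms(self) -> int:
        # Assumption: length_ms is a duration, so end = start + length.
        return self.start_ms + self.length_ms


def timeline(chunks):
    """Return (text, start_ms, end_ms) tuples ordered by start time."""
    ordered = sorted(chunks, key=lambda c: c.start_ms)
    return [(c.text_chunk, c.start_ms, c.end_ms) for c in ordered]


# Made-up sample values, for illustration only.
sample = [Chunk("world", 620, 540), Chunk("Hello,", 0, 620)]
for text, start, end in timeline(sample):
    print(f"{start:>5} ms to {end:>5} ms: {text}")
```

Timings of this kind could be used, for example, to highlight text in sync with audio playback.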
Release on 05/12/23
Improved the quality of speech synthesis for all languages except Russian.
Release on 23/10/23
- A new voice, `masha`, is now available in three roles.
- Additional roles are now available for Russian-language voices.
- Optimized the normalizer for the Kazakh language.
- Improved the pronunciation quality of "SMS" for Kazakh and Uzbek.
Release on 27/07/23
- Added the `pitch_shift` parameter to API v3. You can use it to raise the pitch contour of the entire synthesized audio by a fixed value in Hz. Shifting the contour makes a voice sound more lively.
- Seven new voices are now available for speech synthesis in Russian: `dasha`, `julia`, `lera`, `marina`, `alexander`, `kirill`, and `anton`.
Release on 19/06/23
Improved the quality of pronunciation of car brands for Uzbek.
Release on 08/06/23
- Added normalization for cardinal numbers written in Arabic numerals for Uzbek.
- Improved the quality of speech synthesis for Uzbek. The changes primarily enhance the synthesis of short texts.
Release on 18/04/23
- Speech synthesis for Uzbek now supports a phoneme-based format for transcribing text (see the list of supported phonemes). In addition, the Uzbek model can now automatically replace apostrophes. However, for efficient speech synthesis, you should only use the straight `ʼ` and reversed `ʻ` typographic apostrophes.
- For pattern-based synthesis, the default volume normalization has been changed. Now, if the normalization type is not set explicitly, the volume of variables is normalized using the initial pattern.
Release on 21/03/23
- A normalizer has been added for the Kazakh language. Now the model can pronounce numbers written in Arabic numerals.
- Added two types of apostrophes for the Uzbek language: the typographic straight apostrophe `ʼ` and the typographic reversed apostrophe `ʻ`. Now you can synthesize phrases in Uzbek written in Latin script with these apostrophes:
  Yaʼni mana shu beret kiygan notanish odamni.
  Soʻng yana pastga qarab ketiladi.
  Warning: Use only these options for apostrophes. The model does not support automatic replacement, and the synthesis quality strongly depends on the input quality.
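Because synthesis quality depends on the apostrophes in the input, it may help to check text on the client side before sending it. The sketch below is one possible check, not part of SpeechKit. It assumes the two supported characters are U+02BC and U+02BB, and the set of look-alike characters is an illustrative guess rather than an exhaustive list. The function name is hypothetical.

```python
# Assumption: the supported apostrophes are U+02BC (straight) and
# U+02BB (reversed); every character below is a look-alike to flag.
LOOKALIKES = {"'", "`", "\u2018", "\u2019", "\u00b4"}


def find_unsupported_apostrophes(text):
    """Return (index, character) pairs for apostrophe look-alikes in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in LOOKALIKES]


ok = "Ya\u02bcni mana shu beret kiygan notanish odamni."
bad = "So'ng yana pastga qarab ketiladi."  # typed with an ASCII apostrophe

print(find_unsupported_apostrophes(ok))   # nothing to flag
print(find_unsupported_apostrophes(bad))  # flags the ASCII apostrophe
```

The check only reports positions. Choosing which of the two supported apostrophes belongs at each position depends on the word, so it is left to the author of the text.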
Release on 07/03/23
- Significantly revised the SpeechKit Brand Voice technology for creating custom voices.
- Added support for pauses in all languages in test mode when using TTS markup. Please report any pausing errors by submitting a request to the support team. Your feedback will help us improve the functionality in future releases.
Release on 07/10/22
The `general` branch has new voices and languages available for testing:
- `lea`, female voice: German.
- `madi`, male voice: Kazakh.
- `madirus`, male voice: Russian.
- `nigora`, female voice: Uzbek.
The `general` branch now has the `amira` and `john` voices.
Release on 09/06/22
- Intonations and emphasis have been improved in all voices.
- More pausing features were added:
  - Fixed the error where pauses shorter than 1200 milliseconds were ignored in SSML markup. Note that pauses shorter than 700 milliseconds are treated as a synthesis cue and do not allow accurate control of the duration of a pause between words.
  - SSML pauses with the `x-weak`, `weak`, and `medium` values have a greater impact on synthesized text.
  - You can now apply pauses when using TTS markup. Using the `<[small]>` tag, you can set the duration of a pause in a synthesized text, for example: `Hello, <[small]>`. The duration of a pause can have the following values: `tiny`, `small`, `medium`, `large`, `huge`.
- Support for `filipp:deprecated` has been discontinued. `filipp:deprecated` and `filipp` now sound the same.
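The TTS markup pause tags described above can be generated programmatically. The following is a minimal sketch with hypothetical helper names (`pause`, `join_with_pauses`). It only builds strings and assumes that surrounding a tag with single spaces is acceptable to the service.

```python
# Pause sizes listed above for the <[...]> TTS markup tag.
PAUSE_SIZES = ("tiny", "small", "medium", "large", "huge")


def pause(size):
    """Return a TTS markup pause tag such as <[small]>."""
    if size not in PAUSE_SIZES:
        raise ValueError(f"unsupported pause size: {size!r}")
    return f"<[{size}]>"


def join_with_pauses(parts, size="small"):
    """Join text fragments, placing one pause tag between neighbours."""
    return f" {pause(size)} ".join(parts)


print(join_with_pauses(["Hello,", "world"], size="medium"))
```

Validating the size up front turns a typo in a pause name into an immediate error instead of literal text in the request.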
Release on 19/05/22
- Deprecated voices will no longer be supported starting May 31, 2022.
- The `rc` branch has new voices and languages available for testing:
  - `amira`, female voice: Kazakh.
  - `john`, male voice: English.
  The voices are only available in API v3 using the `x-service-branch:rc` header.
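One way to attach the branch header is to build it as request metadata. The sketch below assumes a gRPC client, where per-call metadata is a sequence of lowercase key and value string pairs. The `branch_metadata` helper is hypothetical and does not contact the service.

```python
def branch_metadata(branch, extra=()):
    """Build gRPC-style metadata: a list of (key, value) string pairs."""
    metadata = [("x-service-branch", branch)]
    # Any additional pairs (for example, credentials) are appended unchanged.
    metadata.extend(extra)
    return metadata


metadata = branch_metadata("rc")
print(metadata)
```

With the Python gRPC library, a list like this is typically passed through the `metadata` argument of a stub call. Authentication entries are left out here because they depend on your setup.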
Release on 30/03/22
- The standard voices are currently only available through the `:deprecated` tag and will continue to be supported through May 31, 2022.
- Fixed intonation issues and rare artifacts in texts with many numbers, following a technical support request (issue CLOUDSUPPORT-138703).
Release on 17/03/22
- Added the ability to synthesize audio files in MP3 format. This feature is available in API v3 and when using premium voices in API v1.
- For new voices, we added roles, an extended version of emotional tone. See the `emotion` parameter in API v1 and `role` in API v3 for details. Different roles are available for different voices. For a complete list of values, see List of voices. If an incorrect role is selected, the service will return an error.
- Degraded quality of emphasis placement has been corrected for `alena` and `filipp`. Emphasis placement and synthesized voice perception have been improved for all voices.
- A major update of standard voices has started: `oksana`, `ermil`, `jane`, `omazh`, and `zahar` will be replaced with `oksana:rc`, `ermil:rc`, `jane:rc`, `omazh:rc`, and `zahar:rc`, respectively. The update will not affect the cost of the regular voices. Existing voices, such as `oksana`, `ermil`, `jane`, `omazh`, and `zahar`, are available under the `:deprecated` branch.
Release on 24/01/22
- Updated the generation model. The new version improves the way numbers and abbreviations from the finance domain are pronounced.
- You can now add emphasis using markup: `Are you **happy** to see me?`
- Processing of SSML pauses and SIL tags has been made consistent to support integration with Yandex.Dialogs. Pauses in text in SSML or SIL notation are considered the end of an utterance, causing intonation representing the end of an utterance to replace the tag in the generated text. SSML pauses and SIL tags are supported when generating both short and long speech segments.
Release on 16/12/21
- Limits for API v3 requests have been increased: a synthesized phrase can now be up to 250 characters or 24 seconds of audio long. Important: request costs remain unchanged for the time being but may increase.
- The `unsafe_mode` option available in API v3 enables you to automatically split long segments of text to synthesize into separate phrases.
- The silence after the last synthesized word is now much shorter. The audio ends almost immediately after the final word.
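To show what splitting long text into phrases under the character limit can look like, here is a client-side sketch. It is not the algorithm behind `unsafe_mode`, which the release notes do not describe. The function name and the sentence-boundary heuristic are assumptions, and the 24-second audio limit is not checked.

```python
import re

MAX_PHRASE_CHARS = 250  # per-phrase character limit stated above


def split_into_phrases(text, limit=MAX_PHRASE_CHARS):
    """Greedily pack whole sentences into phrases of at most `limit` chars."""
    # Heuristic: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    phrases, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                phrases.append(current)
                current = ""
            phrases.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            phrases.append(current)
            current = sentence
    if current:
        phrases.append(current)
    return phrases


demo = "A" * 100 + ". " + "B" * 100 + ". " + "C" * 100 + "."
for phrase in split_into_phrases(demo):
    print(len(phrase), phrase[:20] + "...")
```

Packing whole sentences keeps each phrase a natural unit for intonation. The hard wrap is only a fallback for a single sentence that exceeds the limit.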
Release on 18/11/21
- Fixes have been made to stabilize the way the `alena` premium voice is synthesized. It sounds consistent now.
- Pronunciation errors have been fixed for `alena`.
- Pausing in REST API has been improved.
- New premium voices have been added in test mode: `oksana:rc`, `ermil:rc`, `jane:rc`, `omazh:rc`, `zahar:rc`.