TTS markup

Updated on December 26, 2023

TTS markup lets you control speech synthesis with special characters and SIL tags, which helps make synthesized speech sound more natural.

TTS markup is supported for Russian in API v3 and API v1 for data in text format.

Pauses are available for testing in all languages for which speech synthesis is supported.

Note

SpeechKit is designed for natural speech synthesis. Marking up data for speech synthesis helps you adjust the pronunciation of individual words, phrases, and sentences. However, it is not intended for generating isolated sounds or silence.

Markup components

Description	Special characters and tags
Apply stress	`+`
Explicitly define a pause between sentences	`sil<[t]>`, where `t` is the pause duration in milliseconds
Set a pause depending on the context	`<[small]>`. Acceptable values: `tiny`, `small`, `medium`, `large`, `huge`
Place emphasis	`<[accented]>` or `**emphasized word**`
Use phonetic pronunciation	`[[ <word_phonemes_separated_with_spaces> ]]`

Apply stress

Put + before the stressed vowel whenever you need to specify how a word is pronounced, for example:

They signed the c+ontract the following day.

If you contr+act malaria, you will have to spend a long time in a hospital.

Explicitly define a pause between sentences

To explicitly define a pause between sentences, you can add a special tag, sil<[t]>, where t is the duration of the pause in milliseconds:

Stop. sil<[300]> Think about it!
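When the text is assembled from separate sentences, the pause tag can be added while joining them. This sketch assumes a hypothetical helper named `join_with_pause`. Requiring a positive integer duration is an assumption of the sketch, not a documented limit.

```python
# Hypothetical helper: builds marked-up text, does not call SpeechKit.
def join_with_pause(sentences: list[str], pause_ms: int) -> str:
    """Join sentences with an explicit sil<[t]> pause between them."""
    if not isinstance(pause_ms, int) or pause_ms <= 0:
        # Assumption: only positive integer durations make sense here.
        raise ValueError("pause_ms must be a positive integer")
    tag = f"sil<[{pause_ms}]>"
    return f" {tag} ".join(s.strip() for s in sentences)


print(join_with_pause(["Stop.", "Think about it!"], 300))
# Stop. sil<[300]> Think about it!
```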

Note

The text to synthesize must not consist of markup characters only. You cannot use SpeechKit to generate a file with silence instead of speech.
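A request made up only of markup can be caught on the client side before it is sent. The check below is a rough approximation under stated assumptions: `has_speech_content` is a hypothetical name, and the rule "at least one letter or digit remains after removing pause tags" is this sketch's own heuristic, not the service's actual validation.

```python
import re

# Pause tags described in this document: sil<[t]> and <[tiny|small|medium|large|huge]>.
PAUSE_TAG = re.compile(r"sil<\[\d+\]>|<\[(?:tiny|small|medium|large|huge)\]>")


def has_speech_content(text: str) -> bool:
    """Return True if any letter or digit remains once pause tags are removed."""
    stripped = PAUSE_TAG.sub("", text)
    return any(ch.isalnum() for ch in stripped)


print(has_speech_content("Stop. sil<[300]> Think about it!"))  # True
print(has_speech_content("sil<[300]> <[medium]> +"))           # False
```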

Set a pause depending on the context

Use one of the cue words (tiny, small, medium, large, or huge) inside a <[ ]> tag to have the pause duration selected automatically based on the context.

Frost and sun; <[medium]> a wonderful day!
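Since only five cue words are accepted, it can be useful to validate the value before building the tag. This is a minimal sketch with hypothetical names (`PAUSE_SIZES`, `context_pause`). The list of values is taken from the markup components table.

```python
# Cue words listed as acceptable values for context-dependent pauses.
PAUSE_SIZES = ("tiny", "small", "medium", "large", "huge")


def context_pause(size: str) -> str:
    """Return a context-dependent pause tag such as <[medium]>."""
    if size not in PAUSE_SIZES:
        raise ValueError(f"size must be one of {PAUSE_SIZES}, got {size!r}")
    return f"<[{size}]>"


text = f"Frost and sun; {context_pause('medium')} a wonderful day!"
print(text)  # Frost and sun; <[medium]> a wonderful day!
```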

Place emphasis

To emphasize a word, wrap it in the <[ ]> tag or in double asterisks, like this: **accented**. For example:

Convenient interfaces for solving <[problems]>.

You become responsible **forever** for what you have tamed.
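The two emphasis forms can be produced by one small helper. In this sketch, `emphasize` is a hypothetical name. Falling back to the asterisk form for words that match a pause cue word is an assumption of the sketch, made because `<[small]>` and similar tags are documented as pauses. How the service itself resolves such a collision is not stated here.

```python
# Cue words that already have a meaning as context-dependent pause tags.
PAUSE_SIZES = {"tiny", "small", "medium", "large", "huge"}


def emphasize(word: str) -> str:
    """Wrap a word in emphasis markup, avoiding collisions with pause tags."""
    if word.lower() in PAUSE_SIZES:
        # Assumption: use the asterisk form so the word is not read as a pause.
        return f"**{word}**"
    return f"<[{word}]>"


print(f"Convenient interfaces for solving {emphasize('problems')}.")
# Convenient interfaces for solving <[problems]>.
```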

Use phonetic pronunciation

Enclose phonemes in [[ ]] to specify exactly how a word should be pronounced. The phonemes inside the tag, separated by spaces, are used for playback:

Hello, my name is [[v a sʲ ʌ]]

You can find the list of supported phonemes here.
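A phonetic tag can be built from a list of phonemes as follows. This is a sketch only: `phonetic` is a hypothetical helper, and it does not check the phonemes against the supported list, which is not reproduced in this document.

```python
# Hypothetical helper: formats phonemes, does not validate them.
def phonetic(phonemes: list[str]) -> str:
    """Return a [[ ]] tag with the phonemes separated by single spaces."""
    cleaned = [p.strip() for p in phonemes if p.strip()]
    if not cleaned:
        raise ValueError("at least one phoneme is required")
    return "[[" + " ".join(cleaned) + "]]"


print("Hello, my name is " + phonetic(["v", "a", "sʲ", "ʌ"]))
# Hello, my name is [[v a sʲ ʌ]]
```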
