TTS markup
TTS markup lets you control speech synthesis using special characters and SIL tags, which helps make synthesized speech sound more natural.
TTS markup is supported for Russian in API v3 and API v1 for data in text format.
Pausing is available for testing in all languages for which speech synthesis is supported.
Note
SpeechKit is designed for natural speech synthesis. Marking up data for speech synthesis helps you adjust the pronunciation of individual words, phrases, and sentences. However, it is not intended for generating isolated sounds or silence.
Markup components
Description | Special characters and tags |
---|---|
Apply stress | + |
Explicitly define a pause between sentences | sil<[t]>, where t is the pause duration in milliseconds. |
Define a pause depending on the context | <[small]>. Acceptable values: tiny, small, medium, large, huge. |
Place emphasis | <[accented]> or **emphasized word** |
Use phonetic pronunciation | [[ <word_phonemes_separated_with_spaces> ]] |
Apply stress
Use + before a stressed vowel whenever you need to specify the correct pronunciation of a word, for example:
They signed the c+ontract the following day.
or
If you contr+act malaria, you will have to spend a long time in a hospital.
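The stress marker can also be inserted programmatically when the text is assembled in code. The following is a minimal sketch in Python; the function name add_stress and its index-based interface are illustrative assumptions, not part of SpeechKit.

```python
def add_stress(word: str, vowel_index: int) -> str:
    """Return the word with '+' inserted before the character at vowel_index.

    Hypothetical helper: the caller decides which vowel is stressed.
    """
    if not 0 <= vowel_index < len(word):
        raise ValueError("vowel_index is out of range")
    return word[:vowel_index] + "+" + word[vowel_index:]


noun = add_stress("contract", 1)  # "c+ontract"
verb = add_stress("contract", 5)  # "contr+act"
print(f"They signed the {noun} the following day.")
```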
Explicitly define a pause between sentences
To explicitly define a pause between sentences, add the special sil<[t]> tag, where t is the duration of the pause in milliseconds:
Stop. sil<[300]> Think about it!
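If the sentences come from separate strings, the tag can be generated while joining them. This is a sketch only: pause_tag and join_with_pause are hypothetical helper names, and rejecting non-positive durations is an assumption rather than a documented limit.

```python
def pause_tag(milliseconds: int) -> str:
    """Build an explicit pause tag such as sil<[300]>."""
    # Assumption: only positive durations make sense here.
    if milliseconds <= 0:
        raise ValueError("pause duration must be a positive number of milliseconds")
    return f"sil<[{milliseconds}]>"


def join_with_pause(sentences: list[str], milliseconds: int) -> str:
    """Join sentences, placing the same explicit pause between each pair."""
    return f" {pause_tag(milliseconds)} ".join(sentences)


print(join_with_pause(["Stop.", "Think about it!"], 300))
# Stop. sil<[300]> Think about it!
```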
Note
The text to synthesize cannot consist of markup characters only. You cannot use SpeechKit to generate a file with silence instead of speech.
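A client-side check can catch pause-only input before a request is sent. The sketch below is a local approximation under stated assumptions: has_speakable_text is a hypothetical helper, it only recognizes the two pause forms from the table above, and the service's actual validation rules may differ.

```python
import re

# Matches explicit pauses (sil<[300]>) and contextual pauses (<[medium]>).
PAUSE_PATTERN = re.compile(r"sil<\[\d+\]>|<\[(?:tiny|small|medium|large|huge)\]>")


def has_speakable_text(text: str) -> bool:
    """Return True if any letter or digit remains after pause tags are removed."""
    remainder = PAUSE_PATTERN.sub("", text)
    return any(ch.isalnum() for ch in remainder)


print(has_speakable_text("Stop. sil<[300]> Think about it!"))  # True
print(has_speakable_text("sil<[300]> <[huge]>"))  # False
```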
Set a pause depending on the context
Use a cue word (tiny, small, medium, large, or huge) to have the pause duration selected automatically:
Frost and sun; <[medium]> a wonderful day!
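Because only five cue words are accepted, it can help to validate the value before building the tag. A minimal sketch, assuming a hypothetical helper named context_pause:

```python
CONTEXT_PAUSES = ("tiny", "small", "medium", "large", "huge")


def context_pause(size: str) -> str:
    """Build a contextual pause tag such as <[medium]>."""
    if size not in CONTEXT_PAUSES:
        raise ValueError(f"size must be one of {', '.join(CONTEXT_PAUSES)}")
    return f"<[{size}]>"


print(f"Frost and sun; {context_pause('medium')} a wonderful day!")
# Frost and sun; <[medium]> a wonderful day!
```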
Place emphasis
To define emphasis, you can use the <[accented]> tag or highlight a word like this: **accented**. For example:
Convenient interfaces for solving <[problems]>.
or
You become responsible **forever** for what you have tamed.
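Both emphasis forms are plain string wrappers, so they are easy to generate. The sketch below uses a hypothetical helper, emphasize, whose style parameter is an illustrative assumption.

```python
def emphasize(word: str, style: str = "asterisks") -> str:
    """Wrap a word in one of the two emphasis forms."""
    if style == "tag":
        return f"<[{word}]>"
    if style == "asterisks":
        return f"**{word}**"
    raise ValueError("style must be 'tag' or 'asterisks'")


print(f"Convenient interfaces for solving {emphasize('problems', style='tag')}.")
print(f"You become responsible {emphasize('forever')} for what you have tamed.")
```

The tag form shares its syntax with contextual pauses, so a word such as small would produce <[small]>. Defaulting to the asterisk form, as this sketch does, avoids that ambiguity.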
Use phonetic pronunciation
Enclose space-separated phonemes in [[ ]] to define exactly how a word is pronounced. The following tagged text will be used for playback:
Hello, my name is [[v a sʲ ʌ]]
You can find the list of supported phonemes in the SpeechKit documentation.
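When phonemes are stored as a list, the tag can be built by joining them with spaces. This is a sketch with a hypothetical helper name, phonetic; it does not check the phonemes against the supported list.

```python
def phonetic(phonemes: list[str]) -> str:
    """Build a phonetic pronunciation tag from a list of phonemes."""
    if not phonemes:
        raise ValueError("at least one phoneme is required")
    return "[[" + " ".join(phonemes) + "]]"


# Produces the tagged text from the example above.
text = "Hello, my name is " + phonetic(["v", "a", "sʲ", "ʌ"])
```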