Yandex SpeechKit

API method description

  • HTTP request
  • Parameters in the request body
  • Response
  • Examples
    • Convert text to speech in OggOpus format
    • Convert text to speech in WAV format
    • Convert text to speech in SSML format

Generates speech from received text.

HTTP request

POST https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize
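Below is a minimal sketch of how this request can be assembled with Python's standard library without sending it. The form fields and the IAM-token `Authorization` header mirror the examples later in this article. The helper name `make_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

TTS_URL = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"


def make_request(fields: dict, iam_token: str) -> Request:
    """Build (but do not send) a POST request to the synthesis endpoint.

    make_request is a hypothetical helper; pass its result to
    urllib.request.urlopen() to actually send the request.
    """
    body = urlencode(fields).encode("ascii")  # URL-encoded form fields
    return Request(
        TTS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

To send the request, open it with `urllib.request.urlopen(req)` and read the audio bytes from the response. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx status codes.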

Parameters in the request body

All parameters must be URL-encoded. The maximum size of the POST request body is 15 KB.
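Because the body is URL-encoded, characters that have a special meaning in form encoding must be escaped. In particular, the + stress marker described under the text parameter has to be sent as %2B, or it will be decoded as a space. The sketch below uses `urllib.parse.urlencode`, which performs this escaping, and checks the encoded size against the 15 KB limit. Two points are assumptions, because the article does not say how the server counts: that the limit applies to the encoded body, and that 15 KB means 15 * 1024 bytes.

```python
from urllib.parse import urlencode

# Assumption: the 15 KB limit applies to the URL-encoded body and
# means 15 * 1024 bytes.
MAX_BODY_BYTES = 15 * 1024


def encode_body(fields: dict) -> bytes:
    """URL-encode form fields and reject bodies over the assumed size limit."""
    body = urlencode(fields).encode("ascii")  # '+' -> %2B, ' ' -> '+'
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"encoded body is {len(body)} bytes; limit is {MAX_BODY_BYTES}"
        )
    return body
```

For example, `encode_body({"text": "contr+ol"})` returns `b"text=contr%2Bol"`. Non-ASCII text grows when it is encoded: each Cyrillic letter is two UTF-8 bytes, which become six characters after percent-encoding.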

Parameter Description
text string
UTF-8 encoded text to be converted to speech.
Specify either text or ssml, but not both.
For homographs, place a + before the stressed vowel, for example, contr+ol or def+ect.
To insert a pause between words, use a hyphen (-).
Maximum string length: 5000 characters.
ssml string
Text in SSML format to be converted into speech.
Specify either ssml or text, but not both.
lang string
Language.
Acceptable values:
  • ru-RU (default) — Russian.
  • en-US — English.
  • tr-TR — Turkish.
voice string
Speech synthesis voice, chosen from the list of voices. Default value: oksana.
For more information about choosing a voice, see Voices and speech quality.
emotion (deprecated) string
The emotional tone of the voice. Supported only for Russian (ru-RU) with the jane or omazh voice.
Acceptable values:
  • good: Friendly.
  • evil: Irritated.
  • neutral (default): Emotionless.
speed string
Rate (speed) of synthesized speech.
The rate is a decimal number from 0.1 to 3.0, where:
  • 3.0 — Fastest rate.
  • 1.0 (default) — Average human speech rate.
  • 0.1 — Slowest speech rate.
format string
The format of the synthesized audio.
Acceptable values:
  • lpcm — Audio file is synthesized in LPCM format with no WAV header. Audio properties:
    • Sampling — 8, 16, or 48 kHz, depending on the value of the sampleRateHertz parameter.
    • Bit depth — 16-bit.
    • Byte order — Reversed (little-endian).
    • Audio data is stored as signed integers.
  • oggopus (default) — Audio is encoded with the Opus codec and packaged in the Ogg container format (OggOpus).
sampleRateHertz string
The sampling frequency of the synthesized audio.
Used if format is set to lpcm. Acceptable values:
  • 48000 (default) — Sampling rate of 48 kHz.
  • 16000 — Sampling rate of 16 kHz.
  • 8000 — Sampling rate of 8 kHz.
folderId string
ID of the folder that you have access to. Required for authorization with a user account (see the UserAccount resource). Don't specify this field if you make a request on behalf of a service account.
Maximum string length: 50 characters.
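The constraints in this table can be checked on the client before a request is sent, so that errors are caught early instead of coming back from the server. The sketch below does that. `build_fields` is a hypothetical helper, its allowed values are copied from the table, and the server remains the authority on what it accepts.

```python
from typing import Optional

# Allowed values copied from the parameter table.
LANGS = {"ru-RU", "en-US", "tr-TR"}
FORMATS = {"lpcm", "oggopus"}
SAMPLE_RATES = {8000, 16000, 48000}


def build_fields(text: Optional[str] = None, ssml: Optional[str] = None,
                 lang: str = "ru-RU", voice: Optional[str] = None,
                 speed: float = 1.0, fmt: str = "oggopus",
                 sample_rate: int = 48000,
                 folder_id: Optional[str] = None) -> dict:
    """Validate parameters and return the form fields for the request body."""
    if (text is None) == (ssml is None):
        raise ValueError("specify exactly one of text or ssml")
    if text is not None and len(text) > 5000:
        raise ValueError("text is longer than 5000 characters")
    if lang not in LANGS:
        raise ValueError(f"unsupported lang: {lang}")
    if not 0.1 <= speed <= 3.0:
        raise ValueError("speed must be between 0.1 and 3.0")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")

    fields = {"text": text} if text is not None else {"ssml": ssml}
    fields.update({"lang": lang, "speed": str(speed), "format": fmt})
    if voice is not None:
        fields["voice"] = voice  # not validated here; see the list of voices
    if fmt == "lpcm":
        # sampleRateHertz only applies to lpcm output.
        if sample_rate not in SAMPLE_RATES:
            raise ValueError(f"unsupported sampleRateHertz: {sample_rate}")
        fields["sampleRateHertz"] = str(sample_rate)
    if folder_id is not None:
        # Pass folderId only for user accounts, not for service accounts.
        if len(folder_id) > 50:
            raise ValueError("folderId is longer than 50 characters")
        fields["folderId"] = folder_id
    return fields
```

The returned dictionary can be passed as `data=` to `requests.post`, which form-encodes it. The Python examples in this article send their fields the same way.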

Response

If speech synthesis is successful, the response contains the binary content of the audio file. The output data format depends on the value of the format parameter.
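When synthesis fails, the body is not audio and should not be written to the output file. The PHP example in this article reads error_code and error_message from a JSON error body. The sketch below assumes the same shape and falls back to the raw bytes when the body is not a JSON object. `describe_error` is a hypothetical helper.

```python
import json
from typing import Optional


def describe_error(status: int, body: bytes) -> Optional[str]:
    """Return None for a successful (HTTP 200) response, else an error summary.

    Assumption: error bodies are JSON objects with error_code and
    error_message fields, as the PHP example expects.
    """
    if status == 200:
        return None
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and JSONDecodeError
        payload = None
    if not isinstance(payload, dict):
        return f"HTTP {status}: {body[:200]!r}"
    return f"HTTP {status}: {payload.get('error_code')}: {payload.get('error_message')}"
```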

Examples

To use the examples, install cURL and get the authorization data for your account:

User account on Yandex:

  1. On the billing page, make sure that your billing account status is ACTIVE or TRIAL_ACTIVE. If you don't have a billing account, create one.
  2. Get an IAM token required for authentication.
  3. Get the ID of any folder for which your account has the editor role or higher.

Service account:

  1. Select the authentication method:

    • Get an IAM token to use in the examples.

    • Create an API key. Pass the API key in the Authorization header in the following format:

      Authorization: Api-Key <API key>

  2. Assign the editor role or a higher role to the service account for the folder where it was created.

    Don't specify the folder ID in your requests: the service uses the folder where the service account was created.

Federated account:

  1. Authenticate with the CLI as a federated user.

  2. Use the CLI to get an IAM token required for authentication:

    $ yc iam create-token

  3. Get the ID of any folder for which your account has the editor role or higher.
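The examples below authenticate with an IAM token in a Bearer header. A service account can use an API key instead, in the Api-Key format shown above. The sketch below chooses between the two. `auth_headers` is a hypothetical helper.

```python
from typing import Optional


def auth_headers(iam_token: Optional[str] = None,
                 api_key: Optional[str] = None) -> dict:
    """Return the Authorization header for exactly one kind of credential."""
    if (iam_token is None) == (api_key is None):
        raise ValueError("pass exactly one of iam_token or api_key")
    if iam_token is not None:
        return {"Authorization": f"Bearer {iam_token}"}
    return {"Authorization": f"Api-Key {api_key}"}
```

Keep the folderId rule in mind when combining the header with a request body. Send folderId for a user or federated account, and leave it out when you call the API as a service account.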

Convert text to speech in OggOpus format

In this example, the text "Hello World" is synthesized and saved to an audio file.

By default, the audio is encoded with the Opus codec and packaged in the Ogg container format (OggOpus).

The example is available in cURL, C#, Python, and PHP.

cURL:

$ export FOLDER_ID=b1gvmob95yysaplct532
$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
     -H "Authorization: Bearer ${IAM_TOKEN}" \
     --data-urlencode "text=Hello World" \
     -d "lang=en-US&folderId=${FOLDER_ID}" \
     "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg

C#:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.IO;

namespace TTS
{
  class Program
  {
    static void Main()
    {
      Tts().GetAwaiter().GetResult();
    }

    static async Task Tts()
    {
      const string iamToken = "CggaATEVAgA..."; // Specify the IAM token.
      const string folderId = "b1gvmob95yysaplct532"; // Specify the folder ID.

      HttpClient client = new HttpClient();
      client.DefaultRequestHeaders.Add("Authorization", "Bearer " + iamToken);
      var values = new Dictionary<string, string>
      {
        { "text", "Hello World" },
        { "lang", "en-US" },
        { "folderId", folderId }
      };
      var content = new FormUrlEncodedContent(values);
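      var response = await client.PostAsync("https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize", content);
      // Throw on an error status instead of saving the error body as audio.
      response.EnsureSuccessStatusCode();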
      var responseBytes = await response.Content.ReadAsByteArrayAsync();
      File.WriteAllBytes("speech.ogg", responseBytes);
    }
  }
}

Python:

  1. Create a file (for example, test.py) and add the following code to it:

    import argparse
    import requests
    
    
    def synthesize(folder_id, iam_token, text):
        url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize'
        headers = {
            'Authorization': 'Bearer ' + iam_token,
        }
    
        data = {
            'text': text,
            'lang': 'en-US',
            'folderId': folder_id
        }
    
        with requests.post(url, headers=headers, data=data, stream=True) as resp:
            if resp.status_code != 200:
                raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text))
    
            for chunk in resp.iter_content(chunk_size=None):
                yield chunk
    
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--token", required=True, help="IAM token")
        parser.add_argument("--folder_id", required=True, help="Folder id")
        parser.add_argument("--text", required=True, help="Text to synthesize")
        parser.add_argument("--output", required=True, help="Output file name")
        args = parser.parse_args()
    
        with open(args.output, "wb") as f:
            for audio_content in synthesize(args.folder_id, args.token, args.text):
                f.write(audio_content)
    
  2. Run the script, passing the IAM token, folder ID, output file name, and text as arguments:

    $ export FOLDER_ID=b1gvmob95yysaplct532
    $ export IAM_TOKEN=CggaATEVAgA...
    $ python test.py --token ${IAM_TOKEN} --folder_id ${FOLDER_ID} --output speech.ogg --text "Hello World"
    
PHP:

<?php

$token = 'CggaATEVAgA...'; # IAM token
$folderId = "b1gvmob95yysaplct532"; # ID of the folder
$url = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize";

$post = "text=" . urlencode("Hello World") . "&lang=en-US&folderId={$folderId}";
$headers = ['Authorization: Bearer ' . $token];
$ch = curl_init();

curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HEADER, false);
if ($post !== false) {
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
}
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);


$response = curl_exec($ch);
if (curl_errno($ch)) {
    print "Error: " . curl_error($ch);
}
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
    $decodedResponse = json_decode($response, true);
    echo "Error code: " . $decodedResponse["error_code"] . "\r\n";
    echo "Error message: " . $decodedResponse["error_message"] . "\r\n";
} else {
    file_put_contents("speech.ogg", $response);
}
curl_close($ch);
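The cURL command above saves whatever the server returns to speech.ogg, including an error body. A quick sanity check before playing the file is to look for the Ogg capture pattern OggS, which starts every Ogg page. The sketch below does that. `looks_like_ogg` and `file_looks_like_ogg` are hypothetical helpers, and a match only shows that the file is Ogg, not that its audio is valid.

```python
def looks_like_ogg(data: bytes) -> bool:
    """Return True if data starts with the Ogg page capture pattern b"OggS"."""
    return data[:4] == b"OggS"


def file_looks_like_ogg(path: str) -> bool:
    """Read only the first four bytes of the file and check them."""
    with open(path, "rb") as f:
        return looks_like_ogg(f.read(4))
```

For example, `file_looks_like_ogg("speech.ogg")` returns False if the file contains a JSON error body.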

Convert text to speech in WAV format

In this example, we synthesize the submitted text in LPCM format with a sampling rate of 48 kHz and save it to the file speech.raw. The file is then converted to WAV format with the SoX utility.
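Because LPCM output has no header, its length follows from the file size alone. At 16 bits (2 bytes) per sample, 48 kHz mono audio takes 96,000 bytes per second. The sketch below performs that arithmetic. It assumes mono audio, as the SoX command in step 2 does with -c 1, and `lpcm_duration_seconds` is a hypothetical helper.

```python
def lpcm_duration_seconds(num_bytes: int, sample_rate: int = 48000,
                          channels: int = 1) -> float:
    """Duration of headerless 16-bit LPCM audio of the given size.

    Assumption: mono output (channels=1) unless stated otherwise.
    """
    bytes_per_second = sample_rate * 2 * channels  # 2 bytes per 16-bit sample
    return num_bytes / bytes_per_second
```

For example, `lpcm_duration_seconds(os.path.getsize("speech.raw"))` gives the length of the synthesized file in seconds. Pass the same `sample_rate` that was sent as sampleRateHertz.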

  1. Synthesize a file in LPCM format:

    The example is available in cURL, C#, Python, and PHP.

    cURL:

    $ export FOLDER_ID=b1gvmob95yysaplct532
    $ export IAM_TOKEN=CggaATEVAgA...
    $ curl -X POST \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -o speech.raw \
        --data-urlencode "text=Hello World" \
        -d "lang=en-US&folderId=${FOLDER_ID}&format=lpcm&sampleRateHertz=48000" \
        https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize
    
    C#:

    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.IO;
    
    namespace TTS
    {
      class Program
      {
        static void Main()
        {
          Tts().GetAwaiter().GetResult();
        }
    
        static async Task Tts()
        {
          const string iamToken = "CggaATEVAgA..."; // Specify the IAM token.
          const string folderId = "b1gvmob95yysaplct532"; // Specify the folder ID.
    
          HttpClient client = new HttpClient();
          client.DefaultRequestHeaders.Add("Authorization", "Bearer " + iamToken);
          var values = new Dictionary<string, string>
          {
            { "text", "Hello World" },
            { "lang", "en-US" },
            { "folderId", folderId },
            { "format", "lpcm" },
            { "sampleRateHertz", "48000" }
          };
          var content = new FormUrlEncodedContent(values);
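          var response = await client.PostAsync("https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize", content);
          // Throw on an error status instead of saving the error body as audio.
          response.EnsureSuccessStatusCode();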
          var responseBytes = await response.Content.ReadAsByteArrayAsync();
          File.WriteAllBytes("speech.raw", responseBytes);
        }
      }
    }
    
    Python:

    1. Create a file (for example, test.py) and add the following code to it:

      import argparse
      import requests
      
      
      def synthesize(folder_id, iam_token, text):
          url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize'
          headers = {
              'Authorization': 'Bearer ' + iam_token,
          }
      
          data = {
              'text': text,
              'lang': 'en-US',
              'folderId': folder_id,
              'format': 'lpcm',
              'sampleRateHertz': 48000,
          }
      
          with requests.post(url, headers=headers, data=data, stream=True) as resp:
              if resp.status_code != 200:
                  raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text))
      
              for chunk in resp.iter_content(chunk_size=None):
                  yield chunk
      
      
      if __name__ == "__main__":
          parser = argparse.ArgumentParser()
          parser.add_argument("--token", required=True, help="IAM token")
          parser.add_argument("--folder_id", required=True, help="Folder id")
          parser.add_argument("--text", required=True, help="Text to synthesize")
          parser.add_argument("--output", required=True, help="Output file name")
          args = parser.parse_args()
      
          with open(args.output, "wb") as f:
              for audio_content in synthesize(args.folder_id, args.token, args.text):
                  f.write(audio_content)
      
    2. Run the script, passing the IAM token, folder ID, output file name, and text as arguments:

      $ export FOLDER_ID=b1gvmob95yysaplct532
      $ export IAM_TOKEN=CggaATEVAgA...
      $ python test.py --token ${IAM_TOKEN} --folder_id ${FOLDER_ID} --output speech.raw --text "Hello World"
      
    PHP:

    <?php
    
    const FORMAT_PCM = "lpcm";
    const FORMAT_OPUS = "oggopus";
    
    $token = 'CggaATEVAgA...'; # IAM token
    $folderId = "b1gvmob95yysaplct532"; # ID of the folder
    $url = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize";
    $post = "text=" . urlencode("Hello World") . "&lang=en-US&folderId={$folderId}&sampleRateHertz=48000&format=" . FORMAT_PCM;
    $headers = ['Authorization: Bearer ' . $token];
    $ch = curl_init();
    
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, false);
    if ($post !== false) {
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
    }
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    
    
    $response = curl_exec($ch);
    if (curl_errno($ch)) {
        print "Error: " . curl_error($ch);
    }
    if (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
        $decodedResponse = json_decode($response, true);
        echo "Error code: " . $decodedResponse["error_code"] . "\r\n";
        echo "Error message: " . $decodedResponse["error_message"] . "\r\n";
    } else {
        file_put_contents("speech.raw", $response);
    }
    curl_close($ch);
    
  2. Convert the file to WAV format using the SoX utility.

    $ sox -r 48000 -b 16 -e signed-integer -c 1 speech.raw speech.wav
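If SoX is not available, Python's standard `wave` module can add the same WAV header. This sketch assumes mono, 16-bit, signed little-endian samples, which matches the audio properties listed for lpcm and the -c 1 flag in the SoX command. `raw_lpcm_to_wav` is a hypothetical helper.

```python
import wave


def raw_lpcm_to_wav(raw_path: str, wav_path: str, sample_rate: int = 48000) -> None:
    """Wrap headerless 16-bit signed little-endian LPCM in a WAV container.

    Assumption: the audio is mono, as in the SoX command above.
    """
    with open(raw_path, "rb") as src:
        frames = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(1)          # mono, matching -c 1
        dst.setsampwidth(2)          # 16-bit samples, matching -b 16
        dst.setframerate(sample_rate)
        dst.writeframes(frames)      # WAV PCM data is little-endian already
```

Call it as `raw_lpcm_to_wav("speech.raw", "speech.wav", 48000)`. The rate must match the sampleRateHertz value used in the request, or playback will be too fast or too slow.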
    

Convert text to speech in SSML format

In this example, the text to synthesize is in SSML format.

The text is synthesized and saved to an audio file. By default, the audio is encoded with the Opus codec and packaged in the Ogg container format (OggOpus).

  1. Create a file (for example, text.xml) and enter text in SSML format:

    <speak>
      Here are some examples of how you can use SSML.
      You can add a custom pause to your text:<break time="2s"/> Ta-daaah!
      Or mark up your text into paragraphs and sentences. Pauses between paragraphs are longer.
      <p><s>Sentence one</s><s>Sentence two</s></p>
      You can also substitute phrases.
      For example, you can use this feature to pronounce abbreviations, <sub alias="et cetera">etc.</sub>
    </speak>
    
  2. Send a request with the text to the server. Pass the text in the ssml parameter. In this example, the contents of the file are read using the cat utility:

    $ export FOLDER_ID=b1gvmob95yysaplct532
    $ export IAM_TOKEN=CggaATEVAgA...
    $ curl -X POST \
      -H "Authorization: Bearer ${IAM_TOKEN}" \
      --data-urlencode "ssml=`cat text.xml`" \
      -d "lang=ru-RU&folderId=${FOLDER_ID}" \
      "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg
    
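Because the ssml parameter carries XML, a malformed file, such as one with an unescaped & in the text, is worth catching before the request is sent. The sketch below does this with the standard library's XML parser. `check_ssml` and `wrap_plain_text` are hypothetical helpers, and a well-formed document can still be rejected if it uses tags the service does not support.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def check_ssml(ssml: str) -> None:
    """Raise ValueError unless ssml is well-formed XML with a <speak> root."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        raise ValueError(f"SSML is not well-formed: {exc}") from exc
    local_name = root.tag.rsplit("}", 1)[-1]  # drop a namespace, if any
    if local_name != "speak":
        raise ValueError(f"root element is <{local_name}>, expected <speak>")


def wrap_plain_text(text: str) -> str:
    """Escape &, < and > so plain text can be placed inside <speak>."""
    return f"<speak>{escape(text)}</speak>"
```

To check the file from step 1, read text.xml with `encoding="utf-8"` and pass its contents to `check_ssml` before running the cURL command.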
© 2021 Yandex.Cloud LLC