API method description

Generates speech for the received text.

HTTP request

POST https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize

Parameters in the request body

All parameters must be URL-encoded. The maximum size of the POST request body is 30 KB.

Parameter Description
text Required parameter.
UTF-8 encoded text to be converted into speech.
For homographs, use + before the stressed vowel. For example, con+trol or de+fect.
To indicate a pause between words, use -.
Maximum string length: 5000 characters.
lang Language.
Acceptable values:
  • ru-RU (default) — Russian.
  • en-US — English.
  • tr-TR — Turkish.
voice The voice for the synthesized speech.
You can choose one of the following voices:
  • Female voice: alyss, jane, oksana and omazh.
  • Male voice: zahar and ermil.
Default value of the parameter: oksana.
emotion Emotional tone of the voice.
Acceptable values:
  • good — Cheerful and friendly.
  • evil — Irritated.
  • neutral (default) — Without emotion.
speed The rate (tempo) of the synthesized speech.
The speech rate is set as a decimal number in the range from 0.1 to 3.0. Where:
  • 3.0 — Fastest rate.
  • 1.0 (default) — Average human speech rate.
  • 0.1 — Slowest speech rate.
format The format of the synthesized audio.
Acceptable values:
  • lpcm — Audio file is synthesized in the LPCM format with no WAV header. Audio characteristics:
    • Sampling — 8, 16, or 48 kHz, depending on the sampleRateHertz parameter value.
    • Bit depth — 16-bit.
    • Byte order — Reversed (little-endian).
    • Audio data is stored as signed integers.
  • oggopus (default) — Data in the audio file is encoded using the OPUS audio codec and compressed using the OGG container format (OggOpus).
sampleRateHertz The sampling frequency of the synthesized audio.
Used if format is set to lpcm. Acceptable values:
  • 48000 (default) — Sampling rate of 48 kHz.
  • 16000 — Sampling rate of 16 kHz.
  • 8000 — Sampling rate of 8 kHz.
folderId Required parameter.
ID of your folder.
For more information about how to find the folder ID, see the section Authorization in the API.

Response

If speech synthesis is successful, the response contains the binary content of the audio file. The output data format depends on the value of the format parameter.

Examples

Convert text to speech in Ogg format

In this example, the text "Hello world" is synthesized and recorded as an audio file.

By default, data in the audio file is encoded using the OPUS audio codec and compressed using the OGG container format (OggOpus).

$ export FOLDER_ID=b1gvmob95yysaplct532
$ export IAM_TOKEN=CggaATEVAgA...
$ curl -X POST \
     -H "Authorization: Bearer ${IAM_TOKEN}" \
     --data-urlencode "text=Hello World" \
     -d "lang=en-US&folderId=${FOLDER_ID}" \
     "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.IO;

namespace TTS
{
  class Program
  {
    static void Main()
    {
      Tts().GetAwaiter().GetResult();
    }

    static async Task Tts()
    {
      const string iamToken = "CggaATEVAgA..."; // Specify the IAM token.
      const string folderId = "b1gvmob95yysaplct532"; // Specify the folder ID.

      HttpClient client = new HttpClient();
      client.DefaultRequestHeaders.Add("Authorization", "Bearer " + iamToken);
      var values = new Dictionary<string, string>
      {
        { "text", "Hello World" },
        { "lang", "en-US" },
        { "folderId", folderId }
      };
      var content = new FormUrlEncodedContent(values);
      var response = await client.PostAsync("https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize", content);
      var responseBytes = await response.Content.ReadAsByteArrayAsync();
      File.WriteAllBytes("speech.ogg", responseBytes);
    }
  }
}
  1. Create a file (for example, test.py), and add the following code to it:

    import argparse
    import requests
    
    
    def synthesize(folder_id, iam_token, text):
        url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize'
        headers = {
            'Authorization': 'Bearer ' + iam_token,
        }
    
        data = {
            'text': text,
            'lang': 'en-US',
            'folderId': folder_id
        }
    
        resp = requests.post(url, headers=headers, data=data)
        if resp.status_code != 200:
            raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text))
    
        return resp.content
    
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--token", required=True, help="IAM token")
        parser.add_argument("--folder_id", required=True, help="Folder id")
        parser.add_argument("--text", required=True, help="Text for synthesize")
        parser.add_argument("--output", required=True, help="Output file name")
        args = parser.parse_args()
    
        audio_content = synthesize(args.folder_id, args.iam_token, args.text)
        with open(args.output, "wb") as f:
            f.write(audio_content)
    
  2. Execute the created file by passing arguments with the IAM token, folder ID, text, and name of the file for audio recording:

    $ export FOLDER_ID=b1gvmob95yysaplct532
    $ export IAM_TOKEN=CggaATEVAgA...
    $ python test.py --token ${IAM_TOKEN} --folder_id ${FOLDER_ID} --output speech.ogg --text "Hello World"
    
<?

$token = 'CggaATEVAgA...'; # IAM token
$folderId = "b1gvmob95yysaplct532"; # ID of the folder
$url = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize";

$post = "text=" . urlencode("Hello World") . "&lang=en-US&folderId=${folderId}";
$headers = ['Authorization: Bearer ' . $token];
$ch = curl_init();

curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
if ($post !== false) {
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
}
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);


$response = curl_exec($ch);
if (curl_errno($ch)) {
    print "Error: " . curl_error($ch);
}
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
    $decodedResponse = json_decode($response, true);
    echo "Error code: " . $decodedResponse["error_code"] . "\r\n";
    echo "Error message: " . $decodedResponse["error_message"] . "\r\n";
} else {
    file_put_contents("speech.ogg", $response);
}
curl_close($ch);

Convert text to speech in WAV format

In this example, the submitted text is synthesized in the LPCM format with a sampling rate of 48kHz and saved to a speech.raw file. This file is then converted to WAV format using the SoX utility.

  1. Synthesize a file in LCPM format:

    $ export FOLDER_ID=b1gvmob95yysaplct532
    $ export IAM_TOKEN=CggaATEVAgA...
    $ curl -X POST \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -o speech.raw \
        --data-urlencode "text=Hello World" \
        -d "lang=en-US&folderId=${FOLDER_ID}&format=lpcm&sampleRateHertz=48000" \
        https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize
    
    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.IO;
    
    namespace TTS
    {
      class Program
      {
        static void Main()
        {
          Tts().GetAwaiter().GetResult();
        }
    
        static async Task Tts()
        {
          const string iamToken = "CggaATEVAgA..."; // Specify the IAM token.
          const string folderId = "b1gvmob95yysaplct532"; // Specify the folder ID.
    
          HttpClient client = new HttpClient();
          client.DefaultRequestHeaders.Add("Authorization", "Bearer " + iamToken);
          var values = new Dictionary<string, string>
          {
            { "text", "Hello World" },
            { "lang", "en-US" },
            { "folderId", folderId },
            { 'format': 'lpcm' },
            { 'sampleRateHertz': 48000 }
          };
          var content = new FormUrlEncodedContent(values);
          var response = await client.PostAsync("https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize", content);
          var responseBytes = await response.Content.ReadAsByteArrayAsync();
          File.WriteAllBytes("speech.raw", responseBytes);
        }
      }
    }
    
    1. Create a file (for example, test.py), and add the following code to it:

      import argparse
      import requests
      
      
      def synthesize(folder_id, iam_token, text):
          url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize'
          headers = {
              'Authorization': 'Bearer ' + iam_token,
          }
      
          data = {
              'text': text,
              'lang': 'en-US',
              'folderId': folder_id,
              'format': 'lpcm',
              'sampleRateHertz': 48000,
          }
      
          resp = requests.post(url, headers=headers, data=data)
          if resp.status_code != 200:
              raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text))
      
          return resp.content
      
      
      if __name__ == "__main__":
          parser = argparse.ArgumentParser()
          parser.add_argument("--token", required=True, help="IAM token")
          parser.add_argument("--folder_id", required=True, help="Folder id")
          parser.add_argument("--text", required=True, help="Text for synthesize")
          parser.add_argument("--output", required=True, help="Output file name")
          args = parser.parse_args()
      
          audio_content = synthesize(args.folder_id, args.iam_token, args.text)
          with open(args.output, "wb") as f:
              f.write(audio_content)
      
    2. Execute the created file by passing arguments with the IAM token, folder ID, text, and name of the file for audio recording:

      $ export FOLDER_ID=b1gvmob95yysaplct532
      $ export IAM_TOKEN=CggaATEVAgA...
      $ python test.py --token ${IAM_TOKEN} --folder_id ${FOLDER_ID} --output speech.raw --text "Hello World"
      
    <?
    
    const FORMAT_PCM = "lpcm";
    const FORMAT_OPUS = "oggopus";
    
    $token = 'CggaATEVAgA...'; # IAM token
    $folderId = "b1gvmob95yysaplct532"; # ID of the folder
    $url = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize";
    $post = "text=" . urlencode("Hello World") . "&lang=en-US&folderId=${folderId}&sampleRateHertz=48000&format=" . FORMAT_PCM;
    $headers = ['Authorization: Bearer ' . $token];
    $ch = curl_init();
    
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    if ($post !== false) {
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
    }
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    
    
    $response = curl_exec($ch);
    if (curl_errno($ch)) {
        print "Error: " . curl_error($ch);
    }
    if (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
        $decodedResponse = json_decode($response, true);
        echo "Error code: " . $decodedResponse["error_code"] . "\r\n";
        echo "Error message: " . $decodedResponse["error_message"] . "\r\n";
    } else {
        file_put_contents("speech.raw", $response);
    }
    curl_close($ch);
    
  2. Convert the file to WAV format using the SoX utility.

    $ sox -r 48000 -b 16 -e signed-integer -c 1 speech.raw speech.wav