SpeechKit pricing

Updated at February 16, 2024

To calculate the cost of using the service, use the calculator on the Yandex Cloud website or see the prices in this section.

What goes into the cost of using SpeechKit

Using speech synthesis

The cost of using SpeechKit for speech synthesis depends on the version of the API you use.

API v1

For the API v1, the cost is calculated from the total number of characters sent for speech synthesis during a calendar month (the reporting period).

API v3

The cost of using the API v3 depends on the number of synthesis requests sent, counted in billing units. The cost is calculated for a calendar month (the reporting period).

By default, a speech synthesis request is limited to 250 characters and 24 seconds of audio. To synthesize longer phrases, use unsafe_mode. In this case, you are charged per 250 characters, for example:

A request shorter than 250 characters is charged as one billing unit.
A request from 250 to 500 characters long is charged as two billing units.
A request from 500 to 750 characters long is charged as three billing units.
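The per-250-character rule above can be sketched as a small Python helper. This is a minimal sketch that assumes one billing unit per started block of 250 characters, with a minimum of one unit; how exact boundary lengths such as 250 or 500 characters are counted is an assumption, since the wording above leaves it open. The function name `v3_billing_units` is hypothetical.

```python
UNIT_CHARS = 250  # characters per API v3 billing unit in unsafe_mode


def v3_billing_units(char_count: int) -> int:
    """Billing units for one API v3 synthesis request (illustrative sketch).

    Assumption: every started block of 250 characters counts as a full unit,
    and a request is never billed for less than one unit.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    # Integer ceiling division avoids floating-point rounding.
    started_blocks = (char_count + UNIT_CHARS - 1) // UNIT_CHARS
    return max(1, started_blocks)


for length in (150, 300, 600):
    print(length, "characters ->", v3_billing_units(length), "unit(s)")
```

The `max(1, ...)` floor also makes an empty request count as one unit, which matches the empty-request rule for the API v3.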

Empty request

The character count of a request includes spaces and special characters. The cost of an empty request depends on the API version:

An empty request to the API v1 is charged for as a single character.
An empty request to the API v3 is charged for as a single billing unit.
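A minimal sketch of the character-counting and empty-request rules for the API v1. It assumes that one billed character corresponds to one Unicode code point as counted by Python's `len()`; the service's exact counting method is not described here. The helper name is hypothetical.

```python
def v1_billable_characters(text: str) -> int:
    """Characters billed for one API v1 synthesis request (illustrative sketch).

    Spaces and special characters count like letters. An empty request is
    billed as a single character. Counting code points with len() is an
    assumption of this sketch.
    """
    return max(1, len(text))


print(v1_billable_characters("Hello, world!"))  # 13
print(v1_billable_characters(""))  # 1
```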

Internal server errors

You are not charged for a request that fails due to an internal server error.

Using speech recognition

The cost of using SpeechKit for speech recognition depends on the recognition type and the duration of the recognized audio. The cost is calculated for a calendar month (the reporting period).

Streaming speech recognition

The cost of using SpeechKit streaming recognition is calculated based on the pricing rules for synchronous recognition.

Synchronous recognition

These rules apply to synchronous recognition and streaming mode recognition when using the API v2 and API v3.

The billing unit is a 15-second segment of a single-channel audio file. Shorter segments are rounded up (1 second becomes 15 seconds).

Warning

In streaming mode, billing begins when the message with recognition settings is sent. Even if you send no audio after this message, you are charged for one billing unit.

Examples:

One audio fragment that is 37 seconds long is billed as 45 seconds.

Explanation: The audio is divided into two 15-second segments and one 7-second segment. The length of the last segment is rounded up to 15 seconds. Thus, we have three segments, 15 seconds each.

Two audio fragments that are 5 and 8 seconds long are billed as 30 seconds.

Explanation: The length of each audio is rounded up to 15 seconds. Thus, we have two segments, 15 seconds each.
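The 15-second rounding in these examples can be reproduced with a short Python sketch. The function names are hypothetical, and the sketch rejects zero or negative durations because the text does not say how empty audio is billed in synchronous mode.

```python
import math

SEGMENT_SECONDS = 15  # one billing unit = 15 s of single-channel audio


def sync_billing_units(duration_seconds: float) -> int:
    """15-second billing units for one audio fragment (illustrative sketch)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Every started 15-second segment is billed as a full segment.
    return math.ceil(duration_seconds / SEGMENT_SECONDS)


def billed_seconds(durations: list[float]) -> int:
    """Total billed seconds for several fragments, each rounded separately."""
    return sum(sync_billing_units(d) for d in durations) * SEGMENT_SECONDS


print(billed_seconds([37]))    # 45
print(billed_seconds([5, 8]))  # 30
```

Each fragment is rounded on its own, which is why two short fragments cost two full segments rather than one.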

Asynchronous recognition

These rules apply when using asynchronous recognition.

The billing unit is a one-second segment of a two-channel audio file. Shorter segments are rounded up. The number of channels is rounded up to an even number.

The minimum billable amount is 15 seconds for every pair of channels. Shorter audio fragments are billed as 15 seconds.

Examples of rounding audio length:

Length	Number of channels	Seconds charged
1 second	1	15
1 second	2	15
1 second	3	30
15.5 seconds	2	16
15.5 seconds	4	32
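The rules and table above can be read as one formula per fragment: billed seconds = max(15, ceil(duration)) × ceil(channels / 2). The following Python sketch (with a hypothetical function name) reproduces the table rows under that reading.

```python
import math

MIN_SECONDS = 15  # minimum billable duration per pair of channels


def async_billed_seconds(duration_seconds: float, channels: int) -> int:
    """Seconds billed for one fragment under the asynchronous rules (sketch)."""
    if duration_seconds <= 0 or channels <= 0:
        raise ValueError("duration and channel count must be positive")
    # Round the duration up to whole seconds, with a 15-second minimum.
    seconds = max(MIN_SECONDS, math.ceil(duration_seconds))
    # Round the channel count up to an even number, i.e. count channel pairs.
    channel_pairs = (channels + 1) // 2
    return seconds * channel_pairs


for duration, channels in [(1, 1), (1, 2), (1, 3), (15.5, 2), (15.5, 4)]:
    print(duration, channels, async_billed_seconds(duration, channels))
```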

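Examples of cost calculation

Speech synthesis using API v1

The cost of using SpeechKit for speech synthesis using the API v1 with the following parameters:

Number of characters sent per month: 2,023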

Cost calculation:

2,023 × (10.56 / 1,000,000) ≈ $ 0.02

Total: $ 0.02.

Where:

10.56 is the cost per million characters.
10.56 / 1,000,000 is the cost per character.
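The same calculation in Python, using `decimal.Decimal` so the currency arithmetic is exact and the rounding to cents is explicit. The rate is taken from this example; the helper name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_MILLION_CHARS = Decimal("10.56")  # USD without VAT, as in this example


def v1_synthesis_cost(characters: int) -> Decimal:
    """Monthly API v1 synthesis cost for the given number of characters."""
    return characters * PRICE_PER_MILLION_CHARS / Decimal(1_000_000)


cost = v1_synthesis_cost(2023)
print(cost)  # 0.02136288
print(cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 0.02
```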

Speech synthesis using the API v3

The cost of using SpeechKit for speech synthesis using the API v3 with the following parameters:

Number of requests sent: 3
Number of characters in requests: 150, 300, 600

Cost calculation:

(1 + 2 + 3) × 0.001280 = $ 0.00768

Total: $ 0.00768

Where:

1 is the number of billing units charged for the first request of 150 characters.
2 is the number of billing units charged for the second request of 300 characters made using unsafe_mode.
3 is the number of billing units charged for the third request of 600 characters made using unsafe_mode.
0.001280: Cost per billing unit.
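A Python sketch reproducing this example. It assumes one billing unit per started block of 250 characters with a minimum of one unit (the handling of exact boundaries such as 250 characters is an assumption); the function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_UNIT = Decimal("0.001280")  # USD without VAT, as in this example
UNIT_CHARS = 250


def v3_billing_units(char_count: int) -> int:
    # Every started 250-character block is one unit; minimum one unit.
    return max(1, (char_count + UNIT_CHARS - 1) // UNIT_CHARS)


def v3_synthesis_cost(request_lengths: list[int]) -> Decimal:
    """Total API v3 synthesis cost for requests of the given lengths."""
    units = sum(v3_billing_units(n) for n in request_lengths)
    return units * PRICE_PER_UNIT


print(v3_synthesis_cost([150, 300, 600]))  # 0.007680
```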

Streaming speech recognition

The cost of using SpeechKit for streaming speech recognition with the following parameters:

Number of audio fragments: 2
Audio fragment duration: 5 seconds, 37 seconds

Cost calculation:

((1 + 1) + (3 + 1)) × 0.001280 = $ 0.00768

Total: $ 0.00768

Where:

(1 + 1) is the number of billing units charged for the first 5-second audio fragment: 1 unit for the audio rounded up to 15 seconds, plus 1 unit for the recognition settings message.
(3 + 1) is the number of billing units charged for the second 37-second audio fragment: 3 units for the audio rounded up to 45 seconds, plus 1 unit for the recognition settings message.
0.001280: Cost per billing unit.
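A Python sketch of this example. Following the worked example and the streaming-mode warning, it adds one billing unit per stream for the recognition settings message; the helper names are hypothetical.

```python
import math
from decimal import Decimal

PRICE_PER_UNIT = Decimal("0.001280")  # USD without VAT, as in this example
SEGMENT_SECONDS = 15


def streaming_units(duration_seconds: float) -> int:
    """Billing units for one recognition stream (illustrative sketch)."""
    audio_units = math.ceil(duration_seconds / SEGMENT_SECONDS)
    # One extra unit for the recognition settings message, as in the example;
    # a stream with no audio is therefore still billed as one unit.
    return audio_units + 1


def streaming_cost(durations: list[float]) -> Decimal:
    """Total cost of several streams at the example rate."""
    return sum(streaming_units(d) for d in durations) * PRICE_PER_UNIT


print(streaming_cost([5, 37]))  # 0.007680
```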

Synchronous speech recognition

The cost of using SpeechKit for synchronous speech recognition with the following parameters:

Number of audio fragments: 2
Audio fragment duration: 5 seconds, 37 seconds

Cost calculation:

(1 + 3) × 0.001280 = $ 0.00512

Total: $ 0.00512

Where:

1 is the number of billing units charged for the first 5-second audio fragment rounded up to 15 seconds.
3 is the number of billing units charged for the second 37-second audio fragment rounded up to 45 seconds.
0.001280: Cost per billing unit.
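A Python sketch of this example with a hypothetical function name. Unlike the streaming example, no unit is added for a settings message.

```python
import math
from decimal import Decimal

PRICE_PER_UNIT = Decimal("0.001280")  # USD without VAT, as in this example
SEGMENT_SECONDS = 15


def sync_cost(durations: list[float]) -> Decimal:
    """Synchronous recognition cost; each fragment is rounded up separately."""
    units = sum(math.ceil(d / SEGMENT_SECONDS) for d in durations)
    return units * PRICE_PER_UNIT


print(sync_cost([5, 37]))  # 0.005120
```

Rounding each fragment separately matters: rounding the summed 42 seconds instead would give 3 units rather than 4.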

Asynchronous speech recognition

The cost of using SpeechKit for asynchronous speech recognition with the following parameters:

Number of audio fragments: 4
Audio fragment duration: 5 seconds, 5 seconds, 15.5 seconds, 15.5 seconds
Number of channels in audio fragments: 1, 3, 2, 4

Cost calculation:

(15 + 30 + 16 + 32) × 0.000128 = $ 0.011904

Total: $ 0.011904

Where:

15 is the number of billing units charged for the first single-channel 5-second audio fragment rounded up to 2 channels and 15 seconds.
30 is the number of billing units charged for the second 3-channel 5-second audio fragment rounded up to 4 channels and 15 seconds.
16 is the number of billing units charged for the third 2-channel 15.5-second audio fragment rounded up to 16 seconds.
32 is the number of billing units charged for the fourth 4-channel 15.5-second audio fragment rounded up to 16 seconds.
0.000128: Cost per billing unit.
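A Python sketch of this example, applying max(15, ceil(duration)) × ceil(channels / 2) to each fragment. The per-unit rate is a parameter because this example uses $0.000128, while the Pricing table in this article lists $0.000080 for asynchronous file recognition. The function names are hypothetical.

```python
import math
from decimal import Decimal


def async_billed_seconds(duration_seconds: float, channels: int) -> int:
    # Whole seconds with a 15-second minimum, times the number of channel pairs.
    return max(15, math.ceil(duration_seconds)) * ((channels + 1) // 2)


def async_cost(fragments: list[tuple[float, int]], rate: Decimal) -> Decimal:
    """Cost of (duration_seconds, channels) fragments at the given per-unit rate."""
    units = sum(async_billed_seconds(d, c) for d, c in fragments)
    return units * rate


fragments = [(5, 1), (5, 3), (15.5, 2), (15.5, 4)]
print(async_cost(fragments, Decimal("0.000128")))  # 0.011904
```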

Asynchronous speech recognition in deferred mode

The cost of using SpeechKit for asynchronous speech recognition in deferred mode with the following parameters:

Number of audio fragments: 3
Audio fragment duration: 2 seconds, 14 seconds, 19.5 seconds
Number of channels in audio fragments: 2, 3, 4

Cost calculation:

(15 + 30 + 40) × 0.000032 = $ 0.00272

Total: $ 0.00272

Where:

15 is the number of billing units charged for the first 2-channel 2-second audio fragment rounded up to 15 seconds.
30 is the number of billing units charged for the second 3-channel 14-second audio fragment rounded up to 4 channels and 15 seconds.
40 is the number of billing units charged for the third 4-channel 19.5-second audio fragment rounded up to 20 seconds.
0.000032: Cost per billing unit.
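Deferred mode uses the same rounding rules as asynchronous recognition, so a Python sketch differs only in its inputs and rate. The rate is passed in as a parameter: this example uses $0.000032, while the Pricing table in this article lists $0.000020 for deferred mode. The function names are hypothetical.

```python
import math
from decimal import Decimal


def async_billed_seconds(duration_seconds: float, channels: int) -> int:
    # Whole seconds with a 15-second minimum, times the number of channel pairs.
    return max(15, math.ceil(duration_seconds)) * ((channels + 1) // 2)


def deferred_cost(fragments: list[tuple[float, int]], rate: Decimal) -> Decimal:
    """Cost of (duration_seconds, channels) fragments in deferred mode."""
    return sum(async_billed_seconds(d, c) for d, c in fragments) * rate


fragments = [(2, 2), (14, 3), (19.5, 4)]
print([async_billed_seconds(d, c) for d, c in fragments])  # [15, 30, 40]
print(deferred_cost(fragments, Decimal("0.000032")))  # 0.002720
```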

Pricing

Speech synthesis

Service	Rate for the billable unit, without VAT
Speech synthesis using API v1, for 1 million characters	$10.560000
Speech synthesis using API v3, per request	$0.001280

SpeechKit Brand Voice

Service	Price per unit, without VAT
SpeechKit Brand Voice Self Service model hosting, per month	Contact us
SpeechKit Brand Voice Premium model hosting, per month	Contact us
Request to SpeechKit Brand Voice Call Center model	$0.001280
Request to SpeechKit Brand Voice Self Service model	$0.001280
Request to SpeechKit Brand Voice Premium model	$0.001280

Speech recognition

Service	Rate for the billable unit, without VAT
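Streaming recognition, per 15 seconds of single-channel audio	$0.001280
Synchronous file recognition, per 15 seconds of single-channel audio	$0.001280
Asynchronous file recognition, per second of two-channel audio	$0.000080
Asynchronous file recognition in deferred mode, per second of two-channel audio	$0.000020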
