Pricing for SpeechKit
To calculate the cost of using the service, use the calculator or see the prices on this page.
Note
The currency of the service rates (prices) depends on the company you signed your contract with:
- Prices in US dollars apply to customers of Yandex Services AG.
- Prices in Russian roubles apply to customers of Yandex.Cloud LLC.
What goes into the cost of using SpeechKit
Using speech synthesis
The cost of using the SpeechKit API for speech synthesis is calculated based on the voice used and the total number of characters sent to generate speech from text in a calendar month (Reporting period).
The character count for a request includes spaces and special characters. An empty request is billed as one character.
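The counting rule above can be sketched as follows. This is a minimal illustration, not the service's actual metering code: the function name `billable_characters` is hypothetical, and counting Unicode code points with `len()` is an assumption, since this page does not say how multi-byte characters are counted.

```python
def billable_characters(text: str) -> int:
    """Characters billed for one synthesis request.

    Spaces and special characters count like any other character.
    An empty request is billed as one character.
    Assumption: one Unicode code point equals one character.
    """
    return max(len(text), 1)


# Three hypothetical requests sent in one reporting period.
requests = ["Hello, world!", "", "Hi there"]
total = sum(billable_characters(t) for t in requests)
print(total)  # 13 + 1 + 8 = 22
```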
Using speech recognition
The final cost depends on the duration of the recognized audio. Cost is calculated for a calendar month (Reporting period).
Short audio recognition
These rules apply to short audio recognition by request and in streaming mode.
Billable unit: a 15-second segment of single-channel audio. Shorter segments are rounded up to a full unit (for example, 1 second is billed as 15 seconds).
Warning
In streaming mode, billing begins when a message with recognition settings is sent. Even if you don't send any audio after this message, it's treated as 1 consumed billable unit.
Examples:
- One 37-second audio fragment is billed as 45 seconds.
  Explanation: the audio is divided into two 15-second segments and one 7-second segment. The last segment is rounded up to 15 seconds. Total: 3 segments of 15 seconds each.
- Two audio fragments of 5 and 8 seconds are billed as 30 seconds.
  Explanation: the length of each fragment is rounded up to 15 seconds. Total: 2 segments of 15 seconds each.
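The rounding in these examples can be reproduced with a short sketch. The function names are hypothetical, and the code assumes single-channel audio with each fragment rounded up on its own, as the examples describe. The minimum of one unit follows the streaming-mode warning above.

```python
import math

SEGMENT_SECONDS = 15  # one billable unit for short audio


def short_audio_units(duration_seconds: float) -> int:
    """Billable units for one fragment: round up to whole 15-second segments.

    A fragment never costs less than one unit, which also covers a
    streaming session where settings were sent but no audio followed.
    """
    return max(1, math.ceil(duration_seconds / SEGMENT_SECONDS))


def short_audio_billed_seconds(durations: list[float]) -> int:
    """Total billed seconds for several fragments, each rounded separately."""
    return sum(short_audio_units(d) for d in durations) * SEGMENT_SECONDS


print(short_audio_billed_seconds([37]))    # 45
print(short_audio_billed_seconds([5, 8]))  # 30
```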
Long audio recognition
These rules apply to long audio recognition.
Billable unit: 1 second of two-channel audio. The audio length is rounded up to a whole number of seconds. The number of channels is rounded up to an even number.
The minimum billable amount is 15 seconds for every pair of channels. Audio with a shorter duration is billed as 15 seconds.
Examples of rounding audio length:
Length | Number of channels | Seconds charged |
---|---|---|
1 second | 1 | 15 |
1 second | 2 | 15 |
1 second | 3 | 30 |
15.5 seconds | 2 | 16 |
15.5 seconds | 4 | 32 |
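The table rows are consistent with the following sketch, which multiplies the number of channel pairs by the rounded duration. The function name is hypothetical and the formula is inferred from the rules and examples on this page, so treat it as an estimate rather than the exact billing logic.

```python
import math

MIN_SECONDS_PER_PAIR = 15  # minimum billable amount per pair of channels


def long_audio_billed_seconds(duration_seconds: float, channels: int) -> int:
    """Seconds charged for one long audio file.

    The channel count is rounded up to an even number (counted in pairs),
    the duration is rounded up to whole seconds, and each pair is billed
    for at least 15 seconds.
    """
    channel_pairs = (channels + 1) // 2
    seconds = max(MIN_SECONDS_PER_PAIR, math.ceil(duration_seconds))
    return channel_pairs * seconds


# (length in seconds, channels) taken from the table above.
for length, channels in [(1, 1), (1, 2), (1, 3), (15.5, 2), (15.5, 4)]:
    print(length, channels, long_audio_billed_seconds(length, channels))
```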
Pricing
Speech synthesis
Service | Rate per 1 million characters, without VAT |
---|---|
Speech synthesis using standard voices | $2.346805 |
Speech synthesis using premium voices | $15.384615 |
Service | Rate per 1 million characters, with VAT |
---|---|
Speech synthesis using standard voices | ₽183.0508 |
Speech synthesis using premium voices | ₽1200 |
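As a rough illustration of how these rates translate into a monthly charge, the sketch below multiplies the character total by the rate per million. The dictionary keys and the function name are hypothetical, and the result is not rounded because this page does not state how invoice amounts are rounded.

```python
from decimal import Decimal

# Rates per 1 million characters, copied from the tables above.
# USD rates are without VAT, RUB rates are with VAT.
SYNTHESIS_RATES = {
    ("standard", "USD"): Decimal("2.346805"),
    ("premium", "USD"): Decimal("15.384615"),
    ("standard", "RUB"): Decimal("183.0508"),
    ("premium", "RUB"): Decimal("1200"),
}


def synthesis_cost(characters: int, voice: str, currency: str) -> Decimal:
    """Estimated cost of synthesizing `characters` in one reporting period."""
    rate = SYNTHESIS_RATES[(voice, currency)]
    return rate * characters / Decimal(1_000_000)


# 250,000 characters with a premium voice, billed in roubles.
print(synthesis_cost(250_000, "premium", "RUB"))  # 300
```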
Speech recognition
Service | Rate for the billable unit, without VAT |
---|---|
Short audio recognition | $0.001953 |
Long audio recognition | $0.000128 |
Long audio recognition by the deferred-general model | $0.000032 |
Service | Rate for the billable unit, with VAT |
---|---|
Short audio recognition | ₽0.1524 |
Long audio recognition | ₽0.01 |
Long audio recognition by the deferred-general model | ₽0.0025 |
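A recognition charge can be estimated by multiplying the number of billable units by the rate for the service. The sketch below uses the rouble rates (with VAT) from the table. The service keys and the function name are hypothetical, and the unit counts must already be rounded according to the short and long audio rules described earlier on this page.

```python
from decimal import Decimal

# Rates per billable unit in roubles, with VAT, copied from the table above.
# Short audio: one unit is a 15-second segment of single-channel audio.
# Long audio: one unit is 1 second of two-channel audio.
RECOGNITION_RATES_RUB = {
    "short": Decimal("0.1524"),
    "long": Decimal("0.01"),
    "long_deferred_general": Decimal("0.0025"),
}


def recognition_cost(billable_units: int, service: str) -> Decimal:
    """Estimated cost for an already-rounded number of billable units."""
    return RECOGNITION_RATES_RUB[service] * billable_units


# A 37-second short audio fragment is 3 units.
print(recognition_cost(3, "short"))  # 0.4572
# A 15.5-second, 4-channel long audio file is charged as 32 seconds.
print(recognition_cost(32, "long"))  # 0.32
```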