Pricing for SpeechKit
To calculate the cost of using the service, use the calculator or see the prices on this page.
What goes into the cost of using SpeechKit
Using speech synthesis
The cost of using SpeechKit for speech synthesis depends on the version of the API used. For API v1, the cost is calculated based on the total number of characters sent to generate speech from text in a calendar month (Reporting period).
The character count of a request includes spaces and special characters. An empty request is billed as one character.
The cost of using API v3 depends on the number of synthesis requests sent. Each synthesis request is limited to 250 characters and 24 seconds.
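The two synthesis billing models above can be sketched as follows. This is a minimal illustration, not an official calculator: the rates are copied from the pricing tables on this page, the function names are hypothetical, and it assumes that one character equals one Unicode code point and that no rounding is applied to the resulting amount.

```python
from decimal import Decimal

# Rates copied from the pricing tables on this page (USD, without VAT).
V1_RATE_PER_MILLION_CHARS = Decimal("10.56")
V3_RATE_PER_REQUEST = Decimal("0.00128")


def v1_billable_chars(texts):
    """Total billable characters for a batch of API v1 requests.

    Spaces and special characters count. An empty request is billed
    as one character. Assumption: a character is one Unicode code point.
    """
    return sum(max(len(text), 1) for text in texts)


def v1_cost(texts):
    """Estimated API v1 cost: billed per character sent."""
    return V1_RATE_PER_MILLION_CHARS * v1_billable_chars(texts) / Decimal(1_000_000)


def v3_cost(request_count):
    """Estimated API v3 cost: billed per synthesis request,
    regardless of how many of the allowed 250 characters it uses."""
    return V3_RATE_PER_REQUEST * request_count


if __name__ == "__main__":
    print(v1_billable_chars(["Hello, world!", ""]))  # 14
    print(v3_cost(1000))
```

Decimal arithmetic is used here only to avoid floating-point drift when summing many small amounts.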
Using speech recognition
The cost of using SpeechKit for speech recognition depends on the recognition type and duration of a recognized audio fragment. The cost is calculated for a calendar month (Reporting period).
Streaming speech recognition
The cost of using SpeechKit streaming recognition is calculated based on the pricing rules for synchronous recognition.
Synchronous recognition
These rules apply to synchronous recognition and streaming mode recognition when using API v2 and API v3.
Billable unit: a 15-second segment of single-channel audio. Shorter segments are rounded up (1 second becomes 15 seconds).
Warning
In streaming mode, billing begins when a message with recognition settings is sent. Even if you don't send any audio after this message, it's treated as 1 consumed billable unit.
Examples:
One 37-second audio fragment is billed as 45 seconds.
Explanation: the audio is divided into two 15-second segments and one 7-second segment. The last segment is rounded up to 15 seconds. Total: 3 segments of 15 seconds each.
Two audio fragments of 5 and 8 seconds are billed as 30 seconds.
Explanation: the length of each fragment is rounded up to 15 seconds. Total: 2 segments of 15 seconds each.
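The rounding in these examples can be reproduced with a short sketch. The function names are hypothetical and the code covers single-channel audio only. Treating a zero-length fragment as one unit is an assumption carried over from the streaming rule in the warning above.

```python
import math

SEGMENT_SECONDS = 15  # size of one billable unit


def sync_billable_units(duration_seconds):
    """Number of 15-second units billed for one single-channel fragment.

    Each fragment is rounded up on its own. Assumption: a fragment with
    no audio still consumes one unit, as in streaming mode.
    """
    return max(1, math.ceil(duration_seconds / SEGMENT_SECONDS))


def sync_billed_seconds(durations):
    """Total seconds billed for a list of fragment durations."""
    return sum(sync_billable_units(d) for d in durations) * SEGMENT_SECONDS


if __name__ == "__main__":
    print(sync_billed_seconds([37]))    # 45
    print(sync_billed_seconds([5, 8]))  # 30
```

Rounding is applied per fragment, not to the combined length, which is why 5 and 8 seconds are billed as 30 seconds and not 15.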
Asynchronous recognition
These rules apply when using asynchronous recognition.
Billable unit: 1 second of two-channel audio. Fractions of a second are rounded up to the next whole second. The number of channels is rounded up to an even number.
The minimum billable amount is 15 seconds for every pair of channels. Audio that is shorter is billed as 15 seconds.
Examples of rounding audio length:
Length | Number of channels | Seconds charged |
---|---|---|
1 second | 1 | 15 |
1 second | 2 | 15 |
1 second | 3 | 30 |
15.5 seconds | 2 | 16 |
15.5 seconds | 4 | 32 |
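The table rows are consistent with the following sketch: round the length up to whole seconds, apply the 15-second minimum, then multiply by the number of channel pairs. The function name is hypothetical, and the formula is inferred from the rules and examples above, not taken from an official implementation.

```python
import math

MIN_SECONDS_PER_PAIR = 15  # minimum billable amount per pair of channels


def async_billed_seconds(duration_seconds, channels):
    """Seconds charged for one audio file in asynchronous recognition."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    # Round the length up to whole seconds, then apply the minimum.
    seconds = max(MIN_SECONDS_PER_PAIR, math.ceil(duration_seconds))
    # Rounding the channel count up to an even number is the same as
    # counting pairs of channels and rounding the pair count up.
    channel_pairs = math.ceil(channels / 2)
    return seconds * channel_pairs


if __name__ == "__main__":
    for length, channels in [(1, 1), (1, 2), (1, 3), (15.5, 2), (15.5, 4)]:
        print(length, channels, async_billed_seconds(length, channels))
```

Run as a script, this prints 15, 15, 30, 16, and 32 seconds for the five rows of the table.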
Pricing
Speech synthesis
Service | Rate per billable unit, without VAT |
---|---|
Speech synthesis using API v1, per 1 million characters | $10.560000 |
Speech synthesis using API v3, per request | $0.001280 |
Brand Voice
Service | Rate per request, without VAT |
---|---|
SpeechKit Brand Voice API request | $0.001280 |
Hosting Brand Voice models
Service | Rate per month, without VAT |
---|---|
SpeechKit Brand Voice Adaptive | $1923.076950 |
SpeechKit Brand Voice Full | By request |
Speech recognition
Service | Rate per billable unit, without VAT |
---|---|
Streaming recognition | $0.001280 |
Synchronous file recognition | $0.001280 |
Asynchronous file recognition | $0.000128 |
Asynchronous file recognition, deferred mode model | $0.000032 |
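As a rough illustration of how these rates combine with billable units, the sketch below multiplies unit counts by the rates in the table. The dictionary keys and the function name are hypothetical labels. It assumes that one unit is a 15-second segment for streaming and synchronous recognition and one charged second for asynchronous recognition, and that no further rounding is applied to the total.

```python
from decimal import Decimal

# Rates per billable unit, copied from the table above (USD, without VAT).
RECOGNITION_RATES = {
    "streaming": Decimal("0.001280"),
    "sync": Decimal("0.001280"),
    "async": Decimal("0.000128"),
    "async_deferred": Decimal("0.000032"),
}


def recognition_cost(units_by_service):
    """Estimated cost for a mapping of service label -> billable units."""
    return sum(
        (RECOGNITION_RATES[name] * units for name, units in units_by_service.items()),
        Decimal(0),
    )


if __name__ == "__main__":
    # 3 synchronous units (one 37-second fragment) plus
    # 32 asynchronous units (15.5 seconds of four-channel audio).
    print(recognition_cost({"sync": 3, "async": 32}))
```

An unknown service label raises `KeyError`, so a typo in a label does not silently produce a zero cost.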