Pricing for SpeechKit

To calculate the cost of using the service, use the calculator or see the prices on this page.

Note

The currency of the service rates (prices) depends on the company you have a contract with:

  • prices in US dollars are applicable to customers of Yandex Services AG,
  • prices in Russian roubles are applicable to customers of Yandex.Cloud LLC.

What goes into the cost of using SpeechKit

Using speech synthesis

The cost of using the SpeechKit API for speech synthesis is calculated based on the voice used and the total number of characters sent to generate speech from text in a calendar month (Reporting period).

The character count for a request includes spaces and special characters. An empty request is billed as one character.
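The counting rule can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that one Unicode code point counts as one character, which this page does not specify.

```python
from typing import Iterable


def billable_characters(text: str) -> int:
    """Characters billed for a single synthesis request.

    Spaces and special characters count like any other character.
    An empty request is billed as one character.
    Assumption: one Unicode code point equals one billed character.
    """
    return max(len(text), 1)


def monthly_billable_characters(requests: Iterable[str]) -> int:
    """Total characters billed for all requests in a reporting period."""
    return sum(billable_characters(text) for text in requests)


if __name__ == "__main__":
    # "Hi, there!" has 10 characters including the comma, space and "!".
    print(monthly_billable_characters(["Hi, there!", ""]))
```

Under these assumptions, the two requests above are billed as 11 characters: 10 for the text and 1 for the empty request.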

Using speech recognition

The final cost depends on the duration of the recognized audio. Cost is calculated for a calendar month (Reporting period).

Short audio recognition

These rules apply to short audio recognition by request and in streaming mode.

Billable unit — a 15-second segment of single-channel audio. Shorter segments are rounded up (1 second becomes 15 seconds).

Important

In streaming mode, billing begins when a message with recognition settings is sent. Even if you don't send any audio after this message, it's treated as 1 consumed billable unit.

Examples:

  • One 37-second audio fragment is billed as 45 seconds.

    Explanation: the audio is divided into two 15-second segments and one 7-second segment. The last segment is rounded up to 15 seconds. Total: 3 segments of 15 seconds each.

  • Two audio fragments of 5 and 8 seconds are billed as 30 seconds in total.

    Explanation: the length of each fragment is rounded up to 15 seconds. Total: 2 segments of 15 seconds each.
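The rounding in these examples can be reproduced with a short Python sketch. The function names are hypothetical, and treating zero-length audio as one unit is an assumption that extends the streaming rule above to every request.

```python
import math
from typing import Iterable

SEGMENT_SECONDS = 15  # one billable unit of single-channel audio


def short_audio_units(duration_seconds: float) -> int:
    """Billable units for one fragment, rounding the last segment up.

    Assumption: a fragment with no audio still consumes one unit,
    as described for streaming mode.
    """
    return max(math.ceil(duration_seconds / SEGMENT_SECONDS), 1)


def short_audio_billed_seconds(durations: Iterable[float]) -> int:
    """Total billed seconds; each fragment is rounded separately."""
    return sum(short_audio_units(d) for d in durations) * SEGMENT_SECONDS


if __name__ == "__main__":
    print(short_audio_billed_seconds([37]))    # one 37-second fragment
    print(short_audio_billed_seconds([5, 8]))  # two short fragments
```

Each fragment is rounded on its own, which is why 5 and 8 seconds are billed as 30 seconds rather than as a single 15-second unit.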

Recognition of long audio fragments

These rules apply to long audio recognition.

Billable unit: 1 second of two-channel audio. Fractions of a second are rounded up to a whole second. The number of channels is rounded up to an even number.

The minimum billable amount is 15 seconds for every pair of channels. Audio with a shorter duration is billed as 15 seconds.

Examples of rounding audio length:

Length          Number of channels   Seconds charged
1 second        1                    15
1 second        2                    15
1 second        3                    30
15.5 seconds    2                    16
15.5 seconds    4                    32
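A minimal Python sketch of these rounding rules, checked against the table rows. The function name is hypothetical, and the sketch assumes that the 15-second minimum is applied to each channel pair after the duration is rounded up.

```python
import math


def long_audio_billed_seconds(duration_seconds: float, channels: int) -> int:
    """Seconds charged for one long audio file.

    The channel count is rounded up to an even number, so channels are
    billed in pairs. For each pair, the duration is rounded up to a
    whole second, with a minimum of 15 seconds.
    """
    channel_pairs = math.ceil(channels / 2)
    seconds_per_pair = max(math.ceil(duration_seconds), 15)
    return channel_pairs * seconds_per_pair


if __name__ == "__main__":
    for duration, channels in [(1, 1), (1, 2), (1, 3), (15.5, 2), (15.5, 4)]:
        print(duration, channels, long_audio_billed_seconds(duration, channels))
```

Run as a script, this prints 15, 15, 30, 16 and 32 seconds for the five combinations, matching the table.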

Pricing

Speech synthesis

Service                                  Rate per 1 million characters, without VAT
Speech synthesis using standard voices   $2.346805
Speech synthesis using premium voices    $15.384615

Service                                  Rate per 1 million characters, with VAT
Speech synthesis using standard voices   ₽183.0508
Speech synthesis using premium voices    ₽1200

Speech recognition

Service                   Rate for the billable unit, without VAT
Short audio recognition   $0.001953
Long audio recognition    $0.000128

Service                   Rate for the billable unit, with VAT
Short audio recognition   ₽0.1524
Long audio recognition    ₽0.01
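
As a rough illustration, the US dollar rates above can be combined with usage totals to estimate a monthly charge. The names below are hypothetical, the rates are copied from this page and may change, and the premium voice rate is read as 15.384615. How the invoiced amount is rounded is not stated here, so the sketch returns unrounded values.

```python
from decimal import Decimal

# Rates in US dollars, without VAT, copied from the tables on this page.
SYNTHESIS_RATE_PER_MILLION = {
    "standard": Decimal("2.346805"),
    "premium": Decimal("15.384615"),
}
RECOGNITION_RATE_PER_UNIT = {
    "short": Decimal("0.001953"),  # unit: 15 seconds of single-channel audio
    "long": Decimal("0.000128"),   # unit: 1 second of two-channel audio
}


def synthesis_cost(characters: int, voice: str = "standard") -> Decimal:
    """Estimated synthesis cost for a number of billed characters."""
    return SYNTHESIS_RATE_PER_MILLION[voice] * characters / Decimal(1_000_000)


def recognition_cost(billable_units: int, mode: str = "short") -> Decimal:
    """Estimated recognition cost for a number of billable units."""
    return RECOGNITION_RATE_PER_UNIT[mode] * billable_units


if __name__ == "__main__":
    print(synthesis_cost(250_000))       # 250,000 characters, standard voice
    print(recognition_cost(3))           # 37 seconds of short audio (3 units)
    print(recognition_cost(32, "long"))  # 15.5 seconds, 4 channels (32 units)
```

Decimal is used instead of float so that rates with six decimal places multiply without binary rounding error.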