© 2023 Yandex.Cloud LLC

Pricing for SpeechKit

Written by
Yandex Cloud
  • What goes into the cost of using SpeechKit
    • Using speech synthesis
    • Using speech recognition
  • Pricing
    • Speech synthesis
    • Hosting Brand Voice models
    • Speech recognition

To calculate the cost of using the service, use the calculator or see the prices on this page.

What goes into the cost of using SpeechKit

Using speech synthesis

The cost of using SpeechKit for speech synthesis depends on the version of the API used. For API v1, the cost is calculated based on the total number of characters sent to generate speech from text in a calendar month (Reporting period).

The number of characters in a request includes spaces and special characters. An empty request costs the same as a request containing one character.

The cost of using API v3 depends on the number of synthesis requests sent. Each synthesis request is limited to 250 characters and 24 seconds.
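The synthesis rules above can be sketched as a small estimator. This is an illustration, not an official calculator: the function names are hypothetical, the rates are copied from the pricing table on this page, and two things are assumptions: that `len()` counts characters the same way SpeechKit does, and that the API v1 rate is prorated linearly per character.

```python
from decimal import Decimal

# Rates copied from the pricing table on this page (USD, without VAT).
RATE_V1_PER_MILLION_CHARS = Decimal("10.56")
RATE_V3_PER_REQUEST = Decimal("0.00128")


def v1_billable_chars(texts):
    """Count billable characters across API v1 requests.

    Spaces and special characters count; an empty request is billed
    as one character. Assumption: len() matches SpeechKit's counting.
    """
    return sum(max(len(text), 1) for text in texts)


def v1_cost(texts):
    """Estimated API v1 cost, assuming linear proration per character."""
    return RATE_V1_PER_MILLION_CHARS * v1_billable_chars(texts) / Decimal(1_000_000)


def v3_cost(request_count):
    """Estimated API v3 cost: a flat rate per synthesis request."""
    return RATE_V3_PER_REQUEST * request_count
```

For example, `v1_billable_chars(["Hello, world!", ""])` counts 13 characters for the first request and 1 for the empty one.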

Using speech recognition

The cost of using SpeechKit for speech recognition depends on the recognition type and the duration of the recognized audio. The cost is calculated for a calendar month (Reporting period).

Streaming speech recognition

The cost of using SpeechKit streaming recognition is calculated based on the pricing rules for synchronous recognition.

Synchronous recognition

These rules apply to synchronous recognition and streaming mode recognition when using API v2 and API v3.

The billable unit is a 15-second segment of single-channel audio. Shorter segments are rounded up (1 second becomes 15 seconds).

Warning

In streaming mode, billing begins when a message with recognition settings is sent. Even if you don't send any audio after this message, one billable unit is charged.

Examples:

  • One 37-second audio fragment is billed as 45 seconds.

    Explanation: the audio is divided into two 15-second segments and one 7-second segment. The last segment is rounded up to 15 seconds. Total: 3 segments of 15 seconds each.

  • Two audio fragments of 5 and 8 seconds are billed as 30 seconds.

    Explanation: each fragment is rounded up to 15 seconds. Total: 2 segments of 15 seconds each.
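The rounding in these examples can be reproduced with a short sketch. The function names are hypothetical, and treating a zero-length fragment as one unit is an assumption based on the streaming warning above.

```python
import math

SEGMENT_SECONDS = 15  # billable unit for synchronous and streaming recognition


def sync_billable_units(duration_seconds):
    """Number of 15-second units billed for one single-channel fragment.

    A partial segment counts as a full unit. Assumption: a fragment with
    no audio still costs one unit, as in the streaming-mode warning.
    """
    return max(math.ceil(duration_seconds / SEGMENT_SECONDS), 1)


def sync_billed_seconds(durations):
    """Total billed seconds; each fragment is rounded up separately."""
    return sum(sync_billable_units(d) for d in durations) * SEGMENT_SECONDS
```

Because each fragment is rounded on its own, two short fragments cost more than one fragment of the same total length: `sync_billed_seconds([5, 8])` gives 30, while `sync_billed_seconds([13])` gives 15.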

Asynchronous recognition

These rules apply when using asynchronous recognition.

The billable unit is 1 second of two-channel audio. Fractions of a second are rounded up to the next whole second. The number of channels is rounded up to an even number.

The minimum billable amount is 15 seconds for every pair of channels. Shorter audio is billed as 15 seconds.

Examples of rounding audio length:

| Length | Number of channels | Seconds charged |
| --- | --- | --- |
| 1 second | 1 | 15 |
| 1 second | 2 | 15 |
| 1 second | 3 | 30 |
| 15.5 seconds | 2 | 16 |
| 15.5 seconds | 4 | 32 |
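The rounding rules for asynchronous recognition can be written as a single function. This is a sketch with a hypothetical name, inferred from the rules and the example table above rather than taken from the service itself.

```python
import math

MIN_BILLED_SECONDS = 15  # minimum charge per pair of channels


def async_billed_seconds(duration_seconds, channels):
    """Seconds charged for one audio file in asynchronous recognition."""
    # Channels are billed in pairs: 1 or 2 channels -> 1 pair, 3 or 4 -> 2 pairs.
    channel_pairs = math.ceil(channels / 2)
    # Round the length up to a whole second, then apply the 15-second minimum.
    seconds = max(math.ceil(duration_seconds), MIN_BILLED_SECONDS)
    return channel_pairs * seconds
```

Calling it with the rows of the table, for example `async_billed_seconds(15.5, 4)`, reproduces the "Seconds charged" column.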

Pricing

Speech synthesis

| Service | Rate for billable unit, without VAT |
| --- | --- |
| Speech synthesis using API v1, per 1 million characters | $10.560000 |
| Speech synthesis using API v3, per request | $0.001280 |

Brand Voice

| Service | Rate per request, without VAT |
| --- | --- |
| SpeechKit Brand Voice API request | $0.001280 |

Hosting Brand Voice models

| Service | Rate per month, without VAT |
| --- | --- |
| SpeechKit Brand Voice Adaptive | $1923.076950 |
| SpeechKit Brand Voice Full | By request |

Speech recognition

| Service | Rate for the billable unit, without VAT |
| --- | --- |
| Streaming recognition | $0.001280 |
| Synchronous file recognition | $0.001280 |
| Asynchronous file recognition | $0.000128 |
| Asynchronous file recognition, deferred mode model | $0.000032 |
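As a rough illustration, the rates above can be combined with the billable units defined earlier on this page to estimate a charge. The function names are hypothetical. The sketch assumes the unit is a 15-second segment for streaming and synchronous recognition and one charged second for asynchronous recognition; check the calculator for authoritative figures.

```python
import math
from decimal import Decimal

# Rates copied from the table above (USD per billable unit, without VAT).
RATE_SYNC = Decimal("0.00128")             # streaming and synchronous
RATE_ASYNC = Decimal("0.000128")           # asynchronous
RATE_ASYNC_DEFERRED = Decimal("0.000032")  # asynchronous, deferred mode


def sync_cost(durations):
    """Cost of streaming or synchronous recognition of single-channel fragments."""
    # Each fragment is rounded up to whole 15-second units, minimum one unit.
    units = sum(max(math.ceil(d / 15), 1) for d in durations)
    return units * RATE_SYNC


def async_cost(duration_seconds, channels, deferred=False):
    """Cost of asynchronous recognition of one audio file."""
    channel_pairs = math.ceil(channels / 2)
    # Whole seconds, with a 15-second minimum per pair of channels.
    seconds = max(math.ceil(duration_seconds), 15)
    rate = RATE_ASYNC_DEFERRED if deferred else RATE_ASYNC
    return channel_pairs * seconds * rate
```

Under these assumptions, a 37-second fragment recognized synchronously is 3 units, so `sync_cost([37])` is 3 × $0.00128.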
