© 2022 Yandex.Cloud LLC

Evaluating the quality of STT models

Written by Yandex.Cloud
  • Before you start
  • Upload the library
  • Run the test case
  • See how to fix speech recognition errors
  • Evaluate the recognition quality for multiple audio recordings at once

Speech-to-Text (STT) recognition results on the Yandex SpeechKit platform depend on the recognition model you choose. To evaluate speech recognition quality, use the Word Error Rate (WER) metric: the lower its value, the more accurately the speech fragment was recognized. In SpeechKit, the metric is calculated with the stt_metrics library.
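
To build intuition before opening the notebook, WER can be computed by hand: it is the word-level edit distance between the recognized text and the reference markup, divided by the number of words in the reference. A minimal self-contained sketch in plain Python (the sample sentences are made up; this is not the stt_metrics implementation):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))
# 1 substitution out of 6 reference words ≈ 0.167
```

A perfect transcript gives WER 0; note that WER can exceed 1 when the recognizer inserts many extra words.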

To calculate the WER metric in Yandex DataSphere using this library:

  1. Upload the library.
  2. Run the test case.
  3. See how to fix speech recognition errors.
  4. Evaluate the recognition quality for multiple audio recordings at once.

Before you start

  1. Create a project in DataSphere and open it.

  2. Clone the Git repository that contains the notebooks with the Yandex Cloud API usage examples:

    https://github.com/yandex-cloud/examples.git
    

    Cloning may take some time. Once it is complete, a folder with the cloned repository appears in the File Browser section.

  3. Open the examples/speechkitpro/estimate_quality folder and review the contents of the estimate_quality.ipynb notebook. The beginning of the notebook describes the task of evaluating the quality of STT models and the WER (Word Error Rate) metric.

Upload the library

  1. Select the cell with the code in the Evaluating the quality of STT models section:

    from stt_metrics import WER, ClusterReferences
    from stt_metrics.text_transform import Lemmatizer
    
  2. Run the selected cell. To do this, choose Run → Run Selected Cells or press Shift+Enter.

  3. Wait for the operation to complete.

As a result, the modules for evaluating the quality of STT models are loaded.

Note

If you refresh or close the browser tab where the notebook is running, the notebook state is saved: variables and the results of previous computations are not reset.

Run the test case

Go to the WER metric usage example section. The following operations are performed there:

  1. Uploading examples of:
    • Recognized speech
    • Text with markup
  2. Creating a WER() object for processing the data and calculating the metric.
  3. Creating an object with information for WER calculation.
  4. Calculating the WER metric to determine the recognition quality.
  5. Displaying calculation results:
    • The number of recognition errors.
    • The number of words in the compared texts.
    • Text alignment results.
    • The WER metric value.
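
The exact output of the notebook's WER() object belongs to the stt_metrics library and is not reproduced here. The stand-alone sketch below only illustrates what such a report contains (error count, word counts, alignment, and the metric itself), using Python's difflib as an approximate word aligner; the function name, sample sentences, and report fields are made up for illustration:

```python
from difflib import SequenceMatcher

def wer_report(reference, hypothesis):
    """Approximate WER report. difflib alignment is not guaranteed to be
    a minimal edit script, so treat this as an illustration only."""
    ref, hyp = reference.split(), hypothesis.split()
    sm = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    errors, alignment = 0, []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op != "equal":
            # replace, delete, and insert all count as recognition errors
            errors += max(i2 - i1, j2 - j1)
        alignment.append((op, ref[i1:i2], hyp[j1:j2]))
    return {
        "errors": errors,          # number of recognition errors
        "ref_words": len(ref),     # words in the marked-up text
        "hyp_words": len(hyp),     # words in the recognized text
        "alignment": alignment,    # how the two texts line up
        "wer": errors / len(ref),  # the metric itself
    }

report = wer_report("how to get to the theatre centre",
                    "how to get to a theater center")
print(report["errors"], report["wer"])  # 3 errors, WER = 3/7 ≈ 0.43
```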

To calculate the WER metric:

  1. Select all the cells with the code in the WER metric usage example section and run them.
  2. Wait for the operation to complete.

See how to fix speech recognition errors

Speech recognition errors may occur for the following reasons:

  • Markup artifacts. For example, spelling variants of the same word (such as realize and realise).
  • Different spelling of phrases. For example, the phrase theater center can be marked up as theater center, theatre center, theater centre, or theatre centre.
  • Variants of word forms. For example, gender and cases of pronouns, verb tenses, and so on.

To improve the value of the WER metric, fix the errors using the following techniques:

  • Preprocessing of the marked-up text. For example, you can delete markup artifacts.
  • Uploading a set of synonyms into the metric calculation model using the ClusterReferences() method.
  • Reducing words to their base form (lemmatization) using the Lemmatizer() method. The base form of a word is:
    • For nouns: the nominative case, singular.
    • For pronouns: the nominative case, singular.
    • For verbs: the infinitive.
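
These techniques can be illustrated without the library. In the sketch below, the CANONICAL table plays the role of a ClusterReferences-style set of synonyms and LEMMAS stands in for a Lemmatizer; both tables, the sample texts, and the position-by-position comparison are simplifications invented for this example, not the stt_metrics behavior:

```python
# Hypothetical normalization tables (illustration only, not stt_metrics):
CANONICAL = {"realise": "realize", "theatre": "theater", "centre": "center"}
LEMMAS = {"cats": "cat", "sat": "sit"}  # toy lemma table

def normalize(text):
    """Map spelling variants to one form, then reduce words to base forms."""
    words = [CANONICAL.get(w, w) for w in text.lower().split()]
    return [LEMMAS.get(w, w) for w in words]

def simple_wer(ref, hyp):
    """Position-by-position comparison; adequate for same-length examples."""
    errors = sum(r != h for r, h in zip(ref, hyp)) + abs(len(ref) - len(hyp))
    return errors / len(ref)

ref = "the cats sat near the theatre centre"
hyp = "the cat sit near the theater center"
print(simple_wer(ref.split(), hyp.split()))        # raw texts: 4/7 ≈ 0.57
print(simple_wer(normalize(ref), normalize(hyp)))  # after normalization: 0.0
```

Spelling and word-form differences that a human would not count as mistakes stop being counted as errors, so the metric drops without any change to the recognition model.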

To test the suggested techniques:

  1. Select all the cells with the code in the Fixing errors section and run them.
  2. Wait for the operation to complete.
  3. Check that the WER metric value decreases from 0.27 to 0.2, and then to 0.13, as the methods are applied in sequence.

Evaluate the recognition quality for multiple audio recordings at once

Go to the WER metric usage example (aggregate) section. It shows how to calculate the WER metric simultaneously for multiple fragments of marked-up text using the evaluate_wer method. The example contains two pairs of audio files with marked-up text and recognized speech.
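
The internals of evaluate_wer are not shown in this article. One common way to aggregate, sketched below under that assumption, is to pool the error and reference word counts across all recordings before dividing, so that long recordings weigh more than short ones (the function names and sample pairs here are illustrative):

```python
def word_errors(ref, hyp):
    """Word-level edit distance (insertions, deletions, substitutions)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[-1][-1]

def aggregate_wer(pairs):
    """pairs: (marked-up text, recognized text) for each audio recording."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words

pairs = [
    ("turn on the light", "turn on the lights"),             # 1 error, 4 words
    ("what is the weather today", "what is weather today"),  # 1 error, 5 words
]
print(aggregate_wer(pairs))  # 2 errors over 9 reference words ≈ 0.222
```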

To test the suggested method:

  1. Select all cells with code in the WER metric usage example (aggregate) section and run them.
  2. Wait for the operation to complete.
  3. Make sure that the WER metric is calculated for two marked-up texts.

