Speaker labeling in recognition results

Updated on February 12, 2024

Note

Speaker labeling in recognition results is available for testing in the general:rc model version.

In recognition results, API v3 can indicate which speaker uttered each recognized phrase.

Speaker labeling is only available for recognition in FULL_DATA mode and only for mono recordings. Recognition results distinguish no more than two speakers.
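Since speaker labeling only works with mono audio, it may be worth checking the channel count of a file before sending it. Below is a minimal sketch for WAV files that uses Python's standard wave module. The helper name is_mono_wav is an illustrative assumption and is not part of the API.

```python
import wave


def is_mono_wav(path: str) -> bool:
    """Return True if the WAV file at `path` has exactly one audio channel.

    Hypothetical helper for illustration; not part of the recognition API.
    """
    # Opening in "rb" mode gives read access to the WAV header fields.
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnchannels() == 1
```

A file for which the helper returns False would first need to be downmixed to one channel before speaker labeling can be applied.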

To enable speaker labeling, use the following session parameters:

Python 3

# stt_pb2 is the module generated from the API v3 proto files
import yandex.cloud.ai.stt.v3.stt_pb2 as stt_pb2

recognize_options = stt_pb2.StreamingOptions(
    speaker_labeling=stt_pb2.SpeakerLabelingOptions(
        # Enable speaker labeling
        speaker_labeling=stt_pb2.SpeakerLabelingOptions.SPEAKER_LABELING_ENABLED
    ),
    recognition_model=stt_pb2.RecognitionModelOptions(
        # Recognition model version
        model="general:rc",
        audio_format=stt_pb2.AudioFormatOptions(
            container_audio=stt_pb2.ContainerAudio(
                container_audio_type=stt_pb2.ContainerAudio.WAV
            )
        ),
        # Recognition mode
        audio_processing_type=stt_pb2.RecognitionModelOptions.FULL_DATA
    )
)

Recognition results will contain channel_tag labels with a value of either 0 or 1. Each value identifies one speaker. You can process the results as follows:

Python 3

import grpc

# `it` is the iterator of streaming responses returned by the recognition call
try:
    for r in it:
        event_type, alternatives = r.WhichOneof('Event'), None
        if event_type == 'final':
            alternatives = [a.text for a in r.final.alternatives]
        elif event_type == 'final_refinement':
            alternatives = [a.text for a in r.final_refinement.normalized_text.alternatives]
        else:
            # Skip partial results and other event types
            continue
        print(f'type={event_type}, alternatives={alternatives}, channel_tag={r.channel_tag}')
except grpc.RpcError as err:
    # Use the public gRPC error API rather than private classes and attributes
    print(f'Error code {err.code()}, message: {err.details()}')
    raise
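Once the phrases and their channel_tag values are collected, they can be turned into a dialogue-style transcript. The sketch below merges consecutive phrases from the same speaker into a single turn. The function name merge_speaker_turns, the (channel_tag, text) pair layout, and the sample phrases are assumptions made for illustration. The tags are written as strings here, which is also an assumption.

```python
from itertools import groupby


def merge_speaker_turns(phrases):
    """Merge consecutive phrases that share a channel_tag into one turn.

    `phrases` is an iterable of (channel_tag, text) pairs in recognition order.
    Returns a list of (channel_tag, joined_text) pairs.
    """
    turns = []
    # groupby only groups adjacent items, which preserves the order of turns.
    for tag, group in groupby(phrases, key=lambda pair: pair[0]):
        turns.append((tag, " ".join(text for _, text in group)))
    return turns


# Made-up sample data for illustration, not real recognition output.
sample = [
    ("0", "Hello."),
    ("0", "How can I help?"),
    ("1", "I have a question."),
    ("0", "Sure."),
]

for tag, text in merge_speaker_turns(sample):
    print(f"Speaker {tag}: {text}")
```

In the processing loop above, such pairs could be collected by appending (r.channel_tag, alternatives[0]) for each final_refinement event, provided the alternatives list is not empty.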
