Batch Transcription using Azure Speech Service

Amit Damle
May 11, 2020


Online conferencing has become the way to keep going in the current situation, and most educational institutions have started distance learning. My teenage son, who is adapting to this change, complained to me about information overload and said he prefers reading content over listening to it. That made me wonder whether companies that collect audio logs from their customer care calls will end up in the same situation. If you are tasked with converting audio to text or performing sentiment analysis, have a look at the following solution.

The Azure Speech to Text API provides a batch transcription option that satisfies the following requirements:

1. Batch transcription of audio files
2. Get customer sentiments from the audio calls
3. Identify specific intent (using LUIS) and act on it

Prerequisites:

1. Azure Subscription

2. Azure Speech Service Subscription

What is Azure Speech Service?

Speech Service converts speech to text and text to speech, and performs speech translation. It provides a Speech SDK, a Device SDK, and REST APIs for these operations.

The following table shows the features supported by the SDKs and the REST APIs.

Reference: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/overview

For more information on Azure Speech Service, please refer to the Azure Speech Service documentation.

Now that we are familiar with Speech Service, let's get back to solving the problem. The following architecture provides a base solution that can be customized to suit your needs when performing speech transcription or sentiment analysis on audio files:

Notes

1. Supported Audio Formats:

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription#the-batch-transcription-api

2. SAS URL for Recordings:

At the time of writing, Speech API version 2.0 does not accept a blob container URL or a list of SAS URLs for the files to be transcribed. The code that sends transcription requests should therefore list all files, create a SAS URI for each one, and submit an individual transcription request per audio file.
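The per-file approach can be sketched roughly as follows. This is a minimal illustration, not the solution's actual code: `build_transcription_requests` and the `make_sas_url` callback are hypothetical names, and the body fields (`recordingsUrl`, `locale`, `name`, `properties`) follow my reading of the v2.0 request format, so please verify them against the Swagger definition before relying on them. Creating the SAS URL itself would normally be done with the Azure Storage SDK and is deliberately left to the caller here.

```python
from typing import Callable, Dict, List


def build_transcription_requests(
    blob_names: List[str],
    make_sas_url: Callable[[str], str],  # hypothetical: returns a readable SAS URL for one blob
    locale: str = "en-US",
    add_sentiment: bool = True,
) -> List[Dict]:
    """Build one request body per audio file (one file per request)."""
    bodies = []
    for blob_name in blob_names:
        bodies.append(
            {
                # Assumed v2.0 field names; check the Swagger definition.
                "recordingsUrl": make_sas_url(blob_name),
                "locale": locale,
                "name": f"transcription-{blob_name}",
                "properties": {"AddSentiment": str(add_sentiment)},
            }
        )
    return bodies


if __name__ == "__main__":
    # Placeholder SAS generator, for illustration only.
    def fake_sas(blob_name: str) -> str:
        return f"https://example.invalid/audio/{blob_name}?sig=PLACEHOLDER"

    for body in build_transcription_requests(["call1.wav", "call2.wav"], fake_sas):
        print(body["name"], body["recordingsUrl"])
```

Each body would then be sent in its own POST to the transcriptions endpoint of your Speech resource, with the subscription key in the request headers.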

3. Sentiment Analysis:

Batch transcription provides a quick sentiment result with Negative, Neutral, and Positive as output. For advanced sentiment analysis, use the dedicated Text Analytics service, which is part of Microsoft Cognitive Services.
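As a small illustration of consuming this output, the sketch below picks the dominant label for each segment and counts the labels across a call. It assumes each segment carries a numeric score for Negative, Neutral, and Positive. That shape is an assumption on my part, and the function names are hypothetical, so adapt it to the result JSON you actually receive.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("Negative", "Neutral", "Positive")


def dominant_sentiment(scores: Dict[str, float]) -> str:
    """Return the label with the highest score (missing labels count as 0)."""
    return max(LABELS, key=lambda label: scores.get(label, 0.0))


def summarize_call(segments: Iterable[Dict[str, float]]) -> Counter:
    """Count how many segments of a call fall under each dominant label."""
    return Counter(dominant_sentiment(scores) for scores in segments)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    call = [
        {"Negative": 0.7, "Neutral": 0.2, "Positive": 0.1},
        {"Negative": 0.1, "Neutral": 0.3, "Positive": 0.6},
        {"Negative": 0.6, "Neutral": 0.3, "Positive": 0.1},
    ]
    print(summarize_call(call))
```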

4. Polling for Response:

The Speech API is asynchronous and returns a 202 status code in response to the POST request, which indicates that the request has been accepted and is being processed. The response also carries Location and Retry-After headers: Location specifies the URI to call to get the result, and Retry-After can be used to set the polling frequency. HTTP polling can be done with custom code or with Azure PaaS services such as Durable Functions or Logic Apps.
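For the custom-code route, a polling loop might look like the sketch below. It is only an outline under stated assumptions: `poll_until_done` and the injected `fetch` callable are hypothetical (in practice `fetch` would issue a GET against the Location URI with your subscription key), the terminal status values `Succeeded` and `Failed` are what I expect from the v2.0 API, and Retry-After is treated as a number of seconds.

```python
import time
from typing import Callable, Dict, Tuple

# fetch(url) -> (HTTP status code, response headers, parsed JSON body)
FetchResult = Tuple[int, Dict[str, str], Dict]


def poll_until_done(
    location: str,
    fetch: Callable[[str], FetchResult],
    sleep: Callable[[float], None] = time.sleep,
    default_delay: float = 10.0,
    max_attempts: int = 30,
) -> Dict:
    """Poll the Location URI until the transcription reaches a terminal state."""
    for _ in range(max_attempts):
        status_code, headers, body = fetch(location)
        # Assumed terminal states; check the API reference for the full list.
        if status_code == 200 and body.get("status") in ("Succeeded", "Failed"):
            return body
        try:
            # Assumes Retry-After is given in seconds, not as an HTTP date.
            delay = float(headers.get("Retry-After", default_delay))
        except ValueError:
            delay = default_delay
        sleep(delay)
    raise TimeoutError(f"No terminal status after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated service responses, for illustration only.
    replies = iter(
        [
            (200, {"Retry-After": "1"}, {"status": "Running"}),
            (200, {}, {"status": "Succeeded"}),
        ]
    )
    result = poll_until_done("https://example.invalid/transcriptions/1", lambda url: next(replies))
    print(result["status"])
```

Passing `fetch` and `sleep` in as parameters keeps the loop easy to test without network access. Durable Functions and Logic Apps can replace the loop entirely if you prefer a managed approach.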

References

Speech Service Overview

Swagger APIs

Azure Function Reference (Python)
