In the last article, I showed you how to create an Azure AI Speech Service. You can use this service to write an application that converts speech to text. You can access the service via API calls, but it is easier if you use an SDK. In this article, I will show how to use the .NET Speech Service SDK to convert speech into text.


Log into the Azure Portal and navigate to the Speech Service you created. See this article to learn how to create a Speech Service.


Fig. 1 shows the "Overview" blade of the Speech Service. This contains the region and the keys for this service. Copy and save the Region and one of the keys.



Fig. 1: Speech Service Overview Tab


To work with the SDK, your project needs to reference the Microsoft.CognitiveServices.Speech NuGet package. The following command installs the package in the current project.


dotnet add package Microsoft.CognitiveServices.Speech --version 1.30.0

This package contains two relevant classes: SpeechConfig and SpeechRecognizer. A SpeechConfig is created from our Speech Service's key and region (via the static FromSubscription method), so that the SDK knows which service to call to perform the recognition. The SpeechRecognizer class takes as constructor parameters an instance of SpeechConfig and an AudioConfig object. The AudioConfig object, which lives in the Microsoft.CognitiveServices.Speech.Audio namespace, allows you to specify the source of the speech, which can be a microphone or a file.


Here is the code:


string aiSvcKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
string aiSvcRegion = "xxxxx";
SpeechConfig speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);
using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

Replace the x's with the key and region of your service. The values are hard-coded here only for demo purposes; in a real application, you would store them in a configuration store or file, rather than in code.
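One simple way to keep the key and region out of source code is to read them from environment variables. The following is a minimal sketch, not the only approach: the SpeechSettings class, the GetRequired method, and the variable names SPEECH_KEY and SPEECH_REGION are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical helper (not part of the Speech SDK): reads a required
// setting from an environment variable and fails fast if it is missing.
static class SpeechSettings
{
    public static string GetRequired(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value;
    }
}
```

With this helper in place, the two hard-coded strings could be replaced by SpeechSettings.GetRequired("SPEECH_KEY") and SpeechSettings.GetRequired("SPEECH_REGION"), assuming you have set those two variables before running the application.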


Finally, we call the SpeechRecognizer's RecognizeOnceAsync method to transcribe the spoken words into text. This method returns a SpeechRecognitionResult object whose Text property contains the text that was recognized. SpeechRecognitionResult also has a Reason property, which indicates whether speech was recognized, no match was found, or the request was canceled. This can come in handy if an error occurs.


Here is the code:


Console.WriteLine("Speak into the default microphone!");
SpeechRecognitionResult result = await speechRecognizer.RecognizeOnceAsync();
Console.WriteLine($"You said: {result.Text}");
Console.WriteLine($"Result: {result.Reason}");
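The Reason property is most useful when you branch on it instead of only printing it. Below is a sketch of how that might look. The RecognitionReporter class and its Report method are hypothetical names for illustration; ResultReason, CancellationDetails, and CancellationReason come from the Microsoft.CognitiveServices.Speech namespace.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// Hypothetical helper: prints a message appropriate to the result's Reason.
static class RecognitionReporter
{
    public static void Report(SpeechRecognitionResult result)
    {
        switch (result.Reason)
        {
            case ResultReason.RecognizedSpeech:
                Console.WriteLine($"You said: {result.Text}");
                break;

            case ResultReason.NoMatch:
                // Audio was received, but no speech could be recognized in it.
                Console.WriteLine("No speech could be recognized.");
                break;

            case ResultReason.Canceled:
            {
                // Canceled results carry extra details, such as an error
                // caused by a wrong key or region.
                CancellationDetails details = CancellationDetails.FromResult(result);
                Console.WriteLine($"Canceled: {details.Reason}");
                if (details.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error code: {details.ErrorCode}");
                    Console.WriteLine($"Error details: {details.ErrorDetails}");
                }
                break;
            }

            default:
                Console.WriteLine($"Unexpected result: {result.Reason}");
                break;
        }
    }
}
```

You would call RecognitionReporter.Report(result) in place of the two Console.WriteLine calls that print the text and the reason.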

Below is the full code for a console app that allows the user to speak into a microphone, then prints what was spoken:


using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace SpeechToTextDemo
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Replace the x's with the key of your Speech Service
            string aiSvcKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
            string aiSvcRegion = "EastUS";

            // Tell the SDK which Speech Service to call
            SpeechConfig speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);
            // Use the default microphone as the audio source
            using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
            using SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

            Console.WriteLine("Speak into the default microphone!");
            // Listen for a single utterance and transcribe it
            SpeechRecognitionResult result = await speechRecognizer.RecognizeOnceAsync();
            Console.WriteLine($"You said: {result.Text}");
            Console.WriteLine($"Result: {result.Reason}");
        }
    }
}
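
As mentioned earlier, the AudioConfig can also point to a file instead of a microphone. Here is a sketch of that variation, using the SDK's AudioConfig.FromWavFileInput method. The file name sample.wav and the class name FileRecognitionSketch are assumptions for illustration; substitute the path of a WAV file on your machine, and replace the x's as before.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class FileRecognitionSketch
{
    static async Task Main()
    {
        // Replace the x's with the key and region of your Speech Service
        SpeechConfig speechConfig = SpeechConfig.FromSubscription(
            "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "xxxxx");

        // Hypothetical file name: read the speech from a WAV file
        // instead of the default microphone.
        using AudioConfig audioConfig = AudioConfig.FromWavFileInput("sample.wav");
        using SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // RecognizeOnceAsync transcribes a single utterance from the file.
        SpeechRecognitionResult result = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"Recognized: {result.Text}");
        Console.WriteLine($"Result: {result.Reason}");
    }
}
```

Only the line that creates the AudioConfig differs from the microphone version; the rest of the flow is unchanged.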


In this article, you learned how to convert speech to text in a C# application, using the Azure AI Speech Service.