In the last article, I showed you how to create an Azure AI Speech Service. You can use this service to write an application that creates speech from text. You can access the service via API calls, but it is easier if you use an SDK. In this article, I will show how to use the .NET Speech Service SDK to convert text into speech.

Log into the Azure Portal and navigate to the Speech Service you created. See the previous article to learn how to create a Speech Service.

Fig. 1 shows the "Overview" blade of the Speech Service. This contains the region and the keys for this service. Copy and save the Region and one of the keys.

Fig. 1: Speech Service Overview tab

To work with the SDK, your project needs to reference the Microsoft.CognitiveServices.Speech NuGet package. The following command installs the package in the current project.

dotnet add package Microsoft.CognitiveServices.Speech --version 1.30.0

This package contains two important classes: SpeechConfig and SpeechSynthesizer.

We create a SpeechConfig object by passing our Speech Service's key and region to the static SpeechConfig.FromSubscription method, so that it knows which service to call to handle the speech synthesis. This object also allows us to set properties, such as the voice to use and the output audio format.

Here is the code:

string aiSvcKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
string aiSvcRegion = "xxxxx";
SpeechConfig speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);

Replace the x's with the key and region of your service. The values are hard-coded here for demo purposes only. In a real application, you would store them in a configuration store or file, rather than in code.
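As a minimal sketch of keeping the key out of source code, the snippet below reads both values from environment variables. The variable names SPEECH_KEY and SPEECH_REGION are assumptions chosen for this example; any configuration source would work the same way.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// SPEECH_KEY and SPEECH_REGION are assumed names for this sketch;
// set them in your shell or hosting environment before running.
string aiSvcKey = Environment.GetEnvironmentVariable("SPEECH_KEY")
    ?? throw new InvalidOperationException("Set the SPEECH_KEY environment variable.");
string aiSvcRegion = Environment.GetEnvironmentVariable("SPEECH_REGION")
    ?? throw new InvalidOperationException("Set the SPEECH_REGION environment variable.");

// Same factory method as in the article, but with no secret in the source file.
SpeechConfig speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);
Console.WriteLine($"Configured Speech Service in region {aiSvcRegion}");
```

Failing fast with a clear message when a variable is missing tends to be easier to diagnose than an authentication error returned later by the service.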

Next, we select a voice and accent in which to speak. Microsoft provides hundreds of voices in dozens of languages. You can find a full list here.

The code below sets the voice to Aria, an American English female voice.

speechConfig.SpeechSynthesisVoiceName = "en-US-AriaNeural";
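If you would rather discover voice names from code than from the published list, the SDK can query the service for them. The sketch below assumes a recent SDK version that includes GetVoicesAsync, and it creates a SpeechSynthesizer (covered in the next step) only to make that call. The key and region placeholders are hypothetical.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// Placeholders: replace with your own key and region.
SpeechConfig speechConfig = SpeechConfig.FromSubscription("<your-key>", "<your-region>");
using SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig);

// Ask the service for the voices available in the en-US locale.
using SynthesisVoicesResult voices = await synthesizer.GetVoicesAsync("en-US");

if (voices.Reason == ResultReason.VoicesListRetrieved)
{
    foreach (VoiceInfo voice in voices.Voices)
    {
        // ShortName is the value to assign to SpeechSynthesisVoiceName.
        Console.WriteLine($"{voice.ShortName} ({voice.Gender})");
    }
}
else
{
    Console.WriteLine($"Could not retrieve voices: {voices.ErrorDetails}");
}
```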

We then create a SpeechSynthesizer object, passing the SpeechConfig object into its constructor, as in the following code:

using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
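When only a SpeechConfig is passed, the synthesizer plays audio through the default speaker. As a hedged sketch of one alternative, the constructor also accepts an AudioConfig that redirects the output, here to a WAV file. The file name output.wav and the key and region placeholders are assumptions for illustration.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Placeholders: replace with your own key and region.
SpeechConfig speechConfig = SpeechConfig.FromSubscription("<your-key>", "<your-region>");
speechConfig.SpeechSynthesisVoiceName = "en-US-AriaNeural";

// "output.wav" is a hypothetical file name; the audio is written there
// instead of being played through the speakers.
using AudioConfig audioConfig = AudioConfig.FromWavFileOutput("output.wav");
using SpeechSynthesizer fileSynthesizer = new SpeechSynthesizer(speechConfig, audioConfig);

SpeechSynthesisResult result = await fileSynthesizer.SpeakTextAsync("Hello World");
Console.WriteLine($"Result reason = {result.Reason}");
```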

SpeechSynthesizer's SpeakTextAsync method attempts to convert the text to speech. If successful, you will hear the selected voice speak the input text through your default speakers. The method returns a SpeechSynthesisResult object, which contains a Reason property. This is useful to examine if something goes wrong. Here is code to call the method:

var text = "Hello World";
SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(text);
Console.WriteLine($"Result reason = {speak.Reason}");
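The Reason property can drive simple error handling. The following is a sketch rather than a complete strategy: it distinguishes a completed synthesis from a canceled one and, on cancellation, prints the error details the SDK provides. The key and region placeholders are hypothetical.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// Placeholders: replace with your own key and region.
SpeechConfig speechConfig = SpeechConfig.FromSubscription("<your-key>", "<your-region>");
using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);

SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync("Hello World");

switch (speak.Reason)
{
    case ResultReason.SynthesizingAudioCompleted:
        Console.WriteLine("Speech synthesis completed.");
        break;

    case ResultReason.Canceled:
        // Cancellation details explain why synthesis stopped,
        // for example an invalid key or an unreachable region.
        var details = SpeechSynthesisCancellationDetails.FromResult(speak);
        Console.WriteLine($"Canceled: {details.Reason}");
        if (details.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"Error code: {details.ErrorCode}");
            Console.WriteLine($"Error details: {details.ErrorDetails}");
        }
        break;

    default:
        Console.WriteLine($"Unexpected result reason: {speak.Reason}");
        break;
}
```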

Below is the full code for a console app that allows the user to input text, then speaks that text aloud:

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace TextToSpeechDemo
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Replace the placeholders with your service's key and region.
            string aiSvcKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
            string aiSvcRegion = "xxxxx";
            SpeechConfig speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);
            speechConfig.SpeechSynthesisVoiceName = "en-US-AriaNeural";
            using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);

            Console.WriteLine("Type text to speak:");
            // ReadLine returns null at end of input, so fall back to an empty string.
            string text = Console.ReadLine() ?? string.Empty;

            SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(text);
            Console.WriteLine($"Result reason = {speak.Reason}");

        }
    }
}

In this article, you learned how to convert text to speech in a C# application, using the Azure AI Speech Service.