Saturday 18 May 2024

Discriminated Union Part Two - The C# side of things

In this article, discriminated unions are explored further, continuing from the previous article, which covered the topic using F#. The previous article, focusing on F# and discriminated unions, is available here:

https://toreaurstad.blogspot.com/2024/05/discriminated-unions-part-one-f-side-of.html

In this article, C# will be used. As noted last time, a discriminated union is a set of types that are allowed to be used. In F#, these types don't have to be in an inheritance chain; they can be a mix of entirely different types. In C#, however, one has to use a base type for the union itself and declare it as abstract, i.e. a placeholder for our discriminated union, called DU from now on in this article. C# is a mix of an object-oriented and a functional programming language, but it does not support discriminated unions as a built-in construct the way F# does. We must still use object inheritance, combined with pattern matching with type testing in C#. Let's first look at the POCOs included in this example; we must use a base class for our union. In F# we had this:


type Shape =
    | Rectangle of width : float * length : float
    | Circle of radius : float
    | Prism of width : float * depth:float * height : float
    | Cube of width : float


In C# we use an abstract record, since records are immutable after construction and are therefore a good match for functional programming (FP). Records also offer a compact syntax that lends itself nicely to FP. (We could use an abstract class too, but records have been available since C# 9 and suit FP better since they are immutable after construction is finished.) We could define this abstract record base class, which will function as our Discriminated Union (DU), like this:


public abstract record Shape;
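
As a quick aside, the value semantics that make records a good fit for FP can be shown in a small standalone sketch. The Circle record below is a hypothetical local illustration for this snippet only, not part of the Shape hierarchy used in the article:


```csharp
// Records compare by value and are immutable after construction.
// Circle here is a standalone record just for this illustration.
var c1 = new Circle(Radius: 2.0f);
var c2 = c1 with { Radius = 3.0f }; // non-destructive mutation: a copy with one change
var c3 = new Circle(Radius: 2.0f);

Console.WriteLine(c1 == c3);  // True - value-based equality, not reference equality
Console.WriteLine(c1.Radius); // 2 - the original instance is untouched
Console.WriteLine(c2.Radius); // 3

public record Circle(float Radius);
```

This predictable value semantics is part of what makes pattern matching over records in switch expressions work so cleanly.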


However, keeping track of which types are allowed into the DU is easier if we nest our types. I have also included the methods on the Shape objects as static methods that use pattern matching with type testing to define the bounds of our DU.



public abstract record Shape
{

	public record Rectangle(float Width, float Length) : Shape;
	public record Circle(float Radius) : Shape;
	public record Prism(float Width, float Depth, float Length) : Shape;
	public record Cube(float Width) : Shape;
	public record Torus(float LargeRadius, float SmallRadius) : Shape; //we will discriminate this shape, not include it in our supported calculations

	public static double CalcArea(Shape shape) => shape switch
	{
		Rectangle rect => rect.Width * rect.Length,
		Circle circ => Math.PI * Math.Pow(circ.Radius, 2),
		Prism prism => 2.0*(prism.Width*prism.Depth) + 2.0*(prism.Width+prism.Depth)*prism.Length,
		Cube cube => 6 * Math.Pow(cube.Width, 2),
		_ => throw new NotSupportedException($"Area calculation for this Shape: {shape.GetType()}")
	};

	public static double CalcVolume(Shape shape) => shape switch
	{
		Prism prism => prism.Width * prism.Depth * prism.Length,
		Cube cube => Math.Pow(cube.Width, 3),
		_ => throw new NotSupportedException($"Volume calculation for this Shape: {shape.GetType()}")
	};

};


Sample code using these types is shown below:


void Main()
{
	var torus = new Shape.Torus(LargeRadius: 7, SmallRadius: 3);
	//var torusArea = Shape.CalcArea(torus);

	var rect = new Shape.Rectangle(Width: 1.3f, Length: 10.0f);
	var circle = new Shape.Circle(Radius: 2.0f);
	var prism = new Shape.Prism(Width: 15, Depth: 5, Length: 7);
	var cube = new Shape.Cube(Width: 2.0f);

	var rectArea = Shape.CalcArea(rect);
	var circleArea = Shape.CalcArea(circle);
	var prismArea = Shape.CalcArea(prism);
	var cubeArea = Shape.CalcArea(cube);

	//var circleVolume = Shape.CalcVolume(circle);
	var prismVolume = Shape.CalcVolume(prism);
	var cubeVolume = Shape.CalcVolume(cube);
	//var rectVolume = Shape.CalcVolume(rect);

	Console.WriteLine("\nAREA CALCULATIONS:");
	Console.WriteLine($"Circle area: {circleArea:F2}");
	Console.WriteLine($"Prism area: {prismArea:F2}");
	Console.WriteLine($"Cube area: {cubeArea:F2}");
	Console.WriteLine($"Rectangle area: {rectArea:F2}");

	Console.WriteLine("\nVOLUME CALCULATIONS:");
	//Console.WriteLine( "Circle volume: %A", circleVolume);
	Console.WriteLine($"Prism volume: {prismVolume:F2}");
	Console.WriteLine($"Cube volume: {cubeVolume:F2}");
	//Console.WriteLine( "Rectangle volume: %A", rectVolume);
}


I have commented out some lines here; they will throw a NotSupportedException if one uncomments them and runs the code. The torus, for example, lacks support for area and volume calculation by intent; it is not supported (yet). Calculating the volume of a circle or a rectangle is not possible, since they are 2D geometric figures and not 3D, i.e. they do not possess a volume. Output from running the program is shown below:

AREA CALCULATIONS:
Circle area: 12,57
Prism area: 430,00
Cube area: 24,00
Rectangle area: 13,00

VOLUME CALCULATIONS:
Prism volume: 525,00
Cube volume: 8,00
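
If throwing a NotSupportedException feels too harsh for these cases, one alternative is a small helper that probes the shape's type first and returns null when no volume exists. TryCalcVolume below is a hypothetical helper, not part of the article's code, and the Shape types are repeated in trimmed form so the sketch compiles on its own:


```csharp
// Hypothetical helper (not part of the article's code) that avoids the
// NotSupportedException by probing the shape's type before asking for a volume.
static double? TryCalcVolume(Shape shape) => shape switch
{
    Shape.Prism or Shape.Cube => Shape.CalcVolume(shape), // 3D shapes have a volume
    _ => null // anything else (e.g. the 2D shapes) has no supported volume
};

var rect = new Shape.Rectangle(Width: 1.3f, Length: 10.0f);
var cube = new Shape.Cube(Width: 2.0f);

Console.WriteLine(TryCalcVolume(rect) is null); // True - no crash for a 2D shape
Console.WriteLine(TryCalcVolume(cube));         // 8

// Trimmed copy of the article's Shape type so this sketch is self-contained.
public abstract record Shape
{
    public record Rectangle(float Width, float Length) : Shape;
    public record Prism(float Width, float Depth, float Length) : Shape;
    public record Cube(float Width) : Shape;

    public static double CalcVolume(Shape shape) => shape switch
    {
        Prism prism => prism.Width * prism.Depth * prism.Length,
        Cube cube => Math.Pow(cube.Width, 3),
        _ => throw new NotSupportedException($"Volume calculation for this Shape: {shape.GetType()}")
    };
}
```

This keeps the exception for truly unexpected callers while giving well-behaved callers a null to test for instead of a try/catch.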

Conclusions: F# vs C#

True support for DUs is only available in F#, but we can get close to it in C# using inheritance and pattern matching with type checking. F# has much better support for it for now, but C# will probably catch up in a few years and finally get support for it as a built-in construct. The syntax for DUs in F# and C# is fairly similar; using records and switch expressions with type checking, the C# code is not longer than the F# code. But F# has direct support for DUs, while in C# we have to add additional code to support something that is built-in functionality in F#. On the page What's new in C# 13, DUs have not made their way onto the list; the final .NET 9 SDK will probably be available in November this year (2024).

https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-13

There are different approaches to writing DUs in C# for now. Some go for the OneOf library, which brings this functional programming concept to C#. Discriminated unions will probably make their way into a later .NET release, so there will still be some waiting around before this feature lands in C#. For now, hopefully my two articles have made it a bit clearer what the buzz about DUs is all about. One disadvantage of the C# approach is that it is not consistent the way F# is: we have to manually manage which types we want to support in each method. The union itself is modelled using inheritance in C#, which means we need to adjust the inheritance hierarchy so that all types inherit from the discriminated union (DU) base type. If a type needs to be part of MULTIPLE different DUs, we face limitations in C#, since a class can only inherit from one base type. This is likely why many C# developers are requesting DU functionality. As of now, Microsoft's language team seems to be leaning toward something called ENUM CLASSES. It appears that this feature may be included in .NET 10, which means it won't be available until late 2025 at the earliest.
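
To illustrate the kind of approach union libraries take, here is a rough, hypothetical sketch of a two-case union wrapper. This is not the real OneOf API, just an illustration of how a generic wrapper sidesteps the single-inheritance limitation: membership is expressed through type arguments rather than a base class, so the same type can take part in several different unions.


```csharp
// Hypothetical sketch of a OneOf-style union: NOT the real OneOf API.
// Membership is expressed via type arguments instead of inheritance.
var textValue = new Union<int, string>("hello");
var numberValue = new Union<int, string>(42);

// Exhaustive by construction: Match forces the caller to handle both cases.
Console.WriteLine(textValue.Match(i => $"number {i}", s => $"text {s}"));   // text hello
Console.WriteLine(numberValue.Match(i => $"number {i}", s => $"text {s}")); // number 42

public readonly struct Union<T1, T2>
{
    private readonly T1? _first;
    private readonly T2? _second;
    private readonly int _index; // which slot holds the value: 1 or 2

    public Union(T1 value) { _first = value; _second = default; _index = 1; }
    public Union(T2 value) { _second = value; _first = default; _index = 2; }

    public TResult Match<TResult>(Func<T1, TResult> whenFirst, Func<T2, TResult> whenSecond) =>
        _index == 1 ? whenFirst(_first!) : whenSecond(_second!);
}
```

Note the trade-off: the Match method gives exhaustiveness the switch-with-wildcard approach lacks, but the number of cases is fixed by the generic arity.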

Further viewing/reading on the topic

Proposals for better support of DUs in C# are now taking shape in concrete form. A proposal, covering among other things enum classes, is available here; it could be the design choice the C# language team lands on:

https://github.com/dotnet/csharplang/blob/main/proposals/discriminated-unions.md

Lead Designer Mads Torgersen comments on DUs in C# in this video at 21:00:

https://learn.microsoft.com/en-us/shows/ask-the-expert/ask-the-expert-whats-new-in-c-100

Saturday 11 May 2024

Discriminated Unions Part One - The F# side of things

I decided to look more into what the discussion about discriminated unions in C#, or rather the lack of them, is all about. I will first look at the F# side of things: how can we create a discriminated union in F#? Then, in the next article on the topic, I will look at how we can implement the F# program in C#. In this article we will look at some F# code that shows how discriminated unions are supported as a built-in construct in F#. Discriminated unions are special containers that can hold different types. This is not supported in C# without adding some additional plumbing code, and the result is not considered a true discriminated union, although in C# we can get close. For the rest of the article, we will refer to discriminated unions as DUs. Let's first declare a DU in F# that describes different types of geometric figures.


type Shape =
    | Rectangle of width : float * length : float
    | Circle of radius : float
    | Prism of width : float * depth:float * height : float
    | Cube of width : float


The '*' symbol in F#, when used in the type definitions above, acts as a separator between the fields that each case has. For example, Rectangle of width : float * length : float means that the Rectangle case has two fields: width of type float and length of the same type.

Let's add some methods to our F# program for calculating area and volume. We also want our F# code to be fault tolerant, so that we either get a result or an error. For this we add the following additional DU, which is also generic:

type Result<'T> =
    | Success of 'T
    | Error of string    

We also need a way to print errors if we do not want to crash the program, say if we want to calculate the volume of a circle or a rectangle, which is not supported since they are 2D figures.


let handleResult (result: Result<float>) =
    match result with
    | Success value -> printfn "%A" value
    | Error msg -> printfn "Error: %s" msg; () // Print the error message and return unit


To add some functionality to the discriminated unions we add the module below:


module ShapeOperations =
    let CalcArea(shape : Shape) : Result<float> =
        match shape with 
        | Rectangle (width, length) -> Success(width * length)
        | Circle (radius) -> Success(Math.PI * radius**2)
        | Prism (width, depth, height) -> Success(2.0*(width*depth) + 2.0*(width+depth)*height)
        | Cube (width) -> Success(6.0 * width * width)
        // | _ -> failwith "Area calculation is not supported"
    let CalcVolume(shape : Shape) : Result<float> = 
        match shape with 
        | Prism (width, height, depth) -> Success(width * height * depth)
        | Cube (width) -> Success(width**3)        
        | _ -> Error(sprintf "Volume calculation is not supported  for: %A" shape)


The rest of the code is shown below where we instantiate geometric figures and calculate the area and volume of them and output their values.

 
let rect = Rectangle(length = 1.3, width = 10.0)
let circle = Circle (2.0)
let prism = Prism(width = 15.0, depth = 5.0, height = 7.0)
let cube = Cube(3.0)

let rectArea = ShapeOperations.CalcArea rect 
let circleArea = ShapeOperations.CalcArea circle
let prismArea = ShapeOperations.CalcArea prism
let cubeArea = ShapeOperations.CalcArea cube

let circleVolume = handleResult (ShapeOperations.CalcVolume circle)
let prismVolume = ShapeOperations.CalcVolume prism
let cubeVolume = ShapeOperations.CalcVolume cube
let rectVolume = ShapeOperations.CalcVolume rect

printfn "\nAREA CALCULATIONS:"
printfn "Circle area: %A" circleArea
printfn "Prism area: %A" prismArea 
printfn "Cube area: %A" cubeArea 
printfn "Rectangle area %A" rectArea 

printfn "\nVOLUME CALCULATIONS:"
printfn "Circle volume: %A" circleVolume 
printfn "Prism volume: %A" prismVolume 
printfn "Cube volume: %A" cubeVolume 
printfn "Rectangle volume: %A" rectVolume          
                      

We get this output after running the program:


Error: Volume calculation is not supported  for: Circle 2.0

AREA CALCULATIONS:
Circle area: Success 12.56637061
Prism area: Success 430
Cube area: Success 54.0
Rectangle area Success 13.0

VOLUME CALCULATIONS:
Circle volume: ()
Prism volume: Success 525.0
Cube volume: Success 27.0
Rectangle volume: Error "Volume calculation is not supported  for: Rectangle (10.0, 1.3)"


As we can see, creating DUs in F# is easy: we use the '|' operator to define the set of cases, we can create generic DUs too, and we can match the different types with functional expressions. In the next article we will look at the code shown here and test whether we can recreate it in C# using different constructs. C# has gained more support for functional programming since C# 9 in 2020, and the port will most likely involve records, pattern matching (the newer switch expression syntax) and static helper methods.

Thursday 9 May 2024

Azure Cognitive Synthesized Text To Speech with voice styles

Using Azure Cognitive Services, it is possible to translate text into other languages and also synthesize the text to speech. It is also possible to add voice effects such as a voice style. This adds more realism by adding emotions to a synthesized voice. The voice is already trained using neural nets, and adding a voice style makes the synthesized speech even more realistic and multi-purpose. The GitHub repo for this is available here, as a .NET MAUI Blazor client written with .NET 8:

MultiLingual translator DEMO Github repo

Not all the voices supported in Azure Cognitive Services support voice effects. An overview of which voices do is shown here:

https://learn.microsoft.com/nb-no/azure/ai-services/speech-service/language-support?tabs=tts#voice-styles-and-roles

More and more of the synthetic voices in Azure Cognitive Services are getting voice styles that express emotions. For now, most of the voices with style support are either English (en-US) or Chinese (zh-CN), and a few other languages have some voices supporting styles. This will most likely improve in the future, as these neural-net-trained voices are trained in more voice styles, or some generic voice style algorithm is achieved that can infer emotions on a generic level, although that still sounds a bit sci-fi.

Azure Cognitive Text-To-Speech Voices with support for emotions / voice styles


Voice | Styles | Roles
de-DE-ConradNeural | cheerful | Not supported
en-GB-SoniaNeural | cheerful, sad | Not supported
en-US-AriaNeural | angry, chat, cheerful, customerservice, empathetic, excited, friendly, hopeful, narration-professional, newscast-casual, newscast-formal, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-DavisNeural | angry, chat, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-GuyNeural | angry, cheerful, excited, friendly, hopeful, newscast, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-JaneNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-JasonNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-JennyNeural | angry, assistant, chat, cheerful, customerservice, excited, friendly, hopeful, newscast, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-NancyNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-SaraNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-TonyNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
es-MX-JorgeNeural | chat, cheerful | Not supported
fr-FR-DeniseNeural | cheerful, sad | Not supported
fr-FR-HenriNeural | cheerful, sad | Not supported
it-IT-IsabellaNeural | chat, cheerful | Not supported
ja-JP-NanamiNeural | chat, cheerful, customerservice | Not supported
pt-BR-FranciscaNeural | calm | Not supported
zh-CN-XiaohanNeural | affectionate, angry, calm, cheerful, disgruntled, embarrassed, fearful, gentle, sad, serious | Not supported
zh-CN-XiaomengNeural | chat | Not supported
zh-CN-XiaomoNeural | affectionate, angry, calm, cheerful, depressed, disgruntled, embarrassed, envious, fearful, gentle, sad, serious | Boy, Girl, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, YoungAdultFemale, YoungAdultMale
zh-CN-XiaoruiNeural | angry, calm, fearful, sad | Not supported
zh-CN-XiaoshuangNeural | chat | Not supported
zh-CN-XiaoxiaoNeural | affectionate, angry, assistant, calm, chat, chat-casual, cheerful, customerservice, disgruntled, fearful, friendly, gentle, lyrical, newscast, poetry-reading, sad, serious, sorry, whisper | Not supported
zh-CN-XiaoyiNeural | affectionate, angry, cheerful, disgruntled, embarrassed, fearful, gentle, sad, serious | Not supported
zh-CN-XiaozhenNeural | angry, cheerful, disgruntled, fearful, sad, serious | Not supported
zh-CN-YunfengNeural | angry, cheerful, depressed, disgruntled, fearful, sad, serious | Not supported
zh-CN-YunhaoNeural | advertisement-upbeat | Not supported
zh-CN-YunjianNeural | angry, cheerful, depressed, disgruntled, documentary-narration, narration-relaxed, sad, serious, sports-commentary, sports-commentary-excited | Not supported
zh-CN-YunxiaNeural | angry, calm, cheerful, fearful, sad | Not supported
zh-CN-YunxiNeural | angry, assistant, chat, cheerful, depressed, disgruntled, embarrassed, fearful, narration-relaxed, newscast, sad, serious | Boy, Narrator, YoungAdultMale
zh-CN-YunyangNeural | customerservice, narration-professional, newscast-casual | Not supported
zh-CN-YunyeNeural | angry, calm, cheerful, disgruntled, embarrassed, fearful, sad, serious | Boy, Girl, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, YoungAdultFemale, YoungAdultMale
zh-CN-YunzeNeural | angry, calm, cheerful, depressed, disgruntled, documentary-narration, fearful, sad, serious | OlderAdultMale, SeniorMale

Screenshot from the DEMO showing its user interface. You enter the text to translate at the top, and the language of the text is detected using the text detection functionality of Azure Cognitive Services. You can then select which language to translate the text into; a REST call to Azure Cognitive Services translates the text. It is also possible to hear the text as speech, and now a voice style can be selected as well. Use the table shown above to select a voice actor that supports a voice style you want to test. As noted, voice styles are still limited to a few languages and to the voice actors supporting emotions or voice styles. You will hear the voice actor's normal mood or voice style if additional emotions or voice styles are not supported.
Let's look at some code for this DEMO too. You can study the GitHub repo and clone it to test it out yourself. The TextToSpeechUtil class handles much of the logic: it creates a voice from text input, builds the SSML XML contents and performs the REST API call to create the voice file. SSML here is the Speech Synthesis Markup Language. The SSML standard is documented on MSDN; it is a standard adopted by others too, including Google.

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup



using Microsoft.Extensions.Configuration;
using MultiLingual.Translator.Lib.Models;
using System;
using System.Security;
using System.Text;
using System.Xml.Linq;

namespace MultiLingual.Translator.Lib
{
    public class TextToSpeechUtil : ITextToSpeechUtil
    {

        public TextToSpeechUtil(IConfiguration configuration)
        {
            _configuration = configuration;
        }

        public async Task<TextToSpeechResult> GetSpeechFromText(string text, string language, TextToSpeechLanguage[] actorVoices, 
            string? preferredVoiceActorId, string? preferredVoiceStyle)
        {
            var result = new TextToSpeechResult();

            result.Transcript = GetSpeechTextXml(text, language, actorVoices, preferredVoiceActorId, preferredVoiceStyle, result);
            result.ContentType = _configuration[TextToSpeechSpeechContentType];
            result.OutputFormat = _configuration[TextToSpeechSpeechXMicrosoftOutputFormat];
            result.UserAgent = _configuration[TextToSpeechSpeechUserAgent];
            result.AvailableVoiceActorIds = ResolveAvailableActorVoiceIds(language, actorVoices);
            result.LanguageCode = language;

            string? token = await GetUpdatedToken();

            HttpClient httpClient = GetTextToSpeechWebClient(token);

            string ttsEndpointUrl = _configuration[TextToSpeechSpeechEndpoint];
            var response = await httpClient.PostAsync(ttsEndpointUrl, new StringContent(result.Transcript, Encoding.UTF8, result.ContentType));

            using (var memStream = new MemoryStream()) {
                var responseStream = await response.Content.ReadAsStreamAsync();
                responseStream.CopyTo(memStream);
                result.VoiceData = memStream.ToArray();
            }

            return result;
        }

        private async Task<string?> GetUpdatedToken()
        {
            string? token = _token?.ToNormalString();
            if (_lastTimeTokenFetched == null || DateTime.Now.Subtract(_lastTimeTokenFetched.Value).Minutes > 8)
            {
                token = await GetIssuedToken();
            }

            return token;
        }

        private HttpClient GetTextToSpeechWebClient(string? token)
        {
            var httpClient = new HttpClient();
            httpClient.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", token);
            httpClient.DefaultRequestHeaders.Add("X-Microsoft-OutputFormat", _configuration[TextToSpeechSpeechXMicrosoftOutputFormat]);
            httpClient.DefaultRequestHeaders.Add("User-Agent", _configuration[TextToSpeechSpeechUserAgent]);
            return httpClient;
        }
       
        public string GetSpeechTextXml(string text, string language, TextToSpeechLanguage[] actorVoices, string? preferredVoiceActorId,
              string? preferredVoiceStyle, TextToSpeechResult result)
        {
            result.VoiceActorId = ResolveVoiceActorId(language, preferredVoiceActorId, actorVoices);
            string speechXml = $@"
            <speak version='1.0' xml:lang='en-US' xmlns:mstts='https://www.w3.org/2001/mstts'>
                <voice xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice {result.VoiceActorId}'>
                    <prosody rate='1'>{text}</prosody>
                </voice>
            </speak>";

            speechXml = AddVoiceStyleEffectIfDesired(preferredVoiceStyle, speechXml);

            return speechXml;
        }

        /// <summary>
        /// Adds voice style / expression to the SSML markup for the voice
        /// </summary>
        private static string AddVoiceStyleEffectIfDesired(string? preferredVoiceStyle, string speechXml)
        {
            if (!string.IsNullOrWhiteSpace(preferredVoiceStyle) && preferredVoiceStyle != "normal-neutral")
            {
                var voiceDoc = XDocument.Parse(speechXml); //https://learn.microsoft.com/nb-no/azure/ai-services/speech-service/speech-synthesis-markup-voice#use-speaking-styles-and-roles

                XElement? prosody = voiceDoc.Descendants("prosody").FirstOrDefault();
                if (prosody?.Value != null)
                {
                    // Create the <mstts:express-as> element, for now skip the ':' letter and replace at the end

                    var expressedAsWrappedElement = new XElement("msttsexpress-as",
                        new XAttribute("style", preferredVoiceStyle));
                    expressedAsWrappedElement.Value = prosody!.Value;
                    prosody?.ReplaceWith(expressedAsWrappedElement);
                    speechXml = voiceDoc.ToString().Replace(@"msttsexpress-as", "mstts:express-as");
                }
            }

            return speechXml;
        }

        private List<string> ResolveAvailableActorVoiceIds(string language, TextToSpeechLanguage[] actorVoices)
        {
            if (actorVoices?.Any() == true)
            {
                var voiceActorIds = actorVoices.Where(v => v.LanguageKey == language || v.LanguageKey.Split("-")[0] == language).SelectMany(v => v.VoiceActors).Select(v => v.VoiceId).ToList();
                return voiceActorIds;
            }
            return new List<string>();
        }

        private string ResolveVoiceActorId(string language, string? preferredVoiceActorId, TextToSpeechLanguage[] actorVoices)
        {
            string actorVoiceId = "(en-AU, NatashaNeural)"; //default to a select voice actor id 
            if (actorVoices?.Any() == true)
            {
                var voiceActorsForLanguage = actorVoices.Where(v => v.LanguageKey == language || v.LanguageKey.Split("-")[0] == language).SelectMany(v => v.VoiceActors).Select(v => v.VoiceId).ToList();
                if (voiceActorsForLanguage != null)
                {
                    if (voiceActorsForLanguage.Any() == true)
                    {
                        var resolvedPreferredVoiceActorId = voiceActorsForLanguage.FirstOrDefault(v => v == preferredVoiceActorId);
                        if (!string.IsNullOrWhiteSpace(resolvedPreferredVoiceActorId))
                        {
                            return resolvedPreferredVoiceActorId!;
                        }
                        actorVoiceId = voiceActorsForLanguage.First();
                    }
                }
            }
            return actorVoiceId;
        }

        private async Task<string> GetIssuedToken()
        {
            var httpClient = new HttpClient();
            string? textToSpeechSubscriptionKey = Environment.GetEnvironmentVariable("AZURE_TEXT_SPEECH_SUBSCRIPTION_KEY", EnvironmentVariableTarget.Machine);
            httpClient.DefaultRequestHeaders.Add(OcpApiSubscriptionKeyHeaderName, textToSpeechSubscriptionKey);
            string tokenEndpointUrl = _configuration[TextToSpeechIssueTokenEndpoint];
            var response = await httpClient.PostAsync(tokenEndpointUrl, new StringContent("{}"));
            _token = (await response.Content.ReadAsStringAsync()).ToSecureString();
            _lastTimeTokenFetched = DateTime.Now;
            return _token.ToNormalString();
        }

        public async Task<List<string>> GetVoiceStyles()
        {
            var voiceStyles = new List<string>
            {
                "normal-neutral",
                "advertisement_upbeat",
                "affectionate",
                "angry",
                "assistant",
                "calm",
                "chat",
                "cheerful",
                "customerservice",
                "depressed",
                "disgruntled",
                "documentary-narration",
                "embarrassed",
                "empathetic",
                "envious",
                "excited",
                "fearful",
                "friendly",
                "gentle",
                "hopeful",
                "lyrical",
                "narration-professional",
                "narration-relaxed",
                "newscast",
                "newscast-casual",
                "newscast-formal",
                "poetry-reading",
                "sad",
                "serious",
                "shouting",
                "sports_commentary",
                "sports_commentary_excited",
                "whispering",
                "terrified",
                "unfriendly"
            };
            return await Task.FromResult(voiceStyles);
        }

        private const string OcpApiSubscriptionKeyHeaderName = "Ocp-Apim-Subscription-Key";
        private const string TextToSpeechIssueTokenEndpoint = "TextToSpeechIssueTokenEndpoint";
        private const string TextToSpeechSpeechEndpoint = "TextToSpeechSpeechEndpoint";        
        private const string TextToSpeechSpeechContentType = "TextToSpeechSpeechContentType";
        private const string TextToSpeechSpeechUserAgent = "TextToSpeechSpeechUserAgent";
        private const string TextToSpeechSpeechXMicrosoftOutputFormat = "TextToSpeechSpeechXMicrosoftOutputFormat";

        private readonly IConfiguration _configuration;

        private DateTime? _lastTimeTokenFetched = null;
        private SecureString _token = null;

    }
}
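
A side note on AddVoiceStyleEffectIfDesired above: the temporary "msttsexpress-as" element name and the string replace at the end can be avoided by declaring the mstts namespace with XNamespace, so LINQ to XML emits the prefixed element directly. A hypothetical standalone sketch of that alternative (not the repo's code):


```csharp
using System.Xml.Linq;

// Hypothetical alternative to the string-replace trick: declare the mstts
// namespace so LINQ to XML can create the mstts:express-as element directly.
XNamespace mstts = "https://www.w3.org/2001/mstts";

var expressAs = new XElement(mstts + "express-as",
    new XAttribute("style", "angry"),
    "I listen to Eurovision and cheer for Norway");

var voice = new XElement("voice",
    new XAttribute(XNamespace.Xmlns + "mstts", mstts), // binds the mstts: prefix
    expressAs);

Console.WriteLine(voice); // serializes with the mstts: prefix intact
```

Attaching the xmlns:mstts attribute on the voice element makes the serializer use the prefix, so no post-processing of the XML string is needed.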

 
 

The REST call to generate the voice file uses the following setup. TTS endpoint URL: https://norwayeast.tts.speech.microsoft.com/cognitiveservices/v1. The transcript (text to turn into speech) in my test is the following SSML XML document:


<speak version="1.0" xml:lang="en-US" xmlns:mstts="https://www.w3.org/2001/mstts">
  <voice xml:gender="Male" name="Microsoft Server Speech Text to Speech Voice (en-US, JaneNeural)">
    <mstts:express-as style="angry">I listen to Eurovision and cheer for Norway</mstts:express-as>
  </voice>
</speak>


The SSML also contains the mstts extension, which adds features to SSML such as the express-as element, here set to a voice style or emotion of "angry". Not all emotions or voice styles are supported by every voice actor in Azure Cognitive Services, but below is a list of the voice styles that could be supported; which of them work varies with the voice actor you choose (and inherently which language).
  • "normal-neutral"
  • "advertisement_upbeat"
  • "affectionate"
  • "angry"
  • "assistant"
  • "calm"
  • "chat"
  • "cheerful"
  • "customerservice"
  • "depressed"
  • "disgruntled"
  • "documentary-narration"
  • "embarrassed"
  • "empathetic"
  • "envious"
  • "excited"
  • "fearful"
  • "friendly"
  • "gentle"
  • "hopeful"
  • "lyrical"
  • "narration-professional"
  • "narration-relaxed"
  • "newscast"
  • "newscast-casual"
  • "newscast-formal"
  • "poetry-reading"
  • "sad"
  • "serious"
  • "shouting"
  • "sports_commentary"
  • "sports_commentary_excited"
  • "whispering"
  • "terrified"
  • "unfriendly"
Microsoft has come a long way from the early work with SAPI, the Microsoft Speech API, with Microsoft SAM around 2000. The realism of synthetic voices more than 20 years ago was rather crude and robotic. Nowadays, the voice actors provided by the Azure cloud computing platform, as shown here, are neural-net trained and very realistic, based upon training from real voice actors, and more and more voice actor voices support emotions or voice styles. The usages of this can be diverse: text synthesis can serve in automated answering services and apps in fields such as healthcare, public services and education. Making this demo has been fun for me, and it can be used to learn languages: with the voice functionality you can train not only on the translation but also on the pronunciation.