Coding Grounds: AI

Saturday, 22 March 2025

Image classification using ML.NET Machine Learning

I have added a demo using ML.NET to GitHub. The demo is available in this repository:

https://github.com/toreaurstadboss/ImageClassificationMLNetBlazorDemo

The screenshot below shows the application running:

ML.NET is Microsoft's machine learning library. Combined with the tooling inside VS 2022, it offers an easy way to use machine learning models locally on your CPU or GPU, or hosted in Azure cloud services. The ML.NET website has more information and documentation:

https://dotnet.microsoft.com/en-us/apps/ai/ml-dotnet

In the demo above I have trained the model to recognize either horses or moose. Both species are herbivorous mammals and are somewhat similar in appearance. I trained the machine learning model in this demo with only ten images of each category, and then checked it with ten other test images to see whether it correctly recognizes a horse or a moose. Even with just ten training images per category, it did not miss once. A real-world machine learning model would of course be trained on tens of thousands of images to handle all edge cases.

ML.NET is very easy to run. It can run locally on your own machine, using the CPU or GPU. The GPU must be CUDA compatible, which means you need an NVIDIA card. CUDA support was introduced with the GeForce 8 series, but check that your card is supported by the CUDA version listed below. I have such a card in a laptop of mine and have tested it. The following links point to NVIDIA's download pages for the software needed, as of March 2025, to run ML.NET image classification on GPUs:

CUDA 10.1
CUDA 10.1 can be downloaded from here:

https://developer.nvidia.com/cuda-10.1-download-archive-base

cuDNN 7.6.4
cuDNN can be downloaded from here:

https://developer.nvidia.com/rdp/cudnn-archive

Getting started with image classification using ML.Net

It is easiest to use VS 2022 to add an ML.NET machine learning model. Inside VS 2022, right-click your project, choose Add and then Machine Learning Model.

In case you do not see this option:
Open the start menu and type in Visual Studio Installer.
Hit the Modify button for your VS installation.
Choose Individual Components.
Search for 'ml'.
Select ML.NET Model Builder. There is also a package called ML.NET Model Builder 2022, which I selected as well.

Choosing the scenario

After adding the machine learning model, the first page asks for a scenario. I chose Image Classification here, found under the Computer Vision scenario category.

Choosing the environment

Then I hit the button Local. In the next step, I selected Local (CPU). Note that I have also tested a CUDA-compatible NVIDIA GPU on another laptop and it worked great. The GPU option should be preferred if you have a compatible GPU and have installed CUDA 10.1 and cuDNN 7.6.4 from the links above.

Hit the button Next Step.

Choosing the Data

It is time to train the machine learning model with data! I have gathered ten sample images each of moose and horses. You supply the data by pointing to a folder in which each category of images is gathered in its own subfolder.

The next step is Train.
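Before pointing Model Builder at the folder, it can help to check that each label really has its own subfolder with images in it. The following is only a minimal sketch using plain .NET file APIs. The folder name `TrainingImages`, the class `TrainingFolderInspector` and the list of image extensions are assumptions made for illustration and are not part of the demo repo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical layout, one subfolder per label:
//   TrainingImages/Horse/horse01.jpg
//   TrainingImages/Moose/moose01.jpg
string root = args.Length > 0 ? args[0] : "TrainingImages";

if (Directory.Exists(root))
{
    foreach (var (label, count) in TrainingFolderInspector.CountImagesPerLabel(root))
    {
        Console.WriteLine($"{label}: {count} image(s)");
    }
}
else
{
    Console.WriteLine($"Folder '{root}' was not found.");
}

public static class TrainingFolderInspector
{
    // Assumed set of image extensions; adjust to the files you actually use.
    private static readonly HashSet<string> ImageExtensions =
        new(StringComparer.OrdinalIgnoreCase) { ".jpg", ".jpeg", ".png" };

    // Returns the number of image files found directly inside each label subfolder.
    public static IReadOnlyDictionary<string, int> CountImagesPerLabel(string rootFolder)
    {
        return Directory.EnumerateDirectories(rootFolder)
            .ToDictionary(
                dir => new DirectoryInfo(dir).Name,
                dir => Directory.EnumerateFiles(dir)
                    .Count(file => ImageExtensions.Contains(Path.GetExtension(file))));
    }
}
```

A label that shows up with zero images, or a missing label, is easier to spot here than after a training run.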

Training the model

Here you hit the button Train. You can train again if needed. When the model has been trained enough, hit the button Next step. Training will take some time, depending on whether you use the CPU or GPU and on the number of input images. With a small set such as the 20 images used here, it usually takes seconds rather than minutes.

Loading up the image data and using the machine learning model

Note that ML.NET requires interactive server rendering of the web app. Pure Blazor WASM apps are not supported. The following file shows how the Blazor Server app is set up.

Program.cs


using ImageClassificationMLNetBlazorDemo.Components;

var builder = WebApplication.CreateBuilder(args);

// Add services to the container.
builder.Services.AddRazorComponents()
    .AddInteractiveServerComponents();

var app = builder.Build();

// Configure the HTTP request pipeline.
if (!app.Environment.IsDevelopment())
{
    app.UseExceptionHandler("/Error", createScopeForErrors: true);
    // The default HSTS value is 30 days. You may want to change this for production scenarios, see https://aka.ms/aspnetcore-hsts.
    app.UseHsts();
}

app.UseHttpsRedirection();

app.UseStaticFiles();
app.UseAntiforgery();

app.MapRazorComponents<App>()
    .AddInteractiveServerRenderMode();

app.Run();

The InteractiveServer render mode is set inside App.razor on the HeadOutlet and Routes components.

App.razor



<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <base href="/" />
    <link rel="stylesheet" href="app.css" />
    <link rel="stylesheet" href="lib/bootstrap/css/bootstrap.min.css" />
    <link rel="stylesheet" href="ImageClassificationMLNetBlazorDemo.styles.css" />
    <link rel="icon" type="image/png" href="favicon.png" />

    <HeadOutlet @rendermode="InteractiveServer" />
</head>

<body>
    <Routes @rendermode="InteractiveServer" />
    <script src="_framework/blazor.web.js"></script>
</body>

</html>

The following code-behind of the Razor component Home.razor in the demo repo shows how a file uploaded with the InputFile control is handled in Blazor Server.

Home.razor.cs



@code {

    private string? _base64ImageSource = null;
    private string? _predictedLabel = "No classification";
    private IOrderedEnumerable<KeyValuePair<string, float>>? _predictedLabels = null;
    private int? _assessedPredictionQuality = null;
    private string? _errorMessage = null;

    private async Task LoadFileAsync(InputFileChangeEventArgs e)
    {
        try
        {
            ResetPrivateFields();

            if (e.File.Size <= 0 || e.File.Size >= 2 * 1024 * 1024)
            {
                _errorMessage = "Sorry, the uploaded image must be between 1 byte and 2 MB!";
                return;
            }

            byte[] imageBytes = await GetImageBytes(e.File);
            _base64ImageSource = GetBase64ImageSourceString(e.File.ContentType, imageBytes);

            PredictImageClassification(imageBytes);

        }
        catch (Exception err)
        {
            Console.WriteLine(err);
        }
    }

    private void ResetPrivateFields()
    {
        _base64ImageSource = null;
        _predictedLabel = null;
        _predictedLabels = null;
        _assessedPredictionQuality = null;
        _errorMessage = null; // clear any error left over from a previous upload
    }

    private int GetAssesPrediction()
    {
        int result = 1;
        if (_predictedLabel != null && _predictedLabels != null)
        {
            foreach (var label in _predictedLabels)
            {
                if (label.Key == _predictedLabel)
                {
                    result = label.Value switch
                    {
                        <= 0.50f => 1,
                        <= 0.70f => 2,
                        <= 0.80f => 3,
                        <= 0.85f => 4,
                        <= 0.90f => 5,
                        <= 1.0f => 6,
                        _ => 1 // default to dice score 1 if we get some other score here (e.g. NaN)
                    };
                }
            }
        }

        return result;
    }

    private void PredictImageClassification(byte[] imageBytes)
    {

        var input = new ModelInput
            {
                ImageSource = imageBytes
            };
        ModelOutput output = HorseOrMooseImageClassifier.Predict(input);
        _predictedLabel = output.PredictedLabel;

        _predictedLabels = HorseOrMooseImageClassifier.PredictAllLabels(input);

        _assessedPredictionQuality = GetAssesPrediction(); //check how good the prediction is, give a score from 1-6 (dice score!)

        StateHasChanged();
    }


    private async Task<byte[]> GetImageBytes(IBrowserFile file) 
    {
        using MemoryStream memoryStream = new();
        // Dispose the browser file stream once the bytes have been copied.
        await using var stream = file.OpenReadStream(2 * 1024 * 1024, CancellationToken.None);
        await stream.CopyToAsync(memoryStream);
        return memoryStream.ToArray();
    }

    private string GetBase64ImageSourceString(string contentType, byte[] bytes)
    {
        string preAmble = $"data:{contentType};base64,";
        return $"{preAmble}{(Convert.ToBase64String(bytes))}";
    }
}

As the code above shows, using the machine learning model is quite convenient. We just call the method Predict to get the label that the model decides best matches the loaded image. This is the image classification that the machine learning model found. The method PredictAllLabels returns the confidence for each of the labels, as shown in this demo. There are no limitations on the number of categories that one could train an image classification model to look for. A benefit of ML.NET is the option to run it on on-premise servers and get fairly good results from just a few sample images. But the more sample images you obtain for a label, the more precise the machine learning model will become. It is also possible to download a pre-trained model such as InceptionV3, which is compatible with the TensorFlow backend used here and supports up to 1000 categories. More information about using a pre-trained model such as InceptionV3 is available from Microsoft here:

https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/image-classification
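The way the demo turns label confidences into a dice score can be tried in isolation, without ML.NET. The sketch below uses a hand-written dictionary of scores as a stand-in for what PredictAllLabels returns. The class name `PredictionQuality` and the sample values are assumptions for illustration, while the thresholds are the ones used in Home.razor above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical scores, standing in for the output of PredictAllLabels in the demo.
var scores = new Dictionary<string, float> { ["Horse"] = 0.87f, ["Moose"] = 0.13f };

var (label, confidence) = PredictionQuality.TopLabel(scores);
Console.WriteLine($"{label}: confidence {confidence}, dice score {PredictionQuality.ToDiceScore(confidence)}");

public static class PredictionQuality
{
    // Picks the label with the highest confidence.
    public static (string Label, float Confidence) TopLabel(IEnumerable<KeyValuePair<string, float>> scores)
    {
        var best = scores.OrderByDescending(s => s.Value).First();
        return (best.Key, best.Value);
    }

    // Maps a confidence to a dice score from 1 to 6 using relational patterns.
    public static int ToDiceScore(float confidence) => confidence switch
    {
        <= 0.50f => 1,
        <= 0.70f => 2,
        <= 0.80f => 3,
        <= 0.85f => 4,
        <= 0.90f => 5,
        <= 1.0f => 6,
        _ => 1 // NaN or values above 1.0 fall back to the lowest score
    };
}
```

The switch arms are evaluated top to bottom, so each arm only needs its upper bound.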

Sunday, 16 February 2025

Outputting tags/objects using Azure AI

This article presents a way to extract tags for an image and write them to the console. Azure AI is used, more specifically the ImageAnalysisClient. The article shows how you can expose the data as an IAsyncEnumerable, so you can use await foreach to consume it. I would recommend this approach for many services in Azure AI (and similar) where there is no out-of-the-box support for async enumerables, hiding away the details in a helper extension method as shown in this article.




  // Returns Task instead of void so callers can await completion and observe exceptions.
  public static async Task ExtractImageTags()
  {
      string visionApiKey = Environment.GetEnvironmentVariable("VISION_KEY")!;
      string visionApiEndpoint = Environment.GetEnvironmentVariable("VISION_ENDPOINT")!;

      var credentials = new AzureKeyCredential(visionApiKey);
      var serviceUri = new Uri(visionApiEndpoint);

      var imageAnalysisClient = new ImageAnalysisClient(serviceUri, credentials);
      await foreach (var tag in imageAnalysisClient.ExtractImageTagsAsync("Images/Store.png"))
      {
          Console.WriteLine(tag);
      }           
  }

The code creates an ImageAnalysisClient, defined in the Azure.AI.Vision.ImageAnalysis NuGet package. I use two environment variables here to store the key and endpoint. Note that not all Azure AI features are available in all regions. If you just want to test out some Azure AI features, you can start in the East US region, as that region will most likely have all the features you want to test. You can then switch to a more local region if you are planning to run more workloads using Azure AI.

Then we use the await foreach pattern to extract the image tags asynchronously. ExtractImageTagsAsync is a custom extension method I created so that I can output the tags asynchronously using await foreach and also specify a wait time between each tag being output, defaulting to 200 milliseconds.
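Stripped of the Azure dependency, the pattern is small enough to try on its own. The sketch below is only an illustration under assumptions: `FetchAllAsync` is a made-up stand-in for a service call that returns its whole result at once, and `AsyncEnumerableSketch.StreamAsync` is a hypothetical helper, not part of any SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Made-up stand-in for a service call that returns the complete result in one response.
static Task<IReadOnlyList<string>> FetchAllAsync() =>
    Task.FromResult<IReadOnlyList<string>>(new[] { "indoor", "shelf", "store" });

// Consume the items one by one, exactly as with the real extension method.
await foreach (var item in AsyncEnumerableSketch.StreamAsync<string>(FetchAllAsync, delayMs: 50))
{
    Console.WriteLine(item);
}

public static class AsyncEnumerableSketch
{
    // Awaits the full result once, then yields each item with a pause in between.
    public static async IAsyncEnumerable<T> StreamAsync<T>(
        Func<Task<IReadOnlyList<T>>> fetchAll, int delayMs = 200)
    {
        var items = await fetchAll();
        foreach (var item in items)
        {
            yield return item;
            await Task.Delay(delayMs);
        }
    }
}
```

Note that the remote call still completes in one go. The async enumerable only changes how the caller consumes the result, which is what makes it easy to hide behind an extension method.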

The extension method looks like this:



using Azure.AI.Vision.ImageAnalysis;

namespace UseAzureAIServicesFromNET.Vision;

public static class ImageAnalysisClientExtensions
{

    /// <summary>
    /// Extracts the tags for image at specified path, if existing.
    /// The results are returned as async enumerable of strings. 
    /// </summary>
    /// <param name="client"></param>
    /// <param name="imagePath"></param>
    /// <param name="waitTimeInMsBetweenOutputTags">Default wait time in ms between output. Defaults to 200 ms.</param>
    /// <returns></returns>
    public static async IAsyncEnumerable<string?> ExtractImageTagsAsync(this ImageAnalysisClient client, 
    	string imagePath, int waitTimeInMsBetweenOutputTags = 200)
    {
        if (!File.Exists(imagePath))
        {
            yield return default(string); //just return null if a file is not found at provided path
            yield break; //stop here, otherwise opening the missing file below would throw
        }
        using FileStream imageStream = new FileStream(imagePath, FileMode.Open);
        var analysisResult = 
        	await client.AnalyzeAsync(BinaryData.FromStream(imageStream), VisualFeatures.Tags | VisualFeatures.Caption);
        yield return $"Description: {analysisResult.Value.Caption.Text}";
        foreach (var tag in analysisResult.Value.Tags.Values)
        {
            yield return $"Tag: {tag.Name}, Confidence: {tag.Confidence:F2}";        
            await Task.Delay(waitTimeInMsBetweenOutputTags);
        }
    }

}

The console output of the tags looks like this:

In addition to tags, we can also output objects in the image in a very similar extension method:



/// <summary>
/// Extracts the objects for image at specified path, if existing.
/// The results are returned as async enumerable of strings. 
/// </summary>
/// <param name="client"></param>
/// <param name="imagePath"></param>
/// <param name="waitTimeInMsBetweenOutputTags">Default wait time in ms between output. Defaults to 200 ms.</param>
/// <returns></returns>
public static async IAsyncEnumerable<string?> ExtractImageObjectsAsync(this ImageAnalysisClient client,
string imagePath, int waitTimeInMsBetweenOutputTags = 200)
{
    if (!File.Exists(imagePath))
    {
        yield return default(string); //just return null if a file is not found at provided path
        yield break; //stop here, otherwise opening the missing file below would throw
    }
    using FileStream imageStream = new FileStream(imagePath, FileMode.Open);
    var analysisResult =
    	await client.AnalyzeAsync(BinaryData.FromStream(imageStream), VisualFeatures.Objects | VisualFeatures.Caption);
    yield return $"Description: {analysisResult.Value.Caption.Text}";
    foreach (var objectInImage in analysisResult.Value.Objects.Values)
    {
            yield return $"""
Object tag: {objectInImage.Tags.FirstOrDefault()?.Name} Confidence: {objectInImage.Tags.FirstOrDefault()?.Confidence}, 
Position (bbox): {objectInImage.BoundingBox}
""";
        await Task.Delay(waitTimeInMsBetweenOutputTags);
    }
}

The code is nearly identical. We set the VisualFeatures to extract and read out the objects instead of the tags. The console output of the objects looks like this:

Sunday, 8 December 2024

Extending Azure AI Search with data sources

This article will present both code and tips around getting Azure AI Search to utilize additional data sources. The article builds upon the previous article in the blog:

https://toreaurstad.blogspot.com/2024/12/azure-ai-openai-chat-gpt-4-client.html

This code will use Open AI Chat GPT-4 together with an additional data source. I have tested this using a storage account in Azure which contains blobs with documents. First off, create an Azure AI services resource if you do not have one yet.

Then create an Azure AI Search service.

Choose the location and the pricing tier. You can choose the Free (F) pricing tier to test out Azure AI Search. The Standard pricing tier comes in at about 250 USD per month, so a word of caution: you will be billed if you do not choose the Free tier. Head over to the Azure AI Search service after it is created and note the Url shown in the Overview. Expand Search management, choose the following menu options and fill them out in this order:

Data sources
Indexes
Indexers

There are several types of data sources you can add.

Azure Blob Storage
Azure Data Lake Storage Gen2
Azure Cosmos DB
Azure SQL Database
Azure Table Storage
Fabric OneLake files

Upload files to the blob container

I have tested adding a data source using Azure Blob Storage. I had to create a new storage account. I believe Azure may have changed storage accounts over the years, so for best compatibility, add a brand new storage account. Then choose a blob container inside the Blob storage and hit the Create button.
Head over to the Storage browser inside your storage account, then choose Blob containers. You can add a blob container and, after it is created, click the Upload button.
You can then upload multiple files into the blob container (it is like a folder, which saves your files as blobs).

Setting up the index

After the Blob storage (storage account) is added as a data source, choose the Indexes menu button inside Azure AI Search. Click Add index.
After the index is added, choose the button Add field.
Add a field named Edit.String of type Edm.String.
Click the checkboxes for Retrievable and Searchable. Click the button Save.

Setting up the indexer

Add an indexer via the button Add indexer.
Choose the index you added.
Choose the data source you added.
Select the indexed extensions to specify which file types to index. You should probably select text-based files here, such as .md and .markdown files. Some binary file types, such as .pdf and .docx, can also be selected.
Data to extract: choose Content and metadata.

Source code for this article

The source code can be cloned from this GitHub repo:
https://github.com/toreaurstadboss/OpenAIDemo.git

The code for this article is available in the branch:
feature/openai-search-documentsources

To add the data source to our ChatClient instance, we do the following. Please note that this API is marked as being for evaluation purposes only (diagnostic AOAI001), so it may change in future versions of the Azure AI SDK:



            ChatCompletionOptions? chatCompletionOptions = null;
            if (dataSources?.Any() == true)
            {
                chatCompletionOptions = new ChatCompletionOptions();

                foreach (var dataSource in dataSources!)
                {
#pragma warning disable AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                    chatCompletionOptions.AddDataSource(new AzureSearchChatDataSource()
                    {
                        Endpoint = new Uri(dataSource.endpoint),
                        IndexName = dataSource.indexname,
                        Authentication = DataSourceAuthentication.FromApiKey(dataSource.authentication)
                    });
#pragma warning restore AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                }

            }

The updated version of the extension class for OpenAI.Chat.ChatClient then looks like this:

ChatClientExtensions.cs



using Azure.AI.OpenAI.Chat;
using OpenAI.Chat;
using System.ClientModel;
using System.Text;

namespace ToreAurstadIT.OpenAIDemo
{
    public static class ChatclientExtensions
    {

        /// <summary>
        /// Provides a stream result from the Chatclient service using AzureAI services.
        /// </summary>
        /// <param name="chatClient">ChatClient instance</param>
        /// <param name="message">The message to send and communicate to the ai-model</param>
        /// <returns>Streamed chat reply / result. Consume using 'await foreach'</returns>
        public static AsyncCollectionResult<StreamingChatCompletionUpdate> GetStreamedReplyAsync(this ChatClient chatClient, string message,
            (string endpoint, string indexname, string authentication)[]? dataSources = null)
        {
            ChatCompletionOptions? chatCompletionOptions = null;
            if (dataSources?.Any() == true)
            {
                chatCompletionOptions = new ChatCompletionOptions();

                foreach (var dataSource in dataSources!)
                {
#pragma warning disable AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                    chatCompletionOptions.AddDataSource(new AzureSearchChatDataSource()
                    {
                        Endpoint = new Uri(dataSource.endpoint),
                        IndexName = dataSource.indexname,
                        Authentication = DataSourceAuthentication.FromApiKey(dataSource.authentication)
                    });
#pragma warning restore AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                }

            }

            return chatClient.CompleteChatStreamingAsync(
                [new SystemChatMessage("You are a helpful, wonderful AI assistant"), new UserChatMessage(message)], chatCompletionOptions);
        }

        public static async Task<string> GetStreamedReplyStringAsync(this ChatClient chatClient, string message, (string endpoint, string indexname, string authentication)[]? dataSources = null, bool outputToConsole = false)
        {
            var sb = new StringBuilder();
            await foreach (var update in GetStreamedReplyAsync(chatClient, message, dataSources))
            {
                foreach (var textReply in update.ContentUpdate.Select(cu => cu.Text))
                {
                    sb.Append(textReply);
                    if (outputToConsole)
                    {
                        Console.Write(textReply);
                    }
                }
            }
            return sb.ToString();
        }

    }
}

The updated code for the demo app then looks like this. I chose to just use tuples here for the endpoint, index name and API key:

ChatGptDemo.cs



using OpenAI.Chat;
using OpenAIDemo;
using System.Diagnostics;

namespace ToreAurstadIT.OpenAIDemo
{
    public class ChatGptDemo
    {

        public async Task<string?> RunChatGptQuery(ChatClient? chatClient, string msg)
        {
            if (chatClient == null)
            {
                Console.WriteLine("Sorry, the demo failed. The chatClient did not initialize properly.");
                return null;
            }

            Console.WriteLine("Searching ... Please wait..");

            var stopWatch = Stopwatch.StartNew();

            var chatDataSources = new[]{
                (
                    SearchEndPoint: Environment.GetEnvironmentVariable("AZURE_SEARCH_AI_ENDPOINT", EnvironmentVariableTarget.User) ?? "N/A",
                    SearchIndexName: Environment.GetEnvironmentVariable("AZURE_SEARCH_AI_INDEXNAME", EnvironmentVariableTarget.User) ?? "N/A",
                    SearchApiKey: Environment.GetEnvironmentVariable("AZURE_SEARCH_AI_APIKEY", EnvironmentVariableTarget.User) ?? "N/A"
                )
            };

            string reply = "";

            try
            {

                reply = await chatClient.GetStreamedReplyStringAsync(msg, dataSources: chatDataSources, outputToConsole: true);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }

            Console.WriteLine($"The operation took: {stopWatch.ElapsedMilliseconds} ms");


            Console.WriteLine();

            return reply;
        }

    }
}
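The `?? "N/A"` fallbacks in the code above mean that a missing variable only surfaces later, as a failed request. One hedged alternative is to validate the three variables up front and report which ones are missing. The sketch below is an assumption about how that could look. `SearchSettings` is a hypothetical type, and the lookup reads process-level variables rather than the user-level target used in the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

try
{
    // The lookup is passed in so the validation logic can be exercised without real variables.
    var settings = SearchSettings.Load(Environment.GetEnvironmentVariable);
    Console.WriteLine($"Using index '{settings.IndexName}' at {settings.Endpoint}");
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}

// Hypothetical settings type; the variable names are the ones used in the demo above.
public sealed record SearchSettings(Uri Endpoint, string IndexName, string ApiKey)
{
    public static SearchSettings Load(Func<string, string?> lookup)
    {
        string[] names = { "AZURE_SEARCH_AI_ENDPOINT", "AZURE_SEARCH_AI_INDEXNAME", "AZURE_SEARCH_AI_APIKEY" };
        var values = names.ToDictionary(name => name, name => lookup(name));

        // Collect every missing variable so the user can fix them all in one go.
        var missing = values
            .Where(kv => string.IsNullOrWhiteSpace(kv.Value))
            .Select(kv => kv.Key)
            .ToList();
        if (missing.Count > 0)
        {
            throw new InvalidOperationException(
                $"Missing environment variable(s): {string.Join(", ", missing)}");
        }

        // new Uri throws UriFormatException if the endpoint is not a valid absolute URI.
        return new SearchSettings(new Uri(values[names[0]]!), values[names[1]]!, values[names[2]]!);
    }
}
```

The API key is deliberately never written to the console.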

The demo code expects that three user-specific environment variables exist. Please note that the API key can be found under the menu item Keys in Azure AI Search. There are two admin keys and multiple query keys. To distribute keys to other users, share a query key, not the admin key(s). The screenshot below shows the demo. It is a console application, but it could just as well be a web application or another client:

Please note that the Free tier of Azure AI Search is rather slow and seems to only allow queries at a certain interval, but it will suffice for testing. To really test it out, for example in an intranet scenario, the Standard tier of Azure AI Search is recommended, at about 250 USD per month as noted.

Conclusions

Getting an Azure AI chat service to work in intranet scenarios, combining Open AI Chat GPT-4 with a custom collection of indexed files, offers a nice way of building up a knowledge base which you can query against. It is a rather convenient way of building an intranet AI chat service on top of Azure cloud services.

Thursday, 9 May 2024

Azure Cognitive Synthesized Text To Speech with voice styles

Using Azure Cognitive Services, it is possible to translate text into other languages and also synthesize the text to speech. It is also possible to add voice effects such as a voice style. This adds realism by adding emotions to a synthesized voice. The voice is already trained with neural networks, and adding a voice style makes the synthesized speech even more realistic and multi-purpose. The GitHub repo for this is available here as a .NET MAUI Blazor client written with .NET 8:

MultiLingual translator DEMO Github repo

Not all the voices in Azure Cognitive Services support voice styles. An overview of which voices do is shown here:

https://learn.microsoft.com/nb-no/azure/ai-services/speech-service/language-support?tabs=tts#voice-styles-and-roles

More and more synthetic voices in Azure Cognitive Services get voice styles which express emotions. For now, most of the voices with styles are either English (en-US) or Chinese (zh-CN), and only a few voices in other languages support styles. This will most likely improve in the future, either as these neural voices are trained in voice styles or as some generic voice style algorithm is developed that can apply emotions on a generic level, although that still sounds a bit sci-fi.

Azure Cognitive Text-To-Speech Voices with support for emotions / voice styles

Voice	Styles	Roles
de-DE-ConradNeural	cheerful	Not supported
en-GB-SoniaNeural	cheerful, sad	Not supported
en-US-AriaNeural	angry, chat, cheerful, customerservice, empathetic, excited, friendly, hopeful, narration-professional, newscast-casual, newscast-formal, sad, shouting, terrified, unfriendly, whispering	Not supported
en-US-DavisNeural	angry, chat, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	Not supported
en-US-GuyNeural	angry, cheerful, excited, friendly, hopeful, newscast, sad, shouting, terrified, unfriendly, whispering	Not supported
en-US-JaneNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	Not supported
en-US-JasonNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	Not supported
en-US-JennyNeural	angry, assistant, chat, cheerful, customerservice, excited, friendly, hopeful, newscast, sad, shouting, terrified, unfriendly, whispering	Not supported
en-US-NancyNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	Not supported
en-US-SaraNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	Not supported
en-US-TonyNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	Not supported
es-MX-JorgeNeural	chat, cheerful	Not supported
fr-FR-DeniseNeural	cheerful, sad	Not supported
fr-FR-HenriNeural	cheerful, sad	Not supported
it-IT-IsabellaNeural	chat, cheerful	Not supported
ja-JP-NanamiNeural	chat, cheerful, customerservice	Not supported
pt-BR-FranciscaNeural	calm	Not supported
zh-CN-XiaohanNeural	affectionate, angry, calm, cheerful, disgruntled, embarrassed, fearful, gentle, sad, serious	Not supported
zh-CN-XiaomengNeural	chat	Not supported
zh-CN-XiaomoNeural	affectionate, angry, calm, cheerful, depressed, disgruntled, embarrassed, envious, fearful, gentle, sad, serious	Boy, Girl, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, YoungAdultFemale, YoungAdultMale
zh-CN-XiaoruiNeural	angry, calm, fearful, sad	Not supported
zh-CN-XiaoshuangNeural	chat	Not supported
zh-CN-XiaoxiaoNeural	affectionate, angry, assistant, calm, chat, chat-casual, cheerful, customerservice, disgruntled, fearful, friendly, gentle, lyrical, newscast, poetry-reading, sad, serious, sorry, whisper	Not supported
zh-CN-XiaoyiNeural	affectionate, angry, cheerful, disgruntled, embarrassed, fearful, gentle, sad, serious	Not supported
zh-CN-XiaozhenNeural	angry, cheerful, disgruntled, fearful, sad, serious	Not supported
zh-CN-YunfengNeural	angry, cheerful, depressed, disgruntled, fearful, sad, serious	Not supported
zh-CN-YunhaoNeural	advertisement-upbeat	Not supported
zh-CN-YunjianNeural	angry, cheerful, depressed, disgruntled, documentary-narration, narration-relaxed, sad, serious, sports-commentary, sports-commentary-excited	Not supported
zh-CN-YunxiaNeural	angry, calm, cheerful, fearful, sad	Not supported
zh-CN-YunxiNeural	angry, assistant, chat, cheerful, depressed, disgruntled, embarrassed, fearful, narration-relaxed, newscast, sad, serious	Boy, Narrator, YoungAdultMale
zh-CN-YunyangNeural	customerservice, narration-professional, newscast-casual	Not supported
zh-CN-YunyeNeural	angry, calm, cheerful, disgruntled, embarrassed, fearful, sad, serious	Boy, Girl, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, YoungAdultFemale, YoungAdultMale
zh-CN-YunzeNeural	angry, calm, cheerful, depressed, disgruntled, documentary-narration, fearful, sad, serious	OlderAdultMale, SeniorMale

Screenshot from the DEMO showing its user interface. You enter the text to translate at the top, and the language of the text is detected using the language detection functionality of Azure Cognitive Services. You can then select which language to translate the text into. The app makes a REST call to Azure Cognitive Services to translate the text. It is also possible to hear the text as speech, and it is now possible to add a voice style as well. Use the table shown above to select a voice actor that supports the voice style you want to test. As noted, voice styles are still limited to a few languages and voice actors. If the voice actor does not support the selected style, you will hear the voice in its normal style.

Let's look at some code for this DEMO too. You can study the GitHub repo and clone it to test it out yourself. The TextToSpeechUtil class handles much of the logic of creating voice from text input. It also creates the SSML XML contents and performs the REST API call that creates the voice file. SSML is the Speech Synthesis Markup Language, a W3C standard that others, including Google, have adopted as well. Microsoft documents its SSML support here:

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup
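To make the SSML part more concrete, here is a minimal sketch of how an SSML document with a voice style could be built with System.Xml.Linq, so that the input text is XML-escaped automatically. It is not the code from the repo. `SsmlBuilder` is a hypothetical helper, and the namespaces, the short voice name and the `mstts:express-as` element are taken from my reading of the Microsoft SSML documentation linked above, so treat them as assumptions to verify there.

```csharp
using System;
using System.Xml.Linq;

// Sample call: the ampersand in the text must not break the XML.
Console.WriteLine(SsmlBuilder.Build("Tom & Jerry say hello!", "en-US", "en-US-JennyNeural", "cheerful"));

public static class SsmlBuilder
{
    // Namespaces as shown in the Microsoft SSML documentation (assumption, verify before use).
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";
    private static readonly XNamespace Mstts = "https://www.w3.org/2001/mstts";

    public static string Build(string text, string language, string voiceName, string? style = null)
    {
        // Wrap the text in <mstts:express-as> only when a style is requested.
        object content = string.IsNullOrWhiteSpace(style)
            ? text
            : new XElement(Mstts + "express-as", new XAttribute("style", style), text);

        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", language),
            new XAttribute(XNamespace.Xmlns + "mstts", Mstts.NamespaceName),
            new XElement(Ssml + "voice",
                new XAttribute("name", voiceName),
                content));

        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

Building the document with XElement instead of string interpolation means characters such as & and < in the user's text cannot produce invalid XML.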




using Microsoft.Extensions.Configuration;
using MultiLingual.Translator.Lib.Models;
using System;
using System.Security;
using System.Text;
using System.Xml.Linq;
using static System.Runtime.InteropServices.JavaScript.JSType;

namespace MultiLingual.Translator.Lib
{
    public class TextToSpeechUtil : ITextToSpeechUtil
    {

        public TextToSpeechUtil(IConfiguration configuration)
        {
            _configuration = configuration;
        }

        public async Task<TextToSpeechResult> GetSpeechFromText(string text, string language, TextToSpeechLanguage[] actorVoices, 
            string? preferredVoiceActorId, string? preferredVoiceStyle)
        {
            var result = new TextToSpeechResult();

            result.Transcript = GetSpeechTextXml(text, language, actorVoices, preferredVoiceActorId, preferredVoiceStyle, result);
            result.ContentType = _configuration[TextToSpeechSpeechContentType];
            result.OutputFormat = _configuration[TextToSpeechSpeechXMicrosoftOutputFormat];
            result.UserAgent = _configuration[TextToSpeechSpeechUserAgent];
            result.AvailableVoiceActorIds = ResolveAvailableActorVoiceIds(language, actorVoices);
            result.LanguageCode = language;

            string? token = await GetUpdatedToken();

            HttpClient httpClient = GetTextToSpeechWebClient(token);

            string ttsEndpointUrl = _configuration[TextToSpeechSpeechEndpoint];
            var response = await httpClient.PostAsync(ttsEndpointUrl, new StringContent(result.Transcript, Encoding.UTF8, result.ContentType));

            using (var memStream = new MemoryStream()) {
                var responseStream = await response.Content.ReadAsStreamAsync();
                // Copy asynchronously so the calling thread is not blocked while the audio downloads
                await responseStream.CopyToAsync(memStream);
                result.VoiceData = memStream.ToArray();
            }

            return result;
        }

        private async Task<string?> GetUpdatedToken()
        {
            string? token = _token?.ToNormalString();
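            // TotalMinutes is the full elapsed time; Minutes is only the 0-59 component and would wrap after an hour
            if (_lastTimeTokenFetched == null || DateTime.Now.Subtract(_lastTimeTokenFetched.Value).TotalMinutes > 8)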
            {
                token = await GetIssuedToken();
            }

            return token;
        }

        private HttpClient GetTextToSpeechWebClient(string? token)
        {
            var httpClient = new HttpClient();
            httpClient.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", token);
            httpClient.DefaultRequestHeaders.Add("X-Microsoft-OutputFormat", _configuration[TextToSpeechSpeechXMicrosoftOutputFormat]);
            httpClient.DefaultRequestHeaders.Add("User-Agent", _configuration[TextToSpeechSpeechUserAgent]);
            return httpClient;
        }
       
        public string GetSpeechTextXml(string text, string language, TextToSpeechLanguage[] actorVoices, string? preferredVoiceActorId,
              string? preferredVoiceStyle, TextToSpeechResult result)
        {
            result.VoiceActorId = ResolveVoiceActorId(language, preferredVoiceActorId, actorVoices);
            string speechXml = $@"
            <speak version='1.0' xml:lang='en-US' xmlns:mstts='https://www.w3.org/2001/mstts'>
                <voice xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice {result.VoiceActorId}'>
                    <prosody rate='1'>{SecurityElement.Escape(text)}</prosody>
                </voice>
            </speak>";

            speechXml = AddVoiceStyleEffectIfDesired(preferredVoiceStyle, speechXml);

            return speechXml;
        }

        /// <summary>
        /// Adds voice style / expression to the SSML markup for the voice
        /// </summary>
        private static string AddVoiceStyleEffectIfDesired(string? preferredVoiceStyle, string speechXml)
        {
            if (!string.IsNullOrWhiteSpace(preferredVoiceStyle) && preferredVoiceStyle != "normal-neutral")
            {
                var voiceDoc = XDocument.Parse(speechXml); //https://learn.microsoft.com/nb-no/azure/ai-services/speech-service/speech-synthesis-markup-voice#use-speaking-styles-and-roles

                XElement? prosody = voiceDoc.Descendants("prosody").FirstOrDefault();
                if (prosody?.Value != null)
                {
                    // Create the <mstts:express-as> element, for now skip the ':' letter and replace at the end

                    var expressedAsWrappedElement = new XElement("msttsexpress-as",
                        new XAttribute("style", preferredVoiceStyle));
                    expressedAsWrappedElement.Value = prosody!.Value;
                    prosody?.ReplaceWith(expressedAsWrappedElement);
                    speechXml = voiceDoc.ToString().Replace(@"msttsexpress-as", "mstts:express-as");
                }
            }

            return speechXml;
        }

        private List<string> ResolveAvailableActorVoiceIds(string language, TextToSpeechLanguage[] actorVoices)
        {
            if (actorVoices?.Any() == true)
            {
                var voiceActorIds = actorVoices.Where(v => v.LanguageKey == language || v.LanguageKey.Split("-")[0] == language).SelectMany(v => v.VoiceActors).Select(v => v.VoiceId).ToList();
                return voiceActorIds;
            }
            return new List<string>();
        }

        private string ResolveVoiceActorId(string language, string? preferredVoiceActorId, TextToSpeechLanguage[] actorVoices)
        {
            string actorVoiceId = "(en-AU, NatashaNeural)"; //default to a select voice actor id 
            if (actorVoices?.Any() == true)
            {
                var voiceActorsForLanguage = actorVoices.Where(v => v.LanguageKey == language || v.LanguageKey.Split("-")[0] == language).SelectMany(v => v.VoiceActors).Select(v => v.VoiceId).ToList();
                if (voiceActorsForLanguage != null)
                {
                    if (voiceActorsForLanguage.Any() == true)
                    {
                        var resolvedPreferredVoiceActorId = voiceActorsForLanguage.FirstOrDefault(v => v == preferredVoiceActorId);
                        if (!string.IsNullOrWhiteSpace(resolvedPreferredVoiceActorId))
                        {
                            return resolvedPreferredVoiceActorId!;
                        }
                        actorVoiceId = voiceActorsForLanguage.First();
                    }
                }
            }
            return actorVoiceId;
        }

        private async Task<string> GetIssuedToken()
        {
            var httpClient = new HttpClient();
            string? textToSpeechSubscriptionKey = Environment.GetEnvironmentVariable("AZURE_TEXT_SPEECH_SUBSCRIPTION_KEY", EnvironmentVariableTarget.Machine);
            httpClient.DefaultRequestHeaders.Add(OcpApiSubscriptionKeyHeaderName, textToSpeechSubscriptionKey);
            string tokenEndpointUrl = _configuration[TextToSpeechIssueTokenEndpoint];
            var response = await httpClient.PostAsync(tokenEndpointUrl, new StringContent("{}"));
            _token = (await response.Content.ReadAsStringAsync()).ToSecureString();
            _lastTimeTokenFetched = DateTime.Now;
            return _token.ToNormalString();
        }

        public async Task<List<string>> GetVoiceStyles()
        {
            var voiceStyles = new List<string>
            {
                "normal-neutral",
                "advertisement_upbeat",
                "affectionate",
                "angry",
                "assistant",
                "calm",
                "chat",
                "cheerful",
                "customerservice",
                "depressed",
                "disgruntled",
                "documentary-narration",
                "embarrassed",
                "empathetic",
                "envious",
                "excited",
                "fearful",
                "friendly",
                "gentle",
                "hopeful",
                "lyrical",
                "narration-professional",
                "narration-relaxed",
                "newscast",
                "newscast-casual",
                "newscast-formal",
                "poetry-reading",
                "sad",
                "serious",
                "shouting",
                "sports_commentary",
                "sports_commentary_excited",
                "whispering",
                "terrified",
                "unfriendly"
            };
            return await Task.FromResult(voiceStyles);
        }

        private const string OcpApiSubscriptionKeyHeaderName = "Ocp-Apim-Subscription-Key";
        private const string TextToSpeechIssueTokenEndpoint = "TextToSpeechIssueTokenEndpoint";
        private const string TextToSpeechSpeechEndpoint = "TextToSpeechSpeechEndpoint";        
        private const string TextToSpeechSpeechContentType = "TextToSpeechSpeechContentType";
        private const string TextToSpeechSpeechUserAgent = "TextToSpeechSpeechUserAgent";
        private const string TextToSpeechSpeechXMicrosoftOutputFormat = "TextToSpeechSpeechXMicrosoftOutputFormat";

        private readonly IConfiguration _configuration;

        private DateTime? _lastTimeTokenFetched = null;
        private SecureString? _token = null;

    }
}
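
The token refresh in GetUpdatedToken hinges on how old the cached token is. Speech service access tokens are valid for ten minutes, which is why the class refreshes after eight. The sketch below isolates that age check in a small helper so it can be tested without calling Azure. The TokenCache class and its injected clock are hypothetical and not part of the demo repo. It compares whole TimeSpan values, since TimeSpan.Minutes only returns the 0-59 minutes component.

```csharp
using System;

// Hypothetical helper, not part of the demo repo: caches a token and reports when it is stale.
public sealed class TokenCache
{
    private readonly Func<DateTime> _clock;   // injected so tests can control the time
    private readonly TimeSpan _maxAge;
    private string? _token;
    private DateTime? _fetchedAt;

    public TokenCache(Func<DateTime> clock, TimeSpan maxAge)
    {
        _clock = clock;
        _maxAge = maxAge;
    }

    public string? Token => _token;

    // True when no token is stored yet or the stored token is older than the maximum age.
    public bool NeedsRefresh =>
        _token == null || _fetchedAt == null || (_clock() - _fetchedAt.Value) > _maxAge;

    public void Store(string token)
    {
        _token = token;
        _fetchedAt = _clock();
    }
}
```

A caller would check NeedsRefresh before each request and call Store after fetching a new token from the issue-token endpoint.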

The REST call to generate the voice file uses the following setup:

TTS endpoint URL: https://norwayeast.tts.speech.microsoft.com/cognitiveservices/v1

The transcript (the text to convert into speech) in my test is the following SSML XML document:



<speak version="1.0" xml:lang="en-US" xmlns:mstts="https://www.w3.org/2001/mstts">
  <voice xml:gender="Male" name="Microsoft Server Speech Text to Speech Voice (en-US, JaneNeural)">
    <mstts:express-as style="angry">I listen to Eurovision and cheer for Norway</mstts:express-as>
  </voice>
</speak>

The SSML also uses an extension, the mstts namespace, which adds features to SSML such as the express-as element, here set to the voice style or emotion "angry". Not every voice actor in Azure Cognitive Services supports every emotion or voice style. The list below shows the voice styles that may be supported; which ones are available depends on the voice actor you choose (and thereby the language).

"normal-neutral"
"advertisement_upbeat"
"affectionate"
"angry"
"assistant"
"calm"
"chat"
"cheerful"
"customerservice"
"depressed"
"disgruntled"
"documentary-narration"
"embarrassed"
"empathetic"
"envious"
"excited"
"fearful"
"friendly"
"gentle"
"hopeful"
"lyrical"
"narration-professional"
"narration-relaxed"
"newscast"
"newscast-casual"
"newscast-formal"
"poetry-reading"
"sad"
"serious"
"shouting"
"sports_commentary"
"sports_commentary_excited"
"whispering"
"terrified"
"unfriendly"

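The demo builds the mstts:express-as element by inserting a placeholder name and replacing it in the serialized string. As an alternative, here is a minimal sketch that builds the same SSML shape with XNamespace from System.Xml.Linq, so the prefix is emitted by the serializer and the text is XML-escaped automatically. The class and method names are hypothetical and not part of the demo repo; the namespace URI and attributes are copied from the SSML example above.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the demo repo.
public static class SsmlSketch
{
    // Namespace URI copied from the SSML document shown in the article.
    private static readonly XNamespace Mstts = "https://www.w3.org/2001/mstts";

    public static string BuildSsml(string text, string voiceName, string? style)
    {
        // Wrap the text in <mstts:express-as> only when a non-neutral style is requested.
        object content = string.IsNullOrWhiteSpace(style) || style == "normal-neutral"
            ? (object)text
            : new XElement(Mstts + "express-as", new XAttribute("style", style), text);

        var speak = new XElement("speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            // Declaring the prefix makes the serializer write mstts:express-as
            new XAttribute(XNamespace.Xmlns + "mstts", Mstts.NamespaceName),
            new XElement("voice",
                new XAttribute(XNamespace.Xml + "gender", "Male"),
                new XAttribute("name", voiceName),
                content));

        return speak.ToString();
    }
}
```

Calling SsmlSketch.BuildSsml with the text, a voice name such as "Microsoft Server Speech Text to Speech Voice (en-US, JaneNeural)", and the style "angry" yields a document of the same form as the one above. Characters such as & or < in the text are escaped by the serializer.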
Microsoft has come a long way from the early work with SAPI, the Microsoft Speech API, and Microsoft Sam around 2000. Synthetic voices from more than 20 years ago were rather crude and robotic. Nowadays, the voice actors provided by the Azure cloud platform, as shown here, are trained with neural networks on recordings of real voice actors and sound very realistic, and more and more of them support emotions or voice styles. The uses are diverse: speech synthesis can serve automated answering services and apps in fields such as healthcare, public services, and education. Making this demo has been fun for me. It can be used to learn languages, and with the voice functionality you can practice not only the translation but also the pronunciation.

Monday, 22 April 2024

Pii - Detecting Personally Identifiable Information using Azure Cognitive Services

This article looks at detecting Personally Identifiable Information (Pii) using Azure Cognitive Services. I have created a demo using .NET MAUI Blazor, and the GitHub repo is here:
https://github.com/toreaurstadboss/PiiDetectionDemo

It is often desirable to detect Pii and also to redact it, that is, to censor or obscure the Pii to prepare documents for publication. The Pii feature in Azure Cognitive Services is part of the Language resource service. A quickstart for using Pii is available here:
https://learn.microsoft.com/en-us/azure/ai-services/language-service/personally-identifiable-information/quickstart?pivots=programming-language-csharp

After creating the Language resource, look up the keys and endpoint for your service. Using Azure CLI inside Cloud Shell, you can enter the command below to list the keys. Many Azure services have two keys, which you can replace with new keys through regeneration:


  az cognitiveservices account keys list --resource-group SomeAzureResourceGroup --name SomeAccountAzureCognitiveServices

This is how you can query for the endpoint of the Language resource using Azure CLI:


  az cognitiveservices account show --query "properties.endpoint" --resource-group SomeAzureResourceGroup --name SomeAccountAzureCognitiveServices

Next, the demo of this article. Connecting to the Pii removal feature of Text Analytics is possible using the NuGet package Azure.AI.TextAnalytics, version 5.3.0 (the REST calls can also be made manually). Here are the other NuGet packages of my demo, taken from the .csproj file:

PiiDetectionDemo.csproj



  <ItemGroup>
        <PackageReference Include="Azure.AI.TextAnalytics" Version="5.3.0" />
        <PackageReference Include="Microsoft.Maui.Controls" Version="$(MauiVersion)" />
        <PackageReference Include="Microsoft.Maui.Controls.Compatibility" Version="$(MauiVersion)" />
        <PackageReference Include="Microsoft.AspNetCore.Components.WebView.Maui" Version="$(MauiVersion)" />
        <PackageReference Include="Microsoft.Extensions.Logging.Debug" Version="8.0.0" />
    </ItemGroup>

A service using this Pii removal feature simply makes use of a TextAnalyticsClient and its method RecognizePiiEntitiesAsync.

PiiRemovalTextClientService.cs IPiiRemovalTextClientService.cs




using Azure;
using Azure.AI.TextAnalytics;

namespace PiiDetectionDemo.Util
{
    public interface IPiiRemovalTextAnalyticsClientService
    {
        Task<Response<PiiEntityCollection>> RecognizePiiEntitiesAsync(string? document, string? language);
    }
}


namespace PiiDetectionDemo.Util
{
    public class PiiRemovalTextAnalyticsClientService : IPiiRemovalTextAnalyticsClientService
    {

        private TextAnalyticsClient _client;

        public PiiRemovalTextAnalyticsClientService()
        {
            var azureEndpoint = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_ENDPOINT");
            var azureKey = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_KEY");

            if (string.IsNullOrWhiteSpace(azureEndpoint))
            {
                throw new ArgumentNullException(nameof(azureEndpoint), "Missing system environment variable: AZURE_COGNITIVE_SERVICE_ENDPOINT");
            }
            if (string.IsNullOrWhiteSpace(azureKey))
            {
                throw new ArgumentNullException(nameof(azureKey), "Missing system environment variable: AZURE_COGNITIVE_SERVICE_KEY");
            }

            _client = new TextAnalyticsClient(new Uri(azureEndpoint), new AzureKeyCredential(azureKey));
        }

        public async Task<Response<PiiEntityCollection>> RecognizePiiEntitiesAsync(string? document, string? language)
        {
            var piiEntities = await _client.RecognizePiiEntitiesAsync(document, language);
            return piiEntities;
        }

    }
}

The code-behind of the Razor component page showing the UI looks like this:

Home.razor.cs



using Azure;
using Microsoft.AspNetCore.Components;
using PiiDetectionDemo.Models;
using PiiDetectionDemo.Util;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace PiiDetectionDemo.Components.Pages
{
    public partial class Home
    {

        private IndexModel Model = new();
        private bool isProcessing = false;
        private bool isSearchPerformed = false;

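        private async Task Submit()
        {
            isSearchPerformed = false;
            isProcessing = true;
            // Measure the round trip so the ExecutionTime shown in the UI is populated
            var stopwatch = System.Diagnostics.Stopwatch.StartNew();
            try
            {
                var response = await _piiRemovalTextAnalyticsClientService.RecognizePiiEntitiesAsync(Model.InputText, null);
                Model.RedactedText = response?.Value?.RedactedText;
                // Set the analysis result first: UpdateHtmlRedactedText reads the entity offsets from it
                Model.AnalysisResult = response?.Value;
                Model.UpdateHtmlRedactedText();
                StateHasChanged();
            }
            catch (Exception ex)
            {
                await Console.Out.WriteLineAsync(ex.ToString());
            }
            finally
            {
                // Reset the flags even if the call failed
                Model.ExecutionTime = stopwatch.ElapsedMilliseconds;
                isProcessing = false;
                isSearchPerformed = true;
            }
        }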

        private void removeWhitespace(ChangeEventArgs args)
        {
            Model.InputText = args.Value?.ToString()?.CleanupAllWhiteSpace();
            StateHasChanged();
        }



    }
}

To get the redacted (censored) text, stripped of any Pii that the detection feature was able to find, access the Value of the response, which is of type Azure.AI.TextAnalytics.PiiEntityCollection. Inside this object, the string RedactedText contains the redacted text. The IndexModel looks like this:



using Azure.AI.TextAnalytics;
using Microsoft.AspNetCore.Components;
using PiiDetectionDemo.Util;
using System.ComponentModel.DataAnnotations;
using System.Text;

namespace PiiDetectionDemo.Models
{

    public class IndexModel
    {

        [Required]
        public string? InputText { get; set; }

        public string? RedactedText { get; set; }

        public string? HtmlRedactedText { get; set; }

        public MarkupString HtmlRedactedTextMarkupString { get; set; }

        public void UpdateHtmlRedactedText()
        {
            var sb = new StringBuilder(RedactedText);
            if (AnalysisResult != null && RedactedText != null)
            {
                foreach (var piiEntity in AnalysisResult.OrderByDescending(a => a.Offset))
                {
                    sb.Insert(piiEntity.Offset + piiEntity.Length, "</b></span>");
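                    // border-radius is the valid CSS property name; the entity text is HTML-encoded so quotes cannot break the title attribute
                    sb.Insert(piiEntity.Offset, $"<span style='background-color:lightgray;border:1px solid black;border-radius:2px; color:{GetBackgroundColor(piiEntity)}' title='{piiEntity.Category}: {piiEntity.SubCategory} Confidence: {piiEntity.ConfidenceScore} Redacted Text: {System.Net.WebUtility.HtmlEncode(piiEntity.Text)}'><b>");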
                }
            }
            HtmlRedactedText = sb.ToString()?.CleanupAllWhiteSpace();    
            HtmlRedactedTextMarkupString = new MarkupString(HtmlRedactedText ?? string.Empty);
        }

        private string GetBackgroundColor(PiiEntity piiEntity)
        {
            if (piiEntity.Category == PiiEntityCategory.PhoneNumber)
            {
                return "yellow";
            }
            if (piiEntity.Category == PiiEntityCategory.Organization)
            {
                return "orange";
            }
            if (piiEntity.Category == PiiEntityCategory.Address)
            {
                return "green";
            }
            return "gray";                   
        }

        public long ExecutionTime { get; set; }
        public PiiEntityCollection? AnalysisResult { get; set; }

    }
}
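
UpdateHtmlRedactedText walks the entities in descending offset order. The reason is that every insert shifts all characters after it, so working from the end of the text towards the start keeps the remaining offsets valid. Here is a minimal sketch of that technique in isolation. The Span record is a hypothetical stand-in for PiiEntity (only offset, length, and category) so the example runs without the Azure SDK, and the markup is simplified to a bold tag.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the demo repo.
public static class HighlightSketch
{
    // Stand-in for PiiEntity: only the fields the highlighting needs.
    public record Span(int Offset, int Length, string Category);

    public static string Highlight(string redactedText, IEnumerable<Span> spans)
    {
        var sb = new StringBuilder(redactedText);
        // Process the last span first so earlier offsets are not shifted by the inserts.
        foreach (var span in spans.OrderByDescending(s => s.Offset))
        {
            sb.Insert(span.Offset + span.Length, "</b>");
            sb.Insert(span.Offset, $"<b title='{WebUtility.HtmlEncode(span.Category)}'>");
        }
        return sb.ToString();
    }
}
```

Processing the spans in ascending order instead would require adding the length of every previously inserted tag to each later offset.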

The frontend UI looks like this: Home.razor



@page "/"
@using PiiDetectionDemo.Util

@inject IPiiRemovalTextAnalyticsClientService _piiRemovalTextAnalyticsClientService;

<h3>Azure HealthCare Text Analysis - Pii detection feature - Azure Cognitive Services</h3>

<em>Pii = Person identifiable information</em>

<EditForm Model="@Model" OnValidSubmit="@Submit">
    <DataAnnotationsValidator />
    <ValidationSummary />

    <div class="form-group row">
        <label><strong>Text input</strong></label>
        <InputTextArea @oninput="removeWhitespace" class="overflow-scroll" style="max-height:500px;max-width:900px;font-size: 10pt;font-family:Verdana, Geneva, Tahoma, sans-serif" @bind-Value="@Model.InputText" rows="5" />
    </div>

    <div class="form-group row">
        <div class="col">
            <br />
            <button class="btn btn-outline-primary" type="submit">Run</button>
        </div>
        <div class="col">
        </div>
        <div class="col">
        </div>
    </div>

    <br />

    @if (isProcessing)
    {

        <div class="progress" style="max-width: 90%">
            <div class="progress-bar progress-bar-striped progress-bar-animated"
                 style="width: 100%; background-color: green">
                Retrieving result from Azure Text Analysis Pii detection feature. Processing..
            </div>
        </div>
        <br />

    }

    <div class="form-group row">
        <label><strong>Analysis result</strong></label>

        @if (isSearchPerformed)
        {
            <br />
            <b>Execution time took: @Model.ExecutionTime ms (milliseconds)</b>

            <br />
            <br />

            <b>Redacted text (Pii removed)</b>
            <br />

            <div class="form-group row">
               <label><strong>Categorized Pii redacted text</strong></label>
               <div>
               @Model.HtmlRedactedTextMarkupString
               </div>
            </div>

            <br />
            <br />

            <table class="table table-striped table-dark table-hover">
                <thead>
                    <tr>
                        <th>Pii text</th>
                        <th>Category</th>
                        <th>SubCategory</th>
                        <th>Offset</th>
                        <th>Length</th>
                        <th>ConfidenceScore</th>
                    </tr>
                </thead>
                <tbody>
                    @if (Model.AnalysisResult != null) {
                        @foreach (var entity in Model.AnalysisResult)
                        {
                            <tr>
                                <td>@entity.Text</td>
                                <td>@entity.Category.ToString()</td>
                                <td>@entity.SubCategory</td>
                                <td>@entity.Offset</td>
                                <td>@entity.Length</td>
                                <td>@entity.ConfidenceScore</td>                                        
                            </tr>
                        }
                    }
                </tbody>
            </table>

        }
    </div>

</EditForm>

The demo uses Bootstrap 5 to style an HTML table showing the Azure.AI.TextAnalytics.PiiEntity properties.
