
Sunday, 8 December 2024

Extending Azure AI Search with data sources

This article will present both code and tips around getting Azure AI Search to utilize additional data sources. The article builds upon the previous article in the blog:

https://toreaurstad.blogspot.com/2024/12/azure-ai-openai-chat-gpt-4-client.html

This code will use Open AI Chat GPT-4 together with an additional data source. I have tested this using a Storage account in Azure which contains blobs with documents. First off, create an Azure AI services resource if you do not have one yet.



Then create an Azure AI Search service.



Choose the location and the pricing tier. You can choose the Free (F) pricing tier to test out Azure AI Search. The Standard pricing tier comes in at about 250 USD per month, so a word of caution: you will be billed if you do not choose the Free tier. Head over to the Azure AI Search service after it is created and note the Url shown in the Overview. Expand Search management and fill out the following menu options in this order:
  • Data sources
  • Indexes
  • Indexers


There are several types of data sources you can add.
  • Azure Blob Storage
  • Azure Data Lake Storage Gen2
  • Azure Cosmos DB
  • Azure SQL Database
  • Azure Table Storage
  • Fabric OneLake files

Upload files to the blob container

  • I have tested adding a data source using Azure Blob Storage. I had to create a new storage account; I believe Azure may have changed the storage account setup over the years, so for best compatibility, add a brand new storage account. Then choose a blob container inside the Blob storage and hit the Create button.
  • Head over to your Storage browser inside your storage account, then choose Blob container. You can add a Blob container and then after it is created, click the Upload button.
  • You can then upload multiple files into the blob container (it is like a folder, which saves your files as blobs).

Setting up the index

  • After the Blob storage (storage account) is added to the data source, choose the Indexes menu button inside Azure AI search. Click Add index.
  • After the index is added, choose the button Add field
  • Add a field named Edit.String of type Edm.String.
  • Click the checkbox for Retrievable and Searchable. Click the button Save

Setting up the indexer

  • Choose to add an Indexer via button Add indexer
  • Choose the Index you added
  • Choose the Data source you added
  • Select the indexed extensions to specify which file types to index. Text-based files such as .md and .markdown are the obvious choice, and some binary file types such as .pdf and .docx can be selected as well
  • Data to extract: Choose Content and metadata


Source code for this article

The source code can be cloned from this Github repo:
https://github.com/toreaurstadboss/OpenAIDemo.git

The code for this article is available in the branch:
feature/openai-search-documentsources

To add the data source to our ChatClient instance, we do the following. Please note that this API is marked as being for evaluation purposes (diagnostic AOAI001) in the Azure AI SDK and may change or be removed in future updates:


            ChatCompletionOptions? chatCompletionOptions = null;
            if (dataSources?.Any() == true)
            {
                chatCompletionOptions = new ChatCompletionOptions();

                foreach (var dataSource in dataSources!)
                {
#pragma warning disable AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                    chatCompletionOptions.AddDataSource(new AzureSearchChatDataSource()
                    {
                        Endpoint = new Uri(dataSource.endpoint),
                        IndexName = dataSource.indexname,
                        Authentication = DataSourceAuthentication.FromApiKey(dataSource.authentication)
                    });
#pragma warning restore AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                }

            }
            




The updated version of the extension class of OpenAI.Chat.ChatClient then looks like this:

ChatClientExtensions.cs



using Azure.AI.OpenAI.Chat;
using OpenAI.Chat;
using System.ClientModel;
using System.Text;

namespace ToreAurstadIT.OpenAIDemo
{
    public static class ChatclientExtensions
    {

        /// <summary>
        /// Provides a stream result from the Chatclient service using AzureAI services.
        /// </summary>
        /// <param name="chatClient">ChatClient instance</param>
        /// <param name="message">The message to send and communicate to the ai-model</param>
        /// <returns>Streamed chat reply / result. Consume using 'await foreach'</returns>
        public static AsyncCollectionResult<StreamingChatCompletionUpdate> GetStreamedReplyAsync(this ChatClient chatClient, string message,
            (string endpoint, string indexname, string authentication)[]? dataSources = null)
        {
            ChatCompletionOptions? chatCompletionOptions = null;
            if (dataSources?.Any() == true)
            {
                chatCompletionOptions = new ChatCompletionOptions();

                foreach (var dataSource in dataSources!)
                {
#pragma warning disable AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                    chatCompletionOptions.AddDataSource(new AzureSearchChatDataSource()
                    {
                        Endpoint = new Uri(dataSource.endpoint),
                        IndexName = dataSource.indexname,
                        Authentication = DataSourceAuthentication.FromApiKey(dataSource.authentication)
                    });
#pragma warning restore AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                }

            }

            return chatClient.CompleteChatStreamingAsync(
                [new SystemChatMessage("You are a helpful, wonderful AI assistant"), new UserChatMessage(message)], chatCompletionOptions);
        }

        public static async Task<string> GetStreamedReplyStringAsync(this ChatClient chatClient, string message, (string endpoint, string indexname, string authentication)[]? dataSources = null, bool outputToConsole = false)
        {
            var sb = new StringBuilder();
            await foreach (var update in GetStreamedReplyAsync(chatClient, message, dataSources))
            {
                foreach (var textReply in update.ContentUpdate.Select(cu => cu.Text))
                {
                    sb.Append(textReply);
                    if (outputToConsole)
                    {
                        Console.Write(textReply);
                    }
                }
            }
            return sb.ToString();
        }

    }
}





The updated code for the demo app then looks like this. I chose to just use tuples here for the endpoint, index name and API key:

ChatGptDemo.cs


using OpenAI.Chat;
using OpenAIDemo;
using System.Diagnostics;

namespace ToreAurstadIT.OpenAIDemo
{
    public class ChatGptDemo
    {

        public async Task<string?> RunChatGptQuery(ChatClient? chatClient, string msg)
        {
            if (chatClient == null)
            {
                Console.WriteLine("Sorry, the demo failed. The chatClient did not initialize properly.");
                return null;
            }

            Console.WriteLine("Searching ... Please wait..");

            var stopWatch = Stopwatch.StartNew();

            var chatDataSources = new[]{
                (
                    SearchEndPoint: Environment.GetEnvironmentVariable("AZURE_SEARCH_AI_ENDPOINT", EnvironmentVariableTarget.User) ?? "N/A",
                    SearchIndexName: Environment.GetEnvironmentVariable("AZURE_SEARCH_AI_INDEXNAME", EnvironmentVariableTarget.User) ?? "N/A",
                    SearchApiKey: Environment.GetEnvironmentVariable("AZURE_SEARCH_AI_APIKEY", EnvironmentVariableTarget.User) ?? "N/A"
                )
            };

            string reply = "";

            try
            {

                reply = await chatClient.GetStreamedReplyStringAsync(msg, dataSources: chatDataSources, outputToConsole: true);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }

            Console.WriteLine($"The operation took: {stopWatch.ElapsedMilliseconds} ms");


            Console.WriteLine();

            return reply;
        }

    }
}




The code expects three user-level environment variables to exist: AZURE_SEARCH_AI_ENDPOINT, AZURE_SEARCH_AI_INDEXNAME and AZURE_SEARCH_AI_APIKEY. The API key can be found under the menu item Keys in Azure AI Search. There are two admin keys and multiple query keys. If you distribute keys to other users, share a query key, not the admin key(s).

The screenshot below shows the demo. It is a console application, but it could just as well be a web application or another client:

Please note that the Free tier of Azure AI Search is rather slow and seems to only allow queries at a certain interval, but it suffices for trying things out. For more realistic use, for example in an intranet scenario, the Standard tier of Azure AI Search is recommended, at about 250 USD per month as noted.
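The demo above falls back to the string "N/A" when a variable is missing, so a misconfiguration only shows up once the request fails. A minimal sketch of failing earlier is shown below. The class SearchSettings and its members are hypothetical names and not part of the repo; the three variable names are the ones used in the demo code. The fallback to the process-level variable is an assumption added here because EnvironmentVariableTarget.User returns null on non-Windows systems.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the OpenAIDemo repo.
public static class SearchSettings
{
    // Variable names taken from the demo code above.
    private static readonly string[] RequiredVariables =
    {
        "AZURE_SEARCH_AI_ENDPOINT",
        "AZURE_SEARCH_AI_INDEXNAME",
        "AZURE_SEARCH_AI_APIKEY"
    };

    // Returns the names of the variables that are missing or blank.
    // The lookup is passed in so the check can run without touching the real environment.
    public static IReadOnlyList<string> FindMissing(Func<string, string?> lookup) =>
        RequiredVariables.Where(name => string.IsNullOrWhiteSpace(lookup(name))).ToList();

    // Reads the user-level variable first, then the process-level one.
    public static string? ReadVariable(string name) =>
        Environment.GetEnvironmentVariable(name, EnvironmentVariableTarget.User)
        ?? Environment.GetEnvironmentVariable(name);

    // Builds the tuple array in the shape the extension methods expect, or throws with a clear message.
    public static (string endpoint, string indexname, string authentication)[] Load()
    {
        var missing = FindMissing(ReadVariable);
        if (missing.Count > 0)
        {
            throw new InvalidOperationException(
                "Missing environment variables: " + string.Join(", ", missing));
        }

        return new[]
        {
            (ReadVariable(RequiredVariables[0])!,
             ReadVariable(RequiredVariables[1])!,
             ReadVariable(RequiredVariables[2])!)
        };
    }
}
```

With this in place, the demo could pass SearchSettings.Load() as the dataSources argument and report the missing names before any call to Azure is made.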

Conclusions

Getting an Azure AI chat service to work in intranet scenarios, combining Open AI Chat GPT-4 with a custom collection of indexed files, is a nice way of building up a knowledge base which you can query against. It is a rather convenient way of building an on-premise solution for an intranet AI chat service using Azure cloud services.

Thursday, 9 May 2024

Azure Cognitive Synthesized Text To Speech with voice styles

Using Azure Cognitive Services, it is possible to translate text into other languages and also synthesize the text to speech. It is also possible to add voice effects such as a voice style, which adds realism by giving the synthesized voice emotions. The voices are already trained with neural nets, and adding a voice style makes the synthesized speech even more realistic and versatile. The Github repo for this is available here, as a .NET Maui Blazor client written with .NET 8:

MultiLingual translator DEMO Github repo

Not all the voices in Azure Cognitive Services support voice styles. An overview of the voices that do is shown here:

https://learn.microsoft.com/nb-no/azure/ai-services/speech-service/language-support?tabs=tts#voice-styles-and-roles

More and more synthetic voices in Azure Cognitive Services are getting voice styles that express emotions. For now, most of these voices are English (en-US) or Chinese (zh-CN), while a few other languages have a handful of voices supporting styles. This will most likely improve in the future, either as more neural voices are trained in voice styles or if some generic algorithm is developed that can infer emotions on a general level, although that still sounds a bit sci-fi.

Azure Cognitive Text-To-Speech Voices with support for emotions / voice styles


Voice | Styles | Roles
de-DE-ConradNeural | cheerful | Not supported
en-GB-SoniaNeural | cheerful, sad | Not supported
en-US-AriaNeural | angry, chat, cheerful, customerservice, empathetic, excited, friendly, hopeful, narration-professional, newscast-casual, newscast-formal, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-DavisNeural | angry, chat, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-GuyNeural | angry, cheerful, excited, friendly, hopeful, newscast, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-JaneNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-JasonNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-JennyNeural | angry, assistant, chat, cheerful, customerservice, excited, friendly, hopeful, newscast, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-NancyNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-SaraNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
en-US-TonyNeural | angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering | Not supported
es-MX-JorgeNeural | chat, cheerful | Not supported
fr-FR-DeniseNeural | cheerful, sad | Not supported
fr-FR-HenriNeural | cheerful, sad | Not supported
it-IT-IsabellaNeural | chat, cheerful | Not supported
ja-JP-NanamiNeural | chat, cheerful, customerservice | Not supported
pt-BR-FranciscaNeural | calm | Not supported
zh-CN-XiaohanNeural | affectionate, angry, calm, cheerful, disgruntled, embarrassed, fearful, gentle, sad, serious | Not supported
zh-CN-XiaomengNeural | chat | Not supported
zh-CN-XiaomoNeural | affectionate, angry, calm, cheerful, depressed, disgruntled, embarrassed, envious, fearful, gentle, sad, serious | Boy, Girl, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, YoungAdultFemale, YoungAdultMale
zh-CN-XiaoruiNeural | angry, calm, fearful, sad | Not supported
zh-CN-XiaoshuangNeural | chat | Not supported
zh-CN-XiaoxiaoNeural | affectionate, angry, assistant, calm, chat, chat-casual, cheerful, customerservice, disgruntled, fearful, friendly, gentle, lyrical, newscast, poetry-reading, sad, serious, sorry, whisper | Not supported
zh-CN-XiaoyiNeural | affectionate, angry, cheerful, disgruntled, embarrassed, fearful, gentle, sad, serious | Not supported
zh-CN-XiaozhenNeural | angry, cheerful, disgruntled, fearful, sad, serious | Not supported
zh-CN-YunfengNeural | angry, cheerful, depressed, disgruntled, fearful, sad, serious | Not supported
zh-CN-YunhaoNeural | advertisement-upbeat | Not supported
zh-CN-YunjianNeural | angry, cheerful, depressed, disgruntled, documentary-narration, narration-relaxed, sad, serious, sports-commentary, sports-commentary-excited | Not supported
zh-CN-YunxiaNeural | angry, calm, cheerful, fearful, sad | Not supported
zh-CN-YunxiNeural | angry, assistant, chat, cheerful, depressed, disgruntled, embarrassed, fearful, narration-relaxed, newscast, sad, serious | Boy, Narrator, YoungAdultMale
zh-CN-YunyangNeural | customerservice, narration-professional, newscast-casual | Not supported
zh-CN-YunyeNeural | angry, calm, cheerful, disgruntled, embarrassed, fearful, sad, serious | Boy, Girl, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, YoungAdultFemale, YoungAdultMale
zh-CN-YunzeNeural | angry, calm, cheerful, depressed, disgruntled, documentary-narration, fearful, sad, serious | OlderAdultMale, SeniorMale
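
The table above can also be used as a lookup in code, for example to check up front whether a voice supports a style and to fall back to the neutral voice otherwise. The sketch below is only an illustration: VoiceStyleSupport is a hypothetical class that is not in the demo repo, it contains just a few rows copied from the table, and it uses the voice names in the table's format. The "normal-neutral" value is the placeholder the demo uses for "no style".

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the demo repo.
public static class VoiceStyleSupport
{
    // A few rows copied from the table above; extend as needed.
    private static readonly Dictionary<string, HashSet<string>> StylesByVoice =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase)
        {
            ["en-GB-SoniaNeural"] = Styles("cheerful", "sad"),
            ["fr-FR-DeniseNeural"] = Styles("cheerful", "sad"),
            ["pt-BR-FranciscaNeural"] = Styles("calm"),
            ["zh-CN-XiaoruiNeural"] = Styles("angry", "calm", "fearful", "sad")
        };

    // Builds a case-insensitive set of style names.
    private static HashSet<string> Styles(params string[] names) =>
        new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);

    // True when the voice is known and lists the given style.
    public static bool Supports(string voice, string style) =>
        StylesByVoice.TryGetValue(voice, out var styles) && styles.Contains(style);

    // Returns the requested style when supported, otherwise the neutral placeholder.
    public static string ResolveStyle(string voice, string? requestedStyle) =>
        requestedStyle != null && Supports(voice, requestedStyle)
            ? requestedStyle
            : "normal-neutral";
}
```

For example, VoiceStyleSupport.ResolveStyle("pt-BR-FranciscaNeural", "angry") falls back to "normal-neutral", since the table lists only "calm" for that voice.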

Screenshot from the DEMO showing its user interface. You enter the text to translate at the top, and the language of the text is detected using the language detection functionality in Azure Cognitive Services. You can then select which language to translate the text into. The app makes a REST call to Azure Cognitive Services to translate the text. It is also possible to hear the translated text as speech, and it is now also possible to add a voice style. Use the table shown above to select a voice actor that supports the voice style you want to test. As noted, voice styles are still limited to a few languages and voice actors. If the chosen voice actor does not support the requested emotion or voice style, you will hear the voice in its normal style.

Let's look at some code for this DEMO too. You can study the Github repo and clone it to test it out yourself. The TextToSpeechUtil class handles much of the logic of creating voice from text input. It also creates the SSML-XML contents and performs the REST API call that creates the voice file. SSML is the Speech Synthesis Markup Language, a W3C standard that is also used by others, including Google. Microsoft's documentation of it is here:

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup



using Microsoft.Extensions.Configuration;
using MultiLingual.Translator.Lib.Models;
using System;
using System.Security;
using System.Text;
using System.Xml.Linq;

namespace MultiLingual.Translator.Lib
{
    public class TextToSpeechUtil : ITextToSpeechUtil
    {

        public TextToSpeechUtil(IConfiguration configuration)
        {
            _configuration = configuration;
        }

        public async Task<TextToSpeechResult> GetSpeechFromText(string text, string language, TextToSpeechLanguage[] actorVoices, 
            string? preferredVoiceActorId, string? preferredVoiceStyle)
        {
            var result = new TextToSpeechResult();

            result.Transcript = GetSpeechTextXml(text, language, actorVoices, preferredVoiceActorId, preferredVoiceStyle, result);
            result.ContentType = _configuration[TextToSpeechSpeechContentType];
            result.OutputFormat = _configuration[TextToSpeechSpeechXMicrosoftOutputFormat];
            result.UserAgent = _configuration[TextToSpeechSpeechUserAgent];
            result.AvailableVoiceActorIds = ResolveAvailableActorVoiceIds(language, actorVoices);
            result.LanguageCode = language;

            string? token = await GetUpdatedToken();

            HttpClient httpClient = GetTextToSpeechWebClient(token);

            string ttsEndpointUrl = _configuration[TextToSpeechSpeechEndpoint];
            var response = await httpClient.PostAsync(ttsEndpointUrl, new StringContent(result.Transcript, Encoding.UTF8, result.ContentType));

            using (var memStream = new MemoryStream()) {
                var responseStream = await response.Content.ReadAsStreamAsync();
                responseStream.CopyTo(memStream);
                result.VoiceData = memStream.ToArray();
            }

            return result;
        }

        private async Task<string?> GetUpdatedToken()
        {
            string? token = _token?.ToNormalString();
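            // Use TotalMinutes: the Minutes component wraps around every hour and would miss stale tokens.
            if (_lastTimeTokenFetched == null || DateTime.Now.Subtract(_lastTimeTokenFetched.Value).TotalMinutes > 8)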
            {
                token = await GetIssuedToken();
            }

            return token;
        }

        private HttpClient GetTextToSpeechWebClient(string? token)
        {
            var httpClient = new HttpClient();
            httpClient.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", token);
            httpClient.DefaultRequestHeaders.Add("X-Microsoft-OutputFormat", _configuration[TextToSpeechSpeechXMicrosoftOutputFormat]);
            httpClient.DefaultRequestHeaders.Add("User-Agent", _configuration[TextToSpeechSpeechUserAgent]);
            return httpClient;
        }
       
        public string GetSpeechTextXml(string text, string language, TextToSpeechLanguage[] actorVoices, string? preferredVoiceActorId,
              string? preferredVoiceStyle, TextToSpeechResult result)
        {
            result.VoiceActorId = ResolveVoiceActorId(language, preferredVoiceActorId, actorVoices);
            // Escape the text so characters such as '&' and '<' cannot break the XML document.
            string speechXml = $@"
            <speak version='1.0' xml:lang='en-US' xmlns:mstts='https://www.w3.org/2001/mstts'>
                <voice xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice {result.VoiceActorId}'>
                    <prosody rate='1'>{SecurityElement.Escape(text)}</prosody>
                </voice>
            </speak>";

            speechXml = AddVoiceStyleEffectIfDesired(preferredVoiceStyle, speechXml);

            return speechXml;
        }

        /// <summary>
        /// Adds voice style / expression to the SSML markup for the voice
        /// </summary>
        private static string AddVoiceStyleEffectIfDesired(string? preferredVoiceStyle, string speechXml)
        {
            if (!string.IsNullOrWhiteSpace(preferredVoiceStyle) && preferredVoiceStyle != "normal-neutral")
            {
                var voiceDoc = XDocument.Parse(speechXml); //https://learn.microsoft.com/nb-no/azure/ai-services/speech-service/speech-synthesis-markup-voice#use-speaking-styles-and-roles

                XElement? prosody = voiceDoc.Descendants("prosody").FirstOrDefault();
                if (prosody?.Value != null)
                {
                    // Create the <mstts:express-as> element, for now skip the ':' letter and replace at the end

                    var expressedAsWrappedElement = new XElement("msttsexpress-as",
                        new XAttribute("style", preferredVoiceStyle));
                    expressedAsWrappedElement.Value = prosody!.Value;
                    prosody?.ReplaceWith(expressedAsWrappedElement);
                    speechXml = voiceDoc.ToString().Replace(@"msttsexpress-as", "mstts:express-as");
                }
            }

            return speechXml;
        }

        private List<string> ResolveAvailableActorVoiceIds(string language, TextToSpeechLanguage[] actorVoices)
        {
            if (actorVoices?.Any() == true)
            {
                var voiceActorIds = actorVoices.Where(v => v.LanguageKey == language || v.LanguageKey.Split("-")[0] == language).SelectMany(v => v.VoiceActors).Select(v => v.VoiceId).ToList();
                return voiceActorIds;
            }
            return new List<string>();
        }

        private string ResolveVoiceActorId(string language, string? preferredVoiceActorId, TextToSpeechLanguage[] actorVoices)
        {
            string actorVoiceId = "(en-AU, NatashaNeural)"; //default to a select voice actor id 
            if (actorVoices?.Any() == true)
            {
                var voiceActorsForLanguage = actorVoices.Where(v => v.LanguageKey == language || v.LanguageKey.Split("-")[0] == language).SelectMany(v => v.VoiceActors).Select(v => v.VoiceId).ToList();
                if (voiceActorsForLanguage != null)
                {
                    if (voiceActorsForLanguage.Any() == true)
                    {
                        var resolvedPreferredVoiceActorId = voiceActorsForLanguage.FirstOrDefault(v => v == preferredVoiceActorId);
                        if (!string.IsNullOrWhiteSpace(resolvedPreferredVoiceActorId))
                        {
                            return resolvedPreferredVoiceActorId!;
                        }
                        actorVoiceId = voiceActorsForLanguage.First();
                    }
                }
            }
            return actorVoiceId;
        }

        private async Task<string> GetIssuedToken()
        {
            var httpClient = new HttpClient();
            string? textToSpeechSubscriptionKey = Environment.GetEnvironmentVariable("AZURE_TEXT_SPEECH_SUBSCRIPTION_KEY", EnvironmentVariableTarget.Machine);
            httpClient.DefaultRequestHeaders.Add(OcpApiSubscriptionKeyHeaderName, textToSpeechSubscriptionKey);
            string tokenEndpointUrl = _configuration[TextToSpeechIssueTokenEndpoint];
            var response = await httpClient.PostAsync(tokenEndpointUrl, new StringContent("{}"));
            _token = (await response.Content.ReadAsStringAsync()).ToSecureString();
            _lastTimeTokenFetched = DateTime.Now;
            return _token.ToNormalString();
        }

        public async Task<List<string>> GetVoiceStyles()
        {
            var voiceStyles = new List<string>
            {
                "normal-neutral",
                "advertisement_upbeat",
                "affectionate",
                "angry",
                "assistant",
                "calm",
                "chat",
                "cheerful",
                "customerservice",
                "depressed",
                "disgruntled",
                "documentary-narration",
                "embarrassed",
                "empathetic",
                "envious",
                "excited",
                "fearful",
                "friendly",
                "gentle",
                "hopeful",
                "lyrical",
                "narration-professional",
                "narration-relaxed",
                "newscast",
                "newscast-casual",
                "newscast-formal",
                "poetry-reading",
                "sad",
                "serious",
                "shouting",
                "sports_commentary",
                "sports_commentary_excited",
                "whispering",
                "terrified",
                "unfriendly"
            };
            return await Task.FromResult(voiceStyles);
        }

        private const string OcpApiSubscriptionKeyHeaderName = "Ocp-Apim-Subscription-Key";
        private const string TextToSpeechIssueTokenEndpoint = "TextToSpeechIssueTokenEndpoint";
        private const string TextToSpeechSpeechEndpoint = "TextToSpeechSpeechEndpoint";        
        private const string TextToSpeechSpeechContentType = "TextToSpeechSpeechContentType";
        private const string TextToSpeechSpeechUserAgent = "TextToSpeechSpeechUserAgent";
        private const string TextToSpeechSpeechXMicrosoftOutputFormat = "TextToSpeechSpeechXMicrosoftOutputFormat";

        private readonly IConfiguration _configuration;

        private DateTime? _lastTimeTokenFetched = null;
        private SecureString _token = null;

    }
}

 
 

The REST call to generate the voice file uses the following setup.

TTS endpoint URL: https://norwayeast.tts.speech.microsoft.com/cognitiveservices/v1

The transcript (the text to turn into speech) is in my test the following SSML-XML document:


<speak version="1.0" xml:lang="en-US" xmlns:mstts="https://www.w3.org/2001/mstts">
  <voice xml:gender="Male" name="Microsoft Server Speech Text to Speech Voice (en-US, JaneNeural)">
    <mstts:express-as style="angry">I listen to Eurovision and cheer for Norway</mstts:express-as>
  </voice>
</speak>


The SSML also uses the mstts extension, which adds features to SSML such as the express-as element, here set to the voice style or emotion "angry". Not all emotions or voice styles are supported by every voice actor in Azure Cognitive Services. Below is a list of the voice styles that could be supported; which ones are actually available depends on the voice actor you choose (and therefore the language).
  • "normal-neutral"
  • "advertisement_upbeat"
  • "affectionate"
  • "angry"
  • "assistant"
  • "calm"
  • "chat"
  • "cheerful"
  • "customerservice"
  • "depressed"
  • "disgruntled"
  • "documentary-narration"
  • "embarrassed"
  • "empathetic"
  • "envious"
  • "excited"
  • "fearful"
  • "friendly"
  • "gentle"
  • "hopeful"
  • "lyrical"
  • "narration-professional"
  • "narration-relaxed"
  • "newscast"
  • "newscast-casual"
  • "newscast-formal"
  • "poetry-reading"
  • "sad"
  • "serious"
  • "shouting"
  • "sports_commentary"
  • "sports_commentary_excited"
  • "whispering"
  • "terrified"
  • "unfriendly"
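
AddVoiceStyleEffectIfDesired above builds the prefixed element by inserting a placeholder name and string-replacing it afterwards. A possible alternative, sketched below, is to let LINQ to XML handle the mstts prefix through XNamespace. SsmlBuilder is a hypothetical name, not a class in the repo. The namespace URI and the voice name format are taken from the SSML sample above, and the xml:gender attribute is left out for brevity. Because the text is added as element content, characters such as & and < are escaped when the document is serialized.

```csharp
#nullable enable
using System.Xml.Linq;

// Hypothetical helper, not part of the demo repo.
public static class SsmlBuilder
{
    // Same namespace URI as the mstts prefix in the SSML sample above.
    private static readonly XNamespace Mstts = "https://www.w3.org/2001/mstts";

    public static string Build(string voiceName, string text, string? style)
    {
        // Without a style the text goes directly inside <voice>;
        // with a style it is wrapped in <mstts:express-as style="...">.
        object content = text;
        if (!string.IsNullOrWhiteSpace(style))
        {
            content = new XElement(Mstts + "express-as",
                new XAttribute("style", style),
                text);
        }

        var speak = new XElement("speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            // Declares the prefix so the element is written as mstts:express-as.
            new XAttribute(XNamespace.Xmlns + "mstts", Mstts.NamespaceName),
            new XElement("voice",
                new XAttribute("name", voiceName),
                content));

        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling SsmlBuilder.Build with the voice name from the sample, the sentence to speak and the style "angry" gives a single-line document with the same structure as the SSML shown earlier.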
Microsoft has come a long way since the early work on SAPI (Microsoft Speech API) and Microsoft SAM around 2000. The synthetic voices of more than 20 years ago were rather crude and robotic. Nowadays, the voice actors provided by the Azure cloud platform, as shown here, are neural net trained and very realistic, based upon training from real voice actors, and more and more of these voices support emotions or voice styles. The uses of this can be diverse. Speech synthesis can serve in automated answering services and apps in diverse fields such as healthcare, public services, education and more. Making this demo has been fun for me. It can be used to learn languages, and with the voice functionality you can practice not only the translation but also the pronunciation.

Monday, 22 April 2024

Pii - Detecting Personally Identifiable Information using Azure Cognitive Services

This article will look at detecting Personally Identifiable Information (Pii) using Azure Cognitive Services. I have created a demo using .NET Maui Blazor, and the Github repo is here:
https://github.com/toreaurstadboss/PiiDetectionDemo

It is often desirable to detect and also redact Pii, that is, to censor or obscure it, when preparing documents for publication. The Pii feature in Azure Cognitive Services is a part of the Language resource service. A quickstart for using Pii is available here:
https://learn.microsoft.com/en-us/azure/ai-services/language-service/personally-identifiable-information/quickstart?pivots=programming-language-csharp

After creating the Language resource, look up the keys and endpoint for your service. Using Azure CLI inside Cloud Shell, you can enter this command to find the keys. In Azure, many services have two keys, which you can replace with new keys through regeneration:

az cognitiveservices account keys list --resource-group SomeAzureResourceGroup --name SomeAccountAzureCognitiveServices

This is how you can query the endpoint of the Language resource using Azure CLI:

az cognitiveservices account show --query "properties.endpoint" --resource-group SomeAzureResourceGroup --name SomeAccountAzureCognitiveServices

Next, the demo of this article. Connecting to the Pii removal feature of Text Analytics is possible using this Nuget package (REST calls can also be done manually):
  • Azure.AI.TextAnalytics version 5.3.0

Here are the Nuget packages of my demo, as included in the .csproj file:

PiiDetectionDemo.csproj


  <ItemGroup>
        <PackageReference Include="Azure.AI.TextAnalytics" Version="5.3.0" />
        <PackageReference Include="Microsoft.Maui.Controls" Version="$(MauiVersion)" />
        <PackageReference Include="Microsoft.Maui.Controls.Compatibility" Version="$(MauiVersion)" />
        <PackageReference Include="Microsoft.AspNetCore.Components.WebView.Maui" Version="$(MauiVersion)" />
        <PackageReference Include="Microsoft.Extensions.Logging.Debug" Version="8.0.0" />
    </ItemGroup>


A service using this Pii removal feature is simply making use of a TextAnalyticsClient and method RecognizePiiEntitiesAsync.

PiiRemovalTextClientService.cs IPiiRemovalTextClientService.cs



using Azure;
using Azure.AI.TextAnalytics;

namespace PiiDetectionDemo.Util
{
    public interface IPiiRemovalTextAnalyticsClientService
    {
        Task<Response<PiiEntityCollection>> RecognizePiiEntitiesAsync(string? document, string? language);
    }
}


namespace PiiDetectionDemo.Util
{
    public class PiiRemovalTextAnalyticsClientService : IPiiRemovalTextAnalyticsClientService
    {

        private TextAnalyticsClient _client;

        public PiiRemovalTextAnalyticsClientService()
        {
            var azureEndpoint = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_ENDPOINT");
            var azureKey = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_KEY");

            if (string.IsNullOrWhiteSpace(azureEndpoint))
            {
                throw new ArgumentNullException(nameof(azureEndpoint), "Missing system environment variable: AZURE_COGNITIVE_SERVICE_ENDPOINT");
            }
            if (string.IsNullOrWhiteSpace(azureKey))
            {
                throw new ArgumentNullException(nameof(azureKey), "Missing system environment variable: AZURE_COGNITIVE_SERVICE_KEY");
            }

            _client = new TextAnalyticsClient(new Uri(azureEndpoint), new AzureKeyCredential(azureKey));
        }

        public async Task<Response<PiiEntityCollection>> RecognizePiiEntitiesAsync(string? document, string? language)
        {
            var piiEntities = await _client.RecognizePiiEntitiesAsync(document, language);
            return piiEntities;
        }

    }
}
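
The PiiEntityCollection returned by the service carries a RedactedText string in which the detected entities are masked. To illustrate the idea, here is a rough local sketch of such masking. It assumes each detected entity is described by an offset and a length into the input (the SDK's PiiEntity exposes Offset and Length) and that masked characters are replaced with '*'. DetectedSpan and PiiMasker are hypothetical names for illustration only; in the demo the masking is done by the service, not by client code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the offset and length of a detected entity.
public readonly record struct DetectedSpan(int Offset, int Length);

// Hypothetical helper, not part of the demo repo.
public static class PiiMasker
{
    // Replaces every character covered by a span with the mask character.
    public static string Mask(string input, IEnumerable<DetectedSpan> spans, char maskChar = '*')
    {
        var sb = new StringBuilder(input);
        foreach (var span in spans)
        {
            // Clamp so a span that runs past the end of the text cannot throw.
            int start = Math.Clamp(span.Offset, 0, input.Length);
            int end = Math.Clamp(span.Offset + span.Length, start, input.Length);
            for (int i = start; i < end; i++)
            {
                sb[i] = maskChar;
            }
        }
        return sb.ToString();
    }
}
```

For instance, masking the spans (5, 3) and (12, 8) in "Call Ola at 555-1234" gives "Call *** at ********".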


The code-behind of the Razor component page showing the UI looks like this:

Home.razor.cs


using Azure;
using Microsoft.AspNetCore.Components;
using PiiDetectionDemo.Models;
using PiiDetectionDemo.Util;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace PiiDetectionDemo.Components.Pages
{
    public partial class Home
    {

        private IndexModel Model = new();
        private bool isProcessing = false;
        private bool isSearchPerformed = false;

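        private async Task Submit()
        {
            isSearchPerformed = false;
            isProcessing = true;
            // Measure the round trip so the execution time shown in the UI is populated.
            var stopwatch = System.Diagnostics.Stopwatch.StartNew();
            try
            {
                var response = await _piiRemovalTextAnalyticsClientService.RecognizePiiEntitiesAsync(Model.InputText, null);
                Model.RedactedText = response?.Value?.RedactedText;
                // Set AnalysisResult first: UpdateHtmlRedactedText reads the entities from it.
                Model.AnalysisResult = response?.Value;
                Model.UpdateHtmlRedactedText();
                StateHasChanged();
            }
            catch (Exception ex)
            {
                await Console.Out.WriteLineAsync(ex.ToString());
            }
            finally
            {
                // Reset the UI flags even when the call fails.
                stopwatch.Stop();
                Model.ExecutionTime = stopwatch.ElapsedMilliseconds;
                isProcessing = false;
                isSearchPerformed = true;
            }
        }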

        private void removeWhitespace(ChangeEventArgs args)
        {
            Model.InputText = args.Value?.ToString()?.CleanupAllWhiteSpace();
            StateHasChanged();
        }



    }
}



To get the text with all detected PII redacted, read the Value of the response, which is of type Azure.AI.TextAnalytics.PiiEntityCollection. Its RedactedText string property contains the censored text. The IndexModel looks like this:


using Azure.AI.TextAnalytics;
using Microsoft.AspNetCore.Components;
using PiiDetectionDemo.Util;
using System.ComponentModel.DataAnnotations;
using System.Text;

namespace PiiDetectionDemo.Models
{

    public class IndexModel
    {

        [Required]
        public string? InputText { get; set; }

        public string? RedactedText { get; set; }

        public string? HtmlRedactedText { get; set; }

        public MarkupString HtmlRedactedTextMarkupString { get; set; }

        public void UpdateHtmlRedactedText()
        {
            var sb = new StringBuilder(RedactedText);
            if (AnalysisResult != null && RedactedText != null)
            {
                foreach (var piiEntity in AnalysisResult.OrderByDescending(a => a.Offset))
                {
                    sb.Insert(piiEntity.Offset + piiEntity.Length, "</b></span>");
                    // HTML-encode the entity text so quotes or markup in it cannot break the title attribute.
                    sb.Insert(piiEntity.Offset, $"<span style='background-color:lightgray;border:1px solid black;border-radius:2px; color:{GetBackgroundColor(piiEntity)}' title='{piiEntity.Category}: {piiEntity.SubCategory} Confidence: {piiEntity.ConfidenceScore} Redacted Text: {System.Net.WebUtility.HtmlEncode(piiEntity.Text)}'><b>");
                }
            }
            HtmlRedactedText = sb.ToString()?.CleanupAllWhiteSpace();    
            HtmlRedactedTextMarkupString = new MarkupString(HtmlRedactedText ?? string.Empty);
        }

        private string GetBackgroundColor(PiiEntity piiEntity)
        {
            if (piiEntity.Category == PiiEntityCategory.PhoneNumber)
            {
                return "yellow";
            }
            if (piiEntity.Category == PiiEntityCategory.Organization)
            {
                return "orange";
            }
            if (piiEntity.Category == PiiEntityCategory.Address)
            {
                return "green";
            }
            return "gray";                   
        }

        public long ExecutionTime { get; set; }
        public PiiEntityCollection? AnalysisResult { get; set; }

    }
}
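
The loop in UpdateHtmlRedactedText walks the entities from the highest offset to the lowest. The sketch below isolates that idea without any Azure dependency, so the effect of the ordering is easy to see. The names PiiSpan and Highlighter are hypothetical stand-ins made up for this illustration (PiiSpan mimics only the Offset, Length and Category of a PiiEntity), and the bracket markers replace the HTML span tags purely for readability.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for the three PiiEntity values the technique needs.
public record PiiSpan(int Offset, int Length, string Category);

public static class Highlighter
{
    // Wraps each span in [Category:...] markers.
    // Spans are processed from the highest offset to the lowest, so the text
    // inserted for one span never shifts the offsets of spans still to come.
    public static string Highlight(string text, IEnumerable<PiiSpan> spans)
    {
        var sb = new StringBuilder(text);
        foreach (var span in spans.OrderByDescending(s => s.Offset))
        {
            // Close the marker first, then open it; both indexes refer to
            // positions that earlier (higher-offset) inserts have not moved.
            sb.Insert(span.Offset + span.Length, "]");
            sb.Insert(span.Offset, $"[{span.Category}:");
        }
        return sb.ToString();
    }
}
```

Calling Highlighter.Highlight("*** at *******", new[] { new PiiSpan(0, 3, "Person"), new PiiSpan(7, 7, "Organization") }) returns "[Person:***] at [Organization:*******]". Had the spans been processed in ascending order, the marker inserted for the first span would have pushed the second span's text to the right and its offset would have pointed at the wrong characters.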




Frontend UI looks like this: Home.razor


@page "/"
@using PiiDetectionDemo.Util

@inject IPiiRemovalTextAnalyticsClientService _piiRemovalTextAnalyticsClientService;

<h3>Azure HealthCare Text Analysis - Pii detection feature - Azure Cognitive Services</h3>

<em>Pii = Personally identifiable information</em>

<EditForm Model="@Model" OnValidSubmit="@Submit">
    <DataAnnotationsValidator />
    <ValidationSummary />

    <div class="form-group row">
        <label><strong>Text input</strong></label>
        <InputTextArea @oninput="removeWhitespace" class="overflow-scroll" style="max-height:500px;max-width:900px;font-size: 10pt;font-family:Verdana, Geneva, Tahoma, sans-serif" @bind-Value="@Model.InputText" rows="5" />
    </div>

    <div class="form-group row">
        <div class="col">
            <br />
            <button class="btn btn-outline-primary" type="submit">Run</button>
        </div>
        <div class="col">
        </div>
        <div class="col">
        </div>
    </div>

    <br />

    @if (isProcessing)
    {

        <div class="progress" style="max-width: 90%">
            <div class="progress-bar progress-bar-striped progress-bar-animated"
                 style="width: 100%; background-color: green">
                Retrieving result from Azure Text Analysis Pii detection feature. Processing..
            </div>
        </div>
        <br />

    }

    <div class="form-group row">
        <label><strong>Analysis result</strong></label>

        @if (isSearchPerformed)
        {
            <br />
            <b>Execution time: @Model.ExecutionTime ms (milliseconds)</b>

            <br />
            <br />

            <b>Redacted text (Pii removed)</b>
            <br />

            <div class="form-group row">
               <label><strong>Categorized Pii redacted text</strong></label>
               <div>
               @Model.HtmlRedactedTextMarkupString
               </div>
            </div>

            <br />
            <br />

            <table class="table table-striped table-dark table-hover">
                <thead>
                    <tr>
                        <th>Pii text</th>
                        <th>Category</th>
                        <th>SubCategory</th>
                        <th>Offset</th>
                        <th>Length</th>
                        <th>ConfidenceScore</th>
                    </tr>
                </thead>
                <tbody>
                    @if (Model.AnalysisResult != null) {
                        @foreach (var entity in Model.AnalysisResult)
                        {
                            <tr>
                                <td>@entity.Text</td>
                                <td>@entity.Category.ToString()</td>
                                <td>@entity.SubCategory</td>
                                <td>@entity.Offset</td>
                                <td>@entity.Length</td>
                                <td>@entity.ConfidenceScore</td>                                        
                            </tr>
                        }
                    }
                </tbody>
            </table>

        }
    </div>

</EditForm>



The demo uses Bootstrap 5 to style an HTML table that lists the properties of each Azure.AI.TextAnalytics.PiiEntity.