This article will look at detecting Personally Identifiable Information (PII) using Azure Cognitive Services.
I have created a demo using .NET MAUI Blazor, and the GitHub repo is here:
https://github.com/toreaurstadboss/PiiDetectionDemo
After creating the Language resource, look up the keys and endpoint for your service.
Using the Azure CLI inside Cloud Shell, you can enter this command to find the keys. In Azure, many services have two keys that you can exchange for new keys through regeneration:
az cognitiveservices account keys list --resource-group SomeAzureResourceGroup --name SomeAccountAzureCognitiveServices
This is how you can query the endpoint of the Language resource using the Azure CLI:
az cognitiveservices account show --query "properties.endpoint" --resource-group SomeAzureResourceGroup --name SomeAccountAzureCognitiveServices
Next, the demo of this article. Connecting to the PII removal Text Analytics service is possible using this NuGet package (REST calls can also be made manually):
- Azure.AI.TextAnalytics version 5.3.0
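If you prefer the command line, the package can be added with the dotnet CLI from the project folder:

dotnet add package Azure.AI.TextAnalytics --version 5.3.0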
The other NuGet packages of my demo are listed in the repo's .csproj file. The client service for recognizing PII entities is defined via this interface:
using Azure;
using Azure.AI.TextAnalytics;

namespace PiiDetectionDemo.Util
{
    public interface IPiiRemovalTextAnalyticsClientService
    {
        Task<Response<PiiEntityCollection>> RecognizePiiEntitiesAsync(string? document, string? language);
    }
}
namespace PiiDetectionDemo.Util
{
    public class PiiRemovalTextAnalyticsClientService : IPiiRemovalTextAnalyticsClientService
    {
        private readonly TextAnalyticsClient _client;

        public PiiRemovalTextAnalyticsClientService()
        {
            var azureEndpoint = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_ENDPOINT");
            var azureKey = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_KEY");

            if (string.IsNullOrWhiteSpace(azureEndpoint))
            {
                throw new ArgumentNullException(nameof(azureEndpoint), "Missing system environment variable: AZURE_COGNITIVE_SERVICE_ENDPOINT");
            }
            if (string.IsNullOrWhiteSpace(azureKey))
            {
                throw new ArgumentNullException(nameof(azureKey), "Missing system environment variable: AZURE_COGNITIVE_SERVICE_KEY");
            }

            _client = new TextAnalyticsClient(new Uri(azureEndpoint), new AzureKeyCredential(azureKey));
        }

        public async Task<Response<PiiEntityCollection>> RecognizePiiEntitiesAsync(string? document, string? language)
        {
            var piiEntities = await _client.RecognizePiiEntitiesAsync(document, language);
            return piiEntities;
        }
    }
}
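The service registration is not shown in this article; a minimal sketch of how it could be registered for injection in MauiProgram.cs (the scoped lifetime here is an assumption):

builder.Services.AddScoped<IPiiRemovalTextAnalyticsClientService, PiiRemovalTextAnalyticsClientService>();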
The code-behind of the razor component page showing the UI looks like this:
Home.razor.cs
using Azure;
using Microsoft.AspNetCore.Components;
using PiiDetectionDemo.Models;
using PiiDetectionDemo.Util;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace PiiDetectionDemo.Components.Pages
{
    public partial class Home
    {
        private IndexModel Model = new();
        private bool isProcessing = false;
        private bool isSearchPerformed = false;

        private async Task Submit()
        {
            isSearchPerformed = false;
            isProcessing = true;
            try
            {
                var response = await _piiRemovalTextAnalyticsClientService.RecognizePiiEntitiesAsync(Model.InputText, null);
                Model.RedactedText = response?.Value?.RedactedText;
                // Set AnalysisResult before building the highlighted html,
                // since UpdateHtmlRedactedText reads the detected entities from it
                Model.AnalysisResult = response?.Value;
                Model.UpdateHtmlRedactedText();
                StateHasChanged();
            }
            catch (Exception ex)
            {
                await Console.Out.WriteLineAsync(ex.ToString());
            }
            isProcessing = false;
            isSearchPerformed = true;
        }

        private void removeWhitespace(ChangeEventArgs args)
        {
            Model.InputText = args.Value?.ToString()?.CleanupAllWhiteSpace();
            StateHasChanged();
        }
    }
}
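The CleanupAllWhiteSpace extension method used above lives in the Util namespace of the repo and is not listed in this article. A minimal sketch of such a helper, assuming it simply collapses runs of whitespace into single spaces (the class name StringExtensions is also an assumption):

using System.Text.RegularExpressions;

namespace PiiDetectionDemo.Util
{
    public static class StringExtensions
    {
        // Collapses any run of whitespace (spaces, tabs, newlines) into a single space
        public static string? CleanupAllWhiteSpace(this string? input) =>
            input == null ? null : Regex.Replace(input.Trim(), @"\s+", " ");
    }
}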
To get the redacted or censored text, devoid of any PII that the detection feature was able to find, access the
Value property of type Azure.AI.TextAnalytics.PiiEntityCollection. Inside this object, the string property RedactedText contains the censored / redacted text.
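As a quick illustration of the shape of the response, here is a console-style sketch (not part of the demo app; the sample input and the redaction output in the comment are made up):

var service = new PiiRemovalTextAnalyticsClientService();
Response<PiiEntityCollection> response = await service.RecognizePiiEntitiesAsync("Call John Doe at 555-0123", "en");
Console.WriteLine(response.Value.RedactedText); // e.g. "Call ******** at ********"
foreach (PiiEntity entity in response.Value)
{
    Console.WriteLine($"{entity.Category}: '{entity.Text}' (confidence {entity.ConfidenceScore})");
}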
The IndexModel looks like this:

using Azure.AI.TextAnalytics;
using Microsoft.AspNetCore.Components;
using PiiDetectionDemo.Util;
using System.ComponentModel.DataAnnotations;
using System.Text;

namespace PiiDetectionDemo.Models
{
    public class IndexModel
    {
        [Required]
        public string? InputText { get; set; }

        public string? RedactedText { get; set; }

        public string? HtmlRedactedText { get; set; }

        public MarkupString HtmlRedactedTextMarkupString { get; set; }

        public void UpdateHtmlRedactedText()
        {
            var sb = new StringBuilder(RedactedText);
            if (AnalysisResult != null && RedactedText != null)
            {
                // Insert the highlight markup from the end of the string backwards,
                // so earlier entity offsets are not shifted by the inserted tags
                foreach (var piiEntity in AnalysisResult.OrderByDescending(a => a.Offset))
                {
                    sb.Insert(piiEntity.Offset + piiEntity.Length, "</b></span>");
                    sb.Insert(piiEntity.Offset, $"<span style='background-color:lightgray;border:1px solid black;border-radius:2px; color:{GetBackgroundColor(piiEntity)}' title='{piiEntity.Category}: {piiEntity.SubCategory} Confidence: {piiEntity.ConfidenceScore} Redacted Text: {piiEntity.Text}'><b>");
                }
            }
            HtmlRedactedText = sb.ToString()?.CleanupAllWhiteSpace();
            HtmlRedactedTextMarkupString = new MarkupString(HtmlRedactedText ?? string.Empty);
        }

        private string GetBackgroundColor(PiiEntity piiEntity)
        {
            if (piiEntity.Category == PiiEntityCategory.PhoneNumber)
            {
                return "yellow";
            }
            if (piiEntity.Category == PiiEntityCategory.Organization)
            {
                return "orange";
            }
            if (piiEntity.Category == PiiEntityCategory.Address)
            {
                return "green";
            }
            return "gray";
        }

        public long ExecutionTime { get; set; }

        public PiiEntityCollection? AnalysisResult { get; set; }
    }
}
The speech synthesis service of Azure AI is accessed via a REST API. You can test it out first in Postman, retrieving an access token via a dedicated endpoint and then
calling the text-to-speech endpoint using the access token as a bearer token.
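In raw HTTP terms the two calls look roughly like this (a sketch assuming the standard Speech service endpoints; replace <region> and the key with your own values):

POST https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken
Ocp-Apim-Subscription-Key: <your-speech-key>
(empty body; the response body is the access token)

POST https://<region>.tts.speech.microsoft.com/cognitiveservices/v1
Authorization: Bearer <access token>
X-Microsoft-OutputFormat: audio-24khz-48kbitrate-mono-mp3
Content-Type: application/ssml+xml
(body: the SSML document to synthesize)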
To get the demo working, you have to create the necessary resources / services inside the Azure Portal. This article focuses on the Speech service.
Important: if you want to test the demo yourself, remember to put the keys into environment variables so they are not exposed via source control.
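The demo reads the subscription key from a machine-level environment variable named AZURE_TEXT_SPEECH_SUBSCRIPTION_KEY (see GetIssuedToken further below). On Windows it could for example be set from an elevated command prompt like this:

setx AZURE_TEXT_SPEECH_SUBSCRIPTION_KEY "<your-speech-key>" /M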
To get started with speech synthesis in Azure Cognitive Services, add a Speech Service resource via the Azure Portal.
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/overview
We also need to add audio playback capability to our demo, which is a .NET MAUI Blazor app. The NuGet package used is Plugin.Maui.Audio, referenced in MultiLingual.Translator.csproj.
This NuGet package's website is here:
https://github.com/jfversluis/Plugin.Maui.Audio
MauiProgram.cs looks like the following; make note of AudioManager.Current, which is registered as a singleton.
MauiProgram.cs
using Microsoft.Extensions.Configuration;
using MultiLingual.Translator.Lib;
using Plugin.Maui.Audio;

namespace MultiLingual.Translator;

public static class MauiProgram
{
    public static MauiApp CreateMauiApp()
    {
        var builder = MauiApp.CreateBuilder();
        builder
            .UseMauiApp<App>()
            .ConfigureFonts(fonts =>
            {
                fonts.AddFont("OpenSans-Regular.ttf", "OpenSansRegular");
            });

        builder.Services.AddMauiBlazorWebView();
#if DEBUG
        builder.Services.AddBlazorWebViewDeveloperTools();
#endif

        builder.Services.AddSingleton(AudioManager.Current);
        builder.Services.AddTransient<MainPage>();
        builder.Services.AddScoped<IDetectLanguageUtil, DetectLanguageUtil>();
        builder.Services.AddScoped<ITranslateUtil, TranslateUtil>();
        builder.Services.AddScoped<ITextToSpeechUtil, TextToSpeechUtil>();

        var config = new ConfigurationBuilder().AddJsonFile("appsettings.json").Build();
        builder.Configuration.AddConfiguration(config);

        return builder.Build();
    }
}
Next up, let's look at TextToSpeechUtil. This class is a service that does two things against the REST API of the Azure text-to-speech service:
- Fetch an access token
- Synthesize text to speech
TextToSpeechUtil.cs
using Microsoft.Extensions.Configuration;
using MultiLingual.Translator.Lib.Models;
using System.Security;
using System.Text;

namespace MultiLingual.Translator.Lib
{
    public class TextToSpeechUtil : ITextToSpeechUtil
    {
        public TextToSpeechUtil(IConfiguration configuration)
        {
            _configuration = configuration;
        }

        public async Task<TextToSpeechResult> GetSpeechFromText(string text, string language, TextToSpeechLanguage[] actorVoices, string? preferredVoiceActorId)
        {
            var result = new TextToSpeechResult();
            result.Transcript = GetSpeechTextXml(text, language, actorVoices, preferredVoiceActorId, result);
            result.ContentType = _configuration[TextToSpeechSpeechContentType];
            result.OutputFormat = _configuration[TextToSpeechSpeechXMicrosoftOutputFormat];
            result.UserAgent = _configuration[TextToSpeechSpeechUserAgent];
            result.AvailableVoiceActorIds = ResolveAvailableActorVoiceIds(language, actorVoices);
            result.LanguageCode = language;

            string? token = await GetUpdatedToken();
            HttpClient httpClient = GetTextToSpeechWebClient(token);

            string ttsEndpointUrl = _configuration[TextToSpeechSpeechEndpoint];
            var response = await httpClient.PostAsync(ttsEndpointUrl, new StringContent(result.Transcript, Encoding.UTF8, result.ContentType));

            using (var memStream = new MemoryStream())
            {
                var responseStream = await response.Content.ReadAsStreamAsync();
                responseStream.CopyTo(memStream);
                result.VoiceData = memStream.ToArray();
            }
            return result;
        }

        private async Task<string?> GetUpdatedToken()
        {
            string? token = _token?.ToNormalString();
            // Issued tokens expire after ten minutes, so fetch a fresh one after eight minutes
            if (_lastTimeTokenFetched == null || DateTime.Now.Subtract(_lastTimeTokenFetched.Value).TotalMinutes > 8)
            {
                token = await GetIssuedToken();
            }
            return token;
        }

        private HttpClient GetTextToSpeechWebClient(string? token)
        {
            var httpClient = new HttpClient();
            httpClient.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", token);
            httpClient.DefaultRequestHeaders.Add("X-Microsoft-OutputFormat", _configuration[TextToSpeechSpeechXMicrosoftOutputFormat]);
            httpClient.DefaultRequestHeaders.Add("User-Agent", _configuration[TextToSpeechSpeechUserAgent]);
            return httpClient;
        }

        private string GetSpeechTextXml(string text, string language, TextToSpeechLanguage[] actorVoices, string? preferredVoiceActorId, TextToSpeechResult result)
        {
            result.VoiceActorId = ResolveVoiceActorId(language, preferredVoiceActorId, actorVoices);
            string speechXml = $@"
            <speak version='1.0' xml:lang='en-US'>
                <voice xml:lang='en-US' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice {result.VoiceActorId}'>
                    <prosody rate='1'>{text}</prosody>
                </voice>
            </speak>";
            return speechXml;
        }

        private List<string> ResolveAvailableActorVoiceIds(string language, TextToSpeechLanguage[] actorVoices)
        {
            if (actorVoices?.Any() == true)
            {
                var voiceActorIds = actorVoices.Where(v => v.LanguageKey == language || v.LanguageKey.Split("-")[0] == language).SelectMany(v => v.VoiceActors).Select(v => v.VoiceId).ToList();
                return voiceActorIds;
            }
            return new List<string>();
        }

        private string ResolveVoiceActorId(string language, string? preferredVoiceActorId, TextToSpeechLanguage[] actorVoices)
        {
            string actorVoiceId = "(en-AU, NatashaNeural)"; // default to a select voice actor id
            if (actorVoices?.Any() == true)
            {
                var voiceActorsForLanguage = actorVoices.Where(v => v.LanguageKey == language || v.LanguageKey.Split("-")[0] == language).SelectMany(v => v.VoiceActors).Select(v => v.VoiceId).ToList();
                if (voiceActorsForLanguage != null)
                {
                    if (voiceActorsForLanguage.Any())
                    {
                        var resolvedPreferredVoiceActorId = voiceActorsForLanguage.FirstOrDefault(v => v == preferredVoiceActorId);
                        if (!string.IsNullOrWhiteSpace(resolvedPreferredVoiceActorId))
                        {
                            return resolvedPreferredVoiceActorId!;
                        }
                        actorVoiceId = voiceActorsForLanguage.First();
                    }
                }
            }
            return actorVoiceId;
        }

        private async Task<string> GetIssuedToken()
        {
            var httpClient = new HttpClient();
            string? textToSpeechSubscriptionKey = Environment.GetEnvironmentVariable("AZURE_TEXT_SPEECH_SUBSCRIPTION_KEY", EnvironmentVariableTarget.Machine);
            httpClient.DefaultRequestHeaders.Add(OcpApiSubscriptionKeyHeaderName, textToSpeechSubscriptionKey);
            string tokenEndpointUrl = _configuration[TextToSpeechIssueTokenEndpoint];
            var response = await httpClient.PostAsync(tokenEndpointUrl, new StringContent("{}"));
            _token = (await response.Content.ReadAsStringAsync()).ToSecureString();
            _lastTimeTokenFetched = DateTime.Now;
            return _token.ToNormalString();
        }

        private const string OcpApiSubscriptionKeyHeaderName = "Ocp-Apim-Subscription-Key";
        private const string TextToSpeechIssueTokenEndpoint = "TextToSpeechIssueTokenEndpoint";
        private const string TextToSpeechSpeechEndpoint = "TextToSpeechSpeechEndpoint";
        private const string TextToSpeechSpeechContentType = "TextToSpeechSpeechContentType";
        private const string TextToSpeechSpeechUserAgent = "TextToSpeechSpeechUserAgent";
        private const string TextToSpeechSpeechXMicrosoftOutputFormat = "TextToSpeechSpeechXMicrosoftOutputFormat";

        private readonly IConfiguration _configuration;
        private DateTime? _lastTimeTokenFetched = null;
        private SecureString? _token = null;
    }
}
Let's look at the appsettings.json file. The Ocp-Apim-Subscription-Key is put into an environment variable; this is a secret key you do not want to expose, to avoid leaking it and running up costs for usage of the service.
Appsettings.json
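The actual file is in the repo. A minimal sketch with placeholder values, using the configuration keys that TextToSpeechUtil.cs reads (the region and the concrete values shown are assumptions):

{
  "TextToSpeechIssueTokenEndpoint": "https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken",
  "TextToSpeechSpeechEndpoint": "https://<region>.tts.speech.microsoft.com/cognitiveservices/v1",
  "TextToSpeechSpeechContentType": "application/ssml+xml",
  "TextToSpeechSpeechUserAgent": "MultiLingualTranslatorApp",
  "TextToSpeechSpeechXMicrosoftOutputFormat": "audio-24khz-48kbitrate-mono-mp3"
}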
Next up, I have gathered all the voice actor ids for the languages in Azure Cognitive Services that have them. These cover the most well-known languages among the roughly 150 languages Azure supports; see the bundled json file in the repo for an overview of voice actor ids.
For example, the Norwegian language has three voice actors, neural-net-trained AI voices synthesized for realistic speech.
Let's look at the source code for calling the TextToSpeechUtil.cs shown above from a MAUI Blazor app view, Index.razor.
The code consists of two private methods that do the work of retrieving the audio file from the Azure Speech service, by first loading all the voice actor ids from the bundled json file of voice actors mentioned above and deserializing it into a list of voice actors.
Retrieving the audio file passes in the translated text for which to generate synthesized speech, together with the target language, all available actor voices, and the preferred voice actor id, if set.
What is retrieved is metadata plus the audio itself, in MP3 format. This format is recognized by, for example, Windows without having to install additional codec libraries.
Index.razor (Inside the @code block { .. } of that razor file)
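The full code is in the repo; a minimal sketch of the call and playback, assuming an injected ITextToSpeechUtil and IAudioManager (the Model properties and the actorVoices variable are hypothetical names for illustration):

// Ask Azure for synthesized speech of the translated text
var result = await TextToSpeechUtil.GetSpeechFromText(Model.TranslatedText, Model.TargetLanguage, actorVoices, Model.PreferredVoiceActorId);
// Plugin.Maui.Audio can play the MP3 bytes directly from a stream
var player = AudioManager.CreatePlayer(new MemoryStream(result.VoiceData));
player.Play();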
A screenshot shows how the demo app now looks. You can translate text into another language and then have the Azure AI speech synthesis generate realistic audio of the translated text, so you can see not only how the text is translated, but also how it is pronounced.
I have added a repo on GitHub for a web scraping app written in .NET MAUI Blazor. It uses Azure Cognitive Services to summarize articles.
https://github.com/toreaurstadboss/DagbladetWebscrapper
The web scraper uses the HtmlAgilityPack NuGet package to handle the DOM after downloading articles from the Internet.
As the name of the repo suggests, it can be used to read for example Dagbladet articles without having to wade through ads. 'Website scraping'
is a term that means extracting data from web sites, or content in general.
The Razor lib containing the text handling methods for scraping web pages uses Azure.AI.TextAnalytics and HtmlAgilityPack, as the code below shows.
Let's first look at the SummarizationUtil class. It uses TextAnalyticsClient from Azure.AI.TextAnalytics. We will summarize articles into five-sentence summaries using the Azure AI
text analytics client.
using Azure.AI.TextAnalytics;
using System.Text;

namespace Webscrapper.Lib
{
    public class SummarizationUtil : ISummarizationUtil
    {
        public async Task<List<ExtractiveSummarizeResult>> GetExtractiveSummarizeResult(string document, TextAnalyticsClient client)
        {
            var batchedDocuments = new List<string>
            {
                document
            };
            var result = new List<ExtractiveSummarizeResult>();
            var options = new ExtractiveSummarizeOptions
            {
                MaxSentenceCount = 5
            };
            var operation = await client.ExtractiveSummarizeAsync(Azure.WaitUntil.Completed, batchedDocuments, options: options);
            await foreach (ExtractiveSummarizeResultCollection documentsInPage in operation.Value)
            {
                foreach (ExtractiveSummarizeResult documentResult in documentsInPage)
                {
                    result.Add(documentResult);
                }
            }
            return result;
        }

        public async Task<string> GetExtractiveSummarizeSentencesResult(string document, TextAnalyticsClient client)
        {
            List<ExtractiveSummarizeResult> summaries = await GetExtractiveSummarizeResult(document, client);
            return string.Join(Environment.NewLine, summaries.SelectMany(s => s.Sentences).Select(x => x.Text));
        }
    }
}
We set up the extraction here to return a maximum of five sentences. Note the use of await foreach here: operation.Value is an async pageable collection (it implements IAsyncEnumerable), so pages of results are streamed asynchronously.
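As a tiny standalone illustration of the await foreach pattern (a generic sketch, unrelated to the Azure SDK types):

async IAsyncEnumerable<int> ProducePagesAsync()
{
    for (int i = 1; i <= 3; i++)
    {
        await Task.Delay(100); // simulate fetching a page over the network
        yield return i;
    }
}

await foreach (int page in ProducePagesAsync())
{
    Console.WriteLine($"Got page {page}");
}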
Here is a helper method to get a string from an ExtractiveSummarizeResult.
using Azure.AI.TextAnalytics;
using System.Text;

namespace Webscrapper.Lib
{
    public static class SummarizationExtensions
    {
        public static string GetExtractiveSummarizeResultInfo(this ExtractiveSummarizeResult documentResults)
        {
            var sb = new StringBuilder();
            if (documentResults.HasError)
            {
                sb.AppendLine($"Error!");
                sb.AppendLine($"Document error code: {documentResults.Error.ErrorCode}.");
                sb.AppendLine($"Message: {documentResults.Error.Message}");
                return sb.ToString(); // Sentences is not available when the result has an error
            }
            sb.AppendLine($"SUCCESS. There were no errors encountered while summarizing the document.");
            sb.AppendLine($"Extracted the following {documentResults.Sentences.Count} sentence(s):");
            sb.AppendLine();
            foreach (ExtractiveSummarySentence sentence in documentResults.Sentences)
            {
                sb.AppendLine($"Sentence: {sentence.Text} Offset: {sentence.Offset} Rankscore: {sentence.RankScore} Length: {sentence.Length}");
                sb.AppendLine();
            }
            return sb.ToString();
        }
    }
}
Here is a factory method to create a TextAnalyticsClient.
using Azure;
using Azure.AI.TextAnalytics;

namespace Webscrapper.Lib
{
    public static class TextAnalyticsClientFactory
    {
        public static TextAnalyticsClient CreateClient()
        {
            string? uri = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_ENDPOINT", EnvironmentVariableTarget.Machine);
            string? key = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_KEY", EnvironmentVariableTarget.Machine);
            if (uri == null)
            {
                throw new ArgumentNullException(nameof(uri), "Could not get system environment variable named 'AZURE_COGNITIVE_SERVICE_ENDPOINT'. Set this variable first.");
            }
            if (key == null)
            {
                throw new ArgumentNullException(nameof(key), "Could not get system environment variable named 'AZURE_COGNITIVE_SERVICE_KEY'. Set this variable first.");
            }
            var client = new TextAnalyticsClient(new Uri(uri), new AzureKeyCredential(key));
            return client;
        }
    }
}
To use Azure Cognitive Services, you have to get the endpoint (a URL) and a service key for your account in the Azure Portal after having activated Azure Cognitive Services.
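Since the factory reads machine-level environment variables, on Windows they could for example be set from an elevated command prompt (placeholder values):

setx AZURE_COGNITIVE_SERVICE_ENDPOINT "https://<your-resource>.cognitiveservices.azure.com/" /M
setx AZURE_COGNITIVE_SERVICE_KEY "<your-key>" /M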
The page extraction util looks like this; note the use of HtmlAgilityPack.
using HtmlAgilityPack;
using System.Text;

namespace Webscrapper.Lib
{
    public class PageExtractionUtil : IPageExtractionUtil
    {
        public async Task<string?> ExtractHtml(string url, bool includeTags)
        {
            if (string.IsNullOrEmpty(url))
                return null;
            var httpClient = new HttpClient();
            string pageHtml = await httpClient.GetStringAsync(url);
            if (string.IsNullOrEmpty(pageHtml))
            {
                return null;
            }
            var htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(pageHtml);
            // SelectNodes returns null when the XPath yields no matches, so guard against that
            var textNodes = (htmlDoc.DocumentNode.SelectNodes("//h1|//h2|//h3|//h4|//h5|//h6|//p") ?? Enumerable.Empty<HtmlNode>())
                .Where(n => !string.IsNullOrWhiteSpace(n.InnerText)).ToList();
            var sb = new StringBuilder();
            foreach (var textNode in textNodes)
            {
                var text = textNode.InnerText;
                if (includeTags)
                {
                    sb.AppendLine($"<{textNode.Name}>{text}</{textNode.Name}>");
                }
                else
                {
                    sb.AppendLine(text);
                }
            }
            return sb.ToString();
        }
    }
}
Let's look at an example usage:
@page "/"
@inject ISummarizationUtil SummarizationUtil
@inject IPageExtractionUtil PageExtractionUtil
@using DagbladetWebscrapper.Models;
<h1>Dagbladet Artikkel Oppsummering</h1>
<EditForm Model="@Model" OnValidSubmit="@Submit" class="form-group">
<DataAnnotationsValidator />
<ValidationSummary />
<div class="form-group row">
<label for="Model.ArticleUrl">Url til artikkel</label>
<InputText @bind-Value="Model!.ArticleUrl" placeholder="Skriv inn url til artikkel i Dagbladet" />
</div>
<div class="form-group row">
<span>Artikkelens oppsummering</span>
<InputTextArea readonly="readonly" placeholder="Her dukker opp artikkelens oppsummering" @bind-Value="Model!.SummarySentences" rows="5"></InputTextArea>
</div>
<div class="form-group row">
<span>Artikkelens tekst</span>
<InputTextArea readonly="readonly" placeholder="Her dukker opp teksten til artikkelen" @bind-Value="Model!.ArticleText" rows="5"></InputTextArea>
</div>
<button type="submit">Submit</button>
</EditForm>
@code {
    private Azure.AI.TextAnalytics.TextAnalyticsClient? _client;

    public IndexModel Model { get; set; } = new();

    private async Task Submit()
    {
        string? articleText = await PageExtractionUtil.ExtractHtml(Model!.ArticleUrl, false);
        if (articleText == null)
        {
            return;
        }
        Model.ArticleText = articleText;
        if (_client == null)
        {
            _client = TextAnalyticsClientFactory.CreateClient();
        }
        string summaryText = await SummarizationUtil.GetExtractiveSummarizeSentencesResult(articleText, _client);
        Model.SummarySentences = summaryText;
        StateHasChanged();
    }
}
The view model class for the form looks like this.
Let's look at a screenshot that shows the app in use. It targets an article in the tabloid newspaper Dagbladet in Norway. This tabloid is notorious for writing sensational article titles to make you click into the article (i.e. 'clickbait'), and then inside the article you have to wade through lots of ads. Here you now have an app where you can open up www.dagbladet.no, find a link to an article, extract the text, and get a five-sentence summary using Azure AI Cognitive Services in a .NET MAUI app.