Monday 22 April 2024

Pii - Detecting Personally Identifiable Information using Azure Cognitive Services

This article will look at detecting Person Identifiable Information (Pii) using Azure Cognitive Services. I have created a demo using .NET Maui Blazor has been created and the Github repo is here:
https://github.com/toreaurstadboss/PiiDetectionDemo

Person Identifiable Information (Pii) is desired to detect and also redact, that is using censorship or obscuring Pii to prepare documents for publication. The Pii feature in Azure Cognitive Services is a part of the Language resource service. A quickstart for using Pii is available here:
https://learn.microsoft.com/en-us/azure/ai-services/language-service/personally-identifiable-information/quickstart?pivots=programming-language-csharp

After creating the Language resource, look up the keys and endpoints for you service. Using Azure CLI inside Cloud shell, you can enter this command to find the keys, in Azure many services has got two keys you can exchange with new keys through regeneration:

az cognitiveservices account keys list --resource-group SomeAzureResourceGroup --name SomeAccountAzureCognitiveServices
This is how you can query after endpoint of language resource using Azure CLI : az cognitiveservices account show --query "properties.endpoint" --resource-group SomeAzureResourceGroup --name SomeAccountAzureCognitiveServices
Next, the demo of this article. Connecting to the Pii Removal Text Analytics is possible using this Nuget package (REST calls can also be done manually): - Azure.AI.TextAnalytics version 5.3.0 Here is the other Nugets of my Demo included from the .csproj file :

PiiDetectionDemo.csproj


  <ItemGroup>
        <PackageReference Include="Azure.AI.TextAnalytics" Version="5.3.0" />
        <PackageReference Include="Microsoft.Maui.Controls" Version="$(MauiVersion)" />
        <PackageReference Include="Microsoft.Maui.Controls.Compatibility" Version="$(MauiVersion)" />
        <PackageReference Include="Microsoft.AspNetCore.Components.WebView.Maui" Version="$(MauiVersion)" />
        <PackageReference Include="Microsoft.Extensions.Logging.Debug" Version="8.0.0" />
    </ItemGroup>


A service using this Pii removal feature is simply making use of a TextAnalyticsClient and method RecognizePiiEntitiesAsync.

PiiRemovalTextClientService.cs IPiiRemovalTextClientService.cs



using Azure;
using Azure.AI.TextAnalytics;

namespace PiiDetectionDemo.Util
{
    public interface IPiiRemovalTextAnalyticsClientService
    {
        Task<Response<PiiEntityCollection>> RecognizePiiEntitiesAsync(string? document, string? language);
    }
}


namespace PiiDetectionDemo.Util
{
    public class PiiRemovalTextAnalyticsClientService : IPiiRemovalTextAnalyticsClientService
    {

        private TextAnalyticsClient _client;

        public PiiRemovalTextAnalyticsClientService()
        {
            var azureEndpoint = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_ENDPOINT");
            var azureKey = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_KEY");

            if (string.IsNullOrWhiteSpace(azureEndpoint))
            {
                throw new ArgumentNullException(nameof(azureEndpoint), "Missing system environment variable: AZURE_COGNITIVE_SERVICE_ENDPOINT");
            }
            if (string.IsNullOrWhiteSpace(azureKey))
            {
                throw new ArgumentNullException(nameof(azureKey), "Missing system environment variable: AZURE_COGNITIVE_SERVICE_KEY");
            }

            _client = new TextAnalyticsClient(new Uri(azureEndpoint), new AzureKeyCredential(azureKey));
        }

        public async Task<Response<PiiEntityCollection>> RecognizePiiEntitiesAsync(string? document, string? language)
        {
            var piiEntities = await _client.RecognizePiiEntitiesAsync(document, language);
            return piiEntities;
        }

    }
}


The UI codebehind of the razor component page showing the UI looks like this:

Home.razor.cs


using Azure;
using Microsoft.AspNetCore.Components;
using PiiDetectionDemo.Models;
using PiiDetectionDemo.Util;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace PiiDetectionDemo.Components.Pages
{
    public partial class Home
    {

        private IndexModel Model = new();
        private bool isProcessing = false;
        private bool isSearchPerformed = false;

        private async Task Submit()
        {
            isSearchPerformed = false;
            isProcessing = true;
            try
            {
                var response = await _piiRemovalTextAnalyticsClientService.RecognizePiiEntitiesAsync(Model.InputText, null);
                Model.RedactedText = response?.Value?.RedactedText;
                Model.UpdateHtmlRedactedText();
                Model.AnalysisResult = response?.Value;
                StateHasChanged();
            }
            catch (Exception ex)
            {
                await Console.Out.WriteLineAsync(ex.ToString());
            }
            isProcessing = false;
            isSearchPerformed = true;
        }

        private void removeWhitespace(ChangeEventArgs args)
        {
            Model.InputText = args.Value?.ToString()?.CleanupAllWhiteSpace();
            StateHasChanged();
        }



    }
}



To get the redacted or censored text void of any Pii that the Pii detection feature was able to detect, access the Value of type Azure.AI.TextAnalytics.PiiEntityCollection. Inside this object, the string RedactedText contains the censored / redacted text. The IndexModel looks like this :


using Azure.AI.TextAnalytics;
using Microsoft.AspNetCore.Components;
using PiiDetectionDemo.Util;
using System.ComponentModel.DataAnnotations;
using System.Text;

namespace PiiDetectionDemo.Models
{

    public class IndexModel
    {

        [Required]
        public string? InputText { get; set; }

        public string? RedactedText { get; set; }

        public string? HtmlRedactedText { get; set; }

        public MarkupString HtmlRedactedTextMarkupString { get; set; }

        public void UpdateHtmlRedactedText()
        {
            var sb = new StringBuilder(RedactedText);
            if (AnalysisResult != null && RedactedText != null)
            {
                foreach (var piiEntity in AnalysisResult.OrderByDescending(a => a.Offset))
                {
                    sb.Insert(piiEntity.Offset + piiEntity.Length, "</b></span>");
                    sb.Insert(piiEntity.Offset, $"<span style='background-color:lightgray;border:1px solid black;corner-radius:2px; color:{GetBackgroundColor(piiEntity)}' title='{piiEntity.Category}: {piiEntity.SubCategory} Confidence: {piiEntity.ConfidenceScore} Redacted Text: {piiEntity.Text}'><b>");
                }
            }
            HtmlRedactedText = sb.ToString()?.CleanupAllWhiteSpace();    
            HtmlRedactedTextMarkupString = new MarkupString(HtmlRedactedText ?? string.Empty);
        }

        private string GetBackgroundColor(PiiEntity piiEntity)
        {
            if (piiEntity.Category == PiiEntityCategory.PhoneNumber)
            {
                return "yellow";
            }
            if (piiEntity.Category == PiiEntityCategory.Organization)
            {
                return "orange";
            }
            if (piiEntity.Category == PiiEntityCategory.Address)
            {
                return "green";
            }
            return "gray";                   
        }

        public long ExecutionTime { get; set; }
        public PiiEntityCollection? AnalysisResult { get; set; }

    }
}




Frontend UI looks like this: Home.razor


@page "/"
@using PiiDetectionDemo.Util

@inject IPiiRemovalTextAnalyticsClientService _piiRemovalTextAnalyticsClientService;

<h3>Azure HealthCare Text Analysis - Pii detection feature - Azure Cognitive Services</h3>

<em>Pii = Person identifiable information</em>

<EditForm Model="@Model" OnValidSubmit="@Submit">
    <DataAnnotationsValidator />
    <ValidationSummary />

    <div class="form-group row">
        <label><strong>Text input</strong></label>
        <InputTextArea @oninput="removeWhitespace" class="overflow-scroll" style="max-height:500px;max-width:900px;font-size: 10pt;font-family:Verdana, Geneva, Tahoma, sans-serif" @bind-Value="@Model.InputText" rows="5" />
    </div>

    <div class="form-group row">
        <div class="col">
            <br />
            <button class="btn btn-outline-primary" type="submit">Run</button>
        </div>
        <div class="col">
        </div>
        <div class="col">
        </div>
    </div>

    <br />

    @if (isProcessing)
    {

        <div class="progress" style="max-width: 90%">
            <div class="progress-bar progress-bar-striped progress-bar-animated"
                 style="width: 100%; background-color: green">
                Retrieving result from Azure Text Analysis Pii detection feature. Processing..
            </div>
        </div>
        <br />

    }

    <div class="form-group row">
        <label><strong>Analysis result</strong></label>

        @if (isSearchPerformed)
        {
            <br />
            <b>Execution time took: @Model.ExecutionTime ms (milliseconds)</b>

            <br />
            <br />

            <b>Redacted text (Pii removed)</b>
            <br />

            <div class="form-group row">
               <label><strong>Categorized Pii redacted text</strong></label>
               <div>
               @Model.HtmlRedactedTextMarkupString
               </div>
            </div>

            <br />
            <br />

            <table class="table table-striped table-dark table-hover">
                <thead>
                <th>Pii text</th>
                <th>Category</th>
                <th>SubCategory</th>
                <th>Offset</th>
                <th>Length</th>
                <th>ConfidenceScore</th>
                </thead>
                <tbody>
                    @if (Model.AnalysisResult != null) {
                        @foreach (var entity in Model.AnalysisResult)
                        {
                            <tr>
                                <td>@entity.Text</td>
                                <td>@entity.Category.ToString()</td>
                                <td>@entity.SubCategory</td>
                                <td>@entity.Offset</td>
                                <td>@entity.Length</td>
                                <td>@entity.ConfidenceScore</td>                                        
                            </tr>
                        }
                    }
                </tbody>
            </table>

        }
    </div>

</EditForm>



The Demo uses Bootstrap 5 to build up a HTML table styled and showing the Azure.AI.TextAnalytics.PiiEntity properties.

Sunday 14 April 2024

Building a filter via Expression trees in C#

This article will look at how to build a filter with Expression trees in C#.

It is an academic exercise how to use Expression trees, you would probably use filters just specifying lambda function with LINQ, but the code shows how you can build an Expression incrementally and compile it to a function. If there is a use-case where LINQ does not fit, perhaps some late-binding scenario or where LINQ does not offer an operator, you can use the approach shows in this article, but the article shows simple usage of Expression trees for introducing Expression trees to C# developers wanting to
test them out in more detail.

The sample code below shows the sample code testing out how to build the Expression incrementally using extension method loading some sample data. An important gotcha is to keep sending in same the objectParameter which is the parameter expression used in the lambda function that is built up, this must be the same parameter. Consider some lambda function of an object 'Person' where the parameter 'x' like:

x => x.Age > 3 && x.Age < 9

The point is that the ParameterExpression x must be the same object, or else we get an error.

FilterHelper.cs


public static class FilterHelper {

	public enum ComparisonOperator {
		Equal,
		LessThan,
		LessThanOrEqual,
		GreaterThan,
		GreaterThanOrEqual,
		NotEqual		
	}
	
	public static Func<TClass, bool> CompileFilter<TClass>(this Expression expression, ParameterExpression objectParameter){
		var expr = Expression.Lambda<Func<TClass, bool>>(expression, false, new List<ParameterExpression>{ objectParameter });
		return expr.Compile();
	}
	
	public static Expression BuildFilter<TClass, TProp>(this Expression previousExpression, Expression<Func<TClass, TProp>> prop, object value,
		ComparisonOperator op, ParameterExpression objectParameter){
		
		var propertyInfo = GetPropertyInfo(prop);
		var propertyToCall = Expression.Property(objectParameter, propertyInfo);		
		var valueToTest = Expression.Constant(value);
		
		Expression operatorExpression = null;
		switch (op)
		{
			case ComparisonOperator.Equal:
				operatorExpression = Expression.Equal(propertyToCall, valueToTest);
				break;
			case ComparisonOperator.NotEqual:
				operatorExpression = Expression.NotEqual(propertyToCall, valueToTest);
				break;
			case ComparisonOperator.LessThan:
				operatorExpression = Expression.LessThan(propertyToCall, valueToTest);
				break;
			case ComparisonOperator.LessThanOrEqual:
				operatorExpression = Expression.LessThanOrEqual(propertyToCall, valueToTest);
				break;
			case ComparisonOperator.GreaterThan:
				operatorExpression = Expression.GreaterThan(propertyToCall, valueToTest);
				break;
			case ComparisonOperator.GreaterThanOrEqual:
				operatorExpression = Expression.GreaterThanOrEqual(propertyToCall, valueToTest);
				break;
		}
		
		if (previousExpression == null){
			return operatorExpression;
		}
		else {
			return Expression.AndAlso(previousExpression, operatorExpression);
		}			
	}

	/// <summary>
	/// Gets the corresponding <see cref="PropertyInfo" /> from an <see cref="Expression" />.
	/// </summary>
	/// <param name="property">The expression that selects the property to get info on.</param>
	/// <returns>The property info collected from the expression.</returns>
	/// <exception cref="ArgumentNullException">When <paramref name="property" /> is <c>null</c>.</exception>
	/// <exception cref="ArgumentException">The expression doesn't indicate a valid property."</exception>
	private static PropertyInfo GetPropertyInfo<T, P>(Expression<Func<T, P>> property)
	{
		if (property == null)
		{
			throw new ArgumentNullException(nameof(property));
		}

		if (property.Body is UnaryExpression unaryExp)
		{
			if (unaryExp.Operand is MemberExpression memberExp)
			{
				return (PropertyInfo)memberExp.Member;
			}
		}
		else if (property.Body is MemberExpression memberExp)
		{
			return (PropertyInfo)memberExp.Member;
		}

		throw new ArgumentException($"The expression doesn't indicate a valid property. [ {property} ]");
	}

}




The sample data uses a POCO Employee as entity class: Employee.cs


public class Employee
{
	public int Id { get; set; }
	public string FirstName { get; set; }
	public string LastName { get; set; }
	public string Department { get; set; }
	public string Position { get; set; }
	public decimal Salary { get; set; }
	public DateTime HireDate { get; set; }
	public DateOnly HireDateOnly
	{
		get { return DateOnly.FromDateTime(HireDate); }
	}
}





The sample program loads up the Json data, then it builds the expression with method BuildFilter shown above and then finally calls CompileFilter to build the expression into a Func<TClass, bool> where TClass is the employee type.

Program.cs


void Main()
{
	string json = File.ReadAllText(Path.Combine(@"C:\Users\SomeUser\Documents\LINQPad Queries\SampleData\Employees.json"));
	var employees = JsonSerializer.Deserialize<List<Employee>>(json, 
new JsonSerializerOptions { PropertyNameCaseInsensitive = true }); //employees.Dump(); var objectParameter = Expression.Parameter(typeof(Employee)); Expression currentFilter = null; currentFilter = FilterHelper.BuildFilter<Employee, object>(currentFilter, e => e.Department,
"Engineering", FilterHelper.ComparisonOperator.Equal, objectParameter) .BuildFilter<Employee, object>(e => e.Salary, 79000m,
FilterHelper.ComparisonOperator.GreaterThan, objectParameter); Func<Employee, bool> employeeFilter = currentFilter.CompileFilter<Employee>(objectParameter); var matchingEmployees = employees.Where(employeeFilter).ToList(); matchingEmployees.Dump(); }


Sample data json looks like this - an array of employees

Employees.json


[
    {
      "id": 1,
      "firstName": "Alice",
      "lastName": "Johnson",
      "department": "HR",
      "position": "Manager",
      "salary": 60000,
      "hireDate": "2022-03-15"
    },
    {
      "id": 2,
      "firstName": "Bob",
      "lastName": "Smith",
      "department": "Engineering",
      "position": "Software Engineer",
      "salary": 80000,
      "hireDate": "2021-09-10"
    },
    {
      "id": 3,
      "firstName": "Charlie",
      "lastName": "Brown",
      "department": "Finance",
      "position": "Financial Analyst",
      "salary": 70000,
      "hireDate": "2020-05-20"
    },
    {
      "id": 4,
      "firstName": "David",
      "lastName": "Lee",
      "department": "Marketing",
      "position": "Marketing Specialist",
      "salary": 65000,
      "hireDate": "2019-11-05"
    },
    {
      "id": 5,
      "firstName": "Eva",
      "lastName": "Garcia",
      "department": "Sales",
      "position": "Sales Representative",
      "salary": 75000,
      "hireDate": "2018-07-12"
    },
    {
      "id": 6,
      "firstName": "Frank",
      "lastName": "Wang",
      "department": "Engineering",
      "position": "Senior Developer",
      "salary": 95000,
      "hireDate": "2017-02-28"
    },
    {
      "id": 7,
      "firstName": "Grace",
      "lastName": "Miller",
      "department": "HR",
      "position": "Recruiter",
      "salary": 55000,
      "hireDate": "2016-08-18"
    },
    {
      "id": 8,
      "firstName": "Henry",
      "lastName": "Chen",
      "department": "Finance",
      "position": "Financial Manager",
      "salary": 90000,
      "hireDate": "2015-04-03"
    },
    {
      "id": 9,
      "firstName": "Ivy",
      "lastName": "Nguyen",
      "department": "Marketing",
      "position": "Content Writer",
      "salary": 60000,
      "hireDate": "2014-10-22"
    },
    {
      "id": 10,
      "firstName": "Jack",
      "lastName": "Kim",
      "department": "Sales",
      "position": "Account Executive",
      "salary": 80000,
      "hireDate": "2013-06-14"
    },
    {
      "id": 11,
      "firstName": "Karen",
      "lastName": "Taylor",
      "department": "Engineering",
      "position": "QA Engineer",
      "salary": 75000,
      "hireDate": "2012-01-09"
    },
    {
      "id": 12,
      "firstName": "Leo",
      "lastName": "Rodriguez",
      "department": "HR",
      "position": "HR Specialist",
      "salary": 55000,
      "hireDate": "2011-07-27"
    },
    {
      "id": 13,
      "firstName": "Mia",
      "lastName": "Liu",
      "department": "Finance",
      "position": "Financial Advisor",
      "salary": 70000,
      "hireDate": "2010-03-16"
    },
    {
      "id": 14,
      "firstName": "Nina",
      "lastName": "Martinez",
      "department": "Marketing",
      "position": "Social Media Manager",
      "salary": 65000,
      "hireDate": "2009-09-05"
    },
    {
      "id": 15,
      "firstName": "Oscar",
      "lastName": "Hernandez",
      "department": "Sales",
      "position": "Sales Manager",
      "salary": 100000,
      "hireDate": "2008-04-21"
    }
  ]



The filter is more limited than just sticking to LINQ, but the code in this example shows how you can build a filter incrementally. Traditionally, you would use Linq and an IEnumerable of TClass and you can keep on filter it too. Here are some closing arguments for why you could make use of Expression trees and have to use them too and not be able to use Linq:
Purpose: Expression trees represent code as data structures. They allow you to build executable code dynamically in C#. Use Cases: - Dynamic Code Generation: When you need to create or modify code at runtime (e.g., building custom queries or transformations). - Remote Execution: Expression trees are useful for scenarios where you want to send calculations across the wire (e.g., database queries, web services). - Custom Query Providers: If you’re building your own query provider (like LINQ to SQL or Entity Framework), expression trees help translate queries into other formats (e.g., SQL).

Wednesday 27 March 2024

Importing Json File to SQL Server into a variable

A short article today of how to import JSON file to SQL Server into a variable, which can then
be used to insert it into a column of type NVARCHAR(MAX) of a table. The maximum size of NVARCHAR(MAX) is 2 Gb, so you can
import large Json files using this datatype. If the Json is small and below 4000 chars, use for example NVARCHAR(4000) instead. Here is a SQL script to import the json file using OPENROWSET and Bulk import. We also pass in the path to the folder where the json file is. It is put in the same folder as the .sql file script. Note that the variable $(FullScriptDir) is passed in via a .bat file (shown further below) and we expect the .json file to be in the same folder as the .bat file. You can provide a full path to a .json file instead and skip the .bat file here and import a json file, but it is nice to load the .json file from the same folder as the .sql file in case you want to copy the .sql and .json file to another server and not having to provide and possibly having to adjust the full path. Sql-script import_json_file_openrowset.sql:


DECLARE @JSONFILE VARCHAR(MAX); 

SELECT @JSONFILE = BulkColumn
FROM OPENROWSET (BULK '$(FullScriptDir)\top-posts.json', SINGLE_CLOB) AS j;

PRINT 'JsonFile contents: ' + @JSONFILE

IF (ISJSON(@JSONFILE)=1) PRINT 'It is valid Json';


The .bat file here passes the current folder as a variable to the sql script runsqlscript.bat


@set FullScriptDir=%CD%
sqlcmd -S .\SQLEXPRESS  -i import_json_file_openrowset.sql


This outputs:


sqlcmd -S .\SQLEXPRESS  -i import_json_file_openrowset.sql
JsonFile contents: [
   {
      "Id":6107,
      "Score":176,
      "ViewCount":155988,
      "Title":"What are deconvolutional layers?",
      "OwnerUserId":8820
   },
   {
      "Id":155,
      "Score":164,
      "ViewCount":25822,
      "Title":"Publicly Available Datasets",
      "OwnerUserId":227
   }
]
It is valid Json


With the variable JSONFILE you can do whatever with it such as inserting it to a column in a new row of a table for example.
Importing json from a string directly using OPENJSON

It is also possible to directly just import the JSON from a string variable like this:


DECLARE @JSONSTRINGSAMPLE VARCHAR(MAX) 

SET @JSONSTRINGSAMPLE = N'[
 {
    "Id": 2334,
    "Score": 4.3,
    "Title": "Python - Used as scientific tool for graphing"
 },
{
    "Id": 2335,
    "Score": 5.2,
    "Title": "C# : Math and physics programming"
 }
]';

SELECT * FROM OPENJSON (@JSONSTRINGSAMPLE) WITH (
    Id INT,
    Score REAL,
    Title NVARCHAR(100)
)