This article shows how to extract health information from arbitrary text using the healthcare information extraction feature in Azure Cognitive Services. The technology combines natural language processing (NLP) with AI techniques.
A GitHub repo with the code for a running .NET MAUI Blazor demo on .NET 7 is available here:
https://github.com/toreaurstadboss/HealthTextAnalytics
The screenshot below shows how the demo works.
The demo uses Azure AI healthcare information extraction to extract entities from the text, such as a person's age, gender, employment, medical history and conditions, including diagnoses, procedures and so on.
The data returned by the service is shown at the bottom of the demo. The raw data is JSON and includes a FHIR representation. Since we want the FHIR format, we must use the REST API to get this information.
Azure AI healthcare information extraction also extracts relations, which connect the entities together for semantic analysis of the text. In addition, each entity has links for further reading. These point to external systems such as SNOMED CT, with the corresponding codes for each entity.
Let's look at the source code for the demo next.
We define a named HTTP client in the MauiProgram.cs file, which starts the application. We could move the code into an extension method, but it is kept simple in the demo.
MauiProgram.cs
var azureEndpoint = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_LANGUAGE_SERVICE_ENDPOINT");
var azureKey = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_LANGUAGE_SERVICE_KEY");
if (string.IsNullOrWhiteSpace(azureEndpoint))
{
throw new ArgumentNullException(nameof(azureEndpoint), "Missing system environment variable: AZURE_COGNITIVE_SERVICES_LANGUAGE_SERVICE_ENDPOINT");
}
if (string.IsNullOrWhiteSpace(azureKey))
{
throw new ArgumentNullException(nameof(azureKey), "Missing system environment variable: AZURE_COGNITIVE_SERVICES_LANGUAGE_SERVICE_KEY");
}
var azureEndpointHost = new Uri(azureEndpoint);
builder.Services.AddHttpClient("Az", httpClient =>
{
string baseUrl = azureEndpointHost.GetLeftPart(UriPartial.Authority); //https://stackoverflow.com/a/18708268/741368
httpClient.BaseAddress = new Uri(baseUrl);
// The Content-Type header is set per request on the HttpRequestMessage, not on this named client.
httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", azureKey);
});
The Content-Type header is set on the HttpRequestMessage shown further below, not on this named client. As we see, the named client needs both the endpoint base URL and the key, which goes in the Ocp-Apim-Subscription-Key HTTP header.
Next, let's look at how to create a POST request to the language resource endpoint that offers the health text analysis.
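To make it clearer what GetLeftPart(UriPartial.Authority) keeps from the configured endpoint, here is a minimal sketch. The endpoint value is a made-up placeholder and the helper name ToBaseUrl is hypothetical; neither is part of the demo.

```csharp
using System;

// Placeholder endpoint with a path and a query string appended.
Console.WriteLine(EndpointUtil.ToBaseUrl(
    "https://my-language-resource.cognitiveservices.azure.com/language/some/path?x=1"));
// prints "https://my-language-resource.cognitiveservices.azure.com"

public static class EndpointUtil
{
    // Keeps only the scheme and authority (host and any non-default port),
    // dropping the path, query string and fragment.
    public static string ToBaseUrl(string endpoint) =>
        new Uri(endpoint).GetLeftPart(UriPartial.Authority);
}
```

This way the environment variable can hold the full endpoint as copied from the Azure portal, with or without a trailing path.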
HealthAnalyticsTextClientService.cs
using HealthTextAnalytics.Models;
using System.Diagnostics;
using System.Text;
using System.Text.Json.Nodes;
namespace HealthTextAnalytics.Util
{
public class HealthAnalyticsTextClientService : IHealthAnalyticsTextClientService
{
private readonly IHttpClientFactory _httpClientFactory;
private const int awaitTimeInMs = 500;
private const int maxTimerWait = 10000;
public HealthAnalyticsTextClientService(IHttpClientFactory httpClientFactory)
{
_httpClientFactory = httpClientFactory;
}
public async Task<HealthTextAnalyticsResponse> GetHealthTextAnalytics(string inputText)
{
var client = _httpClientFactory.CreateClient("Az");
string requestBodyRaw = HealthAnalyticsTextHelper.CreateRequest(inputText);
//https://learn.microsoft.com/en-us/azure/ai-services/language-service/text-analytics-for-health/how-to/call-api?tabs=ner
var stopWatch = Stopwatch.StartNew();
HttpRequestMessage request = CreateTextAnalyticsRequest(requestBodyRaw);
var response = await client.SendAsync(request);
var result = new HealthTextAnalyticsResponse();
using var timer = new PeriodicTimer(TimeSpan.FromMilliseconds(awaitTimeInMs)); // disposed when the method returns
int timeAwaited = 0;
while (await timer.WaitForNextTickAsync())
{
if (response.IsSuccessStatusCode)
{
result.IsSearchPerformed = true;
var operationLocation = response.Headers.First(h => h.Key?.ToLower() == Constants.Constants.HttpHeaderOperationResultAvailable).Value.FirstOrDefault();
var resultFromHealthAnalysis = await client.GetAsync(operationLocation);
JsonNode resultFromService = await resultFromHealthAnalysis.GetJsonFromHttpResponse();
if (resultFromService.GetValue<string>("status") == "succeeded")
{
result.AnalysisResultRawJson = await resultFromHealthAnalysis.Content.ReadAsStringAsync();
result.ExecutionTimeInMilliseconds = stopWatch.ElapsedMilliseconds;
result.Entities.AddRange(HealthAnalyticsTextHelper.GetEntities(result.AnalysisResultRawJson));
result.CategorizedInputText = HealthAnalyticsTextHelper.GetCategorizedInputText(inputText, result.AnalysisResultRawJson);
break;
}
}
timeAwaited += awaitTimeInMs; // keep the counter in sync with the timer interval
if (timeAwaited >= maxTimerWait)
{
result.CategorizedInputText = $"ERR: Timeout. Operation to analyze input text using Azure HealthAnalytics language service timed out after waiting for {timeAwaited} ms.";
break;
}
}
return result;
}
private static HttpRequestMessage CreateTextAnalyticsRequest(string requestBodyRaw)
{
var request = new HttpRequestMessage(HttpMethod.Post, Constants.Constants.AnalyzeTextEndpoint);
request.Content = new StringContent(requestBodyRaw, Encoding.UTF8, "application/json");//CONTENT-TYPE header
return request;
}
}
}
The code uses some helper methods that are shown next. As the code above shows, we must poll the Azure service until the analysis has finished. We poll every 0.5 seconds for up to a maximum of 10 seconds. A typical request takes about 3-4 seconds to process. Longer input texts ('documents') could need more than 10 seconds of processing time, but for this demo it works well.
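The poll-until-done-or-timeout pattern can also be isolated from the HTTP details. The following is only a sketch under my own assumptions: the Polling class and PollUntilAsync method are hypothetical names that do not exist in the demo, and the status check is faked with a counter instead of a real call to the service.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Fake status check: reports "succeeded" on the third poll.
int calls = 0;
string? status = await Polling.PollUntilAsync<string>(
    () => Task.FromResult(++calls >= 3 ? "succeeded" : null),
    TimeSpan.FromMilliseconds(50), TimeSpan.FromSeconds(2));
Console.WriteLine($"{status} after {calls} polls"); // prints "succeeded after 3 polls"

public static class Polling
{
    // Calls checkAsync once per interval until it returns a non-null result
    // or maxWait has been used up. Returns null on timeout.
    public static async Task<T?> PollUntilAsync<T>(
        Func<Task<T?>> checkAsync, TimeSpan interval, TimeSpan maxWait) where T : class
    {
        using var timer = new PeriodicTimer(interval);
        var waited = TimeSpan.Zero;
        while (await timer.WaitForNextTickAsync())
        {
            var result = await checkAsync();
            if (result != null) return result;
            waited += interval;
            if (waited >= maxWait) break;
        }
        return null;
    }
}
```

In the demo, the check would be the GET against the operation-location URL, returning the raw JSON once the status is "succeeded".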
HealthAnalyticsTextHelper.CreateRequest method
public static string CreateRequest(string inputText)
{
// Note: the id "1" here is a 'local id' that must be unique per document within the request.
// The service allows multiple documents and ids if necessary, but this demo only sends in one text at a time.
var request = new
{
analysisInput = new
{
documents = new[]
{
new { text = inputText, id = "1", language = "en" }
}
},
tasks = new[]
{
new { id = "analyze 1", kind = "Healthcare", parameters = new { fhirVersion = "4.0.1" } }
}
};
return JsonSerializer.Serialize(request, new JsonSerializerOptions { WriteIndented = true });
}
To create the body of the POST request, we use an anonymous object as a template, shown above, which matches what the REST service expects. The request could contain multiple documents, that is, input texts; in this demo only one text / document is sent in. Note the use of id = "1" for the document and "analyze 1" for the task.
We have some helper methods built on System.Text.Json to extract the JSON data sent in the response.
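If several texts should be analyzed in one request, the same template could be extended along these lines. This is a sketch, not code from the demo: the RequestSketch class and the array overload are hypothetical, and the only requirement I rely on is that each document id is unique within the request.

```csharp
using System;
using System.Linq;
using System.Text.Json;

string json = RequestSketch.CreateRequest(new[] { "Patient has a fever.", "Patient is 54 years old." });
Console.WriteLine(json);

public static class RequestSketch
{
    // Builds a Healthcare request body with one document per input text.
    // Document ids are "1", "2", ... so they are unique within the request.
    public static string CreateRequest(string[] inputTexts)
    {
        var request = new
        {
            analysisInput = new
            {
                documents = inputTexts
                    .Select((text, index) => new { text, id = (index + 1).ToString(), language = "en" })
                    .ToArray()
            },
            tasks = new[]
            {
                new { id = "analyze 1", kind = "Healthcare", parameters = new { fhirVersion = "4.0.1" } }
            }
        };
        return JsonSerializer.Serialize(request, new JsonSerializerOptions { WriteIndented = true });
    }
}
```

The response handling would then have to match results back to the inputs by document id.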
JsonNodeUtil
public static class JsonNodeUtil
{
public static async Task<JsonNode> GetJsonFromHttpResponse(this HttpResponseMessage response)
{
var resultFromService = JsonSerializer.Deserialize<JsonNode>(await response.Content.ReadAsStringAsync());
return resultFromService;
}
public static T? GetValue<T>(this JsonNode jsonNode, string key)
{
if (jsonNode == null)
{
return default;
}
return jsonNode[key] != null ? jsonNode[key].GetValue<T>() : default;
}
}
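As a small illustration of how a value such as the status is read from a JsonNode, consider the sketch below. The JSON strings are trimmed-down, made-up payloads (a real response has many more fields), and StatusReader is a hypothetical helper, not part of the demo.

```csharp
#nullable enable
using System;
using System.Text.Json.Nodes;

JsonNode? running = JsonNode.Parse("{\"status\":\"running\"}");
Console.WriteLine(StatusReader.GetStatus(running)); // prints "running"

// A missing key yields null instead of throwing.
JsonNode? empty = JsonNode.Parse("{}");
Console.WriteLine(StatusReader.GetStatus(empty) ?? "(no status)"); // prints "(no status)"

public static class StatusReader
{
    // Null-safe lookup of the top-level "status" property.
    public static string? GetStatus(JsonNode? node) => node?["status"]?.GetValue<string>();
}
```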
More code exists in the helper below for returning a categorized, colored version of the input text that highlights the entities found in it.
HealthAnalyticsTextHelper.cs - methods GetCategorizedInputText and GetBackgroundColor
public static string GetCategorizedInputText(string inputText, string analysisText)
{
var sb = new StringBuilder(inputText);
try
{
Root doc = JsonSerializer.Deserialize<Root>(analysisText);
//try loading up the documents inside of the analysisText
var entities = doc?.tasks?.items.FirstOrDefault()?.results?.documents?.SelectMany(d => d.entities)?.ToList();
if (entities != null)
{
foreach (var row in entities.OrderByDescending(r => r.offset))
{
sb.Insert(row.offset + row.length, "</b></span>");
sb.Insert(row.offset, $"<span style='color:{GetBackgroundColor(row)}' title='{row.category}: {row.text} Confidence: {row.confidenceScore} {row.name}'><b>");
}
}
}
catch (Exception err)
{
Console.WriteLine("Got an error while trying to load in analysis healthcare json: " + err.ToString());
}
return $"<pre style='text-wrap:wrap; max-height:500px;font-size: 10pt;font-family:Verdana, Geneva, Tahoma, sans-serif;'>{sb}</pre>";
}
private static string GetBackgroundColor(Entity row)
{
var cat = row?.category?.ToLower();
string backgroundColor = cat switch
{
"age" => "purple",
"diagnosis" => "orange",
"gender" => "purple",
"symptomorsign" => "purple",
"direction" => "blue",
"symptom" => "purple",
"symptoms" => "purple",
"bodystructure" => "blue",
"body" => "purple",
"structure" => "purple",
"examinationname" => "green",
"procedure" => "green",
"treatmentname" => "green",
"conditionqualifier" => "lightgreen",
"time" => "lightgreen",
"date" => "lightgreen",
"familyrelation" => "purple",
"employment" => "purple",
"livingstatus" => "purple",
"administrativeevent" => "darkgreen",
"careenvironment" => "darkgreen",
_ => "darkgray"
};
return backgroundColor;
}
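The reason GetCategorizedInputText walks the entities in descending offset order is that inserting markup at a later position does not shift the offsets of earlier entities. A stripped-down sketch of the same idea, using plain brackets instead of span tags and a hypothetical Highlighter class:

```csharp
using System;
using System.Linq;
using System.Text;

var spans = new[] { (Offset: 0, Length: 7), (Offset: 12, Length: 5) };
Console.WriteLine(Highlighter.Wrap("Patient has fever today", spans));
// prints "[Patient] has [fever] today"

public static class Highlighter
{
    // Wraps each (offset, length) span in brackets. Working from the last span
    // to the first keeps the remaining offsets valid while the text grows.
    public static string Wrap(string text, (int Offset, int Length)[] spans)
    {
        var sb = new StringBuilder(text);
        foreach (var span in spans.OrderByDescending(s => s.Offset))
        {
            sb.Insert(span.Offset + span.Length, "]");
            sb.Insert(span.Offset, "[");
        }
        return sb.ToString();
    }
}
```

Processing the spans in ascending order would require adjusting every later offset by the length of the markup already inserted.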
I generated the domain classes by pasting the initial responses I got from the REST service in Postman into the https://json2csharp.com/ website. The REST API, that is, the JSON it returns, might change in the future.
In that case, you might need to adjust the domain classes if the deserialization fails. It seems relatively stable though; I have tested the code for some weeks now.
Finally, the code for the categorized, colored text required newlines to be removed from the input to get correct indexing of the entities found in the text. This code shows how to collapse newlines and other runs of whitespace in the input text into single spaces.
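For orientation, the shape of these domain classes can be inferred from the property names the helper code uses (tasks, items, results, documents, entities and so on). The sketch below is my reconstruction under that assumption: the class names other than Root, Entity and Link, the property types and the sample JSON are all guesses, and the generated classes in the repo contain more members.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Made-up, heavily trimmed response used only to exercise the classes.
string sample = @"{ ""tasks"": { ""items"": [ { ""results"": { ""documents"": [ { ""id"": ""1"",
  ""entities"": [ { ""offset"": 12, ""length"": 5, ""text"": ""fever"",
                    ""category"": ""SymptomOrSign"", ""confidenceScore"": 0.99 } ] } ] } } ] } }";

Root? root = JsonSerializer.Deserialize<Root>(sample);
Entity? entity = root?.tasks?.items?.FirstOrDefault()?.results?.documents?.FirstOrDefault()?.entities?.FirstOrDefault();
Console.WriteLine($"{entity?.category}: {entity?.text}"); // prints "SymptomOrSign: fever"

// Lowercase property names mirror the JSON keys, as json2csharp generates them.
public class Root { public AnalysisTasks? tasks { get; set; } }
public class AnalysisTasks { public List<Item>? items { get; set; } }
public class Item { public Results? results { get; set; } }
public class Results { public List<Document>? documents { get; set; } }
public class Document { public string? id { get; set; } public List<Entity>? entities { get; set; } }
public class Entity
{
    public int offset { get; set; }
    public int length { get; set; }
    public string? text { get; set; }
    public string? category { get; set; }
    public double confidenceScore { get; set; }
    public string? name { get; set; }
    public List<Link>? links { get; set; }
}
public class Link { public string? dataSource { get; set; } public string? id { get; set; } }
```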
public static class StringExtensions
{
public static string CleanupAllWhiteSpace(this string input) => Regex.Replace(input ?? string.Empty, @"\s+", " ");
}
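A quick usage sketch of the extension method, with a made-up input string, shows that every run of whitespace (spaces, tabs and line breaks) collapses into a single space:

```csharp
using System;
using System.Text.RegularExpressions;

string note = "Patient has fever.\r\n\r\nNo   cough.\tAge 54.";
Console.WriteLine(note.CleanupAllWhiteSpace());
// prints "Patient has fever. No cough. Age 54."

public static class StringExtensions
{
    // Replaces each run of whitespace with one space; null input becomes an empty string.
    public static string CleanupAllWhiteSpace(this string input) =>
        Regex.Replace(input ?? string.Empty, @"\s+", " ");
}
```

Since the offsets returned by the service refer to the text that was sent, the same cleaned string should be used both for the request and for the highlighting.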
Let's look at the UI in the Index.razor file below.
Index.razor
@page "/"
@using HealthTextAnalytics.Models;
@inject IHttpClientFactory _httpClientFactory;
@inject IHealthAnalyticsTextClientService _healthAnalyticsTextClientService;
<h3>Azure HealthCare Text Analysis - Azure Cognitive Services</h3>
<EditForm Model="@Model" OnValidSubmit="@Submit">
<DataAnnotationsValidator />
<ValidationSummary />
<InputWatcher @ref="inputWatcher" FieldChanged="@FieldChanged" />
<div class="form-group row">
<label><strong>Text input</strong></label>
<InputTextArea @onkeyup="@removeWhitespace" class="overflow-scroll" style="max-height:500px;max-width:900px;font-size: 10pt;font-family:Verdana, Geneva, Tahoma, sans-serif" @bind-Value="@Model.InputText" rows="5" />
</div>
<div class="form-group row">
<div class="col">
<br />
<button class="btn btn-outline-primary" disabled="@isInvalid" type="submit">Run</button>
</div>
<div class="col">
</div>
<div class="col">
</div>
</div>
<br />
@if (isProcessing)
{
<div class="progress" style="max-width: 90%">
<div class="progress-bar progress-bar-striped progress-bar-animated"
style="width: 100%; background-color: green">
Retrieving result from Azure HealthCare Text Analysis. Processing..
</div>
</div>
<br />
}
<div class="form-group row">
<label><strong>Analysis result</strong></label>
@if (isSearchPerformed)
{
<br />
<b>Execution time took: @Model.ExecutionTime ms (milliseconds)</b><br />
<br />
<b>Categorized and analyzed Health Analysis of inputted text</b>
@ms
<br />
<table class="table table-striped table-dark table-hover">
<thead>
<tr>
<th>Category</th>
<th>Text</th>
<th>Name</th>
<th>ConfidenceScore</th>
<th>Offset</th>
<th>Length</th>
<th>Links</th>
</tr>
</thead>
<tbody>
@foreach (var entity in Model.EntititesInAnalyzedResult)
{
<tr>
<td>@entity.category</td>
<td>@entity.text</td>
<td>@entity.name</td>
<td>@entity.confidenceScore</td>
<td>@entity.offset</td>
<td>@entity.length</td>
<td>@string.Join(Environment.NewLine, (@entity.links ?? new List<Link>()).Select(l => l?.dataSource + " " + l?.id + " | "))</td>
</tr>
}
</tbody>
</table>
<b>Health Analysis raw text from Azure service</b>
<InputTextArea class="overflow-scroll" readonly="readonly" style="max-height:500px; max-width:900px;font-size: 10pt;font-family:Verdana, Geneva, Tahoma, sans-serif" @bind-Value="@Model.AnalysisResult" rows="1000" />
}
</div>
</EditForm>
The code-behind of Index.razor looks like this.
using HealthTextAnalytics.Models;
using HealthTextAnalytics.Util;
using Microsoft.AspNetCore.Components;
using Microsoft.AspNetCore.Components.Web;
namespace HealthTextAnalytics.Pages
{
public partial class Index
{
private IndexModel Model = new();
MarkupString ms = new();
private bool isProcessing = false;
private bool isSearchPerformed = false;
private InputWatcher inputWatcher = new InputWatcher();
private bool isInvalid = false;
private void FieldChanged(string fieldName)
{
isInvalid = !inputWatcher.Validate();
}
protected override void OnParametersSet()
{
Model.InputText = SampleData.Sampledata.SamplePatientTextNote2.CleanupAllWhiteSpace();
StateHasChanged();
}
private void removeWhitespace(KeyboardEventArgs eventArgs)
{
Model.InputText = Model.InputText.CleanupAllWhiteSpace();
StateHasChanged();
}
private async Task Submit()
{
try
{
ResetFieldsForBeforeSearch();
HealthTextAnalyticsResponse response = await _healthAnalyticsTextClientService.GetHealthTextAnalytics(Model.InputText);
Model.EntititesInAnalyzedResult = response.Entities;
Model.ExecutionTime = response.ExecutionTimeInMilliseconds;
Model.AnalysisResult = response.AnalysisResultRawJson;
ms = new MarkupString(response.CategorizedInputText);
}
catch (Exception err)
{
Console.WriteLine(err);
}
finally
{
ResetFieldsAfterSearch();
StateHasChanged();
}
}
private void ResetFieldsForBeforeSearch()
{
isProcessing = true;
isSearchPerformed = false;
ms = new MarkupString(string.Empty);
Model.EntititesInAnalyzedResult.Clear();
Model.AnalysisResult = string.Empty;
}
private void ResetFieldsAfterSearch()
{
isProcessing = false;
isSearchPerformed = true;
}
}
}