This article presents a way to extract tags for an image and output them to the console. Azure AI is used, more specifically the ImageAnalysisClient.
The article shows how you can expose the data as an IAsyncEnumerable, so you can use await foreach to consume it. I would recommend this approach
for many services in Azure AI (and similar) where there is no out-of-the-box support for async enumerables: hide away the details in a helper extension method, as shown in this article.
public static async Task ExtractImageTags()
{
string visionApiKey = Environment.GetEnvironmentVariable("VISION_KEY")!;
string visionApiEndpoint = Environment.GetEnvironmentVariable("VISION_ENDPOINT")!;
var credentials = new AzureKeyCredential(visionApiKey);
var serviceUri = new Uri(visionApiEndpoint);
var imageAnalysisClient = new ImageAnalysisClient(serviceUri, credentials);
await foreach (var tag in imageAnalysisClient.ExtractImageTagsAsync("Images/Store.png"))
{
Console.WriteLine(tag);
}
}
The code creates an ImageAnalysisClient, defined in the Azure.AI.Vision.ImageAnalysis NuGet package. I use two environment variables here to store the key and the endpoint.
Note that not all Azure AI features are available in all regions. If you just want to test out some Azure AI features, you can start out with the East US region, as that region
most likely has all the features you want to test; you can then switch to a more local region if you are planning to run more workloads with Azure AI.
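If you want the method to fail fast when these environment variables are missing, a small guard can help. This is just a minimal sketch using the same variable names as above:

// Fail fast with a clear message if the configuration is missing
string visionApiKey = Environment.GetEnvironmentVariable("VISION_KEY")
    ?? throw new InvalidOperationException("The VISION_KEY environment variable is not set.");
string visionApiEndpoint = Environment.GetEnvironmentVariable("VISION_ENDPOINT")
    ?? throw new InvalidOperationException("The VISION_ENDPOINT environment variable is not set.");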
We then use the await foreach pattern to extract the image tags asynchronously. This is a custom extension method I created so I can output the tags asynchronously using await foreach and
also specify a wait time between each new tag being output, defaulting to 200 milliseconds here.
The extension method looks like this:
using Azure.AI.Vision.ImageAnalysis;
namespace UseAzureAIServicesFromNET.Vision;
public static class ImageAnalysisClientExtensions
{
/// <summary>
/// Extracts the tags for the image at the specified path, if it exists.
/// The results are returned as an async enumerable of strings.
/// </summary>
/// <param name="client"></param>
/// <param name="imagePath"></param>
/// <param name="waitTimeInMsBetweenOutputTags">Wait time in ms between each output. Defaults to 200 ms.</param>
/// <returns></returns>
public static async IAsyncEnumerable<string?> ExtractImageTagsAsync(this ImageAnalysisClient client,
string imagePath, int waitTimeInMsBetweenOutputTags = 200)
{
if (!File.Exists(imagePath))
{
yield return default(string); //just return null if a file is not found at provided path
yield break; //stop iterating, there is nothing to analyze
}
using FileStream imageStream = new FileStream(imagePath, FileMode.Open);
var analysisResult =
await client.AnalyzeAsync(BinaryData.FromStream(imageStream), VisualFeatures.Tags | VisualFeatures.Caption);
yield return $"Description: {analysisResult.Value.Caption.Text}";
foreach (var tag in analysisResult.Value.Tags.Values)
{
yield return $"Tag: {tag.Name}, Confidence: {tag.Confidence:F2}";
await Task.Delay(waitTimeInMsBetweenOutputTags);
}
}
}
The console output of the tags looks like this:
In addition to tags, we can also output the objects detected in the image, using a very similar extension method:
/// <summary>
/// Extracts the objects for the image at the specified path, if it exists.
/// The results are returned as an async enumerable of strings.
/// </summary>
/// <param name="client"></param>
/// <param name="imagePath"></param>
/// <param name="waitTimeInMsBetweenOutputTags">Wait time in ms between each output. Defaults to 200 ms.</param>
/// <returns></returns>
public static async IAsyncEnumerable<string?> ExtractImageObjectsAsync(this ImageAnalysisClient client,
string imagePath, int waitTimeInMsBetweenOutputTags = 200)
{
if (!File.Exists(imagePath))
{
yield return default(string); //just return null if a file is not found at provided path
yield break; //stop iterating, there is nothing to analyze
}
using FileStream imageStream = new FileStream(imagePath, FileMode.Open);
var analysisResult =
await client.AnalyzeAsync(BinaryData.FromStream(imageStream), VisualFeatures.Objects | VisualFeatures.Caption);
yield return $"Description: {analysisResult.Value.Caption.Text}";
foreach (var objectInImage in analysisResult.Value.Objects.Values)
{
yield return $"""
Object tag: {objectInImage.Tags.FirstOrDefault()?.Name} Confidence: {objectInImage.Tags.FirstOrDefault()?.Confidence},
Position (bbox): {objectInImage.BoundingBox}
""";
await Task.Delay(waitTimeInMsBetweenOutputTags);
}
}
The code is nearly identical: we set the VisualFeatures to extract from the image, and we read out the objects (not the tags).
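Consuming the objects variant looks just like the tags example at the top of the article; a minimal sketch (the client instance and the image path are the same assumptions as before):

// Output detected objects for an image, one line at a time
await foreach (var objectInfo in imageAnalysisClient.ExtractImageObjectsAsync("Images/Store.png"))
{
    Console.WriteLine(objectInfo);
}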
The console output of the objects looks like this:
I have added a demo .NET MAUI Blazor app that uses Image Analysis in Computer Vision in Azure Cognitive Services.
Note that Image Analysis is not available in all Azure data centers. For example, Norway East does not have this feature.
However, the North Europe Azure data center, located in Ireland, does have the feature.
A Github repo exists for this demo here:
A screen shot for this demo is shown below:
The demo allows you to upload a picture (supported formats in the demo are .jpeg, .jpg and .png, but the Azure AI Image Analyzer supports a lot of other image formats too).
The demo shows a preview of the selected image and, to the right, an image with bounding boxes for the objects found in it. A list of tags extracted from the image is also shown. Raw data from the Azure Image Analyzer
service is shown in the text box area below the pictures, with the list of tags to the right.
The demo is written with .NET MAUI Blazor and .NET 6.
Let us look at some code for making this demo.
ImageSaveService.cs
using Image.Analyze.Azure.Ai.Models;
using Microsoft.AspNetCore.Components.Forms;
namespace Ocr.Handwriting.Azure.AI.Services
{
public class ImageSaveService : IImageSaveService
{
public async Task<ImageSaveModel> SaveImage(IBrowserFile browserFile)
{
var buffers = new byte[browserFile.Size];
var bytes = await browserFile.OpenReadStream(maxAllowedSize: 30 * 1024 * 1024).ReadAsync(buffers);
string imageType = browserFile.ContentType;
var basePath = FileSystem.Current.AppDataDirectory;
var imageSaveModel = new ImageSaveModel
{
SavedFilePath = Path.Combine(basePath, $"{Guid.NewGuid().ToString("N")}-{browserFile.Name}"),
PreviewImageUrl = $"data:{imageType};base64,{Convert.ToBase64String(buffers)}",
FilePath = browserFile.Name,
FileSize = bytes / 1024,
};
await File.WriteAllBytesAsync(imageSaveModel.SavedFilePath, buffers);
return imageSaveModel;
}
}
}
//Interface defined inside IImageService.cs shown below
using Image.Analyze.Azure.Ai.Models;
using Microsoft.AspNetCore.Components.Forms;
namespace Ocr.Handwriting.Azure.AI.Services
{
public interface IImageSaveService
{
Task<ImageSaveModel> SaveImage(IBrowserFile browserFile);
}
}
The ImageSaveService reads the bytes of the uploaded IBrowserFile via its OpenReadStream method and converts them into a base-64 string.
This allows us to preview the uploaded image. The code also saves the image to the AppDataDirectory that MAUI supports - FileSystem.Current.AppDataDirectory.
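The ImageSaveModel itself is not listed in the article; a minimal sketch of what it could look like, based on the properties the service sets above (the property types are assumptions):

// Sketch of the model returned by ImageSaveService.SaveImage
public class ImageSaveModel
{
    public string? SavedFilePath { get; set; }   // full path to the copy stored in AppDataDirectory
    public string? PreviewImageUrl { get; set; } // data URL with the base-64 encoded image for previewing
    public string? FilePath { get; set; }        // original file name of the uploaded file
    public long FileSize { get; set; }           // size in kB
}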
Let's look at how to call the analysis service itself; it is actually quite straightforward.
ImageAnalyzerService.cs
using Azure;
using Azure.AI.Vision.Common;
using Azure.AI.Vision.ImageAnalysis;
namespace Image.Analyze.Azure.Ai.Lib
{
public class ImageAnalyzerService : IImageAnalyzerService
{
public ImageAnalyzer CreateImageAnalyzer(string imageFile)
{
string key = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_VISION_SECONDARY_KEY");
string endpoint = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_VISION_SECONDARY_ENDPOINT");
var visionServiceOptions = new VisionServiceOptions(new Uri(endpoint), new AzureKeyCredential(key));
using VisionSource visionSource = CreateVisionSource(imageFile);
var analysisOptions = CreateImageAnalysisOptions();
var analyzer = new ImageAnalyzer(visionServiceOptions, visionSource, analysisOptions);
return analyzer;
}
private static VisionSource CreateVisionSource(string imageFile)
{
using var stream = File.OpenRead(imageFile);
byte[] imageBuffer;
using (var memoryStream = new MemoryStream())
{
stream.CopyTo(memoryStream);
imageBuffer = memoryStream.ToArray();
}
using var imageSourceBuffer = new ImageSourceBuffer();
imageSourceBuffer.GetWriter().Write(imageBuffer);
return VisionSource.FromImageSourceBuffer(imageSourceBuffer);
}
private static ImageAnalysisOptions CreateImageAnalysisOptions() => new ImageAnalysisOptions
{
Language = "en",
GenderNeutralCaption = false,
Features =
ImageAnalysisFeature.CropSuggestions
| ImageAnalysisFeature.Caption
| ImageAnalysisFeature.DenseCaptions
| ImageAnalysisFeature.Objects
| ImageAnalysisFeature.People
| ImageAnalysisFeature.Text
| ImageAnalysisFeature.Tags
};
}
}
//interface shown below
public interface IImageAnalyzerService
{
ImageAnalyzer CreateImageAnalyzer(string imageFile);
}
We retrieve the environment variables here and create an ImageAnalyzer. We create a VisionSource from the saved picture we uploaded, opening a stream to it with the File.OpenRead method in System.IO.
Since we saved the file in the AppData folder of the .NET MAUI app, we can read this file.
We set up the image analysis options and the vision service options, and then return the image analyzer.
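The analyzer service can then be registered for dependency injection together with the other services, so it can be injected into the Index page. The registration below is a sketch of one way to do it and is not shown in this article:

// In MauiProgram.cs (assumed registration)
builder.Services.AddScoped<IImageAnalyzerService, ImageAnalyzerService>();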
Let's look at the code-behind of the index.razor file that initializes the Image analyzer, and runs the Analyze method of it.
Index.razor.cs
using Azure.AI.Vision.ImageAnalysis;
using Image.Analyze.Azure.Ai.Extensions;
using Image.Analyze.Azure.Ai.Models;
using Microsoft.AspNetCore.Components.Forms;
using Microsoft.JSInterop;
using System.Text;
namespace Image.Analyze.Azure.Ai.Pages
{
partial class Index
{
private IndexModel Model = new();
//https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/how-to/call-analyze-image-40?WT.mc_id=twitter&pivots=programming-language-csharp
private string ImageInfo = string.Empty;
private async Task Submit()
{
if (Model.PreviewImageUrl == null || Model.SavedFilePath == null)
{
await Application.Current.MainPage.DisplayAlert($"MAUI Blazor Image Analyzer App", $"You must select an image first before running Image Analysis. Supported formats are .jpeg, .jpg and .png", "Ok", "Cancel");
return;
}
using var imageAnalyzer = ImageAnalyzerService.CreateImageAnalyzer(Model.SavedFilePath);
ImageAnalysisResult analysisResult = await imageAnalyzer.AnalyzeAsync();
if (analysisResult.Reason == ImageAnalysisResultReason.Analyzed)
{
Model.ImageAnalysisOutputText = analysisResult.OutputImageAnalysisResult();
Model.Caption = $"{analysisResult.Caption.Content} Confidence: {analysisResult.Caption.Confidence.ToString("F2")}";
Model.Tags = analysisResult.Tags.Select(t => $"{t.Name} (Confidence: {t.Confidence.ToString("F2")})").ToList();
var jsonBboxes = analysisResult.GetBoundingBoxesJson();
await JsRunTime.InvokeVoidAsync("LoadBoundingBoxes", jsonBboxes);
}
else
{
ImageInfo = $"The image analysis did not perform its analysis. Reason: {analysisResult.Reason}";
}
StateHasChanged(); //visual refresh here
}
private async Task CopyTextToClipboard()
{
await Clipboard.SetTextAsync(Model.ImageAnalysisOutputText);
await Application.Current.MainPage.DisplayAlert($"MAUI Blazor Image Analyzer App", $"The copied text was put into the clipboard. Character length: {Model.ImageAnalysisOutputText?.Length}", "Ok", "Cancel");
}
private async Task OnInputFile(InputFileChangeEventArgs args)
{
var imageSaveModel = await ImageSaveService.SaveImage(args.File);
Model = new IndexModel(imageSaveModel);
await Application.Current.MainPage.DisplayAlert($"MAUI Blazor ImageAnalyzer app App", $"Wrote file to location : {Model.SavedFilePath} Size is: {Model.FileSize} kB", "Ok", "Cancel");
}
}
}
In the code-behind above we have a submit handler called Submit. There we analyze the image and send the result both to the UI and to a client-side JavaScript method, using IJSRuntime in .NET MAUI Blazor.
Let's look at the two helper methods of ImageAnalysisResult next.
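The IndexModel bound to the page is not listed here either; a sketch of what it could look like, based on how it is used in the code-behind above (the constructor mapping and types are assumptions):

// Sketch of the page model used by Index.razor.cs
public class IndexModel
{
    public IndexModel() { }

    public IndexModel(ImageSaveModel imageSaveModel)
    {
        PreviewImageUrl = imageSaveModel.PreviewImageUrl;
        SavedFilePath = imageSaveModel.SavedFilePath;
        FileSize = imageSaveModel.FileSize;
    }

    public string? PreviewImageUrl { get; set; }
    public string? SavedFilePath { get; set; }
    public long FileSize { get; set; }
    public string? Caption { get; set; }
    public List<string> Tags { get; set; } = new();
    public string? ImageAnalysisOutputText { get; set; }
}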
ImageAnalysisResultExtensions.cs
using Azure.AI.Vision.ImageAnalysis;
using System.Text;
namespace Image.Analyze.Azure.Ai.Extensions
{
public static class ImageAnalysisResultExtensions
{
public static string GetBoundingBoxesJson(this ImageAnalysisResult result)
{
var sb = new StringBuilder();
sb.AppendLine(@"[");
int objectIndex = 0;
foreach (var detectedObject in result.Objects)
{
sb.Append($"{{ \"Name\": \"{detectedObject.Name}\", \"Y\": {detectedObject.BoundingBox.Y}, \"X\": {detectedObject.BoundingBox.X}, \"Height\": {detectedObject.BoundingBox.Height}, \"Width\": {detectedObject.BoundingBox.Width}, \"Confidence\": \"{detectedObject.Confidence:0.0000}\" }}");
objectIndex++;
if (objectIndex < result.Objects?.Count)
{
sb.Append($",{Environment.NewLine}");
}
else
{
sb.Append($"{Environment.NewLine}");
}
}
sb.Remove(sb.Length - 2, 1); //remove trailing comma at the end
sb.AppendLine(@"]");
return sb.ToString();
}
public static string OutputImageAnalysisResult(this ImageAnalysisResult result)
{
var sb = new StringBuilder();
if (result.Reason == ImageAnalysisResultReason.Analyzed)
{
sb.AppendLine($" Image height = {result.ImageHeight}");
sb.AppendLine($" Image width = {result.ImageWidth}");
sb.AppendLine($" Model version = {result.ModelVersion}");
if (result.Caption != null)
{
sb.AppendLine(" Caption:");
sb.AppendLine($" \"{result.Caption.Content}\", Confidence {result.Caption.Confidence:0.0000}");
}
if (result.DenseCaptions != null)
{
sb.AppendLine(" Dense Captions:");
foreach (var caption in result.DenseCaptions)
{
sb.AppendLine($" \"{caption.Content}\", Bounding box {caption.BoundingBox}, Confidence {caption.Confidence:0.0000}");
}
}
if (result.Objects != null)
{
sb.AppendLine(" Objects:");
foreach (var detectedObject in result.Objects)
{
sb.AppendLine($" \"{detectedObject.Name}\", Bounding box {detectedObject.BoundingBox}, Confidence {detectedObject.Confidence:0.0000}");
}
}
if (result.Tags != null)
{
sb.AppendLine($" Tags:");
foreach (var tag in result.Tags)
{
sb.AppendLine($" \"{tag.Name}\", Confidence {tag.Confidence:0.0000}");
}
}
if (result.People != null)
{
sb.AppendLine($" People:");
foreach (var person in result.People)
{
sb.AppendLine($" Bounding box {person.BoundingBox}, Confidence {person.Confidence:0.0000}");
}
}
if (result.CropSuggestions != null)
{
sb.AppendLine($" Crop Suggestions:");
foreach (var cropSuggestion in result.CropSuggestions)
{
sb.AppendLine($" Aspect ratio {cropSuggestion.AspectRatio}: "
+ $"Crop suggestion {cropSuggestion.BoundingBox}");
};
}
if (result.Text != null)
{
sb.AppendLine($" Text:");
foreach (var line in result.Text.Lines)
{
string pointsToString = "{" + string.Join(',', line.BoundingPolygon.Select(point => point.ToString())) + "}";
sb.AppendLine($" Line: '{line.Content}', Bounding polygon {pointsToString}");
foreach (var word in line.Words)
{
pointsToString = "{" + string.Join(',', word.BoundingPolygon.Select(point => point.ToString())) + "}";
sb.AppendLine($" Word: '{word.Content}', Bounding polygon {pointsToString}, Confidence {word.Confidence:0.0000}");
}
}
}
var resultDetails = ImageAnalysisResultDetails.FromResult(result);
sb.AppendLine($" Result details:");
sb.AppendLine($" Image ID = {resultDetails.ImageId}");
sb.AppendLine($" Result ID = {resultDetails.ResultId}");
sb.AppendLine($" Connection URL = {resultDetails.ConnectionUrl}");
sb.AppendLine($" JSON result = {resultDetails.JsonResult}");
}
else
{
var errorDetails = ImageAnalysisErrorDetails.FromResult(result);
sb.AppendLine(" Analysis failed.");
sb.AppendLine($" Error reason : {errorDetails.Reason}");
sb.AppendLine($" Error code : {errorDetails.ErrorCode}");
sb.AppendLine($" Error message: {errorDetails.Message}");
}
return sb.ToString();
}
}
}
Finally, let's look at the client-side JavaScript function that we call with the bounding boxes JSON to draw the boxes. We use the HTML5 canvas to show the picture and the bounding boxes of the objects found in the image.
index.html
<script type="text/javascript">
var colorPalette = ["red", "yellow", "blue", "green", "fuchsia", "moccasin", "purple", "magenta", "aliceblue", "lightyellow", "lightgreen"];
function rescaleCanvas() {
var img = document.getElementById('PreviewImage');
var canvas = document.getElementById('PreviewImageBbox');
canvas.width = img.width;
canvas.height = img.height;
}
function getColor() {
var colorIndex = parseInt(Math.random() * 10);
var color = colorPalette[colorIndex];
return color;
}
function LoadBoundingBoxes(objectDescriptions) {
if (objectDescriptions == null || objectDescriptions == false) {
alert('did not find any objects in image. returning from calling load bounding boxes : ' + objectDescriptions);
return;
}
var objectDesc = JSON.parse(objectDescriptions);
//alert('calling load bounding boxes, starting analysis on clientside js : ' + objectDescriptions);
rescaleCanvas();
var canvas = document.getElementById('PreviewImageBbox');
var img = document.getElementById('PreviewImage');
var ctx = canvas.getContext('2d');
ctx.drawImage(img, img.width, img.height);
ctx.font = "10px Verdana";
for (var i = 0; i < objectDesc.length; i++) {
ctx.beginPath();
ctx.strokeStyle = "black";
ctx.lineWidth = 1;
ctx.fillText(objectDesc[i].Name, objectDesc[i].X + objectDesc[i].Width / 2, objectDesc[i].Y + objectDesc[i].Height / 2);
ctx.fillText("Confidence: " + objectDesc[i].Confidence, objectDesc[i].X + objectDesc[i].Width / 2, 10 + objectDesc[i].Y + objectDesc[i].Height / 2);
}
for (var i = 0; i < objectDesc.length; i++) {
ctx.fillStyle = getColor();
ctx.globalAlpha = 0.2;
ctx.fillRect(objectDesc[i].X, objectDesc[i].Y, objectDesc[i].Width, objectDesc[i].Height);
ctx.lineWidth = 3;
ctx.strokeStyle = "blue";
ctx.rect(objectDesc[i].X, objectDesc[i].Y, objectDesc[i].Width, objectDesc[i].Height);
ctx.fillStyle = "black";
ctx.fillText("Color: " + getColor(), objectDesc[i].X + objectDesc[i].Width / 2, 20 + objectDesc[i].Y + objectDesc[i].Height / 2);
ctx.stroke();
}
ctx.drawImage(img, 0, 0);
console.log('got these object descriptions:');
console.log(objectDescriptions);
}
</script>
The index.html file in wwwroot is where we usually put extra CSS and JS for MAUI Blazor apps and Blazor apps. I have chosen to put the script directly into the index.html file and not in a separate .js file, but moving it out is an option if you want to tidy things up a bit more.
So there you have it: we can relatively easily find objects in images using the Azure Image Analysis service in Azure Cognitive Services. We can get tags and captions for the image. In the demo, the caption is shown above the loaded picture.
The Azure Computer Vision service is really good, since it has been trained on a massive data set and can recognize a lot of different objects for different usages.
As you see in the source code, I keep the key and endpoint in environment variables that the code expects to exist. Never expose keys and endpoints in your source code.
This article shows how you can use Azure Computer Vision in Azure Cognitive Services to perform Optical Character Recognition (OCR).
The Computer Vision feature is available by adding a Computer Vision resource in the Azure Portal.
I have made a .NET MAUI Blazor app and the Github Repo for it is available here :
https://github.com/toreaurstadboss/Ocr.Handwriting.Azure.AI.Models
Let us first look at the .csproj of the Lib project in this repo.
The following class generates ComputerVision clients that can be used to extract different information from streams and files containing video and images. We are going to focus on
images and extracting text via OCR. Azure Computer Vision can extract handwritten text in addition to regular typewritten text or text inside images and similar. Azure Computer Vision can also
detect shapes in images and classify objects. This demo only focuses on text extraction from images.
ComputerVisionClientFactory
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
namespace Ocr.Handwriting.Azure.AI.Lib
{
public interface IComputerVisionClientFactory
{
ComputerVisionClient CreateClient();
}
/// <summary>
/// Client factory for Azure Cognitive Services - Computer Vision.
/// </summary>
public class ComputerVisionClientFactory : IComputerVisionClientFactory
{
// Add your Computer Vision key and endpoint
static string? _key = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_VISION_KEY");
static string? _endpoint = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_VISION_ENDPOINT");
public ComputerVisionClientFactory() : this(_key, _endpoint)
{
}
public ComputerVisionClientFactory(string? key, string? endpoint)
{
_key = key;
_endpoint = endpoint;
}
public ComputerVisionClient CreateClient()
{
if (_key == null)
{
throw new ArgumentNullException(nameof(_key), "The AZURE_COGNITIVE_SERVICES_VISION_KEY is not set. Set a system-level environment variable or provide this value by calling the overloaded constructor of this class.");
}
if (_endpoint == null)
{
throw new ArgumentNullException(nameof(_endpoint), "The AZURE_COGNITIVE_SERVICES_VISION_ENDPOINT is not set. Set a system-level environment variable or provide this value by calling the overloaded constructor of this class.");
}
var client = Authenticate(_key!, _endpoint!);
return client;
}
public static ComputerVisionClient Authenticate(string key, string endpoint) =>
new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
{
Endpoint = endpoint
};
}
}
The setup of the endpoint and key of the Computer Vision resource is done via system-level environment variables.
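The overloaded constructor also makes it possible to supply the key and endpoint from another configuration source. A minimal usage sketch of the factory (here the values are still read from the same environment variables):

// Create a ComputerVisionClient via the factory, passing configuration explicitly
IComputerVisionClientFactory factory = new ComputerVisionClientFactory(
    Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_VISION_KEY"),
    Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_VISION_ENDPOINT"));
ComputerVisionClient client = factory.CreateClient();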
Next up, let's look at retrieving OCR text from images. Here we use the ComputerVisionClient. We open a stream to a file (an image) using File.OpenRead and then pass it to the
ReadInStreamAsync method of the Computer Vision client. The image we load in the app is selected by the user, and the image is previewed and saved using the MAUI Storage lib (inside the AppData folder).
OcrImageService.cs
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
using Microsoft.Extensions.Logging;
using System.Diagnostics;
using ReadResult = Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models.ReadResult;
namespace Ocr.Handwriting.Azure.AI.Lib
{
public interface IOcrImageService
{
Task<IList<ReadResult?>?> GetReadResults(string imageFilePath);
Task<string> GetReadResultsText(string imageFilePath);
}
public class OcrImageService : IOcrImageService
{
private readonly IComputerVisionClientFactory _computerVisionClientFactory;
private readonly ILogger<OcrImageService> _logger;
public OcrImageService(IComputerVisionClientFactory computerVisionClientFactory, ILogger<OcrImageService> logger)
{
_computerVisionClientFactory = computerVisionClientFactory;
_logger = logger;
}
private ComputerVisionClient CreateClient() => _computerVisionClientFactory.CreateClient();
public async Task<string> GetReadResultsText(string imageFilePath)
{
var readResults = await GetReadResults(imageFilePath);
var ocrText = ExtractText(readResults?.FirstOrDefault());
return ocrText;
}
public async Task<IList<ReadResult?>?> GetReadResults(string imageFilePath)
{
if (string.IsNullOrWhiteSpace(imageFilePath))
{
return null;
}
try
{
var client = CreateClient();
//Retrieve OCR results
using (FileStream stream = File.OpenRead(imageFilePath))
{
var textHeaders = await client.ReadInStreamAsync(stream);
string operationLocation = textHeaders.OperationLocation;
string operationId = operationLocation[^36..]; //the 'hat' (index-from-end) operator of C# 8.0: this slices out the last 36 chars, which contain the operation guid (32 hexadecimal chars + four hyphens)
ReadOperationResult results;
do
{
results = await client.GetReadResultAsync(Guid.Parse(operationId));
_logger.LogInformation($"Retrieving OCR results for operationId {operationId} for image {imageFilePath}");
await Task.Delay(500); //wait a little between each poll so we do not hammer the service
}
while (results.Status == OperationStatusCodes.Running || results.Status == OperationStatusCodes.NotStarted);
IList<ReadResult?> result = results.AnalyzeResult.ReadResults;
return result;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
return null;
}
}
private static string ExtractText(ReadResult? readResult) => string.Join(Environment.NewLine, readResult?.Lines?.Select(l => l.Text) ?? new List<string>());
}
}
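Outside the app, the service can also be wired up manually; a minimal console-style sketch (the logger setup and the image path are just examples, and assume the console logging package is referenced):

// Wire up the factory and the OCR service by hand and run OCR on a local image
using Microsoft.Extensions.Logging;

using var loggerFactory = LoggerFactory.Create(logging => logging.AddConsole());
var ocrService = new OcrImageService(
    new ComputerVisionClientFactory(),
    loggerFactory.CreateLogger<OcrImageService>());

string text = await ocrService.GetReadResultsText(@"Images/sample-note.png");
Console.WriteLine(text);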
Let's look at the MAUI Blazor project in the app.
The MauiProgram.cs looks like this.
MauiProgram.cs
using Ocr.Handwriting.Azure.AI.Data;
using Ocr.Handwriting.Azure.AI.Lib;
using Ocr.Handwriting.Azure.AI.Services;
using TextCopy;
namespace Ocr.Handwriting.Azure.AI;
public static class MauiProgram
{
public static MauiApp CreateMauiApp()
{
var builder = MauiApp.CreateBuilder();
builder
.UseMauiApp<App>()
.ConfigureFonts(fonts =>
{
fonts.AddFont("OpenSans-Regular.ttf", "OpenSansRegular");
});
builder.Services.AddMauiBlazorWebView();
#if DEBUG
builder.Services.AddBlazorWebViewDeveloperTools();
builder.Services.AddLogging();
#endif
builder.Services.AddSingleton<WeatherForecastService>();
builder.Services.AddScoped<IComputerVisionClientFactory, ComputerVisionClientFactory>();
builder.Services.AddScoped<IOcrImageService, OcrImageService>();
builder.Services.AddScoped<IImageSaveService, ImageSaveService>();
builder.Services.InjectClipboard();
return builder.Build();
}
}
We also need some code to preview and save the image an end user chooses. The ImageSaveService looks like this.
ImageSaveService
using Microsoft.AspNetCore.Components.Forms;
using Ocr.Handwriting.Azure.AI.Models;
namespace Ocr.Handwriting.Azure.AI.Services
{
public class ImageSaveService : IImageSaveService
{
public async Task<ImageSaveModel> SaveImage(IBrowserFile browserFile)
{
var buffers = new byte[browserFile.Size];
var bytes = await browserFile.OpenReadStream(maxAllowedSize: 30 * 1024 * 1024).ReadAsync(buffers);
string imageType = browserFile.ContentType;
var basePath = FileSystem.Current.AppDataDirectory;
var imageSaveModel = new ImageSaveModel
{
SavedFilePath = Path.Combine(basePath, $"{Guid.NewGuid().ToString("N")}-{browserFile.Name}"),
PreviewImageUrl = $"data:{imageType};base64,{Convert.ToBase64String(buffers)}",
FilePath = browserFile.Name,
FileSize = bytes / 1024,
};
await File.WriteAllBytesAsync(imageSaveModel.SavedFilePath, buffers);
return imageSaveModel;
}
}
}
Note the use of the maxAllowedSize parameter of the IBrowserFile.OpenReadStream method; setting it is good practice, since IBrowserFile only supports 512 kB by default. I set it to 30 MB in the app to support some high-resolution images too.
We preview the image as base-64 and also save the image. Note the use of FileSystem.Current.AppDataDirectory as the base path here; this comes from the NuGet package Microsoft.Maui.Storage.
These are the packages that are used for the MAUI Blazor project of the app.
Ocr.Handwriting.Azure.AI.csproj
Index.razor
@page "/"
@using Ocr.Handwriting.Azure.AI.Models;
@using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
@using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
@using Ocr.Handwriting.Azure.AI.Lib;
@using Ocr.Handwriting.Azure.AI.Services;
@using TextCopy;
@inject IImageSaveService ImageSaveService
@inject IOcrImageService OcrImageService
@inject IClipboard Clipboard
<h1>Azure AI OCR Text recognition</h1>
<EditForm Model="Model" OnValidSubmit="@Submit" style="background-color:aliceblue">
<DataAnnotationsValidator />
<label><b>Select a picture to run OCR</b></label><br />
<InputFile OnChange="@OnInputFile" accept=".jpeg,.jpg,.png" />
<br />
<code class="alert-secondary">Supported file formats: .jpeg, .jpg and .png</code>
<br />
@if (Model.PreviewImageUrl != null) {
<label class="alert-info">Preview of the selected image</label>
<div style="overflow:auto;max-height:300px;max-width:500px">
<img class="flagIcon" src="@Model.PreviewImageUrl" /><br />
</div>
<code class="alert-light">File Size (kB): @Model.FileSize</code>
<br />
<code class="alert-light">File saved location: @Model.SavedFilePath</code>
<br />
<label class="alert-info">Click the button below to start running OCR using Azure AI</label><br />
<br />
<button type="submit">Submit</button> <button style="margin-left:200px" type="button" class="btn-outline-info" @onclick="@CopyTextToClipboard">Copy to clipboard</button>
<br />
<br />
<InputTextArea style="width:1000px;height:300px" readonly="readonly" placeholder="Detected text in the image uploaded" @bind-Value="Model!.OcrOutputText" rows="5"></InputTextArea>
}
</EditForm>
@code {
private IndexModel Model = new();
private async Task OnInputFile(InputFileChangeEventArgs args)
{
var imageSaveModel = await ImageSaveService.SaveImage(args.File);
Model = new IndexModel(imageSaveModel);
await Application.Current.MainPage.DisplayAlert($"MAUI Blazor OCR App", $"Wrote file to location : {Model.SavedFilePath} Size is: {Model.FileSize} kB", "Ok", "Cancel");
}
public async Task CopyTextToClipboard()
{
await Clipboard.SetTextAsync(Model.OcrOutputText);
await Application.Current.MainPage.DisplayAlert($"MAUI Blazor OCR App", $"The copied text was put into the clipboard. Character length: {Model.OcrOutputText?.Length}", "Ok", "Cancel");
}
private async Task Submit()
{
if (Model.PreviewImageUrl == null || Model.SavedFilePath == null)
{
await Application.Current.MainPage.DisplayAlert($"MAUI Blazor OCR App", $"You must select an image first before running OCR. Supported formats are .jpeg, .jpg and .png", "Ok", "Cancel");
return;
}
Model.OcrOutputText = await OcrImageService.GetReadResultsText(Model.SavedFilePath);
StateHasChanged(); //visual refresh here
}
}
The UI works like this: the user selects an image. As the 'accept' HTML attribute shows, the .jpeg, .jpg and .png extensions are allowed in the file input dialog. When the user selects an image, the image is saved and
previewed in the UI.
By hitting the Submit button, the OCR service in Azure is contacted and the text is retrieved and displayed in the text area below, if any text is present in the image. A button allows copying the text to the clipboard.
Here are some screenshots of the app.