Friday, 22 September 2023

Using Azure Computer Vision to perform Optical Character Recognition (OCR)

This article shows how you can use Azure Computer Vision in Azure Cognitive Services to perform Optical Character Recognition (OCR). The Computer Vision feature is available by adding a Computer Vision resource in the Azure Portal. I have made a .NET MAUI Blazor app, and the GitHub repo for it is available here: https://github.com/toreaurstadboss/Ocr.Handwriting.Azure.AI.Models
Let us first look at the .csproj of the Lib project in this repo.


<Project Sdk="Microsoft.NET.Sdk.Razor">

  <PropertyGroup>
    <TargetFramework>net6.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>
  <ItemGroup>
    <SupportedPlatform Include="browser" />
  </ItemGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Azure.CognitiveServices.Vision.ComputerVision" Version="7.0.1" />
    <PackageReference Include="Microsoft.AspNetCore.Components.Web" Version="6.0.19" />
  </ItemGroup>

</Project>


The following class creates ComputerVisionClient instances that can be used to extract different kinds of information from streams and files containing images. We are going to focus on images and extracting text via OCR. Azure Computer Vision can extract handwritten text in addition to regular typed or printed text inside images. It can also detect shapes in images and classify objects. This demo only focuses on text extraction from images. ComputerVisionClientFactory


using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;

namespace Ocr.Handwriting.Azure.AI.Lib
{

    public interface IComputerVisionClientFactory
    {
        ComputerVisionClient CreateClient();
    }

    /// <summary>
    /// Client factory for Azure Cognitive Services - Computer vision.
    /// </summary>
    public class ComputerVisionClientFactory : IComputerVisionClientFactory
    {
        // Add your Computer Vision key and endpoint
        static string? _key = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_VISION_KEY");
        static string? _endpoint = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICES_VISION_ENDPOINT");

        public ComputerVisionClientFactory() : this(_key, _endpoint)
        {
        }

        public ComputerVisionClientFactory(string? key, string? endpoint)
        {
            _key = key;
            _endpoint = endpoint;
        }

        public ComputerVisionClient CreateClient()
        {
            if (_key == null)
            {
                throw new ArgumentNullException(nameof(_key), "The AZURE_COGNITIVE_SERVICES_VISION_KEY is not set. Set a system-level environment variable or provide this value by calling the overloaded constructor of this class.");
            }
            if (_endpoint == null)
            {
                throw new ArgumentNullException(nameof(_endpoint), "The AZURE_COGNITIVE_SERVICES_VISION_ENDPOINT is not set. Set a system-level environment variable or provide this value by calling the overloaded constructor of this class.");
            }

            var client = Authenticate(_key!, _endpoint!);
            return client;
        }

        public static ComputerVisionClient Authenticate(string key, string endpoint) =>
            new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
            {
                Endpoint = endpoint
            };

    }
}
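
If you prefer not to use environment variables, the overloaded constructor of the factory accepts the key and endpoint directly, as the exception messages above point out. A minimal usage sketch (the key and endpoint values below are placeholders, not real credentials):


var factory = new ComputerVisionClientFactory(
    key: "<your-computer-vision-key>",                                 // placeholder: your Computer Vision key
    endpoint: "https://<your-resource>.cognitiveservices.azure.com/"); // placeholder: your resource endpoint
ComputerVisionClient client = factory.CreateClient();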



The setup of the endpoint and key of the Computer Vision resource is done via system-level environment variables. Next up, let's look at retrieving OCR text from images. Here we use the ComputerVisionClient. We open a stream of an image file using File.OpenRead and pass it to the client's ReadInStreamAsync method. The image we load in the app is selected by the user, then previewed and saved using the MAUI Storage library (inside the AppData folder). OcrImageService.cs


using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
using Microsoft.Extensions.Logging;
using System.Diagnostics;
using ReadResult = Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models.ReadResult;

namespace Ocr.Handwriting.Azure.AI.Lib
{

    public interface IOcrImageService
    {
        Task<IList<ReadResult?>?> GetReadResults(string imageFilePath);
        Task<string> GetReadResultsText(string imageFilePath);
    }

    public class OcrImageService : IOcrImageService
    {
        private readonly IComputerVisionClientFactory _computerVisionClientFactory;
        private readonly ILogger<OcrImageService> _logger;

        public OcrImageService(IComputerVisionClientFactory computerVisionClientFactory, ILogger<OcrImageService> logger)
        {
            _computerVisionClientFactory = computerVisionClientFactory;
            _logger = logger;
        }

        private ComputerVisionClient CreateClient() => _computerVisionClientFactory.CreateClient();

        public async Task<string> GetReadResultsText(string imageFilePath)
        {
            var readResults = await GetReadResults(imageFilePath);
            var ocrText = ExtractText(readResults?.FirstOrDefault());
            return ocrText;
        }

        public async Task<IList<ReadResult?>?> GetReadResults(string imageFilePath)
        {
            if (string.IsNullOrWhiteSpace(imageFilePath))
            {
                return null;
            }

            try
            {
                var client = CreateClient();

                //Retrieve OCR results 

                using (FileStream stream = File.OpenRead(imageFilePath))
                {
                    var textHeaders = await client.ReadInStreamAsync(stream);
                    string operationLocation = textHeaders.OperationLocation;
                    string operationId = operationLocation[^36..]; // index-from-end and range operators of C# 8.0: slices out the last 36 chars, i.e. the operation GUID (32 hex digits plus four hyphens)

                    ReadOperationResult results;

                    do
                    {
                        results = await client.GetReadResultAsync(Guid.Parse(operationId));
                        _logger.LogInformation($"Retrieving OCR results for operationId {operationId} for image {imageFilePath}");
                        await Task.Delay(1000); // brief pause between polls while the read operation completes server-side
                    }
                    while (results.Status == OperationStatusCodes.Running || results.Status == OperationStatusCodes.NotStarted);

                    IList<ReadResult?> result = results.AnalyzeResult.ReadResults;
                    return result;

                }
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "OCR failed for image {ImageFilePath}", imageFilePath);
                return null;
            }
        }

        private static string ExtractText(ReadResult? readResult) => string.Join(Environment.NewLine, readResult?.Lines?.Select(l => l.Text) ?? new List<string>());

    }

}
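
Outside the app, a short hypothetical usage sketch of the service could look like the following (assuming the environment variables are set; NullLogger comes from Microsoft.Extensions.Logging.Abstractions, and the image path is made up):


using Microsoft.Extensions.Logging.Abstractions;

var ocrService = new OcrImageService(new ComputerVisionClientFactory(), NullLogger<OcrImageService>.Instance);
string ocrText = await ocrService.GetReadResultsText(@"C:\temp\sample-note.png"); // hypothetical image path
Console.WriteLine(ocrText);


In the app itself, the service is instead resolved through the dependency injection container set up in MauiProgram.cs below.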
                                           

Let's look at the MAUI Blazor project in the app. The MauiProgram.cs looks like this. MauiProgram.cs


using Ocr.Handwriting.Azure.AI.Data;
using Ocr.Handwriting.Azure.AI.Lib;
using Ocr.Handwriting.Azure.AI.Services;
using TextCopy;

namespace Ocr.Handwriting.Azure.AI;

public static class MauiProgram
{
    public static MauiApp CreateMauiApp()
    {
        var builder = MauiApp.CreateBuilder();
        builder
            .UseMauiApp<App>()
            .ConfigureFonts(fonts =>
            {
                fonts.AddFont("OpenSans-Regular.ttf", "OpenSansRegular");
            });

        builder.Services.AddMauiBlazorWebView();
#if DEBUG
        builder.Services.AddBlazorWebViewDeveloperTools();
        builder.Services.AddLogging();
#endif

        builder.Services.AddSingleton<WeatherForecastService>();
        builder.Services.AddScoped<IComputerVisionClientFactory, ComputerVisionClientFactory>();
        builder.Services.AddScoped<IOcrImageService, OcrImageService>();
        builder.Services.AddScoped<IImageSaveService, ImageSaveService>();

        builder.Services.InjectClipboard();

        return builder.Build();
    }
}



We also need some code to preview and save the image an end user chooses. The ImageSaveService, implementing IImageSaveService, looks like this. ImageSaveService


using Microsoft.AspNetCore.Components.Forms;
using Ocr.Handwriting.Azure.AI.Models;

namespace Ocr.Handwriting.Azure.AI.Services
{

    public interface IImageSaveService
    {
        Task<ImageSaveModel> SaveImage(IBrowserFile browserFile);
    }

    public class ImageSaveService : IImageSaveService
    {

        public async Task<ImageSaveModel> SaveImage(IBrowserFile browserFile)
        {
            // Copy the uploaded image into a byte array. A single ReadAsync call is not
            // guaranteed to fill the whole buffer, so copy the stream to a MemoryStream instead.
            using var memoryStream = new MemoryStream();
            using (var stream = browserFile.OpenReadStream(maxAllowedSize: 30 * 1024 * 1024))
            {
                await stream.CopyToAsync(memoryStream);
            }
            byte[] buffers = memoryStream.ToArray();
            string imageType = browserFile.ContentType;

            var basePath = FileSystem.Current.AppDataDirectory;
            var imageSaveModel = new ImageSaveModel
            {
                SavedFilePath = Path.Combine(basePath, $"{Guid.NewGuid().ToString("N")}-{browserFile.Name}"),
                PreviewImageUrl = $"data:{imageType};base64,{Convert.ToBase64String(buffers)}",
                FilePath = browserFile.Name,
                FileSize = buffers.Length / 1024, // size in kB
            };

            await File.WriteAllBytesAsync(imageSaveModel.SavedFilePath, buffers);

            return imageSaveModel;
        }

    }
}
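
The ImageSaveModel class is not listed in this article. Based on how it is used above and in Index.razor, a minimal sketch could look like this (inferred from usage, so the actual class in the repo may differ):


namespace Ocr.Handwriting.Azure.AI.Models
{
    // Sketch inferred from usage; see the GitHub repo for the actual model.
    public class ImageSaveModel
    {
        public string? SavedFilePath { get; set; }   // where the image was written inside the AppData folder
        public string? PreviewImageUrl { get; set; } // base-64 data URL used to preview the image
        public string? FilePath { get; set; }        // original file name of the uploaded image
        public long FileSize { get; set; }           // file size in kB
    }
}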


Note the use of the maxAllowedSize parameter of IBrowserFile's OpenReadStream method; this is good practice since IBrowserFile only supports 512 kB by default. I set it to 30 MB in the app to support some high-resolution images too. We preview the image as a base-64 data URL and also save the image to disk. Note the use of FileSystem.Current.AppDataDirectory as the base path; this comes from the Microsoft.Maui.Storage APIs that ship with MAUI. These are the packages used in the MAUI Blazor project of the app. Ocr.Handwriting.Azure.AI.csproj



    <ItemGroup>
      <PackageReference Include="Microsoft.Azure.CognitiveServices.Vision.ComputerVision" Version="7.0.1" />
      <PackageReference Include="TextCopy" Version="6.2.1" />
    </ItemGroup>


The GUI looks like this. Index.razor


@page "/"
@using Ocr.Handwriting.Azure.AI.Models;
@using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
@using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
@using Ocr.Handwriting.Azure.AI.Lib;
@using Ocr.Handwriting.Azure.AI.Services;
@using TextCopy;

@inject IImageSaveService ImageSaveService
@inject IOcrImageService OcrImageService 
@inject IClipboard Clipboard

<h1>Azure AI OCR Text recognition</h1>


<EditForm Model="Model" OnValidSubmit="@Submit" style="background-color:aliceblue">
    <DataAnnotationsValidator />
    <label><b>Select a picture to run OCR</b></label><br />
    <InputFile OnChange="@OnInputFile" accept=".jpeg,.jpg,.png" />
    <br />
    <code class="alert-secondary">Supported file formats: .jpeg, .jpg and .png</code>
    <br />
    @if (Model.PreviewImageUrl != null) { 
        <label class="alert-info">Preview of the selected image</label>
        <div style="overflow:auto;max-height:300px;max-width:500px">
            <img class="flagIcon" src="@Model.PreviewImageUrl" /><br />
        </div>

        <code class="alert-light">File Size (kB): @Model.FileSize</code>
        <br />
        <code class="alert-light">File saved location: @Model.SavedFilePath</code>
        <br />

        <label class="alert-info">Click the button below to start running OCR using Azure AI</label><br />
        <br />
        <button type="submit">Submit</button> <button style="margin-left:200px" type="button" class="btn-outline-info" @onclick="@CopyTextToClipboard">Copy to clipboard</button>
        <br />
        <br />
        <InputTextArea style="width:1000px;height:300px" readonly="readonly" placeholder="Detected text in the image uploaded" @bind-Value="Model!.OcrOutputText" rows="5"></InputTextArea>
    }
</EditForm>


@code {

    private IndexModel Model = new();

    private async Task OnInputFile(InputFileChangeEventArgs args)
    {       
        var imageSaveModel = await ImageSaveService.SaveImage(args.File);
        Model = new IndexModel(imageSaveModel);
        await Application.Current.MainPage.DisplayAlert($"MAUI Blazor OCR App", $"Wrote file to location: {Model.SavedFilePath}. Size is: {Model.FileSize} kB", "Ok", "Cancel");
    }

    public async Task CopyTextToClipboard()
    {
        await Clipboard.SetTextAsync(Model.OcrOutputText);
        await Application.Current.MainPage.DisplayAlert($"MAUI Blazor OCR App", $"The copied text was put into the clipboard. Character length: {Model.OcrOutputText?.Length}", "Ok", "Cancel");

    }

    private async Task Submit()
    {
        if (Model.PreviewImageUrl == null || Model.SavedFilePath == null)
        {
            await Application.Current.MainPage.DisplayAlert($"MAUI Blazor OCR App", $"You must select an image first before running OCR. Supported formats are .jpeg, .jpg and .png", "Ok", "Cancel");
            return;
        }
        Model.OcrOutputText = await OcrImageService.GetReadResultsText(Model.SavedFilePath);
        StateHasChanged(); //visual refresh here
    }

}
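
The IndexModel bound to the EditForm is likewise not listed here; a minimal sketch inferred from its usage in the markup and code above (the actual class in the repo may differ):


using Ocr.Handwriting.Azure.AI.Models;

namespace Ocr.Handwriting.Azure.AI.Data
{
    // Sketch inferred from usage in Index.razor; see the GitHub repo for the actual class.
    public class IndexModel
    {
        public IndexModel() { }

        public IndexModel(ImageSaveModel imageSaveModel)
        {
            PreviewImageUrl = imageSaveModel.PreviewImageUrl;
            SavedFilePath = imageSaveModel.SavedFilePath;
            FileSize = imageSaveModel.FileSize;
        }

        public string? PreviewImageUrl { get; set; }
        public string? SavedFilePath { get; set; }
        public long FileSize { get; set; }
        public string? OcrOutputText { get; set; }
    }
}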


The UI works like this: the user selects an image. As the 'accept' HTML attribute shows, the .jpeg, .jpg and .png extensions are allowed in the file input dialog. When the user selects an image, it is saved and previewed in the UI. Hitting the Submit button contacts the OCR service in Azure; the recognized text, if any is present in the image, is retrieved and displayed in the text area below. A button allows copying the text to the clipboard. Here are some screenshots of the app.

