Monday, 3 April 2023

Using Azure Cognitive Services to summarize articles

I have added a repo on Github for a web scraping app written in .NET MAUI Blazor. It uses Azure Cognitive Services to summarize articles. https://github.com/toreaurstadboss/DagbladetWebscrapper The web scrapper uses the Nuget package for Html agility pack to handle the DOM after downloading articles from the Internet. As the name of the repo suggests, it can be used to read for example Dagbladet articles, without having to waddle through ads. 'Website Scraping' is a term that means extracting data from web sites, or content in general. The following libraries are used in the Razor lib containing the text handling methods to scrap web pages:

<PackageReference Include="Azure.AI.TextAnalytics" Version="5.3.0" />
<PackageReference Include="HtmlAgilityPack" Version="1.11.52" />
<PackageReference Include="Microsoft.AspNetCore.Components.Web" Version="6.0.19" />
Let's first look at the SummarizationUtil class. This uses TextAnalyticsClient in Azure.AI.TextAnalytics. We will summarize articles into five sentence summaries using the
Azure AI text analytics client.


using Azure.AI.TextAnalytics;
using System.Text;

namespace Webscrapper.Lib
{
	public class SummarizationUtil : ISummarizationUtil
	{

		public async Task<List<ExtractiveSummarizeResult>> GetExtractiveSummarizeResult(string document, TextAnalyticsClient client)
		{
			var batchedDocuments = new List<string>
			{
				document
			};
			var result = new List<ExtractiveSummarizeResult>();
			var options = new ExtractiveSummarizeOptions
			{
				 MaxSentenceCount = 5
			};
			var operation = await client.ExtractiveSummarizeAsync(Azure.WaitUntil.Completed, batchedDocuments, options: options);
			await foreach (ExtractiveSummarizeResultCollection documentsInPage in operation.Value)
			{
				foreach (ExtractiveSummarizeResult documentResult in documentsInPage)
				{
					result.Add(documentResult);
				}
			}
			return result;
		}

		public async Task<string> GetExtractiveSummarizeSentectesResult(string document, TextAnalyticsClient client)
		{
			List<ExtractiveSummarizeResult> summaries = await GetExtractiveSummarizeResult(document, client);
			return string.Join(Environment.NewLine, summaries.Select(s => s.Sentences).SelectMany(x => x).Select(x => x.Text));
		}

	}

}

We set up the extraction here to return a maximum of five sentences. Note the use of await foreach here. (async ienumerable) Here is a helper method to get a string from a ExtractiveSummarizeResult.

using Azure.AI.TextAnalytics;
using System.Text;

namespace Webscrapper.Lib
{

	public static class SummarizationExtensions
	{

		public static string GetExtractiveSummarizeResultInfo(this ExtractiveSummarizeResult documentResults)
		{
			var sb = new StringBuilder();

			if (documentResults.HasError)
			{
				sb.AppendLine($"Error!");
				sb.AppendLine($"Document error code: {documentResults.Error.ErrorCode}.");
				sb.AppendLine($"Message: {documentResults.Error.Message}");
			}
			else
			{
				sb.AppendLine($"SUCCESS. There are no errors encountered while summarizing the document");
			}

			sb.AppendLine($"Extracted the following {documentResults.Sentences.Count} sentence(s):");
			sb.AppendLine();

			foreach (ExtractiveSummarySentence sentence in documentResults.Sentences)
			{
				sb.AppendLine($"Sentence: {sentence.Text} Offset: {sentence.Offset} Rankscore: {sentence.RankScore} Length:{sentence.Length}");
				sb.AppendLine();
			}
			return sb.ToString();
		}
	}

}



Here is a factory method to create a TextAnalyticsClient.


using Azure;
using Azure.AI.TextAnalytics;

namespace Webscrapper.Lib
{
    public static class TextAnalyticsClientFactory
    {

        public static TextAnalyticsClient CreateClient()
        {
            string? uri = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_ENDPOINT", EnvironmentVariableTarget.Machine);
            string? key = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_KEY", EnvironmentVariableTarget.Machine);
            if (uri == null)
            {
                throw new ArgumentNullException(nameof(uri), "Could not get system environment variable named 'AZURE_COGNITIVE_SERVICE_ENDPOINT' Set this variable first.");
            }
            if (uri == null)
            {
                throw new ArgumentNullException(nameof(uri), "Could not get system environment variable named 'AZURE_COGNITIVE_SERVICE_KEY' Set this variable first.");
            }
            var client = new TextAnalyticsClient(new Uri(uri!), new AzureKeyCredential(key!));
            return client;
        }

    }
}


To use Azure Cognitive Services, you have to get the endpoint (an url) and a service key for your account in Azure portal after having activated Azure Cognitive Services. The page extraction util looks like this, note the use of Html Agility pack.


using HtmlAgilityPack;
using System.Text;

namespace Webscrapper.Lib
{
	public class PageExtractionUtil : IPageExtractionUtil
	{

		public async Task<string?> ExtractHtml(string url, bool includeTags)
		{
			if (string.IsNullOrEmpty(url)) 
				return null;
			var httpClient = new HttpClient();

			string pageHtml = await httpClient.GetStringAsync(url);
			if (string.IsNullOrEmpty(pageHtml))
			{
				return null;
			}

			var htmlDoc = new HtmlDocument(); 
			htmlDoc.LoadHtml(pageHtml);
			var textNodes = htmlDoc.DocumentNode.SelectNodes("//h1|//h2|//h3|//h4|//h5|//h6|//p")
				.Where(n => !string.IsNullOrWhiteSpace(n.InnerText)).ToList();
			var sb = new StringBuilder();
			foreach (var textNode in textNodes)
			{
				var text = textNode.InnerText;
				if (includeTags)
				{
					sb.AppendLine($"<{textNode.Name}>{textNode.InnerText}</{textNode.Name}>");
				}
				else
				{
					sb.AppendLine($"{textNode.InnerText}");
				}
			}
			return sb.ToString();
		}
	}
}



Let's look at an example usage :

@page "/"
@inject ISummarizationUtil SummarizationUtil
@inject IPageExtractionUtil PageExtractionUtil

@using DagbladetWebscrapper.Models;

<h1>Dagbladet Artikkel Oppsummering</h1>

<EditForm Model="@Model" OnValidSubmit="@Submit" class="form-group">
    <DataAnnotationsValidator />
    <ValidationSummary />
  
    <div class="form-group row">
    <label for="Model.ArticleUrl">Url til artikkel</label>
    <InputText @bind-Value="Model!.ArticleUrl" placeholder="Skriv inn url til artikkel i Dagbladet" />
    </div>

    <div class="form-group row">
    <span>Artikkelens oppsummering</span>
    <InputTextArea readonly="readonly" placeholder="Her dukker opp artikkelens oppsummering" @bind-Value="Model!.SummarySentences" rows="5"></InputTextArea>
    </div>

    <div class="form-group row">
    <span>Artikkelens tekst</span>
    <InputTextArea readonly="readonly" placeholder="Her dukker opp teksten til artikkelen" @bind-Value="Model!.ArticleText" rows="5"></InputTextArea>
    </div>
    
    <button type="submit">Submit</button>


</EditForm>

@code {
    private Azure.AI.TextAnalytics.TextAnalyticsClient _client;

    public IndexModel Model { get; set; } = new();

    private async void Submit()
    {
        string articleText = await PageExtractionUtil.ExtractHtml(Model!.ArticleUrl, false);
        Model.ArticleText = articleText;
        if (_client == null)
        {
            _client = TextAnalyticsClientFactory.CreateClient();
        }
        string summaryText = await SummarizationUtil.GetExtractiveSummarizeSentectesResult(articleText, _client);
        Model.SummarySentences = summaryText;

        StateHasChanged();
    }   

}


The view model class for the form looks like this.


using System.ComponentModel.DataAnnotations;

namespace DagbladetWebscrapper.Models
{
	public class IndexModel
	{
        [Required]
        public string? ArticleUrl { get; set; }

        public string SummarySentences { get; set; }

        public string ArticleText { get; set; }
    }
}


Let's look at a screen shot that shows the app in use. It targets an article on the tabloid newspaper Dagbladet in Norway. This tabloid is notorious for writing sensational titles of articles so you have to click into the article (e.g. 'clickbait') and then inside the article, you have to wade through lots of ads. Here, you now have an app, where you can open up www.dagbladet.no and find a link to an article and now extract the text and get a five sentence summary using Azure AI Cognitive services in a .NET MAUI app.

Saturday, 4 March 2023

Generic math - Factorial method

Here is a example of a generic math factorial method in C# 11. Generic math allows you to make numeric methods that makes use of the INumber generic interface, which got many static virtual methods where you can make a numeric method that supports different kinds of number types, such as decimal, int, float and double. The code below calculates the factorial of some values in an array. We have inserted a double value here that shows that Factorial of a double or any number with decimal works a bit different than ints, as the decimal part takes part here. The factorial of 0! is defined as 1 and we multiply n with n-1 as long as n > 0. Note the use of T.One and T.Zero here, defined as
static virtual members of the different number types in C#.


void Main()
{
	var someNums = new[] { 1, 2, 3, 3.141592, 5};
	var fact = someNums.Select(n =>  Factorial(n));
	fact.Dump();
	
}

T Factorial<T>(T num)
where T : INumber<T>
{
	var result = T.One;
	while (num > T.Zero){
		result *= num;
		num--;
	}
	return result;
}


Output shows the result in Linqpad 7 after having specified using .NET 7:

Trøndersk dialekttemp with C# 11 og and patterns

This article demonstrates the use of relational patterns in C# 11. First off, relational patterns allow us to test how a given value /variable compares to constants. If we want to have multiple conditions we can use and operator not shown here. I have here different conditions / intervals for temperaturs outputting the temperature and description of the weather in some local language of mine from Norway (Trøndersk / Trondheim city).
	   
    
foreach (var iteration in Enumerable.Range(0, 20)){
    var tæmpen = new Random().Next(-60, 60);
    
    var været = $"Været e i dag {tæmpen}C og på Trondheimsdialækt: {tæmpen switch 
    {
        <= -50 => "Du træng eitt nytt termometer. For kaldt!",
        <= -35 => "Småfuggel'n dætt dau fra trær'n",
        <= -30 => "Båinnspeika",
        <= -25 => "Få inn katta!",
        <= -20 => "Gnallerfrost",
        <= -10 => "Kjøle kaaalt",
        <= -5  => "Kaillvoli",
        <= 5 => "Kaillhustri",
        <= 10 => "Julivær",
        <= 15 => "Godværsle",
        <= 20 => "Kjøle varmt",
        <= 25 => "Råådeili",
        <= 30 => "Steikvarmt",
        <= 40 => "Søkke heitt",
        <= 50 => "Kokheitt",
        _ => "Du træng eitt nytt termometer. For varmt!"
    }}";
    
    System.Console.WriteLine(været);
} 
    
    
As we can see, in C# 11, we can do a lot more inside string interpolation expressions and now allow multiple lines, including relational patterns. It can be quite handy to use, when .NET 7 and C# 11 reaches mainstream usage. The text here is based upon a dialect based thermometer, available for purchase from here: https://dialekttempen.no/butikk/termometer/fylker/sor-trondelag/trondheim-munkholmen/#&gid=1&pid=1