Coding Grounds: April 2023

This article presents a sample Tag Helper in .NET. A Tag Helper is similar to the HTML Helpers of ASP.NET MVC in .NET Framework, but it is easier to use in markup because it looks like an ordinary HTML element and does not need the special "@" syntax. The Tag Helper will render a list using the <ul> and <li> tags, styled with Bootstrap 5. Start by creating a Razor Pages application with this command:


   dotnet new razor -o TagHelpers

Then move into the TagHelpers folder and open it in Visual Studio Code by typing: code .

Inside Visual Studio Code, press Ctrl+P, look up the file _ViewImports.cshtml and register the current assembly with:


@addTagHelper *, TagHelpers

This tells Razor that we want to make every Tag Helper in the assembly called TagHelpers (the project we are working on) available to the views. The file then looks like this:


@using TagHelpers
@namespace TagHelpers.Pages
@addTagHelper *, Microsoft.AspNetCore.Mvc.TagHelpers
@addTagHelper *, TagHelpers

Consider the following markup:


<list separator="|">option 1| option 2| option 3| option 4| option 5| option 6| option 7| option 8|this is fun</list>

We want to turn that markup into the list shown in the screenshot below:

That is, we create a list using a <ul> tag with <li> tags inside. Since we need to access the inner content of the element, we have to override the ProcessAsync method of the TagHelper base class instead of the synchronous Process method. We create a Tag Helper by inheriting from this class, and by convention we give the class a name that ends with TagHelper. The resulting Tag Helper then looks like this:



using System.Text;
using Microsoft.AspNetCore.Razor.TagHelpers;

namespace TagHelpers.TagHelpers;

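public class ListTagHelper : TagHelper {

    public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
    {
        // Render a <ul> element in place of the custom <list> element.
        output.TagName = "ul";
        // Bootstrap 5 list group styling.
        output.Attributes.Add("class", "list-group");
        output.Attributes.Add("style", "display:inline-block");
        // Read the inner content of the element (everything between the tags).
        var existingContent = await output.GetChildContentAsync();
        var allContent = existingContent.GetContent();
        var items = allContent.Trim().Split(new[] { Separator }, StringSplitOptions.None);
        var outputHtml = new StringBuilder();
        // Wrap each item in its own <li> element.
        foreach (var item in items){
            outputHtml.Append($@"<li class=""list-group-item"">{item}</li>");
        }
        // SetHtmlContent writes the markup as-is instead of HTML-encoding it.
        output.Content.SetHtmlContent(outputHtml.ToString());
    }

    // Set through the separator attribute in the markup; defaults to a comma.
    public string Separator { get; set; } = ",";
}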

The Separator property defaults to "," and determines how the items in our list are split. We could use another separator, such as the "|" shown in the markup. If you omit the separator attribute, "," is used. Each public property becomes a recognized attribute of your Tag Helper and can be set in the HTML. output.TagName is the tag that will be rendered in the resulting HTML. As we see, we also add 'class' and 'style' attributes to show a list group using Bootstrap 5 classes. To split the items, note that we call the GetChildContentAsync() method on the TagHelperOutput object, followed by a GetContent() call. Also note that we have to use the SetHtmlContent method, because we add explicit HTML as the content of our 'ul' tag; SetContent would HTML-encode the markup. It is suggested that you stick to string properties in Razor Tag Helpers instead of other data types.
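The splitting step can also be tried outside of ASP.NET Core. The following is a minimal sketch, not part of the Tag Helper itself: the names ListMarkup and BuildItems are made up for illustration, and unlike the Tag Helper above it trims each item and skips empty entries.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ListMarkup
{
    // Hypothetical helper: splits the content on the separator and wraps
    // every non-empty, trimmed item in a Bootstrap list-group <li> element.
    public static string BuildItems(string content, string separator = ",")
    {
        var html = new StringBuilder();
        var items = content
            .Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries)
            .Select(item => item.Trim())
            .Where(item => item.Length > 0);
        foreach (var item in items)
        {
            html.Append($"<li class=\"list-group-item\">{item}</li>");
        }
        return html.ToString();
    }
}
```

Calling ListMarkup.BuildItems("option 1| option 2", "|") gives two <li> elements without the leading space in front of "option 2".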

I have added a repo on GitHub for a web scraping app written in .NET MAUI Blazor. It uses Azure Cognitive Services to summarize articles. https://github.com/toreaurstadboss/DagbladetWebscrapper

The web scraper uses the NuGet package Html Agility Pack to handle the DOM after downloading articles from the Internet. As the name of the repo suggests, it can be used to read, for example, Dagbladet articles without having to wade through ads. 'Web scraping' is a term that means extracting data, or content in general, from web sites. The following packages are used in the Razor library that contains the text handling methods to scrape web pages:


<PackageReference Include="Azure.AI.TextAnalytics" Version="5.3.0" />
<PackageReference Include="HtmlAgilityPack" Version="1.11.52" />
<PackageReference Include="Microsoft.AspNetCore.Components.Web" Version="6.0.19" />




Let's first look at the SummarizationUtil class. This uses TextAnalyticsClient in Azure.AI.TextAnalytics. We will summarize articles into five-sentence summaries using the Azure AI text analytics client.



using Azure.AI.TextAnalytics;
using System.Text;

namespace Webscrapper.Lib
{
	public class SummarizationUtil : ISummarizationUtil
	{

		public async Task<List<ExtractiveSummarizeResult>> GetExtractiveSummarizeResult(string document, TextAnalyticsClient client)
		{
			var batchedDocuments = new List<string>
			{
				document
			};
			var result = new List<ExtractiveSummarizeResult>();
			var options = new ExtractiveSummarizeOptions
			{
				 MaxSentenceCount = 5
			};
			var operation = await client.ExtractiveSummarizeAsync(Azure.WaitUntil.Completed, batchedDocuments, options: options);
			await foreach (ExtractiveSummarizeResultCollection documentsInPage in operation.Value)
			{
				foreach (ExtractiveSummarizeResult documentResult in documentsInPage)
				{
					result.Add(documentResult);
				}
			}
			return result;
		}

		public async Task<string> GetExtractiveSummarizeSentectesResult(string document, TextAnalyticsClient client)
		{
			List<ExtractiveSummarizeResult> summaries = await GetExtractiveSummarizeResult(document, client);
			return string.Join(Environment.NewLine, summaries.Select(s => s.Sentences).SelectMany(x => x).Select(x => x.Text));
		}

	}

}



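We set up the extraction here to return a maximum of five sentences. Note the use of await foreach: the value of the operation is an asynchronous stream (IAsyncEnumerable) of result pages, so each page is awaited as it arrives.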
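For readers who have not used await foreach before, here is a small stand-alone sketch of the same pattern. It does not call Azure at all; PagedSource, GetPagesAsync and CollectAsync are hypothetical names, and the "pages" are plain lists of strings standing in for the result collections.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PagedSource
{
    // Hypothetical async iterator that yields pages of items one at a time.
    public static async IAsyncEnumerable<List<string>> GetPagesAsync()
    {
        yield return new List<string> { "first", "second" };
        await Task.Delay(10); // simulate waiting for the next page
        yield return new List<string> { "third" };
    }

    // Flattens all pages into one list, like the nested loops in the summarizer.
    public static async Task<List<string>> CollectAsync()
    {
        var all = new List<string>();
        await foreach (List<string> page in GetPagesAsync())
        {
            foreach (string item in page)
            {
                all.Add(item);
            }
        }
        return all;
    }
}
```

The outer loop awaits each page, the inner loop is an ordinary foreach over the items of that page.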

Here is a helper method to get a string from an ExtractiveSummarizeResult.


using Azure.AI.TextAnalytics;
using System.Text;

namespace Webscrapper.Lib
{

	public static class SummarizationExtensions
	{

		public static string GetExtractiveSummarizeResultInfo(this ExtractiveSummarizeResult documentResults)
		{
			var sb = new StringBuilder();

			if (documentResults.HasError)
			{
				sb.AppendLine("Error!");
				sb.AppendLine($"Document error code: {documentResults.Error.ErrorCode}.");
				sb.AppendLine($"Message: {documentResults.Error.Message}");
				// A failed document has no sentences to report, so stop here.
				return sb.ToString();
			}

			sb.AppendLine("SUCCESS. No errors were encountered while summarizing the document.");
			sb.AppendLine($"Extracted the following {documentResults.Sentences.Count} sentence(s):");
			sb.AppendLine();

			foreach (ExtractiveSummarySentence sentence in documentResults.Sentences)
			{
				sb.AppendLine($"Sentence: {sentence.Text} Offset: {sentence.Offset} Rankscore: {sentence.RankScore} Length:{sentence.Length}");
				sb.AppendLine();
			}
			return sb.ToString();
		}
	}

}





Here is a factory method to create a TextAnalyticsClient. 




using Azure;
using Azure.AI.TextAnalytics;

namespace Webscrapper.Lib
{
    public static class TextAnalyticsClientFactory
    {

        public static TextAnalyticsClient CreateClient()
        {
            string? uri = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_ENDPOINT", EnvironmentVariableTarget.Machine);
            string? key = Environment.GetEnvironmentVariable("AZURE_COGNITIVE_SERVICE_KEY", EnvironmentVariableTarget.Machine);
            if (uri == null)
            {
                throw new ArgumentNullException(nameof(uri), "Could not get system environment variable named 'AZURE_COGNITIVE_SERVICE_ENDPOINT'. Set this variable first.");
            }
            // Check the key as well; the endpoint was already checked above.
            if (key == null)
            {
                throw new ArgumentNullException(nameof(key), "Could not get system environment variable named 'AZURE_COGNITIVE_SERVICE_KEY'. Set this variable first.");
            }
            var client = new TextAnalyticsClient(new Uri(uri!), new AzureKeyCredential(key!));
            return client;
        }

    }
}




To use Azure Cognitive Services, you have to get the endpoint (a URL) and a service key for your account in the Azure portal after having activated Azure Cognitive Services.
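The factory reads machine-level environment variables, which only exist on Windows; on other operating systems that lookup returns null. Here is a minimal sketch of a lookup that first tries the process environment and then falls back to the machine level. The names Settings and GetRequired are made up for this example and are not part of the repo.

```csharp
using System;

public static class Settings
{
    // Hypothetical helper: returns the value of an environment variable or
    // throws with a readable message when it is missing.
    public static string GetRequired(string name)
    {
        // The process environment works on every OS; the machine-level
        // lookup is the Windows fallback used by the factory in this article.
        string? value = Environment.GetEnvironmentVariable(name)
            ?? Environment.GetEnvironmentVariable(name, EnvironmentVariableTarget.Machine);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set. Set this variable first.");
        }
        return value;
    }
}
```

With such a helper, the endpoint and the key could be fetched with Settings.GetRequired("AZURE_COGNITIVE_SERVICE_ENDPOINT") and Settings.GetRequired("AZURE_COGNITIVE_SERVICE_KEY").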

The page extraction util looks like this. Note the use of Html Agility Pack.



using HtmlAgilityPack;
using System.Text;

namespace Webscrapper.Lib
{
	public class PageExtractionUtil : IPageExtractionUtil
	{

		public async Task<string?> ExtractHtml(string url, bool includeTags)
		{
			if (string.IsNullOrEmpty(url)) 
				return null;
			var httpClient = new HttpClient();

			string pageHtml = await httpClient.GetStringAsync(url);
			if (string.IsNullOrEmpty(pageHtml))
			{
				return null;
			}

			var htmlDoc = new HtmlDocument();
			htmlDoc.LoadHtml(pageHtml);
			// SelectNodes returns null when no node matches the XPath expression.
			var matches = htmlDoc.DocumentNode.SelectNodes("//h1|//h2|//h3|//h4|//h5|//h6|//p");
			if (matches == null)
			{
				return null;
			}
			var textNodes = matches.Where(n => !string.IsNullOrWhiteSpace(n.InnerText)).ToList();
			var sb = new StringBuilder();
			foreach (var textNode in textNodes)
			{
				var text = textNode.InnerText;
				if (includeTags)
				{
					sb.AppendLine($"<{textNode.Name}>{text}</{textNode.Name}>");
				}
				else
				{
					sb.AppendLine(text);
				}
			}
			return sb.ToString();
		}
	}
}
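To see what the XPath selection does without downloading anything, the same Html Agility Pack calls can be run against an inline HTML string. This is only a sketch under a few assumptions: the class name HeadingAndParagraphDemo and the method ExtractText are invented for the example, the XPath is shortened to three tags, and HtmlEntity.DeEntitize is added to decode entities such as &amp;, which the util above does not do.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class HeadingAndParagraphDemo
{
    // Returns the trimmed text of every heading and paragraph in the markup.
    public static string[] ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//h1|//h2|//p");
        if (nodes == null)
        {
            return Array.Empty<string>();
        }
        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(text => text.Length > 0)
            .ToArray();
    }
}
```

Text inside other elements, such as a <div> that holds an ad, is not matched by the expression and is therefore left out.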





Let's look at an example usage:


@page "/"
@inject ISummarizationUtil SummarizationUtil
@inject IPageExtractionUtil PageExtractionUtil

@using DagbladetWebscrapper.Models;

<h1>Dagbladet Artikkel Oppsummering</h1>

<EditForm Model="@Model" OnValidSubmit="@Submit" class="form-group">
    <DataAnnotationsValidator />
    <ValidationSummary />
  
    <div class="form-group row">
    <label for="Model.ArticleUrl">Url til artikkel</label>
    <InputText @bind-Value="Model!.ArticleUrl" placeholder="Skriv inn url til artikkel i Dagbladet" />
    </div>

    <div class="form-group row">
    <span>Artikkelens oppsummering</span>
    <InputTextArea readonly="readonly" placeholder="Her dukker opp artikkelens oppsummering" @bind-Value="Model!.SummarySentences" rows="5"></InputTextArea>
    </div>

    <div class="form-group row">
    <span>Artikkelens tekst</span>
    <InputTextArea readonly="readonly" placeholder="Her dukker opp teksten til artikkelen" @bind-Value="Model!.ArticleText" rows="5"></InputTextArea>
    </div>
    
    <button type="submit">Submit</button>


</EditForm>

@code {
    private Azure.AI.TextAnalytics.TextAnalyticsClient? _client;

    public IndexModel Model { get; set; } = new();

    // Returning Task instead of void lets Blazor await the handler,
    // re-render when it completes and surface any exception.
    private async Task Submit()
    {
        string? articleText = await PageExtractionUtil.ExtractHtml(Model.ArticleUrl!, false);
        Model.ArticleText = articleText ?? string.Empty;
        if (string.IsNullOrEmpty(articleText))
        {
            // Nothing was extracted, so there is nothing to summarize.
            return;
        }
        _client ??= TextAnalyticsClientFactory.CreateClient();
        string summaryText = await SummarizationUtil.GetExtractiveSummarizeSentectesResult(articleText, _client);
        Model.SummarySentences = summaryText;
    }

}




The view model class for the form looks like this.



using System.ComponentModel.DataAnnotations;

namespace DagbladetWebscrapper.Models
{
	public class IndexModel
	{
        [Required]
        public string? ArticleUrl { get; set; }

        public string? SummarySentences { get; set; }

        public string? ArticleText { get; set; }
    }
}





Let's look at a screenshot that shows the app in use. It targets an article in the Norwegian tabloid newspaper Dagbladet. This tabloid is notorious for sensational headlines that make you click into the article (so-called 'clickbait'), and inside the article you have to wade through lots of ads. With this app, you can open www.dagbladet.no, find a link to an article, extract its text and get a five-sentence summary using Azure Cognitive Services in a .NET MAUI app.

Coding Grounds

Saturday, 22 April 2023

Tag Helpers in Asp.net Core Mvc 7

Monday, 3 April 2023

Using Azure Cognitive Services to summarize articles