Tuesday, 22 April 2025

Predicting a variable using regression with ML.NET

This article looks at regression with ML.NET. In the example, the variable "Poverty rate", measured as a percentage, is compared against the number of "teenage pregnancies" per 1,000 births, and the poverty rate is used to predict the latter. The data comes from a publicly available CSV file in Jeff Prosise's ML.NET repository on GitHub:

https://github.com/jeffprosise/ML.NET/blob/master/MLN-SimpleRegression/MLN-SimpleRegression/Data/poverty.csv



In this article, LINQPad 8 is used. First, add the following two NuGet packages:
  • Microsoft.ML
  • Microsoft.ML.Mkl.Components
The following method plots a scatter graph from the data in an IDataView and adds a standard linear trendline. It works with the example in this article.

Plotutils.cs



void PlotScatterGraph<T>(MLContext mlContext, IDataView trainData, Func<T, PointItem> pointCreator, string chartTitle) where T : class, new()
{
	//Convert the IDataview to an enumerable collection
	var data = mlContext.Data.CreateEnumerable<T>(trainData, reuseRowObject: false).Select(x => pointCreator(x)).ToList();

	// Calculate trendline (simple linear regression)
	double avgX = data.Average(d => d.X);
	double avgY = data.Average(d => d.Y);
	double slope = data.Sum(d => (d.X - avgX) * (d.Y - avgY)) / data.Sum(d => (d.X - avgX) * (d.X - avgX));
	double intercept = avgY - slope * avgX;
	// The trendline value for each point is slope * X + intercept, computed directly below

	//Plot the scatter graph
	var plot = data.Chart(d => d.X)
		.AddYSeries(d => d.Y, LINQPad.Util.SeriesType.Point, chartTitle)
		.AddYSeries(d => slope * d.X + intercept, Util.SeriesType.Line, "Trendline")
		.ToWindowsChart();
		
	plot.AntiAliasing = System.Windows.Forms.DataVisualization.Charting.AntiAliasingStyles.All;
	plot.Dump();
}
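
To make the trendline calculation in PlotScatterGraph concrete, here is a minimal stand-alone sketch of the same least-squares formulas, written as C# top-level statements. The helper name FitLine and the sample points are made up for illustration and are not part of the article's code.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of ML.NET or LINQPad): fits y = slope * x + intercept
// using the same least-squares formulas as the trendline in PlotScatterGraph.
static (double Slope, double Intercept) FitLine((double X, double Y)[] points)
{
	double avgX = points.Average(p => p.X);
	double avgY = points.Average(p => p.Y);
	double slope = points.Sum(p => (p.X - avgX) * (p.Y - avgY))
		/ points.Sum(p => (p.X - avgX) * (p.X - avgX));
	double intercept = avgY - slope * avgX;
	return (slope, intercept);
}

// Made-up sample points that lie exactly on the line y = 2x + 1.
var sample = new (double X, double Y)[] { (1, 3), (2, 5), (3, 7), (4, 9) };
var line = FitLine(sample);
Console.WriteLine($"slope = {line.Slope}, intercept = {line.Intercept}"); // prints "slope = 2, intercept = 1"
```

Since the sample points sit exactly on a line, the fit recovers that line. With real data such as the poverty data set, the points scatter around the trendline instead.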



Let's look at the code that loads the CSV data into the MLContext and then uses the TrainTestSplit method to split it into training data and test data. Note also the classes Input and Output and their use of the LoadColumn and ColumnName attributes.

Program.cs



void Main()
{

	string inputFile = Path.Combine(Path.GetDirectoryName(Util.CurrentQueryPath)!, @"Sampledata\poverty2.csv"); //linqpad tech

	var context = new MLContext(seed: 0);

	// Load the data from the CSV file
	var data = context.Data
		.LoadFromTextFile<Input>(inputFile, hasHeader: true, separatorChar: ';');
	
	// Split data into training and test sets 
	var split = context.Data.TrainTestSplit(data, testFraction: 0.2);
	var trainData = split.TrainSet;
	var testData = split.TestSet;

	var pipeline = context
		.Transforms.NormalizeMinMax("PovertyRate")
		.Append(context.Transforms.Concatenate("Features", "PovertyRate"))
		.Append(context.Regression.Trainers.Ols());

	var model = pipeline.Fit(trainData);
	// Use the model to make a prediction
	var predictor = context.Model.CreatePredictionEngine<Input, Output>(model);
	var input = new Input { PovertyRate = 8.4f };

	var actual = 36.8f;

	var prediction = predictor.Predict(input);
	Console.WriteLine($"Input poverty rate: {input.PovertyRate} . Predicted birth rate per 1000: {prediction.TeenageBirthRate:0.##}");
	Console.WriteLine($"Actual birth rate per 1000: {actual}");

	// Evaluate the regression model 
	var predictions = model.Transform(testData);
	var metrics = context.Regression.Evaluate(predictions);
	Console.WriteLine($"R-squared: {metrics.RSquared:0.##}");
	Console.WriteLine($"Root Mean Squared Error: {metrics.RootMeanSquaredError:0.##}");
	Console.WriteLine($"Mean Absolute Error: {metrics.MeanAbsoluteError:0.##}");
	Console.WriteLine($"Mean Squared Error: {metrics.MeanSquaredError:0.##}");


	// The lambda parameter is named row so it does not shadow the local variable input above
	PlotScatterGraph<Input>(context, trainData, (Input row) => 
		new PointItem { X = (float) Math.Round(row.PovertyRate, 2), Y = (float) Math.Round(row.TeenageBirthRate, 2) },
		"Poverty rate (%) vs Teenage Pregnancies per 1,000 births");

}

public class PointItem {
	public float X { get; set; }
	public float Y { get; set; }
}

void PlotScatterGraph<T>(MLContext mlContext, IDataView trainData, Func<T, PointItem> pointCreator, string chartTitle) where T : class, new()
{
	//Convert the IDataview to an enumerable collection
	var data = mlContext.Data.CreateEnumerable<T>(trainData, reuseRowObject: false).Select(x => pointCreator(x)).ToList();

	// Calculate trendline (simple linear regression)
	double avgX = data.Average(d => d.X);
	double avgY = data.Average(d => d.Y);
	double slope = data.Sum(d => (d.X - avgX) * (d.Y - avgY)) / data.Sum(d => (d.X - avgX) * (d.X - avgX));
	double intercept = avgY - slope * avgX;
	// The trendline value for each point is slope * X + intercept, computed directly below

	//Plot the scatter graph
	var plot = data.Chart(d => d.X)
		.AddYSeries(d => d.Y, LINQPad.Util.SeriesType.Point, chartTitle)
		.AddYSeries(d => slope * d.X + intercept, Util.SeriesType.Line, "Trendline")
		.ToWindowsChart();
		
	plot.AntiAliasing = System.Windows.Forms.DataVisualization.Charting.AntiAliasingStyles.All;
	plot.Dump();
}



public class Input
{

	[LoadColumn(1)]
	public float PovertyRate;

	[LoadColumn(5), ColumnName("Label")]
	public float TeenageBirthRate { get; set; }

}
public class Output
{
	[ColumnName("Score")]
	public float TeenageBirthRate;

}


The code above defines a machine learning pipeline and uses it as follows:
  • The NormalizeMinMax method rescales the poverty rate to a normalized scale between 0 and 1. The Concatenate method specifies the "Features" column. In this case the poverty rate is the only feature, and from it we want to predict a score, which is the rate of teenage births per 1,000 births. Note that the CSV data set contains more columns, but this is a simple regression where only one variable is taken into account.
  • The trainer used to train the model is Ols, Ordinary Least Squares.
  • The Fit method trains the model on the training data produced by the TrainTestSplit method.
  • The resulting model is used to create a prediction engine.
  • Using the prediction engine, it is possible to predict a value with the Predict method, given one input item. The prediction engine takes input objects of type Input and returns objects of type Output.
  • Using the test data, the model's Transform method gives us multiple predictions, and it is possible to evaluate the regression from these predictions to check how accurate the model is.
  • The evaluation returns, for example, R-squared. This value is at most 1.0 and usually falls between 0 and 1 (it can turn negative when the model fits worse than always predicting the mean). It describes how much of the total variation in the data the regression accounts for. A residual is the offset between an actual data point, as plotted in the scatter graph, and what the regression model predicts for it.
  • Other values such as RMSE and MSE are the root mean squared error and the mean squared error. Unlike R-squared, these are absolute values, expressed in the units of the label (squared units for MSE).
  • Using the code above we got a fairly accurate regression model (R-squared of 0.59 on the test data), but more accuracy could likely be achieved by taking additional factors into account.
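
The metrics from the evaluation step can be reproduced by hand, which may help in reading them. The following is a minimal sketch as C# top-level statements, assuming the usual definitions (R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares). The helper name Evaluate and the numbers are made up for illustration. They are not the poverty data set and not ML.NET's own implementation.

```csharp
using System;
using System.Linq;

// Hypothetical helper: computes regression metrics from actual and predicted values.
static (double RSquared, double Rmse, double Mae, double Mse) Evaluate(double[] actual, double[] predicted)
{
	// Residual = actual value minus predicted value.
	var residuals = actual.Zip(predicted, (a, p) => a - p).ToArray();
	double mse = residuals.Average(r => r * r);       // mean squared error
	double mae = residuals.Average(r => Math.Abs(r)); // mean absolute error
	double mean = actual.Average();
	double ssRes = residuals.Sum(r => r * r);                // variation left unexplained
	double ssTot = actual.Sum(a => (a - mean) * (a - mean)); // total variation around the mean
	return (1 - ssRes / ssTot, Math.Sqrt(mse), mae, mse);
}

// Made-up values for illustration only.
double[] actualValues = { 10, 20, 30, 40 };
double[] predictedValues = { 12, 18, 33, 37 };
var metrics = Evaluate(actualValues, predictedValues);
Console.WriteLine($"R-squared: {metrics.RSquared:0.##}");
Console.WriteLine($"Root Mean Squared Error: {metrics.Rmse:0.##}");
Console.WriteLine($"Mean Absolute Error: {metrics.Mae:0.##}");
Console.WriteLine($"Mean Squared Error: {metrics.Mse:0.##}");
```

RMSE is the square root of MSE by definition. The same relation shows up in the article's own output, where 8.99 squared is approximately 80.83.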


Output from the LINQPad 8 query shown in this article:

    Input poverty rate: 8,4 . Predicted birth rate per 1000: 35,06
    Actual birth rate per 1000: 36,8
    R-squared: 0,59
    Root Mean Squared Error: 8,99
    Mean Absolute Error: 8,01
    Mean Squared Error: 80,83

Please note that there are some standard column names used for machine learning:
  • Label: represents the target variable (the value to predict).
  • Features: contains the input features used for training.
  • Score: stores the predicted value in regression models.
  • PredictedLabel: holds the predicted class in classification models.
  • Probability: represents the probability of a predicted class.
  • FeatureContributions: shows how much each feature contributes to a prediction.

In the code above, the column names "Label", "Features" and "Score" tell ML.NET which columns the regression should use. The ColumnName attribute sets "Label" and "Score" on the Input and Output classes, and the Concatenate method creates the "Features" column.

Tuesday, 15 April 2025

Adding plugins to use for Semantic Kernel

With Microsoft Semantic Kernel, it is possible to consume AI services such as Azure AI and OpenAI with less code, since the framework simplifies and standardizes how these services are consumed. A GitHub repo with the code shown in this article is available here:

https://github.com/toreaurstadboss/SemanticKernelPluginDemov4

The demo code is a Blazor Server app that demonstrates how to use Microsoft Semantic Kernel with plugins. I have chosen the Northwind database as the extra data the plugin will use. By debugging and inspecting the output, I can see that the plugin is successfully called and used.

Plugins are easy to add, and they provide additional data to the AI model. This makes them suitable for AI-powered solutions built on private data that you want to make available to the model. For example, when using OpenAI's GPT-4 chat model, a plugin makes it possible to specify which data is to be presented and displayed. It is a convenient way to provide a natural language interface for data reporting, such as listing results from the plugin. A plugin exposes kernel functions by putting attributes on its methods. Let's first look at the Program.cs file, which wires up Semantic Kernel for the Blazor Server demo app.

Program.cs



using Microsoft.EntityFrameworkCore;
using Microsoft.SemanticKernel;
using SemanticKernelPluginDemov4.Models;
using SemanticKernelPluginDemov4.Services;


namespace SemanticKernelPluginDemov4
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var builder = WebApplication.CreateBuilder(args);

            // Add services to the container.
            builder.Services.AddRazorPages();
            builder.Services.AddServerSideBlazor();

            // Add DbContext
            builder.Services.AddDbContextFactory<NorthwindContext>(options =>
                options.UseSqlServer(builder.Configuration.GetConnectionString("DefaultConnection")));

            builder.Services.AddScoped<IOpenAIChatcompletionService, OpenAIChatcompletionService>();

            builder.Services.AddScoped<NorthwindSemanticKernelPlugin>();

            builder.Services.AddScoped(sp =>
            {
                var kernelBuilder = Kernel.CreateBuilder();
                kernelBuilder.AddOpenAIChatCompletion(modelId: builder.Configuration.GetSection("OpenAI").GetValue<string>("ModelId")!,
                    apiKey: builder.Configuration.GetSection("OpenAI").GetValue<string>("ApiKey")!);

                var kernel = kernelBuilder.Build();

                var dbContextFactory = sp.GetRequiredService<IDbContextFactory<NorthwindContext>>();
                var northwindSemanticKernelPlugin = new NorthwindSemanticKernelPlugin(dbContextFactory);
                kernel.ImportPluginFromObject(northwindSemanticKernelPlugin);

                return kernel;
            });

            var app = builder.Build();

            // Configure the HTTP request pipeline.
            if (!app.Environment.IsDevelopment())
            {
                app.UseExceptionHandler("/Error");
                // The default HSTS value is 30 days. You may want to change this for production scenarios, see https://aka.ms/aspnetcore-hsts.
                app.UseHsts();
            }

            app.UseHttpsRedirection();

            app.UseStaticFiles();

            app.UseRouting();

            app.MapBlazorHub();
            app.MapFallbackToPage("/_Host");

            app.Run();
        }
    }
}



In the code above, note the following:
  • The usage of IDbContextFactory for creating a db context, injected into the plugin. This is a Blazor Server app, where client and server keep a long-lived connection over SignalR, so a scoped db context would live as long as that connection. The factory interface is used instead to create short-lived db context instances as needed.
  • The usage of the method ImportPluginFromObject to import the plugin into the semantic kernel built here. Note that the kernel is registered as a scoped service. The plugin is also registered as a scoped service, although the kernel factory above constructs its own plugin instance from the db context factory.
The plugin looks like this.

NorthwindSemanticKernelplugin.cs



using Microsoft.EntityFrameworkCore;
using Microsoft.SemanticKernel;
using SemanticKernelPluginDemov4.Models;
using System.ComponentModel;

namespace SemanticKernelPluginDemov4.Services
{

    public class NorthwindSemanticKernelPlugin
    {
        private readonly IDbContextFactory<NorthwindContext> _northwindContext;

        public NorthwindSemanticKernelPlugin(IDbContextFactory<NorthwindContext> northwindContext)
        {
            _northwindContext = northwindContext;
        }

        [KernelFunction]
        [Description("When asked about the suppliers of Northwind database, use this method to get all the suppliers. Inform that the data comes from the Semantic Kernel plugin called : NorthwindSemanticKernelPlugin")]
        public async Task<List<string>> GetSuppliers()
        {
            using (var dbContext = _northwindContext.CreateDbContext())
            {
                return await dbContext.Suppliers.OrderBy(s => s.CompanyName).Select(s => "Kernel method 'NorthwindSemanticKernelPlugin:GetSuppliers' gave this: " + s.CompanyName).ToListAsync();
            }
        }

        [KernelFunction]
        [Description("When asked about the total sales of a given month in a year, use this method. In case asked for multiple months, call this method again multiple times, adjusting the month and year as provided. The month and year is to be in the range 1-12 for months and for year 1996-1998. Suggest for the user what the valid ranges are in case other values are provided.")]
        public async Task<decimal> GetTotalSalesInMontAndYear(int month, int year)
        {
            using (var dbContext = _northwindContext.CreateDbContext())
            {
                var sumOfOrders = await (from Order o in dbContext.Orders
                             join OrderDetail od in dbContext.OrderDetails on o.OrderId equals od.OrderId
                             where o.OrderDate.HasValue && (o.OrderDate.Value.Month == month
                             && o.OrderDate.Value.Year == year) 
                             select (od.UnitPrice * od.Quantity) * (1 - (decimal)od.Discount)).SumAsync();

                return sumOfOrders;
            }
        }

    }
}


In the code above, note the attributes used. KernelFunction marks a method as one that Semantic Kernel can call. The Description attribute tells the LLM when the method is to be called and how to provide parameter values, if any. Let's look at the OpenAI service next.
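
The arithmetic inside GetTotalSalesInMontAndYear can be tried without a database. The following is a minimal sketch over in-memory data, written as C# top-level statements. The tuple shapes, the helper name TotalSales and the sample rows are made up for illustration and only mimic the Order and OrderDetail columns used in the plugin's query.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for the plugin's LINQ query:
// sums UnitPrice * Quantity * (1 - Discount) for orders in the given month and year.
static decimal TotalSales(
    (int OrderId, DateTime? OrderDate)[] orders,
    (int OrderId, decimal UnitPrice, int Quantity, float Discount)[] details,
    int month, int year) =>
    (from o in orders
     join od in details on o.OrderId equals od.OrderId
     where o.OrderDate.HasValue && o.OrderDate.Value.Month == month && o.OrderDate.Value.Year == year
     select od.UnitPrice * od.Quantity * (1 - (decimal)od.Discount)).Sum();

// Made-up rows: two orders in March 1997, one in April 1997, one without a date.
var sampleOrders = new (int OrderId, DateTime? OrderDate)[]
{
    (1, new DateTime(1997, 3, 5)),
    (2, new DateTime(1997, 3, 20)),
    (3, new DateTime(1997, 4, 1)),
    (4, null),
};
var sampleDetails = new (int OrderId, decimal UnitPrice, int Quantity, float Discount)[]
{
    (1, 10.00m, 2, 0f),    // 20.00
    (1, 5.00m, 4, 0.25f),  // 20.00 less 25 % discount = 15.00
    (2, 8.50m, 10, 0f),    // 85.00
    (3, 100.00m, 1, 0f),   // April, excluded from the March total
    (4, 1.00m, 1, 0f),     // no order date, excluded
};

// March 1997 adds up to 20 + 15 + 85 = 120.
Console.WriteLine(TotalSales(sampleOrders, sampleDetails, month: 3, year: 1997));
```

Orders without a date are filtered out by the HasValue check, which mirrors the where clause in the plugin.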

OpenAIChatcompletionService.cs



using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

namespace SemanticKernelPluginDemov4.Services
{

    public class OpenAIChatcompletionService : IOpenAIChatcompletionService
    {
        private readonly Kernel _kernel;

        private IChatCompletionService _chatCompletionService;

        public OpenAIChatcompletionService(Kernel kernel)
        {
            _kernel = kernel;
            _chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
        }

        public async IAsyncEnumerable<string?> RunQuery(string question)
        {
            var chatHistory = new ChatHistory();

            chatHistory.AddSystemMessage("You are a helpful assistant that only answers questions about the Northwind database. If you get other questions, inform the user that you can only answer questions about the Northwind database. It is important that only the provided Northwind database functions, added to the language model through the plugin, are used when answering the questions. If no answer is available, say so.");

            chatHistory.AddUserMessage(question);

            await foreach (var chatUpdate in _chatCompletionService.GetStreamingChatMessageContentsAsync(chatHistory, CreateOpenAIExecutionSettings(), _kernel))
            {
                yield return chatUpdate.Content;
            }
        }

        private OpenAIPromptExecutionSettings? CreateOpenAIExecutionSettings()
        {
            return new OpenAIPromptExecutionSettings
            {
                ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
            };
        }

    }
}



In the code above, the kernel is injected into this service. This works because the kernel was registered as a scoped service in Program.cs. The kernel's GetRequiredService method is similar to the method of the same name on IServiceProvider used inside Program.cs. Note that ToolCallBehavior is set to AutoInvokeKernelFunctions, which lets Semantic Kernel call the plugin's kernel functions automatically when the model requests them. Extending the AI-powered functionality with plugins requires little extra code with Microsoft Semantic Kernel. A screenshot of the demo is shown below.