Showing posts with label Machine Learning. Show all posts
Showing posts with label Machine Learning. Show all posts

Tuesday, 22 April 2025

Predicting variable using Regression with ML.net

This article will look at regression with ML.net In the example, the variable "Poverty rate" measured as a percentage against amount "teenage pregnancies" per 1,000 birth. The data is fetched from a publicly available CSV file. The data is obtained from Jeff Prosise repos of ML.net here on Github:

https://github.com/jeffprosise/ML.NET/blob/master/MLN-SimpleRegression/MLN-SimpleRegression/Data/poverty.csv



In this article, Linqpad 8 will be used. First off, the following two Nuget packages are added :
  • Microsoft.ML
  • Microsoft.ML.Mkl.Components
The following method will plot a scatter graph from provided MLContext data, and add a standard linear trendline, which will work with the example in this article.

Plotutils.cs



void PlotScatterGraph<T>(MLContext mlContext, IDataView trainData, Func<T, PointItem> pointCreator, string chartTitle) where T : class, new()
{
	//Convert the IDataview to an enumerable collection
	var data = mlContext.Data.CreateEnumerable<T>(trainData, reuseRowObject: false).Select(x => pointCreator(x)).ToList();

	// Calculate trendline (simple linear regression)
	double avgX = data.Average(d => d.X);
	double avgY = data.Average(d => d.Y);
	double slope = data.Sum(d => (d.X - avgX) * (d.Y - avgY)) / data.Sum(d => (d.X - avgX) * (d.X - avgX));
	double intercept = avgY - slope * avgX;
	var trendline = data.Select(d => new { X = d.X, Y = slope * d.X + intercept }).ToList();

	//Plot the scatter graph
	var plot = data.Chart(d => d.X)
		.AddYSeries(d => d.Y, LINQPad.Util.SeriesType.Point, chartTitle)
		.AddYSeries(d => trendline.FirstOrDefault(t => t.X == d.X)?.Y ?? 0, Util.SeriesType.Line, "Trendline")
		.ToWindowsChart();
		
	plot.AntiAliasing = System.Windows.Forms.DataVisualization.Charting.AntiAliasingStyles.All;
	plot.Dump();
}



Let's look at the code for loading the CSV data and into the MLContext and then used the method TrainTestSplit to split the data into training data and testing data. Note also the classes Input and Output and the usage of LoadColumn and ColumnName

Program.cs



void Main()
{

	string inputFile = Path.Combine(Path.GetDirectoryName(Util.CurrentQueryPath)!, @"Sampledata\poverty2.csv"); //linqpad tech

	var context = new MLContext(seed: 0);

	//Train the model 
	var data = context.Data
		.LoadFromTextFile<Input>(inputFile, hasHeader: true, separatorChar: ';');
	
	// Split data into training and test sets 
	var split = context.Data.TrainTestSplit(data, testFraction: 0.2);
	var trainData = split.TrainSet;
	var testData = split.TestSet;

	var pipeline = context
		.Transforms.NormalizeMinMax("PovertyRate")
		.Append(context.Transforms.Concatenate("Features", "PovertyRate"))
		.Append(context.Regression.Trainers.Ols());

	var model = pipeline.Fit(trainData);
	// Use the model to make a prediction
	var predictor = context.Model.CreatePredictionEngine<Input, Output>(model);
	var input = new Input { PovertyRate = 8.4f };

	var actual = 36.8f;

	var prediction = predictor.Predict(input);
	Console.WriteLine($"Input poverty rate: {input.PovertyRate} . Predicted birth rate per 1000: {prediction.TeenageBirthRate:0.##}");
	Console.WriteLine($"Actual birth rate per 1000: {actual}");

	// Evaluate the regression model 
	var predictions = model.Transform(testData);
	var metrics = context.Regression.Evaluate(predictions);
	Console.WriteLine($"R-squared: {metrics.RSquared:0.##}");
	Console.WriteLine($"Root Mean Squared Error: {metrics.RootMeanSquaredError:0.##}");
	Console.WriteLine($"Mean Absolute Error: {metrics.MeanAbsoluteError:0.##}");
	Console.WriteLine($"Mean Squared Error: {metrics.MeanSquaredError:0.##}");


	PlotScatterGraph<Input>(context, trainData, (Input input) => 
		new PointItem { X = (float) Math.Round(input.PovertyRate, 2), Y = (float) Math.Round(input.TeenageBirthRate, 2) },
		"Poverty rate (%) vs Teenage Pregnancies per 1,000 birth");

}

public class PointItem {
	public float X { get; set; }
	public float Y { get; set; }
}

void PlotScatterGraph<T>(MLContext mlContext, IDataView trainData, Func<T, PointItem> pointCreator, string chartTitle) where T : class, new()
{
	//Convert the IDataview to an enumerable collection
	var data = mlContext.Data.CreateEnumerable<T>(trainData, reuseRowObject: false).Select(x => pointCreator(x)).ToList();

	// Calculate trendline (simple linear regression)
	double avgX = data.Average(d => d.X);
	double avgY = data.Average(d => d.Y);
	double slope = data.Sum(d => (d.X - avgX) * (d.Y - avgY)) / data.Sum(d => (d.X - avgX) * (d.X - avgX));
	double intercept = avgY - slope * avgX;
	var trendline = data.Select(d => new { X = d.X, Y = slope * d.X + intercept }).ToList();

	//Plot the scatter graph
	var plot = data.Chart(d => d.X)
		.AddYSeries(d => d.Y, LINQPad.Util.SeriesType.Point, chartTitle)
		.AddYSeries(d => trendline.FirstOrDefault(t => t.X == d.X)?.Y ?? 0, Util.SeriesType.Line, "Trendline")
		.ToWindowsChart();
		
	plot.AntiAliasing = System.Windows.Forms.DataVisualization.Charting.AntiAliasingStyles.All;
	plot.Dump();
}



public class Input
{

	[LoadColumn(1)]
	public float PovertyRate;

	[LoadColumn(5), ColumnName("Label")]
	public float TeenageBirthRate { get; set; }

}
public class Output
{
	[ColumnName("Score")]
	public float TeenageBirthRate;

}


A pipeline is defined for the machine learning here consisting of the following :
  • The method NormalizeMinMax will transform the poverty rate into a normalized scale between 0 and 1. The Concatenate method will be used to specify the "Features", in this case only the column Poverty rate is the feature of which we want to predict a score, this is the rate of teenage pregnancy births per 1,000 births. Note that our CSV data set contains more columns, but this is a simple regression where only one variable is taken into account.
  • The trainers used to train the machine learning algorithm is Ols, the Ordinary Least Squares.
  • The method fit will train using the training data defined from the method TrainTestSplit.
  • The resulting model is used to create a prediction engine.
  • Using the prediction engine, it is possible to predict a value value using the Predict method given one input item. Our prediction engine expects input objects of type Input and Output.
  • Using the testdata, the method Transform using the model gives us multiple predictions and it is possible to evalute the regression analysis from the predictions to check how accurate the regression model is.
  • Returning from this evaluation, we get the R-squared for example. This is a value from 0 to 1.0 where it describes how accurate the regression is in in describing the total variation of the residues of the model, the amount the data when plotted in a scatter graph where residue is the offset between the actual data and what the regression model predicts.
  • Other values such as RMSE and MSE are the root and mean squared error, which are absolute values.
  • Using the code above we got a fairly accurate regression model, but more accuracy would be achieved by taking in additional factors.


  • Output from the Linqpad 8 application shown in this article :
    
        
    Input poverty rate: 8,4 . Predicted birth rate per 1000: 35,06
    Actual birth rate per 1000: 36,8
    R-squared: 0,59
    Root Mean Squared Error: 8,99
    Mean Absolute Error: 8,01
    Mean Squared Error: 80,83
        
      
    
    Please note that there are some standard column names used for machine learning.
    
    Label: Represents the target variable (the value to predict).
    
    Features: Contains the input features used for training.
    
    Score: Stores the predicted value in regression models.
    
    PredictedLabel: Holds the predicted class in classification models.
    
    Probability: Represents the probability of a predicted class.
    
    FeatureContributions: Shows how much each feature contributes to a prediction.
    
    
    In the code above, the column names "Label", "Features" and "Score" was used to instruct the regression being calculated in the code here for ML.Net context model. The attribute ColumnName was being used here together with the Concatenate method.

Saturday, 22 March 2025

Image classification using ML.NET Machine Learning

I added a demo using ML.Net in a Github. The demo is available in this repository :

https://github.com/toreaurstadboss/ImageClassificationMLNetBlazorDemo

A screenshot shows the application running below :

ML.Net is Microsoft's machine learning library. It is combined with tooling inside VS 2022 an easy way to locally use machine learning models on your CPU or GPU, or hosted in Azure cloud services. The website for ML.Net is available here for more information about ML.Net and documentation:

https://dotnet.microsoft.com/en-us/apps/ai/ml-dotnet

In the demo above I have trained the model to recognize either horses or mooses. These species are both mammals and herbivores and somewhat are similar in appearance. I have trained the machine learning model in this demo only with ten images of each category, then again with ten other test images that checks if the model recognizes correctly if we see a horse or a moose. Already with just ten images, it did not miss once, and of course a better example for a real world machine learning model would have scoured over tens of thousand of images to handle all edge cases. ML.Net is very easy to run, it can be run locally on your own machine, using the CPU or GPU. The GPU must be CUDA compatible. That actually means you need a NVIDIA card with 8-series. I got such a card on a laptop of mine and have tested it. The following links points to download pages of NVIDIA for downloading the necessary software as of March 2025 to run ML.Net image classification functionality on GPUs :

Download Cuda 10.1

Cuda 10.1 can be downloaded from here: https://developer.nvidia.com/cuda-10.1-download-archive-base

CuDnn 7.6.4

CuDnn can be downloaded from here: https://developer.nvidia.com/rdp/cudnn-archive

Getting started with image classification using ML.Net

It is easiest to use VS 2022 to add a ML.Net machine learning model. Inside VS 2022, right click your project and choose Add and choose Machine Learning Model In case you do not see this option, hit the start menu and type in Visual Studio installer Now, hit the button Modify for your VS installation. Choose Individual Components Search for 'ml'. Select the ML.NET Model Builder. There are also a package called ML.NET Model Builder 2022, I also chose that.
Choosing the scenario
Now, after adding the Machine Learning model, the first page asks for a scenario. I choose Image Classification here, below Computer Vision scenario category.

Choosing the environment
Then I hit the button Local. In the next step, I select Local (CPU). Note that I have tested also Nvidia Cuda-compatible graphics card / GPU on another laptop and it also worked great and should be preferred if you have a GPU compatible and have installed Cuda 10.1 and Cdnn 7.6.4 as shown in links above.

Hit the button Next Step.
Choosing the Data
It is time to train the machine learning model with data ! I have gathered ten sample images of mooses and horses each. By pointing to a folder with images where each category of images are gathered in subfolders of this folder.

Next step is Train
Training the model
Here you can hit the button Train again. When you have trained enough here the model, you can hit the button Next step . Training the machine learning will take some time depending on you using CPU or GPU and the number of input images here. Usually it takes a few seconds, but not many minutes to churn through a couple of images as shown here, 20 images in total.

Loading up the image data and using the machine learning model

Note that ML.Net demands support to renderinteractive rendering of web apps, pure Blazor WASM apps are not supported. The following file shows how the Blazor serverside app is set up.

Program.cs

using ImageClassificationMLNetBlazorDemo.Components;

var builder = WebApplication.CreateBuilder(args);

// Add services to the container.
builder.Services.AddRazorComponents()
    .AddInteractiveServerComponents();

var app = builder.Build();

// Configure the HTTP request pipeline.
if (!app.Environment.IsDevelopment())
{
    app.UseExceptionHandler("/Error", createScopeForErrors: true);
    // The default HSTS value is 30 days. You may want to change this for production scenarios, see https://aka.ms/aspnetcore-hsts.
    app.UseHsts();
}

app.UseHttpsRedirection();

app.UseStaticFiles();
app.UseAntiforgery();

app.MapRazorComponents<App>()
    .AddInteractiveServerRenderMode();

app.Run();


InteractiveServer is set up inside the App.razor using the HeadOutlet.

App.razor


<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <base href="/" />
    <link rel="stylesheet" href="app.css" />
    <link rel="stylesheet" href="lib/bootstrap/css/bootstrap.min.css" />
    <link rel="stylesheet" href="ImageClassificationMLNetBlazorDemo.styles.css" />
    <link rel="icon" type="image/png" href="favicon.png" />

    <HeadOutlet @rendermode="InteractiveServer" />
</head>

<body>
    <Routes @rendermode="InteractiveServer" />
    <script src="_framework/blazor.web.js"></script>
</body>

</html>


The following codebehind of the razor component Home.razor in the demo repo shows how a file uploaded using the InputFile control in Blazor serverside. Home.razor.cs


@code {

    private string? _base64ImageSource = null;
    private string? _predictedLabel = "No classification";
    private IOrderedEnumerable<KeyValuePair<string, float>>? _predictedLabels = null;
    private int? _assessedPredictionQuality = null;
    private string? _errorMessage = null;

    private async Task LoadFileAsync(InputFileChangeEventArgs e)
    {
        try
        {
            ResetPrivateFields();

            if (e.File.Size <= 0 || e.File.Size >= 2 * 1024 * 1024)
            {
                _errorMessage = "Sorry, the uploaded image but be between 1 byte and 2 MB!";
                return;
            }

            byte[] imageBytes = await GetImageBytes(e.File);
            _base64ImageSource = GetBase64ImageSourceString(e.File.ContentType, imageBytes);

            PredictImageClassification(imageBytes);

        }
        catch (Exception err)
        {
            Console.WriteLine(err);
        }
    }

    private void ResetPrivateFields()
    {
        _base64ImageSource = null;
        _predictedLabel = null;
        _predictedLabels = null;
        _assessedPredictionQuality = null;
    }

    private int GetAssesPrediction()
    {
        int result = 1;
        if (_predictedLabel != null && _predictedLabels != null)
        {
            foreach (var label in _predictedLabels)
            {
                if (label.Key == _predictedLabel)
                {
                    result = label.Value switch
                    {
                        <= 0.50f => 1,
                        <= 0.70f => 2,
                        <= 0.80f => 3,
                        <= 0.85f => 4,
                        <= 0.90f => 5,
                        <= 1.0f => 6,
                        _ => 1 //default to dice we get some other score here..
                    };
                }
            }
        }

        return result;
    }

    private void PredictImageClassification(byte[] imageBytes)
    {

        var input = new ModelInput
            {
                ImageSource = imageBytes
            };
        ModelOutput output = HorseOrMooseImageClassifier.Predict(input);
        _predictedLabel = output.PredictedLabel;

        _predictedLabels = HorseOrMooseImageClassifier.PredictAllLabels(input);

        _assessedPredictionQuality = GetAssesPrediction(); //check how good the prediction is, give a score from 1-6 (dice score!)

        StateHasChanged();
    }


    private async Task<byte[]> GetImageBytes(IBrowserFile file) 
    {
        using MemoryStream memoryStream = new();
        var stream = file.OpenReadStream(2 * 1024 * 1024, CancellationToken.None);
        await stream.CopyToAsync(memoryStream);
        return memoryStream.ToArray();
    }

    private string GetBase64ImageSourceString(string contentType, byte[] bytes)
    {
        string preAmble = $"data:{contentType};base64,";
        return $"{preAmble}{(Convert.ToBase64String(bytes))}";
    }
}


As the code shows above, using the machine learning model is quite convenient, we just use the methods Predict to get the Label that is decided exists in the loaded image. This is the image classiciation that the machine learning found. Note that using the method PredictAllLabels get the confidence of the different labels show in this demo. There are no limitations on the number of categories here in the image classification labels that one could train a model to look after. A benefit with ML.Net is the option to use it on-premise servers and get fairly good result on just a few sample images. But the more sample images you obtain for a label, the more precise the machine learning model will become. It is possible to download a pre-trained model such as Inceptionv3 that is compatible with Tensorflow used here that supports up to 1000 categories. More information is available here from Microsoft about using a pre-trained model such as InceptionV3:

https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/image-classification