Sunday, 18 May 2025

Displaying weather data using Seaborn with Python

Using the library Seaborn, built on top of MatPlotLib, displaying weather data is convenient. Using Anaconda and Jupyter Notebook, working on the data is user friendly. First off, let's look at a data set containing some Norwegian weather data. The following data set, contains weather data from Norway from 2020-2021 for 55 meterological stations. (13,61 MB in size), available on DbCL v1.0 license (meaning, free of use , 'as-is' warranty) on the Kaggle.com website.

https://www.kaggle.com/datasets/annbengardt/noway-meteorological-data/data

First off, the following imports are done in the Jupyter Notebook. This is a free IDE part of the Anaconda distribution that is prepared for data visualization, that runs in a browser.

NorwayMeteoDemo1.pynb


import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

Next off, using Pandas, Python's Data Analysis Library, the data is prepared from the mentioned dataset. The format is in CSV format. Also, columns are added dynamically to the dataset. A moving 14-days average of the daily maxmimum air temperature is added using the rolling method and setting window to 14. Also, a date column is added. Our dataset contains three int64 values day, month and year. We combine these to create a date column.

NorwayMeteoDemo1.pynb


df = pd.read_csv("datasets/weather/NorwayMeteoDataCompleted.csv")
df['moving_max_air_temp_avg'] = df['max(air_temperature P1D)'].rolling(window = 14, min_periods = 1).mean()
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])

Note the use of min_periods set to 1 for the moving average. Or else, you will get NaN in the start of the data of your created moving average column and the way Python works, it will cause NaN for all the next periods too ! Next, choosing what data to display. The following data will be shown in the demo.
  • Station id : SN69100 (This is Værnes - Trondheim Airport weather station by the way,
  • Year 2020
The station ids can be looked up here on this user's Github GIST: https://gist.github.com/ofaltins/c1f0158f1766c8bd695b2c8d545c052c The following code set up the filtered data and saves it into a variable

NorwayMeteoDemo1.pynb


filtered_df_2020= df[ 
    (df['sourceId'] == 'SN69100')  &
    (df['year'] == 2020)
]

Next up, the data is then displayed. The demo will show two lineplots, first the maximum air temperature as a lineplot using Seaborn. In addition, another line plot with the moving 14 days average of the daily maximum air temperature. Also, a bar plot is added below these two line plots. Note that the bar plot used here is bar and not barplot. Barplot is available in Seaborn, while Bar is available in the MatPlotLib.

NorwayMeteoDemo1.pynb



sns.set_style('whitegrid') # set the style to 'whitegrid' 

# Create a 2x1 subplot

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10), sharex=True)

sns.lineplot(data = filtered_df_2020, x = filtered_df_2020['date'], y = filtered_df_2020['max(air_temperature P1D)'], label='Daily max air temperature (C)', linewidth = 1.5, color = 'pink', ax = ax1)

sns.lineplot(data = filtered_df_2020, x = filtered_df_2020['date'], y = filtered_df_2020['moving_max_air_temp_avg'], label='14-day moving average Daily max air temperature (C)', color = 'red', linewidth = 1.8, ax = ax1)

ax2.bar(filtered_df_2020['date'], filtered_df_2020['sum(precipitation_amount P1D)'], data = filtered_df_2020, color = 'blue', label = 'Daily sum precipitation (mm)')

ax1.set_title('Værnes - Weather data - 2020')

ax1.set_xlabel('Date of year')
ax1.set_ylabel('Daily max air temperature (C)')

ax2.set_xlabel('Date of year')
ax2.set_ylabel('Daily sum precipitation (mm)')



The code above shows the resulting figure consisting of a 2x1 subplots layout, the upper plot shows the 2020 daily maximum air temperature combined with a moving 14-days average as a smoothing or trending function to show the general temperature shifts every second week of the year in average. The lower plot shows a bar plot, using MatPlotLib, since Seaborn's bar plot does not handle dates as x-axis (in our dataset, datetime64 is used). Seaborn and MatPlotLib offers a ton of plotting functionality. Maybe it also could be of interest for .NET Developers to use it more often? My previous article showed how it is possible to render in the backend images using MatPlotLib and then display in a Blazor serverside app, combining both
Python and .NET. That demo used Python.Net library for the Python interop with .NET. The screenshot shows the Jupyter Notebook IDE, part of Anaconda distribution of Python tailored for data analysis and data science. It displays the plots described from the Python script above.

Monday, 5 May 2025

Using MatPlotLib from .NET

MatPlotLib is a powerful library for data visualization. It provides graphing for scientific computing. It can be used for doing both mathematical calculations and statistics. Together with additional libraries like NumPy or Numerical Python, it is clear that Python as a programming language and ecosystem provides a lot of powerful functionality that is also free to use. MatplotLib has a BSD license, which means it can be ued for personal, academic or commercial purposes without restrictions. This article will look at using MatplotLib from .NET. First off an image that displays the demo and example of using MatplotLib.

The source code shown in this article is available on Github here:

https://github.com/toreaurstadboss/SeabornBlazorVisualizer



Using MatPlotLib from .NET

First off, install Anaconda. Anaconda is a Python distribution that contains a large collection of data visualization libraries. A compatible version with the lastest version of Python.net Nuget library. The demo displayed here uses Anaconda version 2023.03.

Anaconda archived versions 2023.03 can be installed from here. Windows users can download the file: https://repo.anaconda.com/archive/Anaconda3-2023.03-1-Windows-x86_64.exe

https://repo.anaconda.com/archive/

Next up, install also Python 3.10 version. It will be used together with Anaconda. A 64-bit installer can be found here:

Python 3.10 installer (Windows 64-bits) The correct versions of NumPy and MatPlotLib can be checked against this list :

https://github.com/toreaurstadboss/SeabornBlazorVisualizer/blob/main/SeabornBlazorVisualizer/conda_list_loading_matplotlib_working_1st_May_2025.txt

Calculating the determinite integral of a function

The demo in this article show in the link at the top has got an appsettings.json file, you can adjust to your environment.

appsettings.json Application configuration file




{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  },
  "PythonConfig": {
    "PythonDllPath": "C:\\Python310\\Python310.dll",
    "PythonHome": "C:\\Programdata\\anaconda3",
    "PythonSitePackages":  "C:\\Programdata\\anaconda3\\lib\\site-packages",
    "PythonVersion": "3.10"
  },
  "AllowedHosts": "*"
}


Clone the source code and run the application. It is a Blazor server app. You can run it from VS 2022 for example. The following code shows how Python.net is set up to start using Python. Both Python 3.10 and Anaconda site libs are used here. The Python runtime and engine is set up using this helper class.

PythonInitializer.cs



using Microsoft.Extensions.Options;
using Python.Runtime;

namespace SeabornBlazorVisualizer.Data
{

    /// <summary>
    /// Helper class to initialize the Python runtime
    /// </summary>
    public static class PythonInitializer
    {

        private static bool runtime_initialized = false;

        /// <summary>
        /// Perform one-time initialization of Python runtime
        /// </summary>
        /// <param name="pythonConfig"></param>
        public static void InitializePythonRuntime(IOptions<PythonConfig> pythonConfig)
        {
            if (runtime_initialized)
                return;
            var config = pythonConfig.Value;

            // Set environment variables
            Environment.SetEnvironmentVariable("PYTHONHOME", config.PythonHome, EnvironmentVariableTarget.Process);
            Environment.SetEnvironmentVariable("PYTHONPATH", config.PythonSitePackages, EnvironmentVariableTarget.Process);
            Environment.SetEnvironmentVariable("PYTHONNET_PYDLL", config.PythonDllPath);
            Environment.SetEnvironmentVariable("PYTHONNET_PYVER", config.PythonVersion);

            PythonEngine.Initialize();

            PythonEngine.PythonHome = config.PythonHome ?? Environment.GetEnvironmentVariable("PYTHONHOME", EnvironmentVariableTarget.Process)!;
            PythonEngine.PythonPath = config.PythonDllPath ?? Environment.GetEnvironmentVariable("PYTHONNET_PYDLL", EnvironmentVariableTarget.Process)!;

            PythonEngine.BeginAllowThreads();
            AddSitePackagesToPythonPath(pythonConfig);
            runtime_initialized = true;
        }

        private static void AddSitePackagesToPythonPath(IOptions<PythonConfig> pythonConfig)
        {
            if (!runtime_initialized)
            {
                using (Py.GIL())
                {
                    dynamic sys = Py.Import("sys");
                    sys.path.append(pythonConfig.Value.PythonSitePackages);
                    Console.WriteLine(sys.path);

                    //add folders in solution this too with scripts
                    sys.path.append(@"Data/");
                }
            }
        }

    }
}



The following helper class sets up the site libraries we will use.

PythonHelper.cs



using Python.Runtime;

namespace SeabornBlazorVisualizer.Data
{

    /// <summary>
    /// Helper class to initialize the Python runtime
    /// </summary>
    public static class PythonHelper
    {

        /// <summary>
        /// Imports Python modules. Returned are the following modules:
        /// <para>np (numpy)</para>
        /// <para>os (OS module - standard library)</para>
        /// <para>scipy (scipy)</para>
        /// <para>mpl (matplotlib)</para>
        /// <para>plt (matplotlib.pyplot </para>
        /// </summary>
        /// <returns>Tuple of Python modules</returns>
        public static (dynamic np, dynamic os, dynamic scipy, dynamic mpl, dynamic plt) ImportPythonModules()
        {

            dynamic np = Py.Import("numpy");
            dynamic os = Py.Import("os");
            dynamic mpl = Py.Import("matplotlib");
            dynamic plt = Py.Import("matplotlib.pyplot");
            dynamic scipy = Py.Import("scipy");

            mpl.use("Agg");

            return (np, os, scipy, mpl, plt);
        }

    }
}



The demo is a Blazor server app. The following service will generate the plot of a determinite integral using MatPlotLib. The service saves the plot into a PNG file. This PNG file is saved into the folder wwwroot. The Blazor server app displays the image that was generated and saved.

MatPlotImageService.cs



using Microsoft.Extensions.Options;
using Python.Runtime;

namespace SeabornBlazorVisualizer.Data
{
    public class MatplotPlotImageService
    {

        private IOptions<PythonConfig>? _pythonConfig;

        private static readonly object _lock = new object();

        public MatplotPlotImageService(IOptions<PythonConfig> pythonConfig)
        {
            _pythonConfig = pythonConfig;
            PythonInitializer.InitializePythonRuntime(_pythonConfig);
        }

        public Task<string> GenerateDefiniteIntegral(string functionExpression, int lowerBound, int upperBound)
        {

            string? result = null;

            using (Py.GIL()) // Ensure thread safety for Python calls
            {
                dynamic np = Py.Import("numpy");
                dynamic plt = Py.Import("matplotlib.pyplot");

                dynamic patches = Py.Import("matplotlib.patches"); // Import patches module

                // Create a Python execution scope
                using (var scope = Py.CreateScope())
                {
                    // Define the function inside the scope
                    scope.Exec($@"
import numpy as np
def func(x):
    return {functionExpression}
");

                    // Retrieve function reference from scope
                    dynamic func = scope.Get("func");

                    // Define integration limits
                    double a = lowerBound, b = upperBound;

                    // Generate x-values
                    dynamic x = np.linspace(0, 10, 100); //generate evenly spaced values in range [0, 20], 100 values (per 0.1)
                    dynamic y = func.Invoke(x);

                    // Create plot figure
                    var fig = plt.figure();
                    var ax = fig.add_subplot(111);

                    // set title to function expression
                    plt.title(functionExpression);

                    ax.plot(x, y, "r", linewidth: 2);
                    ax.set_ylim(0, null);

                    // Select range for integral shading
                    dynamic ix = np.linspace(a, b, 100);
                    dynamic iy = func.Invoke(ix);

                    // **Fix: Separate x and y coordinates properly**
                    List<double> xCoords = new List<double> { a }; // Start at (a, 0)
                    List<double> yCoords = new List<double> { 0 };

                    int length = (int)np.size(ix);
                    for (int i = 0; i < length; i++)
                    {
                        xCoords.Add((double)ix[i]);
                        yCoords.Add((double)iy[i]);
                    }

                    xCoords.Add(b); // End at (b, 0)
                    yCoords.Add(0);

                    // Convert x and y lists to NumPy arrays
                    dynamic npVerts = np.column_stack(new object[] { np.array(xCoords), np.array(yCoords) });

                    // **Correctly Instantiate Polygon Using NumPy Array**
                    dynamic poly = patches.Polygon(npVerts, facecolor: "0.6", edgecolor: "0.2");
                    ax.add_patch(poly);

                    // Compute integral area
                    double area = np.trapezoid(iy, ix);
                    ax.text(0.5 * (a + b), 30, "$\\int_a^b f(x)\\mathrm{d}x$", ha: "center", fontsize: 20);
                    ax.text(0.5 * (a + b), 10, $"Area = {area:F2}", ha: "center", fontsize: 12);

                    plt.show();


                    result = SavePlot(plt, dpi: 150);
                }
            }
            return Task.FromResult(result);
        }

        public Task<string> GenerateHistogram(List<double> values, string title = "Provide Plot title", string xlabel = "Provide xlabel title", string ylabel = "Provide ylabel title")
        {
            string? result = null;
            using (Py.GIL()) //Python Global Interpreter Lock (GIL)
            {
                var (np, os, scipy, mpl, plt) = PythonHelper.ImportPythonModules();

                var distribution = np.array(values.ToArray());

                //// Ensure clearing the plot
                //plt.clf();

                var fig = plt.figure(); //create a new figure
                var ax1 = fig.add_subplot(1, 2, 1);
                var ax2 = fig.add_subplot(1, 2, 2);

                // Add style
                plt.style.use("ggplot");

                var counts_bins_patches = ax1.hist(distribution, edgecolor: "black");

                // Normalize counts to get colors 
                var counts = counts_bins_patches[0];
                var patches = counts_bins_patches[2];

                var norm_counts = counts / np.max(counts);

                int norm_counts_size = Convert.ToInt32(norm_counts.size.ToString());

                // Apply colors to patches based on frequency
                for (int i = 0; i < norm_counts_size; i++)
                {
                    plt.setp(patches[i], "facecolor", plt.cm.viridis(norm_counts[i])); //plt.cm is the colormap module in MatPlotlib. viridis creates color maps from normalized value 0 to 1 that is optimized for color-blind people.
                }

                // **** AX1 Histogram first - frequency counts ***** 

                ax1.set_title(title);
                ax1.set_xlabel(xlabel);
                ax1.set_ylabel(ylabel);

                string cwd = os.getcwd();

                // Calculate average and standard deviation
                var average = np.mean(distribution);
                var std_dev = np.std(distribution);
                var total_count = np.size(distribution);

                // Format average and standard deviation to two decimal places
                var average_formatted = np.round(average, 2);
                var std_dev_formatted = np.round(std_dev, 2);

                //Add legend with average and standard deviation
                ax1.legend(new string[] { $"Total count: {total_count}\n Average: {average_formatted} cm\nStd Dev: {std_dev_formatted} cm" }, framealpha: 0.5, fancybox: true);



                //***** AX2 : Set up ax2 = Percentage histogram next *******

                ax2.set_title("Percentage distribution");
                ax2.set_xlabel(xlabel);
                ax2.set_ylabel(ylabel);
                // Fix for CS1977: Cast the lambda expression to a delegate type
                ax2.yaxis.set_major_formatter((PyObject)plt.FuncFormatter(new Func<double, int, string>((y, _) => $"{y:P0}")));

                ax2.hist(distribution, edgecolor: "black", weights: np.ones(distribution.size) / distribution.size);

                // Format y-axis to show percentages
                ax2.yaxis.set_major_formatter(plt.FuncFormatter(new Func<double, int, string>((y, _) => $"{y:P0}")));

                // tight layout to prevent overlap 
                plt.tight_layout();

                // Show the plot with the two subplots at last (render to back buffer 'Agg', see method SavePlot for details)
                plt.show();

                result = SavePlot(plt, theme: "bmh", dpi: 150);
            }

            return Task.FromResult(result);
        }

        public Task<string> GeneratedCumulativeGraphFromValues(List<double> values)
        {
            string? result = null;
            using (Py.GIL()) //Python Global Interpreter Lock (GIL)
            {
                var (np, os, scipy, mpl, plt) = PythonHelper.ImportPythonModules();

                dynamic pythonValues = np.cumsum(np.array(values.ToArray()));

                // Ensure clearing the plot
                plt.clf();

                // Create a figure with increased size
                dynamic fig = plt.figure(figsize: new PyTuple(new PyObject[] { new PyFloat(6), new PyFloat(4) }));

                // Plot data
                plt.plot(values, color: "green");

                string cwd = os.getcwd();

                result = SavePlot(plt, theme: "ggplot", dpi: 200);

            }

            return Task.FromResult(result);
        }

        public Task<string> GenerateRandomizedCumulativeGraph()
        {
            string? result = null;
            using (Py.GIL()) //Python Global Interpreter Lock (GIL)
            {

                dynamic np = Py.Import("numpy");

                //TODO : Remove imports of pandas and scipy and datetime if they are not needed

                Py.Import("pandas");
                Py.Import("scipy");
                Py.Import("datetime");
                dynamic os = Py.Import("os");

                dynamic mpl = Py.Import("matplotlib");
                dynamic plt = Py.Import("matplotlib.pyplot");

                // Set dark theme
                plt.style.use("ggplot");

                mpl.use("Agg");


                // Generate data
                //dynamic x = np.arange(0, 10, 0.1);
                //dynamic y = np.multiply(2, x); // Use NumPy's multiply function

                dynamic values = np.cumsum(np.random.randn(1000, 1));


                // Ensure clearing the plot
                plt.clf();

                // Create a figure with increased size
                dynamic fig = plt.figure(figsize: new PyTuple(new PyObject[] { new PyFloat(6), new PyFloat(4) }));

                // Plot data
                plt.plot(values, color: "blue");

                string cwd = os.getcwd();

                result = SavePlot(plt, theme: "ggplot", dpi: 200);

            }

            return Task.FromResult(result);
        }

        /// <summary>
        /// Saves the plot to a PNG file with a unique name based on the current date and time
        /// </summary>
        /// <param name="plot">Plot, must be a PyPlot plot use Python.net Py.Import("matplotlib.pyplot")</param>
        /// <param name="theme"></param>
        /// <param name="dpi"></param>
        /// <returns></returns>
        public string? SavePlot(dynamic plt, string theme = "ggplot", int dpi = 200)
        {
            string? plotSavedImagePath = null;
            //using (Py.GIL()) //Python Global Interpreter Lock (GIL)
            //{
            dynamic os = Py.Import("os");
            dynamic mpl = Py.Import("matplotlib");
            // Set dark theme
            plt.style.use(theme);
            mpl.use("Agg"); //set up rendering of plot to back-buffer ('headless' mode)

            string cwd = os.getcwd();
            // Save plot to PNG file
            string imageToCreatePath = $@"GeneratedImages\{DateTime.Now.ToString("yyyyMMddHHmmss")}{Guid.NewGuid().ToString("N")}_plotimg.png";
            string imageToCreateWithFolderPath = $@"{cwd}\wwwroot\{imageToCreatePath}";
            plt.savefig(imageToCreateWithFolderPath, dpi: dpi); //save the plot to a file (use full path)
            plotSavedImagePath = imageToCreatePath;

            CleanupOldGeneratedImages(cwd);
            //}
            return plotSavedImagePath;
        }

        private static void CleanupOldGeneratedImages(string cwd)
        {
            lock (_lock)
            {

                Directory.GetFiles(cwd + @"\wwwroot\GeneratedImages", "*.png")
                 .OrderByDescending(File.GetLastWriteTime)
                 .Skip(10)
                 .ToList()
                 .ForEach(File.Delete);
            }
        }

}



The code above shows some additional examples of using MatPlotLib.
  • Histogram example
  • Line graph using cumulative sum by making use of NumPy or a helper method in .NET
These examples demonstrates also that MatPlotLib can be used for statistics, which today for .NET is mostly crunched with the help of Excel or EP Plus library for example. Since Python is considered as the home of data visualization with its vast ecosystem of data science libraries, this article and demos shows how you can get started with using this ecosystem from .NET. Note, using Python.net to create these plots in MatPlotLib is best prepared using Jupyter Notebook. When the plot displayed looks okay, it is time to integrate that Python script into .NET and C# using Python.Net library. Make note that there will be some challenges to get the Python code to work in C# of course. When passing in values to a function, sometimes you must use
for example NumPy to create compatible data types. Also note the usage of the Pystatic class here from Python.net , which offers the GIL Global Interpreter lock and a way to import Python modules.

https://jupyter.org/ A screenshot showing histogram in the demo is shown below. As we can see, MatPlotLib can be used from many different data visualizations and domains.

Tuesday, 22 April 2025

Predicting variable using Regression with ML.net

This article will look at regression with ML.net In the example, the variable "Poverty rate" measured as a percentage against amount "teenage pregnancies" per 1,000 birth. The data is fetched from a publicly available CSV file. The data is obtained from Jeff Prosise repos of ML.net here on Github:

https://github.com/jeffprosise/ML.NET/blob/master/MLN-SimpleRegression/MLN-SimpleRegression/Data/poverty.csv



In this article, Linqpad 8 will be used. First off, the following two Nuget packages are added :
  • Microsoft.ML
  • Microsoft.ML.Mkl.Components
The following method will plot a scatter graph from provided MLContext data, and add a standard linear trendline, which will work with the example in this article.

Plotutils.cs



void PlotScatterGraph<T>(MLContext mlContext, IDataView trainData, Func<T, PointItem> pointCreator, string chartTitle) where T : class, new()
{
	//Convert the IDataview to an enumerable collection
	var data = mlContext.Data.CreateEnumerable<T>(trainData, reuseRowObject: false).Select(x => pointCreator(x)).ToList();

	// Calculate trendline (simple linear regression)
	double avgX = data.Average(d => d.X);
	double avgY = data.Average(d => d.Y);
	double slope = data.Sum(d => (d.X - avgX) * (d.Y - avgY)) / data.Sum(d => (d.X - avgX) * (d.X - avgX));
	double intercept = avgY - slope * avgX;
	var trendline = data.Select(d => new { X = d.X, Y = slope * d.X + intercept }).ToList();

	//Plot the scatter graph
	var plot = data.Chart(d => d.X)
		.AddYSeries(d => d.Y, LINQPad.Util.SeriesType.Point, chartTitle)
		.AddYSeries(d => trendline.FirstOrDefault(t => t.X == d.X)?.Y ?? 0, Util.SeriesType.Line, "Trendline")
		.ToWindowsChart();
		
	plot.AntiAliasing = System.Windows.Forms.DataVisualization.Charting.AntiAliasingStyles.All;
	plot.Dump();
}



Let's look at the code for loading the CSV data and into the MLContext and then used the method TrainTestSplit to split the data into training data and testing data. Note also the classes Input and Output and the usage of LoadColumn and ColumnName

Program.cs



void Main()
{

	string inputFile = Path.Combine(Path.GetDirectoryName(Util.CurrentQueryPath)!, @"Sampledata\poverty2.csv"); //linqpad tech

	var context = new MLContext(seed: 0);

	//Train the model 
	var data = context.Data
		.LoadFromTextFile<Input>(inputFile, hasHeader: true, separatorChar: ';');
	
	// Split data into training and test sets 
	var split = context.Data.TrainTestSplit(data, testFraction: 0.2);
	var trainData = split.TrainSet;
	var testData = split.TestSet;

	var pipeline = context
		.Transforms.NormalizeMinMax("PovertyRate")
		.Append(context.Transforms.Concatenate("Features", "PovertyRate"))
		.Append(context.Regression.Trainers.Ols());

	var model = pipeline.Fit(trainData);
	// Use the model to make a prediction
	var predictor = context.Model.CreatePredictionEngine<Input, Output>(model);
	var input = new Input { PovertyRate = 8.4f };

	var actual = 36.8f;

	var prediction = predictor.Predict(input);
	Console.WriteLine($"Input poverty rate: {input.PovertyRate} . Predicted birth rate per 1000: {prediction.TeenageBirthRate:0.##}");
	Console.WriteLine($"Actual birth rate per 1000: {actual}");

	// Evaluate the regression model 
	var predictions = model.Transform(testData);
	var metrics = context.Regression.Evaluate(predictions);
	Console.WriteLine($"R-squared: {metrics.RSquared:0.##}");
	Console.WriteLine($"Root Mean Squared Error: {metrics.RootMeanSquaredError:0.##}");
	Console.WriteLine($"Mean Absolute Error: {metrics.MeanAbsoluteError:0.##}");
	Console.WriteLine($"Mean Squared Error: {metrics.MeanSquaredError:0.##}");


	PlotScatterGraph<Input>(context, trainData, (Input input) => 
		new PointItem { X = (float) Math.Round(input.PovertyRate, 2), Y = (float) Math.Round(input.TeenageBirthRate, 2) },
		"Poverty rate (%) vs Teenage Pregnancies per 1,000 birth");

}

public class PointItem {
	public float X { get; set; }
	public float Y { get; set; }
}

void PlotScatterGraph<T>(MLContext mlContext, IDataView trainData, Func<T, PointItem> pointCreator, string chartTitle) where T : class, new()
{
	//Convert the IDataview to an enumerable collection
	var data = mlContext.Data.CreateEnumerable<T>(trainData, reuseRowObject: false).Select(x => pointCreator(x)).ToList();

	// Calculate trendline (simple linear regression)
	double avgX = data.Average(d => d.X);
	double avgY = data.Average(d => d.Y);
	double slope = data.Sum(d => (d.X - avgX) * (d.Y - avgY)) / data.Sum(d => (d.X - avgX) * (d.X - avgX));
	double intercept = avgY - slope * avgX;
	var trendline = data.Select(d => new { X = d.X, Y = slope * d.X + intercept }).ToList();

	//Plot the scatter graph
	var plot = data.Chart(d => d.X)
		.AddYSeries(d => d.Y, LINQPad.Util.SeriesType.Point, chartTitle)
		.AddYSeries(d => trendline.FirstOrDefault(t => t.X == d.X)?.Y ?? 0, Util.SeriesType.Line, "Trendline")
		.ToWindowsChart();
		
	plot.AntiAliasing = System.Windows.Forms.DataVisualization.Charting.AntiAliasingStyles.All;
	plot.Dump();
}



public class Input
{

	[LoadColumn(1)]
	public float PovertyRate;

	[LoadColumn(5), ColumnName("Label")]
	public float TeenageBirthRate { get; set; }

}
public class Output
{
	[ColumnName("Score")]
	public float TeenageBirthRate;

}


A pipeline is defined for the machine learning here consisting of the following :
  • The method NormalizeMinMax will transform the poverty rate into a normalized scale between 0 and 1. The Concatenate method will be used to specify the "Features", in this case only the column Poverty rate is the feature of which we want to predict a score, this is the rate of teenage pregnancy births per 1,000 births. Note that our CSV data set contains more columns, but this is a simple regression where only one variable is taken into account.
  • The trainers used to train the machine learning algorithm is Ols, the Ordinary Least Squares.
  • The method fit will train using the training data defined from the method TrainTestSplit.
  • The resulting model is used to create a prediction engine.
  • Using the prediction engine, it is possible to predict a value value using the Predict method given one input item. Our prediction engine expects input objects of type Input and Output.
  • Using the testdata, the method Transform using the model gives us multiple predictions and it is possible to evalute the regression analysis from the predictions to check how accurate the regression model is.
  • Returning from this evaluation, we get the R-squared for example. This is a value from 0 to 1.0 where it describes how accurate the regression is in in describing the total variation of the residues of the model, the amount the data when plotted in a scatter graph where residue is the offset between the actual data and what the regression model predicts.
  • Other values such as RMSE and MSE are the root and mean squared error, which are absolute values.
  • Using the code above we got a fairly accurate regression model, but more accuracy would be achieved by taking in additional factors.


  • Output from the Linqpad 8 application shown in this article :
    
        
    Input poverty rate: 8,4 . Predicted birth rate per 1000: 35,06
    Actual birth rate per 1000: 36,8
    R-squared: 0,59
    Root Mean Squared Error: 8,99
    Mean Absolute Error: 8,01
    Mean Squared Error: 80,83
        
      
    
    Please note that there are some standard column names used for machine learning.
    
    Label: Represents the target variable (the value to predict).
    
    Features: Contains the input features used for training.
    
    Score: Stores the predicted value in regression models.
    
    PredictedLabel: Holds the predicted class in classification models.
    
    Probability: Represents the probability of a predicted class.
    
    FeatureContributions: Shows how much each feature contributes to a prediction.
    
    
    In the code above, the column names "Label", "Features" and "Score" was used to instruct the regression being calculated in the code here for ML.Net context model. The attribute ColumnName was being used here together with the Concatenate method.