Coding Grounds

Sunday, 8 December 2024

Extending Azure AI Search with data sources

This article will present both code and tips around getting Azure AI Search to utilize additional data sources. The article builds upon the previous article in the blog:

https://toreaurstad.blogspot.com/2024/12/azure-ai-openai-chat-gpt-4-client.html

This code will use Open AI Chat GPT-4 together with additional data source. I have tested this using Storage account in Azure which contains blobs with documents. First off, create Azure AI services if you do not have this yet.

Then create an Azure AI Search

Choose the location and the Pricing Tier. You can choose the Free (F) pricing tier to test out the Azure AI Search. The standard pricing tier comes in at about 250 USD per month, so a word of caution here as billing might incur if you do not choose the Free tier. Head over to the Azure AI Search service after it is crated and note inside the Overview the Url. Expand the Search management and choose the folowing menu options and fill out them in this order:

Data sources
Indexes
Indexers

There are several types of data sources you can add.

Azure Blog Storage
Azure Data Lake Storage Gen2
Azure Cosmos DB
Azure SQL Database
Azure Table Storage
Fabric OneLake files

Upload files to the blob container

I have tested out adding a data source using Azure Blob Storage. I had to create a new storage account and I believe Azure might have changed it over the years, so for best compability, add a brand new storage account. Then choose a blob container inside the Blob storage, then hit the Create button.
Head over to your Storage browser inside your storage account, then choose Blob container. You can add a Blob container and then after it is created, click the Upload button.
You can then upload multiple files into the blob container (it is like a folder, which saves your files as blobs).

Setting up the index

After the Blob storage (storage account) is added to the data source, choose the Indexes menu button inside Azure AI search. Click Add index.
After the index is added, choose the button Add field
Add a field name called : Edit.String of type Edm.String.
Click the checkbox for Retrievable and Searchable. Click the button Save

Setting up the indexer

Choose to add an Indexer via button Add indexer
Choose the Index you added
Choose the Data source you added
Select the indexed extensions and specify which file types to index. Probably you should select text based files here, such as .md and .markdown files and even some binary file type such as .pdf and .docx can be selected here
Data to extract: Choose Content and metadata

Source code for this article

The source code can be cloned from this Github repo:
br /> https://github.com/toreaurstadboss/OpenAIDemo.git

The code for this article is available in the branch:
feature/openai-search-documentsources To add the data source to our ChatClient instance, we do the following. Please note that this method will be changed in the Azure AI SDK in the future :



            ChatCompletionOptions? chatCompletionOptions = null;
            if (dataSources?.Any() == true)
            {
                chatCompletionOptions = new ChatCompletionOptions();

                foreach (var dataSource in dataSources!)
                {
#pragma warning disable AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                    chatCompletionOptions.AddDataSource(new AzureSearchChatDataSource()
                    {
                        Endpoint = new Uri(dataSource.endpoint),
                        IndexName = dataSource.indexname,
                        Authentication = DataSourceAuthentication.FromApiKey(dataSource.authentication)
                    });
#pragma warning restore AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                }

            }

The updated version of the extension class of OpenAI.Chat.ChatClient then looks like this: ChatClientExtensions.cs



using Azure.AI.OpenAI.Chat;
using OpenAI.Chat;
using System.ClientModel;
using System.Text;

namespace ToreAurstadIT.OpenAIDemo
{
    public static class ChatclientExtensions
    {

        /// <summary>
        /// Provides a stream result from the Chatclient service using AzureAI services.
        /// </summary>
        /// <param name="chatClient">ChatClient instance</param>
        /// <param name="message">The message to send and communicate to the ai-model</param>
        /// <returns>Streamed chat reply / result. Consume using 'await foreach'</returns>
        public static AsyncCollectionResult<StreamingChatCompletionUpdate> GetStreamedReplyAsync(this ChatClient chatClient, string message,
            (string endpoint, string indexname, string authentication)[]? dataSources = null)
        {
            ChatCompletionOptions? chatCompletionOptions = null;
            if (dataSources?.Any() == true)
            {
                chatCompletionOptions = new ChatCompletionOptions();

                foreach (var dataSource in dataSources!)
                {
#pragma warning disable AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                    chatCompletionOptions.AddDataSource(new AzureSearchChatDataSource()
                    {
                        Endpoint = new Uri(dataSource.endpoint),
                        IndexName = dataSource.indexname,
                        Authentication = DataSourceAuthentication.FromApiKey(dataSource.authentication)
                    });
#pragma warning restore AOAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
                }

            }

            return chatClient.CompleteChatStreamingAsync(
                [new SystemChatMessage("You are an helpful, wonderful AI assistant"), new UserChatMessage(message)], chatCompletionOptions);
        }

        public static async Task<string> GetStreamedReplyStringAsync(this ChatClient chatClient, string message, (string endpoint, string indexname, string authentication)[]? dataSources = null, bool outputToConsole = false)
        {
            var sb = new StringBuilder();
            await foreach (var update in GetStreamedReplyAsync(chatClient, message, dataSources))
            {
                foreach (var textReply in update.ContentUpdate.Select(cu => cu.Text))
                {
                    sb.Append(textReply);
                    if (outputToConsole)
                    {
                        Console.Write(textReply);
                    }
                }
            }
            return sb.ToString();
        }

    }
}

The updated code for the demo app then looks like this, I chose to just use tuples here for the endpoint, index name and api key:

ChatpGptDemo.cs



using OpenAI.Chat;
using OpenAIDemo;
using System.Diagnostics;

namespace ToreAurstadIT.OpenAIDemo
{
    public class ChatGptDemo
    {

        public async Task<string?> RunChatGptQuery(ChatClient? chatClient, string msg)
        {
            if (chatClient == null)
            {
                Console.WriteLine("Sorry, the demo failed. The chatClient did not initialize propertly.");
                return null;
            }

            Console.WriteLine("Searching ... Please wait..");

            var stopWatch = Stopwatch.StartNew();

            var chatDataSources = new[]{
                (
                    SearchEndPoint: Environment.GetEnvironmentVariable("AZURE_SEARCH_AI_ENDPOINT", EnvironmentVariableTarget.User) ?? "N/A",
                    SearchIndexName: Environment.GetEnvironmentVariable("AZURE_SEARCH_AI_INDEXNAME", EnvironmentVariableTarget.User) ?? "N/A",
                    SearchApiKey: Environment.GetEnvironmentVariable("AZURE_SEARCH_AI_APIKEY", EnvironmentVariableTarget.User) ?? "N/A"
                )
            };

            string reply = "";

            try
            {

                reply = await chatClient.GetStreamedReplyStringAsync(msg, dataSources: chatDataSources, outputToConsole: true);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }

            Console.WriteLine($"The operation took: {stopWatch.ElapsedMilliseconds} ms");


            Console.WriteLine();

            return reply;
        }

    }
}

The code here expects that three user-specific environment variables exists. Please note that the API key can be found under the menu item Keys in Azure AI Search. There are two admin keys and multiple query keys. To distribute keys to other users, you of course share the API query key, not the admin key(s). The screenshot below shows the demo. It is a console application, it could be web application or other client :

Please note that the Free tier of Azure AI Search is rather slow and seems to only allow queryes at a certain interval, it will suffice to just test it out. To really test it out in for example an Intranet scenario, the standard tier Azure AI search service is recommended, at about 250 USD per month as noted.

Conclusions

Getting an Azure AI Chat service to work in intranet scenarios using a combination of Open AI Chat GPT-4 together with a custom collection of files that are indexed offers a nice combination of building up a knowledge base which you can query against. It is rather convenient way of building an on-premise solution for intranet AI chat service using Azure cloud services.

Monday, 2 December 2024

Azure AI OpenAI chat GPT-4 client connection

This article presents code that shows how you can connect to OpenAI Chat GPT-4 client connection. The repository for the code presented is the following GitHub repo:

https://github.com/toreaurstadboss/OpenAIDemo

The repo contains useful helper methods to use Azure AI Service and create AzureOpenAIClient or the more generic ChatClient which is a specified chat client for the AzureOpenAIClient that uses a specific ai model, default ai model to use is 'gpt-4'. The creation of chat client is done using a class with a builder pattern. To create a chat client you can simply create it like this :

Program.cs


         const string modelName = "gpt-4";

            var chatClient = AzureOpenAIClientBuilder
                .Instance
                .WithDefaultEndpointFromEnvironmentVariable()
                .WithDefaultKeyFromEnvironmentVariable()
                .BuildChatClient(aiModel: modelName);

The builder looks like this:

AzureOpenAIClientBuilder.cs



using Azure.AI.OpenAI;
using OpenAI.Chat;
using System.ClientModel;

namespace ToreAurstadIT.OpenAIDemo
{

    /// <summary>
    /// Creates AzureOpenAIClient or ChatClient (default model is "gpt-4")
    /// Suggestion:
    /// Create user-specific Environment variables for : AZURE_AI_SERVICES_KEY and AZURE_AI_SERVICES_ENDPOINT to avoid exposing endpoint and key in source code.
    /// Then us the 'WithDefault' methods to use the two user-specific environment variables, which must be set.
    /// </summary>
    public class AzureOpenAIClientBuilder
    {

        private const string AZURE_AI_SERVICES_KEY = nameof(AZURE_AI_SERVICES_KEY);
        private const string AZURE_AI_SERVICES_ENDPOINT = nameof(AZURE_AI_SERVICES_ENDPOINT);

        private string? _endpoint = null;
        private ApiKeyCredential? _key = null;

        public AzureOpenAIClientBuilder WithEndpoint(string endpoint) { _endpoint = endpoint; return this; }

        /// <summary>
        /// Usage: Provide user-specific enviornment variable called : 'AZURE_AI_SERVICES_ENDPOINT'
        /// </summary>
        /// <returns></returns>
        public AzureOpenAIClientBuilder WithDefaultEndpointFromEnvironmentVariable() { _endpoint = Environment.GetEnvironmentVariable(AZURE_AI_SERVICES_ENDPOINT, EnvironmentVariableTarget.User); return this; }
       
        
        public AzureOpenAIClientBuilder WithKey(string key) { _key = new ApiKeyCredential(key); return this; }       
        public AzureOpenAIClientBuilder WithKeyFromEnvironmentVariable(string key) { _key = new ApiKeyCredential(Environment.GetEnvironmentVariable(key) ?? "N/A"); return this; }

        /// <summary>
        /// Usage : Provide user-specific environment variable called : 'AZURE_AI_SERVICES_KEY'
        /// </summary>
        /// <returns></returns>
        public AzureOpenAIClientBuilder WithDefaultKeyFromEnvironmentVariable() { _key = new ApiKeyCredential(Environment.GetEnvironmentVariable(AZURE_AI_SERVICES_KEY, EnvironmentVariableTarget.User) ?? "N/A"); return this; }

        public AzureOpenAIClient? Build() => !string.IsNullOrWhiteSpace(_endpoint) && _key != null ? new AzureOpenAIClient(new Uri(_endpoint), _key) : null;

        /// <summary>
        /// Default model will be set 'gpt-4'
        /// </summary>
        /// <returns></returns>
        public ChatClient? BuildChatClient(string aiModel = "gpt-4") => Build()?.GetChatClient(aiModel);

        public static AzureOpenAIClientBuilder Instance => new AzureOpenAIClientBuilder();

    }
}

It is highly recommended to store your endpoint and key to the Azure AI service of course not in the source code repository, but another place, for example on your user-specific environment variable or Azure key vault or similar place hard to obtain for malicious use, for example using your account to route much traffic to Chat GPT-4 only to end up being billed for this traffic. The code provided some 'default methods' which will look for environment variables. Add the key and endpoint to your Azure AI to these user specific environment variables.

AZURE_AI_SERVICES_KEY
AZURE_AI_SERVICES_ENDPOINT

To use the chat client the following code shows how to do this:

ChatGptDemo.cs




    public async Task<string?> RunChatGptQuery(ChatClient? chatClient, string msg)
        {
            if (chatClient == null)
            {
                Console.WriteLine("Sorry, the demo failed. The chatClient did not initialize propertly.");
                return null;
            }

            var stopWatch = Stopwatch.StartNew();

            string reply = await chatClient.GetStreamedReplyStringAsync(msg, outputToConsole: true);

            Console.WriteLine($"The operation took: {stopWatch.ElapsedMilliseconds} ms");


            Console.WriteLine();

            return reply;
        }

The communication against Azure AI service with Open AI Chat-GPT service is this line:

ChatGptDemo.cs



    string reply = await chatClient.GetStreamedReplyStringAsync(msg, outputToConsole: true);

The Chat GPT-4 service will return the data streamed so you can output the result as quickly as possible. I have tested it out using Standard Service S0 tier, it is a bit slower than the default speed you get inside the browser using Copilot, but it works and if you output to the console, you get a similar experience. The code here can be used in different environments, the repo contains a console app with .NET 8.0 Framework, written in C# as shown in the code. Here is the helper methods for the ChatClient, provided as extension methods.

ChatclientExtensions.cs



using OpenAI.Chat;
using System.ClientModel;
using System.Text;

namespace OpenAIDemo
{
    public static class ChatclientExtensions
    {

        /// <summary>
        /// Provides a stream result from the Chatclient service using AzureAI services.
        /// </summary>
        /// <param name="chatClient">ChatClient instance</param>
        /// <param name="message">The message to send and communicate to the ai-model</param>
        /// <returns>Streamed chat reply / result. Consume using 'await foreach'</returns>
        public static AsyncCollectionResult<StreamingChatCompletionUpdate> GetStreamedReplyAsync(this ChatClient chatClient, string message) =>
            chatClient.CompleteChatStreamingAsync(
                [new SystemChatMessage("You are an helpful, wonderful AI assistant"), new UserChatMessage(message)]);

        public static async Task<string> GetStreamedReplyStringAsync(this ChatClient chatClient, string message, bool outputToConsole = false)
        {
            var sb = new StringBuilder();
            await foreach (var update in GetStreamedReplyAsync(chatClient, message))
            {
                foreach (var textReply in update.ContentUpdate.Select(cu => cu.Text))
                {
                    sb.Append(textReply);
                    if (outputToConsole)
                    {
                        Console.Write(textReply);
                    }
                }
            }
            return sb.ToString();
        }

    }
}

The code presented here should make it a bit easier to communicate with the Azure AI Open AI Chat GPT-4 service. See the repository to test out the code. Screenshot below shows the demo in use in a console running against the Azure AI Chat GPT-4 service :

Saturday, 16 November 2024

Url encoding base 64 strings in .NET 9

This article shows new functionality how to url encode base 64 strings in .NET 9. In .NET 8 you would do multiple steps to url encode base 64 strings like this: Program.cs



using System.Buffers.Text;
using System.Net;
using System.Text;
using System.Text.Encodings.Web;

byte[] data = Encoding.UTF8.GetBytes("Hello there, how yall doin");
var base64 = Convert.ToBase64String(data);
var base64UrlEncoded = WebUtility.UrlEncode(base64);

Console.WriteLine(base64UrlEncoded);

We here first convert the string to a bytes in a byte array and then we base 64 encode the byte array into a Base64 string. Finally we url encode the string into a URL safe string. Let's see how simple this is in .NET 9 : Program.cs (v2)



using System.Buffers.Text;
using System.Net;
using System.Text;
using System.Text.Encodings.Web;

byte[] data = Encoding.UTF8.GetBytes("Hello there, how yall doin");
var base64UrlEncodedInNet9 = Base64Url.EncodeToString(data);

Console.WriteLine(base64UrlEncodedInNet9);

If we use ImplicitUsings here in the .csproj file the code above just becomes :


byte[] data = Encoding.UTF8.GetBytes("Hello there, how yall doin");
var base64UrlEncodedInNet9 = Base64Url.EncodeToString(data);
Console.WriteLine(base64UrlEncodedInNet9);

This shows we can skip the intermediate step where we first convert the bytes into a base64-string and then into a Url safe string and instead do a base-64 encoding and then an url encoding in one go. This way is more optimized, it is also possible here to use ReadOnlySpan (that works for both .NET 8 and .NET 9). Putting together we get:


using System.Buffers.Text;
using System.Net;
using System.Text;
using System.Text.Encodings.Web;

ReadOnlySpan data = Encoding.UTF8.GetBytes("Hello there, how yall doin");
var base64 = Convert.ToBase64String(data);
var base64UrlEncoded = WebUtility.UrlEncode(base64);

var base64UrlEncodedInNet9 = Base64Url.EncodeToString(data);

Console.WriteLine(base64UrlEncoded);
Console.WriteLine(base64UrlEncodedInNet9);

The output is the following :


SGVsbG8gdGhlcmUsIGhvdyB5YWxsIGRvaW4%3D
SGVsbG8gdGhlcmUsIGhvdyB5YWxsIGRvaW4

As we can see, the .NET 9 Base64.UrlEncode skips the the padding characters, so beware of that.

Note that by omitting the padding, it is necessary to pad the base 64 url encoded string if you want to decode it. Consider this helpful extension method to add the necessary padding:



/// <summary>
/// Provides extension methods for Base64 encoding operations.
/// </summary>
public static class Base64Extensions
{
    /// <summary>
    /// Adds padding to a Base64 encoded string to ensure its length is a multiple of 4.
    /// </summary>
    /// <param name="base64">The Base64 encoded string without padding.</param>
    /// <param name="isUrlEncode">Set to true if this is URL encode, will add instead '%3D%' as padding at the end (0-2 such padding chars, same for '=').</param>
    /// <returns>The Base64 encoded string with padding added, or the original string if it is null or whitespace.</returns>
    public static string? AddPadding(this string base64, bool isUrlEncode = false)
    {
        string paddedBase64 = !string.IsNullOrWhiteSpace(base64) ? base64.PadRight(base64.Length + (4 - (base64.Length % 4)) % 4, '=') : base64;
        return !isUrlEncode ? paddedBase64 : paddedBase64?.Replace("=", "%3D");
    }    
}

We can now achieve the same output with this extension method :




using System.Buffers.Text;
using System.Net;
using System.Text;
using System.Text.Encodings.Web;

ReadOnlySpan data = Encoding.UTF8.GetBytes("Hello there, how yall doin");
var base64 = Convert.ToBase64String(data);
var base64UrlEncoded = WebUtility.UrlEncode(base64);

var base64UrlEncodedInNet9 = Base64Url.EncodeToString(data);

// Using the extension method to add padding
base64UrlEncodedInNet9 = base64UrlEncodedInNet9.AddPadding(isUrlEncode: true);

Console.WriteLine(base64UrlEncoded);
Console.WriteLine(base64UrlEncodedInNet9);

Finally, the output using the two different approaching pre .NET 9 and .NET 9 gives the same results:


SGVsbG8gdGhlcmUsIGhvdyB5YWxsIGRvaW4%3D
SGVsbG8gdGhlcmUsIGhvdyB5YWxsIGRvaW4%3D