
Monday, 31 March 2025

Generating Dall-e-3 images using Microsoft Semantic Kernel

In this demo, DALL-E 3 images are generated from a console app using Microsoft Semantic Kernel. Semantic Kernel is a library that offers plugins and connectors for different AI services. It is available for multiple languages: C#, Java and Python. Its goal is to make it easier to consume AI services by building a shared infrastructure for them and offering a way to conceptualize and abstract their consumption. It can also be seen as middleware for these services, providing a framework where consuming AI services becomes a more standardized process. A GitHub repo has been created with the code for this demo here:

Github repo for this demo
Dall-e-3 image generator with semantic kernel

The demo consists of two steps: first building the semantic kernel itself, then generating the image. First off, the .csproj file references the latest NuGet package of Microsoft Semantic Kernel as of March 2025.
DalleImageGeneratorWithSemanticKernel.csproj


<Project Sdk="Microsoft.NET.Sdk"> 

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <NoWarn>$(NoWarn);CS8618,IDE0009,CA1051,CA1050,CA1707,CA1054,CA2007,VSTHRD111,CS1591,RCS1110,RCS1243,CA5394,SKEXP0001,SKEXP0010,SKEXP0020,SKEXP0040,SKEXP0050,SKEXP0060,SKEXP0070,SKEXP0101,SKEXP0110</NoWarn>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.SemanticKernel" Version="1.44.0" />
    <PackageReference Include="Microsoft.Extensions.Configuration.Json" Version="8.0.1" />
  </ItemGroup>

</Project>


Note that several warnings are suppressed via NoWarn, since Semantic Kernel is still subject to change and marks many of its APIs as experimental, which triggers these warnings. The image generation demo is set up in the class ImageGeneration, shown below. Note how the Kernel object is built up: the builder offers many methods for adding AI services, and in this case an OpenAI text-to-image service is added and resolved via ITextToImageService. The modelName used here is "dall-e-3".
ImageGeneration.cs


using DalleImageGeneratorWithSemanticKernel;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.TextToImage;
using OpenAI.Images;
using System;
using System.Diagnostics;

namespace UseSemanticKernelFromNET;

public class ImageGeneration
{
    public async Task GenerateBasicImage(string modelName)
    {
        Kernel kernel = Kernel
            .CreateBuilder()
            .AddOpenAITextToImage(modelId:modelName, apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!).Build();

        ITextToImageService imageService = kernel.GetRequiredService<ITextToImageService>();

        Console.WriteLine("##### SEMANTIC KERNEL - IMAGE GENERATOR DALL-E-3 CONSOLE APP #####\n\n");


        string prompt =
           """
            In the humorous image, Vice President JD Vance and his wife are seen stepping out of their plane onto the icy runway of
            Thule Air Base. Just as they set foot on the frozen ground, a bunch of playful polar bears greet them enthusiastically, much like 
            overzealous fans welcoming celebrities. The surprised expressions on their faces are priceless as the couple finds 
            themselves being "chased" by these bundles of fur and excitement. JD Vance, with a mix of amusement and alarm, has one 
            shoe comically left behind in the snow, while his wife, holding onto her hat against the chilly wind, can't suppress a laugh.
            The scene is completed with members of the Air Base
            staff in the background, chuckling and capturing the moment on their phones, adding to the light-heartedness of the unexpected encounter.  
            The plane should carry the AirForce One Colors and read "United States of America". 
         """;

        Console.WriteLine($"\n ### STORY FOR THE IMAGE TO GENERATE WITH DALL-E-3 ### \n{prompt}\n\n");

        Console.WriteLine("\n\nStarting generation of dall-e-3 image...");

        var cts = new CancellationTokenSource();
        var cancellationToken = cts.Token;

        var rotationTask = Task.Run(() => ConsoleUtil.RotateDash(cancellationToken), cts.Token);

        var image = await imageService.GetOpenAIImageContentAsync(prompt,
            kernel: kernel,
            size: (1024, 1024), //for Dall-e-2 images, use: 256x256, 512x512, or 1024x1024. For dalle-3 images, use: 1024x1024, 1792x1024, 1024x1792. 
            style: "vivid",
            quality: "hd", //high
            responseFormat: "b64_json", // bytes
            cancellationToken: cancellationToken);       
        
        cts.Cancel(); //cancel to stop animating the waiting indicator

        var imageTmpFilePng = Path.ChangeExtension(Path.GetTempFileName(), "png");
        image?.FirstOrDefault()?.WriteToFile(imageTmpFilePng);

        Console.WriteLine($"Wrote image to location: {imageTmpFilePng}");

        Process.Start(new ProcessStartInfo
        {
            FileName = "explorer.exe",
            Arguments = imageTmpFilePng,
            UseShellExecute = true
        });

    }

}
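The ConsoleUtil.RotateDash helper referenced above is part of the demo repo and not shown here. A minimal sketch of such a rotating waiting indicator could look like the following (an assumption; the actual implementation in the repo may differ):

ConsoleUtil.cs (sketch)


using System;
using System.Threading;

namespace DalleImageGeneratorWithSemanticKernel;

public static class ConsoleUtil
{
    // Minimal sketch of a console waiting indicator; not necessarily the repo's actual implementation.
    public static void RotateDash(CancellationToken cancellationToken)
    {
        var frames = new[] { '-', '\\', '|', '/' };
        int i = 0;
        while (!cancellationToken.IsCancellationRequested)
        {
            Console.Write(frames[i++ % frames.Length]);
            Thread.Sleep(100);   // short delay between animation frames
            Console.Write('\b'); // backspace to overwrite the previous frame
        }
    }
}


The demo can then be started from the console app's entry point with something like: await new ImageGeneration().GenerateBasicImage("dall-e-3");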


A helper extension method has been added for the OpenAI DALL-E 3 image creation. Note that one should avoid adding too many extension methods on top of Semantic Kernel itself, as this defeats the purpose of using it in a standardized way. In this case, however, it is just a helper method that customizes the generation of DALL-E 3 (and DALL-E 2) images from OpenAI using Semantic Kernel. The code is shown below.
TextToImageServiceExtensions.cs


using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Services;
using Microsoft.SemanticKernel.TextToImage;

namespace UseSemanticKernelFromNET;

public static class TextToImageServiceExtensions
{


    /// <summary>
    /// Generates OpenAI image content asynchronously based on the provided text input and settings.
    /// </summary>
    /// <param name="imageService">The image service used to generate the image content.</param>
    /// <param name="input">The text input used to generate the image.</param>
    /// <param name="kernel">An optional kernel instance for additional processing.</param>
    /// <param name="size">
    /// The desired size of the generated image. For DALL-E 2 images, use: 256x256, 512x512, or 1024x1024. 
    /// For DALL-E 3 images, use: 1024x1024, 1792x1024, or 1024x1792.
    /// </param>
    /// <param name="style">The style of the image. Must be "vivid" or "natural".</param>
    /// <param name="quality">The quality of the image. Must be "standard", "hd", or "high".</param>
    /// <param name="responseFormat">
    /// The format of the response. Must be one of the following: "url", "uri", "b64_json", or "bytes".
    /// </param>
    /// <param name="cancellationToken">A token to monitor for cancellation requests.</param>
    /// <returns>
    /// A task that represents the asynchronous operation. The task result contains a read-only list of 
    /// <see cref="ImageContent"/> objects representing the generated images.
    /// </returns>
    public static Task<IReadOnlyList<ImageContent>> GetOpenAIImageContentAsync(this ITextToImageService imageService,
        TextContent input,
        Kernel? kernel = null,
        (int width, int height) size = default((int, int)), // for Dall-e-2 images, use: 256x256, 512x512, or 1024x1024. For dalle-3 images, use: 1024x1024, 1792x1024, 1024x1792. 
        string style = "vivid",
        string quality = "hd",
        string responseFormat = "b64_json",        
        CancellationToken cancellationToken = default)
    {
        
        string? currentModelId = imageService.GetModelId();

        if (currentModelId != "dall-e-3" && currentModelId != "dall-e-2")
        {
            throw new NotSupportedException("This method is only supported for the DALL-E 2 and DALL-E 3 models.");
        }

        if (size.width == 0 || size.height == 0)
        {
            size = (1024, 1024); //defaulting here to (1024, 1024).
        }

        if (currentModelId == "dall-e-2"){
            var supportedSizes = new[]{
                (256, 256),
                (512, 512),
                (1024, 1024)
            };
            if (!supportedSizes.Contains(size))
            {
                throw new ArgumentException("For DALL-E 2, the size must be one of: 256x256, 512x512, or 1024x1024.");
            }
        }
        else if (currentModelId == "dall-e-3")
        {
            var supportedSizes = new[]{
                (1024, 1024),
                (1792, 1024),
                (1024, 1792)
            };
            if (!supportedSizes.Contains(size))
            {
                throw new ArgumentException("For DALL-E 3, the size must be one of: 256x256, 512x512, or 1024x1024.");
            }
        }

        return imageService.GetImageContentsAsync(
            input,
            new OpenAITextToImageExecutionSettings
                {
                    Size = size,
                    Style = style, //must be "vivid" or "natural"
                    Quality = quality, //must be "standard" or "hd" or "high"
                    ResponseFormat = responseFormat // url or uri or b64_json or bytes
                },
            kernel,
            cancellationToken);

    }
}


Screenshot of this demo, console app running:
The console app generates the DALL-E 3 image using the OpenAI service, saves it as a PNG file in a temporary location, and then opens the image with the default Windows image viewer. Example of a generated image:

Monday, 2 December 2024

Azure AI OpenAI chat GPT-4 client connection

This article presents code that shows how you can set up a client connection to the Azure OpenAI Chat GPT-4 service. The code presented is available in the following GitHub repo:

https://github.com/toreaurstadboss/OpenAIDemo

The repo contains useful helper methods for using the Azure AI service and creating an AzureOpenAIClient, or a ChatClient, which is a chat client obtained from the AzureOpenAIClient that targets a specific AI model (the default model is 'gpt-4'). The chat client is created using a class with a builder pattern. You can create a chat client simply like this:

Program.cs

const string modelName = "gpt-4";

var chatClient = AzureOpenAIClientBuilder
    .Instance
    .WithDefaultEndpointFromEnvironmentVariable()
    .WithDefaultKeyFromEnvironmentVariable()
    .BuildChatClient(aiModel: modelName);
                

The builder looks like this:

AzureOpenAIClientBuilder.cs


using Azure.AI.OpenAI;
using OpenAI.Chat;
using System.ClientModel;

namespace ToreAurstadIT.OpenAIDemo
{

    /// <summary>
    /// Creates AzureOpenAIClient or ChatClient (default model is "gpt-4")
    /// Suggestion:
    /// Create user-specific Environment variables for : AZURE_AI_SERVICES_KEY and AZURE_AI_SERVICES_ENDPOINT to avoid exposing endpoint and key in source code.
    /// Then use the 'WithDefault' methods to read the two user-specific environment variables, which must be set.
    /// </summary>
    public class AzureOpenAIClientBuilder
    {

        private const string AZURE_AI_SERVICES_KEY = nameof(AZURE_AI_SERVICES_KEY);
        private const string AZURE_AI_SERVICES_ENDPOINT = nameof(AZURE_AI_SERVICES_ENDPOINT);

        private string? _endpoint = null;
        private ApiKeyCredential? _key = null;

        public AzureOpenAIClientBuilder WithEndpoint(string endpoint) { _endpoint = endpoint; return this; }

        /// <summary>
        /// Usage: Provide a user-specific environment variable called : 'AZURE_AI_SERVICES_ENDPOINT'
        /// </summary>
        /// <returns></returns>
        public AzureOpenAIClientBuilder WithDefaultEndpointFromEnvironmentVariable() { _endpoint = Environment.GetEnvironmentVariable(AZURE_AI_SERVICES_ENDPOINT, EnvironmentVariableTarget.User); return this; }
       
        
        public AzureOpenAIClientBuilder WithKey(string key) { _key = new ApiKeyCredential(key); return this; }       
        public AzureOpenAIClientBuilder WithKeyFromEnvironmentVariable(string key) { _key = new ApiKeyCredential(Environment.GetEnvironmentVariable(key) ?? "N/A"); return this; }

        /// <summary>
        /// Usage : Provide user-specific environment variable called : 'AZURE_AI_SERVICES_KEY'
        /// </summary>
        /// <returns></returns>
        public AzureOpenAIClientBuilder WithDefaultKeyFromEnvironmentVariable() { _key = new ApiKeyCredential(Environment.GetEnvironmentVariable(AZURE_AI_SERVICES_KEY, EnvironmentVariableTarget.User) ?? "N/A"); return this; }

        public AzureOpenAIClient? Build() => !string.IsNullOrWhiteSpace(_endpoint) && _key != null ? new AzureOpenAIClient(new Uri(_endpoint), _key) : null;

        /// <summary>
        /// Default model will be set 'gpt-4'
        /// </summary>
        /// <returns></returns>
        public ChatClient? BuildChatClient(string aiModel = "gpt-4") => Build()?.GetChatClient(aiModel);

        public static AzureOpenAIClientBuilder Instance => new AzureOpenAIClientBuilder();

    }
}



It is highly recommended not to store the endpoint and key for your Azure AI service in the source code repository, but somewhere else, for example in user-specific environment variables, Azure Key Vault or a similar place that is hard to obtain for malicious use, such as using your account to route a lot of traffic to Chat GPT-4 and leaving you billed for it. The code provides some 'default' methods that look for environment variables. Add the key and endpoint of your Azure AI service to these user-specific environment variables (a small setup sketch follows the list below):
  • AZURE_AI_SERVICES_KEY
  • AZURE_AI_SERVICES_ENDPOINT
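To set these once as user-specific environment variables, a one-time snippet like the following can be used (a minimal sketch with placeholder values; on Windows, the setx command achieves the same):


// One-time setup: store endpoint and key as user-specific environment variables,
// keeping them out of the source code repository. Placeholder values shown.
Environment.SetEnvironmentVariable("AZURE_AI_SERVICES_ENDPOINT",
    "https://<your-resource>.openai.azure.com/", EnvironmentVariableTarget.User);
Environment.SetEnvironmentVariable("AZURE_AI_SERVICES_KEY",
    "<your-azure-ai-services-key>", EnvironmentVariableTarget.User);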

The following code shows how to use the chat client:

ChatGptDemo.cs



    public async Task<string?> RunChatGptQuery(ChatClient? chatClient, string msg)
    {
        if (chatClient == null)
        {
            Console.WriteLine("Sorry, the demo failed. The chatClient did not initialize properly.");
            return null;
        }

        var stopWatch = Stopwatch.StartNew();

        string reply = await chatClient.GetStreamedReplyStringAsync(msg, outputToConsole: true);

        Console.WriteLine($"The operation took: {stopWatch.ElapsedMilliseconds} ms");

        Console.WriteLine();

        return reply;
    }
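Using the chatClient built earlier in Program.cs, the demo method can be invoked roughly like this (a sketch; the class name ChatGptDemo is assumed from the file name, and the exact wiring in the repo may differ):


var demo = new ChatGptDemo();
string? reply = await demo.RunChatGptQuery(chatClient, "Tell me three facts about the Semantic Kernel library.");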
        
        
The communication with the Azure OpenAI Chat GPT service happens in this line:

ChatGptDemo.cs


    string reply = await chatClient.GetStreamedReplyStringAsync(msg, outputToConsole: true);

The Chat GPT-4 service returns the data streamed, so you can output the result as quickly as possible. I have tested it using the Standard S0 tier; it is a bit slower than the speed you get in the browser with Copilot, but it works, and if you output to the console you get a similar experience. The code can be used in different environments; the repo contains a .NET 8.0 console app written in C#, as shown in the code. Here are the helper methods for the ChatClient, provided as extension methods.

ChatclientExtensions.cs


using OpenAI.Chat;
using System.ClientModel;
using System.Text;

namespace OpenAIDemo
{
    public static class ChatclientExtensions
    {

        /// <summary>
        /// Provides a stream result from the Chatclient service using AzureAI services.
        /// </summary>
        /// <param name="chatClient">ChatClient instance</param>
        /// <param name="message">The message to send and communicate to the ai-model</param>
        /// <returns>Streamed chat reply / result. Consume using 'await foreach'</returns>
        public static AsyncCollectionResult<StreamingChatCompletionUpdate> GetStreamedReplyAsync(this ChatClient chatClient, string message) =>
            chatClient.CompleteChatStreamingAsync(
                [new SystemChatMessage("You are a helpful, wonderful AI assistant"), new UserChatMessage(message)]);

        public static async Task<string> GetStreamedReplyStringAsync(this ChatClient chatClient, string message, bool outputToConsole = false)
        {
            var sb = new StringBuilder();
            await foreach (var update in GetStreamedReplyAsync(chatClient, message))
            {
                foreach (var textReply in update.ContentUpdate.Select(cu => cu.Text))
                {
                    sb.Append(textReply);
                    if (outputToConsole)
                    {
                        Console.Write(textReply);
                    }
                }
            }
            return sb.ToString();
        }

    }
}
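As the doc comment above indicates, the lower-level GetStreamedReplyAsync method can also be consumed directly with await foreach when you want control over each streamed update; a minimal sketch:


// Consume the streaming extension directly; each update may contain several content parts.
await foreach (StreamingChatCompletionUpdate update in chatClient.GetStreamedReplyAsync("Why is the sky blue?"))
{
    foreach (var contentPart in update.ContentUpdate)
    {
        Console.Write(contentPart.Text); // print each streamed text fragment as it arrives
    }
}
Console.WriteLine();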


The code presented here should make it a bit easier to communicate with the Azure OpenAI Chat GPT-4 service. See the repository to test out the code. The screenshot below shows the demo running in a console against the Azure AI Chat GPT-4 service: