Coding Grounds

Monday, 22 August 2022

Splitting a ReadOnlySpan by a separator

This article looks at using ReadOnlySpan to perform the equivalent of a string split operation. The method uses a ReadOnlySpan of type T (char) and splits it into 'words' or 'tokens' separated by a split char (separator). Span was introduced together with C# 7.2 and .NET Core 2.1, and can be used in .NET Standard 2.0 or .NET Framework via the NuGet package System.Memory.



  <ItemGroup>
    <PackageReference Include="System.Memory" Version="4.5.5" />
  </ItemGroup>

If you target a newer framework (.NET Core 2.1 or later), Span is included out of the box. The code here is just demonstration code. It successfully splits a ReadOnlySpan of T (char), which is a view over the contiguous memory of the original string, so scanning and slicing do not copy any characters. However, we must still convert the 'tokens' or 'words' to string after the split operation. Note the use of the Slice method to retrieve a range from the ReadOnlySpan of char; a List of string collects the tokens. It would be nice to avoid string as much as possible, but since we want an array of strings back anyway, a List of string is used. The optimal approach would be to return only the split indexes that the code already extracts, which could later be used to build a string array. All the characters are already in the ReadOnlySpan of char, so the split indexes alone would be sufficient. From the consumer side this would be a bit cumbersome, though you could offer a method like 'get nth word' on top of the split indexes.

 

using System;
using System.Collections.Generic;

namespace SpanStringSplit
{
    public static class SpanExtensions
    {

        public static string[] SplitViaSpan(this string input, char splitChar, StringSplitOptions splitOptions)
        {
            if (string.IsNullOrWhiteSpace(input) || input.IndexOf(splitChar) < 0)
            {
                return new string[] { input };
            }
            var tokens = SplitSpan(input.AsSpan(), splitChar, splitOptions);
            return tokens; 
        }

        public static string[] SplitSpan(this ReadOnlySpan<char> inputSpan, char splitChar, StringSplitOptions splitOptions)
        {
            if (inputSpan == null)
            {
                return new string[] { null };
            }
            if (inputSpan.Length == 0)
            {
                return splitOptions == StringSplitOptions.None ? new string[] { string.Empty } : new string[0]; 
            }
            bool isSplitCharFound = false; 
            foreach (char letter in inputSpan)
            {
                if (letter == splitChar)
                {
                    isSplitCharFound = true;
                    break;
                }
            }
            if (!isSplitCharFound)
            {
                return new string[] { inputSpan.ToString() }; 
            }

            bool IsTokenToBeAdded(string token) => !string.IsNullOrWhiteSpace(token) || splitOptions == StringSplitOptions.None;

            var splitIndexes = new List<int>();
            var tokens = new List<string>();
            int charIndx = 0;
            foreach (var ch in inputSpan)
            {
                if (ch == splitChar)
                {
                    splitIndexes.Add(charIndx);
                }
                charIndx++;
            }
            int currentSplitIndex = 0;
            foreach (var indx in splitIndexes)
            {
                if (currentSplitIndex == 0)
                {
                    string firstToken = inputSpan.Slice(0, splitIndexes[0]).ToString();
                    if (IsTokenToBeAdded(firstToken))
                    {
                        tokens.Add(firstToken);
                    }
                }
                else if (currentSplitIndex <= splitIndexes.Count)
                {
                    string intermediateToken = inputSpan.Slice(splitIndexes[currentSplitIndex - 1] + 1, splitIndexes[currentSplitIndex] - splitIndexes[currentSplitIndex - 1] - 1).ToString();
                    if (IsTokenToBeAdded(intermediateToken))
                    {
                        tokens.Add(intermediateToken);
                    }
                }
                currentSplitIndex++;
            }
            string lastToken = inputSpan.Slice(splitIndexes[currentSplitIndex - 1] + 1).ToString();
            if (IsTokenToBeAdded(lastToken))
            {
                tokens.Add(lastToken);
            }
            return tokens.ToArray();
        }

    }
}

And here are our passing unit tests:

 

using NUnit.Framework;
using System;

namespace SpanStringSplit.Test
{
    [TestFixture]
    public class SpanExtensionsSpec
    {
        [Test]
        public void SplitStringsViaSpan()
        {
            var tokens = ",,The,quick,brown,fox,jumped,over,the,lazy,,dog".SplitViaSpan(',', StringSplitOptions.RemoveEmptyEntries);
            CollectionAssert.AreEqual(new string[] { "The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog" }, tokens);
        }

        [Test]
        public void SplitStringsUsingSpan()
        {
            ReadOnlySpan<char> s = ",,The,quick,brown,fox,jumped,over,the,lazy,,dog".ToCharArray();                
            var tokens = s.SplitSpan(',', StringSplitOptions.RemoveEmptyEntries);
            CollectionAssert.AreEqual(new string[] { "The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog" }, tokens);
        }

    }
}

To sum up: a Span gives us a view over a contiguous region of memory (here the characters of the string; the span itself is a stack-only struct). To get a span from a string we use the extension method 'AsSpan()'. To get a string from a range of a span, we call the Slice method and then ToString().
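As a minimal sketch of those three calls in isolation (the sample text and variable names are made up for illustration and are not part of the extension methods above):

```csharp
using System;

// Illustrative input only.
string text = "quick,brown,fox";

// AsSpan() creates a view over the string's existing characters; nothing is copied.
ReadOnlySpan<char> span = text.AsSpan();

// IndexOf and Slice operate on the view; no new string exists yet.
int firstComma = span.IndexOf(',');
ReadOnlySpan<char> firstWord = span.Slice(0, firstComma);
ReadOnlySpan<char> rest = span.Slice(firstComma + 1);

// ToString() is the point where a new string is actually allocated.
Console.WriteLine(firstWord.ToString()); // quick
Console.WriteLine(rest.ToString());      // brown,fox
```

Only the two ToString() calls allocate; the slices themselves are just an offset and a length into the original string.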

The following code extracts only the nth token (or word). It first finds the split indices (stopping shortly after the requested index when there are more separators) and then uses the Slice method to get the chars of the token or 'word', so no array of all tokens is built.

 


        public static string GetNthToken(this ReadOnlySpan<char> inputSpan, char splitChar, int nthToken)
        {
            if (inputSpan == null)
            {
                return null;
            }
            int[] splitIndexes = inputSpan.SplitIndexes(splitChar, nthToken); 
            if (splitIndexes.Length == 0)
            {
                return inputSpan.ToString();
            }
            if (nthToken == 0 && splitIndexes.Length > 0)
            {
                return inputSpan.Slice(0, splitIndexes[0]).ToString(); 
            }
            if (nthToken > splitIndexes.Length)
            {
                return null; 
            }
            if (nthToken == splitIndexes.Length)
            {
                var split = inputSpan.Slice(splitIndexes[nthToken-1]+1).ToString();
                return split; 
            }
            if (nthToken <= splitIndexes.Length + 1)
            {
                var split = inputSpan.Slice(splitIndexes[nthToken-1]+1, splitIndexes[nthToken] - splitIndexes[nthToken-1]-1).ToString();
                return split; 
            }
            return null; 

        }

        public static int[] SplitIndexes(this ReadOnlySpan<char> inputSpan, char splitChar,
            int? highestSplitIndex = null)
        {
            if (inputSpan == null)
            {
                return Array.Empty<int>();
            }
            if (inputSpan.Length == 0)
            {
                return Array.Empty<int>();
            }
            bool isSplitCharFound = false;
            foreach (char letter in inputSpan)
            {
                if (letter == splitChar)
                {
                    isSplitCharFound = true;
                    break;
                }
            }
            if (!isSplitCharFound)
            {
                return Array.Empty<int>();
            }
         
            var splitIndexes = new List<int>();
            int charIndex = 0;
            foreach (var ch in inputSpan)
            {
                if (ch == splitChar)
                {
                    if (highestSplitIndex.HasValue && highestSplitIndex + 1 < splitIndexes.Count)
                    {
                        break; 
                    }
                    splitIndexes.Add(charIndex);
                }
                charIndex++; 
            }
            return splitIndexes.ToArray(); 
        }

Now why would you use this instead of just sticking to ordinary string class methods? The main goal was to look into Span and how we can use it to look at contiguous memory and work on sub-parts of this memory using the Slice method. In some applications, such as games and graphics in general, micro-optimizations like these matter more, because they avoid allocating a lot of strings. Finding the split indices first (up to a given index if available) and then retrieving only the nth token or word can be very useful compared with splitting into an array of strings. The unit tests for the GetNthToken method are also passing:

 
        [Test]
        [TestCase(",,The,quick,brown,fox,jumped,over,the,lazy,,dog", 5, "fox")]
        [TestCase(",,The,quick,brown,fox,jumped,over,the,lazy,,dog", 0, "")]
        [TestCase(",,The,quick,brown,fox,jumped,over,the,lazy,,dog", 1, "")]
        [TestCase(",,The,quick,brown,fox,jumped,over,the,lazy,,dog", 2, "The")]
        [TestCase(",,The,quick,brown,fox,jumped,over,the,lazy,,dog", 3, "quick")]
        [TestCase(",,The,quick,brown,fox,jumped,over,the,lazy,,dog", 7, "over")]
        [TestCase(",,The,quick,brown,fox,jumped,over,the,lazy,,dog", 11, "dog")]
        [TestCase(",,The,quick,brown,fox,jumped,over,the,lazy,,dog", 12, null)]
        [TestCase(",,The,quick,brown,fox,jumped,over,the,lazy,,dog", 13, null)]
        public void GetNthWord(string input, int nthWord, string expectedWord)
        {
            // Use the test case input itself instead of a hard-coded string
            ReadOnlySpan<char> s = input.AsSpan();
            var word = s.GetNthToken(',', nthWord);
            Assert.AreEqual(expectedWord, word); // NUnit takes (expected, actual)
        }
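As a variation on the same idea, the nth token can also be located without collecting split indexes in a list at all. The following is only a sketch under that assumption, not the method used above: the class SpanTokenSketch and the helper TryGetNthToken are hypothetical names, and the token is handed back as a slice so the caller decides whether to call ToString().

```csharp
using System;

ReadOnlySpan<char> input = ",,The,quick,brown,fox".AsSpan();
if (SpanTokenSketch.TryGetNthToken(input, ',', 5, out ReadOnlySpan<char> token))
{
    Console.WriteLine(token.ToString()); // fox
}

public static class SpanTokenSketch
{
    // Hypothetical helper: walks the span once, separator by separator,
    // and returns the nth token (zero-based) as a slice of the input.
    public static bool TryGetNthToken(ReadOnlySpan<char> input, char separator, int n,
        out ReadOnlySpan<char> token)
    {
        token = default;
        if (n < 0)
        {
            return false;
        }
        ReadOnlySpan<char> remaining = input;
        for (int current = 0; ; current++)
        {
            int separatorIndex = remaining.IndexOf(separator);
            if (current == n)
            {
                // The token ends at the next separator, or at the end of the input.
                token = separatorIndex < 0 ? remaining : remaining.Slice(0, separatorIndex);
                return true;
            }
            if (separatorIndex < 0)
            {
                return false; // fewer than n + 1 tokens
            }
            remaining = remaining.Slice(separatorIndex + 1);
        }
    }
}
```

The trade-off is that every lookup rescans from the start, so this fits occasional lookups better than fetching many tokens from the same input.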

Sunday, 7 August 2022

Fix for the Tieto Min Arbeidsplan auto-complete function

Many people in the public sector use Tieto Min Arbeidsplan at work. The product has a major flaw when you edit, for example, standard tasks. The problem shows up when you search for a project code and have many codes to choose from. Instead of filtering, or scrolling down to the code that matches what you have typed, the matching elements are only styled with bold text and the list does not scroll. This is frankly hopeless UI behaviour. Here is a hotfix you can apply:

1. Press F12 in the browser to open the developer tools. Tested OK with Firefox, Edge Chromium and Chrome.
2. Select the Console tab.
3. Paste in the following JavaScript function:

 
 


(function() {

document.getElementsByClassName("ui-select-search")[0].addEventListener("keydown", function(evt){
 var searchQueryText = evt.srcElement.value; 
 var rowsInSelect = document.getElementsByClassName("ui-select-choices-row");
 for (var i=0;i<rowsInSelect.length;i++) { 
    var rowInSelect = rowsInSelect[i];
    var targetInnerDiv = rowInSelect.querySelector('div');
    //debugger
    if (targetInnerDiv != null && i >= 0 && searchQueryText.length >= 3 && targetInnerDiv.textContent.toLowerCase().indexOf(searchQueryText.toLowerCase()) >= 0) { 
      rowInSelect.scrollIntoView();
      break;
    }   
 }     
});

})();

Explanation: This is an 'IIFE' (immediately invoked function expression), a JavaScript function that calls itself right after it has been created. We add an event listener for the 'keydown' event on the search field that has the CSS class 'ui-select-search' (the first such element; there is usually only one search field on the 'Rediger standardoppgaver' (Edit standard tasks) page). On each 'keydown', as you type on the keyboard, we look up all elements in the DOM (Document Object Model, the HTML tree structure of elements/nodes) that have the CSS class 'ui-select-choices-row'. We then iterate over the elements found with a for loop and look at the first div inside each element. If its text contains a substring matching the search text (case-insensitively) and at least three characters have been typed, we scroll the matching row element into view, so that the matching row becomes visible. No filtering has been added here, since that turned out to be a somewhat more complex patch. Instead this is an important scrolling fix, so you do not have to spend a lot of time manually scrolling to find which row got the bold styling. Hopefully Tieto will fix this bug soon.

Sunday, 24 July 2022

Generic repository pattern for Azure Cosmos DB

I have looked into Azure Cosmos DB to learn a bit about this schemaless database in the Azure cloud. It is a powerful 'document database' which saves and loads data inside 'containers' in databases in Azure. You can work with this database in a strongly typed way in C#, for example by creating a repository pattern. The code I have made is in a class library which you can clone from here:

 
 git clone https://github.com/toreaurstadboss/AzureCosmosDbRepositoryLib.git

To get started with Azure Cosmos DB, you must first create an Azure Cosmos DB account. This will be your 'db user' in the cloud. When you start up Azure Cosmos DB, select the Data Explorer tab to view your data, where you can enter manual queries and look at (and manipulate) the data.

Note that there is already a more established repository pattern .NET SDK available, made by David Pine, which you should consider using as shown here: https://docs.microsoft.com/en-us/events/azure-cosmos-db-azure-cosmos-db-conf/a-deep-dive-into-the-cosmos-db-repository-pattern-dotnet-sdk

However, I have also published a simple GitHub repo of my own, which might be easier to get started with and understand. My goal was in any case to have a learning experience myself while testing out Azure Cosmos DB. The GitHub repo is available here: https://github.com/toreaurstadboss/AzureCosmosDbRepositoryLib

The methods of the repository are listed inside IRepository:

 
using AzureCosmosDbRepositoryLib.Contracts;
using Microsoft.Azure.Cosmos;
using System.Linq.Expressions;

namespace AzureCosmosDbRepositoryLib;


/// <summary>
/// Repository pattern for Azure Cosmos DB
/// </summary>
public interface IRepository<T> where T : IStorableEntity
{

    /// <summary>
    /// Adds an item to container in DB. 
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="item"></param>
    /// <returns></returns>
    Task<ISingleResult<T>?> Add(T item);

    /// <summary>
    /// Retrieves an item from container in DB. Param <paramref name="id"/> should carry both the id and the partition key.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="id"></param>
    /// <returns></returns>
    Task<ISingleResult<T>?> Get(IdWithPartitionKey id);

    /// <summary>
    /// Searches for all matching items by predicate (where condition) given in <paramref name="searchRequest"/>.
    /// </summary>
    /// <param name="searchRequest"></param>
    /// <returns></returns>
    Task<ICollectionResult<T>?> Find(ISearchRequest<T> searchRequest);

    /// <summary>
    /// Searches for a single matching item by predicate (where condition) given in <paramref name="searchRequest"/>.
    /// </summary>
    /// <param name="searchRequest"></param>
    /// <returns></returns>
    Task<ISingleResult<T>?> FindOne(ISearchRequest<T> searchRequest);

    /// <summary>
    /// Removes an item from container in DB. Param <paramref name="partitionKey"/> and param <paramref name="id"/> should be provided.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="partitionKey"></param>
    /// <param name="id"></param>
    /// <returns></returns>
    Task<ISingleResult<T>?> Remove(IdWithPartitionKey id);

    /// <summary>
    /// Removes items from container in DB. Param <paramref name="ids"/> must be provided.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="partitionKey"></param>
    /// <param name="id"></param>
    /// <returns></returns>
    Task<ICollectionResult<T>?> RemoveRange(List<IdWithPartitionKey> ids);

    /// <summary>
    /// Adds a set of items to container in DB. A shared partitionkey is used and the items are added inside a transaction as a single operation.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="items"></param>
    /// <param name="partitionKey"></param>
    /// <returns></returns>
    Task<ICollectionResult<T>?> AddRange(IDictionary<PartitionKey, T> items);

    /// <summary>
    /// Adds or updates items via 'Upsert' method in container in DB. 
    /// </summary>
    /// <param name="item"></param>
    /// <returns></returns>
    Task<ICollectionResult<T>?> AddOrUpdateRange(IDictionary<PartitionKey, T> items);


    /// <summary>
    /// Adds or updates an item via 'Upsert' method in container in DB. 
    /// </summary>
    /// <param name="item"></param>
    /// <returns></returns>
    Task<ISingleResult<T>?> AddOrUpdate(T item);

    /// <summary>
    /// Retrieves results paginated of page size. Looks at all items of type <typeparamref name="T"/> in the container. Send in a null value for continuationToken in first request and then use subsequent returned continuation tokens to 'sweep through' the paged data divided by <paramref name="pageSize"/>.
    /// </summary>
    /// <param name="pageSize"></param>
    /// <param name="continuationToken"></param>
    /// <param name="sortDescending">If true, sorting descending (sorting via LastUpdate property so newest items shows first)</param>
    /// <returns></returns>
    Task<IPaginatedResult<T>?> GetAllPaginated(int pageSize, string? continuationToken = null, bool sortDescending = false, Expression<Func<T, object>>[]? sortByMembers = null);

    /// <summary>
    /// Disposes this repository on demand. Frees up resources such as the CosmosClient object inside.
    /// </summary>
    void Dispose();

    /// <summary>
    /// Returns name of database in Azure Cosmos DB
    /// </summary>
    /// <returns></returns>
    string? GetDatabaseName();

    /// <summary>
    /// Returns Container id inside database in Azure Cosmos DB
    /// </summary>
    /// <returns></returns>
    string? GetContainerId(); 

}

Let's first look at retrieving the paginated results using a 'continuation token' which is how you do paging inside Azure Cosmos DB where this token is a 'bookmark'.

 
 
  public async Task<IPaginatedResult<T>?> GetAllPaginated(int pageSize, string? continuationToken = null, bool sortDescending = false,
        Expression<Func<T, object>>[]? sortByMembers = null)
    {
        string sortByMemberNames = sortByMembers == null ? "c.LastUpdate" :
            string.Join(",", sortByMembers.Select(x => "c." + x.GetMemberName()).ToArray()); 
        var query = new QueryDefinition($"SELECT * FROM c ORDER BY {sortByMemberNames} {(sortDescending ? "DESC" : "ASC")}".Trim()); //default query - will filter to type T via 'ItemQueryIterator<T>' 
        var queryRequestOptions = new QueryRequestOptions
        {
            MaxItemCount = pageSize
        };
        var queryResultSetIterator = _container.GetItemQueryIterator<T>(query, requestOptions: queryRequestOptions,
            continuationToken: continuationToken);
        var result = queryResultSetIterator.HasMoreResults ? await queryResultSetIterator.ReadNextAsync() : null;
        if (result == null)
            return null!;

        var sourceContinuationToken = result.ContinuationToken;
        var paginatedResult = new PaginatedResult<T>(sourceContinuationToken, result.Resource);
        return paginatedResult;

    }

By default we sort by the LastUpdate member in ascending order. The caller passes in the page size and can optionally specify other members to sort by. A helper method (GetMemberName) extracts the property names from the expressions used as sort members. We get an item query iterator from the 'container' and then asynchronously read the found items, which are in the Resource property. Note that we return the continuation token at the end, and that the first call passes null as the continuation token. Each call for a new page returns a new continuation token, so we can browse through the data page by page. When the returned continuation token is null, we have reached the end of the data. PaginatedResult looks like this:

 
 
namespace AzureCosmosDbRepositoryLib.Contracts
{

    public interface IPaginatedResult<T>
    {
        public IList<T> Items { get; }
        public string? ContinuationToken { get; set; }
    }

    public class PaginatedResult<T> : IPaginatedResult<T>
    {
        public PaginatedResult(string continuationToken, IEnumerable<T> items)
        {
            Items = new List<T>();
            if (items != null)
            {
                foreach (var item in items)
                {
                    Items.Add(item);
                }
            }
            ContinuationToken = continuationToken;
        }
        public IList<T> Items { get; private set; }
        public string? ContinuationToken { get; set; }
    }
}
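The consumer side of this paging scheme is a loop that starts with a null token and stops when the returned token is null. Below is a minimal sketch of that loop. FakePagedSource is a hypothetical in-memory stand-in so the example runs without a Cosmos DB account, and its offset-based token is an assumption made for the sketch; real continuation tokens are opaque strings. With the library, the call inside the loop would be GetAllPaginated(pageSize, continuationToken) instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new FakePagedSource(Enumerable.Range(1, 7).ToList());
string? continuationToken = null; // null means "start from the first page"
int pageNumber = 0;
do
{
    var (items, nextToken) = source.GetPage(3, continuationToken);
    pageNumber++;
    Console.WriteLine($"Page {pageNumber}: {string.Join(",", items)}");
    continuationToken = nextToken; // a null token means there are no more pages
} while (continuationToken != null);

// Hypothetical in-memory stand-in for a paged data source.
public class FakePagedSource
{
    private readonly IReadOnlyList<int> _items;

    public FakePagedSource(IReadOnlyList<int> items) => _items = items;

    public (IReadOnlyList<int> Items, string? ContinuationToken) GetPage(int pageSize, string? continuationToken)
    {
        // In this fake the token is simply the offset of the next item.
        int offset = continuationToken == null ? 0 : int.Parse(continuationToken);
        var page = _items.Skip(offset).Take(pageSize).ToList();
        int nextOffset = offset + page.Count;
        string? nextToken = nextOffset < _items.Count ? nextOffset.ToString() : null;
        return (page, nextToken);
    }
}
```

With seven items and a page size of three, the loop prints three pages (1,2,3 then 4,5,6 then 7) and stops.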

Another thing in the lib is the contract IStorableEntity, the interface that the generic type T must implement. It defines an Id property (note the use of the JsonProperty attribute, which maps it to the lowercase 'id' field that Cosmos DB expects), a partition key for the item and a LastUpdate timestamp.

 
 
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;

namespace AzureCosmosDbRepositoryLib.Contracts
{
    public interface IStorableEntity
    {
        [JsonProperty("id")]
        string Id { get; set; }

        PartitionKey? PartitionKey { get; }

        DateTime? LastUpdate { get; set; }
    }
}

It is important to set both the id and the partition key when you save, update and delete items in a container in Azure Cosmos DB, so that it works as expected. There are other methods in this repo, as seen in the IRepository interface. The repository class takes care of creating the database and container in Azure Cosmos DB if required. Note also that in intranet scenarios it is important to use the Gateway connection mode. The repository does this by default when no client options are supplied, because the SDK's default direct connection mode often runs into firewall issues.

 
 
  private void InitializeDatabaseAndContainer(CosmosClientOptions? clientOptions, ThroughputProperties? throughputPropertiesForDatabase, bool defaultToUsingGateway)
    {
        _client = clientOptions == null ?
            defaultToUsingGateway ?
            new CosmosClient(_connectionString, new CosmosClientOptions
            {
                ConnectionMode = ConnectionMode.Gateway //this is the connection mode that works best in intranet-environments and should be considered as best compatible approach to avoid firewall issues
            }) :
            new CosmosClient(_connectionString) :
            new CosmosClient(_connectionString, _cosmosClientOptions);

        //Run initialization 
        if (throughputPropertiesForDatabase == null)
        {
            _database = Task.Run(async () => await _client.CreateDatabaseIfNotExistsAsync(_databaseName)).Result; //create the database if not existing (will go for default options regarding scaling)
        }
        else
        {
            _database = Task.Run(async () => await _client.CreateDatabaseIfNotExistsAsync(_databaseName, throughputPropertiesForDatabase)).Result; //create the database if not existing - specify specific through put options
        }

        // The container we will create.  
        _container = Task.Run(async () => await _database.CreateContainerIfNotExistsAsync(_containerId, _partitionKeyPath)).Result;
    }

Another example, using a different iterator than the item query iterator, is the LINQ queryable with its feed iterator. This is used inside the Find method:

 
 public async Task<ICollectionResult<T>?> Find(ISearchRequest<T>? searchRequest)
    {
        if (searchRequest?.Filter == null)
            return await Task.FromResult<ICollectionResult<T>?>(null);
        var linqQueryable = _container.GetItemLinqQueryable<T>();
        var stopWatch = Stopwatch.StartNew();
        try
        {
            using var feedIterator = linqQueryable.Where(searchRequest.Filter).ToFeedIterator();
            while (feedIterator.HasMoreResults)
            {
                var items = await feedIterator.ReadNextAsync();
                var result = BuildSearchResultCollection(items.Resource);
                result.ExecutionTimeInMs = stopWatch.ElapsedMilliseconds;
                return result;
            }
        }
        catch (Exception err)
        {
            return await Task.FromResult(BuildSearchResultCollection(err));
        }
        return await Task.FromResult<ICollectionResult<T>?>(null);
    }

 
 
using System.Linq.Expressions;
namespace AzureCosmosDbRepositoryLib.Contracts
{
    public interface ISearchRequest<T>
    {
        Expression<Func<T, bool>>? Filter { get; set; }
    }

    public class SearchRequest<T> : ISearchRequest<T>
    {
        public Expression<Func<T, bool>>? Filter { get; set; }
    }
}
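To illustrate how a caller might build such a filter, here is a small sketch that runs entirely in memory. TodoItem and its properties are hypothetical, and SearchRequest<T> is repeated only so the example is self-contained. Against Cosmos DB, the Find method shown earlier passes the same expression to Where on the container's LINQ queryable. Here it is applied to a plain list instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

var request = new SearchRequest<TodoItem>
{
    // The filter is an expression tree, not a compiled delegate.
    Filter = item => item.Priority > 1 && item.Title.Contains("Cosmos")
};

var items = new List<TodoItem>
{
    new TodoItem { Title = "Learn Cosmos DB", Priority = 2 },
    new TodoItem { Title = "Buy milk", Priority = 3 },
    new TodoItem { Title = "Cosmos paging", Priority = 1 }
};

// In-memory stand-in for the container query: apply the expression to a list.
var matches = items.AsQueryable().Where(request.Filter!).ToList();
foreach (var match in matches)
{
    Console.WriteLine(match.Title); // Learn Cosmos DB
}

// Copy of the contract shown above, repeated to keep the sketch self-contained.
public class SearchRequest<T>
{
    public Expression<Func<T, bool>>? Filter { get; set; }
}

// Hypothetical entity used only for this example.
public class TodoItem
{
    public string Title { get; set; } = "";
    public int Priority { get; set; }
}
```

Keeping the filter as an Expression rather than a Func is what allows a LINQ provider to inspect and translate it instead of having to run it client-side.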

Note that this lib uses version 3.29 of Microsoft.Azure.Cosmos. There are differences between the major versions, so the methods shown here apply to the 3.x SDK for Azure Cosmos DB. This is the only NuGet package this lib requires. You should consider the SDK that David Pine created, but if you want to create your own repository pattern against Azure Cosmos DB, then maybe my repository pattern can be a starting point that you can borrow some code from.

One final note: I had trouble doing a batch insert in the lib for a type T using transactions in Azure Cosmos DB. This seems to require a common partition key, and it ended up in a collision. So for now the AddRange method in the lib is not batched within one partition, but loops through the items sequentially. Other than that, the lib should work for core usage in ordinary scenarios. The lib should also log errors a bit better, so it is primarily meant for demonstration purposes and for showing essential CRUD operations in Azure Cosmos DB.

Note that the connection string should be saved in, for example, an appsettings.json file. In dev environments, consider using dotnet user secrets, as I have done, so that secrets are not exposed to source control. The connection string is shown under the 'Keys' tab in Azure Cosmos DB. Look for 'primary connection string' there, as this is how you connect to your database and the container(s) where the data resides. Use the 'Data Explorer' tab to work with the data.