Monday, 28 October 2024

Enumerating concurrent collections with snapshots in C#

In standard collections in C#, it is not allowed to alter collections you iterate upon using foreach for example, since it throws InvalidOperationException - Collection was modified; enumeration operation may not execute. Concurrent collections can be altered while being iterated. This is the default behavior, allow concurrent behavior while iterating - as locking the entire concurrent collection is costly. You can however enforce a consistent way of iterating the concurrent collection by making a snapshot of it. For concurrent dictionaries, we use the ToArray method.


	var capitals = new ConcurrentDictionary<string, string>{
		["Norway"] = "Oslo",
		["Denmark"] = "Copenhagen",
		["Sweden"] = "Stockholm",
		["Faroe Islands"] = "Torshamn",
		["Finland"] = "Helsinki",
		["Iceland"] = "Reykjavik"
	};

	//make a snapshot of the concurrent dictionary first 
	
	var capitalsSnapshot = capitals.ToArray();
	
	//do some modifications
	
	foreach (var capital in capitals){
		capitals[capital.Key] = capital.Value.ToUpper();
	}

	foreach (var capital in capitalsSnapshot)
	{
		Console.WriteLine($"The capital in {capital.Key} is {capital.Value}");
	}

This outputs:


The capital in Denmark is Copenhagen
The capital in Sweden is Stockholm
The capital in Faroe Islands is Torshamn
The capital in Norway is Oslo
The capital in Finland is Helsinki
The capital in Iceland is Reykjavik  



The snapshot of the concurrent collection was not modified by the modifications done. Let's look at the concurrent collection again and iterate upon it.


	foreach (var capital in capitals)
	{
		Console.WriteLine($"The capital in {capital.Key} is {capital.Value}");
	}

This outputs:


Enumerate capitals in concurrent array - just enumerating with ToArray() - elements can be changed while enumerating. Faster, but more unpredictable
The capital in Denmark is COPENHAGEN
The capital in Sweden is STOCKHOLM
The capital in Faroe Islands is TORSHAMN
The capital in Norway is OSLO
The capital in Finland is HELSINKI
The capital in Iceland is REYKJAVIK



As we can see, the concurrent dictionary has modified its contents and this shows that we can get modifications upon iterating collections. If you do want to get consistent results, using a snapshot should be desired. But note that this will lock the entire collection and involve costly operations of copying the contents. If you do do concurrent collection snapshots, keep the number of snapshots to a minimum and iterate upon these snapshots, preferable only doing one snapshot in one single place in the method for the specific concurrent dictionary.

Monday, 7 October 2024

Partition methods for collections in C#

This article will look at some partition methods for collections in C#, specifically List<T>, ConcurrentDictionary<TKey, TValue> and Dictionary<TKey, TValue>

Definition of partitioning: Partitioning consists of splitting up a collection {n1, n2, .. nk } into partitions of size P = C , where C is a positive constant integer. The last partition will consist of [0, C], the last C elements.

Example: A list of 100 elements will be partition by size 30, giving four partitions : 1: 0-29 2: 30-59 3: 60-89 4: 90-99

Note that partition 4 only got 9 elements.

Let's head over to some code.

The partition methods are the following :



using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class CollectionExtensions
{

    public static IEnumerable<IList<T&t;> Partition<T>(this IList<T> source, int size)
    {
        for (int i = 0; i < Math.Ceiling(source.Count / (double)size); i++)
        {
            yield return new List<T>(source.Skip(i * size).Take(size));
        }
    }

    public static IEnumerable<Dictionary<TKey, TValue&t;> Partition<TKey, TValue>(this IDictionary<TKey, TValue> source, int size)
    {
        for (int i = 0; i < Math.Ceiling(source.Keys.Count / (double)size); i++)
        {
            yield return new Dictionary<TKey, TValue>(source.Skip(i * size).Take(size));
        }
    }

    public static IEnumerable<ConcurrentDictionary<TKey, TValue> Partition<TKey, TValue>(this ConcurrentDictionary<TKey, TValue> source, int size)
    {
        for (int i = 0; i < Math.Ceiling(source.Keys.Count / (double)size); i++)
        {
            yield return new ConcurrentDictionary<TKey, TValue>(source.Skip(i * size).Take(size));
        }
    }

}




These three methods are very similar. An example usage is shown below. We partition a ConcurrentDictionary, for example one consisting of 200,000 key value pairs into partitions by size 50,000. This will produce a total of four partitions which are then processed at parallell.

Make note that even though you can partition a ConcurrentDictionary into multiple concurrent dictionaries consist after partitioning, the simpler approach code at the bottom of the method was quicker when I tested it out. There are a lot of pitfalls when it comes to concurrent programming.

The key takeaway from this article was how you can partition a collection into multiple partitions, this will enable you to do "Divide and Conquer" strategy when it comes to collections to partition labor among several threads in parallell.



	static int Enumerate(ConcurrentDictionary<int, int> dict)
	{
		//var stopWatch = Stopwatch.StartNew();

		var dicts = dict.Partition(dict.Count / 4).ToList();

		//Console.WriteLine(dicts.ElementAt(0).Count());
		//Console.WriteLine($"Partitioning took: {stopWatch.ElapsedMilliseconds} ms");

		int total = 0;

		Parallel.For(0, 4, (i) =>
		{
			int subTotal = 0;
			var curDict = dicts.ElementAt(i);
			//int count = curDict.Count;
			//Console.WriteLine($"Number in curDict : {count}");
			foreach (var item in curDict)
			{
				Interlocked.Add(ref subTotal, item.Value);
			}
			Interlocked.Add(ref total, subTotal);
		});

		return total;
        
        //Simpler approach :

		//int expectedTotal = dict.Count;

		//int total = 0;
		//Parallel.ForEach(dict, keyValPair =>
		//	 {
		//		 //int count = dict.Count;
		//		 Interlocked.Add(ref total, keyValPair.Value);
		//	 });
		//return total;
	}


Monday, 30 September 2024

Generic alternate lookup for Dictionary in .NET 9

Alternate lookup for Dictionary in .NET 9 demo

This repo contains code that shows how an alternate lookup of dictionaries can be implemented in .NET 9. A generic alternate equality comparer is also included. Alternate lookups of dictionaries allows you to take control how you can look up values in a dictionaries in a custom manner. Usually, we use a simple key for a dictionary, such as an int. In case you instead have keys that are complex objects such as class instances, having a custom way of defining alternate lookup gives more flexibility. In the generic equality comparer, a key expression is provided, where a member expression is expected. You can for example have a class Person where you could use a property Id of type Guid and use that key to look up values in a dictionary that uses Person as a key. The code below and sample code demonstrates how it can be used.

Now, would you use this in .NET ? You can utilize usage of Spans, allowing increased performance for dictionary lookups. Also you can use this technique to more collections, such as HashSet, ConcurrentDictionary, FrozenDictionary and FrozenSet. The generic alternate equality comparer looks like this :


using System.Linq.Expressions;
using LookupDictionaryOptimized;


namespace LookupDictionaryOptimized
{
    public class AlternateEqualityComparer<T, TKey> : IEqualityComparer<T>, IAlternateEqualityComparer<TKey, T>
        where T : new()
    {
        private readonly Expression<Func<T, TKey>> _keyAccessor;

        private TKey GetKey(T obj) => _keyAccessor.Compile().Invoke(obj);

        public AlternateEqualityComparer(Expression<Func<T, TKey>> keyAccessor)
        {
            _keyAccessor = keyAccessor;
        }

        public AlternateEqualityComparer<T, TKey> Instance
        {
            get
            {
                return new AlternateEqualityComparer<T, TKey>(_keyAccessor);
            }
        }

        T IAlternateEqualityComparer<TKey, T>.Create(TKey alternate)
        {
            //create a dummy default instance if the requested key is not contained in the dictionary
            return Activator.CreateInstance<T>();
        }

        public bool Equals(T? x, T? y)
        {
            if (x == null && y == null)
            {
                return true;
            }
            if ((x == null && y != null) || (x != null && y == null))
            {
                return false;
            }
            TKey xKey = GetKey(x!);
            TKey yKey = GetKey(y!);
            return xKey!.Equals(yKey);
        }

        public int GetHashCode(T obj) => GetKey(obj)?.GetHashCode() ?? default;

        public int GetHashCode(TKey alternate) => alternate?.GetHashCode() ?? default;

        public bool Equals(TKey alternate, T other)
        {
            if (alternate == null && other == null)
            {
                return true;
            }
            if ((alternate == null && other != null) || (alternate != null && other == null))
            {
                return false;
            }
            TKey otherKey = GetKey(other);
            return alternate!.Equals(otherKey);
        }
    }

}

The demo below shows how to use this. When instantiating the dictionary, it is possibe to set the IEqualityComparer. You can at the same time implement IAlternateEqualityComparer. The generic class above does this for you, and an instance of this comparer is passed into the dictionary as an argument upon creation. A lookup can then be stored into a variable
using the GetAlternateLookup method.

Note about this demo code below. We could expand and allow multiple members or any custom logic when defining alternate equality lookup. But the code below only expects one key property. To get more control of the alterate lookup, you must write an equality and alternate equality comparer manually, but much of the plumbing code could be defined in a generic manner.

For example, we could define a compound key such as a ReadonlySpan of char or a string where we combine the key properties we want to use. Such a generic alternate equality comparer could expect a params of key properties and then build a compound key. It is possible here to to use HashCode.Combine method for example. I might look in to such an implementation later on, for example demo how to use TWO properties for a lookup or even consider a Func<bool> method to define as the equality comparison method. But quickly , the gains of a such a generic mechanism might become counteractive opposed to just writing an equality comparer and alternate comparer manually.

The primary motivation of alternate dictionary lookup is actually performance, as the alternate lookup allows to make more use of Spans and avoid a lot of allocations and give improved performance.


    /// <summary>
    /// Based from inspiration of nDepend blog article : https://blog.ndepend.com/alternate-lookup-for-dictionary-and-hashset-in-net-9/
    /// </summary>
    public static class DemoAlternateLookupV2
    {
        public static void RunGenericDemo()
        {
            var paul = new Person("Paul", "Jones");
            var joey = new Person("Joey", "Green");
            var laura = new Person("Laura", "Bridges");

            var mrX = new Person("Mr", "X"); //this object is not added to the dictionary

            AlternateEqualityComparer<Person, Guid> personComparer = new AlternateEqualityComparer<Person, Guid>(m => m.Id);

            var dict = new Dictionary<Person, int>(personComparer.Instance)
            {
                { paul, 11 },
                { joey, 22 },
                { laura, 33 }
            };

            var lauraId = laura.Id;
            //Dictionary<Person, int>.AlternateLookup<Guid> lookup = dict.GetAlternateLookup<Guid>();  Easier : just use var on left hand side

            var lookup = dict.GetAlternateLookup<Guid>();
            int lookedUpPersonId = lookup[lauraId];

            Console.WriteLine($"Retrieved a Dictionary<Person,Guid> value via alternate lookup key: {lauraId}.\nThe looked up value is: {lookedUpPersonId}");
            lookedUpPersonId.Should().Be(33);
            Console.WriteLine($"Expected value retrieved. OK.");

            Console.WriteLine("Testing also to look for a person not contained in the dictionary");

            bool lookedUpNonExistingPersonFound = lookup.ContainsKey(mrX.Id);
            Console.WriteLine($"Retrieved a Dictionary<Person,Guid> value via alternate lookup key: {mrX.Id}.\nThe looked up value found : {lookedUpNonExistingPersonFound}");

        }

    }

The generic alternate equality comparer requires a public parameterless constructor. Also, the provided keyExpression for the key - the property of the class which will serve as the alternate lookup. The Person class looks like this :



 namespace LookupDictionaryOptimized
{
    public class Person
    {

        public Person(string firstName, string lastName)
        {
            FirstName = firstName;
            LastName = lastName;
        }

        public Person()
        {
            FirstName = string.Empty;
            LastName = string.Empty;
            Id = Guid.Empty;
        }

        public string FirstName { get; set; }
        public string LastName { get; set; }
        public Guid Id { get; set; } = Guid.NewGuid();
    }
}

Output below:



Retrieved a Dictionary<Person,Guid> value via alternate lookup key: 5b2b1d28-c024-4b76-8cdd-2717c42dc7f8.
The looked up value is: 33
Expected value retrieved. OK.
Testing also to look for a person not contained in the dictionary
Retrieved a Dictionary<Person,Guid> value via alternate lookup key: 6ae6f259-14a6-4960-889b-15f33aab4ec0.
The looked up value found : False
Hit the any key to continue..

More about alternate lookups can be read in this nDepend blog article: https://blog.ndepend.com/alternate-lookup-for-dictionary-and-hashset-in-net-9/