Monday, 28 October 2024

Enumerating concurrent collections with snapshots in C#

With the standard collections in C#, you are not allowed to alter a collection while iterating over it, for example with foreach. Doing so throws InvalidOperationException with the message "Collection was modified; enumeration operation may not execute." Concurrent collections, on the other hand, can be altered while being iterated. This is the default behavior, since locking the entire concurrent collection for the duration of an iteration would be costly. You can, however, get a consistent view to iterate over by making a snapshot of the collection first. For concurrent dictionaries, we use the ToArray method.


	var capitals = new ConcurrentDictionary<string, string>{
		["Norway"] = "Oslo",
		["Denmark"] = "Copenhagen",
		["Sweden"] = "Stockholm",
		["Faroe Islands"] = "Torshamn",
		["Finland"] = "Helsinki",
		["Iceland"] = "Reykjavik"
	};

	//make a snapshot of the concurrent dictionary first 
	
	var capitalsSnapshot = capitals.ToArray();
	
	//do some modifications
	
	foreach (var capital in capitals){
		capitals[capital.Key] = capital.Value.ToUpper();
	}

	foreach (var capital in capitalsSnapshot)
	{
		Console.WriteLine($"The capital in {capital.Key} is {capital.Value}");
	}

This outputs:


The capital in Denmark is Copenhagen
The capital in Sweden is Stockholm
The capital in Faroe Islands is Torshamn
The capital in Norway is Oslo
The capital in Finland is Helsinki
The capital in Iceland is Reykjavik  



The snapshot was not affected by the modifications made after it was taken. Let's iterate over the concurrent collection itself again.


	foreach (var capital in capitals)
	{
		Console.WriteLine($"The capital in {capital.Key} is {capital.Value}");
	}

This outputs:


The capital in Denmark is COPENHAGEN
The capital in Sweden is STOCKHOLM
The capital in Faroe Islands is TORSHAMN
The capital in Norway is OSLO
The capital in Finland is HELSINKI
The capital in Iceland is REYKJAVIK



As we can see, the contents of the concurrent dictionary itself were modified. Iterating over the dictionary directly reflects whatever modifications have been made, including ones made while the iteration is running. If you want consistent results, use a snapshot. But note that taking the snapshot locks the entire collection and involves the costly operation of copying its contents. If you do take snapshots of concurrent collections, keep their number to a minimum and iterate over the snapshots, preferably taking only one snapshot, in a single place in the method, for a given concurrent dictionary.
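
For contrast with the standard-collection behavior described at the top of this post, here is a minimal sketch. The names SnapshotDemo, ModifyListWhileIterating and ModifyConcurrentWhileIterating are made up for illustration. A List<T> is used for the standard case, on the assumption that adding to it inside foreach is the simplest way to invalidate its enumerator.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class SnapshotDemo
{
    // Returns true if adding to a List<T> inside foreach throws InvalidOperationException.
    public static bool ModifyListWhileIterating()
    {
        var numbers = new List<int> { 1, 2, 3 };
        try
        {
            foreach (var n in numbers)
            {
                numbers.Add(n * 10); // invalidates the enumerator
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Upper-cases every value of a ConcurrentDictionary inside foreach.
    // No exception is expected, because its enumerator tolerates updates.
    public static ConcurrentDictionary<string, string> ModifyConcurrentWhileIterating()
    {
        var capitals = new ConcurrentDictionary<string, string>
        {
            ["Norway"] = "Oslo",
            ["Denmark"] = "Copenhagen"
        };
        foreach (var capital in capitals)
        {
            capitals[capital.Key] = capital.Value.ToUpper();
        }
        return capitals;
    }

    public static void Main()
    {
        Console.WriteLine($"List threw: {ModifyListWhileIterating()}");
        foreach (var pair in ModifyConcurrentWhileIterating())
        {
            Console.WriteLine($"{pair.Key} = {pair.Value}");
        }
    }
}
```

The list loop is cut short by the exception on its second step, while the concurrent dictionary loop runs to completion.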

Monday, 7 October 2024

Partition methods for collections in C#

This article will look at some partition methods for collections in C#, specifically List<T>, ConcurrentDictionary<TKey, TValue> and Dictionary<TKey, TValue>.

Definition of partitioning: Partitioning consists of splitting up a collection {n1, n2, .. nk} into consecutive partitions of size C, where C is a positive constant integer. The last partition holds whatever elements remain, so its size is between 1 and C.

Example: A list of 100 elements partitioned by size 30 gives four partitions: 1: 0-29, 2: 30-59, 3: 60-89, 4: 90-99.

Note that partition 4 only got 10 elements.
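
The arithmetic behind this example can be sketched as follows. PartitionMath and its two methods are hypothetical helpers, not part of the partition methods in this article. They only compute how many partitions there are and which index range each one covers.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionMath
{
    // Number of partitions needed for `count` elements with at most `size` per partition.
    public static int PartitionCount(int count, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return (count + size - 1) / size; // integer ceiling of count / size
    }

    // Inclusive (first, last) index range of every partition.
    public static List<(int First, int Last)> Ranges(int count, int size)
    {
        var ranges = new List<(int First, int Last)>();
        int partitions = PartitionCount(count, size);
        for (int i = 0; i < partitions; i++)
        {
            int first = i * size;
            int last = Math.Min(first + size, count) - 1; // the last partition may be shorter
            ranges.Add((first, last));
        }
        return ranges;
    }

    public static void Main()
    {
        foreach (var range in Ranges(100, 30))
        {
            Console.WriteLine($"{range.First}-{range.Last} ({range.Last - range.First + 1} elements)");
        }
    }
}
```

For 100 elements and size 30 this prints the four ranges from the example, the last one being 90-99 with 10 elements.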

Let's head over to some code.

The partition methods are the following:



using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class CollectionExtensions
{

    public static IEnumerable<IList<T>> Partition<T>(this IList<T> source, int size)
    {
        for (int i = 0; i < Math.Ceiling(source.Count / (double)size); i++)
        {
            yield return new List<T>(source.Skip(i * size).Take(size));
        }
    }

    public static IEnumerable<Dictionary<TKey, TValue>> Partition<TKey, TValue>(this IDictionary<TKey, TValue> source, int size)
    {
        for (int i = 0; i < Math.Ceiling(source.Keys.Count / (double)size); i++)
        {
            yield return new Dictionary<TKey, TValue>(source.Skip(i * size).Take(size));
        }
    }

    public static IEnumerable<ConcurrentDictionary<TKey, TValue>> Partition<TKey, TValue>(this ConcurrentDictionary<TKey, TValue> source, int size)
    {
        for (int i = 0; i < Math.Ceiling(source.Keys.Count / (double)size); i++)
        {
            yield return new ConcurrentDictionary<TKey, TValue>(source.Skip(i * size).Take(size));
        }
    }

}




These three methods are very similar. An example usage is shown below. We partition a ConcurrentDictionary, for example one consisting of 200,000 key-value pairs, into partitions of size 50,000. This produces a total of four partitions, which are then processed in parallel.

Note that even though you can partition a ConcurrentDictionary into multiple concurrent dictionaries, the simpler approach, shown commented out at the bottom of the method, was quicker when I tested it. There are a lot of pitfalls when it comes to concurrent programming.

The key takeaway from this article is how you can partition a collection into multiple partitions. This enables a "Divide and Conquer" strategy, where the work on a collection is divided among several threads running in parallel.



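	// Requires: using System.Threading; using System.Threading.Tasks;
	static int Enumerate(ConcurrentDictionary<int, int> dict)
	{
		//var stopWatch = Stopwatch.StartNew();

		// Round the partition size up, so that we get at most four partitions
		// and never a partition size of zero for small dictionaries
		int partitionSize = Math.Max(1, (dict.Count + 3) / 4);
		var dicts = dict.Partition(partitionSize).ToList();

		//Console.WriteLine(dicts.ElementAt(0).Count());
		//Console.WriteLine($"Partitioning took: {stopWatch.ElapsedMilliseconds} ms");

		int total = 0;

		// Loop over the partitions actually produced instead of assuming exactly four
		Parallel.For(0, dicts.Count, (i) =>
		{
			// subTotal is local to this iteration, so a plain addition is enough
			int subTotal = 0;
			var curDict = dicts[i];
			//int count = curDict.Count;
			//Console.WriteLine($"Number in curDict : {count}");
			foreach (var item in curDict)
			{
				subTotal += item.Value;
			}
			// total is shared between the threads, so it is updated atomically
			Interlocked.Add(ref total, subTotal);
		});

		return total;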
        
        //Simpler approach :

		//int expectedTotal = dict.Count;

		//int total = 0;
		//Parallel.ForEach(dict, keyValPair =>
		//	 {
		//		 //int count = dict.Count;
		//		 Interlocked.Add(ref total, keyValPair.Value);
		//	 });
		//return total;
	}
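
To compare the two approaches yourself, a small harness along the following lines could be used. It is only a sketch: PartitionSumSketch, PartitionedSum and SimpleSum are made-up names, the ConcurrentDictionary overload of Partition is repeated so the sample compiles on its own, and every value is assumed to be 1 so that the expected total equals the number of entries.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PartitionSumSketch
{
    // ConcurrentDictionary overload from this article, repeated so the sample is self-contained.
    public static IEnumerable<ConcurrentDictionary<TKey, TValue>> Partition<TKey, TValue>(
        this ConcurrentDictionary<TKey, TValue> source, int size)
    {
        for (int i = 0; i < Math.Ceiling(source.Count / (double)size); i++)
        {
            yield return new ConcurrentDictionary<TKey, TValue>(source.Skip(i * size).Take(size));
        }
    }

    // Sums the values partition by partition, one partition per parallel iteration.
    public static int PartitionedSum(ConcurrentDictionary<int, int> dict, int partitions)
    {
        int size = Math.Max(1, (dict.Count + partitions - 1) / partitions);
        var dicts = dict.Partition(size).ToList();
        int total = 0;
        Parallel.For(0, dicts.Count, i =>
        {
            int subTotal = 0;
            foreach (var item in dicts[i])
            {
                subTotal += item.Value;
            }
            Interlocked.Add(ref total, subTotal); // one atomic add per partition
        });
        return total;
    }

    // The simpler approach: one atomic add per element, no partitioning.
    public static int SimpleSum(ConcurrentDictionary<int, int> dict)
    {
        int total = 0;
        Parallel.ForEach(dict, pair => Interlocked.Add(ref total, pair.Value));
        return total;
    }

    public static void Main()
    {
        var dict = new ConcurrentDictionary<int, int>(
            Enumerable.Range(0, 200000).Select(i => new KeyValuePair<int, int>(i, 1)));

        var stopWatch = Stopwatch.StartNew();
        int partitioned = PartitionedSum(dict, 4);
        Console.WriteLine($"Partitioned: {partitioned} in {stopWatch.ElapsedMilliseconds} ms");

        stopWatch.Restart();
        int simple = SimpleSum(dict);
        Console.WriteLine($"Simple: {simple} in {stopWatch.ElapsedMilliseconds} ms");
    }
}
```

Both methods should report the same total. The timings depend on the machine and on the size of the dictionary, so measure them on your own data before choosing one approach.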