Coding Grounds

Sunday, 18 April 2021

Implementing a Strip method with regex in C#

This article will present a Strip method that accepts a Regex which defines the pattern of allowed characters. It is similar to Regex Replace, but it works in the inverted way. Instead of removing the chars matching the pattern in Regex.Replace, this utility method instead lets you define the allowed chars, i.e. these chars defined in this regex are the chars I want to keep. First off we define the utility method, as an extension method.

 

        /// <summary>
        /// Strips away every character not defined in the provided regex <paramref name="allowedChars"/>
        /// </summary>
        /// <param name="s">Input string</param>
        /// <param name="allowedChars">The allowed characters defined in a Regex with pattern, for example: [A-z|0-9]+/</param>
        /// <returns>Input string with only the allowed characters</returns>
        public static string Strip(this string s, Regex allowedChars)
        {
            if (s == null)
            {
                return s;
            }
            if (allowedChars == null)
            {
                return string.Empty;
            }
            Match match = Regex.Match(s, allowedChars.ToString());
            List<char> allowedAlphabet = new List<char>();
            while (match.Success)
            {
                if (match.Success)
                {
                    for (int i = 0; i < match.Groups.Count; i++)
                    {
                        allowedAlphabet.AddRange(match.Groups[i].Value.ToCharArray());
                    }
                }
                match = match.NextMatch();
            }
            return new string(s.Where(ch => allowedAlphabet.Contains(ch)).ToArray());
        }

Here are some tests that tests out this Strip method:

 
 
 	 	[Test]
        [TestCase("abc123abc", "[A-z]+", "abcabc")]
        [TestCase("abc123def456", "[0-9]+", "123456")]
     	[TestCase("The F-32 Lightning II is a newer generation fighter jets than the F-16 Fighting Falcon", "[0-9]+", "3216")]
		[TestCase("Here are some Norwegian letters : ÆØÅ and in lowercase: æøå", "[æ|ø|å]", "æøå")]
		public void TestStripWithRegex(string input, string regexString, string expectedOutput)
        {
            var regex = new Regex(regexString);
            input.Strip(regex).Should().Be(expectedOutput);
        }

Monday, 1 March 2021

Implementing ToDictionary in Typescript

In this article I will present some code I just did in my SimpleTsLinq library, which you can easily install using Npm. The library is here on Npmjs.com : The ToDictionary method looks like this:


  
  if (!Array.prototype.ToDictionary) {
  Array.prototype.ToDictionary = function <T>(keySelector: (arg: T) => any): any {
    let hash = {};
    this.map(item => {
      let key = keySelector(item);
      if (!(key in hash)) {
        hash[key] = item;
      }
      else {
        if (!(Array.isArray(hash[key]))) {
          hash[key] = [hash[key]];
        }
        hash[key].push(item);
      }
    });
    return hash;
  }
}

Here is a unit test (spec) for this method :


  
    it('can apply method ToDictionary on an array, allowing specificaton of a key selector for the dictionary object', () => {
    let heroes = [{ name: "Han Solo", age: 47, gender: "M" }, { name: "Leia", age: 29, gender: "F" }, { name: "Luke", age: 24, gender: "M" }, { name: "Lando", age: 47, gender: "M" }];
    let dictionaryOfHeroes = heroes.ToDictionary<Hero>(x => x.gender);

    let expectedDictionary = {
      "F": {
        name: "Leia", age: 29, gender: "F"
      },
      "M": [
        { name: "Han Solo", age: 47, gender: "M" },
        { name: "Luke", age: 24, gender: "M" },
        { name: "Lando", age: 47, gender: "M" }
      ]
    };
    expect(dictionaryOfHeroes).toEqual(expectedDictionary);
  });

You can also test out this library using Npm RunKit here: We can make a dictionary with different keys, image example:

Friday, 5 February 2021

Overcoming limitations in Contains in Entity Framework

Entity Framework will hit a performance penalty bottleneck or crash if Contains contains a too large of a list. Here is how you can avoid this, using Marc Gravell's excellent approach in this. I am including some tests of this. I also suggest you consider LinqKit to use expandable queries to make this all work. First off, this class contains the extension methods for Entity Framework for this:

 
 
 public class EntityExtensions {
 
 /// <summary>
        /// This method overcomes a weakness with Entity Framework with Contains where you can partition the values to look for into 
        /// blocks or partitions, it is modeled after Marc Gravell's answer here:
        /// https://stackoverflow.com/a/568771/741368
        /// Entity Framework hits a limit of 2100 parameter limit in the DB but probably comes into trouble before this limit as even
        /// queries with several 100 parameters are slow.
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <typeparam name="TValue"></typeparam>
        /// <param name="source">Source, for example DbSet (table)</param>
        /// <param name="selector">Selector, key selector</param>
        /// <param name="blockSize">Size of blocks (chunks/partitions)</param>
        /// <param name="values">Values as parameters</param>
        /// 
        /// <example>
        ///   /// <[!CDATA[
        /// /// The following EF query will hit a performance penalty or time out if EF gets a too large list of operationids:
        /// ///
        /// /// var patients = context.Patients.Where(p => operationsIds.Contains(p.OperationId)).Select(p => new {
        /// ///  p.OperationId,
        /// ///  p.
        /// /// });
        /// ///
        /// /// 
        /// /// var patients = context.Patients.AsExpandable().InRange(p => p.OperationId, 1000, operationIds)
        /// //.Select(p => new
        /// //{
        /// //    p.OperationId,
        /// //    p.IsDaytimeSurgery
        /// //}).ToList();
        /// //]]
        /// </example>
        /// <returns></returns>
        public static IEnumerable<T> InRange<T, TValue>(
                this IQueryable<T> source,
                Expression<Func<T, TValue>> selector,
                int blockSize,
                IEnumerable<TValue> values)
        {
            MethodInfo method = null;
          
            foreach (MethodInfo tmp in typeof(Enumerable).GetMethods(
                    BindingFlags.Public | BindingFlags.Static))
            {
                if (tmp.Name == "Contains" && tmp.IsGenericMethodDefinition
                        && tmp.GetParameters().Length == 2)
                {
                    method = tmp.MakeGenericMethod(typeof(TValue));
                    break;
                }
            }

            if (method == null) throw new InvalidOperationException(
                   "Unable to locate Contains");
            foreach (TValue[] block in values.GetBlocks(blockSize))
            {
                var row = Expression.Parameter(typeof(T), "row");
                var member = Expression.Invoke(selector, row);
                var keys = Expression.Constant(block, typeof(TValue[]));
                var predicate = Expression.Call(method, keys, member);
                var lambda = Expression.Lambda<Func<T, bool>>(
                      predicate, row);
                foreach (T record in source.Where(lambda))
                {
                    yield return record;
                }
            }

        }

        /// <summary>
        /// Similar to Chunk, it partitions the IEnumerable source and returns the chunks or blocks by given blocksize. The last block can have variable length
        /// between 0 to blocksize since the IEnumerable can have of course variable size not evenly divided by blocksize. 
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <param name="source"></param>
        /// <param name="blockSize"></param>
        /// <returns></returns>
        public static IEnumerable<T[]> GetBlocks<T>(
                this IEnumerable<T> source, int blockSize)
        {
            List<T> list = new List<T>(blockSize);
            foreach (T item in source)
            {
                list.Add(item);
                if (list.Count == blockSize)
                {
                    yield return list.ToArray();
                    list.Clear();
                }
            }
            if (list.Count > 0)
            {
                yield return list.ToArray();
            }
        }
        
  }

Linqkit allows us to rewrite queries for EF using expression trees. One class is ExpandableQuery. See the links here for further info about Linqkit and Linq-Expand.

 
  	/// <summary>Refer to http://www.albahari.com/nutshell/linqkit.html and
	/// http://tomasp.net/blog/linq-expand.aspx for more information.</summary>
	public static class Extensions
	{
		public static IQueryable<T> AsExpandable<T> (this IQueryable<T> query)
		{
			if (query is ExpandableQuery<T>) return (ExpandableQuery<T>)query;
			return new ExpandableQuery<T> (query);
		}

This all seems to look a bit cryptic, so lets see an integration test of mine instead:

 


        [Test]
        [Category(TestCategories.IntegrationTest)]
        public void GetDataChunkedDoesNotFail()
        {
            using (var context = DbContextManager.ScopedOpPlanDataContext)
            {
                int[] operationalUnitIds = new int[]{ 107455, 105431, 107646, 107846 };
                var reportItems = context.OperationalUnits.AsExpandable().InRange(ou => ou.FreshOrganizationalUnitId, 2, operationalUnitIds).ToList();
                Assert.IsNotNull(reportItems);           
                CollectionAssert.IsNotEmpty(reportItems); 
            }
        }

This shows how to use the InRange method of Marc Gravell. We use the AsExpandable method to allow us to hack into the expression tree of Entity Framework and the InRange method allows us to partition the work for EF. We do not know the siz of operational unit ids (usually it is low and another entity - operation Ids is of variable length and will in production blow up since we in some cases surpass the 2100 limit of Contains). And as I said before, Entity Framework will hit a performance bottleneck before 2100 parameteters are sent into the Contains method. This way of fixing it up will allow you to get stable running code in production again against large data and variable length. This code is tested with Entity Framework 6.2.0. Another article considers performance considerations for Contains and different approaches here: https://www.toptal.com/dot-net/entity-framework-performance-using-contains IMHO this approach has proven stable in a production environment for several years with large data and can be considered a stable workaround for EF slow Contains performance. I have made the LinqKit fork LinqKit.AsyncSupport available on Nuget here now: https://www.nuget.org/packages/ToreAurstadIt.LinqKit.AsyncSupport/1.1.0 This makes it possible to perform Async calls and expandable queries, i.e. queries with inline method calls for example. The nuget package now also sports symbol package for easier debugging experience. The source code for LinqKit.AsyncSupport is available here: https://github.com/toreaurstadboss/LinqKit.AsyncSupport