Friday, 5 February 2021

Overcoming limitations in Contains in Entity Framework

Entity Framework will hit a performance penalty bottleneck or crash if Contains contains a too large of a list. Here is how you can avoid this, using Marc Gravell's excellent approach in this. I am including some tests of this. I also suggest you consider LinqKit to use expandable queries to make this all work. First off, this class contains the extension methods for Entity Framework for this:
 
 
 public class EntityExtensions {
 
 /// <summary>
        /// This method overcomes a weakness with Entity Framework with Contains where you can partition the values to look for into 
        /// blocks or partitions, it is modeled after Marc Gravell's answer here:
        /// https://stackoverflow.com/a/568771/741368
        /// Entity Framework hits a limit of 2100 parameter limit in the DB but probably comes into trouble before this limit as even
        /// queries with several 100 parameters are slow.
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <typeparam name="TValue"></typeparam>
        /// <param name="source">Source, for example DbSet (table)</param>
        /// <param name="selector">Selector, key selector</param>
        /// <param name="blockSize">Size of blocks (chunks/partitions)</param>
        /// <param name="values">Values as parameters</param>
        /// 
        /// <example>
        ///   /// <[!CDATA[
        /// /// The following EF query will hit a performance penalty or time out if EF gets a too large list of operationids:
        /// ///
        /// /// var patients = context.Patients.Where(p => operationsIds.Contains(p.OperationId)).Select(p => new {
        /// ///  p.OperationId,
        /// ///  p.
        /// /// });
        /// ///
        /// /// 
        /// /// var patients = context.Patients.AsExpandable().InRange(p => p.OperationId, 1000, operationIds)
        /// //.Select(p => new
        /// //{
        /// //    p.OperationId,
        /// //    p.IsDaytimeSurgery
        /// //}).ToList();
        /// //]]
        /// </example>
        /// <returns></returns>
        public static IEnumerable<T> InRange<T, TValue>(
                this IQueryable<T> source,
                Expression<Func<T, TValue>> selector,
                int blockSize,
                IEnumerable<TValue> values)
        {
            MethodInfo method = null;
          
            foreach (MethodInfo tmp in typeof(Enumerable).GetMethods(
                    BindingFlags.Public | BindingFlags.Static))
            {
                if (tmp.Name == "Contains" && tmp.IsGenericMethodDefinition
                        && tmp.GetParameters().Length == 2)
                {
                    method = tmp.MakeGenericMethod(typeof(TValue));
                    break;
                }
            }

            if (method == null) throw new InvalidOperationException(
                   "Unable to locate Contains");
            foreach (TValue[] block in values.GetBlocks(blockSize))
            {
                var row = Expression.Parameter(typeof(T), "row");
                var member = Expression.Invoke(selector, row);
                var keys = Expression.Constant(block, typeof(TValue[]));
                var predicate = Expression.Call(method, keys, member);
                var lambda = Expression.Lambda<Func<T, bool>>(
                      predicate, row);
                foreach (T record in source.Where(lambda))
                {
                    yield return record;
                }
            }

        }

        /// <summary>
        /// Similar to Chunk, it partitions the IEnumerable source and returns the chunks or blocks by given blocksize. The last block can have variable length
        /// between 0 to blocksize since the IEnumerable can have of course variable size not evenly divided by blocksize. 
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <param name="source"></param>
        /// <param name="blockSize"></param>
        /// <returns></returns>
        public static IEnumerable<T[]> GetBlocks<T>(
                this IEnumerable<T> source, int blockSize)
        {
            List<T> list = new List<T>(blockSize);
            foreach (T item in source)
            {
                list.Add(item);
                if (list.Count == blockSize)
                {
                    yield return list.ToArray();
                    list.Clear();
                }
            }
            if (list.Count > 0)
            {
                yield return list.ToArray();
            }
        }
        
  }

 
 
 
Linqkit allows us to rewrite queries for EF using expression trees. One class is ExpandableQuery. See the links here for further info about Linqkit and Linq-Expand.
 
  	/// <summary>Refer to http://www.albahari.com/nutshell/linqkit.html and
	/// http://tomasp.net/blog/linq-expand.aspx for more information.</summary>
	public static class Extensions
	{
		public static IQueryable<T> AsExpandable<T> (this IQueryable<T> query)
		{
			if (query is ExpandableQuery<T>) return (ExpandableQuery<T>)query;
			return new ExpandableQuery<T> (query);
		}
 
This all seems to look a bit cryptic, so lets see an integration test of mine instead:
 


        [Test]
        [Category(TestCategories.IntegrationTest)]
        public void GetDataChunkedDoesNotFail()
        {
            using (var context = DbContextManager.ScopedOpPlanDataContext)
            {
                int[] operationalUnitIds = new int[]{ 107455, 105431, 107646, 107846 };
                var reportItems = context.OperationalUnits.AsExpandable().InRange(ou => ou.FreshOrganizationalUnitId, 2, operationalUnitIds).ToList();
                Assert.IsNotNull(reportItems);           
                CollectionAssert.IsNotEmpty(reportItems); 
            }
        }

 
 
 
This shows how to use the InRange method of Marc Gravell. We use the AsExpandable method to allow us to hack into the expression tree of Entity Framework and the InRange method allows us to partition the work for EF. We do not know the siz of operational unit ids (usually it is low and another entity - operation Ids is of variable length and will in production blow up since we in some cases surpass the 2100 limit of Contains). And as I said before, Entity Framework will hit a performance bottleneck before 2100 parameteters are sent into the Contains method. This way of fixing it up will allow you to get stable running code in production again against large data and variable length. This code is tested with Entity Framework 6.2.0. Another article considers performance considerations for Contains and different approaches here: https://www.toptal.com/dot-net/entity-framework-performance-using-contains IMHO this approach has proven stable in a production environment for several years with large data and can be considered a stable workaround for EF slow Contains performance. I have made the LinqKit fork LinqKit.AsyncSupport available on Nuget here now: https://www.nuget.org/packages/ToreAurstadIt.LinqKit.AsyncSupport/1.1.0 This makes it possible to perform Async calls and expandable queries, i.e. queries with inline method calls for example. The nuget package now also sports symbol package for easier debugging experience. The source code for LinqKit.AsyncSupport is available here: https://github.com/toreaurstadboss/LinqKit.AsyncSupport

Share this article on LinkedIn.

5 comments:

  1. You can add the nuget package to LinqPad and test it out immediately. Press F4 and add the 1.1.0 version
    of ToreAurstadIt.LinqKit.AsyncSupport


    ReplyDelete
  2. The lib posted to Nuget uses .Net Framework 4.8 and EF 6.4.4 in its class library.

    ReplyDelete
  3. Hi Tore. I wrote a library that solves this problem on EF Core when using the SQL Server database provider. It does so with a single round trip to the server (no need to chunk it) in a way that preserves the query's execution plan.

    You can find more information here: https://github.com/yv989c/BlazarTech.QueryableValues

    ReplyDelete
  4. bp interview questions address the structural aspect of recruitment. It demonstrates how a candidate used their skills to overcome a professional obstacle. The BP video interview questions are designed to elicit behaviour in an individual that corresponds with the role for which they are applying. A structured bp video interview questions will bring out the skills through discussion with the recruiter, allowing them to identify high-performing candidates.

    ReplyDelete
  5. Develop futuristic & human-centric websites and applications for better user retention and conversion generation at the lowest costs with CronJ's react js development services.

    ReplyDelete