Entity Framework will hit a performance penalty bottleneck or crash if Contains contains a too large of a list.
Here is how you can avoid this, using Marc Gravell's excellent approach in this. I am including some tests of this. I also suggest you consider LinqKit to use expandable queries to make this all work.
First off, this class contains the extension methods for Entity Framework for this:
publicclassEntityExtensions {
///<summary>/// This method overcomes a weakness with Entity Framework with Contains where you can partition the values to look for into /// blocks or partitions, it is modeled after Marc Gravell's answer here:/// https://stackoverflow.com/a/568771/741368/// Entity Framework hits a limit of 2100 parameter limit in the DB but probably comes into trouble before this limit as even/// queries with several 100 parameters are slow.///</summary>///<typeparam name="T"></typeparam>///<typeparam name="TValue"></typeparam>///<param name="source">Source, for example DbSet (table)</param>///<param name="selector">Selector, key selector</param>///<param name="blockSize">Size of blocks (chunks/partitions)</param>///<param name="values">Values as parameters</param>//////<example>//////<[!CDATA[
/// /// The following EF query will hit a performance penalty or time out if EF gets a too large list of operationids:
/// ///
/// /// var patients = context.Patients.Where(p => operationsIds.Contains(p.OperationId)).Select(p => new {////// p.OperationId,////// p.////// });////////////////// var patients = context.Patients.AsExpandable().InRange(p => p.OperationId, 1000, operationIds)/// //.Select(p => new/// //{/// // p.OperationId,/// // p.IsDaytimeSurgery/// //}).ToList();/// //]]///</example>///<returns></returns>publicstaticIEnumerable<T> InRange<T, TValue>(this IQueryable<T> source,
Expression<Func<T, TValue>> selector,
int blockSize,
IEnumerable<TValue> values)
{
MethodInfo method = null;
foreach (MethodInfo tmp intypeof(Enumerable).GetMethods(
BindingFlags.Public | BindingFlags.Static))
{
if (tmp.Name == "Contains" && tmp.IsGenericMethodDefinition
&& tmp.GetParameters().Length == 2)
{
method = tmp.MakeGenericMethod(typeof(TValue));
break;
}
}
if (method == null) thrownew InvalidOperationException(
"Unable to locate Contains");
foreach (TValue[] block in values.GetBlocks(blockSize))
{
var row = Expression.Parameter(typeof(T), "row");
var member = Expression.Invoke(selector, row);
var keys = Expression.Constant(block, typeof(TValue[]));
var predicate = Expression.Call(method, keys, member);
var lambda = Expression.Lambda<Func<T, bool>>(
predicate, row);
foreach (T recordinsource.Where(lambda))
{
yieldreturnrecord;
}
}
}
///<summary>/// Similar to Chunk, it partitions the IEnumerable source and returns the chunks or blocks by given blocksize. The last block can have variable length/// between 0 to blocksize since the IEnumerable can have of course variable size not evenly divided by blocksize. ///</summary>///<typeparam name="T"></typeparam>///<param name="source"></param>///<param name="blockSize"></param>///<returns></returns>publicstaticIEnumerable<T[]> GetBlocks<T>(this IEnumerable<T> source, int blockSize)
{
List<T> list = new List<T>(blockSize);
foreach (T item in source)
{
list.Add(item);
if (list.Count == blockSize)
{
yieldreturn list.ToArray();
list.Clear();
}
}
if (list.Count > 0)
{
yieldreturn list.ToArray();
}
}
}
Linqkit allows us to rewrite queries for EF using expression trees.
One class is ExpandableQuery. See the links here for further info about Linqkit and Linq-Expand.
///<summary>Refer to http://www.albahari.com/nutshell/linqkit.html and/// http://tomasp.net/blog/linq-expand.aspx for more information.</summary>publicstaticclassExtensions
{
publicstaticIQueryable<T> AsExpandable<T> (this IQueryable<T> query)
{
if (query isExpandableQuery<T>) return (ExpandableQuery<T>)query;
returnnew ExpandableQuery<T> (query);
}
This all seems to look a bit cryptic, so lets see an integration test of mine instead:
This shows how to use the InRange method of Marc Gravell. We use the AsExpandable method to allow us to hack into the expression tree of Entity Framework and the InRange method allows us to partition the work
for EF. We do not know the siz of operational unit ids (usually it is low and another entity - operation Ids is of variable length and will in production blow up since we in some cases surpass the 2100 limit of Contains).
And as I said before, Entity Framework will hit a performance bottleneck before 2100 parameteters are sent into the Contains method. This way of fixing it up will allow you to get stable running code in production again against large data and variable length.
This code is tested with Entity Framework 6.2.0.
Another article considers performance considerations for Contains and different approaches here:
https://www.toptal.com/dot-net/entity-framework-performance-using-contains
IMHO this approach has proven stable in a production environment for several years with large data and can be considered a stable workaround for EF slow Contains performance.
I have made the LinqKit fork LinqKit.AsyncSupport available on Nuget here now:
https://www.nuget.org/packages/ToreAurstadIt.LinqKit.AsyncSupport/1.1.0
This makes it possible to perform Async calls and expandable queries, i.e. queries with inline method calls for example.
The nuget package now also sports symbol package for easier debugging experience.
The source code for LinqKit.AsyncSupport is available here:
https://github.com/toreaurstadboss/LinqKit.AsyncSupport