LINQ vs Loop – A performance test

I just installed Visual Studio 2008 beta 2 to see what the future holds for C#. The addition of LINQ has brought a variety of query keywords to the language. “Anything” can be queried: SQL databases (naturally), XML documents, and regular collections. Custom queryable objects can also be created by implementing IQueryable. Sadly, like every abstraction, these goodies all come at a cost. The question is how much?

I decided to create a simple test to see how much of a performance hit LINQ incurs. The test I devised finds the numbers in an array that are less than 10. The code is quoted below.

// Requires: using System; using System.Collections.Generic;
//           using System.Diagnostics; using System.Linq;
public void LinqTest()
{
    const int SIZE = 10000, RUNS = 1000;
    int[] ints = new int[SIZE];
    for (int i = 0; i < SIZE; i++)
        ints[i] = i;

    DateTime start = DateTime.Now;
    for (int t = 0; t < RUNS; ++t)
    {
        int[] less = (from int i in ints
                      where i < 10
                      select i).ToArray();
    }
    TimeSpan spent = DateTime.Now - start;
    Trace.WriteLine(string.Format("LINQ: {0}, avg. {1}",
        spent, new TimeSpan(spent.Ticks / RUNS)));

    DateTime start2 = DateTime.Now;
    for (int t = 0; t < RUNS; ++t)
    {
        var l = new List<int>();
        foreach (var i in ints)
            if (i < 10)
                l.Add(i);
        int[] less2 = l.ToArray();
    }
    TimeSpan spent2 = DateTime.Now - start2;
    Trace.WriteLine(string.Format("Loop: {0}, avg. {1}",
        spent2, new TimeSpan(spent2.Ticks / RUNS)));
}

Initially, I assumed the performance impact would not be too large: the query is equivalent to a straightforward imperative loop, which should not be too hard for a compiler to deduce given static typing and a single collection to iterate over. Or is it? Here are the results:


LINQ: 00:00:04.1052060, avg. 00:00:00.0041052
Loop: 00:00:00.0790965, avg. 00:00:00.0000790

As you can see, the performance impact is huge: LINQ performs roughly 50 times worse than the traditional loop! This seems rather wild at first glance, but the explanation is this: the keywords introduced by LINQ are syntactic sugar for calls to a set of generic methods that iterate over collections and filter them through lambda expressions. Naturally, this will not perform as well as a traditional imperative loop, and it leaves less room for optimization.
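
To make the translation concrete, here is a rough sketch of what a query of the form "from i in ints where i < 10 select i" expands into in method syntax. This is an illustration of the idea rather than the exact compiler output:

// Rough sketch: query syntax compiles into calls to the standard query
// operators (extension methods on IEnumerable<T>), with the where clause
// turned into a delegate that is invoked for every element.
// Requires: using System.Linq;
int[] less = ints
    .Where(i => i < 10)  // filter runs as a delegate call per element
    .ToArray();          // forces execution and copies the matches into an array

Each element therefore passes through the enumerator machinery and at least one delegate invocation, which is where much of the overhead compared to a hand-written loop comes from.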

Having seen the performance impact, I am still of the view that LINQ is a great step towards a more declarative world for developers. Instead of saying “take these numbers, iterate over all of them, and insert them into this list if they are less than ten”, which is an informal description of a classical imperative loop, you can now say “from these numbers, give me those that are less than ten”. The difference may be subtle, but the latter is in my opinion far more declarative and easier to read.

This may very well be the next big thing, but it comes at a cost. For now, my advice is to write simple performance tests for the cases where you are considering LINQ, so you can spot potential pitfalls as early as possible.

  • http://blogs.msdn.com/mattwar Matt Warren

    Iteration of arrays is highly optimized by the C# compiler and .NET JIT. So if all you are doing is a tiny bit of work over an array and it is performance critical, then iterate the loop directly. LINQ comes in handy in many other situations, as there are a lot of other APIs that expose collections that are not arrays.

  • http://ox.no Håvard

    Actually, foreach iteration across an array is not very optimized, but that’s another discussion. The point is rather that LINQ performs very poorly here when it should not have to.

    Still, it is worth mentioning that this is a test done with a beta release, which probably has an impact on the performance (logging, debugging information, recent changes that hit performance, etc.).

  • Anders Hejlsberg

    I just ran across this and tried it out.

    It turns out that the culprit is the explicit mention of the “int” type in your query (“from int i in …”). This translates into a call to the Cast() extension method which ends up boxing and unboxing each integer. If you remove the “int” from the query you’ll see much better performance (about 2 times slower than the for loop). That’s representative of the expected performance.

    Explicit typing of a range variable in a query is really only necessary when querying older non-generic collection types (such as ArrayList). We’ll give some thought to ways of optimizing away the Cast() overhead in cases where it isn’t needed.

  • http://ox.no Håvard

    Yes, removing the int indeed speeds up the query. Thank you, Anders, for noticing and clearing this up.

    A Cast() optimization would be very welcome.

    You say that about 2 times slower is the expected performance, but where does that time go? Is the lambda in the Where() that slow? Or do the additional method calls introduced by replacing the loop with a query carry that much of a penalty? If the select is also a lambda here (which I presume it is), there may be some gain there as well in some cases (you could just skip it and yield return directly in the case above).

    Of course, such performance improvements might cost the implementation some generality, but in my view a good implementation should not penalize the trivial cases. As a programmer I want a more declarative language, and being able to stop writing loops for the trivial cases and write declarative queries instead is a great step in the right direction.

    Maybe the introduction of comprehensions could be an alternative to optimizing LINQ for trivial queries? C# is becoming more and more function-oriented, and a form of comprehensions would be a welcome addition in my opinion.
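
    For reference, here is roughly what the two forms of the query translate into in method syntax, based on Anders’ explanation above. This is a sketch of the translation, not the exact compiler output:

    // "from int i in ints where i < 10 select i"
    // The explicit int becomes a Cast<int>() call, which (in this release)
    // enumerates the array through the non-generic IEnumerable and
    // boxes/unboxes every element:
    int[] withCast = ints.Cast<int>().Where(i => i < 10).ToArray();

    // "from i in ints where i < 10 select i"
    // No cast; the int[] is used directly as an IEnumerable<int>:
    int[] withoutCast = ints.Where(i => i < 10).ToArray();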

  • Axel

    Use System.Diagnostics.Stopwatch instead of manually constructing the time checks; it is much easier. Note that your input array should have been randomized or shuffled: it is very predictable to the human eye, and it does little justice to the sequential execution of the code, because the branch goes one way the first 10 times and the other way the remaining 10,000 times, which probably plays into the CPU’s branch prediction and yields even better performance for the for-loop.

  • http://ox.no Håvard

    Axel, thanks for the tip on Stopwatch.

    I just tried running the test with a randomized input array, and randomized partitioning point. Neither had any noticeable effect on the test results.
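
    For completeness, here is a minimal sketch of the LINQ half of the timing rewritten with Stopwatch (same measurement, just less manual TimeSpan arithmetic; requires System.Diagnostics):

    var watch = Stopwatch.StartNew();
    for (int t = 0; t < RUNS; ++t)
    {
        int[] less = (from i in ints
                      where i < 10
                      select i).ToArray();
    }
    watch.Stop();
    Trace.WriteLine(string.Format("LINQ: {0} ms, avg. {1} ms",
        watch.ElapsedMilliseconds, (double)watch.ElapsedMilliseconds / RUNS));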

  • Pavol

    I also tried your performance test (I’m using the final release of .NET 3.5 and VS 2008). With a little change to the code (TimeSpan to Stopwatch) I got these results (in milliseconds): a) With the explicit cast to int and the result as an array: LINQ: 11550, FOREACH: 91. That’s a terrible score…

    b) As Anders said, removing the explicit cast to int speeds things up a lot. After this change: LINQ: 335, FOREACH: 91. A very nice change…

    The ToArray() method also has a huge impact on the results. In this test it’s called 1000 times, but in the real world you may work with large collections, perform some LINQ operations on them with their implicit type (mostly IEnumerable) and only then turn the result into an Array, List or whatever. So with a little change (RUNS = 1, yes only one run, and SIZE = 10 000 000), and with ToArray() deleted from both the LINQ and the FOREACH statement, you get this piece of code:

    var less = (from i in ints where i < 10 select i);

    and

    var l = new List<int>(); foreach (var i in ints) if (i < 10) l.Add(i);

    c) Run it and we get: LINQ: 2, FOREACH: 111. d) If you add ToArray() to both: LINQ: 346, FOREACH: 111. So the ToArray() in the LINQ case alone takes 344 ms!

    OK, to draw real conclusions we should test more than this, but as a result I’m glad that I can use LINQ to simplify code in a huge manner (especially with lambda expressions) and know that the performance impact will be minimal.


  • http://rebelheart.squarespace.com Eric

    Hey great post. I was wondering the same thing. Some of my colleagues and I were pitching the idea around of “how exactly are they doing the query under the hood?”.

    I was hoping that at least it was translating it into the same IL as a nested foreach; the idea that it’s slower is terrible. Booo!

  • john

    Pavol – ToArray() appears to take so long because the query is only executed at that point. Deferred query execution in LINQ means the query only runs when you iterate through the query result (e.g. by calling ToArray()). Declaring and initialising the query doesn’t execute it; that initialisation probably accounts for the short time (2 ms) you’ve seen.

  • Robert

    Wow, building in release, Loop is 10x faster than Linq without doing the int cast.

  • http://smartbitz.co.nz Bart

    An update using the latest service packs and running the same code as in your blog: the LINQ query is 5 – 6 times slower than the loop on my machine.

    However, the LINQ query is probably 5 – 6 times more understandable than the loop for someone reading the code and trying to figure out what it’s doing.

  • Brian

    I ran the same test and didn’t get the same results

    Linq .20 Loop .08

  • http://ox.no Håvard

    Brian: Interesting, with 3.5 SP1 I get the following results:

    With explicit int: Linq .446 Loop .231

    And without the explicit int: Linq .431 Loop .256

    This could indicate that Cast() is optimized away in SP1, but I have not verified this.

  • Venkat

    All times are in ms.

    I ran the tests as Pavol suggested.

    LINQ: 1, avg. 24 Loop: 92, avg. 1322

    from int i / from i: there is not much difference.

    LINQ: 247, avg. 3803 Loop: 89, avg. 1286

  • JS

    If you run the compiled version of the application (.exe or .dll), LINQ actually runs faster.

    Running it from inside VS is not a valid test and should not be compared.

  • Brian

    This function does nothing, and thus benchmarking a compiled version is dangerous (so is benchmarking a debug version, but less so). A compiler will happily say, “This loop does nothing! Let’s skip it!” I’d be hesitant to trust any benchmarks that are not in use on production code…though they’re a nice tool for developers of compilers to use to check if their optimizations are working.

  • http://eraza.org Eric Kaufman

    Just wanted to add that with .NET 4.0 they’re upgrading the compiler and adding parallelism for LINQ queries, which I expect will change things a lot. For those of you who did hop on the LINQ train, I would imagine that a lot of your queries might just start to work a bit quicker once the new compiler comes out.

    http://www.betanews.com/article/Everyone-talk-at-once-NET-40-will-include-Parallel-Extensions/1223931673

  • Ray

    My code for Linq:

    for (int t = 0; t < RUNS; ++t) { int[] less = ints.Where(i => i < 10).ToArray(); }

    My result: LINQ: 00:00:00.0936002, avg. 00:00:00.0000936 Loop: 00:00:00.1092002, avg. 00:00:00.0001092

  • funky

    Thanks! I was getting 50% CPU usage by writing a file to a Stream, like so:

    buffer = formData.Skip(offset).Take(buffer.Length).ToArray();

    After reading your comments, I’ve rewritten my code to use a loop and now the CPU is quiet as ever :)


  • Sanjay

    My tests indicate that a lot depends on how “random” the data in the ints array is. Adding the following to shuffle the data between values < 10 and > 10:

    for (int j = 0; j < SIZE; j++)
    {
        if (j % 2 == 0)
            ints[j] = 5;
    }

    makes LINQ faster than the loop. Then again, I am testing in 2011, so a lot will have changed since this initial post in 2007.

  • LinqTrialerandHater

    Bah, LINQ. Ever thought of what you’re going to do if you wish to port your application to a different language that doesn’t use lambda expressions or LINQ? You’re stuffed! There really is no difference between LINQ and a for loop. If it was faster than a for loop I would think it’s cool, but it’s not, and it’s also harder to look at, understand and debug. It’s just a bunch of snobs trying to be smarter and cleverer than everyone else. Wow, look at the weird and cryptic LINQ expression I can write! It’s slower and harder to understand, and if I ever want to port my code to Objective-C++ or PHP or something, I can just sit on it and go “wow, I thought it was cool, but it’s slower and really just Microsoft’s way of keeping you locked into their product!” So THERE!

  • CmdJerry

    The LINQ query object is still used underneath the enumerable object when looping. This means the query is called repeatedly when looping. Take a look at these two articles: http://allthingscs.blogspot.com/2011/03/linq-performance.html and http://odetocode.com/code/737.aspx

  • http://ox.no Håvard

    CmdJerry:

    Your first source is at best misleading, at worst wrong. (Theoretically, one might create a case where LINQ queries iterate across enumerables that themselves iterate the entire result set once for each item, but that’s irrelevant to the understanding of LINQ and its behavior.)

    Your second source explains the behavior correctly: LINQ queries use deferred execution, which implies that nothing is executed until you iterate the results of your query. Further, iteration with a plain loop will iterate the enumerable once and only once.

    Calling ToList() is just the same as writing a loop which adds the items to a list:

    var yourQuery = from item in manyItems select item;
    // Calling ToList()...
    var listOfItems = yourQuery.ToList();
    // ... is equivalent to building your own list in a loop
    var list = new List<T>();
    foreach(var item in yourQuery)
        list.Add(item);
    

    So there is nothing to gain performance-wise by calling ToList() when iterating across the results once. In fact, calling ToList() and then looping will perform worse than just looping. There’s nothing magic to it.

    I recommend you read up on deferred execution and the yield statement. Charlie Calvert’s post on LINQ and deferred execution is a good start.
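
    To make deferred execution concrete, here is a minimal sketch (requires System.Linq); the Console.WriteLine in the predicate is only there to show when the filter actually runs:

    int[] numbers = { 1, 5, 12, 3, 20 };

    // Building the query executes nothing - it is only a description of work.
    var query = numbers.Where(n =>
    {
        Console.WriteLine("testing " + n);  // visible side effect
        return n < 10;
    });

    // Only now, when the result is iterated, does the predicate run:
    foreach (var n in query)
        Console.WriteLine("got " + n);

    // Iterating again (for example via ToList()) runs the whole filter again.
    var list = query.ToList();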

