SynopsisCreating a large collection by adding elements to a List<T> is inefficient, can large memory footprints and lead to OutOfMemory exceptions. So why do so many programmers do it? The normal response to pointing this out is: "What alternative do I have?" My response is always: Yield
Headaches with List<T>If you went and told somebody you were going to take an array, and re-size it several times until it is an unduly large object... you would probably wouldn't make it through the explanation before you realized how taxing of a process that would be! And yet... we see this code frequently:
static IEnumerable<long> buildUsingList()
var someList = new List<long>();
for (int i = 0; i < Iterations; i++)
It looks harmless enough, but lets look at some of the problems with this technique.
List<T> uses an array to store all it's elements. So this means each time we add one more entry than the array can hold, the list has to re-size the array. If the array is small enough, no big deal. In fact, it can probably stay in cache. However, once you add enough items, the array becomes too large and has to be moved to RAM. This is because arrays have to be a contiguous block.
This has a few implications:
- Memory Blocks
- Each time the OS has to go and find a new block, this can take a few clock cycles.
- Once it finds a new block, it has to move the data from the current block to the new location.
- If the array is too large, it can end up in RAM. This will cause memory paging.
- The CPU can't directly operate on any data that is not stored in cache. Therefore, any data stored in RAM will have to be copied first. Each cache miss will result in extra clock cycles.
- Large Object Heap (LOH)
- An object that is at least 85 KB is added to the Large Object Heap. So lets say that
Iterationsis 10880 (85 * 1024/8)? This would then create an array that is added to the LOH. Do this enough and the Garbage Collector will bring your software to a crawl.
Lets look at some benchmark results to prove to ourselves that there is actually a benefit to using
Yieldbenefits enormously from the optimize flag! Be sure you have it on if you benchmark/use
First, lets present the yield replacement to the
buildUsingList()function that was listed above:
static IEnumerable<long> buildUsingYield()
for (long i = 0; i < Iterations; i++)
yield return i;
Now lets look at the function that will call either
static void Main(string args)
var watch = System.Diagnostics.Stopwatch.StartNew();
var total = 0L;
for(int i=0; i<100000; i++)
total +=getTotal(buildUsingList()); // NOTE: This will be commented out for the yield test.
//total +=getTotal(buildUsingYield()); // NOTE: This will be un-commented for the yield test.
Notice I actually do something with the results (
Console.WriteLine(total)) to ensure the compiler doesn't remove the parts we care about.
Now for the results (NOTE: The results are in milliseconds):
Right away we can see that
Yieldhas an advantage. There are two reasons for that:
- Memory Footprint - As we discussed before, using
yieldprevents building several (100000) chunks of memory. This lets the garbage collector off the hook along with the memory pool who would have to find the memory. In fact for 100000 iterations, each list will be a LOH, which allows
yieldeven better results in comparison.
- Half the
Iterations- Notice that we build the collection and then go through it. However, because
yieldis lazy, it only goes through
Iterationsas it needs to. This leaves
yielddoing half the iterations as building a list.
If you have a situation where you want to build a collection of objects and then iterate through each, use
yield. You will have performance benefits and use less memory.
Yieldwon't help you if you need a collection that you can alter however. We'll look at what to use in that situation in a later post.