What is LINQ
and why it matters
I've been listening to some excellent podcasts recently where the subject of C# and LINQ came up.
I've been meaning to start blogging, and this was a good excuse. I want to cover what LINQ is and why it matters, because it isn't just "querying databases". I have nothing but love in my heart for Siracusa and Casey, but I feel this is too important to leave unexplained.
LINQ is about expressing the what, not the how.
LINQ allows you to cleanly operate on collections in memory.
LINQ uses lazy evaluation of sequences, aka "Deferred Execution".
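To make deferred execution concrete, here is a minimal sketch (the class and method names are my own, purely for illustration). It records when the Select lambda actually runs: nothing happens when the query is built, only when it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns a log of the order in which things happened.
    public static List<string> Run()
    {
        var log = new List<string>();
        var numbers = new List<int> { 1, 2, 3 };

        // Building the query runs nothing: the lambda is not called yet.
        IEnumerable<int> doubled = numbers.Select(n =>
        {
            log.Add("select " + n);
            return n * 2;
        });
        log.Add("query built");

        // The source can still change before the query is enumerated.
        numbers.Add(4);

        // Enumeration (here via ToList) is what pulls items through the query.
        List<int> results = doubled.ToList();
        log.Add("results: " + string.Join(",", results));
        return log;
    }
}
```

The first log entry is "query built", and the 4 added afterwards still shows up in the results, because the query only describes the work until something enumerates it.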
LINQ is trivially parallelized with a single AsParallel() method call. Need to preserve ordering? Then chain AsOrdered() after AsParallel(), and the results are merged back in source order. Here's the entire implementation of Map-Reduce, using Extension Methods (akin to Categories in Objective-C):
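    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Extension methods must live in a static class; the class name is arbitrary.
    public static class MapReduceExtensions
    {
        public static ParallelQuery<R> MapReduce<T, M, K, R>(
            this ParallelQuery<T> source,
            Func<T, IEnumerable<M>> map,
            Func<M, K> keySelector,
            Func<IGrouping<K, M>, IEnumerable<R>> reduce)
        {
            return source.SelectMany(map)       // map: one input, many intermediate values
                         .GroupBy(keySelector)  // shuffle: group intermediate values by key
                         .SelectMany(reduce);   // reduce: each group, many results
        }
    }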
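As a rough usage sketch, here is the classic word count written against that MapReduce method. The extension method is repeated so the block compiles on its own; WordCountDemo and CountWords are names I made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapReduceExtensions
{
    public static ParallelQuery<R> MapReduce<T, M, K, R>(
        this ParallelQuery<T> source,
        Func<T, IEnumerable<M>> map,
        Func<M, K> keySelector,
        Func<IGrouping<K, M>, IEnumerable<R>> reduce)
    {
        return source.SelectMany(map).GroupBy(keySelector).SelectMany(reduce);
    }
}

public static class WordCountDemo
{
    // Hypothetical helper: counts words case-insensitively across all lines.
    public static Dictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        return lines
            .AsParallel()
            .MapReduce(
                // map: split each line into words
                line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
                // key: the lower-cased word
                word => word.ToLowerInvariant(),
                // reduce: one (word, count) pair per group
                wordGroup => new[] { new KeyValuePair<string, int>(wordGroup.Key, wordGroup.Count()) })
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

Calling WordCountDemo.CountWords(new[] { "the quick fox", "The lazy dog the" }) should give a dictionary where "the" maps to 3 and every other word maps to 1.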
LINQ can project elements of a sequence into a new form, without making you define an endless array of 'holder' classes:
    var projected = source.Where(x => x.Prop > 5)
                          .Select(x => new { Name = "x#" + x.Id, Prop = x.Prop });
LINQ to Objects performs joins as hash joins: it builds a hash lookup over the inner sequence once, then streams the outer sequence against it. Try writing the following code without using LINQ. Now imagine matching not just on Price, but on a composite key. How many hash tables, lists, and new classes did you need?
    var result = from a in apples
                 join b in batteries on a.Price equals b.Price
                 select new { Apple = a, Battery = b, Price = a.Price };
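For the composite-key case, one common approach is to join on anonymous types, which compare by value as long as the property names and types line up on both sides. A minimal sketch follows; the Apple and Battery classes and the Store property are hypothetical, invented just for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CompositeJoinDemo
{
    // Hypothetical types, invented for this illustration.
    public sealed class Apple
    {
        public string Variety { get; set; }
        public decimal Price { get; set; }
        public string Store { get; set; }
    }

    public sealed class Battery
    {
        public string Model { get; set; }
        public decimal Price { get; set; }
        public string Store { get; set; }
    }

    public static List<string> MatchOnPriceAndStore(
        IEnumerable<Apple> apples, IEnumerable<Battery> batteries)
    {
        var result =
            from a in apples
            join b in batteries
                // Both sides build the same anonymous type (Price, Store),
                // so the join matches on the pair of values.
                on new { a.Price, a.Store } equals new { b.Price, b.Store }
            select a.Variety + " + " + b.Model;

        return result.ToList();
    }
}
```

No hand-written hash table or key class is needed: the compiler generates the key type, including its Equals and GetHashCode.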
LINQ can group, including on composite keys. The elements of the grouped sequence are available along with the key:
    var result = from i in items
                 join r in records on i.y equals r.y
                 group i by new { key1 = i.y, key2 = i.z };
    // Each group exposes its composite key through the Key property.
    var list = result.Select(g => new {
        Key1 = g.Key.key1,
        Key2 = g.Key.key2,
        FirstFiveItems = g.Take(5)
    });
LINQ has many more operators for things like Sorting, Partitioning, Set operators, Aggregations, and more.
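A quick tour of a few of those operators, as a sketch over two small integer arrays (the data and the OperatorTourDemo name are made up for illustration):

```csharp
using System;
using System.Linq;

public static class OperatorTourDemo
{
    private static readonly int[] A = { 5, 3, 8, 1, 9, 2 };
    private static readonly int[] B = { 8, 9, 10, 11 };

    // Sorting, then partitioning: skip the largest value, take the next three.
    public static string SortedPage()
    {
        return string.Join(",", A.OrderByDescending(n => n).Skip(1).Take(3));
    }

    // Set operator: values present in both arrays.
    public static string Common()
    {
        return string.Join(",", A.Intersect(B));
    }

    // Aggregation with a built-in operator.
    public static int Total()
    {
        return A.Sum();
    }

    // Aggregation with a custom fold: the product of the values greater than 4.
    public static int ProductOfLargeValues()
    {
        return A.Where(n => n > 4).Aggregate(1, (acc, n) => acc * n);
    }
}
```

SortedPage() gives "8,5,3", Common() gives "8,9", Total() gives 28, and ProductOfLargeValues() gives 360 (5 × 8 × 9).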
Probably the biggest and most important feature of LINQ is that you don't have to execute the query it describes. Instead, you can capture it as an Expression Tree: the abstract syntax tree representing the entire LINQ query!
If you ever wondered how the LINQ to database products work, they simply have overloads or extension methods that take expressions (Expression<Func<...>>) instead of compiled delegates (Func<...>), operating on IQueryable rather than IEnumerable. They can then analyse the tree to produce a SQL select statement. When a non-SQL expression is encountered, such as accessing a property on a local object, the provider can choose to invoke that expression to produce a value, which is then included in the SQL statement.
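To give a feel for what a provider does, here is a toy translator. It is only a sketch under heavy assumptions: it handles exactly one shape of predicate (a member compared with ">" to some value), the Product type is hypothetical, and real providers are far more thorough and normally emit parameters instead of inlining values.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

public static class ExpressionInspectionDemo
{
    // Hypothetical entity, invented for this illustration.
    public sealed class Product
    {
        public string Name { get; set; }
        public decimal Price { get; set; }
    }

    // Toy translator: handles only "parameter.Member > some-value".
    public static string ToWhereClause<T>(Expression<Func<T, bool>> predicate)
    {
        // The lambda arrives as data (a tree), not as compiled code.
        var comparison = (BinaryExpression)predicate.Body;
        if (comparison.NodeType != ExpressionType.GreaterThan)
            throw new NotSupportedException("Only '>' is handled in this sketch.");

        // Left side: the member access, which becomes the column name.
        var column = (MemberExpression)comparison.Left;

        // Right side: not something SQL understands (for example a captured
        // local variable), so compile and invoke it to get a plain value.
        object value = Expression.Lambda(comparison.Right).Compile().DynamicInvoke();

        return string.Format(
            CultureInfo.InvariantCulture, "WHERE {0} > {1}", column.Member.Name, value);
    }
}
```

With a local `decimal minimum = 5m;`, the call `ExpressionInspectionDemo.ToWhereClause<ExpressionInspectionDemo.Product>(p => p.Price > minimum)` returns "WHERE Price > 5". The same lambda syntax would produce an ordinary delegate if the parameter type were Func<T, bool>; asking for Expression<Func<T, bool>> is what makes the compiler hand over the tree instead.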
This expression functionality also underlies dynamic support - if your dynamic object can determine the binding site and type, it can return an expression to replace the dynamic bits, which is then compiled into a delegate (function pointer) which is much faster than dynamic lookup. Think using assembly to patch a trampoline at runtime, only in a completely type-safe way! You can also build an expression tree arbitrarily, then ask the runtime to compile it to a delegate on the fly.
    // Requires: using System; using System.Linq.Expressions;
    var be = Expression.Power(Expression.Constant(2D), Expression.Constant(3D));
    var le = Expression.Lambda<Func<double>>(be);
    var compiled = le.Compile();
    // compiled is now a delegate over generated code that the JIT turns into native assembly (x86, x64, or ARM)
    double result = compiled(); // invoke it just like any other method; result is 8
Using LINQ on a daily basis gently allows you to learn functional programming, set operations, monads, and other related concepts. Not in a superficial way like a school project, but deeply in your bones. You begin to apply the concepts to actual productive work. No professorial lecture on LISP ever covers edge cases like reading a file or opening a socket, but you sure will learn 100 ways to calculate Fibonacci numbers. By using functional styles in C# with LINQ, you can begin to apply the concepts in small chunks while being immediately productive, leading to a positive feedback loop that is at the root of mastering programming (some would even say the root of all learning).
Using LINQ, you start to understand in your gut that functions are data. It changes the way you approach problems. Every interesting method starts sprouting Func<T> in the parameter list. You begin to see how immutability really means treating everything as an in-memory snapshot. You grok why you need this for concurrency. You realize how evil side-effects can be; how many great algorithms are impossible when your code spills its side-effect guts just walking down the street.
LINQ was the first step I took in learning how to reason about my programs and start to understand how to prove their correctness without ever having run them.
Not bad for something "used to query databases".
This blog represents my own personal opinion and is not endorsed by my employer.