2009 06 22 refactoring news

Fabian Schmied edited this page Jun 22, 2009 · 1 revision

Published on June 22nd, 2009 at 10:55

Refactoring News

In my last post, I wrote about our plans to change and extend re-linq to make it a perfect foundation for LINQ-to-NHibernate. In this post, I'd like to give an update on the progress we're making.

Those were the first two items on my list of things to do:

1. We want to make re-linq capable of parsing virtually any query, even queries that contain expression types defined by the user or by the specific LINQ provider built on re-linq.
For this, we need to replace the structural parsing classes that construct the QueryModel and Clause instances with a more flexible and extensible parsing mechanism.
(In fact, this has already been implemented by now.)
2. We want to make re-linq capable of parsing queries that do not follow the patterns produced by the typical .NET compilers. These patterns allowed us to make assumptions when resolving the predicate and selector expressions used by Select, Where, SelectMany, and similar clauses. The problem is that these assumptions don't hold when a query is hand-written, so we need to implement a resolution algorithm that does not depend on them.
As a result, the clauses produced by re-linq will contain predicate and projection expressions that point back directly to the clauses the data comes from, which should make it easy to generate queries from those clauses.

Let me start by saying that these items were completed on Friday. But what does that give you as a user of re-linq?

First of all, the new extensible structural parsing mechanism used to generate our clause-based QueryModel from expression trees conceptually enables re-linq to parse anything you feed it. This means that if you have a query method that re-linq cannot parse, for example because it's a custom method you added just for your provider, you can simply add your own parser and have it generate a custom IClause. To do so, simply derive from MethodCallExpressionNodeBase and override its abstract methods. Your query provider can register your custom node parser, and your query-generating back-end can interpret the custom clauses your node parser is creating.

Second, we now have a fully featured expression resolver that takes the expressions passed to where clauses, select clauses, etc., and creates simplified expressions that are linked to the clauses the expression's input data comes from.

To illustrate this, take a look at the following query:

var query = from i in new MyQueryable<int>()
            select i;

With the help of the C# compiler, this produces the following LambdaExpression, which is passed to the Select method call: i => i. This means that the Select call selects exactly those objects streaming into the clause.
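To see that lambda for yourself, here is a minimal sketch that digs the selector out of the expression tree. It does not use re-linq at all: it assumes the standard `AsQueryable()` wrapper over an array as a stand-in for `MyQueryable<int>`, and `SelectorInspection` and `GetSelector` are hypothetical names made up for this illustration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class SelectorInspection
{
    // Returns the lambda that was passed to the outermost Select call.
    // Assumes the query's root expression is a call to Queryable.Select.
    public static LambdaExpression GetSelector(IQueryable query)
    {
        var call = (MethodCallExpression)query.Expression;
        // Argument 0 is the source; argument 1 is the selector,
        // which Queryable.Select wraps in a Quote node.
        var quote = (UnaryExpression)call.Arguments[1];
        return (LambdaExpression)quote.Operand;
    }

    public static void Main()
    {
        // Stand-in for MyQueryable<int> (an assumption of this sketch).
        var query = from i in new[] { 1, 2, 3 }.AsQueryable()
                    select i;

        LambdaExpression selector = GetSelector(query);
        Console.WriteLine(selector);

        // The body of the lambda is simply its own parameter.
        Console.WriteLine(selector.Body == selector.Parameters[0]);
    }
}
```

Note that the body is just a ParameterExpression: nothing in it says which query source the parameter stands for.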

When generating a query for the select clause, you will however need to tell your back-end what data exactly to select, so you need to know where the data is coming from. In this case, there's only one query data source, but in more complex examples, it won't be that easy to determine the source of the data streaming into a clause.

Therefore, re-linq gives you the following expression: [i]. The square brackets symbolize a QuerySourceReferenceExpression, an expression node that points back to a query data source. (In this case, the main from clause selecting data via the identifier "i".)

To illustrate that this is really useful, take a look at the following query:

var query2 = from i in new MyQueryable<int>()
             let x = (i + 1).ToString ()
             select new { x, i }
                into y
                where y.i > 5
                select y.x;

If you call ToString on the expression tree produced by this, the following will be output (formatted and slightly edited for readability):

value(LinqTest.MyQueryable`1[System.Int32])  
  .Select(i => new <>f__AnonymousType0`2(i = i, x = (i + 1).ToString()))  
  .Select(trans0 => new <>f__AnonymousType1`2(x = trans0.x, i = trans0.i))  
  .Where(y => (y.i > 5))  
  .Select(y => y.x)

Now, if you only take the last select projection (y => y.x), can you tell where the selected data is coming from?

Here's what re-linq makes of this after the last weeks' changes:

from Int32 i in value(LinqTest.MyQueryable`1[System.Int32])  
where ([i] > 5)  
select ([i] + 1).ToString()

Much simpler to generate a query from, I suppose. (In this example, it's even simpler than the original LINQ query because re-linq folded the let and into parts into a single query.)
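The general idea of such a substitution can be sketched with plain expression trees. This is not re-linq's implementation, only an illustration under assumptions: it uses the `ExpressionVisitor` base class from `System.Linq.Expressions`, and `SubstitutionSketch`, `ParameterReplacer`, and `Inline` are hypothetical names. It replaces the parameter of a later lambda with the body of an earlier one, so the result refers to the original input directly.

```csharp
using System;
using System.Linq.Expressions;

public static class SubstitutionSketch
{
    // Replaces every occurrence of one parameter with another expression.
    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression _parameter;
        private readonly Expression _replacement;

        public ParameterReplacer(ParameterExpression parameter, Expression replacement)
        {
            _parameter = parameter;
            _replacement = replacement;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _parameter ? _replacement : base.VisitParameter(node);
        }
    }

    // Inlines 'earlier' into 'later': the result computes later(earlier(input))
    // as a single lambda over the original input.
    public static Expression<Func<TIn, TOut>> Inline<TIn, TMid, TOut>(
        Expression<Func<TIn, TMid>> earlier,
        Expression<Func<TMid, TOut>> later)
    {
        Expression body =
            new ParameterReplacer(later.Parameters[0], earlier.Body).Visit(later.Body);
        return Expression.Lambda<Func<TIn, TOut>>(body, earlier.Parameters);
    }

    public static void Main()
    {
        // Roughly the "let x = (i + 1).ToString()" part of the query above.
        Expression<Func<int, string>> letPart = i => (i + 1).ToString();
        // A later clause that only knows about x.
        Expression<Func<string, int>> laterPart = x => x.Length;

        Expression<Func<int, int>> combined = Inline(letPart, laterPart);
        Console.WriteLine(combined);
        Console.WriteLine(combined.Compile()(9));
    }
}
```

After inlining, the later expression no longer mentions x; it is expressed purely in terms of i, which is the property that makes the resolved query easier to translate.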

Extensible parsers and simpler, interlinked expressions: making these two concepts work took most of our time during the last two weeks.

But we found a little time to implement even more:

  • A refactoring of IQueryExecutor, which is the interface you implement to supply a new query back-end to re-linq.
  • A better partial evaluator, which executes parts of the query in memory before they are put into the QueryModel.
  • Better support for result modifications, which are re-linq's interpretation of methods such as Take or Count that are attached to a query to modify its results.
  • Better support for let/into via expression substitution (you've already seen this in the example I gave above).
  • A better debugging experience by giving a more complete string representation of the QueryModel and by employing DebuggerDisplayAttributes.
  • Some minor cleanups.
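To give an idea of what the partial evaluator in the list above does, here is a minimal sketch of the technique. It is not re-linq's evaluator: `PartialEvaluationSketch` and its nested classes are hypothetical names, and the rule used (evaluate any subtree that mentions no lambda parameter) is a simplifying assumption that ignores side effects and values that should be read at execution time.

```csharp
using System;
using System.Linq.Expressions;

public static class PartialEvaluationSketch
{
    // Detects whether a subtree mentions any lambda parameter.
    private sealed class ParameterFinder : ExpressionVisitor
    {
        public bool Found { get; private set; }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            Found = true;
            return node;
        }
    }

    private sealed class Evaluator : ExpressionVisitor
    {
        public override Expression Visit(Expression node)
        {
            // Leave structural nodes and already-constant nodes to the base visitor.
            if (node == null
                || node.NodeType == ExpressionType.Constant
                || node.NodeType == ExpressionType.Parameter
                || node.NodeType == ExpressionType.Lambda
                || node.NodeType == ExpressionType.Quote)
                return base.Visit(node);

            var finder = new ParameterFinder();
            finder.Visit(node);
            if (finder.Found)
                return base.Visit(node); // depends on query data: keep it, simplify children

            // Parameter-free subtree: run it in memory and embed the result.
            object value = Expression.Lambda(node).Compile().DynamicInvoke();
            return Expression.Constant(value, node.Type);
        }
    }

    public static Expression<T> Evaluate<T>(Expression<T> expression)
    {
        return (Expression<T>)new Evaluator().Visit(expression);
    }

    public static void Main()
    {
        int limit = 5;
        // "limit + 1" does not depend on i, so it can be computed up front.
        Expression<Func<int, bool>> predicate = i => i > limit + 1;
        Console.WriteLine(Evaluate(predicate));
    }
}
```

A back-end then only sees a constant where the captured variable and the addition used to be, so it does not have to understand closures or arbitrary .NET code.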

This week and the next, we'll deal with several refactorings that should make QueryModel and its clauses easier to inspect, manipulate, and transform.

All in all, I'd say we're really making progress.
