TopDown EnforceSorting implementation #5290

mingmwang · 2023-02-15T16:27:05Z

Which issue does this PR close?

Closes #5289.

Rationale for this change

Reimplement the EnforceSorting rule with a top-down approach that adds Sorts where they are required and removes them where they are unnecessary.
The new implementation does not need to keep lineage information or record the positions of SortExec nodes in a separate tree structure. Sort removal and addition are driven by the properties and requirements.

What changes are included in this PR?

Sort removal. A Sort is removed in two cases:

The Sort is redundant: the input of the Sort already has a finer ordering than this Sort enforces.
The Sort does not impact the final result ordering: some operators, like RepartitionExec, cannot maintain the input ordering, so a Sort among their descendants can be removed.
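
As a rough illustration of the first removal case, the redundancy check can be thought of as a prefix test on sort keys. This is only a minimal sketch: `SortKey`, `ordering_satisfies` and `sort_is_redundant` are hypothetical names made up for this example, not DataFusion APIs, and the real rule also has to account for things like equivalence properties.

```rust
/// Hypothetical, simplified stand-in for a sort key: a column name plus a direction.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
}

impl SortKey {
    /// Convenience constructor for an ascending key.
    pub fn asc(column: &str) -> Self {
        SortKey { column: column.to_string(), descending: false }
    }
}

/// Returns true when `existing` is at least as fine as `required`,
/// i.e. `required` is a prefix of `existing`.
pub fn ordering_satisfies(existing: &[SortKey], required: &[SortKey]) -> bool {
    existing.starts_with(required)
}

/// A Sort enforcing `sort_keys` is redundant when its input already
/// provides an ordering that satisfies those keys.
pub fn sort_is_redundant(input_ordering: Option<&[SortKey]>, sort_keys: &[SortKey]) -> bool {
    match input_ordering {
        Some(existing) => ordering_satisfies(existing, sort_keys),
        // An unordered input can never make the Sort redundant.
        None => false,
    }
}
```

Under these assumptions, an input ordered by (a, b, c) makes a Sort on (a, b) redundant, while an input ordered by (a, b) does not make a Sort on (a, b, c) redundant.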

Sort addition. A Sort is added to satisfy the ordering requirements.
Ordering requirements are pushed down and propagated from the top node to its children and descendants. The basic process is:

1. The parent requirements are already satisfied: do not add a Sort; generate new requirements if the current node itself has sort requirements on its input (required_input_ordering). For UnionExec, also generate new requirements based on UnionExec's output ordering properties, to keep the main input ordering semantics correct and trim the unnecessary sort columns.
2. The parent requirements are not satisfied and cannot be pushed down: add a Sort.
3. The parent requirements are not satisfied and the current node has no sort requirements of its own on its input: push down the parent requirements.
4. The parent requirements are not satisfied and the current node (like SortMergeJoinExec, WindowAggExec, SortPreservingMergeExec, etc.) has its own sort requirements on its input: check the compatibility of the parent requirements with its own sort requirements.
   a) If the required input ordering is more specific, do not push down the parent requirements; keep everything unchanged.
   b) If the parent requirements are more specific, push down the parent requirements.
   c) If they are not compatible, add a Sort and generate new requirements from the required input ordering.
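
The decision process above can be sketched as a small function. This is a simplified illustration, not the code in this PR: requirements are modelled as plain lists of column names, "more specific" is approximated by a prefix check, and `Action`, `decide`, `req` and the `can_push_down` flag are all hypothetical names. What gets passed down when a Sort is added because push-down is impossible is also an assumption of this sketch.

```rust
/// Hypothetical model: a requirement is an ordered list of column names.
pub type Requirement = Vec<String>;

/// Helper to build a requirement from string literals.
pub fn req(cols: &[&str]) -> Requirement {
    cols.iter().map(|c| c.to_string()).collect()
}

/// What to do at the current node, and which requirement to hand to its input.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Parent already satisfied: no Sort; continue with the node's own requirement, if any.
    Satisfied { pass_down: Option<Requirement> },
    /// Add a Sort above this node; continue with the node's own requirement, if any.
    AddSort { pass_down: Option<Requirement> },
    /// Push the parent requirement through this node.
    PushDown(Requirement),
    /// Keep the plan unchanged and continue with the node's own, finer requirement.
    KeepOwn(Requirement),
}

/// True when `shorter` is a prefix of `longer`.
fn is_prefix(shorter: &[String], longer: &[String]) -> bool {
    longer.starts_with(shorter)
}

pub fn decide(
    parent_req: &[String],
    output_ordering: &[String],
    own_req: Option<&[String]>,
    can_push_down: bool, // assumption: a per-node flag, e.g. false when ordering is not maintained
) -> Action {
    // First case: the node's output already satisfies the parent.
    if is_prefix(parent_req, output_ordering) {
        return Action::Satisfied { pass_down: own_req.map(|r| r.to_vec()) };
    }
    // Second case: not satisfied and the requirement cannot travel further down.
    if !can_push_down {
        return Action::AddSort { pass_down: own_req.map(|r| r.to_vec()) };
    }
    match own_req {
        // Third case: the node has no requirement of its own.
        None => Action::PushDown(parent_req.to_vec()),
        // Fourth case: compare the two requirements.
        Some(own) if is_prefix(parent_req, own) => Action::KeepOwn(own.to_vec()), // (a)
        Some(own) if is_prefix(own, parent_req) => Action::PushDown(parent_req.to_vec()), // (b)
        Some(own) => Action::AddSort { pass_down: Some(own.to_vec()) }, // (c)
    }
}
```

For example, with a parent requirement of (a, b, c) on an unordered node whose own requirement is (a, b), this sketch pushes (a, b, c) down; with an own requirement of (x, y, z) it adds a Sort and continues with (x, y, z).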

Global Sort optimization
This is achieved by the combination of GlobalSortSelection, EnforceDistribution and EnforceSorting.

Are these changes tested?

Are there any user-facing changes?

mingmwang · 2023-02-15T16:50:11Z

@mustafasrepo @ozankabak @yahoNanJing
Please help to take a look.

ozankabak · 2023-02-15T18:54:32Z

Thank you, we will digest this in the next several days and leave comments as we make progress.

ozankabak · 2023-02-16T00:44:17Z

The PR is very large in scope. It changes parts of the old code (and certainly makes some changes to its tests), and also adds new code (and new tests). It would be much easier to review this if it were broken down to two PRs, where the first one only replicates the current functionality, has no functionality regressions, and does not change any tests at all; with the second PR adding new functionality. Right now, the new rule is significantly longer than the old rule (which is bad), but it offers more functionality (which is great). So is switching from bottom-up to top-down a good change or a bad change? We can't tell easily.

Now, let me share my (very) preliminary impression so far after a cursory look: I see that it has better handling of sort preserving merges, smarter push-down of sorts under unions, and adds support for sort merge joins. These are the good bits. The cons are that it seems to lose partition awareness (though I'm not sure about this yet) and it seems to regress on some cases where it was doing better before. I think at least some of these are due to the presumption that there is a global output ordering to preserve, and I am not sure I agree with that.

Anyway, we will disentangle and review in detail, but I want to give you a heads up that this will take some time. We will need to analyze every case carefully, go back to the old version of the code (and tests), compare and contrast etc. Before we form an idea on the merits of bottom-up vs. top-down, our goal will be to create two functionally equal implementations passing exactly the same test suite. Without that, it is not possible to objectively decide.

Whatever the result on bottom-up vs. top-down is, I think this exercise will end up making the rule better, so that's great 🚀 I will keep you posted as we make progress in the upcoming days.

mingmwang · 2023-02-16T04:00:59Z

Sure, please take your time and I will add more comments to the code to explain the rule process.
The new rule is significantly longer than the original one because it handles propagating sort requirements downward. But I think the sort removing/adding procedure is very clear and the property/requirement-driven framework is more powerful. It can effectively handle the case below and figure out an optimal plan without adding and then removing Sorts back and forth.

Required('a', 'b', 'c')
   Required('a', 'b')
      Required('a')

We can leverage the same framework to do more advanced optimizations, like reordering the multiple window execs in the plan tree (PostgreSQL has this optimization), and further reduce the number of Sorts. Generally, I think the top-down approach makes it easier and more straightforward to collect and propagate the necessary properties and find the globally optimal plan.

Required/Order by ('a', 'b', 'c')
   WindowExec1 Required('a', 'b', 'c')
      WindowExec2 Required('x', 'y', 'z')
          Order by ('x', 'y', 'z')

Some UT results are changed. Yes, I think the major point of contention is whether we should preserve the output ordering during the optimization process or whether we can trim the unnecessary sort columns.
As far as I know, SparkSQL preserves the output ordering (SparkSQL does not do very sophisticated sort optimizations); PostgreSQL sometimes preserves the output ordering and sometimes does not (I guess this is decided by the top/parent operators, depending on whether they are ordering-sensitive, but I'm not sure).
For DataFusion, my preference is this: since we always define the maintains_input_order() method for physical plan nodes, if it returns true we should preserve the output ordering and should not trim or reverse it; otherwise maintains_input_order() is meaningless and very confusing.

There are some other UT result changes. I am not sure whether they are due to a bug in the original rule or a regression introduced by the new rule; I need to double-check with you and look carefully, especially at test_window_agg_complex_plan.

mingmwang · 2023-02-16T04:09:48Z

Regarding the WindowExec/window expression reverse, do we support the reverse for all the built-in window functions?
For example, for ROW_NUMBER() OVER, I think we should not allow the reverse, but I could not find a place that checks or defines the list of allowed functions.

ozankabak · 2023-02-16T05:27:25Z

Not all built-in window functions are reversible. There is an indicator in the API called get_reverse_expr in the WindowExpr trait, which returns None if there is no equivalent reverse. For built-ins, this function calls reverse_expr, whose default value is None. Functions like LEAD and LAG override this to indicate reversibility, but ROW_NUMBER doesn't.
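
To make the mechanism concrete, here is a minimal, self-contained sketch of the pattern described above. It does not reproduce the actual DataFusion traits: `BuiltInWindowFn`, `RowNumber`, `Lead` and `Lag` are simplified, hypothetical stand-ins, and only the idea (a `reverse_expr` that defaults to `None` and is overridden by reversible functions) is taken from the description.

```rust
/// Hypothetical, simplified stand-in for a built-in window function.
pub trait BuiltInWindowFn {
    fn name(&self) -> String;

    /// Default: the function has no equivalent reverse.
    fn reverse_expr(&self) -> Option<Box<dyn BuiltInWindowFn>> {
        None
    }
}

/// ROW_NUMBER keeps the default, so it is reported as non-reversible.
pub struct RowNumber;

impl BuiltInWindowFn for RowNumber {
    fn name(&self) -> String {
        "ROW_NUMBER".to_string()
    }
}

pub struct Lead {
    pub offset: u64,
}

pub struct Lag {
    pub offset: u64,
}

impl BuiltInWindowFn for Lead {
    fn name(&self) -> String {
        format!("LEAD({})", self.offset)
    }

    /// LEAD over a reversed ordering behaves like LAG with the same offset.
    fn reverse_expr(&self) -> Option<Box<dyn BuiltInWindowFn>> {
        Some(Box::new(Lag { offset: self.offset }))
    }
}

impl BuiltInWindowFn for Lag {
    fn name(&self) -> String {
        format!("LAG({})", self.offset)
    }

    fn reverse_expr(&self) -> Option<Box<dyn BuiltInWindowFn>> {
        Some(Box::new(Lead { offset: self.offset }))
    }
}

/// Mirrors the role of get_reverse_expr: None means the window must not be reversed.
pub fn get_reverse_expr(f: &dyn BuiltInWindowFn) -> Option<Box<dyn BuiltInWindowFn>> {
    f.reverse_expr()
}
```

An optimizer rule following this pattern would only consider reversing a window when `get_reverse_expr` returns `Some`.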

mingmwang · 2023-02-16T10:49:15Z

Some future work:

Support reordering multiple window expressions.
required_input_ordering() should remove duplicate sort keys and remove columns that are equal within the same EquivalenceProperties.
Introduce FDs (functional dependencies) to further avoid unnecessary Sorts and Repartitions.

ozankabak · 2023-02-16T21:18:21Z

We started the work to get the two approaches to a comparable state, @mustafasrepo is actively working on it. We will post more updates as we make progress.

mustafasrepo · 2023-02-21T06:45:44Z

datafusion/physical-expr/src/utils.rs

+            if prop.sort_options.is_some() {
+                PhysicalSortExpr {
+                    expr: prop.expr.clone(),
+                    options: prop.sort_options.unwrap(),
+                }
+            } else {


I think we can use the if let Some(sort_options) = prop.sort_options idiom here. This would remove the .unwrap().
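
A self-contained sketch of the suggested shape follows. The types here are simplified, hypothetical stand-ins for the ones in the diff, and the else branch (which is not visible in the snippet above) is replaced by an assumed fallback purely so the example compiles.

```rust
/// Simplified stand-in for the sort options type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SortOptions {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Simplified stand-in: the expression is just a column name here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PhysicalSortExpr {
    pub expr: String,
    pub options: SortOptions,
}

/// Hypothetical shape of `prop` in the diff: an expression with optional sort options.
pub struct Prop {
    pub expr: String,
    pub sort_options: Option<SortOptions>,
}

pub fn to_sort_expr(prop: &Prop, fallback: SortOptions) -> PhysicalSortExpr {
    // `if let` binds the inner value directly, so no `.unwrap()` is needed.
    if let Some(sort_options) = prop.sort_options {
        PhysicalSortExpr {
            expr: prop.expr.clone(),
            options: sort_options,
        }
    } else {
        // Assumption: the real else branch is not shown in the diff;
        // a caller-supplied fallback is used here only for illustration.
        PhysicalSortExpr {
            expr: prop.expr.clone(),
            options: fallback,
        }
    }
}
```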

mustafasrepo · 2023-02-21T06:48:06Z