-
-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/analyzer: Add support for using indexes in joins where ON condition includes top level OR clauses. #471
Conversation
…itions using top-level ORs.
…ncountering duplicate rows.
@@ -700,6 +700,34 @@ var ScriptTests = []ScriptTest{ | |||
{3, 3}, | |||
}, | |||
}, | |||
{ | |||
Name: "Indexed Join On Keyless Table", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zachmu What's the right way to skip this test in dolt, since we don't support indexes on keyless tables? Or should I just leave this test out for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can skip the whole thing dolt-side with:
diff --git a/go/libraries/doltcore/sqle/enginetest/dolt_engine_test.go b/go/libraries/doltcore/sqle/enginetest/dolt_engine_test.go
index 58bca274e..5ee14a568 100644
--- a/go/libraries/doltcore/sqle/enginetest/dolt_engine_test.go
+++ b/go/libraries/doltcore/sqle/enginetest/dolt_engine_test.go
@@ -259,11 +259,14 @@ func TestTruncate(t *testing.T) {
}
func TestScripts(t *testing.T) {
+ skipped := []string{
+ "create index r_c0 on r (c0);",
+ }
t.Run("no transactions", func(t *testing.T) {
- enginetest.TestScripts(t, newDoltHarness(t))
+ enginetest.TestScripts(t, newDoltHarness(t).WithSkippedQueries(skipped))
})
t.Run("with transactions", func(t *testing.T) {
- enginetest.TestScripts(t, newDoltHarness(t).withTransactionsEnabled(true))
+ enginetest.TestScripts(t, newDoltHarness(t).withTransactionsEnabled(true).WithSkippedQueries(skipped))
})
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the right approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only substantial comment is that you should have an additional engine test for 3 predicates in the OR join condition
@@ -700,6 +700,34 @@ var ScriptTests = []ScriptTest{ | |||
{3, 3}, | |||
}, | |||
}, | |||
{ | |||
Name: "Indexed Join On Keyless Table", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the right approach
sql/analyzer/indexed_joins.go
Outdated
@@ -402,6 +422,10 @@ type joinIndex struct { | |||
table string | |||
// The index that can be used in this join, if any. nil otherwise | |||
index sql.Index | |||
// If this is set, the join condition is a top-level OR | |||
// expression which can potentially make use of two (or more) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit confusing... this field stores exactly two joinIndexes, which themselves may be OR conditions
enginetest/script_queries.go
Outdated
"insert into l values (0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 0, 2), (4, 2, 0), (5, 1, 2), (6, 2, 1), (7, 2, 2);", | ||
"insert into r values (1, 1), (2, 2), (2, 2);", | ||
}, | ||
Query: "select pk, l.c0, l.c1 from l join r on l.c0 = r.c0 or l.c1 = r.c1 order by 1, 2, 3;", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have a test of n=3 join columns here to ensure that recursive OR logic works correctly
@@ -66,7 +66,7 @@ func getTableAliases(n sql.Node, scope *Scope) (TableAliases, error) { | |||
|
|||
if at, ok := node.(*plan.TableAlias); ok { | |||
switch t := at.Child.(type) { | |||
case *plan.ResolvedTable, *plan.SubqueryAlias, *plan.ValueDerivedTable: | |||
case *plan.ResolvedTable, *plan.SubqueryAlias, *plan.ValueDerivedTable, *plan.TransformedNamedNode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that this isn't necessary in other places as well. Is that because most analyzer rules have already run before these things exist?
High time we derive a TableFactor interface for this kind of node, but that can be another day
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a combination of that and the fact that some relevant entry points are already using this interface. We do optimize indexes quite late though...
"github.com/dolthub/go-mysql-server/sql" | ||
) | ||
|
||
type TransformedNamedNode struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably could use a comment here, something like
TransformedNamedNode is a wrapper for arbitrary logic to represent a table factor assembled from other nodes
See e.g. Concat
This adds support for concatenating the results from two more index lookups against a subordinate table in order to find the rows matching a join condition structured as
indexed_col = ... OR different_indexed_col = ...
.