Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/analyzer: Add support for using indexes in joins where ON condition includes top level OR clauses. #471

Merged
merged 5 commits into from
Jun 16, 2021

Conversation

reltuk
Copy link
Contributor

@reltuk reltuk commented Jun 15, 2021

This adds support for concatenating the results from two more index lookups against a subordinate table in order to find the rows matching a join condition structured as indexed_col = ... OR different_indexed_col = ....

@reltuk reltuk requested a review from zachmu June 15, 2021 23:00
@@ -700,6 +700,34 @@ var ScriptTests = []ScriptTest{
{3, 3},
},
},
{
Name: "Indexed Join On Keyless Table",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zachmu What's the right way to skip this test in dolt, since we don't support indexes on keyless tables? Or should I just leave this test out for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can skip the whole thing dolt-side with:

diff --git a/go/libraries/doltcore/sqle/enginetest/dolt_engine_test.go b/go/libraries/doltcore/sqle/enginetest/dolt_engine_test.go
index 58bca274e..5ee14a568 100644
--- a/go/libraries/doltcore/sqle/enginetest/dolt_engine_test.go
+++ b/go/libraries/doltcore/sqle/enginetest/dolt_engine_test.go
@@ -259,11 +259,14 @@ func TestTruncate(t *testing.T) {
 }
 
 func TestScripts(t *testing.T) {
+       skipped := []string{
+               "create index r_c0 on r (c0);",
+       }
        t.Run("no transactions", func(t *testing.T) {
-               enginetest.TestScripts(t, newDoltHarness(t))
+               enginetest.TestScripts(t, newDoltHarness(t).WithSkippedQueries(skipped))
        })
        t.Run("with transactions", func(t *testing.T) {
-               enginetest.TestScripts(t, newDoltHarness(t).withTransactionsEnabled(true))
+               enginetest.TestScripts(t, newDoltHarness(t).withTransactionsEnabled(true).WithSkippedQueries(skipped))
        })
 }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup exactly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the right approach

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only substantial comment is that you should have an additional engine test for 3 predicates in the OR join condition

@@ -700,6 +700,34 @@ var ScriptTests = []ScriptTest{
{3, 3},
},
},
{
Name: "Indexed Join On Keyless Table",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the right approach

@@ -402,6 +422,10 @@ type joinIndex struct {
table string
// The index that can be used in this join, if any. nil otherwise
index sql.Index
// If this is set, the join condition is a top-level OR
// expression which can potentially make use of two (or more)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confusing... this field stores exactly two joinIndexes, which themselves may be OR conditions

"insert into l values (0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 0, 2), (4, 2, 0), (5, 1, 2), (6, 2, 1), (7, 2, 2);",
"insert into r values (1, 1), (2, 2), (2, 2);",
},
Query: "select pk, l.c0, l.c1 from l join r on l.c0 = r.c0 or l.c1 = r.c1 order by 1, 2, 3;",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a test of n=3 join columns here to ensure that recursive OR logic works correctly

@@ -66,7 +66,7 @@ func getTableAliases(n sql.Node, scope *Scope) (TableAliases, error) {

if at, ok := node.(*plan.TableAlias); ok {
switch t := at.Child.(type) {
case *plan.ResolvedTable, *plan.SubqueryAlias, *plan.ValueDerivedTable:
case *plan.ResolvedTable, *plan.SubqueryAlias, *plan.ValueDerivedTable, *plan.TransformedNamedNode:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that this isn't necessary in other places as well. Is that because most analyzer rules have already run before these things exist?

High time we derive a TableFactor interface for this kind of node, but that can be another day

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a combination of that and the fact that some relevant entry points are already using this interface. We do optimize indexes quite late though...

"github.com/dolthub/go-mysql-server/sql"
)

type TransformedNamedNode struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably could use a comment here, something like

TransformedNamedNode is a wrapper for arbitrary logic to represent a table factor assembled from other nodes

See e.g. Concat

@reltuk reltuk merged commit a25dd40 into master Jun 16, 2021
@Hydrocharged Hydrocharged deleted the aaron/support-join-on-or-clause-uses-indexes branch December 8, 2021 07:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants