[Gen4] Only remove extracted subquery from predicates on merge #11072

Closed
arthurschreiber wants to merge 1 commit into vitessio:main from arthurschreiber:arthur/keep-vindex-predicates-on-subquery-merge

Member

@arthurschreiber arthurschreiber commented Aug 23, 2022

Description

When merging a subquery into an outer route, all vindex predicates are currently reset and filled again from the outer route's SeenPredicates.

    func (r *Route) resetRoutingSelections(ctx *plancontext.PlanningContext) error {
        switch r.RouteOpCode {
        case engine.DBA, engine.Next, engine.Reference, engine.Unsharded:
            // these we keep as is
        default:
            r.RouteOpCode = engine.Scatter
        }
        r.Selected = nil
        for i, vp := range r.VindexPreds {
            r.VindexPreds[i] = &VindexPlusPredicates{ColVindex: vp.ColVindex, TableID: vp.TableID}
        }
        for _, predicate := range r.SeenPredicates {
            err := r.tryImprovingVindex(ctx, predicate)
            if err != nil {
                return err
            }
        }
        return nil
    }

For routes of ApplyJoin nodes, the SeenPredicates list is empty, so the routing information cannot be restored correctly.
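To make the failure mode concrete, here is a minimal sketch of the reset-and-replay pattern. The `route` type, the `opcode` constants and `tryImprove` are hypothetical, heavily simplified stand-ins and not the planner's real types; the only point is that a reset with nothing to replay stays at Scatter.

```go
package main

import (
	"fmt"
	"strings"
)

// opcode is a simplified stand-in for the engine's routing opcodes (assumption).
type opcode string

const (
	scatter     opcode = "Scatter"
	equalUnique opcode = "EqualUnique"
)

// route is a hypothetical, reduced model of the planner's Route.
type route struct {
	op             opcode
	seenPredicates []string
}

// tryImprove pretends that an equality on the column "id" can use a unique vindex.
func (r *route) tryImprove(predicate string) {
	if strings.HasPrefix(predicate, "id = ") {
		r.op = equalUnique
	}
}

// reset follows the shape of resetRoutingSelections: fall back to Scatter,
// then replay every remembered predicate.
func (r *route) reset() {
	r.op = scatter
	for _, p := range r.seenPredicates {
		r.tryImprove(p)
	}
}

func main() {
	single := &route{op: equalUnique, seenPredicates: []string{"id = 5"}}
	single.reset()
	fmt.Println(single.op) // the predicate is replayed, so routing is recovered

	joined := &route{op: equalUnique} // models a join route with no seen predicates
	joined.reset()
	fmt.Println(joined.op) // nothing to replay, so the route stays at Scatter
}
```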

Instead of resetting the vindex predicates, we can remove only the vindex predicate options that belong to the merged subquery and leave all other options untouched. This ensures that we can still pick any vindexes derived from the join operation, instead of falling back to a Scatter route.
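A rough sketch of that idea, assuming predicates can be compared as plain strings: `option`, `vindexPreds` and `removeSubqueryOptions` are illustrative names only, not the types or functions used in this pull request.

```go
package main

import "fmt"

// option and vindexPreds are hypothetical, simplified stand-ins for the
// planner's vindex option and VindexPlusPredicates types.
type option struct {
	predicates []string
}

type vindexPreds struct {
	options []option
}

// removeSubqueryOptions drops every option whose first predicate is the
// merged subquery and keeps all other options, filtering in place.
func removeSubqueryOptions(vp *vindexPreds, subquery string) {
	kept := vp.options[:0]
	for _, o := range vp.options {
		if len(o.predicates) > 0 && o.predicates[0] == subquery {
			continue // this option came from the merged subquery
		}
		kept = append(kept, o)
	}
	vp.options = kept
}

func main() {
	vp := &vindexPreds{options: []option{
		{predicates: []string{"u.id = e.user_id"}}, // derived from the join
		{predicates: []string{"u.id in ::__sq1"}},  // derived from the subquery
	}}
	removeSubqueryOptions(vp, "u.id in ::__sq1")
	fmt.Println(len(vp.options), vp.options[0].predicates[0])
}
```

The join-derived option survives, so a vindex can still be selected from it.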

Related Issue(s)

This fixes #10823.

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

When merging a subquery into a route, all vindex predicates were reset and filled again from the route's `SeenPredicates`. For routes of `ApplyJoin` nodes, the `SeenPredicates` list is empty and thus routing information could not be restored correctly.

Instead of resetting the vindex predicates, we can remove only the vindex predicate options that belong to the merged subquery and leave all other options untouched.

Signed-off-by: Arthur Schreiber <arthurschreiber@github.com>
Contributor

vitess-bot bot commented Aug 23, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive. Additionally, flag names should use dashes (-) as word separators rather than underscores (_).
  • If a workflow is added or modified, each item in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.


    for j, option := range vp.Options {
        if sqlparser.EqualsExpr(option.Predicates[0], extractedSubquery.Original) {
            idx = j
Member Author

I guess we could break here once we have found one matching option?

Member

I am unsure whether the extracted subquery can match more than one vindex option or not. @harshit-gangal, do you know?

Otherwise, yes, let's break.

Member Author

I double-checked this. Even if the same inner query exists multiple times in an outer query, each ExtractedSubquery object is unique. This is due to the unique name generated for each subquery's replacement bind variable.

I added a comment over at https://github.com/vitessio/vitess/pull/11104/files#diff-ecda8980a2e86d0f2a98ac67bb7b55d759fa7d1e8037dab2497f8fd167ada4c5R120-R121 that describes this.
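The uniqueness argument can be illustrated with a small sketch. The `extractedSubquery` struct, the `nameGen` counter and the `__sq%d` naming scheme below are assumptions made for illustration and are not taken from the planner's code; the point is only that two extractions of the same SQL text still compare as different, so stopping at the first match loses nothing.

```go
package main

import "fmt"

// extractedSubquery is a hypothetical stand-in: the SQL text plus the name of
// the bind variable that replaces the subquery in the outer query.
type extractedSubquery struct {
	sql     string
	argName string
}

// nameGen hands out a fresh bind variable name per extraction (assumed scheme).
type nameGen struct{ n int }

func (g *nameGen) extract(sql string) extractedSubquery {
	g.n++
	return extractedSubquery{sql: sql, argName: fmt.Sprintf("__sq%d", g.n)}
}

// indexOf returns the position of target, stopping at the first match.
func indexOf(options []extractedSubquery, target extractedSubquery) int {
	idx := -1
	for i, o := range options {
		if o == target {
			idx = i
			break // at most one entry can match, because argName is unique
		}
	}
	return idx
}

func main() {
	g := &nameGen{}
	first := g.extract("select id from t")
	second := g.extract("select id from t") // same text, different bind variable

	options := []extractedSubquery{first, second}
	fmt.Println(first == second, indexOf(options, second))
}
```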

    idx := -1

    for j, option := range vp.Options {
        if sqlparser.EqualsExpr(option.Predicates[0], extractedSubquery.Original) {
Member Author

This only looks at the first predicate in the list. I think that's correct for all single-column vindexes. Do we need to do anything special here for multi-column vindexes? I don't think there is any other support for subqueries that match against tuples, so multi-column vindexes (and thus vindex predicates with more than one value) are not relevant yet.

Member

I think the safest bet would be to check the length of option.Predicates, and, only if the length is equal to one, compare the two predicates.
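A minimal sketch of that guard, with predicates modelled as plain strings. `option` and `matchesSubquery` are hypothetical names; in the actual code the comparison would go through sqlparser.EqualsExpr instead of `==`.

```go
package main

import "fmt"

// option is a simplified stand-in for a vindex option and its predicates.
type option struct {
	predicates []string
}

// matchesSubquery compares only when the option has exactly one predicate,
// so multi-predicate options (and empty ones) are never treated as a match.
func matchesSubquery(o option, subquery string) bool {
	return len(o.predicates) == 1 && o.predicates[0] == subquery
}

func main() {
	sq := "id in ::__sq1"
	fmt.Println(matchesSubquery(option{predicates: []string{sq}}, sq))          // single predicate
	fmt.Println(matchesSubquery(option{predicates: []string{sq, "b = 2"}}, sq)) // two predicates
	fmt.Println(matchesSubquery(option{}, sq))                                 // no predicates, no panic
}
```

Checking the length first also avoids indexing into an empty slice, which would otherwise panic.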

    idx := -1
    for i, predicate := range route.SeenPredicates {
        if sqlparser.EqualsExpr(predicate, extractedSubquery) {
            idx = i
Member Author

Same here, we could break after we found one match.

return nil, err
outer.Selected = nil
switch outer.RouteOpCode {
case engine.DBA, engine.Next, engine.Reference, engine.Unsharded:
Member Author

I've taken this list from resetRoutingSelections, but I'm not sure we can actually ever reach this code with one of these routing op codes? 🤔

Member

I am unsure too. I assume we could reach it if, for instance, we had a DBA query with a subquery? cc @systay

@arthurschreiber
Member Author

@systay After having talked this through with @evaccaro and @pudiva, there might be a different way of fixing this:

diff --git a/go/vt/vtgate/planbuilder/physical/route_planning.go b/go/vt/vtgate/planbuilder/physical/route_planning.go
index 8c78dc4667..5ac19a3e94 100644
--- a/go/vt/vtgate/planbuilder/physical/route_planning.go
+++ b/go/vt/vtgate/planbuilder/physical/route_planning.go
@@ -563,6 +563,7 @@ func createRouteOperatorForJoin(aRoute, bRoute *Route, joinPredicates []sqlparse
                VindexPreds:         append(aRoute.VindexPreds, bRoute.VindexPreds...),
                SysTableTableSchema: append(aRoute.SysTableTableSchema, bRoute.SysTableTableSchema...),
                SysTableTableName:   sysTableName,
+               SeenPredicates:      append(aRoute.SeenPredicates, bRoute.SeenPredicates...),
                Source: &ApplyJoin{
                        LHS:       aRoute.Source,
                        RHS:       bRoute.Source,

By merging the SeenPredicates from both routes of the join into the merged join route, resetRoutingSelections would be able to re-create the required Vindex predicate options and pick the correct vindex option for routing.
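As a small illustration of the concatenation step, the sketch below merges two predicate lists into a fresh slice. `mergeSeen` is a hypothetical helper and predicates are modelled as strings; the diff above uses a plain `append(aRoute.SeenPredicates, bRoute.SeenPredicates...)`, and whether sharing a backing array matters in the planner is not something this thread establishes.

```go
package main

import "fmt"

// mergeSeen concatenates the seen predicates of both join sides into a new
// slice, so later writes to the result cannot touch either input.
func mergeSeen(a, b []string) []string {
	merged := make([]string, 0, len(a)+len(b))
	merged = append(merged, a...)
	return append(merged, b...)
}

func main() {
	lhs := []string{"u.id = 5"}
	rhs := []string{"e.user_id = u.id"}
	fmt.Println(mergeSeen(lhs, rhs))
}
```

With the merged list on the join route, a later reset has predicates to replay instead of an empty list.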

Which fix do you prefer? 🙇‍♂️

Member

@frouioui frouioui left a comment

Looks good to me. I will let one more person ack this and answer my questions.

@arthurschreiber
Member Author

arthurschreiber commented Aug 26, 2022

@frouioui Thank you for reviewing this! 🙇‍♂️

I opened #11104 which contains an alternative (and I think architecturally more sound) approach to fixing the same issue.

I'm going to close this pull request.

Development

Successfully merging this pull request may close these issues.

Bug Report: Gen4 planner generates an incorrect and potentially inefficient query when using a correlated subquery