vtgate: subquery support#3923
Conversation
|
@sougou Is this complete and just waiting on a review? |
|
Pretty much. I need to update to the latest master. |
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
This implementation follows the design from doc/VTGateSubqueries.md. A new PulloutSubquery primitive has been created. Its action is to execute a subquery, save the resulting value(s) in a bind variable, and pass that down to the underlying primitive. The type of value set depends on whether the subquery occurred in an IN clause, etc. In the case of IN and NOT IN clauses, if the list returned is empty, a guard variable is set to work around the inability for IN clauses to handle an empty list syntax. Subqueries can only be pulled out if they are not correlated. Correlated subqueries will be handled when we add the ability to execute expressions in VTGate. Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
demmer
left a comment
There was a problem hiding this comment.
Overall this seems good to me.
I made a few comments that I think would improve clarity for both this change and other elements of the core builder interface, but none are fundamental blockers.
| vars map[string]struct{} | ||
| refs map[*column]string | ||
| vars map[string]struct{} | ||
| varIndex int |
There was a problem hiding this comment.
It'd be great to have some comments here as to what these are used for.
Specifically it looks to me from reading the code that varIndex would be better named uniqueSuffix but even without the name it'd be good to have a comment.
| # Note the improved Underlying plan as SelectIN. | ||
| "select id from user where id in (select col from user)" | ||
| { | ||
| "Original": "select id from user where id in (select col from user)", |
There was a problem hiding this comment.
One thing that occurred to me in reading this -- can we push down a subquery to a single shard if specified in the query?
Specifically suppose two tables are both sharded by a company_id column and we ran a query like:
select * from user where company_id=123 and id in (select user_id from admins where company_id=123);
In this case the query can be pushed down to whichever shard has both the user and user_admin.
Does the current implementation handle this efficiently?
| } | ||
|
|
||
| # cross-shard subquery in NOT IN clause. | ||
| "select id from user where id not in (select col from user)" |
There was a problem hiding this comment.
I also think it would be incredibly useful for cases like this to include vtexplain test cases for each of these as well to be able to more easily see as a reviewer (and someone getting to know Vitess) which exact queries end up running for these various patterns.
| // If an expression has no references to the current query, then the left-most | ||
| // origin is chosen as the default. | ||
| func (pb *primitiveBuilder) findOrigin(expr sqlparser.Expr) (origin builder, pushExpr sqlparser.Expr, err error) { | ||
| func (pb *primitiveBuilder) findOrigin(expr sqlparser.Expr) (pullouts []*pulloutSubquery, origin builder, pushExpr sqlparser.Expr, err error) { |
There was a problem hiding this comment.
The block comment above should probably be updated to indicate that this doesn't always bail if subqueries that can't be merged with the route (now) and also that the expressions might be rewritten, some subqueries are pulled out, etc.
Moreover, I think that similar to the comment on the Primitive() that findOrigin is a misleading name for this function, since it does more than finding the origin. Maybe something like analyzeExpr or buildExpr?
| // Primitive satisfies the builder interface. | ||
| func (ps *pulloutSubquery) Primitive() engine.Primitive { | ||
| ps.eSubquery.Subquery = ps.subquery.Primitive() | ||
| ps.eSubquery.Underlying = ps.underlying.Primitive() |
There was a problem hiding this comment.
Not new to this PR but it's non-intuitive that Primitive() actually mutates the builder data structures and doesn't just return the primitive itself, especially with the interface documentation of:
// Primitve returns the underlying primitive.
Primitive() engine.Primitive
IMO this should be renamed to BuildPrimitive().
There was a problem hiding this comment.
(Also I just noticed the Primitve typo in the doc comment)
| } | ||
|
|
||
| func (pb *primitiveBuilder) join(rpb *primitiveBuilder, ajoin *sqlparser.JoinTableExpr) error { | ||
| if ajoin != nil && ajoin.Condition.Using != nil { |
There was a problem hiding this comment.
I assume this join case now works because of subqueries? That's not immediately obvious to me.
There was a problem hiding this comment.
This code was unreachable. The check was alread performed in the code path for newJoin in join.go. OCD cleanup...
| if leftIsRoute && rightIsRoute { | ||
| // If both are routes, they have an opportunity | ||
| // to merge into one. | ||
| if lRoute.ERoute.Opcode == engine.SelectNext || rRoute.ERoute.Opcode == engine.SelectNext { |
There was a problem hiding this comment.
Same here. See new comments added below.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
sougou
left a comment
There was a problem hiding this comment.
Deferred for separate PR:
- Primtive -> Primitive in other places
- Find better name for
findOrigin - jointab comments
Primitive->BuildPrimitive
Also, answering a comment I couldn't find in the PR: Subqueries that have the same route as the outer query will not be broken out. They will be pushed down as single unit. The new subquery code does not change this behavior.
| } | ||
|
|
||
| func (pb *primitiveBuilder) join(rpb *primitiveBuilder, ajoin *sqlparser.JoinTableExpr) error { | ||
| if ajoin != nil && ajoin.Condition.Using != nil { |
There was a problem hiding this comment.
This code was unreachable. The check was alread performed in the code path for newJoin in join.go. OCD cleanup...
| if leftIsRoute && rightIsRoute { | ||
| // If both are routes, they have an opportunity | ||
| // to merge into one. | ||
| if lRoute.ERoute.Opcode == engine.SelectNext || rRoute.ERoute.Opcode == engine.SelectNext { |
There was a problem hiding this comment.
Same here. See new comments added below.
| // If an expression has no references to the current query, then the left-most | ||
| // origin is chosen as the default. | ||
| func (pb *primitiveBuilder) findOrigin(expr sqlparser.Expr) (origin builder, pushExpr sqlparser.Expr, err error) { | ||
| func (pb *primitiveBuilder) findOrigin(expr sqlparser.Expr) (pullouts []*pulloutSubquery, origin builder, pushExpr sqlparser.Expr, err error) { |
| vars map[string]struct{} | ||
| refs map[*column]string | ||
| vars map[string]struct{} | ||
| varIndex int |
| // Primitive satisfies the builder interface. | ||
| func (ps *pulloutSubquery) Primitive() engine.Primitive { | ||
| ps.eSubquery.Subquery = ps.subquery.Primitive() | ||
| ps.eSubquery.Underlying = ps.underlying.Primitive() |
demmer
left a comment
There was a problem hiding this comment.
I found one more spelling typo and had another naming suggestion. but I'm fine to move ahead with or without those changes.
go/vt/vtgate/planbuilder/jointab.go
Outdated
| jt.varIndex++ | ||
| suffix := strconv.Itoa(jt.varIndex) | ||
| var1 := "__sq" + suffix | ||
| var2 := "__has_values" + suffix |
There was a problem hiding this comment.
Something which just occurred to me -- we should probably be more consistent in the naming scheme for these bind vars. Sorry for suggesting yet another renaming change, but I think it's worth it for clarity.
Specifically it seems like it'd be more consistent to change these to "__sq_valueX" and "__sq_has_valuesX" (or maybe invert the boolean sense of the latter and use "__sq_valueX" and "__sq_emptyX")?
There was a problem hiding this comment.
I went with __sq_has_valuesX. The 'is_empty' approach felt less intuitive for the EXISTS case.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
This implementation follows the design from doc/VTGateSubqueries.md.
A new PulloutSubquery primitive has been created. Its action is to
execute a subquery, save the resulting value(s) in a bind variable,
and pass that down to the underlying primitive.
The type of value set depends on whether the subquery occurred in
an IN clause, etc. In the case of IN and NOT IN clauses, if the
list returned is empty, a guard variable is set to work around
the inability for IN clauses to handle an empty list syntax.
Subqueries can only be pulled out if they are not correlated.
Correlated subqueries will be handled when we add the
ability to execute expressions in VTGate.