Skip to content

vtgate: subquery support#3923

Merged
sougou merged 5 commits intovitessio:masterfrom
sougou:sq2
Nov 18, 2018
Merged

vtgate: subquery support#3923
sougou merged 5 commits intovitessio:masterfrom
sougou:sq2

Conversation

@sougou
Copy link
Copy Markdown
Contributor

@sougou sougou commented May 12, 2018

This implementation follows the design from doc/VTGateSubqueries.md.

A new PulloutSubquery primitive has been created. Its action is to
execute a subquery, save the resulting value(s) in a bind variable,
and pass that down to the underlying primitive.

The type of value set depends on whether the subquery occurred in
an IN clause, etc. In the case of IN and NOT IN clauses, if the
list returned is empty, a guard variable is set to work around
the inability for IN clauses to handle an empty list syntax.

Subqueries can only be pulled out if they are not correlated.
Correlated subqueries will be handled when we add the
ability to execute expressions in VTGate.

@sougou sougou requested review from demmer and rafael May 12, 2018 04:32
@derekperkins
Copy link
Copy Markdown
Member

@sougou Is this complete and just waiting on a review?

@sougou
Copy link
Copy Markdown
Contributor Author

sougou commented Oct 15, 2018

Pretty much. I need to update to the latest master.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
This implementation follows the design from doc/VTGateSubqueries.md.

A new PulloutSubquery primitive has been created. Its action is to
execute a subquery, save the resulting value(s) in a bind variable,
and pass that down to the underlying primitive.

The type of value set depends on whether the subquery occurred in
an IN clause, etc. In the case of IN and NOT IN clauses, if the
list returned is empty, a guard variable is set to work around
the inability for IN clauses to handle an empty list syntax.

Subqueries can only be pulled out if they are not correlated.
Correlated subqueries will be handled when we add the
ability to execute expressions in VTGate.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Copy link
Copy Markdown
Member

@demmer demmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems good to me.

I made a few comments that I think would improve clarity for both this change and other elements of the core builder interface, but none are fundamental blockers.

vars map[string]struct{}
refs map[*column]string
vars map[string]struct{}
varIndex int
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great to have some comments here as to what these are used for.

Specifically it looks to me from reading the code that varIndex would be better named uniqueSuffix but even without the name it'd be good to have a comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR.

# Note the improved Underlying plan as SelectIN.
"select id from user where id in (select col from user)"
{
"Original": "select id from user where id in (select col from user)",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that occurred to me in reading this -- can we push down a subquery to a single shard if specified in the query?

Specifically suppose two tables are both sharded by a company_id column and we ran a query like:

select * from user where company_id=123 and id in (select user_id from admins where company_id=123);

In this case the query can be pushed down to whichever shard has both the user and user_admin.

Does the current implementation handle this efficiently?

}

# cross-shard subquery in NOT IN clause.
"select id from user where id not in (select col from user)"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it would be incredibly useful for cases like this to include vtexplain test cases for each of these as well to be able to more easily see as a reviewer (and someone getting to know Vitess) which exact queries end up running for these various patterns.

// If an expression has no references to the current query, then the left-most
// origin is chosen as the default.
func (pb *primitiveBuilder) findOrigin(expr sqlparser.Expr) (origin builder, pushExpr sqlparser.Expr, err error) {
func (pb *primitiveBuilder) findOrigin(expr sqlparser.Expr) (pullouts []*pulloutSubquery, origin builder, pushExpr sqlparser.Expr, err error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block comment above should probably be updated to indicate that this doesn't always bail if subqueries that can't be merged with the route (now) and also that the expressions might be rewritten, some subqueries are pulled out, etc.

Moreover, I think that similar to the comment on the Primitive() that findOrigin is a misleading name for this function, since it does more than finding the origin. Maybe something like analyzeExpr or buildExpr?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR.

// Primitive satisfies the builder interface.
func (ps *pulloutSubquery) Primitive() engine.Primitive {
ps.eSubquery.Subquery = ps.subquery.Primitive()
ps.eSubquery.Underlying = ps.underlying.Primitive()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not new to this PR but it's non-intuitive that Primitive() actually mutates the builder data structures and doesn't just return the primitive itself, especially with the interface documentation of:

	// Primitve returns the underlying primitive.
	Primitive() engine.Primitive

IMO this should be renamed to BuildPrimitive().

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also I just noticed the Primitve typo in the doc comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR.

}

func (pb *primitiveBuilder) join(rpb *primitiveBuilder, ajoin *sqlparser.JoinTableExpr) error {
if ajoin != nil && ajoin.Condition.Using != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this join case now works because of subqueries? That's not immediately obvious to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was unreachable. The check was alread performed in the code path for newJoin in join.go. OCD cleanup...

if leftIsRoute && rightIsRoute {
// If both are routes, they have an opportunity
// to merge into one.
if lRoute.ERoute.Opcode == engine.SelectNext || rRoute.ERoute.Opcode == engine.SelectNext {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. See new comments added below.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Copy link
Copy Markdown
Contributor Author

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred for separate PR:

  • Primtive -> Primitive in other places
  • Find better name for findOrigin
  • jointab comments
  • Primitive -> BuildPrimitive

Also, answering a comment I couldn't find in the PR: Subqueries that have the same route as the outer query will not be broken out. They will be pushed down as single unit. The new subquery code does not change this behavior.

}

func (pb *primitiveBuilder) join(rpb *primitiveBuilder, ajoin *sqlparser.JoinTableExpr) error {
if ajoin != nil && ajoin.Condition.Using != nil {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was unreachable. The check was alread performed in the code path for newJoin in join.go. OCD cleanup...

if leftIsRoute && rightIsRoute {
// If both are routes, they have an opportunity
// to merge into one.
if lRoute.ERoute.Opcode == engine.SelectNext || rRoute.ERoute.Opcode == engine.SelectNext {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. See new comments added below.

// If an expression has no references to the current query, then the left-most
// origin is chosen as the default.
func (pb *primitiveBuilder) findOrigin(expr sqlparser.Expr) (origin builder, pushExpr sqlparser.Expr, err error) {
func (pb *primitiveBuilder) findOrigin(expr sqlparser.Expr) (pullouts []*pulloutSubquery, origin builder, pushExpr sqlparser.Expr, err error) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR.

vars map[string]struct{}
refs map[*column]string
vars map[string]struct{}
varIndex int
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR.

// Primitive satisfies the builder interface.
func (ps *pulloutSubquery) Primitive() engine.Primitive {
ps.eSubquery.Subquery = ps.subquery.Primitive()
ps.eSubquery.Underlying = ps.underlying.Primitive()
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR.

Copy link
Copy Markdown
Member

@demmer demmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one more spelling typo and had another naming suggestion. but I'm fine to move ahead with or without those changes.

jt.varIndex++
suffix := strconv.Itoa(jt.varIndex)
var1 := "__sq" + suffix
var2 := "__has_values" + suffix
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something which just occurred to me -- we should probably be more consistent in the naming scheme for these bind vars. Sorry for suggesting yet another renaming change, but I think it's worth it for clarity.

Specifically it seems like it'd be more consistent to change these to "__sq_valueX" and "__sq_has_valuesX" (or maybe invert the boolean sense of the latter and use "__sq_valueX" and "__sq_emptyX")?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with __sq_has_valuesX. The 'is_empty' approach felt less intuitive for the EXISTS case.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants