Skip to content

select *: expand column list for tables with authoritative column lists#4336

Merged
demmer merged 5 commits intovitessio:masterfrom
planetscale:ss-vtgatev3
Nov 22, 2018
Merged

select *: expand column list for tables with authoritative column lists#4336
demmer merged 5 commits intovitessio:masterfrom
planetscale:ss-vtgatev3

Conversation

@sougou
Copy link
Copy Markdown
Contributor

@sougou sougou commented Nov 4, 2018

Constructs like select * from t order by col were previously rejected because VTGate was not able to predetermine the position where col would appear in the select list.

This PR allows you to specify a column list for tables as authoritative, which will allow vtgate to expand the * expression into the actual column list, thereby allowing it to identify the correct column position for post-processing constructs that refer to such columns.

Change list:

  • Add orderedColumns to table in symtab, and wrap it with an addColumn func.
  • Add is_authoritative to vschema, and transmit the value into isAuthoritative in table.
  • Make subquery & vindex func column lists authoritative to increase smartness of vtgate.

@sougou sougou requested a review from demmer November 4, 2018 01:45
@sougou
Copy link
Copy Markdown
Contributor Author

sougou commented Nov 4, 2018

@demmer @shlomi-noach
This PR is on top of the subquery change to avoid potential conflicts.

@shlomi-noach
Copy link
Copy Markdown
Contributor

cc @tomkrouper

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo Primtive -> Primitive

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are checking the number of rows, but the error talks about the number of columns. By design?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, result.Rows[0] gets us the first row, which is a slice(array) consisting of column values.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand most of this code, but my spidey sense is tingling here. Should this if not have a corresponding else?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I think I get it now.

@sougou
Copy link
Copy Markdown
Contributor Author

sougou commented Nov 15, 2018

You should review #3923 first :)

@sougou
Copy link
Copy Markdown
Contributor Author

sougou commented Nov 15, 2018

This PR is on top of that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo -- orderedColumns

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also FWIW I think calling this columnNames is sufficient. It's self-evidently ordered from the fact that it's an array.

Copy link
Copy Markdown
Member

@demmer demmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that is_authoritative is too broad of an attribute name for this feature.
After all the vschema is always authoritative for the sharding controls, etc.

How about we keep this more focused and call the attribute "authoritative_columns" or something of that sort.

Also typo "ot" above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a calculated risk. My prediction is that the next step will be to pull the schema info from vttablet, at which time the meaning of this flag will become more accurate.

If you think it will backfire, I'll change it.

Copy link
Copy Markdown
Member

@demmer demmer Nov 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this potential future is all the more reason to use a more targeted configuration now that takes into account future expansions. Specifically, once the schema is propagated from the vttablet, I'd expect that by default we wouldn't want to set this flag and that instead all information would be pulled from the tablets.

What if we replaced the is_authoritative boolean with a schema_source enum containing options:

  • DEFAULT -- legacy behavior, eventually to be enhanced with schema info pulled from vttablet
  • STATIC -- the new behavior being added in this PR
  • NONE -- same as legacy behavior for now, but will disable the feature where schemas are pulled from vttablet.

If we go down this road we have a decision to make around the weight_string sorting changes made in #3522. IMO a consistent plan would be to couple the enabling of those changes with the changes in this PR and make it so that by default the columns definition in the vschema is ignored, but that if the schema_source is static then we enable both the weight_string replacement and the expanded column list for select * changes in this PR.

That approach would, however, break compatibility for anyone who is dependent on the current weight_string implementation and uses select * (including Slack FWIW) since users would need to be sure to fill in all the column names into the vschema and enable the new feature.

The other approach would be to just control new existing feature with the flags and leave the weight_string sorting alone. This is better for backwards compatibility but would be less internally consistent.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, #3522 and this feature are fairly orthogonal. Column expansion happens early, at the time of select expression parsing, and weight_string decision is made later during order by processing. So, at that time, the columns will already be expanded out, which will cause weight_string to be added if needed.

I'll add a test case to demonstrate this behavior, which will also help me prove to myself that this works as intended.

As for is_authoritative: you seem to think it's not appropriate. I'm fine changing it to authoritative_columns.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I think the two are linked isn't anything related to the code that does the expansion, it's that weight_string ordering and select * expansion both depend on the column definitions being manually populated in the vschema.

So it seems to me that we should take both features into account when we start talking about ways to control whether or not vtgate should respect the column definitions in the vschema to inform the query planner or whether it should operate as best it can without knowing the schema.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok to go ahead with authoritative_column_list instead of the enum approach outlined above. We may need to revisit in the future assuming we pull the schema list from vttablet.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is truly unreachable why not return an appropriate error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the code is reachable for joining an information_schema table with another. I've updated the comment. Nevertheless, the case itself is not an error condition. The behavior is as intended. I somehow assumed the higher level checks made this code path unreachable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, isn't this supposed to be friendlier on the GC:

tables := make([]*table, 0, len(st.orderedTables);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also had to add a guard for length==0, which is why I went with the simpler code at the cost of GC overhead. Without the guard, the nil check fails.

To expand '*' we need a list of ordered columns.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
If this flag is set for a table, we'll treat the column list
as authoritative and expand select *.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Add logic to use new symtab metadata about authoritative
tables to expand 'select *' expressions into the full
column list.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Copy link
Copy Markdown
Contributor Author

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also renamed orderedTables->tableNames

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the code is reachable for joining an information_schema table with another. I've updated the comment. Nevertheless, the case itself is not an error condition. The behavior is as intended. I somehow assumed the higher level checks made this code path unreachable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also had to add a guard for length==0, which is why I went with the simpler code at the cost of GC overhead. Without the guard, the nil check fails.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a calculated risk. My prediction is that the next step will be to pull the schema info from vttablet, at which time the meaning of this flag will become more accurate.

If you think it will backfire, I'll change it.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
I also had to fixup vtexplain to make it load vschemas with enums
properly.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
@demmer demmer merged commit 32dd398 into vitessio:master Nov 22, 2018
@sougou sougou deleted the ss-vtgatev3 branch November 27, 2018 22:16
@acharis
Copy link
Copy Markdown
Contributor

acharis commented Mar 14, 2019

@sougou can you write a little guide here on how to use this feature? we just saw one of these and i was about to file a bug when github (very helpfully!) suggested 3788 and then i saw this PR.
also, if using this means adding some list of columns to the vschema for every table...can i push on some more user friendly way (like the above mentioned idea of pulling the schema from the tablet?)

@dweitzman
Copy link
Copy Markdown
Collaborator

#4711 adds vschema DDL commands to update these columns without uploading an entirely new vschema

Some opinions:

  • It seems like a better experience to hack the authoritative column code so it doesn't require that all selected columns be in this list. For example, maybe you want to debug routing by doing select @@hostname from dual, which by default the authoritative column code would disallow. All we want is to help people doing select *, so we modified the code to remove the strictness.
  • We don't use the vschema DDLs mentioned above. We use an idempotent tool that knows how to build and upload the correct vschema based on the mysql schema. After running a DDL we can just run our fix_vschema tool and we're good to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants