Skip to content

v3: support for aggregation and sorting of TEXT#3522

Merged
sougou merged 1 commit intovitessio:masterfrom
sougou:v3
Jan 13, 2018
Merged

v3: support for aggregation and sorting of TEXT#3522
sougou merged 1 commit intovitessio:masterfrom
sougou:v3

Conversation

@sougou
Copy link
Copy Markdown
Contributor

@sougou sougou commented Jan 3, 2018

This change adds support for aggregating and sorting of columns
that may need to follow collation rules.

The feature depends on mysql's weight_string function
that returns a lexically comparable value of any text column.
If a group by or order by uses a text column like varchar, then
v3 will additionally request the weight_string versions of such
columns and perform ordering and aggregation using those values
instead.

The identification of a text column is currently based on a new
"columns" field in the vschema that allows you to specify the
type of each column. This part was implemented in a previous PR.

Two approaches were possible:

  1. Request additional weight_string values at the time of push-down.
  2. Rewire the primitives to request and use the additional weight_string
    values during the Wireup phase.
    I went with option 2 because it minimizes overall impact of the
    code, which will allow us to yank this behavior out when we
    implement collation aware sorting in vtgate. Also, since all
    the weight_string columns get added at the end, we only have
    to truncate the rows before returning what's needed.

The following changes were made:

  • sqltypes: result truncate functionality.
  • engine: Add TruncateColumnCount field that can be used to truncate
    a result if needed.
  • planbuilder: Change Wireup to check and request weight_strings.
    If weight_string was requested, set the row truncation to make
    sure that the weight_string values don't get passed on.

@sougou sougou requested review from demmer and rafael January 3, 2018 06:52
@dumbunny
Copy link
Copy Markdown
Contributor

dumbunny commented Jan 5, 2018

LGTM

Copy link
Copy Markdown
Member

@demmer demmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments but overall this LGTM.

Overall I would really love if we could find an alternative to weight_string but as discussed on Slack that doesn't look to be the case given the golang utf-8 limitations.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rephrase this as "truncated to the specified number of columns".
I was initially confused thinking that this would truncate the individual fields.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems cleaner and I suspect more efficient on the gc to only create weightStrings when needed rather than always create the empty array.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is a map. There's no autocreation support for it like slices.
Deferring the creation will introduce conditionals in multiple places (nil-check->make).

Besides, this is planbuilder time, which should be low QPS ideally.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume removing these is intentional (which I support) but that seems orthogonal to these changes.

Copy link
Copy Markdown
Contributor Author

@sougou sougou Jan 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. There was too much spam in tests.

This change adds support for aggregating and sorting of columns
that may need to follow collation rules.

The feature depends on mysql's weight_string function
that returns a lexically comparable value of any text column.
If a group by or order by uses a text column like varchar, then
v3 will additionally request the weight_string versions of such
columns and perform ordering and aggregation using those values
instead.

The identification of a text column is currently based on a new
"columns" field in the vschema that allows you to specify the
type of each column. This part was implemented in a previous PR.

Two approaches were possible:
1. Request additional weight_string values at the time of push-down.
2. Rewire the primitives to request and use the additional weight_string
   values during the Wireup phase.
I went with option 2 because it minimizes overall impact of the
code, which will allow us to yank this behavior out when we
implement collation aware sorting in vtgate. Also, since all
the weight_string columns get added at the end, we only have
to truncate the rows before returning what's needed.

The following changes were made:
* sqltypes: result truncate functionality.
* engine: Add TruncateColumnCount field that can be used to truncate
  a result if needed.
* planbuilder: Change Wireup to check and request weight_strings.
  If weight_string was requested, set the row truncation to make
  sure that the weight_string values don't get passed on.
@indera-shsp
Copy link
Copy Markdown
Contributor

indera-shsp commented Sep 9, 2020

@sougou Is this a known bug?

select id, companyName As test from companyProfile where id = 999123 order by companyName;
+-------+-------------+
| id    | test        |
+-------+-------------+
|999123 | TheCompany  |
+-------+-------------+
1 row in set (0.09 sec)


select id, companyName As test from companyProfile where id in(999123) order by companyName;


ERROR 1064 (42000): vtgate: http://vtgate-...: target: app.-80.master, used tablet: staging-978852100 (app-x-80x-
replica-0.vttablet.default.svc.cluster.local):
 vttablet: 
rpc error: code = InvalidArgument desc = You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'test) from companyProfile where 1 != 1' at line 1 (errno 1064) (sqlstate 42000) (CallerID: ...): Sql: "selec

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants