Skip to content

[native] Add ANALYZE STATS support#20055

Merged
karteekmurthys merged 1 commit intoprestodb:masterfrom
karteekmurthys:map-internal-functions
Jul 20, 2023
Merged

[native] Add ANALYZE STATS support#20055
karteekmurthys merged 1 commit intoprestodb:masterfrom
karteekmurthys:map-internal-functions

Conversation

@karteekmurthys
Copy link
Contributor

@karteekmurthys karteekmurthys commented Jul 5, 2023

Analyze stats requires functions max_data_size_for_stats and sum_data_size_for_stats that were implemented in Velox. But these are internal functions in Presto mapped to presto.default.$internal$ namesapce.

Translate the name mappings correctly during fragment translation to Velox to support "ANALYZE STATS"

Resolves: facebookincubator/velox#5447 and facebookincubator/velox#5484

@karteekmurthys karteekmurthys requested a review from a team as a code owner July 5, 2023 22:56
@karteekmurthys karteekmurthys marked this pull request as draft July 5, 2023 22:56
@karteekmurthys karteekmurthys self-assigned this Jul 5, 2023
@aditi-pandit
Copy link
Contributor

@karteekmurthys : Do you have an e2e test ?

@karteekmurthys
Copy link
Contributor Author

@karteekmurthys : Do you have an e2e test ?

I will merge analyze tests with this PR. We cannot directly call internal functions as a user.

@aditi-pandit
Copy link
Contributor

@karteekmurthys : Do you have an e2e test ?

I will merge analyze tests with this PR. We cannot directly call internal functions as a user.

Yes, please add the analyze tests.

@karteekmurthys karteekmurthys force-pushed the map-internal-functions branch 2 times, most recently from d6df1fd to d2ad340 Compare July 6, 2023 22:16
@karteekmurthys karteekmurthys marked this pull request as ready for review July 7, 2023 04:27
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit : Please add a newline between the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit : Please add a newline here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment can be removed I think. Only please add a comment that the query results in a table giving the (column_name, data_size, distinct_values_count, nulls_fraction, row_count, low_value, high_value).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not needed.

@aditi-pandit aditi-pandit requested a review from a team July 11, 2023 21:23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this ignored? Would you add a comment and perhaps refer to a GitHub issue that explains the problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit @karteekmurthys Would take a look at this question? I'm still wondering about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened an issue that i faced during testing. Please refer L73.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karteekmurthys Please, make sure all comments represent a standalone change and have [native] or [native pos] prefix. Make sure there are not commits like "address review comments". Please, start commit titles with active verbs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kAggregateFunctionsMap -> kFunctionNames for readability; 'Aggregate' is already part of the encoding function name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/* Expected number of rows updated */

Please, remove this comment. Adding such comments to each assertUpdate call will make code unreadable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? How do you verify the results of the SHOW STATS command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMIW, assertQuery compares the results of presto and prestissimo. Do I need to explicit check for column values?
https://prestodb.io/docs/current/sql/show-stats.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that you run ANALYZE on Prestissimo and therefore we don't know if it works or not. If it doesn't work, then SHOW STATS will show nothing for both Prestissimo and Presto and the test will pass. That's not what we want, is it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if ANALYZE works but computes stats incorrectly, the tests won't detect that, will it?

Copy link
Contributor

@aditi-pandit aditi-pandit Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova : We had been thinking about what would be a more complete test here to cover the situations you outlined.

i) One option is to compare the results of SHOW STATS with values expected in a static list. This list is populated with what we consider correct results.
ii) The Java side also does Statement object level tests. We don't have that framework in prestissimo testing.

Right now the only guarantee is that the sequence of operations on either side are consistent.

But i) would make it more complete.

Wdyt ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i) One option is to compare the results of SHOW STATS with values expected in a static list. This list is populated with what we consider correct results.

Sounds reasonable to me. Thanks.

@mbasmanova mbasmanova changed the title Map presto internal function names to velox function names [native] Map internal aggregate function names Jul 11, 2023
@karteekmurthys karteekmurthys force-pushed the map-internal-functions branch 3 times, most recently from 7aab88c to 4d17b1f Compare July 13, 2023 18:38
@aditi-pandit aditi-pandit requested a review from mbasmanova July 13, 2023 23:14
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karteekmurthys Please, review the list of commits and (1) decide if you need 4 commits or 1 commit would be sufficient; (2) Add [native] or [native pos] prefix to all commit titles; (3) check the changes in each commit to make sure they match the commit message.

@karteekmurthys karteekmurthys force-pushed the map-internal-functions branch from 4d17b1f to 24879ed Compare July 14, 2023 17:50
@karteekmurthys
Copy link
Contributor Author

karteekmurthys commented Jul 14, 2023

@karteekmurthys Please, review the list of commits and (1) decide if you need 4 commits or 1 commit would be sufficient; (2) Add [native] or [native pos] prefix to all commit titles; (3) check the changes in each commit to make sure they match the commit message.

I have squashed it down to single commit now. Please review.

@karteekmurthys
Copy link
Contributor Author

@mbasmanova would you please review this.

@aditi-pandit aditi-pandit requested a review from mbasmanova July 18, 2023 20:23
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karteekmurthys Looks good to me, but CI is red.

Curious, why this doesn't work for PoS? CC: @vermapratyush

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karteekmurthys It would be nice to update commit message to include more details about this change.

@karteekmurthys karteekmurthys changed the title [native] Map internal aggregate function names [native] Map internal aggregate function names and add analyze stats e2e tests Jul 19, 2023
@aditi-pandit aditi-pandit changed the title [native] Map internal aggregate function names and add analyze stats e2e tests [native] Add analyze stats support Jul 19, 2023
@aditi-pandit aditi-pandit changed the title [native] Add analyze stats support [native] Add ANALYZE STATS support Jul 19, 2023
@karteekmurthys karteekmurthys merged commit e1b6f76 into prestodb:master Jul 20, 2023
@wanglinsong wanglinsong mentioned this pull request Jul 27, 2023
28 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Should we make function max_data_size_for_stats and sum_data_size_for_stats private and internal?

3 participants