Skip to content

Prepare broken column statistics table using Hive metastore database directly#13949

Merged
findepi merged 1 commit intotrinodb:masterfrom
ebyhr:ebi/hive-flaky-analyze-test
Sep 14, 2022
Merged

Prepare broken column statistics table using Hive metastore database directly#13949
findepi merged 1 commit intotrinodb:masterfrom
ebyhr:ebi/hive-flaky-analyze-test

Conversation

@ebyhr
Copy link
Copy Markdown
Member

@ebyhr ebyhr commented Sep 1, 2022

Description

Fixes #13889

Documentation

(x) No documentation is needed.

Release notes

(x) No release notes entries required.

@ebyhr ebyhr added the no-release-notes This pull request does not require release notes entry label Sep 1, 2022
@cla-bot cla-bot bot added the cla-signed label Sep 1, 2022
@ebyhr ebyhr force-pushed the ebi/hive-flaky-analyze-test branch from 1fe7f11 to 9540a66 Compare September 1, 2022 04:20
@ebyhr ebyhr marked this pull request as ready for review September 1, 2022 04:21
@findinpath findinpath requested a review from findepi September 9, 2022 07:31
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prepareBrokenColumnStatisticsTable may stop functioning and we will never find out.
Perhaps retry the method instead?

Also, why ANALYZE is expected to fail? eventually it should work. As a user, if a have some bad stats, I'd expect ANALYZE to fix them.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps retry the method instead?

The test already has invocationCount = 3. I would like to leave to the option than retrying.

Also, why ANALYZE is expected to fail? eventually it should work. As a user, if a have some bad stats, I'd expect ANALYZE to fix them.

I think we agreed to fix only read path in case of broken statistics. Sorry if I misunderstood the conclusion. That's why there's assertThatThrownBy(() -> query("ANALYZE " + tableName)) at L72. As far as I tried, there's no option in Thrift Hive metastore client for updating or dropping column statistics when it has duplicated entires.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we agreed to fix only read path in case of broken statistics

that's fine

but we may want to fix the ANALYZE too, so i wouldn't want to use ANALYZE in the test as a way to check that stats are broken

The test already has invocationCount = 3.

this means that it's attempted 3 times and all 3 times need to pass
it doesn't handle yet the case where setup didn't succeed in producing erroneous situation

why not retry prepareBrokenColumnStatisticsTable until stats are broken

also, can we inject broken stats into HMS database directly?
that could free us from any concurrency in the test

Copy link
Copy Markdown
Member Author

@ebyhr ebyhr Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can break MySQL database directly. I will change to the approach.
The reason I avoided was future HMS may not have duplicated stats issue and direct database modification may hide such changes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the previous approach is generally good. However, if it's not reliable, we may need to change the approach. I prefer to change the approach rather than to disable the test.

@ebyhr ebyhr force-pushed the ebi/hive-flaky-analyze-test branch from 9540a66 to 5a4daf8 Compare September 14, 2022 01:02
@ebyhr ebyhr force-pushed the ebi/hive-flaky-analyze-test branch from 5a4daf8 to 342a10c Compare September 14, 2022 01:11
@findepi findepi changed the title Skip Hive analyze test if failed to prepare broken column statistics Prepare broken column statistics table using Hive metastore database directly Sep 14, 2022
@findepi findepi merged commit 2f55bc7 into trinodb:master Sep 14, 2022
@github-actions github-actions bot added this to the 396 milestone Sep 14, 2022
@ebyhr ebyhr deleted the ebi/hive-flaky-analyze-test branch September 14, 2022 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed no-release-notes This pull request does not require release notes entry

Development

Successfully merging this pull request may close these issues.

Flaky TestHiveAnalyzeCorruptStatistics.testAnalyzeCorruptColumnStatisticsOnEmptyTable

3 participants