-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
analyze: fix AnalyzeV2 with global index #54420
base: master
Are you sure you want to change the base?
Conversation
Skipping CI for Draft Pull Request. |
Hi @Defined2014. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #54420 +/- ##
=================================================
- Coverage 72.8095% 56.2539% -16.5556%
=================================================
Files 1533 1663 +130
Lines 435505 614257 +178752
=================================================
+ Hits 317089 345544 +28455
- Misses 98790 245244 +146454
- Partials 19626 23469 +3843
Flags with carried forward coverage won't be shown. Click here to find out more.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! 🤟 🖤
I have some questions(not only for you, I am still learning, so any comments from anyone all help):
- What happens if the stats version is 1?
- What happens if we only analyze a few partitions of one table in one analyze statement? Do we analyze the global index every time? What happens if the partition's stats are outdated, but the global index stats are updated?
I would like to ask @winoros and @time-and-fate those experts to help review this PR. 👀
@@ -88,8 +88,8 @@ func analyzeIndexPushdown(idxExec *AnalyzeIndexExec) *statistics.AnalyzeResults | |||
Count: cnt, | |||
Snapshot: idxExec.snapshot, | |||
} | |||
if idxExec.idxInfo.MVIndex { | |||
result.ForMVIndex = true | |||
if idxExec.idxInfo.MVIndex || (idxExec.idxInfo.Global && statsVer == statistics.Version2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask what is the current behavior when the stats version is 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ForMVIndex
flag is used in update stats_meta
table in SaveTableStatsToStorage
function.
And for stats version V1, it should update stats_meta
table directly no matter it already have a record or not.
And for stats version V2, it will check do we have a newer stats or not, if the result is not newer than orignal one, we will ignore it. Because in general, stats v2 will only have one analyze result. But this is problematic for mv index and global index. And mv index can't work with stats version V1, so only the global index needs to be checked.
pkg/planner/core/planbuilder.go
Outdated
@@ -2181,6 +2181,10 @@ func getModifiedIndexesInfoForAnalyze( | |||
if originIdx.State != model.StatePublic { | |||
continue | |||
} | |||
// For global index, we also need to analyze it as independent index analyze task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the comments above this function.
pkg/planner/core/planbuilder.go
Outdated
@@ -2327,6 +2331,13 @@ func (b *PlanBuilder) buildAnalyzeFullSamplingTask( | |||
} | |||
} | |||
|
|||
// Generate global indexes idxTask. | |||
for _, indexInfo := range tbl.TableInfo.Indices { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we collect this information in the getModifiedIndexesInfoForAnalyze. Because in that function we:
- collect the normal indexes
- skip and collect the MVIndexes
- skip the global index
How about we also collect the global indexes in this function? Then we don't need to iterate the indexes again. And also helps us do the index check in one place.
@@ -108,7 +108,10 @@ type AnalyzeResults struct { | |||
// take care of those table-level fields. | |||
// In conclusion, when saving the analyze result for mv index, we need to store the index stats, as for the | |||
// table-level fields, we only need to update the version. | |||
ForMVIndex bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there are still some comments using ForMVIndex
.
|
@@ -2169,18 +2169,24 @@ func getColOffsetForAnalyze(colsInfo []*model.ColumnInfo, colID int64) int { | |||
// in the execution phase of ANALYZE, we need to modify index.Columns[i].Offset according to colInfos. | |||
// TODO: find a better way to find indexed columns in ANALYZE rather than use IndexColumn.Offset | |||
// For multi-valued index, we need to collect it separately here and analyze it as independent index analyze task. | |||
// For global index, we also need to analyze it as independent index analyze task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add more comments for it. Unlike other special handled indexes. The scope of collecting is also changed for the global index.
Other special indexes are still collected in the scope of partition. However, the global index is collected in the scope of the whole table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I don't see any place to handle the real special part of the global index.
// | ||
// The global index has only one key range, so an independent task is used to process it. | ||
// Global index needs to update only the version at the table-level fields, just like mv index. | ||
ForMVIndexOrGlobalIndex bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a enum or bitmask, not a bool.
What problem does this PR solve?
Issue Number: close #54233
Problem Summary: Supports processing global index in analyze version V2. Because the data of the global index is not scattered in every partitions, only one analyze is required for the whole table.
Because we use an independent index analyze task, there will be results of the same version when saving, which needs to be processed like mv index.
After the analysis is complete, the global index will only have global stats.
What changed and how does it work?
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.