Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

analyze: fix AnalyzeV2 with global index #54420

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Defined2014
Copy link
Contributor

@Defined2014 Defined2014 commented Jul 3, 2024

What problem does this PR solve?

Issue Number: close #54233

Problem Summary: Supports processing global index in analyze version V2. Because the data of the global index is not scattered in every partitions, only one analyze is required for the whole table.

Because we use an independent index analyze task, there will be results of the same version when saving, which needs to be processed like mv index.

After the analysis is complete, the global index will only have global stats.

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Copy link

ti-chi-bot bot commented Jul 3, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none labels Jul 3, 2024
@Defined2014 Defined2014 marked this pull request as ready for review July 3, 2024 10:02
@ti-chi-bot ti-chi-bot bot added sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jul 3, 2024
Copy link

tiprow bot commented Jul 3, 2024

Hi @Defined2014. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link

codecov bot commented Jul 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.2539%. Comparing base (95edc2d) to head (8bee4e5).
Report is 37 commits behind head on master.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #54420         +/-   ##
=================================================
- Coverage   72.8095%   56.2539%   -16.5556%     
=================================================
  Files          1533       1663        +130     
  Lines        435505     614257     +178752     
=================================================
+ Hits         317089     345544      +28455     
- Misses        98790     245244     +146454     
- Partials      19626      23469       +3843     
Flag Coverage Δ
integration 37.0583% <94.4444%> (?)
unit 71.8246% <100.0000%> (+0.0106%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9656% <ø> (ø)
parser ∅ <ø> (∅)
br 52.5948% <ø> (+6.4751%) ⬆️

@Defined2014 Defined2014 changed the title analyze: fix analyze with global index analyze: fix AnalyzeV2 with global index Jul 4, 2024
Copy link

ti-chi-bot bot commented Jul 4, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fixdb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@hi-rustin hi-rustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 🤟 🖤

I have some questions(not only for you, I am still learning, so any comments from anyone all help):

  1. What happens if the stats version is 1?
  2. What happens if we only analyze a few partitions of one table in one analyze statement? Do we analyze the global index every time? What happens if the partition's stats are outdated, but the global index stats are updated?

I would like to ask @winoros and @time-and-fate those experts to help review this PR. 👀

@@ -88,8 +88,8 @@ func analyzeIndexPushdown(idxExec *AnalyzeIndexExec) *statistics.AnalyzeResults
Count: cnt,
Snapshot: idxExec.snapshot,
}
if idxExec.idxInfo.MVIndex {
result.ForMVIndex = true
if idxExec.idxInfo.MVIndex || (idxExec.idxInfo.Global && statsVer == statistics.Version2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask what is the current behavior when the stats version is 1?

Copy link
Contributor Author

@Defined2014 Defined2014 Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ForMVIndex flag is used in update stats_meta table in SaveTableStatsToStorage function.

And for stats version V1, it should update stats_meta table directly no matter it already have a record or not.

And for stats version V2, it will check do we have a newer stats or not, if the result is not newer than orignal one, we will ignore it. Because in general, stats v2 will only have one analyze result. But this is problematic for mv index and global index. And mv index can't work with stats version V1, so only the global index needs to be checked.

@@ -2181,6 +2181,10 @@ func getModifiedIndexesInfoForAnalyze(
if originIdx.State != model.StatePublic {
continue
}
// For global index, we also need to analyze it as independent index analyze task.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the comments above this function.

@@ -2327,6 +2331,13 @@ func (b *PlanBuilder) buildAnalyzeFullSamplingTask(
}
}

// Generate global indexes idxTask.
for _, indexInfo := range tbl.TableInfo.Indices {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we collect this information in the getModifiedIndexesInfoForAnalyze. Because in that function we:

  1. collect the normal indexes
  2. skip and collect the MVIndexes
  3. skip the global index

How about we also collect the global indexes in this function? Then we don't need to iterate the indexes again. And also helps us do the index check in one place.

@@ -108,7 +108,10 @@ type AnalyzeResults struct {
// take care of those table-level fields.
// In conclusion, when saving the analyze result for mv index, we need to store the index stats, as for the
// table-level fields, we only need to update the version.
ForMVIndex bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there are still some comments using ForMVIndex.

@Defined2014
Copy link
Contributor Author

Defined2014 commented Jul 5, 2024

Thanks! 🤟 🖤

I have some questions(not only for you, I am still learning, so any comments from anyone all help):

  1. What happens if the stats version is 1?
  2. What happens if we only analyze a few partitions of one table in one analyze statement? Do we analyze the global index every time? What happens if the partition's stats are outdated, but the global index stats are updated?

I would like to ask @winoros and @time-and-fate those experts to help review this PR. 👀

  1. For stats version 1, it also calls generateIndexTasks funtion to generate global index tasks. Except for the key range, everything else is the same as the normal index, so there is not much to change.

  2. We will analyze the global index every time.

@@ -2169,18 +2169,24 @@ func getColOffsetForAnalyze(colsInfo []*model.ColumnInfo, colID int64) int {
// in the execution phase of ANALYZE, we need to modify index.Columns[i].Offset according to colInfos.
// TODO: find a better way to find indexed columns in ANALYZE rather than use IndexColumn.Offset
// For multi-valued index, we need to collect it separately here and analyze it as independent index analyze task.
// For global index, we also need to analyze it as independent index analyze task.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add more comments for it. Unlike other special handled indexes. The scope of collecting is also changed for the global index.
Other special indexes are still collected in the scope of partition. However, the global index is collected in the scope of the whole table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't see any place to handle the real special part of the global index.

//
// The global index has only one key range, so an independent task is used to process it.
// Global index needs to update only the version at the table-level fields, just like mv index.
ForMVIndexOrGlobalIndex bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a enum or bitmask, not a bool.

@winoros
Copy link
Member

winoros commented Jul 9, 2024

QQ_1720533123912
For example, the global scope is conflicted with for each partition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Statistics cannot be generated correctly for a global index that has an expression index
3 participants