Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: improve FLASHBACK DATABASE for many table case #54439

Merged
merged 11 commits into from
Jul 13, 2024

Conversation

lance6716
Copy link
Contributor

@lance6716 lance6716 commented Jul 4, 2024

What problem does this PR solve?

Issue Number: close #54415

moving reading dropped TableInfo from job submit node to DDL owner, because it's too long to persist all TableInfo

Problem Summary:

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below) 100k tables under a DB failde at last step which is writting "schema diff". Will fix it in separate PR.
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-triage-completed release-note-none size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 4, 2024
Copy link

tiprow bot commented Jul 4, 2024

Hi @lance6716. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lance6716
Copy link
Contributor Author

/check-issue-triage-complete

Copy link

codecov bot commented Jul 4, 2024

Codecov Report

Attention: Patch coverage is 83.63636% with 9 lines in your changes missing coverage. Please review.

Project coverage is 56.1689%. Comparing base (a490882) to head (a4c9f95).
Report is 10 commits behind head on master.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #54439         +/-   ##
=================================================
- Coverage   72.9150%   56.1689%   -16.7461%     
=================================================
  Files          1549       1670        +121     
  Lines        436294     611996     +175702     
=================================================
+ Hits         318124     343752      +25628     
- Misses        98671     244965     +146294     
- Partials      19499      23279       +3780     
Flag Coverage Δ
integration 37.3936% <0.0000%> (?)
unit 71.9232% <83.6363%> (-0.0239%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9656% <ø> (ø)
parser ∅ <ø> (∅)
br 52.5994% <ø> (+6.7354%) ⬆️

@lance6716
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jul 5, 2024

@lance6716: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: lance6716 <[email protected]>
@lance6716 lance6716 changed the title [WIP]ddl: FLASHBACK DATABASE read TableInfo on DDL owner ddl: improve FLASHBACK DATABASE for many table case and retry case Jul 5, 2024
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2024
pkg/ddl/ddl.go Outdated
// RecoverSchemaIDOnOwner is the new logic to avoid a large RecoverTabsInfo can't
// be persisted. If it's set, DDL owner will recover RecoverTabsInfo instead of
// job submit node.
RecoverSchemaIDOnOwner int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems this id duplicate with job.SchemaID and *model.DBInfo.ID in this struct, maybe add a LoadOnExecute bool?

tk.MustGetErrMsg("flashback database test_flashback", "[ddl:-1]DDL job rollback, error msg: mock error for clearTablePlacementAndBundles")
require.NoError(t, failpoint.Enable("github.com/pingcap/tidb/pkg/ddl/mockClearTablePlacementAndBundlesErr", `1*return()`))
tk.MustExec("flashback database test_flashback")
require.NoError(t, failpoint.Disable("github.com/pingcap/tidb/pkg/ddl/mockClearTablePlacementAndBundlesErr"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
require.NoError(t, failpoint.Disable("github.com/pingcap/tidb/pkg/ddl/mockClearTablePlacementAndBundlesErr"))
testfailpoint.Disable(t, "github.com/pingcap/tidb/pkg/ddl/mockClearTablePlacementAndBundlesErr"))

previous too

Comment on lines 1212 to 1217
if err == nil && job.State != model.JobStateRollingback && job.State != model.JobStateCancelled {
// must let `runDDLJob` return an error, otherwise `needUpdateRawArgs` will
// marshal empty Args into RawArgs.
err = errors.New("job can't be cancelled")
}
return ver, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convertJob2RollbackJob also checks job state, we can merge this part into it.

Signed-off-by: lance6716 <[email protected]>
Copy link

ti-chi-bot bot commented Jul 11, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-10 02:31:25.062526289 +0000 UTC m=+411182.297760402: ☑️ agreed by D3Hunter.
  • 2024-07-11 06:34:55.900312996 +0000 UTC m=+512193.135547112: ☑️ agreed by tangenta.

@lance6716
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jul 11, 2024

@lance6716: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
fast_test_tiprow 85c7faf link true /test fast_test_tiprow

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@lance6716
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jul 11, 2024

@lance6716: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lance6716
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jul 11, 2024

@lance6716: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lance6716
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jul 11, 2024

@lance6716: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lance6716
Copy link
Contributor Author

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 12, 2024
Signed-off-by: lance6716 <[email protected]>
@lance6716 lance6716 changed the title ddl: improve FLASHBACK DATABASE for many table case and retry case ddl: improve FLASHBACK DATABASE for many table case Jul 12, 2024
Comment on lines +571 to +574
err = clearTablePlacementAndBundles(w.ctx, recoverInfo.TableInfo)
if err != nil {
return ver, errors.Trace(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function sends a HTTP request. If there is a performance issue for onRecoverSchema, we can solve it by batching bundles from all involved tableInfo.

@lance6716
Copy link
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 12, 2024
Copy link

ti-chi-bot bot commented Jul 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, tangenta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lance6716
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jul 12, 2024

@lance6716: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lance6716
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jul 12, 2024

@lance6716: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lance6716
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jul 13, 2024

@lance6716: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot merged commit 8b78a4f into pingcap:master Jul 13, 2024
21 checks passed
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #54602.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm needs-cherry-pick-release-7.5 release-note-none size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't FLASHBACK DATABASE if there're too many tables under the database
4 participants