disttask: correct the usage of context #48343

Merged 3 commits on Nov 7, 2023

@tangenta tangenta commented Nov 7, 2023

What problem does this PR solve?

Issue Number: close #48303

Problem Summary:

After cancelling the job, I found that the ingest worker did not stop. Instead, it was blocked waiting for a response from the server side (TiKV):

11 @ 0x1c2d6ee 0x1c3e185 0x21cf23c 0x225699a 0x2256987 0x2255c1f 0x22548da 0x2255a73 0x3d9bb96 0x3d999f1 0x3d98066 0x3d8942b 0x3d88f47 0x3d8be0e 0x3ac2a96 0x1c625c1
#	0x21cf23b	google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader+0x7b			/go/pkg/mod/google.golang.org/[email protected]/internal/transport/transport.go:327
#	0x2256999	google.golang.org/grpc/internal/transport.(*Stream).RecvCompress+0xb9			/go/pkg/mod/google.golang.org/[email protected]/internal/transport/transport.go:342
#	0x2256986	google.golang.org/grpc.(*csAttempt).recvMsg+0xa6					/go/pkg/mod/google.golang.org/[email protected]/stream.go:1070
#	0x2255c1e	google.golang.org/grpc.(*clientStream).RecvMsg.func1+0x1e				/go/pkg/mod/google.golang.org/[email protected]/stream.go:927
#	0x22548d9	google.golang.org/grpc.(*clientStream).withRetry+0x139					/go/pkg/mod/google.golang.org/[email protected]/stream.go:776
#	0x2255a72	google.golang.org/grpc.(*clientStream).RecvMsg+0x112					/go/pkg/mod/google.golang.org/[email protected]/stream.go:926
#	0x3d9bb95	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).doWrite.func5+0x2b5	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/region_job.go:354
#	0x3d999f0	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).doWrite+0x1890	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/region_job.go:389
#	0x3d98065	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).writeToTiKV+0x25	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/region_job.go:189
#	0x3d8942a	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).executeJob+0xaa	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/local.go:1435
#	0x3d88f46	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).startWorker+0x1c6	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/local.go:1344
#	0x3d8be0d	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).doImport.func5+0x2d	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/local.go:1677
#	0x3ac2a95	golang.org/x/sync/errgroup.(*Group).Go.func1+0x55

Normally, the worker should quit when the related context is canceled, because waitOnHeader selects on s.ctx.Done():

func (s *Stream) waitOnHeader() {
	if s.headerChan == nil {
		// On the server headerChan is always nil since a stream originates
		// only after having received headers.
		return
	}
	select {
	case <-s.ctx.Done():
		// Close the stream to prevent headers/trailers from changing after
		// this function returns.
		s.ct.CloseStream(s, ContextErr(s.ctx.Err()))
		// headerChan could possibly not be closed yet if closeStream raced
		// with operateHeaders; wait until it is closed explicitly here.
		<-s.headerChan
	case <-s.headerChan:
	}
}

Finally, I found that the wrong context was passed to the local backend. It should be the "task context" rather than the "manager context", because the latter is canceled only when the TiDB process exits.
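
To illustrate the difference, here is a minimal sketch, not the code from this PR. It assumes the task context is derived from the manager context, and it uses hypothetical names (waitForHeader, workerStops) and an arbitrary 200ms timeout. The worker blocks on a header channel that is never closed, much like a stream waiting on a server that never answers, and the job is then canceled.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// timeout is an arbitrary bound used only by this demonstration.
const timeout = 200 * time.Millisecond

// waitForHeader (hypothetical) mimics the shape of a blocking receive:
// it returns when the header arrives or when the given context is canceled.
func waitForHeader(ctx context.Context, header <-chan struct{}) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-header:
		return nil
	}
}

// workerStops reports whether a blocked worker returns within the timeout
// after the task is canceled. useTaskCtx selects which context it receives.
func workerStops(useTaskCtx bool) bool {
	// Assumption: the manager context lives until the process exits.
	managerCtx, stopManager := context.WithCancel(context.Background())
	defer stopManager()
	// Assumption: each task gets a child context that can be canceled alone.
	taskCtx, cancelTask := context.WithCancel(managerCtx)
	defer cancelTask()

	workerCtx := managerCtx
	if useTaskCtx {
		workerCtx = taskCtx
	}

	header := make(chan struct{}) // never closed: the server never answers
	done := make(chan error, 1)
	go func() { done <- waitForHeader(workerCtx, header) }()

	cancelTask() // the user cancels the job

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	fmt.Println("worker with task context stops:", workerStops(true))
	fmt.Println("worker with manager context stops:", workerStops(false))
}
```

Canceling a child context does not cancel its parent, so a worker that was handed the manager context never observes the job cancellation.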

As for the hang itself, it needs a separate solution; see #48352.

What is changed and how it works?

  • For scheduler methods such as Init(), Run(), Pause(), etc., we pass the task context.
  • For interactions with the task table, we pass the manager context.
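
The split above can be sketched as follows. This is an illustration under stated assumptions, not the disttask code: the Scheduler and TaskTable interfaces, UpdateState, the state strings, and the in-memory table are all hypothetical. One plausible reason for the split, which the description does not state explicitly, is that the final state still has to be written after the task context has been canceled.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Scheduler is a hypothetical stand-in for the scheduler methods.
type Scheduler interface {
	Init(ctx context.Context) error
	Run(ctx context.Context) error
}

// TaskTable is a hypothetical stand-in for task table access.
type TaskTable interface {
	UpdateState(ctx context.Context, taskID int64, state string) error
}

// blockingScheduler runs until its context is canceled.
type blockingScheduler struct{}

func (blockingScheduler) Init(ctx context.Context) error { return nil }

func (blockingScheduler) Run(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// memTable is an in-memory table that refuses writes on a canceled context.
type memTable struct{ states map[int64]string }

func (t *memTable) UpdateState(ctx context.Context, taskID int64, state string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	t.states[taskID] = state
	return nil
}

// runTask passes the task context to the scheduler methods and the manager
// context to the task table, so the final state is recorded after a cancel.
func runTask(managerCtx, taskCtx context.Context, s Scheduler, tbl TaskTable, id int64) error {
	state := "succeeded"
	err := s.Init(taskCtx)
	if err == nil {
		err = s.Run(taskCtx)
	}
	if errors.Is(err, context.Canceled) {
		state = "canceled"
	} else if err != nil {
		state = "failed"
	}
	return tbl.UpdateState(managerCtx, id, state)
}

// cancelAndRecord cancels the task before running it and returns the state
// that ends up in the table.
func cancelAndRecord() (string, error) {
	managerCtx := context.Background()
	taskCtx, cancelTask := context.WithCancel(managerCtx)
	cancelTask()
	tbl := &memTable{states: map[int64]string{}}
	err := runTask(managerCtx, taskCtx, blockingScheduler{}, tbl, 1)
	return tbl.states[1], err
}

func main() {
	state, err := cancelAndRecord()
	fmt.Println(state, err)
}
```

Had runTask passed the task context to UpdateState instead, the write would have failed with context.Canceled and the table would not reflect the cancellation.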

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 7, 2023

tiprow bot commented Nov 7, 2023

Hi @tangenta. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 7, 2023

codecov bot commented Nov 7, 2023

Codecov Report

Merging #48343 (044dfab) into master (662528d) will increase coverage by 1.4455%.
Report is 4 commits behind head on master.
The diff coverage is 93.3333%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #48343        +/-   ##
================================================
+ Coverage   71.3789%   72.8244%   +1.4455%     
================================================
  Files          1402       1425        +23     
  Lines        406735     413129      +6394     
================================================
+ Hits         290323     300859     +10536     
+ Misses        96491      93354      -3137     
+ Partials      19921      18916      -1005     
Flag Coverage Δ
integration 43.4756% <0.0000%> (?)
unit 71.3974% <93.3333%> (+0.0185%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 53.9874% <ø> (ø)
parser ∅ <ø> (∅)
br 48.6843% <ø> (-4.2287%) ⬇️

ywqzzy commented Nov 7, 2023

/retest

tiprow bot commented Nov 7, 2023

@ywqzzy: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

tangenta commented Nov 7, 2023

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Nov 7, 2023

ywqzzy commented Nov 7, 2023

/retest

ywqzzy commented Nov 7, 2023

/retest

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Nov 7, 2023

ti-chi-bot bot commented Nov 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wjhuang2016, ywqzzy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 7, 2023

ti-chi-bot bot commented Nov 7, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-11-07 10:20:22.132579172 +0000 UTC m=+3553219.719689301: ☑️ agreed by ywqzzy.
  • 2023-11-07 10:27:35.753494134 +0000 UTC m=+3553653.340604278: ☑️ agreed by wjhuang2016.

@ti-chi-bot ti-chi-bot bot merged commit 4e49859 into pingcap:master Nov 7, 2023
14 of 16 checks passed
@ti-chi-bot

In response to a cherrypick label: new pull request created to branch release-7.5: #48369.

Labels
approved lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

add index status was always running and can not be cancelled due to dead lock
4 participants