Backup failed when cluster is scaling #346

Closed
pingyu opened this issue Jun 18, 2023 · 0 comments · Fixed by #347
Labels
type/bug Something isn't working

Comments

pingyu (Collaborator) commented Jun 18, 2023

Bug Report

1. Describe the bug

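Backup fails while the cluster is scaling: BR exits when a TSO request to PD fails with "[PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster". See the following logs: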

[2023/06/17 08:44:19.303 +08:00] [WARN] [tikv_br.go:184] ["Backup log"] [content="[2023/06/17 00:44:19.196 +00:00] [INFO] [info.go:37] [\"Welcome to Backup & Restore (BR)\"] [release-version=] [git-hash=] [git-branch=] [go-version=go1.18] [utc-build-time=\"2023-06-12 02:17:54\"] [race-enabled=false]\r\n[2023/06/17 00:44:19.197 +00:00] [INFO] [common.go:325] [arguments] [__command=\"tikv-br backup raw\"] [ca=] [cert=] [check-requirements=false] [checksum=false] [dst-api-version=V2] [end=] [format=raw] [key=] [pd=\"[http://v2tc-pd.endless-rawkv-br-tps-1808627-1-192:2379]\"] [ratelimit=64] [s3.endpoint=http://minio-peer.endless-rawkv-br-tps-1808627-1-192.svc:9000] [start=] [storage=s3://tmp/tikv-br-test/v2_br_scale/d366813c-491d-40e3-97ff-5cd29cd33b5d]\r\n[2023/06/17 00:44:19.197 +00:00] [INFO] [conn.go:244] [\"new mgr\"] [pdAddrs=v2tc-pd.endless-rawkv-br-tps-1808627-1-192:2379]\r\n[2023/06/17 00:44:19.200 +00:00] [INFO] [client.go:392] [\"[pd] create pd client with endpoints\"] [pd-address=\"[http://v2tc-pd.endless-rawkv-br-tps-1808627-1-192:2379]\"]\r\n[2023/06/17 00:44:19.204 +00:00] [INFO] [base_client.go:332] [\"[pd] update member urls\"] [old-urls=\"[http://v2tc-pd.endless-rawkv-br-tps-1808627-1-192:2379]\"] [new-urls=\"[http://v2tc-pd-0.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379,http://v2tc-pd-1.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379,http://v2tc-pd-2.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379,http://v2tc-pd-3.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379,http://v2tc-pd-4.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379]\"]\r\n[2023/06/17 00:44:19.204 +00:00] [INFO] [base_client.go:350] [\"[pd] switch leader\"] [new-leader=http://v2tc-pd-4.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379] [old-leader=]\r\n[2023/06/17 00:44:19.204 +00:00] [INFO] [base_client.go:105] [\"[pd] init cluster id\"] [cluster-id=7245425977568034033]\r\n[2023/06/17 00:44:19.204 +00:00] [INFO] [client.go:687] [\"[pd] 
tso dispatcher created\"] [dc-location=global]\r\n[2023/06/17 00:44:19.207 +00:00] [INFO] [conn.go:221] [\"checked alive KV stores\"] [aliveStores=5] [totalStores=5]\r\n[2023/06/17 00:44:19.207 +00:00] [INFO] [client.go:392] [\"[pd] create pd client with endpoints\"] [pd-address=\"[v2tc-pd.endless-rawkv-br-tps-1808627-1-192:2379]\"]\r\n[2023/06/17 00:44:19.210 +00:00] [INFO] [base_client.go:332] [\"[pd] update member urls\"] [old-urls=\"[http://v2tc-pd.endless-rawkv-br-tps-1808627-1-192:2379]\"] [new-urls=\"[http://v2tc-pd-0.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379,http://v2tc-pd-1.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379,http://v2tc-pd-2.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379,http://v2tc-pd-3.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379,http://v2tc-pd-4.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379]\"]\r\n[2023/06/17 00:44:19.211 +00:00] [INFO] [base_client.go:350] [\"[pd] switch leader\"] [new-leader=http://v2tc-pd-4.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379] [old-leader=]\r\n[2023/06/17 00:44:19.211 +00:00] [INFO] [base_client.go:105] [\"[pd] init cluster id\"] [cluster-id=7245425977568034033]\r\n[2023/06/17 00:44:19.211 +00:00] [INFO] [client.go:687] [\"[pd] tso dispatcher created\"] [dc-location=global]\r\n[2023/06/17 00:44:19.213 +00:00] [INFO] [client.go:80] [\"new backup client\"]\r\n[2023/06/17 00:44:19.227 +00:00] [ERROR] [client.go:847] [\"[pd] getTS error\"] [dc-location=global] [stream-addr=http://v2tc-pd-4.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379] [error=\"[PD:client:ErrClientGetTSO]rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster: rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster\"] [stack=\"github.com/tikv/pd/client.(*client).handleDispatcher
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:847\"]\r\n[2023/06/17 00:44:19.228 +00:00] [INFO] [base_client.go:275] [\"[pd] cannot update member from this address\"] [address=http://v2tc-pd-0.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379] [error=\"[PD:grpc:ErrGRPCDial]context canceled: context canceled\"] [errorVerbose=\"[PD:grpc:ErrGRPCDial]context canceled: context canceled
github.com/pingcap/errors.AddStack
	/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174
github.com/pingcap/errors.(*Error).GenWithStackByCause
	/go/pkg/mod/github.com/pingcap/[email protected]/normalize.go:307
github.com/tikv/pd/client/grpcutil.GetClientConn
	/go/pkg/mod/github.com/tikv/pd/[email protected]/grpcutil/grpcutil.go:56
github.com/tikv/pd/client.(*baseClient).getOrCreateGRPCConn
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:417
github.com/tikv/pd/client.(*baseClient).getMembers
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:303
github.com/tikv/pd/client.(*baseClient).updateMember
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:262
github.com/tikv/pd/client.(*baseClient).memberLoop
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:143
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571\"]\r\n[2023/06/17 00:44:19.228 +00:00] [ERROR] [base_client.go:144] [\"[pd] failed updateMember\"] [error=\"[PD:grpc:ErrGRPCDial]context canceled: context canceled\"] [errorVerbose=\"[PD:grpc:ErrGRPCDial]context canceled: context canceled
github.com/pingcap/errors.AddStack
	/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174
github.com/pingcap/errors.(*Error).GenWithStackByCause
	/go/pkg/mod/github.com/pingcap/[email protected]/normalize.go:307
github.com/tikv/pd/client/grpcutil.GetClientConn
	/go/pkg/mod/github.com/tikv/pd/[email protected]/grpcutil/grpcutil.go:56
github.com/tikv/pd/client.(*baseClient).getOrCreateGRPCConn
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:417
github.com/tikv/pd/client.(*baseClient).getMembers
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:303
github.com/tikv/pd/client.(*baseClient).updateMember
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:262
github.com/tikv/pd/client.(*baseClient).memberLoop
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:143
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571
github.com/tikv/pd/client.(*baseClient).updateMember
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:280
github.com/tikv/pd/client.(*baseClient).memberLoop
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:143
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571\"] [stack=\"github.com/tikv/pd/client.(*baseClient).memberLoop
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:144\"]\r\n[2023/06/17 00:44:19.228 +00:00] [INFO] [base_client.go:275] [\"[pd] cannot update member from this address\"] [address=http://v2tc-pd-0.v2tc-pd-peer.endless-rawkv-br-tps-1808627-1-192.svc:2379] [error=\"[PD:grpc:ErrGRPCDial]context canceled: context canceled\"] [errorVerbose=\"[PD:grpc:ErrGRPCDial]context canceled: context canceled
github.com/pingcap/errors.AddStack
	/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174
github.com/pingcap/errors.(*Error).GenWithStackByCause
	/go/pkg/mod/github.com/pingcap/[email protected]/normalize.go:307
github.com/tikv/pd/client/grpcutil.GetClientConn
	/go/pkg/mod/github.com/tikv/pd/[email protected]/grpcutil/grpcutil.go:56
github.com/tikv/pd/client.(*baseClient).getOrCreateGRPCConn
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:417
github.com/tikv/pd/client.(*baseClient).getMembers
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:303
github.com/tikv/pd/client.(*baseClient).updateMember
	/go/pkg/mod/github.com/tikv/pd/[email protected]/base_client.go:262
github.com/tikv/pd/client.(*client).handleDispatcher
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:854
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571\"]\r\n[2023/06/17 00:44:19.228 +00:00] [INFO] [client.go:706] [\"[pd] exit tso dispatcher\"] [dc-location=global]\r\n[2023/06/17 00:44:19.228 +00:00] [INFO] [collector.go:202] [\"units canceled\"] [cancel-unit=0]\r\n[2023/06/17 00:44:19.228 +00:00] [INFO] [collector.go:68] [\"Raw backup failed summary\"] [total-ranges=0] [ranges-succeed=0] [ranges-failed=0]\r\n[2023/06/17 00:44:19.228 +00:00] [ERROR] [backup.go:33] [\"failed to backup raw kv\"] [error=\"rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster\"] [errorVerbose=\"rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster
github.com/tikv/pd/client.(*client).processTSORequests
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1081
github.com/tikv/pd/client.(*client).handleDispatcher
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:837
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571
github.com/tikv/pd/client.(*tsoRequest).Wait
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1297
github.com/tikv/pd/client.(*client).GetTS
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1317
github.com/tikv/migration/br/pkg/backup.(*Client).GetTS
	/go/src/github.com/tikv/migration/br/pkg/backup/client.go:104
github.com/tikv/migration/br/pkg/backup.(*Client).UpdateBRGCSafePoint
	/go/src/github.com/tikv/migration/br/pkg/backup/client.go:142
github.com/tikv/migration/br/pkg/task.RunBackupRaw
	/go/src/github.com/tikv/migration/br/pkg/task/backup_raw.go:139
main.runBackupRawCommand
	/go/src/github.com/tikv/migration/br/cmd/br/backup.go:32
main.newRawBackupCommand.func1
	/go/src/github.com/tikv/migration/br/cmd/br/backup.go:72
github.com/spf13/cobra.(*Command).execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:960
github.com/spf13/cobra.(*Command).Execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:897
main.main
	/go/src/github.com/tikv/migration/br/cmd/br/main.go:56
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571\"] [stack=\"main.runBackupRawCommand
	/go/src/github.com/tikv/migration/br/cmd/br/backup.go:33
main.newRawBackupCommand.func1
	/go/src/github.com/tikv/migration/br/cmd/br/backup.go:72
github.com/spf13/cobra.(*Command).execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:960
github.com/spf13/cobra.(*Command).Execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:897
main.main
	/go/src/github.com/tikv/migration/br/cmd/br/main.go:56
runtime.main
	/usr/local/go/src/runtime/proc.go:250\"]\r\n[2023/06/17 00:44:19.229 +00:00] [ERROR] [main.go:58] [\"br failed\"] [error=\"rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster\"] [errorVerbose=\"rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster
github.com/tikv/pd/client.(*client).processTSORequests
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1081
github.com/tikv/pd/client.(*client).handleDispatcher
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:837
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571
github.com/tikv/pd/client.(*tsoRequest).Wait
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1297
github.com/tikv/pd/client.(*client).GetTS
	/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1317
github.com/tikv/migration/br/pkg/backup.(*Client).GetTS
	/go/src/github.com/tikv/migration/br/pkg/backup/client.go:104
github.com/tikv/migration/br/pkg/backup.(*Client).UpdateBRGCSafePoint
	/go/src/github.com/tikv/migration/br/pkg/backup/client.go:142
github.com/tikv/migration/br/pkg/task.RunBackupRaw
	/go/src/github.com/tikv/migration/br/pkg/task/backup_raw.go:139
main.runBackupRawCommand
	/go/src/github.com/tikv/migration/br/cmd/br/backup.go:32
main.newRawBackupCommand.func1
	/go/src/github.com/tikv/migration/br/cmd/br/backup.go:72
github.com/spf13/cobra.(*Command).execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:960
github.com/spf13/cobra.(*Command).Execute
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:897
main.main
	/go/src/github.com/tikv/migration/br/cmd/br/main.go:56
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571\"] [stack=\"main.main
	/go/src/github.com/tikv/migration/br/cmd/br/main.go:58
runtime.main
	/usr/local/go/src/runtime/proc.go:250\"]\r\ncat: can't open '\r': No such file or directory\r\n"]

2. Minimal reproduce step (Required)

  1. Start backup.
  2. Scale (in or out) the TiKV cluster.

3. What did you see instead (Required)

Backup failed.

4. What did you expect to see? (Required)

Backup succeeded.

5. What is your migration tool and TiKV version? (Required)

  • TiKV: master (0daa38c454c8d518a62f372f5fe679c1e858871e)
  • TiKV CDC: N/A
  • TiKV BR: master (37f5972)
  • TiKV Online Bulk Load: N/A
pingyu added the type/bug label Jun 18, 2023
pingyu added a commit that referenced this issue Jun 19, 2023
* improve fault tolerance for TSO

Signed-off-by: Ping Yu <[email protected]>

* fix

Signed-off-by: Ping Yu <[email protected]>

* fix gh

Signed-off-by: Ping Yu <[email protected]>

* add LTS versions

Signed-off-by: Ping Yu <[email protected]>

---------

Signed-off-by: Ping Yu <[email protected]>