Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

min-resolved-ts: check dc label #944

Merged
merged 6 commits into from
Aug 17, 2023
Merged

Conversation

HuSharp
Copy link
Member

@HuSharp HuSharp commented Aug 15, 2023

Background

  1. api interface
    Nowadays we have 2 api interfaces for obtaining min resolved ts

    • /pd/api/v1/min-resolved-ts: obtain cluster's min resolved ts
    • /pd/api/v1/min-resolved-ts/{store_id}: obtain each store's min resolved ts

    For client-go's updateSafeTS, we call approach II for each store, which is not necessary to call API so many times
    We can use the approach I to get cluster's min resolved ts when @@txn_scope is global

  2. avoid using kv request reason:
    if TiDB failed to connect a TiKV store to fetch the latest resolved_ts, it will remain the out-of-dated value until the TiKV comes back, though PD will mark the store as unavailable and keep updating the minimal resolved_ts with other TiKV stores. We may need to keep these two behaviors the same or unify them.

Summary pr

judge @@txn_scope value which from the config:

  • when find is global, we can get cluster-level resolved ts to avoid using kv grpc.
  • when find is not global, we need to use kv grpc.

Signed-off-by: husharp <[email protected]>
@HuSharp HuSharp marked this pull request as ready for review August 15, 2023 07:58
@HuSharp HuSharp force-pushed the check_dc_label branch 2 times, most recently from 6352532 to 8da079b Compare August 15, 2023 08:08
Signed-off-by: husharp <[email protected]>
@HuSharp
Copy link
Member Author

HuSharp commented Aug 16, 2023

@nolouch @JmPotato @lhy1024 PTAL, thx!

Copy link

@lhy1024 lhy1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ci failed

@HuSharp
Copy link
Member Author

HuSharp commented Aug 16, 2023

ci failed

seems like is a non-related flaky fai...

we can check another pr meeting it as well. https://github.com/tikv/client-go/actions/runs/5873468298/job/15926773544?pr=947

tikv/kv.go Outdated
@@ -601,18 +592,40 @@ func (s *KVStore) updateSafeTS(ctx context.Context) {
}(ctx, wg, storeID, storeAddr)
}

txnScopeMap := make(map[string][]uint64)
if clusterTS != 0 && err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its better improve the readable like:

clusterMinSafeTS, txnScopeMap, err := s.buildTxnScopeMap(ctx, stores)
// fast path get cluster scope MinSafeTs
if clusterMinSafeTS!= 0 && err == nil {
     s.minSafeTS.Store(oracle.GlobalTxnScope, clusterMinSafeTS)
     return
}

// Get SafeTS from TiKV by Store
for _, store := range stores {
...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea! fixed.

Signed-off-by: husharp <[email protected]>
Signed-off-by: husharp <[email protected]>
@you06
Copy link
Contributor

you06 commented Aug 16, 2023

I check out the flacky test, can you help change L255 of the test into atomic.StoreInt64(&reloadRegionInterval, originReloadRegionInterval) in this PR?

reloadRegionInterval = originReloadRegionInterval

I will update other branches.

Signed-off-by: husharp <[email protected]>
@HuSharp
Copy link
Member Author

HuSharp commented Aug 16, 2023

I check out the flacky test, can you help change L255 of the test into atomic.StoreInt64(&reloadRegionInterval, originReloadRegionInterval) in this PR?

reloadRegionInterval = originReloadRegionInterval

I will update other branches.

fixed

tikv/kv.go Outdated
// - if stores label are global, return get cluster min resolved ts from pd.
// - if contains dc label store, return try to get it from TiKV.
func (s *KVStore) buildTxnScopeMap(ctx context.Context, stores []*Store) (safeTS uint64, txnScopeMap map[string][]uint64, err error) {
isGlobal := true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need judge from the tikv labels, I think should judge from the config like the usage in tidb:
https://github.com/pingcap/tidb/blob/88225787f3c3da18323415c72208bce20876009a/expression/builtin_time.go#L6617-L6632

Copy link
Member Author

@HuSharp HuSharp Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

Signed-off-by: husharp <[email protected]>
if s.pdHttpClient != nil && isGlobal {
clusterMinSafeTS, err := s.pdHttpClient.GetClusterMinResolvedTS(ctx)
if err != nil {
logutil.BgLogger().Debug("get cluster-level min resolved timestamp from PD failed", zap.Error(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can warn it, it is called every 2 secs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clusters upgraded from v6.0.0~v6.2.0 will meet this error, does it make sense?
https://docs.pingcap.com/tidb/stable/pd-configuration-file#min-resolved-ts-persistence-interval-new-in-v600

Copy link
Contributor

@nolouch nolouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

@iosmanthus iosmanthus merged commit e5fe177 into tikv:tidb-6.5 Aug 17, 2023
15 of 17 checks passed
@HuSharp HuSharp deleted the check_dc_label branch September 1, 2023 06:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants