Bug 1874584: UPSTREAM: <carry>: retry etcd errors #327

tkashem · 2020-09-01T17:16:37Z

No description provided.

tkashem · 2020-09-01T19:37:10Z

/retest

tkashem · 2020-09-01T23:03:40Z

/retest

tkashem · 2020-09-02T02:52:19Z

/retest

p0lyn0mial · 2020-09-02T14:30:22Z

staging/src/k8s.io/apiserver/pkg/storage/etcd3/etcd3retry/retry_etcdclient.go

If the err can be converted with rpctypes.Error(err) (https://github.com/etcd-io/etcd/blob/master/etcdserver/api/v3rpc/rpctypes/error.go#L230), then according to https://github.com/etcd-io/etcd/blob/master/clientv3/retry.go#L53, all errors from immutable (read-only) requests that map to codes.Unavailable are safe to retry.
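A minimal sketch of the check described in this comment, assuming a read-only request. The `code` and `statusError` types are hypothetical stand-ins for the gRPC code and status types so that the example runs with the standard library only; `isRetriableReadError` is an illustrative name, not the function in this PR, and the error messages are illustrative too.

```go
package main

import (
	"errors"
	"fmt"
)

// code is a hypothetical stand-in for a gRPC status code.
type code int

const (
	codeUnavailable code = iota + 1
	codeOutOfRange
)

// statusError is a hypothetical stand-in for an error that carries a gRPC code.
type statusError struct {
	code code
	msg  string
}

func (e *statusError) Error() string { return e.msg }

// isRetriableReadError reports whether an error returned by a read-only
// (immutable) request may be retried. Only errors that carry the
// Unavailable code qualify; anything else is treated as non-transient.
func isRetriableReadError(err error) bool {
	var se *statusError
	if !errors.As(err, &se) {
		return false // not a status error, so nothing is known about it
	}
	return se.code == codeUnavailable
}

func main() {
	leaderLost := &statusError{code: codeUnavailable, msg: "no leader"}
	compacted := &statusError{code: codeOutOfRange, msg: "revision compacted"}

	fmt.Println(isRetriableReadError(leaderLost))
	// Wrapped errors are unwrapped by errors.As before the code is inspected.
	fmt.Println(isRetriableReadError(fmt.Errorf("get failed: %w", leaderLost)))
	fmt.Println(isRetriableReadError(compacted))
	fmt.Println(isRetriableReadError(errors.New("plain error")))
}
```

This check only covers reads; whether a write can be retried depends on knowing that the first attempt persisted nothing, which the error code alone does not tell you.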

tkashem · 2020-09-02T15:07:39Z

/retest

tkashem · 2020-09-02T16:32:07Z

/retest

tkashem · 2020-09-02T17:52:50Z

/retest

tkashem · 2020-09-02T20:51:03Z

/retest

tkashem · 2020-09-04T02:46:43Z

/retest

tkashem · 2020-09-04T13:24:57Z

/retest

tkashem · 2020-09-04T15:27:05Z

/retest

tkashem · 2020-09-06T14:20:51Z

/retest

This commit renews openshift#327

What has changed compared to the original PR is:
- The retryClient interface has been adapted to storage.Interface.
- The isRetriableEtcdError method has been completely changed; it seems that previously the error we wanted to retry was not being retried. Even the unit tests were failing.

Overall, I still think this is not the correct fix. The proper fix should be added to the etcd client.

UPSTREAM: <carry>: retry etcd Unavailable errors

This is the second commit for the retry logic. This commit adds unit tests and slightly improves the logging. During a rebase squash with the previous one.

UPSTREAM: <carry>: retry_etcdclient: expose retry logic functionality

during rebase merge with: UPSTREAM: <carry>: retry etcd Unavailable errors

UPSTREAM: <carry>: Don't retry storage calls with side effects.

The existing patch retried any etcd error returned from storage with the code "Unavailable". Writes can only be safely retried if the client can be absolutely sure that the initial attempt ended before persisting any changes. The "Unavailable" code includes errors like "timed out" that can't be safely retried for writes.

UPSTREAM: <carry>: Add retries for GetCurrentResourceVersion.

UPSTREAM: <carry>: squash: storage interface underlying the retryClient has changed

Removed methods:
- Count

Added methods:
- Stats
- SetKeysFunc
- CompactRevision
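The rule from the "Don't retry storage calls with side effects" commit can be sketched as follows. This is a simplified illustration, not the code in the carry patch: `errUnavailable`, `callWithRetry`, and `flaky` are hypothetical names, there is no backoff, and calls with side effects are never retried at all, rather than being retried only when the first attempt provably persisted nothing.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a hypothetical sentinel standing in for an etcd error
// whose code is "Unavailable" (for example a timeout or a lost leader).
var errUnavailable = errors.New("unavailable")

// callWithRetry runs fn and returns how many times it was invoked together
// with the last error. Read-only calls are retried on errUnavailable up to
// maxAttempts times. Calls with side effects run exactly once, because an
// "Unavailable" failure does not prove that nothing was persisted.
func callWithRetry(hasSideEffects bool, maxAttempts int, fn func() error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = fn()
		if err == nil || hasSideEffects || !errors.Is(err, errUnavailable) {
			return attempt, err
		}
	}
	return maxAttempts, err
}

// flaky returns a function that fails with errUnavailable for the first
// `failures` calls and succeeds afterwards.
func flaky(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errUnavailable
		}
		return nil
	}
}

func main() {
	attempts, err := callWithRetry(false, 5, flaky(2))
	fmt.Println("read: ", attempts, err)

	attempts, err = callWithRetry(true, 5, flaky(2))
	fmt.Println("write:", attempts, err)
}
```

In this sketch the read succeeds on its third attempt, while the write surfaces the first error to the caller, who is the only party able to decide whether repeating it is safe.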

openshift-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Sep 1, 2020

openshift-ci-robot requested review from deads2k and smarterclayton September 1, 2020 17:16

tkashem force-pushed the etcd-retry branch 2 times, most recently from 1d09507 to bb7a0d3 September 1, 2020 21:54

tkashem force-pushed the etcd-retry branch from bb7a0d3 to 015ad97 September 1, 2020 23:06

tkashem mentioned this pull request Sep 2, 2020

UPSTREAM: <carry>: retry etcd errors #322

Closed

tkashem force-pushed the etcd-retry branch 2 times, most recently from 6fd0f1f to 8a7c1ce September 2, 2020 04:28

p0lyn0mial reviewed Sep 2, 2020

tkashem force-pushed the etcd-retry branch 2 times, most recently from 9e18ee5 to e11df03 September 2, 2020 19:03

tkashem force-pushed the etcd-retry branch 2 times, most recently from de5fe83 to ab2788b September 3, 2020 21:12

tkashem force-pushed the etcd-retry branch from ab2788b to fa3bac1 September 4, 2020 13:40

tkashem force-pushed the etcd-retry branch from fa3bac1 to 337ef58 September 4, 2020 15:37

tkashem force-pushed the etcd-retry branch 3 times, most recently from f215018 to 243d8fd September 9, 2020 21:54

6 participants