jepsen: Sets test flaky #17491
This appears to have started on Aug 1. We don't have much history to work with because the tests had other issues prior to July 25, but there were several green days in a row before it started failing frequently on Aug 1. That would put the culprit somewhere in this commit range.
That commit range is a red herring. The failure is non-deterministic and was already present during the four consecutive green runs we had. That means it has been present since before the most recent fixes to the tests, so it is mixed in with other failures, and bisecting will not be an efficient way to track it down.
The problem is here: `cockroach/pkg/kv/dist_sender.go`, lines 1180 to 1186 in cdcf1b0.
We assume that gRPC's … (Why do we see …?) Treating gRPC …
gRPC's `Unavailable` error code is used for fail-fast errors (which can be retried unambiguously), but it is also used in other cases (such as a server draining) in which we cannot assume that the previous attempt was not completed. (It's unclear whether this assumption was once true and later changed, or whether it has always been incorrect. The specific source of the ambiguous `Unavailable` errors we're seeing is grpc/grpc-go#1147.) This is expected to increase the prevalence of `AmbiguousResultError`s; this will be addressed in a follow-up change. Fixes cockroachdb#17491
The Jepsen `sets` test is failing about half the time (under the `start-kill-2` and `majority-ring+start-kill-2` nemeses). The final output appears to show a real inconsistency.