stats/opencensus: Fix flaky metrics test #6372

Merged
merged 4 commits into from
Jun 20, 2023

Conversation

zasweq
Contributor

@zasweq zasweq commented Jun 15, 2023

Fixes #6231.

This adds a sync point between the test and the server side recording of completed RPCs for the Unary and Streaming RPCs. The test simply waits for each RPC to finish on the client side, but on the server stats.End is recorded in a defer, after the status is written to the wire, in both the Unary and Streaming cases. Previously, therefore, there was no sync point between the test and this metric being recorded. The sync happens at the view global level: that state is synced with the exporter by unregistering the views, and no metrics are recorded after that, so the test has to wait for the two row emissions (Unary and Streaming RPCs) at the view global level, not at the exporter level.

Verified that the test passes over 10k runs on Forge. Previously it was failing 18 out of 10k runs on Forge.

RELEASE NOTES: N/A

@zasweq zasweq requested a review from arvindbr8 June 15, 2023 00:37
@zasweq zasweq added this to the 1.57 Release milestone Jun 15, 2023
Contributor

@tobotg tobotg left a comment

Thanks @zasweq
lgtm (modulo one minor change).

// appear for server completed RPC's view (by checking for length of rows to be
// 2). Returns an error if both the Unary and Streaming metric not found within
// the passed context's timeout.
func waitForServerCompletedRPCs(ctx context.Context, fe *fakeExporter) error {
Contributor

Unused function argument: you may want to remove fe *fakeExporter from waitForServerCompletedRPCs function.

Contributor Author

Ah, great catch. Deleted.

Member

@arvindbr8 arvindbr8 left a comment

minor comments

	if err != nil {
		continue
	}
	if len(rows) == 2 {
Member

Instead of a len check here, should we implicitly check for 1 Unary and 1 Streaming RPC metric?

Contributor Author

Explicitly*. But sure.

@arvindbr8 arvindbr8 assigned zasweq and unassigned arvindbr8 Jun 15, 2023
@zasweq zasweq assigned arvindbr8 and unassigned zasweq Jun 15, 2023
Comment on lines 249 to 257
	m := make(map[string]bool)
	for _, row := range rows {
		for _, tag := range row.Tags {
			m[tag.Value] = true
		}
	}
	if m["grpc.testing.TestService/UnaryCall"] && m["grpc.testing.TestService/FullDuplexCall"] {
		return nil
	}
Contributor

Thanks @zasweq, but please note that this new implementation no longer enforces your initial requirement that the expected metrics appear in 2 different rows. I don't know if that makes a difference.

May I also suggest returning early once both tags are found, rather than looping through the entire slices (the looping is not expensive here, since the test data set is small), with something similar to

	unaryMetricFound := false
	streamingMetricFound := false
	for _, row := range rows {
		for _, tag := range row.Tags {
			if tag.Value == "grpc.testing.TestService/UnaryCall" {
				unaryMetricFound = true
			} else if tag.Value == "grpc.testing.TestService/FullDuplexCall" {
				streamingMetricFound = true
			}
			if unaryMetricFound && streamingMetricFound {
				return nil
			}
		}
	}

or (if the metrics are expected to be in 2 different rows)

	unaryMetricFound := false
	streamingMetricFound := false
	for _, row := range rows {
		for _, tag := range row.Tags {
			if tag.Value == "grpc.testing.TestService/UnaryCall" {
				unaryMetricFound = true
				break
			} else if tag.Value == "grpc.testing.TestService/FullDuplexCall" {
				streamingMetricFound = true
				break
			}
		}
		if unaryMetricFound && streamingMetricFound {
			return nil
		}
	}
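
To make the difference between the two variants concrete, here is a small self-contained sketch. The `tag` and `row` types are hypothetical stand-ins for the types the test actually uses, and the function names are made up for illustration. It runs both variants on one row that carries both method tags and on two separate rows.

```go
package main

import "fmt"

// tag and row are hypothetical stand-ins for the types used by the test.
type tag struct{ Key, Value string }
type row struct{ Tags []tag }

const (
	unaryMethod     = "grpc.testing.TestService/UnaryCall"
	streamingMethod = "grpc.testing.TestService/FullDuplexCall"
)

// foundAnywhere mirrors the first variant: it succeeds as soon as both method
// tags have been seen, even if they sit in the same row.
func foundAnywhere(rows []row) bool {
	unary, streaming := false, false
	for _, r := range rows {
		for _, t := range r.Tags {
			if t.Value == unaryMethod {
				unary = true
			} else if t.Value == streamingMethod {
				streaming = true
			}
			if unary && streaming {
				return true
			}
		}
	}
	return false
}

// foundInSeparateRows mirrors the second variant: each row contributes at most
// one match because the inner loop stops at the first matching tag.
func foundInSeparateRows(rows []row) bool {
	unary, streaming := false, false
	for _, r := range rows {
		for _, t := range r.Tags {
			if t.Value == unaryMethod {
				unary = true
				break
			} else if t.Value == streamingMethod {
				streaming = true
				break
			}
		}
		if unary && streaming {
			return true
		}
	}
	return false
}

func main() {
	oneRow := []row{{Tags: []tag{{"method", unaryMethod}, {"method", streamingMethod}}}}
	twoRows := []row{
		{Tags: []tag{{"method", unaryMethod}}},
		{Tags: []tag{{"method", streamingMethod}}},
	}
	fmt.Println(foundAnywhere(oneRow), foundInSeparateRows(oneRow))   // true false
	fmt.Println(foundAnywhere(twoRows), foundInSeparateRows(twoRows)) // true true
}
```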

Contributor Author

This is verified right afterwards: the declared want has only the two rows, declared separately, and it is checked with cmp.Diff. I went ahead and switched to your second solution though.
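
As a rough illustration of why the later comparison already enforces the two-row shape, the sketch below compares retrieved rows against a want with two separately declared rows. It swaps in `reflect.DeepEqual` from the standard library in place of cmp.Diff so that it runs without extra dependencies, and the `row` type, its fields, and the counts are assumptions made for the example, not the test's real data.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// row is a hypothetical, simplified metric row: one method tag plus a count.
type row struct {
	Method string
	Count  int64
}

// sortedByMethod returns a sorted copy so the comparison does not depend on
// the order in which rows were emitted.
func sortedByMethod(rows []row) []row {
	out := append([]row(nil), rows...)
	sort.Slice(out, func(i, j int) bool { return out[i].Method < out[j].Method })
	return out
}

// rowsMatch reports whether got and want contain the same rows. It uses
// reflect.DeepEqual as a stdlib stand-in for a cmp.Diff based check.
func rowsMatch(got, want []row) bool {
	return reflect.DeepEqual(sortedByMethod(got), sortedByMethod(want))
}

func main() {
	want := []row{
		{Method: "grpc.testing.TestService/UnaryCall", Count: 1},
		{Method: "grpc.testing.TestService/FullDuplexCall", Count: 1},
	}
	separate := []row{
		{Method: "grpc.testing.TestService/FullDuplexCall", Count: 1},
		{Method: "grpc.testing.TestService/UnaryCall", Count: 1},
	}
	merged := []row{{Method: "grpc.testing.TestService/UnaryCall", Count: 2}}

	fmt.Println(rowsMatch(separate, want)) // true
	fmt.Println(rowsMatch(merged, want))   // false
}
```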

Contributor Author

Thanks for the suggestion.

@@ -237,6 +237,36 @@ func distributionDataLatencyCount(vi *viewInformation, countWant int64, wantTags
return nil
}

// waitForServerCompletedRPCs waits until both Unary and Streaming metric rows
// appear, in two seperate rows, for server completed RPC's view. Returns an
Member

Suggested change
// appear, in two seperate rows, for server completed RPC's view. Returns an
// appear, in two separate rows, for server completed RPC's view. Returns an

@arvindbr8
Member

arvindbr8 commented Jun 16, 2023

Minor typo. Should fix the test failure below. LGTM otherwise

@arvindbr8 arvindbr8 assigned zasweq and unassigned arvindbr8 Jun 16, 2023
@zasweq zasweq merged commit dd350d0 into grpc:master Jun 20, 2023
1 check passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 18, 2023
Development

Successfully merging this pull request may close these issues.

Flaky test: AllMetricsOneFunction
3 participants