Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc: tweak metrics to support service-level benchmarks #908

Merged
merged 3 commits into from
Aug 9, 2023

Conversation

ianbotsf
Copy link
Contributor

@ianbotsf ianbotsf commented Jul 28, 2023

Issue #

Closes awslabs/aws-sdk-kotlin#968

Description of changes

Various changes necessary to support service-level benchmarks including:

  • Increase precision on durations recorded from millisecond-level to nanosecond-level
  • Record time-to-first-byte metric in OkHttp event listener
    • Pass TTFB value out of event listener via ExecutionContext
  • Change the implementation of smithy.client.attempt_duration to measure attempt duration instead of merely transmission duration

Companion PR: awslabs/aws-sdk-kotlin#1006

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


override fun requestFailed(call: Call, ioe: IOException) = trace(ioe) { "request failed" }

override fun requestHeadersStart(call: Call) = trace { "sending request headers" }

override fun requestHeadersEnd(call: Call, request: Request) = trace { "finished sending request headers" }
override fun requestHeadersEnd(call: Call, request: Request) {
if (request.body == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: Shouldn't this be after sending the body? If there is no body this is fine, we should be able to detect that though and either set it here or in responseBodyEnd.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about this comment, requestTimeEnd is set in both places: requestHeadersEnd if there is no body, and requestBodyEnd if there is a body, so I think it's correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, I just missed it in first review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes what confused me is that this entire file has the events out-of-order (i.e requestHeadersEnd is defined after requestBodyStart even though it occurs before in the chain of events). It could be worth reordering these to match the order shown in the OkHttp docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that would make it easier to parse this file. I can submit a mini-PR after this one gets merged.


@InternalApi
public object EngineAttributes {
public val TimeToFirstByte: AttributeKey<Duration> = AttributeKey("TimeToFirstByte")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: docs

val attemptDuration = attemptStart?.elapsedNow() ?: return
metrics.rpcAttemptDuration.recordSeconds(attemptDuration, perRpcAttributes, metrics.provider.contextManager.current())

context.executionContext.getOrNull(EngineAttributes.TimeToFirstByte)?.let { ttfb ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Should we clear this after reading it? e.g. what if we retry and it fails before recording TTFB? We'll be using the first TTFB value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right we may have an issue if we fail to record TTFB on a retry. We certainly can clear it after reading it, although this makes me wonder if there's a new set of execution context which is scoped to a single attempt and should be cleared when an attempt is ended.

What would we think about a new optional scope for attributes:

interface MutableAttributes : Attributes {
    // ...existing set, remove, and computeIfAbsent methods...

    operator fun <T : Any> set(scope: AttributeScope, key: AttributeKey<T>, value: T)
    fun <T : Any> removeAll(scope: AttributeScope)
}

Get operations would find an attribute regardless of its scope but removeAll could be used to clear all attributes with a specific scope (e.g., an attempt).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm maybe. Seems like it may complicate some things but wouldn't know until I tried. Also raises questions like what attributes are visible? If I set an attribute in one scope but not another are they visible to other scopes? Are they independent (i.e. can they shadow each other)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this is likely too complicated without a deeper design. I'll just remove it upon reading.

@@ -94,6 +96,13 @@ internal class OperationTelemetryInterceptor(
if (attempts > 1) {
metrics.rpcRetryCount.add(1L, perRpcAttributes, metrics.provider.contextManager.current())
}

val attemptDuration = attemptStart?.elapsedNow() ?: return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the implementation of smithy.client.attempt_duration to measure attempt duration instead of merely transmission duration

This now includes signing and other things right? The current description of the metric was accurate for what it measured: The time it takes to connect to the service, send the request, and receive the HTTP status code and headers from the response for an operation. If we are updating the metric definition lets update the description to match.

Can you first explain why this is needed/better? Just want to understand the motivation for the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this now includes identity resolution, endpoint resolution, and signing. The inclusion of EP resolution and signing will likely have minimal impact to the metric but identity resolution is a potentially-expensive operation since it may involve local IO or calls to remote services. Given that the intent is to measure the SDK-added overhead and that identity resolution, EP resolution, and signing are per-attempt, we'd need to know the duration of the entire attempt.

I'll update the metric documentation.


override fun requestFailed(call: Call, ioe: IOException) = trace(ioe) { "request failed" }

override fun requestHeadersStart(call: Call) = trace { "sending request headers" }

override fun requestHeadersEnd(call: Call, request: Request) = trace { "finished sending request headers" }
override fun requestHeadersEnd(call: Call, request: Request) {
if (request.body == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about this comment, requestTimeEnd is set in both places: requestHeadersEnd if there is no body, and requestBodyEnd if there is a body, so I think it's correct?

@sonarqubecloud
Copy link

sonarqubecloud bot commented Aug 8, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@ianbotsf ianbotsf merged commit 1c44d5d into main Aug 9, 2023
@ianbotsf ianbotsf deleted the service-benchmarks branch August 9, 2023 17:03
aajtodd pushed a commit that referenced this pull request Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Benchmark Baseline
3 participants