Skip to content

tracing: add option to choice mutation policy for RequestID extension#16770

Merged
mattklein123 merged 11 commits intoenvoyproxy:mainfrom
Shikugawa:delegate-sampling
Jul 8, 2021
Merged

tracing: add option to choice mutation policy for RequestID extension#16770
mattklein123 merged 11 commits intoenvoyproxy:mainfrom
Shikugawa:delegate-sampling

Conversation

@Shikugawa
Copy link
Copy Markdown
Member

@Shikugawa Shikugawa commented Jun 2, 2021

Signed-off-by: Shikugawa rei@tetrate.io

Commit Message: Close #16530. In the previous implementation, the sampling decision was determined by the UUID and additional headers. In this state, without the additional headers, the sampling decision depends entirely on the UUID. In this case, in an environment where Envoy is mixed with other sidecars that have different request ID generation procedures, Envoy receiving requests from other sidecars will always send a span with invalid sampling. This behavior is considered to be unnatural as the default behavior. Therefore, in this PR, we provide the mutation_policy option for Request ID extension. When this option is configured as SKIP, it will always skip the request ID mutation and execute request ID (currently UUID) independent sampling decision algorithm and enable sampling of the span by default. With this flag in place, the sampling is completely determined by the additional header, which makes the behavior intuitive and independent of the sidecar type.
Additional Description:
Risk Level: Low
Testing: Unit
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

cc @basvanbeek

Signed-off-by: Shikugawa <rei@tetrate.io>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #16770 was opened by Shikugawa.

see: more, trace.

Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!
Added an API comment.

Also would suggest adding more info to the doc.

Shikugawa added 2 commits June 8, 2021 21:21
Signed-off-by: Shikugawa <rei@tetrate.io>
@Shikugawa Shikugawa changed the title tracing: delegate sampling decision logic to tracing provider tracing: add option to bypass sampling decision by x-request-id Jun 8, 2021
Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also update docs/root/intro/arch_overview/observability/tracing.rst?

Comment on lines +197 to +199
// Trace reason is decided by request_id by default. If true, these sampling strategy decisions
// will be decided by only tracing provider and :ref:`x-request-id<config_http_conn_man_headers_x-request-id>`
// never be modified.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand this correctly, but please let me know if I'm wrong here:

Suggested change
// Trace reason is decided by request_id by default. If true, these sampling strategy decisions
// will be decided by only tracing provider and :ref:`x-request-id<config_http_conn_man_headers_x-request-id>`
// never be modified.
// By default, tracing requires the :ref:`x-request-id <config_http_conn_man_headers_x-request-id>` header to be set, and modifies its value.
// If true, the request will be traced, and *x-request-id* will not be modified.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should add that the request will bypass x-request-id based tracing decision.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an enum? Do we want to maybe support other modes of operation in the future?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What operation will be expected in the future?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was interested in your perspective on this.

Copy link
Copy Markdown
Member Author

@Shikugawa Shikugawa Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch It can be changed as follows.

enum TraceRequestIdSampleDecision {
  None,
  Bypass, // It acts as `bypass_sampling_with_request_id` is true.
}

WDYT? cc. @lizan

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, with NONE, BYPASS. If you have high confidence we would never have some other trace sample strategy with request ID I'm fine with the current bool, just would like to make sure this is deliberative.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Shikugawa <rei@tetrate.io>
@Shikugawa Shikugawa force-pushed the delegate-sampling branch from ea8eba5 to adf26d7 Compare June 17, 2021 07:37
@Shikugawa
Copy link
Copy Markdown
Member Author

@htuch @mattklein123 I'm considering that we might add a new tracing stat forced_sampling that means not sampled randomly and new tracing reason Tracing::Reason::SamplingForced. WDYT?

@htuch
Copy link
Copy Markdown
Member

htuch commented Jun 18, 2021

There are already a bunch of different forced Tracing::Reason, e.g. ServiceForced, ClientForced; can you make it clearer what this is in the name?

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

@htuch
Copy link
Copy Markdown
Member

htuch commented Jun 18, 2021

Over to @mattklein123 for tracing implementation review.

@alyssawilk
Copy link
Copy Markdown
Contributor

@mattklein123 is out for the week. I can do a pass while he's out but looking this over I agree with Adi this should include an update to docs/root/intro/arch_overview/observability/tracing.rst as I'm not quite sure from the API documentation what this option provides. Mind adding that for now and also seeing if the coverage failure is real?

/wait

Signed-off-by: Shikugawa <rei@tetrate.io>
Signed-off-by: Shikugawa <rei@tetrate.io>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I fixed a similar issue recently with users requesting to not pack trace status into the UUID. See pack_trace_reason here: https://github.com/envoyproxy/envoy/blob/main/api/envoy/extensions/request_id/uuid/v3/uuid.proto

IMO it would be better to add a new configuration option for the default request ID provider to not use request ID for sampling. I think this would be a bit cleaner as it would be colocated with the other related option and also avoid adding new configuration to HCM. WDYT?

/wait

@Shikugawa
Copy link
Copy Markdown
Member Author

IMO it would be better to add a new configuration option for the default request ID provider to not use request ID for sampling. I think this would be a bit cleaner as it would be colocated with the other related option and also avoid adding new configuration to HCM. WDYT?

SGTM. Let me change.

@Shikugawa Shikugawa changed the title tracing: add option to bypass sampling decision by x-request-id tracing: add option to choice mutation policy for RequestID extension Jun 29, 2021
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating. Starting with an API comment.

/wait

@htuch htuch removed their assignment Jul 1, 2021
Signed-off-by: Shikugawa <rei@tetrate.io>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM with small comments.

/wait

Signed-off-by: Shikugawa <rei@tetrate.io>
lizan
lizan previously approved these changes Jul 2, 2021
Copy link
Copy Markdown
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer to @mattklein123

@repokitteh-read-only repokitteh-read-only bot removed the api label Jul 2, 2021
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Signed-off-by: Shikugawa <rei@tetrate.io>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

// Set whether to use :ref:`x-request-id<config_http_conn_man_headers_x-request-id>` or not.
// This defaults to true.
// Set whether to use :ref:`x-request-id<config_http_conn_man_headers_x-request-id>` for sampling or not.
// This defaults to true. See the :ref:`context propagation <arch_overview_tracing_context_propagation>`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This defaults to true. See the :ref:`context propagation <arch_overview_tracing_context_propagation>`
// This defaults to true. See the :ref:`context propagation <arch_overview_tracing_context_propagation>` overview

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Signed-off-by: Shikugawa <rei@tetrate.io>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@repokitteh-read-only repokitteh-read-only bot removed the api label Jul 8, 2021
@mattklein123 mattklein123 merged commit 9170b3e into envoyproxy:main Jul 8, 2021
@Shikugawa Shikugawa deleted the delegate-sampling branch July 9, 2021 01:34
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow trace instrumentation to sample requests decoupled from x-request-id

6 participants