Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context prop combined #147

Merged
merged 79 commits into from
Feb 6, 2020
Merged

Conversation

mwear
Copy link
Member

@mwear mwear commented Nov 8, 2019

This is a context propagation prototype based on open-telemetry/oteps#66.

Examples

All in one - Example

The example below is an all-in-one example that shows configuration, handling an HTTP request, and making a call to an external service. In the all-in-one example there is deeper nesting than you will typically see, due to the collocation of the code. In the real world, different components would handle different parts of the request in their own separate locations.

# Set global injector / extractors (accepts an array)
OpenTelemetry.propagation.http_injectors = [OpenTelemetry::Trace::Propagation.http_trace_context_injector, OpenTelemetry::CorrelationContext::Propagation.http_injector]
OpenTelemetry.propagation.http_extractors = [OpenTelemetry::Trace::Propagation.rack_http_trace_context_extractor, OpenTelemetry::CorrelationContext::Propagation.rack_http_extractor]
OpenTelemetry.correlations = OpenTelemetry::SDK::CorrelationContext::Manager.new
tracer = OpenTelemetry.tracer_factory.tracer('myapp', '1.0.0')

# One time setup above ^^^^, subsequent usage below:

# Extract context from an inbound request
extracted_context = OpenTelemetry.propagation.extract(rack_env)

# Add correlations to the extracted context
context = OpenTelemetry.correlations.set_value('k1', 'v1', context: extracted_context)

# Set current context
OpenTelemetry::Context.with_current(context) do
  # SpanContext from the extracted context will be an implicit parent for `service-span`
  tracer.in_span('service-span') do
    tracer.in_span('external-req-span') do
      headers = {'Accept' => 'application/json'}
      # Inject context into an outbound request
      headers = OpenTelemetry.propagation.inject(headers)
      Net::HTTP.get(URI('http://site.com'), headers)
    end
  end
end

# Inject and Extract can take explicit context, injectors and extractors

context = OpenTelemetry.propagation.extract(carrier, context: some_context, http_extractors: [ex1, ex2, ex3])

headers = OpenTelemetry.propagation.inject(headers, context: some_context, http_injectors: [in1, in2, in3])

OTEP Examples

The following examples are Ruby versions of the examples in open-telemetry/oteps#66.

Global initialization

OpenTelemetry.propagation.http_injectors = [OpenTelemetry::Trace::Propagation.http_trace_context_injector, OpenTelemetry::CorrelationContext::Propagation.http_injector]
OpenTelemetry.propagation.http_extractors = [OpenTelemetry::Trace::Propagation.rack_http_trace_context_extractor, OpenTelemetry::CorrelationContext::Propagation.rack_http_extractor]
OpenTelemetry.correlations = OpenTelemetry::SDK::CorrelationContext::Manager.new
tracer = OpenTelemetry.tracer_factory.tracer('myapp', '1.0.0')

Inject & Extract with Implicit Context

def handle_request(headers)
  context = OpenTelemetry.propagation.extract(headers)

  OpenTelemetry::Context.with_current(context)
    tracer.in_span('span-name') do
      version = OpenTelemetry.correlations.value('client-version')

      case(version)
      when 'v1.0'
        data = fetch_data_from_service_b
      when 'v2.0'
        data = fetch_data_from_service_c
      end

      send_response(data)
    end
  end
end
def fetch_data_from_service_b
  headers = {}
  headers = OpenTelemetry.propagation.inject(headers)

  # make an http request
  uri = URI('http://site.com')
  http = Net::HTTP.new(uri.host, uri.port)
  http.get(('/', headers)
end

Inject and Extract with Explicit Context

def handle_request(headers)
  extractors = OpenTelemetry.propagation.http_extractors
  context = OpenTelemetry.propagation.extract(headers, context: context, http_extractors: extractors)

  span = tracer.start_span('span-name', with_context: context)

  version = OpenTelemetry.correlations.value('client-version', context: context)
  tracer.with_span(span) do |span|
    case(version)
    when 'v1.0'
      data = fetch_data_from_service_b(Context.current)
    when 'v2.0'
      data = fetch_data_from_service_c(Context.current)
    end
  end

  send_response(data)
  span.finish
end
def fetch_data_from_service_b(context)
  headers = {}

  injectors = OpenTelemetry.propagation.http_injectors
  headers = OpenTelemetry.propagation.inject(headers, context: context, http_injectors: injectors)

  # make an http request
  uri = URI('https://site.com')
  http = Net::HTTP.new(uri.host, uri.port)
  http.get(('/', headers)
end

Additional Notes and Examples

Complex Modifications to Correlation Context

The CorrelationContext::Manager provides methods to make single modifications and multi-step modifications to correlation context as easy and as efficient as possible.

Single Line Modifications

context = OpenTelemetry.correlations.set_value('k1', 'v1)

Multi-line Modifications

# this creates a single Context instance with correlation context modified by 
# multiple operations

context = OpenTelemetry.correlations.build_context do |correlations|
  correlations.remove_value('k1')  #remove k1 from the implicit parent correlation context
  correlations.set_value('k2', 'v2')
  correlations.set_value('k3', 'v3') 
end

# which is easier and more efficient than the alternative, which creates three 
# context instances, each with a single modification of correlation context

ctx1 = OpenTelemetry.correlations.remove_value('k1')
ctx2 = OpenTelemetry.correlations.set_value('k2', 'v2', context: ctx1)
ctx3 = OpenTelemetry.correlations.set_value('k3', 'v3', context: ctx2)

Some may object to the use of the builder pattern in Ruby, but I think it makes for a better API. The alternative would be a method with kwargs and the following signature: build_context(values: {}, remove_keys: [], clear: false, context: Context.current). The method implementation and usage is not exactly straightforward, and with the place holder {} and [] there will be more object allocations than with the builder.

Giving the user a builder when they need it or single line modifications when they don't give users a good option for all scenarios.

Realistic Scenario Extracting and Setting Correlations

Below is a likely realistic scenario where context is extracted off the wire and correlations setup at a request boundary.

# extract context off the wire, includes parent span information and correlations from the caller  
extracted_context = OpenTelemetry.propagation.extract(headers)

# modify correlations from the inbound service
context = OpenTelemetry.correlations.build_context(context: extracted_context) do |correlations|
  correlations.remove_value('k1')  #remove k1 from the extracted correlation context
  correlations.set_value('k2', 'v2')
  correlations.set_value('k3', 'v3') 
end

# Set context as current
OpenTelemetry::Context.with_current(context) do
  # code that executes in the block will execute in this context
end

Context considerations

During this PR I've created two different context implementations. This commit shows the difference.

The initial implementation maintains a reference to a parent context and a hash of entries. New contexts duplicate their parent's entries and add their own additional entries. Since contexts are immutable, they are safe to share between threads without additional synchronization.

The second, and current implementation takes a linked list approach. Each a context has a parent, key and value. Keys are looked up by traversing the parent reference until one is found.

The hash based context is optimized for quick lookups, while the linked-list implementation is optimized (possibly) for more efficient memory usage. Both implementations expose the same API.

I've preemptively gone with the linked list implementation, but other implementations are still on the table. We should craft some benchmarks to determine which implementation is going to be best for most use cases. When benchmarking we should look at deeply nested traces, moderately nested, and shallow traces to get a sense of the performance characteristics.

Next steps

This PR is pretty sizable. There are a few reasons for this. One, is that it the implementation was following an in progress OTEP. Another reason, is that context propagation touches a large surface area, so some of this is inevitable. Nevertheless, there is a lot to digest in this PR and we should figure out the best way to evaluate and integrate this work. I expect I'll probably do a walk through at a future SIG meeting. I'm interested in any feedback people have about the approach and any suggestions to improve it.

@mwear mwear force-pushed the context_prop_combined branch from b1e5db3 to eef67cb Compare November 13, 2019 02:58
@mwear mwear marked this pull request as ready for review December 10, 2019 04:39
@mwear mwear changed the title Context prop combined [Draft: Not Ready for Review] - Context prop combined Dec 12, 2019
@mwear
Copy link
Member Author

mwear commented Dec 12, 2019

👋 Unfortunately I cannot put this back into (GH doesn't have this ability yet), but there are going to be some fairly significant changes to this work, so I wouldn't put any effort into reviewing this at the moment. I'll update the title and let folks know when it's really ready. Sorry for the false alarm.

@mwear mwear changed the title [Draft: Not Ready for Review] - Context prop combined Context prop combined Dec 20, 2019
@mwear
Copy link
Member Author

mwear commented Dec 20, 2019

I'm putting this back into a ready for review state. I expect there to be some suggestions, changes, and improvements. I think it's in a reasonable state to start getting some 👀 on it.

@mwear mwear force-pushed the context_prop_combined branch 2 times, most recently from 4ce543c to 7d065f2 Compare December 25, 2019 19:03
# The Propagation class provides methods to inject and extract context
# to pass across process boundaries
class Propagation
HTTP_TRACE_CONTEXT_EXTRACTOR = Trace::Propagation::HttpTraceContextExtractor.new

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small twist on this design from the otep was that propagators were modularized and returned their extractor/injector when invoked

  bagExtract, bagInject = Correlations::HTTPPropagator()
  traceExtract, traceInject = Tracer::B3Propagator()

It might read a little more elegantly if we don't have a single module for everything propagation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely needs some clean up. I have some ideas.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a little better now.

@@ -4,28 +4,146 @@
#
# SPDX-License-Identifier: Apache-2.0

require 'opentelemetry/context/propagation'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still trying to wrap my head around this whole thing but I'm a little surprised to see this dep because the otep author seems to indicate

The Propagator API knows about the Context API, but the Context API does not need to know about Propagation.

https://github.com/open-telemetry/oteps/pull/66/files#r359638018

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Context API is actually separate. That require is for namespacing reasons, so that all propagation code nests under OpenTelemetry::Context::Propagation. Since OpenTelemetry::Context doesn't actually depend on anything in OpenTelemetry::Context::Propagation, we could consider moving propagation up to the top level, OpenTelemetry::Propagation. However, propagation does depend on context, so there is a relationship there.

api/lib/opentelemetry/context.rb Outdated Show resolved Hide resolved
api/lib/opentelemetry/context.rb Show resolved Hide resolved
api/lib/opentelemetry/context.rb Show resolved Hide resolved

ctx_to_attach ||= @parent || ROOT
ctx_to_attach.attach
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't read too far into this PR yet, so I may be missing some context (ha!) here. The design of this class makes me uncomfortable.

  • detach(prev) seems unnecessary when we can just call prev.attach
  • we have 2 forms of nesting: the implicit linked list of single-valued Contexts, and the current and previously attached Contexts on the Fiber's stack, and the 2 forms are not necessarily related
  • single-valued Contexts can result in havoc when set_values(hash) is combined with detach(nil), where detach(nil) will attach the Context without the last value in hash rather than the receiver of set_values(hash) (with_values helps with this, but exposing both methods makes it easier for people to shoot themselves in the foot)
  • the linked list of single-valued Contexts makes lookup proportional to the number of values set in a codepath, which is potentially a lot more than the count of active values (a value can be set for a key many times, and each time increases the cost to access values for other keys set earlier)
  • exposing both block-structured contexts and explicit attachment is potentially very error prone, based on my experience with a similar mechanism in Shopify's tracing gem (I strongly prefer only supporting block-structured contexts).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all valid concerns and I share most of them.

I don't think that attach and detach should be exposed if we can avoid it. I'll double check and see, but we might be able to remove them. Other options would be to put an @api private comment, or if we must expose them for edge cases, we can put a nasty gram in the comment to use at your own risk. They were inspired by this prior art: https://github.com/grpc/grpc-java/blob/master/context/src/main/java/io/grpc/Context.java#L405-L449.

I borrowed the single valued contexts idea from go. Take a look at line 480 until the end of the file: https://golang.org/src/context/context.go. You can also take a look at the hash based context I had earlier in this PR: a226469. Let me know if you'd be more comfortable with one over the other.

Copy link
Member Author

@mwear mwear Jan 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and removed Context#attach and Context#detach. They aren't necessary for the block form, and if we find they become necessary, we can bring them back later with caveats and warnings.

api/lib/opentelemetry/context/propagation.rb Show resolved Hide resolved

def span_context_from(context)
context[ContextKeys.current_span_key]&.context ||
context[ContextKeys.extracted_span_context_key]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way too much context here, and yet I still feel like I'm missing context 😞

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context behind this is the same as: #147 (comment)

api/lib/opentelemetry/trace/tracer.rb Show resolved Hide resolved
#
# @return [Span]
def start_span(name, with_parent: nil, with_parent_context: nil, attributes: nil, links: nil, start_timestamp: nil, kind: nil, sampling_hint: nil)
span_context = with_parent&.context || with_parent_context || current_span.context
def start_span(name, with_parent: nil, with_parent_context: nil, with_context: nil, attributes: nil, links: nil, start_timestamp: nil, kind: nil, sampling_hint: nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new parameter name is really confusing. We now have 3 parameters to specify the parent span context, and they're confusingly similarly named while accepting completely different argument types:

  • with_parent: Span
  • with_parent_context: SpanContext
  • with_context: Context

The latter expands to 5 different options:

  • implicit context current span's context
  • implicit context extracted span context
  • explicit context current span's context
  • explicit context extracted span context
  • invalid context

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably get rid of with_parent_context. I don't see a use case where you'd have a span context without a Context or a Span any longer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got rid of with_context and repurposed with_parent_context to take a Context instead of a SpanContext.

@mwear
Copy link
Member Author

mwear commented Jan 17, 2020

I've addressed everything except the single-valued contexts question as I think that is still up to debate. This is ready for more feedback when folks have the time.

@mwear mwear requested a review from fbogsany January 17, 2020 00:07
# @return [Context]
def remove_value(key, context: Context.current)
correlations = correlations_for(context)
return context unless correlations.key?(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still wrapping my head around how this all hangs together... Does the early return of the existing context (instead of always returning a new context) create any surprises for the caller?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that Contexts are immutable this should not be a surprise and it should be a safe optimization.

self.current = ctx
yield_value.nil? ? blk.call : blk.call(yield_value)
ensure
self.current = prev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is slightly less efficient than its predecessor with. The old method accessed the fiber-local variable once (returning a hash), and then manipulated the hash. The new version has the fiber-local variable access, and two fiber-local variable sets. I'm unsure of the added overhead, but it will add up as uses of context correlations and spans increase.

I'm not sure there's a clean and efficient alternative - really just raising the concern at this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I returned to the previous implementation that had Context#attach and Context#detach. I just put an @api private tag on them.

module Context
extend self
# Manages context on a per-thread basis
class Context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking some more about this, the Context API from the OTEP differs in an important way from this implementation: it recommends a CreateKey(name) -> key function that returns a key for a name that is subsequently used to get, set and remove values.

I think the addition of this abstraction might allow a more efficient underlying implementation.

The OTEP describes two cross-cutting concerns (Observability and Correlations), while the Context API is generic enough to support additional concerns outside of the two specified. I think there is value in leveraging knowledge of the two specified concerns to improve lookup efficiency. For example, giving Context three fields:

@observability = nil
@correlations = nil
@extensions = {}.freeze

and copying the fields in attach rather than creating a linked list.

My assumption here is that we can define a private class that contains all the relevant context for Observability and likewise for Correlations. For Correlations, that may well be just a frozen Hash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Context object itself should be a general purpose context object that is not tied to observability or even OpenTelemetry. There has been some talk of extracting Context into a separate package at some point in the future. Whether or not that will happen remains to be seen, but we should design with that goal in mind.

i went ahead and reverted back to my original frozen hash design to alleviate concerns over the linked list approach. I also implemented Context#create_key and am using Context::Keys for indexing entries. I think this gives us what we need and keeps the context decoupled from observability.

mwear added 27 commits February 6, 2020 13:22
This is subject to change, but we'll use the key as it's spec'd today.
We should still do some benchmarking, but I think this is going to be a
better implementation, so I'm preemptively using it for the purpose of
the initial PR.
This commit improves the ergonomics of Propagation#inject and
Propagation#extract by providing defaults arguments where possible.
For both methods, carrier is the only required argument. Context
defaults to Context.current and the http_injectors / extractors
default to those registered globally.
…Context

This commit adds a builder to faciliate making multiple modifications to
CorrelationContext without creating multiple, intermediary contexts.
Manager#build_context should be used when making multiple modifications
to the correlation context. When making a single modification, all other
methods should be used. Some may object to the builder and might recommend
a single method with kwargs, but I think this makes for a more fluent API.
This commit replaces with_context with with_parent_context and removes
with_context.
This reverts commit 88138ea66627f7d3153c4d668ed228a481f24e05.
There are perf concerns about the linked list approach.
@mwear mwear force-pushed the context_prop_combined branch from 6711e69 to 805dfc7 Compare February 6, 2020 21:23
@mwear
Copy link
Member Author

mwear commented Feb 6, 2020

Thanks for your reviews. As mentioned during our meeting, this is just a starting point and we'll continue to refine this work over time.

@mwear mwear merged commit c95f7cd into open-telemetry:master Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants