Context prop combined #147

mwear · 2019-11-08T00:28:50Z

This is a context propagation prototype based on open-telemetry/oteps#66.

Examples

All in one - Example

The example below is an all-in-one example that shows configuration, handling an HTTP request, and making a call to an external service. In the all-in-one example there is deeper nesting than you will typically see, due to the collocation of the code. In the real world, different components would handle different parts of the request in their own separate locations.

# Set global injector / extractors (accepts an array)
OpenTelemetry.propagation.http_injectors = [OpenTelemetry::Trace::Propagation.http_trace_context_injector, OpenTelemetry::CorrelationContext::Propagation.http_injector]
OpenTelemetry.propagation.http_extractors = [OpenTelemetry::Trace::Propagation.rack_http_trace_context_extractor, OpenTelemetry::CorrelationContext::Propagation.rack_http_extractor]
OpenTelemetry.correlations = OpenTelemetry::SDK::CorrelationContext::Manager.new
tracer = OpenTelemetry.tracer_factory.tracer('myapp', '1.0.0')

# One time setup above ^^^^, subsequent usage below:

# Extract context from an inbound request
extracted_context = OpenTelemetry.propagation.extract(rack_env)

# Add correlations to the extracted context
context = OpenTelemetry.correlations.set_value('k1', 'v1', context: extracted_context)

# Set current context
OpenTelemetry::Context.with_current(context) do
  # SpanContext from the extracted context will be an implicit parent for `service-span`
  tracer.in_span('service-span') do
    tracer.in_span('external-req-span') do
      headers = {'Accept' => 'application/json'}
      # Inject context into an outbound request
      headers = OpenTelemetry.propagation.inject(headers)
      Net::HTTP.get(URI('http://site.com'), headers)
    end
  end
end

# Inject and Extract can take explicit context, injectors and extractors

context = OpenTelemetry.propagation.extract(carrier, context: some_context, http_extractors: [ex1, ex2, ex3])

headers = OpenTelemetry.propagation.inject(headers, context: some_context, http_injectors: [in1, in2, in3])

OTEP Examples

The following examples are Ruby versions of the examples in open-telemetry/oteps#66.

Global initialization

OpenTelemetry.propagation.http_injectors = [OpenTelemetry::Trace::Propagation.http_trace_context_injector, OpenTelemetry::CorrelationContext::Propagation.http_injector]
OpenTelemetry.propagation.http_extractors = [OpenTelemetry::Trace::Propagation.rack_http_trace_context_extractor, OpenTelemetry::CorrelationContext::Propagation.rack_http_extractor]
OpenTelemetry.correlations = OpenTelemetry::SDK::CorrelationContext::Manager.new
tracer = OpenTelemetry.tracer_factory.tracer('myapp', '1.0.0')

Inject & Extract with Implicit Context

def handle_request(headers)
  context = OpenTelemetry.propagation.extract(headers)

  OpenTelemetry::Context.with_current(context) do
    tracer.in_span('span-name') do
      version = OpenTelemetry.correlations.value('client-version')

      case(version)
      when 'v1.0'
        data = fetch_data_from_service_b
      when 'v2.0'
        data = fetch_data_from_service_c
      end

      send_response(data)
    end
  end
end

def fetch_data_from_service_b
  headers = {}
  headers = OpenTelemetry.propagation.inject(headers)

  # make an http request
  uri = URI('http://site.com')
  http = Net::HTTP.new(uri.host, uri.port)
  http.get('/', headers)
end

Inject and Extract with Explicit Context

def handle_request(headers)
  extractors = OpenTelemetry.propagation.http_extractors
  # Pass the current context explicitly; `context` is not defined until this call returns
  context = OpenTelemetry.propagation.extract(headers, context: OpenTelemetry::Context.current, http_extractors: extractors)

  span = tracer.start_span('span-name', with_context: context)

  version = OpenTelemetry.correlations.value('client-version', context: context)
  # Declare data outside the block so it is still in scope for send_response
  data = nil
  tracer.with_span(span) do
    case(version)
    when 'v1.0'
      data = fetch_data_from_service_b(OpenTelemetry::Context.current)
    when 'v2.0'
      data = fetch_data_from_service_c(OpenTelemetry::Context.current)
    end
  end

  send_response(data)
  span.finish
end

def fetch_data_from_service_b(context)
  headers = {}

  injectors = OpenTelemetry.propagation.http_injectors
  headers = OpenTelemetry.propagation.inject(headers, context: context, http_injectors: injectors)

  # make an http request
  uri = URI('https://site.com')
  http = Net::HTTP.new(uri.host, uri.port)
  http.get('/', headers)
end

Additional Notes and Examples

Complex Modifications to Correlation Context

The CorrelationContext::Manager provides methods to make single modifications and multi-step modifications to correlation context as easy and as efficient as possible.

Single Line Modifications

context = OpenTelemetry.correlations.set_value('k1', 'v1')

Multi-line Modifications

# this creates a single Context instance with correlation context modified by 
# multiple operations

context = OpenTelemetry.correlations.build_context do |correlations|
  correlations.remove_value('k1')  #remove k1 from the implicit parent correlation context
  correlations.set_value('k2', 'v2')
  correlations.set_value('k3', 'v3') 
end

# which is easier and more efficient than the alternative, which creates three 
# context instances, each with a single modification of correlation context

ctx1 = OpenTelemetry.correlations.remove_value('k1')
ctx2 = OpenTelemetry.correlations.set_value('k2', 'v2', context: ctx1)
ctx3 = OpenTelemetry.correlations.set_value('k3', 'v3', context: ctx2)

Some may object to the use of the builder pattern in Ruby, but I think it makes for a better API. The alternative would be a method with kwargs and the following signature: build_context(values: {}, remove_keys: [], clear: false, context: Context.current). The implementation and usage of that method are not exactly straightforward, and with the placeholder {} and [] there will be more object allocations than with the builder.

Giving users a builder when they need it, and single-line modifications when they don't, gives them a good option for all scenarios.
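To make the trade-off concrete, here is a minimal sketch of how a builder could batch several changes into one result. SketchBuilder and build_correlations are hypothetical names used only for illustration, and a plain frozen Hash stands in for the correlation context; this is not the code in this PR.

```ruby
# Hypothetical sketch: a builder that batches edits to a frozen Hash.
# SketchBuilder and build_correlations are illustrative names, not this PR's API.
class SketchBuilder
  attr_reader :entries

  def initialize(entries)
    # Work on one private, mutable copy of the parent's entries.
    @entries = entries.dup
  end

  def set_value(key, value)
    @entries[key] = value
  end

  def remove_value(key)
    @entries.delete(key)
  end

  def clear
    @entries.clear
  end
end

# Yields a builder, then freezes the result: one new Hash per call,
# no matter how many modifications the block makes.
def build_correlations(parent = {}.freeze)
  builder = SketchBuilder.new(parent)
  yield builder
  builder.entries.freeze
end

parent = { 'k1' => 'v1' }.freeze
result = build_correlations(parent) do |correlations|
  correlations.remove_value('k1')
  correlations.set_value('k2', 'v2')
  correlations.set_value('k3', 'v3')
end
```

The parent Hash is never mutated, and only the final frozen Hash is allocated for the three modifications, which is the property the builder is meant to provide.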

Realistic Scenario Extracting and Setting Correlations

Below is a realistic scenario in which context is extracted off the wire and correlations are set up at a request boundary.

# extract context off the wire, includes parent span information and correlations from the caller  
extracted_context = OpenTelemetry.propagation.extract(headers)

# modify correlations from the inbound service
context = OpenTelemetry.correlations.build_context(context: extracted_context) do |correlations|
  correlations.remove_value('k1')  #remove k1 from the extracted correlation context
  correlations.set_value('k2', 'v2')
  correlations.set_value('k3', 'v3') 
end

# Set context as current
OpenTelemetry::Context.with_current(context) do
  # code that executes in the block will execute in this context
end

Context considerations

During this PR I've created two different context implementations. This commit shows the difference.

The initial implementation maintains a reference to a parent context and a hash of entries. New contexts duplicate their parent's entries and add their own additional entries. Since contexts are immutable, they are safe to share between threads without additional synchronization.

The second, and current, implementation takes a linked list approach. Each context has a parent, a key, and a value. Keys are looked up by traversing the parent references until a match is found.

The hash based context is optimized for quick lookups, while the linked-list implementation is optimized (possibly) for more efficient memory usage. Both implementations expose the same API.
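As a rough illustration of the two designs, here is a simplified sketch. HashContext and LinkedContext are hypothetical names, and both classes are stripped down to value and set_value; the real implementations in this PR differ.

```ruby
# Hypothetical sketch of the two designs; class names are illustrative.

# Hash-based: every context holds a frozen copy of all entries.
class HashContext
  def initialize(entries = {})
    @entries = entries.dup.freeze
  end

  # Lookup is a single Hash access.
  def value(key)
    @entries[key]
  end

  # Each set_value copies the parent's entries into a new Hash.
  def set_value(key, value)
    HashContext.new(@entries.merge(key => value))
  end
end

# Linked list: every context holds one key/value pair and a parent reference.
class LinkedContext
  def initialize(parent = nil, key = nil, value = nil)
    @parent_node = parent
    @node_key = key
    @node_value = value
    freeze
  end

  # Lookup walks toward the root until the key is found.
  def value(key)
    node = self
    # The root node has no parent and stores nothing.
    while node.parent_node
      return node.node_value if node.node_key == key

      node = node.parent_node
    end
    nil
  end

  # Each set_value allocates one small node; nothing is copied.
  def set_value(key, value)
    LinkedContext.new(self, key, value)
  end

  protected

  attr_reader :parent_node, :node_key, :node_value
end

hash_ctx = HashContext.new.set_value('a', 1).set_value('b', 2)
list_ctx = LinkedContext.new.set_value('a', 1).set_value('b', 2).set_value('a', 3)
```

In the linked version, setting 'a' a second time adds another node, so a later lookup of 'b' has to step over it. That is the cost the benchmarks should measure.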

I've preemptively gone with the linked list implementation, but other implementations are still on the table. We should craft some benchmarks to determine which implementation is going to be best for most use cases. When benchmarking we should look at deeply nested traces, moderately nested, and shallow traces to get a sense of the performance characteristics.

Next steps

This PR is pretty sizable. There are a few reasons for this. One is that the implementation was following an in-progress OTEP. Another is that context propagation touches a large surface area, so some of this is inevitable. Nevertheless, there is a lot to digest in this PR and we should figure out the best way to evaluate and integrate this work. I expect I'll probably do a walkthrough at a future SIG meeting. I'm interested in any feedback people have about the approach and any suggestions to improve it.

mwear · 2019-12-12T01:13:36Z

👋 Unfortunately I cannot put this back into draft (GH doesn't have this ability yet), but there are going to be some fairly significant changes to this work, so I wouldn't put any effort into reviewing this at the moment. I'll update the title and let folks know when it's really ready. Sorry for the false alarm.

mwear · 2019-12-20T02:30:15Z

I'm putting this back into a ready for review state. I expect there to be some suggestions, changes, and improvements. I think it's in a reasonable state to start getting some 👀 on it.

benedictfischer09 · 2020-01-08T02:27:17Z

api/lib/opentelemetry/context/propagation/propagation.rb

+      # The Propagation class provides methods to inject and extract context
+      # to pass across process boundaries
+      class Propagation
+        HTTP_TRACE_CONTEXT_EXTRACTOR = Trace::Propagation::HttpTraceContextExtractor.new


A small twist on this design from the otep was that propagators were modularized and returned their extractor/injector when invoked

bagExtract, bagInject = Correlations::HTTPPropagator()
traceExtract, traceInject = Tracer::B3Propagator()

It might read a little more elegantly if we don't have a single module for everything propagation

This definitely needs some clean up. I have some ideas.

This should be a little better now.

benedictfischer09 · 2020-01-08T03:09:08Z

api/lib/opentelemetry/context.rb

@@ -4,28 +4,146 @@
 #
 # SPDX-License-Identifier: Apache-2.0

+require 'opentelemetry/context/propagation'


I'm still trying to wrap my head around this whole thing but I'm a little surprised to see this dep because the otep author seems to indicate

The Propagator API knows about the Context API, but the Context API does not need to know about Propagation.

https://github.com/open-telemetry/oteps/pull/66/files#r359638018

The Context API is actually separate. That require is for namespacing reasons, so that all propagation code nests under OpenTelemetry::Context::Propagation. Since OpenTelemetry::Context doesn't actually depend on anything in OpenTelemetry::Context::Propagation, we could consider moving propagation up to the top level, OpenTelemetry::Propagation. However, propagation does depend on context, so there is a relationship there.

api/lib/opentelemetry/context.rb

fbogsany · 2020-01-09T02:30:11Z

api/lib/opentelemetry/context.rb

+
+      ctx_to_attach ||= @parent || ROOT
+      ctx_to_attach.attach
+    end


I haven't read too far into this PR yet, so I may be missing some context (ha!) here. The design of this class makes me uncomfortable.

detach(prev) seems unnecessary when we can just call prev.attach

we have 2 forms of nesting: the implicit linked list of single-valued Contexts, and the current and previously attached Contexts on the Fiber's stack, and the 2 forms are not necessarily related

single-valued Contexts can result in havoc when set_values(hash) is combined with detach(nil), where detach(nil) will attach the Context without the last value in hash rather than the receiver of set_values(hash) (with_values helps with this, but exposing both methods makes it easier for people to shoot themselves in the foot)

the linked list of single-valued Contexts makes lookup proportional to the number of values set in a codepath, which is potentially a lot more than the count of active values (a value can be set for a key many times, and each time increases the cost to access values for other keys set earlier)

exposing both block-structured contexts and explicit attachment is potentially very error prone, based on my experience with a similar mechanism in Shopify's tracing gem (I strongly prefer only supporting block-structured contexts).

These are all valid concerns and I share most of them.

I don't think that attach and detach should be exposed if we can avoid it. I'll double check and see, but we might be able to remove them. Other options would be to put an @api private comment, or if we must expose them for edge cases, we can put a nasty gram in the comment to use at your own risk. They were inspired by this prior art: https://github.com/grpc/grpc-java/blob/master/context/src/main/java/io/grpc/Context.java#L405-L449.

I borrowed the single-valued contexts idea from Go. Take a look at line 480 until the end of the file: https://golang.org/src/context/context.go. You can also take a look at the hash-based context I had earlier in this PR: a226469. Let me know if you'd be more comfortable with one over the other.

I went ahead and removed Context#attach and Context#detach. They aren't necessary for the block form, and if we find they become necessary, we can bring them back later with caveats and warnings.

api/lib/opentelemetry/context/propagation.rb

fbogsany · 2020-01-09T04:25:17Z

api/lib/opentelemetry/trace/propagation/http_trace_context_injector.rb

+
+        def span_context_from(context)
+          context[ContextKeys.current_span_key]&.context ||
+            context[ContextKeys.extracted_span_context_key]


Way too much context here, and yet I still feel like I'm missing context 😞

The context behind this is the same as: #147 (comment)

api/lib/opentelemetry/trace/propagation/http_trace_context_extractor.rb

api/lib/opentelemetry/trace/tracer.rb

fbogsany · 2020-01-09T04:45:52Z

api/lib/opentelemetry/trace/tracer.rb

      #
      # @return [Span]
-      def start_span(name, with_parent: nil, with_parent_context: nil, attributes: nil, links: nil, start_timestamp: nil, kind: nil, sampling_hint: nil)
-        span_context = with_parent&.context || with_parent_context || current_span.context
+      def start_span(name, with_parent: nil, with_parent_context: nil, with_context: nil, attributes: nil, links: nil, start_timestamp: nil, kind: nil, sampling_hint: nil)


The new parameter name is really confusing. We now have 3 parameters to specify the parent span context, and they're confusingly similarly named while accepting completely different argument types:

with_parent: Span

with_parent_context: SpanContext

with_context: Context

The latter expands to 5 different options:

implicit context current span's context

implicit context extracted span context

explicit context current span's context

explicit context extracted span context

invalid context

We could probably get rid of with_parent_context. I don't see a use case where you'd have a span context without a Context or a Span any longer.

I got rid of with_context and repurposed with_parent_context to take a Context instead of a SpanContext.

api/lib/opentelemetry/correlation_context/propagation/context_keys.rb

mwear · 2020-01-17T00:07:46Z

I've addressed everything except the single-valued contexts question, as I think that is still up for debate. This is ready for more feedback when folks have the time.

api/lib/opentelemetry/context.rb

sdk/lib/opentelemetry/sdk/correlation_context/builder.rb

sdk/lib/opentelemetry/sdk/correlation_context/manager.rb

fbogsany · 2020-01-21T03:09:11Z

sdk/lib/opentelemetry/sdk/correlation_context/manager.rb

+        # @return [Context]
+        def remove_value(key, context: Context.current)
+          correlations = correlations_for(context)
+          return context unless correlations.key?(key)


I'm still wrapping my head around how this all hangs together... Does the early return of the existing context (instead of always returning a new context) create any surprises for the caller?

Given that Contexts are immutable this should not be a surprise and it should be a safe optimization.
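A small sketch of why the early return is safe, using a frozen Hash as a stand-in for an immutable Context. The free-standing remove_value below is hypothetical and only mirrors the shape of the method under review.

```ruby
# Hypothetical sketch: a frozen Hash stands in for an immutable context.
def remove_value(context, key)
  # Nothing to remove: hand back the same immutable object.
  return context unless context.key?(key)

  # Otherwise build a new frozen Hash without the key.
  context.reject { |k, _| k == key }.freeze
end

original = { 'k1' => 'v1' }.freeze
same = remove_value(original, 'missing')
changed = remove_value(original, 'k1')
```

Because neither object can be mutated, a caller cannot observe whether it received the original or an equal copy, other than through an identity check.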

sdk/test/opentelemetry/sdk/correlation_context/manager_test.rb

fbogsany · 2020-01-21T03:32:36Z

api/lib/opentelemetry/context.rb

+        self.current = ctx
+        yield_value.nil? ? blk.call : blk.call(yield_value)
+      ensure
+        self.current = prev


This method is slightly less efficient than its predecessor with. The old method accessed the fiber-local variable once (returning a hash), and then manipulated the hash. The new version has the fiber-local variable access, and two fiber-local variable sets. I'm unsure of the added overhead, but it will add up as uses of context correlations and spans increase.

I'm not sure there's a clean and efficient alternative - really just raising the concern at this point.
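For reference, the access pattern being discussed looks roughly like the sketch below. SketchContext and KEY are hypothetical names, and Thread.current[] is used because it is fiber-local in Ruby; this is an illustration, not the code in the diff.

```ruby
# Hypothetical sketch of a block-structured current context.
# SketchContext and KEY are illustrative names.
module SketchContext
  KEY = :__sketch_context__

  def self.current
    Thread.current[KEY]
  end

  def self.with_current(ctx)
    prev = Thread.current[KEY] # one fiber-local read
    Thread.current[KEY] = ctx  # first fiber-local write
    yield ctx
  ensure
    Thread.current[KEY] = prev # second write, runs even if the block raises
  end
end

seen = []
SketchContext.with_current(:outer) do
  seen << SketchContext.current
  begin
    SketchContext.with_current(:inner) { raise 'boom' }
  rescue RuntimeError
    # The ensure clause has already restored the outer context.
    seen << SketchContext.current
  end
end
```

The ensure clause is what makes the block form hard to misuse: the previous context is restored on every exit path.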

I returned to the previous implementation that had Context#attach and Context#detach. I just put an @api private tag on them.

api/lib/opentelemetry/correlation_context/propagation/http_extractor.rb

fbogsany · 2020-01-22T16:38:47Z

api/lib/opentelemetry/context.rb

-  module Context
-    extend self
+  # Manages context on a per-thread basis
+  class Context


Thinking some more about this, the Context API from the OTEP differs in an important way from this implementation: it recommends a CreateKey(name) -> key function that returns a key for a name that is subsequently used to get, set and remove values.

I think the addition of this abstraction might allow a more efficient underlying implementation.

The OTEP describes two cross-cutting concerns (Observability and Correlations), while the Context API is generic enough to support additional concerns outside of the two specified. I think there is value in leveraging knowledge of the two specified concerns to improve lookup efficiency. For example, giving Context three fields:

@observability = nil
@correlations = nil
@extensions = {}.freeze

and copying the fields in attach rather than creating a linked list.

My assumption here is that we can define a private class that contains all the relevant context for Observability and likewise for Correlations. For Correlations, that may well be just a frozen Hash.

The Context object itself should be a general purpose context object that is not tied to observability or even OpenTelemetry. There has been some talk of extracting Context into a separate package at some point in the future. Whether or not that will happen remains to be seen, but we should design with that goal in mind.

I went ahead and reverted to my original frozen hash design to alleviate concerns over the linked list approach. I also implemented Context#create_key and am using Context::Keys for indexing entries. I think this gives us what we need and keeps the context decoupled from observability.

This is subject to change, but we'll use the key as it's spec'd today.

We should still do some benchmarking, but I think this is going to be a better implementation, so I'm preemptively using it for the purpose of the initial PR.
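The frozen hash design with created keys can be pictured roughly as follows. KeyedContext is a hypothetical name, and treating two keys with the same name as distinct is an assumption of this sketch rather than something this PR confirms.

```ruby
# Hypothetical sketch; KeyedContext is an illustrative name, not the PR's class.
class KeyedContext
  # A key is an opaque object. In this sketch, two keys created with the
  # same name are distinct because they are compared by identity.
  class Key
    attr_reader :name

    def initialize(name)
      @name = name
      freeze
    end
  end

  def self.create_key(name)
    Key.new(name)
  end

  def initialize(entries = {})
    @entries = entries.dup.freeze
  end

  def value(key)
    @entries[key]
  end

  # Returns a new context; the receiver is never modified.
  def set_value(key, value)
    KeyedContext.new(@entries.merge(key => value))
  end
end

span_key = KeyedContext.create_key('current-span')
other_key = KeyedContext.create_key('current-span')
ctx = KeyedContext.new.set_value(span_key, 'span-1')
```

Only code that holds span_key can read or replace its entry, which keeps the context generic while letting each concern own its keys.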

This commit improves the ergonomics of Propagation#inject and Propagation#extract by providing default arguments where possible. For both methods, carrier is the only required argument. Context defaults to Context.current and the http_injectors / extractors default to those registered globally.

…Context This commit adds a builder to facilitate making multiple modifications to CorrelationContext without creating multiple, intermediary contexts. Manager#build_context should be used when making multiple modifications to the correlation context. When making a single modification, all other methods should be used. Some may object to the builder and might recommend a single method with kwargs, but I think this makes for a more fluent API.

…ctor

This commit replaces with_context with with_parent_context and removes with_context.

This reverts commit 88138ea66627f7d3153c4d668ed228a481f24e05.

There are perf concerns about the linked list approach.

…relations

mwear · 2020-02-06T21:32:34Z

Thanks for your reviews. As mentioned during our meeting, this is just a starting point and we'll continue to refine this work over time.

mwear mentioned this pull request Nov 8, 2019

Context propagation prototype #146

Closed

mwear force-pushed the context_prop_combined branch from b1e5db3 to eef67cb Compare November 13, 2019 02:58

mwear marked this pull request as ready for review December 10, 2019 04:39

mwear requested review from bai, dazuma, elskwid, fbogsany and luvtechno as code owners December 10, 2019 04:39

mwear changed the title ~~Context prop combined~~ [Draft: Not Ready for Review] - Context prop combined Dec 12, 2019

mwear changed the title ~~[Draft: Not Ready for Review] - Context prop combined~~ Context prop combined Dec 20, 2019

mwear force-pushed the context_prop_combined branch 2 times, most recently from 4ce543c to 7d065f2 Compare December 25, 2019 19:03

benedictfischer09 reviewed Jan 8, 2020


fbogsany reviewed Jan 9, 2020


mwear requested a review from fbogsany January 17, 2020 00:07