Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of async resources #1533

Closed
dyladan opened this issue Sep 14, 2020 · 14 comments
Closed

Handling of async resources #1533

dyladan opened this issue Sep 14, 2020 · 14 comments
Labels
Discussion Issue or PR that needs/is extended discussion.

Comments

@dyladan
Copy link
Member

dyladan commented Sep 14, 2020

I am creating this issue in the JS repo so we can discuss, but I believe this may also require a spec change.

In JS, in some cases it is impossible to gather resources synchronously. For instance in GCP, it is required to make a call to a remote endpoint, which cannot be done synchronously. For this reason, our resource detectors are asynchronous. Currently, the resource is provided to the TracerProvider at construction time, which means resource detection must complete before a TracerProvider is created. This means that program startup must be made asynchronous for anyone that wants resource detection.

There is currently a PR to solve this by making the resource a Promise, which is awaited before sending the span to the export pipeline. This solution allows SDK creation to be synchronous, but introduces other problems:

  1. The export pipeline will now be asynchronous in nature. Currently, after calling span.end, you can be sure the span has been sent to the span processor's onEnd method. After this PR, calling span.end() will mean that the span will be sent to the span processor's onEnd method some time in the future.
  2. 2 promises are now created on each span, which is non-trivial overhead, because the span must wait for the resource to resolve before calling the span processor's onStart or onEnd methods.

There have been several suggestions for ways to "fix" this issue, but each of them has their own problems

  1. Detect resources before constructing tracer provider (this is the current situation)
    • Spans can't be created before resources resolve
    • SDK initialization is asynchronous if you need to use async resource detection
  2. Resource is a promise in provider, tracer, and span. Resource is resolved before span is sent to span processor (this is the open PR)
    • Export is now async. Calling span.end, you cannot immediately forceFlush and assume the span you just ended will be included in the export.
  3. Attach resource to span as a Promise which is resolved by the exporter
    • ReadableSpan interface now contains a promise to resource instead of a real resource
    • Exporter now needs to await this promise on every export (overhead), or save the first exported resource and use it for every export (violates spec because exporter would not be able to be used with multiple tracer providers with different resources)
  4. Move resource detection to the exporter, making resources per-exporter instead of per-tracer-provider
    • This violates spec (resource is a member of tracer provider, resource is a member of readable span) and prevents the exporter from being shared by multiple tracer providers with differing resources
  5. Make resource global and include it as a top-level API. On first export, the exporter resolves the global resource and saves it. On subsequent exports, the same resource is used.
    • Only a global resource is possible
    • Violates spec for the same reasons as (3)
  6. Add an async method to the tracer provider to resolve resources which can only be called once.
    • What happens if spans are created before resources are resolved?

I would prefer solution 5. To me, it is the most straightforward way to ensure resources are included in the export without making the export pipeline async and allowing SDK creation to be sync. Unfortunately, this would violate spec and we would need to request a spec change.

/cc @mwear and @obecny because the async SDK startup issue was originally raised by lightstep

/cc @mayurkale22 as the other maintainer

/cc @open-telemetry/technical-committee @open-telemetry/specs-approvers for any guidance here on what we may be able to achieve within limits of spec, what spec may need to be updated/changed, or if we may be allowed to implement one of the solutions which I identified above as potentially spec-noncompliant.

@dyladan dyladan added the Discussion Issue or PR that needs/is extended discussion. label Sep 14, 2020
@Flarna
Copy link
Member

Flarna commented Sep 14, 2020

Would be interesting how other techs have solved this problem. I assume GCP resource detection via HTTP call is not JS specific.
Sure, other techs can do outgoing calls sync but on the other hand it's a race with spans created by other threads. I guess blocking all Java threads till resource detection is finished is also no option.

I agree with @dyladan that resources and spans should be decoupled. Each export message needs resource, not each span or each metric.

@dyladan
Copy link
Member Author

dyladan commented Sep 14, 2020

Would be interesting how other techs have solved this problem. I assume GCP resource detection via HTTP call is not JS specific.
Sure, other techs can do outgoing calls sync but on the other hand it's a race with spans created by other threads. I guess blocking all Java threads till resource detection is finished is also no option.

Asking this question in the maintainers gitter channel

@obecny
Copy link
Member

obecny commented Sep 15, 2020

My 3 cents:

Imho exporter should not care about resolving anything - whatever comes it will be send to the backend and nothing more. The point 5 should be changed to
Make resource global and include it as a top-level API. On first export, the ***SPAN PROCESSOR*** resolves the glo...

@dyladan
Copy link
Member Author

dyladan commented Sep 15, 2020

Making the span processor await the promise reintroduces the same problem that the export pipeline is now async. Calling span.end then immediately exporter.forceFlush will not work.

@dyladan
Copy link
Member Author

dyladan commented Sep 15, 2020

Recommended by @trask in the maintainers channel:

Make the Resource a lazy wrapper around deferred resources and make Resource.getAttributes async.

@Flarna
Copy link
Member

Flarna commented Sep 15, 2020

Making the span processor await the promise reintroduces the same problem that the export pipeline is now async. Calling span.end then immediately exporter.forceFlush will not work.

forceFlush is on SpanProcessor. The spans arrive sync there and it can await the resource for calling export.

@obecny
Copy link
Member

obecny commented Sep 15, 2020

Recommended by @trask in the maintainers channel:

Make the Resource a lazy wrapper around deferred resources and make Resource.getAttributes async.

But it doesn't resolve any of our problem. It doesn't matter what will be the interface of resource. If the resource is async it needs to be resolved somewhere somehow at some point. Attaching resources at the very beginning to the span is imho wrong then. I would say that the span should be possible to serialize at any moment. Spec requires us to attach resource to every exported span, how we do this it is up to us. Having said that the most natural place for me for resolving the async resources is the span processor. Moreover keeping this logic in span processor might even give people ability to update resources periodically or on demand adding its own logic to it - if they want that for any reason. Imagine the situation when resource is used to read the version of certain service. The service might get update and the node will be running for 1 year without any restart. What would be more valuable reporting version of this service for the whole year the same or updating the version. Now if we move the resources to the span processors someone can implement an easy mechanism for refreshing resources every 5 minutes and then simply attaching the new resolved resources into spans. I'm not even talking about situation when certain resource is not available and then become available again in 5 minutes. This all circumstances leads me to conclusion that the resources should be refreshed and kept in processors as it feels the best place there, whereas exporter doesn't need to really care about all the things except converting and sending data.

@Flarna
Copy link
Member

Flarna commented Sep 15, 2020

@obecny Resource is imutable according to spec. Therefore your usecase of the service update doesn't apply to resource. Such information needs to be transferred by some other mechanism.

In general the resource is needed together with the spans as they identify the thing monitored. a span alone has just a traceId/spanId, only the resource give the context to which thing a span belongs.

@rauno56
Copy link
Member

rauno56 commented Jun 10, 2021

Resource is immutable according to spec

@Flarna, do you mean that if a span starts off with resource unset, then, according to the spec, it cannot be added at the exporter?

@Flarna
Copy link
Member

Flarna commented Jun 10, 2021

@rauno56 Yes, the exporter uses the Resource "captured" by span at start. If a resource element is added/changed later it's not updated for in progress spans.

Besides that backends may assume that a specific set of resources is a unique description of a process. So they may cache it somehow or use it as a key in a map of processes.

@rauno56
Copy link
Member

rauno56 commented Jun 10, 2021

I guess I lack certain knowledge about the topic to continue the argument:

  1. Can resource description change during a lifetime of a process(form @obecny, it seemed "yes", you indicate "no")?
  2. Is it absolutely required for a span to know its resource description fully at the time of its start?
  3. Does "resource is immutable" mean
    a) resource attribute on a span is immutable; or
    b) SDK has to decide the process-wide Resource for once for the whole process lifetime and stick to it?

If a resource element is added/changed later it's not updated for in progress spans.

As a wild hypothetical, I know it might be far fetched(so please tell me if I'm totally outside of the ballpark) depending on the answers to the above questions one could imagine we have an API for the instrumentations to ask for the "current immutable resource"(which we could replace as the resource changes or new things are discovered) or have that internal and only attach it to otherwise resourceless spans during export.

@vmarchaud
Copy link
Member

@rauno56 We've add a discussion about this and explained few things there: https://cloud-native.slack.com/archives/C01NL1GRPQR/p1623080369097000

@Flarna
Copy link
Member

Flarna commented Jun 10, 2021

Currently the Resource class is implemented immutable. If you merge two Resource instances a new one is returned but the two originals are unmodified (similar as Context).

  • Span#resource is readonly and initialized with Tracer#resource at creation
  • Tracer#resource is readonly and initialized with TracerProvider#resource at creation
  • TracerProvider#resource is readonly and initialized at creation time

So no chance to update spans.

Technically you could recreate your TracerProvider and all your Tracers to get a new Resource into them but I guess the only place where this makes sense are tests.

Clearly this could be changed but that's a spec topic.

@pichlermarc
Copy link
Member

I think this is resolved by #3460 🙂
Feel free to re-open this if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Issue or PR that needs/is extended discussion.
Projects
None yet
Development

No branches or pull requests

6 participants