Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(events): propagate context vars to listeners #2324

Conversation

RobertRosca
Copy link
Contributor

@RobertRosca RobertRosca commented Sep 20, 2023

Pull Request Checklist

  • New code has 100% test coverage
  • (If applicable) The prose documentation has been updated to reflect the changes introduced by this PR
  • (If applicable) The reference documentation has been updated to reflect the changes introduced by this PR

Description

Request/Correlation ID propagation is super useful for logging and debugging, especially when using an event-based system. I've previously used an ASGI Correlation ID middleware to attach a unique ID to each request, but the correlation IDs are lost at runtime for event listener functions.

Looked through the Litestar code for emitters the context is lost because of the behaviour of AnyIO task groups. The AnyIO docs say:

Whenever a new task is spawned, context will be copied to the new task. It is important to note which context will be copied to the newly spawned task. It is not the context of the task group’s host task that will be copied, but the context of the task that calls TaskGroup.start() or TaskGroup.start_soon().

SimpleEventEmitter._worker is what calls start_soon, and since this runs outside of the context of the endpoint/middlewares the context is lost.

A simple way to work around this is to include copy of the context in the event tuple, and explicitly using that context to run start_soon. This works fine and preserves all context vars (including the correlation IDs) when the listeners are called.

As an example here are three log entries (request, response, and a log message sent in the body of a listener function) from testing this change:

image

The correlation ID is the hash in between the log level and the module, which gets set when the initial request is recieved and attached throughout all log messages afterwards. Without this change the ID was missing for the health check handler log entry, but with the changes in this PR it works again.

Since it was a small change (and looking at code is easier than talking about it 😛 ) I implemented it and added a test case to check that the context is propagated to listeners.

What do people think?

@RobertRosca RobertRosca requested review from a team as code owners September 20, 2023 14:09
Copy link
Member

@provinzkraut provinzkraut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting addition!

@peterschutt
Copy link
Contributor

What do people think?

No objection from me. My question is whether we should consider it a breaking change and whether we should feature-flag it somehow.

@RobertRosca
Copy link
Contributor Author

RobertRosca commented Sep 21, 2023

I don't think this would break any existing user code since emitters are relatively new (added in v2 in August), context vars are not used that commonly, and IMO the chance of somebody wanting to use them and understanding they're not propagated to listeners and expecting that behaviour is very low. But I might be biased 😛

Thinking about this more, even if it isn't a breaking change, it can definitely cause problems in the future once there are more native or third party emitter backends. If you switch to another event emitter backend which does not propagate context, and you're using some third party package/middleware that requires propagation, then it will silently not work with no indication of what the problem is.

For example, going back to the scenario I ran into: you're using a third party CorrelationID middleware which adds correlation IDs via context variables and the default SimpleEventEmitter which (with the changes in this PR) propagates context, so the logging works fine.

You need to scale up or do something more complex and switch the emitter backend to something else like a KafkaEventEmitter, the correlation IDs added by the middleware are 'lost' and it could be very confusing to figure out why this happened or how to fix it if you don't know how the correlation ID middleware and context variables work.

If there was a way to ask the emitter backend if it propagates context or not, then the middleware could check this and log a warning or raise an error if it doesn't, and the user would know what the problem is and how to fix it.

Maybe an abstract property on the emitter backend class:

class BaseEventEmitterBackend(AsyncContextManager["BaseEventEmitterBackend"], ABC):
    @property
    @abstractmethod
    def propagate_context(self) -> bool:
        raise NotImplementedError("propagate context flag must be set by emitter backend implementation")
    ...


class SimpleEventEmitter(BaseEventEmitterBackend):
    propagate_context = True
    ...

That way, at app startup, anything that requires context propagation can check app.event_emitter.propagate_context and act differently if propagation is not supported by the emitter backend. This could also allow configuration, both for the whole backend (fully disable propagation if not required) or just on single emit calls (request.app.emit("foo", propagate_context=True/False)).

I guess this could also be done by using a mixin class that just marks that the emitter backend supports context propagation, then middlewares could check if the emitter backend is a subclass of the mixin class and log a warning or raise an error at app startup if it isn't. But this then can't be configured by the user.

@provinzkraut provinzkraut enabled auto-merge (squash) September 23, 2023 08:57
@peterschutt
Copy link
Contributor

I don't think this would break any existing user code since emitters are relatively new (added in v2 in August), context vars are not used that commonly, and IMO the chance of somebody wanting to use them and understanding they're not propagated to listeners and expecting that behaviour is very low. But I might be biased 😛

You are probably right that with low adoption we would fly under the radar. I do use contextvars quite a bit via structlog's bind_contextvars. Using this as an example, if one were emitting a log event from the handler, and another log event from the listener, I believe this change would mean those two log events would share bound context, which may not be desirable.

@provinzkraut
Copy link
Member

@peterschutt Sounds like this can't be merged because it's breaking?

@RobertRosca I think you're on the right track with making this configurable. I'm not sure if making this config available on every single call is a good idea though as it would be quite confusing having this option available on backends that don't even support it.
One flag for the whole backend seems to be good enough.

@peterschutt
Copy link
Contributor

@peterschutt Sounds like this can't be merged because it's breaking?

I like the idea of being able to correlate things done by listeners to the requests that caused them. I just don't know if blanket sharing of context between the request and the listener invocation is desirable.

Could we have it configurable that select context is passed?

>>> from contextvars import *
>>> var: ContextVar[int] = ContextVar('var', default=42)
>>> var.set(43)
<Token var=<ContextVar name='var' default=42 at 0x7f6d6d66e6d0> at 0x7f6d6d675080>
>>> var.get()
43
>>> new_ctx = Context()
>>> new_ctx.run(var.set, var.get())
<Token var=<ContextVar name='var' default=42 at 0x7f6d6d66e6d0> at 0x7f6d6d6752c0>
>>> new_ctx.run(var.get)
43

auto-merge was automatically disabled November 3, 2023 16:46

Merge queue setting changed

@JacobCoffee
Copy link
Member

What's the latest on this @RobertRosca @peterschutt ?

@peterschutt
Copy link
Contributor

What's the latest on this @RobertRosca @peterschutt ?

Nothing further from me since my last comments.

@RobertRosca RobertRosca force-pushed the feat/events/context-propagation branch from 888cf9d to af7bf45 Compare November 20, 2023 08:50
update type checking imports, explicitly cast partial function for mypy
Copy link

Documentation preview will be available shortly at https://litestar-org.github.io/litestar-docs-preview/2324

@RobertRosca
Copy link
Contributor Author

Ah sorry, was busy with work and didn't follow up on this. I rebased the branch from main and fixed the remaining issues with linting/type checks.

From the discussions above and after looking at how this is handled in related cases I'd:

  1. Add a property to the BaseEventEmitterBackend which defines the default propagate context configuration but also if context propagation is available, e.g. propagate_context: bool | None = None. If set to None then propagation is not available at all.

  2. Add option at the app/app config level, e.g. Litestar(..., event_emitter_propagate_contextvars: bool | None) (or event_emitter_backend_propagate_contextvars?).

    If set to None it doesn't change the default, if set to True and the event emitter backend does not support propagation then it can raise an exception saying that the event emitter backend implementation does not support contextvar propagation, event_emitter_propagate_contextvars=False can be set, with a link to some relevant docs page.

  3. Add section to docs on other cases where finer-grained control is needed showing examples of how to have contextvar propagation enabled in general, but not propagate it for a specific emit call, e.g. expand on:

    import contextvars
    
    @get("/")
    def route_handler(request: Request[Any, Any, Any]) -> None:
        # To propagate entire context
        request.app.emit("event")
    
        # To not propagate context:
        ctx = contextvars.Context()  # create new empty context
        ctx.run(request.app.emit, "event")  # runs in empty context, propagates nothing

Which would (I think?) address the remaining points.

@RobertRosca
Copy link
Contributor Author

I did have a look around at how other projects handle this, found a few semi-related discussions/use cases:

  • Here's a discussion on python forums about back-propagation of contextvar modifications. Not completely related, but it does some related topics.
  • Flask uses contextvars for a lot of internals to manage request-scoped data, as far as I can tell from the documentation it sounds like contextvars are always propagated.
  • Django mentions contextvars here which says: "Threadlocals and contextvars values are preserved across the boundary in both directions.". So there's no configuration for propagation, it just always happens.
  • Recent change to asyncio.create_task allowing for create_task(..., context=ctx)

Which still just leaves this current approach of implicitly copying the current context and running the relevant functions with it, or of explicitly passing a context as a keyword argument (e.g. app.emit/EventEmitter.emit accepting an optional kwarg context which is set in the Message if provided).

Explicitly passing the context is flexible but also more complex, and since there's always the trick of doing ctx.run(request.app.emit, ...) with an empty context when you want propagation enabled with an way to 'disable' it in some cases, I think the current approach is fine.

Also since, as far as I could tell, both Django and Flask copy the whole context (albeit in different use cases) it seems like a reasonable approach as the default.

provinzkraut and others added 2 commits December 3, 2023 12:46
* wip

Signed-off-by: Janek Nouvertné <[email protected]>

* some debugging

Signed-off-by: Janek Nouvertné <[email protected]>

* formatting

Signed-off-by: Janek Nouvertné <[email protected]>

* use a separate connection to publish/listen

Signed-off-by: Janek Nouvertné <[email protected]>

* reintroduce flaky

Signed-off-by: Janek Nouvertné <[email protected]>

* Fix typing

Signed-off-by: Janek Nouvertné <[email protected]>

* Add psycopg backend

Signed-off-by: Janek Nouvertné <[email protected]>

* Fix backend issues

Signed-off-by: Janek Nouvertné <[email protected]>

* Undo test debugging changes

Signed-off-by: Janek Nouvertné <[email protected]>

* mark groups

Signed-off-by: Janek Nouvertné <[email protected]>

* Ensure channel names ar quoted

Signed-off-by: Janek Nouvertné <[email protected]>

* sleep debugging

Signed-off-by: Janek Nouvertné <[email protected]>

* update docs

Signed-off-by: Janek Nouvertné <[email protected]>

* Add missing test

Signed-off-by: Janek Nouvertné <[email protected]>

* Fix docs link

Signed-off-by: Janek Nouvertné <[email protected]>

* Add missing listener test

Signed-off-by: Janek Nouvertné <[email protected]>

* Formatting

Signed-off-by: Janek Nouvertné <[email protected]>

* Fix test typing

Signed-off-by: Janek Nouvertné <[email protected]>

* Fix some coverage issue

Signed-off-by: Janek Nouvertné <[email protected]>

---------

Signed-off-by: Janek Nouvertné <[email protected]>
Co-authored-by: Cody Fincher <[email protected]>
Signed-off-by: Janek Nouvertné <[email protected]>
@JacobCoffee
Copy link
Member

What's the status here? Should we rebase on the latest and finish it out, close it, or?

@JacobCoffee
Copy link
Member

Sorry about this, but this should be reopened and pointed to main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants