Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit Context Access #937

Closed
mitsuhiko opened this issue Apr 3, 2018 · 23 comments
Closed

Implicit Context Access #937

mitsuhiko opened this issue Apr 3, 2018 · 23 comments

Comments

@mitsuhiko
Copy link

mitsuhiko commented Apr 3, 2018

I'm currently trying to write a new rust library for Sentry and I'm running into some major issues on the way to futures 0.2. Previously it was possible to access task local data from anywhere but now that is gone.

This is particularly problematic because it makes it completely impossible to support our breadcrumbs feature for futures. My goal was to add a log handler that acts as a proxy and appends all log statements of a thread to a local size capped ring buffer and then dispatches to the main log handler optionally. That way if an error event happens we can append all captured log messages with it.

However this now cannot be done any more as the log handler that works with the log crate cannot get access to this context. I understand that there was an RFC that went this way just now but I really want to challenge this design decision before it's too late.

It also makes many features that depend on implicit context passing completely impossible to implement in Rust. I used implicit context handling in the past not just for Sentry related debugging tools but also to enforce security contexts in multi tenant applications where it's crucial that the contextual information is not lost. The lack of implicit context access now means that the only sane way to isolate multiple tenants in Rust is to isolate them onto different threads and use thread locals which seems like a massive step in the wrong direction.

Is there any way we can go back to implicit contexts?

@mitsuhiko
Copy link
Author

I also want to point out that other communities went through the same experience:

  • Python 3 just gained an implicit event loop in 3.5(?) and implicit context variables through PEP 567
  • .NET has the logical call context which is also implicitly passed
  • Node has domains which are used for similar purposes (and are very commonly used by tools like sentry/bugsnag and many more)
  • The go community decided to go with an explicit context object but the effects on the APIs that require such context data have been quite negative. As an example one can look at how the opentracing APIs on go look.

@aturon
Copy link
Member

aturon commented Apr 3, 2018

@mitsuhiko thanks for the feedback! One thing that I think would be really helpful is to take one of the examples where you feel a pattern has become completely impossible, and explain it in full detail (including perhaps some code snippets).

@mitsuhiko
Copy link
Author

The two examples I mentioned above look something like this:

Local Context Logging / Event Recording

For the Sentry SDK in JavaScript we generally do things like this (pseudocode for brevity):

import {addBreadcrumb} from '@sentry/shim';

let realLog = console.log;
console.log = function() {
  addBreadcrumb('log', argsToMessage(arguments));
  return realLog.apply(this, arguments);
};

Here addBreadcrumb internally uses our "give me the current context" function so we know where to send data to. In JavaScript this is either the global context (a stack) if we are in browser or it uses the current "domain" in node (and this might use a different system once the node people know what their replacement will be).

The idea is that no matter how deep in the call stack we are, we want to be able to record relevant information to our buffer so that when an error happens we can send all relevant information along.

Similar patterns are applied when people do something like open tracing where you need to record begin and end of operations. This is typically also bound to some form of context. For instance this can be used to record the time of a SQL query start to end. This might also not have access to the context necessarily. For instance in Python many systems that record such timings are implemented by hooking into already existing APIs that often do not have some sort of context. For instance all newrelic libraries I know about are hooking functions and recording into some sort of local implicit context the timing information captured to then flush it out later.

Tenant Separation / Auditing

The second case that commonly comes up that can use implicit contexts are multi tenant applications. There it's important that people do not accidentally wrote that that can escape the current tenant (where tenant usually means customer). Such applications often bind relevant information in an implied context so that auditing and security assertions can be easily written.

For instance if the current remote address, current authenticated user and current tenant are easily available than database code can for instance automatically constraint queries or assert that data from other tenants are not returned. Likewise internal audit functions can use that context data to record an audit trail and attach as much information as possible. This is really useful for building secure SaaS systems where audit trails are often used to backtrack from security incidents where a bug lead to privilege escalations.


I covered some of those cases in a talk I gave last year about building SaaS apps in Python: http://mitsuhiko.pocoo.org/practicalsaas.pdf

About halfway through the talk are some examples.

@aturon
Copy link
Member

aturon commented Apr 3, 2018

Thanks @mitsuhiko!

To further clarify: in the 0.2 design, anything of type Future automatically has access to the task context, since that's part of the definition of the trait. This means that, for example, when using async/await notation a context is always available in a TLS-like fashion, which also means that task-local data is implicitly available in all async/await code.

Digging in to the event recording case: if you're grabbing the task context today, presumably you're running within some outer future, right? But you're worrying about code deep within that future that currently grabs the context out of TLS? Have you thought about how this will look with async/await?

@mitsuhiko
Copy link
Author

@aturon the issue is that it requires someone passing down the context. That is something this code can generally not assume. In particular stuff like Sentry generally cannot assume that people re-architecture all their apps just to get better error reports. We also have traditionally had this issue in the Python community where lots of state was held on TLS because an API precedes some context data later. For instance almost the entire web world binds the current "culture"/"locale" etc. that should be used on a thread local / task local.

Digging in to the event recording case: if you're grabbing the task context today, presumably you're running within some outer future, right?

For what it's worth I also ran into issues with futures 0.1.20 since task::current panics if there is no active task. The only workaround for 0.1.20 i came up with was setting up a landing pad for the panic which is horrible and won't work in abort cases.

What I wanted to do with futures 0.1 was to see if there is an active task, if then use task local storage, otherwise thread local storage. If none of that has an active scope fall back to a process wide one.

Have you thought about how this will look with async/await?

With futures 0.2 I most likely cannot support this at all. In Python and server side JavaScript we have nice solutions with PEP 567 and node domains.

@aturon
Copy link
Member

aturon commented Apr 3, 2018

@mitsuhiko OK, so the core issue is that you have code which may be run in a futures context, but is not required to be. When you are run within a future, you want to extract data from its context.

I agree that the 0.2 design, as it stands, makes this kind of thing effectively impossible for task-local data. In particular, we currently provide &mut access to the context to the future, which precludes stashing a handle to it separately in TLS.

@cramertj, what would you think about something like the following:

  • Take the context as a shared borrow, and provide RefCell-like access to task-local data.
  • In std environments, provide a well-known thread-local that stores the current context, if any; executors are expected to place the context within that thread-local prior to invoking the future.
  • Provide read access to the thread-local context in std environments.

In other words, we can keep the top-level traits using explicit context passing, which is the only option in no-std environments. But in std environments, you always have the option of dynamically checking whether you're executing within a task via TLS.

@mitsuhiko
Copy link
Author

OK, so the core issue is that you have code which may be run in a futures context, but is not required to be. When you are run within a future, you want to extract data from its context.

Correct. More importantly this really only matters for std environments. I do not see a case where no_std or similar is in need of this.

@aturon
Copy link
Member

aturon commented Apr 4, 2018

@mitsuhiko I talked with @cramertj a bit on IRC, and it seemed like a better way to handle these needs in general is to use scoped TLS, which works independently of execution model (i.e., works equally well with futures-style tasks and OS threads). Essentially, you get to decorate your stack with contextual information on the way down, which can then be read out below. This works great with the task model, because to start executing a new task you have to exit all frames of the current one, which clears the scoped_tls (and, conversely, the next time you poll it is re-established).

AFAICT this would work well for the use-cases you've outlined -- and in some sense would improve it, since you'd no longer have logic about falling back to plain TLS.

What do you think?

@mitsuhiko
Copy link
Author

Since tokio::spawn or whatever else would be used to spawn a future tears down the stack at that point, none of that data becomes available within a task. So each task would be executed without any data being available. This also extends to and_then and similar things. I currently do not see how scoped-tls can be used in a useful way but that might also me misunderstanding the API.

From my understanding of it it does not seem very helpful at least because of the stack tearing down most of the time. In node for instance the domain carries through promise callbacks automatically (example here: https://gist.github.com/mitsuhiko/355d921bdde4ebded34bd2cf73cb4b18)

@skade
Copy link

skade commented Apr 4, 2018

@aturon This is similar to the discussion we had last week about "per-request" semi-globals.

I also don't see how scoped-TLS works in all the contexts there, maybe an example would help.

FWIW, I agree that this feature is desirable, but maybe it could be provided by the executors?

@aturon
Copy link
Member

aturon commented Apr 4, 2018

To try to make this a bit more concrete, I've sketched out some combinators based on a scoped TLS discipline. Imagine we have a TLS mechanism, SCOPED_TLS_MAP, that allows scoped updates to a thread-local typeset. You use set to install a given piece of context, and with to gain access to it; both take closures, and set removes the piece of context on the way out.

We can then provide this in a futures-style API as follows:

pub struct InContext<C, F> {
    cx: C,
    fut: F,
}

pub fn in_context<C, F>(cx: C, fut: F) -> InContext<C, F> {
    InContext { cx, fut }
}

impl<C, F: Future> Future for InContext<C, F> {
    type Item = F::Item;
    type Error = F::Error;

    fn poll(&mut self, cx: &mut task::Context) -> Poll<F::Item, F::Error> {
        SCOPED_TLS_MAP.set(&self.cx, || fut.poll(cx))
    }
}

pub struct GetContext<F>(f: F);

pub fn get_context<F>(f: F) -> GetContext<F> {
    GetContext(f)
}

impl<C, T, E, F: FnMut(&mut task::Context, &C) -> Poll<T, E>> Future for GetContext<F> {
    type Item = T;
    type Error = E;

    fn poll(&mut self, cx: &mut task::Context) -> Poll<T, E> {
        SCOPED_TLS_MAP.with(|c: &C| (self.0)(cx, c))
    }
}

Regarding and_then, the point here is that, at some outer layer in the app you should have a clear-cut point of "conceptual task" (regardless of the execution model). You use in_context to wrap that entire future, and are guaranteed that everything reachable within it will see this data when using get_context.

Regarding spawn, this is an orthogonal concern which is equally a problem with threads and the futures 0.1 task system (which gives you fresh task-local data when spawning). I believe the answer is to provide spawn-hooks which you can use to add wrappers as above.

@aturon
Copy link
Member

aturon commented Apr 4, 2018

Let me further clarify: you actually don't need GetContext separately. In general, anyone can just call SCOPED_TLS_MAP.with to gain access to the current context, whether within a future or not.

The key thing is InContext. For a framework build around the Service trait, for example, you'd use InContext to wrap the entire future provided by the service, ensuring that contextual information is available globally.

@aturon
Copy link
Member

aturon commented Apr 4, 2018

Here's a sketch of a spawn hook API:

impl task::Context {
    fn with_spawn_hook<'b, F>(&mut self, f: &'b mut F) -> task::Context<'b>
      where F: FnMut(Box<Future<Item = ()>>) -> Box<Future<Item = ()>>;
}

@mitsuhiko
Copy link
Author

Given the conversation we just had on IRC I think to the best way to summarize my concern s not that I'm not happy with layering this on TLS but that's important for context handling is that there is an agreed upon idea of what context is and how propagates.

I think it's obvious that if each task would make its own context it's not very useful. As such context needs to be carried (somtimes, but not all the time) to other spawned tasks or threads. This requires a standardized API or a system that permits other crates to provide such a standardized API.

A web framework or task queue would need to establish a context for each request handled / task processed that other libraries can then use. Futures spawned within that context would by default (through a spawn hook) inherit that context unless explicitly opted out somehow.

@aturon
Copy link
Member

aturon commented Apr 4, 2018

@mitsuhiko FWIW, I'm not aware of any approach to this problem that does not ultimately layer on thread-local storage (or some equivalent new language feature).

@tikue
Copy link
Contributor

tikue commented Apr 19, 2018

@skade was the discussion about request semi-globals recorded anywhere? I'm currently thinking about similar things: I want to be able to propagate a request deadline to all futures spawned for a particular request. Not all RPC libraries want to have to trust the user to propagate deadlines correctly.

I was experimenting with an executor that decorates the spawned futures so that the first thing to happen in the future is to set a task local deadline. The problem right now is that the user still has to opt-in to the executor, which is no good. I was trying to use tokio_executor::with_default, but I discovered that with_default doesn't propagate to background threads.

@tikue
Copy link
Contributor

tikue commented May 14, 2018

Is this being considered for futures 0.3?

@lqf96
Copy link

lqf96 commented May 27, 2018

This just looks so similar to the JS zone proposal. Maybe we can refer to Zone.js to figure out what functionalities should a context provide and how futures designs its own Context-related APIs.

BTW, context-local storage are not going to be equivalent to thread-local storage, especially when you are using a multi-core runtime (like the one provided by Tokio), where tasks may be moved across thread boundaries. So my opinion is that at least there should be a specific group of APIs for Context-local storage.

@mitsuhiko
Copy link
Author

I keep running into this over and over. One note I wanted to leave is that .NET has excellent support for this through the ExecutionContext system. .NET calls the data linked to an execution context "ambient data" (as opposed to thread/task local data). There is a blog post explaining this better than i could: https://blogs.msdn.microsoft.com/pfxteam/2012/06/15/executioncontext-vs-synchronizationcontext/

The API could look very similar in Rust:

ExecutionContext ec = ExecutionContext::capture();
thread::spawn(|| {
    ec::run(|| {
        // the code here continues the flow from outside.
    });
});

In .NET spawning a thread automatically continues the flow so this would work similar in Rust ideally. To prevent the flowing one could call ExecutionContext.suppress_flow(); same as in .NET.

@mitsuhiko
Copy link
Author

mitsuhiko commented Jun 8, 2018

I toyed around with this today and I'm really pleased with the general API that this would provide. You can find my experiments here: https://github.com/mitsuhiko/rust-execution-context

For the user this looks roughly like this:

#[macro_use]
extern crate execution_context;
use execution_context::ExecutionContext;

flow_local!(static TEST: u32 = 42);

fn main() {
    assert_eq!(*TEST.get(), 42);

    let ec = ExecutionContext::capture();
    TEST.set(23);
    assert_eq!(*TEST.get(), 23);
    ec.run(|| {
        assert_eq!(*TEST.get(), 42);
    });
}

Now obviously there are many questions left with this, but it generally follows the .NET semantics quite close. There are definitely some differences and performance has not yet been evaluated. Would love to get feedback on this anyways.

I honestly believe that this is the right way to to generally work with the flow of execution, particularly in Rust. However given the performance considerations in the language I am pretty sure that the carrying forward of contexts would need to be something that users can opt-in and out of.

@tikue
Copy link
Contributor

tikue commented Jun 8, 2018

With this model, how do I, as framework author, ensure users will have their execution context captured when they spawn child tasks with tokio_executor::spawn?

@mitsuhiko
Copy link
Author

@tikue in an ideal world rust would automatically capture and carry forward an execution context on default thread spawn (that's what .NET does) and tokio would do the same for spawning futures (obviously that would additionally have to do some work to work with user space task switching).

This is effectively what .NET does.

@mitsuhiko
Copy link
Author

I should clarify that the actual beauty of this system is that there is a standardized API with this where the flow propagation would be suppressed. If threads were to automatically capture, a user would out out like this:

{
    let _guard = ExecutionContext::suppress_flow();
    thread::spawn(|| ...);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants