-
Notifications
You must be signed in to change notification settings - Fork 625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implicit Context Access #937
Comments
I also want to point out that other communities went through the same experience:
|
@mitsuhiko thanks for the feedback! One thing that I think would be really helpful is to take one of the examples where you feel a pattern has become completely impossible, and explain it in full detail (including perhaps some code snippets). |
The two examples I mentioned above look something like this: Local Context Logging / Event RecordingFor the Sentry SDK in JavaScript we generally do things like this (pseudocode for brevity): import {addBreadcrumb} from '@sentry/shim';
let realLog = console.log;
console.log = function() {
addBreadcrumb('log', argsToMessage(arguments));
return realLog.apply(this, arguments);
}; Here The idea is that no matter how deep in the call stack we are, we want to be able to record relevant information to our buffer so that when an error happens we can send all relevant information along. Similar patterns are applied when people do something like open tracing where you need to record begin and end of operations. This is typically also bound to some form of context. For instance this can be used to record the time of a SQL query start to end. This might also not have access to the context necessarily. For instance in Python many systems that record such timings are implemented by hooking into already existing APIs that often do not have some sort of context. For instance all newrelic libraries I know about are hooking functions and recording into some sort of local implicit context the timing information captured to then flush it out later. Tenant Separation / AuditingThe second case that commonly comes up that can use implicit contexts are multi tenant applications. There it's important that people do not accidentally wrote that that can escape the current tenant (where tenant usually means customer). Such applications often bind relevant information in an implied context so that auditing and security assertions can be easily written. For instance if the current remote address, current authenticated user and current tenant are easily available than database code can for instance automatically constraint queries or assert that data from other tenants are not returned. Likewise internal audit functions can use that context data to record an audit trail and attach as much information as possible. This is really useful for building secure SaaS systems where audit trails are often used to backtrack from security incidents where a bug lead to privilege escalations. I covered some of those cases in a talk I gave last year about building SaaS apps in Python: http://mitsuhiko.pocoo.org/practicalsaas.pdf About halfway through the talk are some examples. |
Thanks @mitsuhiko! To further clarify: in the 0.2 design, anything of type Digging in to the event recording case: if you're grabbing the task context today, presumably you're running within some outer future, right? But you're worrying about code deep within that future that currently grabs the context out of TLS? Have you thought about how this will look with async/await? |
@aturon the issue is that it requires someone passing down the context. That is something this code can generally not assume. In particular stuff like Sentry generally cannot assume that people re-architecture all their apps just to get better error reports. We also have traditionally had this issue in the Python community where lots of state was held on TLS because an API precedes some context data later. For instance almost the entire web world binds the current "culture"/"locale" etc. that should be used on a thread local / task local.
For what it's worth I also ran into issues with futures 0.1.20 since What I wanted to do with futures 0.1 was to see if there is an active task, if then use task local storage, otherwise thread local storage. If none of that has an active scope fall back to a process wide one.
With futures 0.2 I most likely cannot support this at all. In Python and server side JavaScript we have nice solutions with PEP 567 and node domains. |
@mitsuhiko OK, so the core issue is that you have code which may be run in a futures context, but is not required to be. When you are run within a future, you want to extract data from its context. I agree that the 0.2 design, as it stands, makes this kind of thing effectively impossible for task-local data. In particular, we currently provide @cramertj, what would you think about something like the following:
In other words, we can keep the top-level traits using explicit context passing, which is the only option in no-std environments. But in std environments, you always have the option of dynamically checking whether you're executing within a task via TLS. |
Correct. More importantly this really only matters for |
@mitsuhiko I talked with @cramertj a bit on IRC, and it seemed like a better way to handle these needs in general is to use scoped TLS, which works independently of execution model (i.e., works equally well with futures-style tasks and OS threads). Essentially, you get to decorate your stack with contextual information on the way down, which can then be read out below. This works great with the task model, because to start executing a new task you have to exit all frames of the current one, which clears the scoped_tls (and, conversely, the next time you AFAICT this would work well for the use-cases you've outlined -- and in some sense would improve it, since you'd no longer have logic about falling back to plain TLS. What do you think? |
Since From my understanding of it it does not seem very helpful at least because of the stack tearing down most of the time. In node for instance the domain carries through promise callbacks automatically (example here: https://gist.github.com/mitsuhiko/355d921bdde4ebded34bd2cf73cb4b18) |
@aturon This is similar to the discussion we had last week about "per-request" semi-globals. I also don't see how scoped-TLS works in all the contexts there, maybe an example would help. FWIW, I agree that this feature is desirable, but maybe it could be provided by the executors? |
To try to make this a bit more concrete, I've sketched out some combinators based on a scoped TLS discipline. Imagine we have a TLS mechanism, We can then provide this in a futures-style API as follows: pub struct InContext<C, F> {
cx: C,
fut: F,
}
pub fn in_context<C, F>(cx: C, fut: F) -> InContext<C, F> {
InContext { cx, fut }
}
impl<C, F: Future> Future for InContext<C, F> {
type Item = F::Item;
type Error = F::Error;
fn poll(&mut self, cx: &mut task::Context) -> Poll<F::Item, F::Error> {
SCOPED_TLS_MAP.set(&self.cx, || fut.poll(cx))
}
}
pub struct GetContext<F>(f: F);
pub fn get_context<F>(f: F) -> GetContext<F> {
GetContext(f)
}
impl<C, T, E, F: FnMut(&mut task::Context, &C) -> Poll<T, E>> Future for GetContext<F> {
type Item = T;
type Error = E;
fn poll(&mut self, cx: &mut task::Context) -> Poll<T, E> {
SCOPED_TLS_MAP.with(|c: &C| (self.0)(cx, c))
}
} Regarding Regarding |
Let me further clarify: you actually don't need The key thing is |
Here's a sketch of a spawn hook API: impl task::Context {
fn with_spawn_hook<'b, F>(&mut self, f: &'b mut F) -> task::Context<'b>
where F: FnMut(Box<Future<Item = ()>>) -> Box<Future<Item = ()>>;
} |
Given the conversation we just had on IRC I think to the best way to summarize my concern s not that I'm not happy with layering this on TLS but that's important for context handling is that there is an agreed upon idea of what context is and how propagates. I think it's obvious that if each task would make its own context it's not very useful. As such context needs to be carried (somtimes, but not all the time) to other spawned tasks or threads. This requires a standardized API or a system that permits other crates to provide such a standardized API. A web framework or task queue would need to establish a context for each request handled / task processed that other libraries can then use. Futures spawned within that context would by default (through a spawn hook) inherit that context unless explicitly opted out somehow. |
@mitsuhiko FWIW, I'm not aware of any approach to this problem that does not ultimately layer on thread-local storage (or some equivalent new language feature). |
@skade was the discussion about request semi-globals recorded anywhere? I'm currently thinking about similar things: I want to be able to propagate a request deadline to all futures spawned for a particular request. Not all RPC libraries want to have to trust the user to propagate deadlines correctly. I was experimenting with an executor that decorates the spawned futures so that the first thing to happen in the future is to set a task local deadline. The problem right now is that the user still has to opt-in to the executor, which is no good. I was trying to use |
Is this being considered for futures 0.3? |
This just looks so similar to the JS zone proposal. Maybe we can refer to Zone.js to figure out what functionalities should a context provide and how BTW, context-local storage are not going to be equivalent to thread-local storage, especially when you are using a multi-core runtime (like the one provided by Tokio), where tasks may be moved across thread boundaries. So my opinion is that at least there should be a specific group of APIs for Context-local storage. |
I keep running into this over and over. One note I wanted to leave is that .NET has excellent support for this through the The API could look very similar in Rust: ExecutionContext ec = ExecutionContext::capture();
thread::spawn(|| {
ec::run(|| {
// the code here continues the flow from outside.
});
}); In .NET spawning a thread automatically continues the flow so this would work similar in Rust ideally. To prevent the flowing one could call |
I toyed around with this today and I'm really pleased with the general API that this would provide. You can find my experiments here: https://github.com/mitsuhiko/rust-execution-context For the user this looks roughly like this: #[macro_use]
extern crate execution_context;
use execution_context::ExecutionContext;
flow_local!(static TEST: u32 = 42);
fn main() {
assert_eq!(*TEST.get(), 42);
let ec = ExecutionContext::capture();
TEST.set(23);
assert_eq!(*TEST.get(), 23);
ec.run(|| {
assert_eq!(*TEST.get(), 42);
});
} Now obviously there are many questions left with this, but it generally follows the .NET semantics quite close. There are definitely some differences and performance has not yet been evaluated. Would love to get feedback on this anyways. I honestly believe that this is the right way to to generally work with the flow of execution, particularly in Rust. However given the performance considerations in the language I am pretty sure that the carrying forward of contexts would need to be something that users can opt-in and out of. |
With this model, how do I, as framework author, ensure users will have their execution context captured when they spawn child tasks with |
@tikue in an ideal world rust would automatically capture and carry forward an execution context on default thread spawn (that's what .NET does) and tokio would do the same for spawning futures (obviously that would additionally have to do some work to work with user space task switching). This is effectively what .NET does. |
I should clarify that the actual beauty of this system is that there is a standardized API with this where the flow propagation would be suppressed. If threads were to automatically capture, a user would out out like this: {
let _guard = ExecutionContext::suppress_flow();
thread::spawn(|| ...);
} |
I'm currently trying to write a new rust library for Sentry and I'm running into some major issues on the way to futures 0.2. Previously it was possible to access task local data from anywhere but now that is gone.
This is particularly problematic because it makes it completely impossible to support our breadcrumbs feature for futures. My goal was to add a log handler that acts as a proxy and appends all log statements of a thread to a local size capped ring buffer and then dispatches to the main log handler optionally. That way if an error event happens we can append all captured log messages with it.
However this now cannot be done any more as the log handler that works with the log crate cannot get access to this context. I understand that there was an RFC that went this way just now but I really want to challenge this design decision before it's too late.
It also makes many features that depend on implicit context passing completely impossible to implement in Rust. I used implicit context handling in the past not just for Sentry related debugging tools but also to enforce security contexts in multi tenant applications where it's crucial that the contextual information is not lost. The lack of implicit context access now means that the only sane way to isolate multiple tenants in Rust is to isolate them onto different threads and use thread locals which seems like a massive step in the wrong direction.
Is there any way we can go back to implicit contexts?
The text was updated successfully, but these errors were encountered: