Suggestion: WebAssembly support #479
Comments
It sounds interesting! I'll need a bit to get my head around what it entails, but it seems quite reasonable to 1. limit the use of assumptions about time, and 2. be as serious as possible about If it's alright, I'll take a few to read up on things. Mostly, I'd love to avoid having code in TD/DD that I don't understand, because it's largely on me if something is broken elsewhere. But I have to imagine if nothing else a feature flag for time would be really easy, as should be fixing DD's 64 bit assumption (which is probably a bug anyhow). |
I think all changes look fine from a high level. The time changes don't seem to change any behavior on non-wasm targets. The RHH code isn't used anywhere at this point. Before committing to supporting wasm, I'd like to know if there's a regression test we could have. It seems testing isn't as simple as it is for other targets, but if we can get it into CI then I don't think there's anything blocking this. As @frankmcsherry points out, there are some parts of the code that might assume |
I've started to take a look (sorry for the delay) and have some quick thoughts:
I'm going to look into a DD PR that essentially tracks your changes (fixing RHH) but subtracts out the time-based code rather than modifies it. I'll report back here about that. |
The RHH fix seems more subtle than at first glance. The code also has elsewhere this

    /// Indicates both the desired location and the hash signature of the key.
    fn desired_location<K: Hashable>(&self, key: &K) -> usize {
        let hash: usize = key.hashed().into().try_into().unwrap();
        hash / self.divisor
    }

where the

The code is not currently active, though .. it could become so in the future when I get some time. Ideally it wouldn't silently break the WASM builds, though this is an example of where it could/would. My understanding of WASM is that 64-bit types are fine, just that |
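To make the concern concrete, here is a minimal sketch (not the RHH code itself) of why narrowing a 64-bit hash before dividing can panic where `usize` is 32 bits. It uses `u32` as a stand-in for a 32-bit `usize` so that it behaves the same on any host. The function names and the divide-first alternative are illustrative assumptions, not something proposed in this thread.

```rust
use std::convert::TryFrom;

/// Same shape as the quoted snippet: narrow the hash first, then divide.
/// `u32` stands in for `usize` on a 32-bit target (an assumption made so
/// the behaviour is visible on any host). `divisor` must be non-zero.
fn narrow_then_divide(hash: u64, divisor: u32) -> Option<u32> {
    // Returns None where the original `try_into().unwrap()` would panic.
    u32::try_from(hash).ok().map(|h| h / divisor)
}

/// Hypothetical alternative: divide in 64 bits, then narrow the quotient.
/// This only fails if the resulting location itself does not fit.
/// `divisor` must be non-zero.
fn divide_then_narrow(hash: u64, divisor: u64) -> Option<u32> {
    u32::try_from(hash / divisor).ok()
}
```

With a hash of `1 << 63`, the first function fails for any divisor, while the second succeeds whenever the divisor is large enough to bring the quotient into range.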
Apologies for my lack of reply sooner - I've been away. Thanks for making progress on these changes 😃.
Ah yes, I haven't used that before.
I'd be very happy with that - I have had a lot of fun composing DD operators without having to reach for what dogsdogsdogs provides.
Yep that's right, similar to using Let me know if there's something I can help with that doesn't involve changing production "load-bearing" code. E.g. setting up some tests as suggested - details for WASM: https://rustwasm.github.io/wasm-bindgen/wasm-bindgen-test/index.html. Note this does run the compiled WASM using Node JS. Or I can provide a basic WASM example for motivation - play around with an interactive dataflow by visiting a web page and clicking on some buttons. |
I'm looking at this again, and have some thoughts and questions:
@antiguru : I'm looking at the code and Alternately, making the scheduler pluggable, which has been a long-standing ask, I think! |
@oli-w you said
and that would be great! I'm not very familiar with the ecosystem, and it would be great to have something to aim at / with. In particular, I'd love to remove / factor out "time" generally, rather than replace it with something that works in a few more cases. And while I can test much of that myself, understanding how to try it out for WASM could make that more productive. |
I put up TimelyDataflow/timely-dataflow#577 which may allow you to use TD without it invoking |
So far I have been running DD inside WASM using a Web Worker - which is similar to running on a separate thread. There are quite a few hoops to jump through to get threads working in WASM and several caveats (details), which I haven't tried yet. For the moment I'm happy to run everything in one dedicated Web Worker. I had sidestepped the code that parks the thread by manually calling what

    use timely::communication::allocator::Generic;
    use timely::worker::Worker;

    // Single-threaded allocator: no other workers to exchange data with.
    let alloc = Generic::Thread(timely::communication::allocator::thread::Thread::new());
    let config = timely::WorkerConfig::default();
    let worker = Worker::new(config, alloc);

Then I'm able to control how often I push progress through and when by calling
I like this idea of being able to provide your own scheduler. The default implementation could be parking a thread, whereas for other cases I imagine steps:
Awesome, I'll work on setting something up. I'm having a look at TimelyDataflow/timely-dataflow#577 and will report back, thanks! |
We're only using it in one place in Materialize, which is in |
Thinking out loud: we have the ability to activate operators across threads, and arguably a polling approach to waking up operators is a bit of a smell. Not terrible, but 100% one should strive to activate an operator as soon as there is work to do, and anywhere we are not doing that we could consider how we might do it (e.g. hand an activator to the thing that is enqueueing data to replay). I'm sure there are moments where one can't do that (some queue that you don't control the implementation of), so maybe removing it isn't the best. But, good to chew on and understand. TD is pretty close to being free of "real time" concepts. |
Ah, and |
Yeah, spamming activations needs to be avoided. Materialize uses a wrapper that needs to be ack'ed before it can be scheduled again: https://github.com/MaterializeInc/materialize/blob/e194a6f3fdd59c8efd5bd7521da7fba9cb79ac3d/src/timely-util/src/activator.rs#L25 |
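As a rough illustration of the "must be ack'ed before it can be scheduled again" idea, here is a minimal std-only sketch. It is not the Materialize implementation linked above; the type `AckedActivator` and its methods are hypothetical names, and a real version would wrap a timely activator rather than return a `bool`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper: lets at most one activation through until `ack`
/// is called. Clones share the same pending flag.
#[derive(Clone)]
struct AckedActivator {
    pending: Arc<AtomicBool>,
}

impl AckedActivator {
    fn new() -> Self {
        AckedActivator { pending: Arc::new(AtomicBool::new(false)) }
    }

    /// Requests activation. Returns true only if this call should actually
    /// wake the operator, i.e. no earlier request is still unacknowledged.
    fn activate(&self) -> bool {
        !self.pending.swap(true, Ordering::SeqCst)
    }

    /// Called by the operator once it has run, allowing the next activation.
    fn ack(&self) {
        self.pending.store(false, Ordering::SeqCst);
    }
}
```

Repeated calls to `activate` between two `ack`s collapse into a single wake-up, which is the property that keeps activations from piling up.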
I pushed another commit onto TimelyDataflow/timely-dataflow#577, this time moving the action of parking into the scheduler (and away from the communication layer). This allows the scheduler to "conceal" its internal take on whether and for how long to park, rather than needing to communicate it through the worker to the communication layer. I'm not 100% certain why parking was in the communication layer, tbh. It's possible that I've forgotten something important, but the logic for all channel allocators was just to check if their This is a bit off topic from the PR's subject (make |
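One way to picture "parking lives in the scheduler" is a small trait that decides what waiting means. The sketch below is an assumption for discussion only: `Park`, `ThreadPark`, `DeferToHost`, and `run_once` are hypothetical names and do not reflect the API in TimelyDataflow/timely-dataflow#577.

```rust
use std::time::Duration;

/// Hypothetical hook: what a worker does when it has no work.
trait Park {
    fn park(&mut self, timeout: Option<Duration>);
}

/// Native default: block the current thread until unparked or timed out.
struct ThreadPark;

impl Park for ThreadPark {
    fn park(&mut self, timeout: Option<Duration>) {
        match timeout {
            Some(d) => std::thread::park_timeout(d),
            None => std::thread::park(),
        }
    }
}

/// Variant for hosts that must not block (e.g. a browser event loop):
/// it only records the requested delay so the host can schedule the
/// next step itself.
#[derive(Default)]
struct DeferToHost {
    next_wakeup: Option<Duration>,
}

impl Park for DeferToHost {
    fn park(&mut self, timeout: Option<Duration>) {
        self.next_wakeup = timeout;
    }
}

/// One scheduling round: run `step`, and park only if it reported no work.
fn run_once<P: Park>(parker: &mut P, step: impl FnOnce() -> bool, idle: Option<Duration>) -> bool {
    let did_work = step();
    if !did_work {
        parker.park(idle);
    }
    did_work
}
```

In a WASM setting the host would read `next_wakeup` after each round and arm a timer, instead of the worker blocking a thread.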
Exciting progress! TimelyDataflow/timely-dataflow#577 (comment) I still need to create a test to aim towards for WASM with deferred events, which could potentially be solved with a "pluggable scheduler". Can you remind me how I can schedule data in TD / DD to be delayed by some amount of wall-clock time (i.e. a |
Right, so: two things you could do:
A third option, in between these two, is to have a deferred event that introduces the data through an input at a time of your choosing. It is still presented to TD/DD as an explicitly timestamped event, but you get to control when this happens. So, do the "delaying" at the boundary of your app and TD/DD, rather than "inside TD/DD" using either of the first two mechanisms. If you can say a bit more about your goal with the delaying I could opine a bit more. Most uses that I know of prefer the second option (in Materialize these are called "temporal filters", and we don't really use the first form because of the non-determinism it would introduce). |
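The third option, delaying at the boundary between the app and TD/DD, could look roughly like the buffer below. This is a sketch under assumptions: `Deferred` and its methods are hypothetical, times are plain milliseconds, and the caller is expected to feed whatever `due` returns into a TD/DD input at the timestamp it chooses.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical boundary buffer: holds (release_ms, payload) pairs and
/// hands back those whose release time has been reached.
struct Deferred<T: Ord> {
    heap: BinaryHeap<Reverse<(u64, T)>>,
}

impl<T: Ord> Deferred<T> {
    fn new() -> Self {
        Deferred { heap: BinaryHeap::new() }
    }

    /// Registers `payload` to be released at `release_ms`.
    fn schedule(&mut self, release_ms: u64, payload: T) {
        self.heap.push(Reverse((release_ms, payload)));
    }

    /// Drains every entry with release_ms <= now_ms, earliest first.
    fn due(&mut self, now_ms: u64) -> Vec<(u64, T)> {
        let mut out = Vec::new();
        while let Some(Reverse((t, _))) = self.heap.peek() {
            if *t > now_ms {
                break;
            }
            let Reverse(entry) = self.heap.pop().expect("peeked entry exists");
            out.push(entry);
        }
        out
    }

    /// Earliest pending release time, for deciding when to wake up next.
    fn next_release(&self) -> Option<u64> {
        self.heap.peek().map(|Reverse((t, _))| *t)
    }
}
```

Because the delay is resolved before the data reaches the dataflow, TD/DD still only ever sees explicitly timestamped input.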
I have a light background goal of putting together a TD/DD demo that uses JSON objects as the "data", and yields a playground that allows you to stitch dataflows together, interactively in a web app. Still TBD what the "Rust closure" replacement will be (some lightweight IR?) but .. if I make progress on that it might result in a clearer forum to work through idioms. |
I basically want to do the same thing as

    SELECT ... FROM event
    WHERE mz_now() > event.time

I want to

What I also haven't quite understood is how to handle "quiet" periods, where the system isn't otherwise doing anything (so nothing is calling

To make sure this change is "pushed through" TD/DD, I need to either:
|
I see! I think I understand. Let me unpack how So what using
You can implement this with a What MZ does is continually "tick" the inputs, second-by-second, or as frequently as you would like to be certain that nothing much has changed. In that context, things could change at any moment, because the updates are coming in from external sources. This is the first thing that you've indicated, and .. it's not too wasteful; TD/DD don't need to do lots of work to stay up to date, but .. they will actually do a reasonable amount of work when temporal filters are in play. But you make a good point that one doesn't fundamentally need to tick the system to see if anything has changed. One thing that seems possible is that when you have a temporal filter construct, you could just downstream put
This is a bit ad-hoc, in that I'm making it up and the logic might be wrong. It certainly is embedding an understanding of what the temporal filter is going to do into DD logic, rather than having DD directly tell you the truth about what work is outstanding. It seems like a good ask to want to reflect that information, though. Perhaps another similar way to accomplish the same is rather than using a temporal filter as above, having it present as another output of the computation named |
Thanks, that makes sense to me at a high level. I'll have to do a bit of experimenting to make sure I can put it into practice. If MZ achieves this by regularly ticking time forward, I'm happy to do the same and that will work fine in WASM - I just need to hook it up to be called using setInterval.
I think this is closest to what I originally had in mind. The trick with a pair of updates cancelling each other out at some time is pretty neat though, gut feeling is this would be the most reliable and I'm keen to try that approach.
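My reading of the "pair of updates" trick, sketched with plain tuples rather than real DD collections: a record that should be visible during a window becomes an insertion at the start and a retraction at the end, and the earliest future update time tells the host when something can next change. The helper names are hypothetical, and the exact construction discussed above is cut off in this thread, so treat this as an assumption.

```rust
/// An update triple in the (data, time, diff) style used by differential dataflow.
type Update<D> = (D, u64, i64);

/// Hypothetical helper: expresses "visible during [from, until)" as an
/// insertion at `from` and a matching retraction at `until`.
fn window_updates<D: Clone>(data: D, from: u64, until: u64) -> [Update<D>; 2] {
    [(data.clone(), from, 1), (data, until, -1)]
}

/// Earliest update time strictly after `now`: the next moment at which the
/// output can change, and so a candidate for the host's next timer.
fn next_change<D>(updates: &[Update<D>], now: u64) -> Option<u64> {
    updates.iter().map(|(_, t, _)| *t).filter(|t| *t > now).min()
}

/// Net multiplicity of `data` as of `now` (sums updates at times <= now).
fn count_at<D: PartialEq>(updates: &[Update<D>], data: &D, now: u64) -> i64 {
    updates
        .iter()
        .filter(|(d, t, _)| d == data && *t <= now)
        .map(|(_, _, r)| *r)
        .sum()
}
```

In a browser, the value returned by `next_change` could drive a one-shot timer instead of a fixed `setInterval` tick.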
I worked on something today that hopefully can help - check out https://oli-w.github.io/dd-wasm-playground, where you can:
This is using the Code is here: https://github.com/oli-w/dd-wasm-playground feel free to play around and I'm happy to help with any questions. |
Oh fascinating! Very neat, and I'll check this out. I have been poking at a minimal IR for DD, and potentially blending that in could be neat. I'll try it out and see if I can propose anything! :D |
Hi,
I was interested in seeing if I could run differential-dataflow on the web by compiling to WebAssembly using wasm-pack and it worked out really well! However it required some ~~hacks~~ changes to differential-dataflow and timely-dataflow, specifically:
- Replacing std::time::Instant with web_time::Instant (crate: web-time). This is because there is no "current time" implemented for the wasm32-unknown-unknown compilation target, so we have to hook it up to the browser's Performance.now() function. In all other contexts this uses the regular std::time::Instant under the hood so shouldn't make any difference.
- (1 << 63), which needs to be (1 << 31) in 32-bit WebAssembly.

The changes required are:

I was wondering if you would be interested in incorporating this or would prefer to keep this kind of change as a fork?
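For readers who want to see the two changes side by side, here is a minimal sketch. It assumes the web-time crate is declared as a dependency for wasm32 builds; the constant `TOP_BIT` and the helper `elapsed_ms` are illustrative names, not identifiers from differential-dataflow or timely-dataflow.

```rust
// On wasm32 this assumes the `web-time` crate is available; on every other
// target the standard library's Instant is used directly.
#[cfg(target_arch = "wasm32")]
use web_time::Instant;
#[cfg(not(target_arch = "wasm32"))]
use std::time::Instant;

/// Highest bit of `usize`: 1 << 63 on 64-bit targets, 1 << 31 on wasm32.
/// Deriving it from `usize::BITS` avoids hard-coding either width.
const TOP_BIT: usize = 1 << (usize::BITS - 1);

/// Milliseconds elapsed since `start`, using whichever `Instant` was selected.
fn elapsed_ms(start: Instant) -> u128 {
    start.elapsed().as_millis()
}
```

Since web-time already falls back to the standard library outside the browser, the explicit `cfg` split is optional; it is shown here only because a target-gated import is one way to keep non-wasm builds free of the extra dependency.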