Step 1: Figure out use cases? #1
I can also go first: The one time I really wanted workers in Node was when building the coverage tooling for Node core itself. Parsing and serializing the JSON coverage information is a noticeable bottleneck, so being able to distribute that across multiple threads would be nice to have.
The first thing that comes to my mind: off-I/O-thread template rendering.
Code portability.
Seconding @Fishrock123's suggestion of off-I/O-thread template rendering, especially for server-side rendering of React. Along the lines of JSON parsing, I suspect it would allow for much more efficient architectures for things like Babel and module bundlers, since authors could hand off the main-thread-blocking JS parse to a pool of threads (and possibly do so with a cheap object transfer?).
Parsing/generating: for example, http://papaparse.com/ does this efficiently on the client side, using web workers (I guess).
Is that a real thing or something hypothetical?
@vkurchatkin We do have an actual serialization/deserialization API in V8 now, which should be a pretty good start. It’s basically what Chrome uses for its WebWorkers, so that should qualify as “cheap” by most people’s understanding. (I’d like to keep this thread on-topic for use cases; don’t be shy about opening new issues here for anything. People who watch the repo are opting into the noise. ;))
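For context, that V8 serializer is exposed in Node through the built-in `v8` module as `v8.serialize()` and `v8.deserialize()`. A minimal round-trip sketch (the sample object is arbitrary) showing that it handles types plain JSON cannot, such as `Map`, `Set`, `Date`, and typed arrays:

```javascript
const v8 = require('v8');

// An arbitrary value containing types that JSON cannot represent directly.
const original = {
  when: new Date(0),
  tags: new Set(['a', 'b']),
  lookup: new Map([[1, 'one']]),
  bytes: new Uint8Array([1, 2, 3]),
};

// Serialize to a Buffer using V8's structured-clone-style wire format...
const wire = v8.serialize(original);

// ...and reconstruct an independent copy from it.
const copy = v8.deserialize(wire);

console.log(Buffer.isBuffer(wire)); // prints true
console.log(copy.tags.has('b'), copy.lookup.get(1)); // prints true one
console.log(copy === original); // prints false
```

The copy is a fully independent object graph, which is the same behaviour values get when they are posted between threads.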
@addaleax Fair enough. My problem with use cases is that unless a use case is specified well enough, you could argue that anything can be solved with the current multi-process model, including all the examples above. For me, in-process workers are about efficiency and simplicity, not use cases.
IMHO a strong use case, as @matthewp mentioned, is code portability (a.k.a. isomorphism). Sorry to bring up …
Another use case that comes to mind is low-priority tasks (even I/O-bound ones): things like cache hydration, offline batch processing, etc.
For parallelizing tokenization and tree construction in parse5. This could be handy for all parsers, I guess.
Numeric computation: the ability to distribute numeric computational tasks to multiple workers more cheaply, as is common in machine learning and in the analysis of larger datasets (akin to MATLAB's …). With the current model, data has to be copied wholesale between multiple processes. With shared array buffers, workers can operate on the same buffer, allowing better memory efficiency and performance. In short, better environment support for parallel map-reduce-style operations would be highly beneficial as Node.js applications become more computationally intensive.
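The shared-buffer map-reduce idea can be sketched with Node's `worker_threads` module, assuming a Node version where it is available. The input is written once into a `SharedArrayBuffer`, each worker sums its own slice in place (map), and the main thread adds up the partial results (reduce). `parallelSum` and its chunking scheme are illustrative, not an existing API. Because `Atomics` only works on integer arrays, this sketch relies on reading the `Float64Array` partials after each worker's `exit` event; a long-lived worker would need an explicit signal instead.

```javascript
const { Worker } = require('worker_threads');

// Code run inside each worker: sum its slice of the shared input and store the
// partial result in its own slot of the shared output (the input is not copied).
const workerSource = `
  const { workerData } = require('worker_threads');
  const { input, output, index, start, end } = workerData;
  const data = new Float64Array(input);
  const partials = new Float64Array(output);
  let sum = 0;
  for (let i = start; i < end; i++) sum += data[i];
  partials[index] = sum;
`;

function parallelSum(values, workerCount) {
  // Shared input, filled once by the main thread.
  const input = new SharedArrayBuffer(values.length * Float64Array.BYTES_PER_ELEMENT);
  new Float64Array(input).set(values);
  // One shared slot per worker for its partial sum.
  const output = new SharedArrayBuffer(workerCount * Float64Array.BYTES_PER_ELEMENT);
  const chunk = Math.ceil(values.length / workerCount);

  // Map: one worker per contiguous chunk of the input.
  const runs = [];
  for (let index = 0; index < workerCount; index++) {
    const start = Math.min(index * chunk, values.length);
    const end = Math.min(start + chunk, values.length);
    runs.push(new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { input, output, index, start, end }, // SharedArrayBuffers are shared, not cloned
      });
      worker.once('error', reject);
      worker.once('exit', (code) =>
        code === 0 ? resolve() : reject(new Error(`worker exited with code ${code}`)));
    }));
  }

  // Reduce: add the partial sums once every worker has finished.
  return Promise.all(runs).then(() =>
    new Float64Array(output).reduce((acc, x) => acc + x, 0));
}

// Usage: three workers over ten numbers.
parallelSum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3).then((total) => console.log(total)); // prints 55
```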
Compilers like TypeScript and Angular could parallelize parts of their pipelines.
[question] For the parsing and heavy-computation use cases, what do you see as the specific benefits of Worker: multi-core utilization and/or the ability to keep the main thread responsive?
@refack Keeping the main thread responsive, e.g. for a server, not blocking incoming requests.
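A minimal sketch of that pattern with `worker_threads`, assuming the expensive step is parsing a large JSON document; the `amount` field and the `summarizeInWorker` name are made up for illustration. One design choice matters here: the worker posts back only a small summary, because posting the whole parsed object back would structured-clone it on the main thread and give back much of the saving.

```javascript
const { Worker } = require('worker_threads');

// Runs inside the worker: parse the JSON text and compute a small summary.
// The record shape ({ amount }) is hypothetical.
const parserSource = `
  const { parentPort, workerData } = require('worker_threads');
  const records = JSON.parse(workerData);
  const total = records.reduce((acc, r) => acc + r.amount, 0);
  parentPort.postMessage({ count: records.length, total });
`;

// Resolves with the worker's summary; the main event loop stays free
// (e.g. to accept incoming requests) while the parse runs on another thread.
function summarizeInWorker(jsonText) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(parserSource, { eval: true, workerData: jsonText });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Usage: 100,000 records with amounts 0..9 repeating.
const payload = JSON.stringify(Array.from({ length: 100000 }, (_, i) => ({ amount: i % 10 })));
summarizeInWorker(payload).then((summary) => console.log(summary));
// prints { count: 100000, total: 450000 }
```

The input string is still copied into the worker, so this buys responsiveness rather than a reduction in total CPU work.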
@refack Different stages of parsing (e.g. input preprocessing, tokenization, syntactic analysis) can be performed in parallel, thus reducing cumulative parsing time.
Just another example of a worker use case, off the top of my head: parallelizing gulp build tasks. Currently, computationally heavy tasks (e.g. linting, compilation) can't be performed in parallel due to their blocking nature. Workers should significantly reduce build times.
A reference for those who don't "watch" this repo: spin-off discussion on High-level architecture.
In jsdom, we would like this for two reasons:
We could also benefit from it for keeping the main thread responsive and for parallelizing work across multiple files by doing background HTML and CSS parsing, as @inikulin alludes to.
@domenic Couldn't parallelization of file parsing be accomplished via …?
No, because the serialization overhead of sending it over IPC outweighs the benefit.
@domenic The problem is that serialization is required anyway.
Not when using SharedArrayBuffer (or transferring normal ArrayBuffers). And also not when using strings (which are immutable and thus don't need to be serialized).
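The two no-serialization paths for binary data can be seen with Node's `worker_threads` message ports. The sketch below assumes a Node version that provides `receiveMessageOnPort`, and keeps both ends of the channel on the main thread only so it runs synchronously; the same semantics apply when one port belongs to a Worker. Strings, by contrast, are still copied when posted between threads.

```javascript
const { MessageChannel, receiveMessageOnPort } = require('worker_threads');

const { port1, port2 } = new MessageChannel();

// 1. Transferring an ArrayBuffer: ownership moves to the receiver, nothing is
//    copied, and the sender's buffer becomes detached (byteLength 0).
const owned = new ArrayBuffer(1024);
port1.postMessage(owned, [owned]);
const received = receiveMessageOnPort(port2).message;
console.log(owned.byteLength, received.byteLength); // prints 0 1024

// 2. Posting a SharedArrayBuffer: both sides get views onto the same memory,
//    so a write through one is visible through the other.
const shared = new SharedArrayBuffer(4);
port1.postMessage(shared);
const alias = new Int32Array(receiveMessageOnPort(port2).message);
Atomics.store(alias, 0, 42);
console.log(Atomics.load(new Int32Array(shared), 0)); // prints 42

port1.close();
```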
Ah yes, right, I had almost forgotten.
At Housing.com, our Node processes listen to RabbitMQ exchanges to flush and update cached keys that we keep in memory, e.g. the list of whitelisted domains, the list of cities, the list of experiments, etc. It would be great if that task could be offloaded from our main app, with the cache updated through shared buffers or other means. This is one of the use cases where you want background tasks to really run in the background and not on your process's main thread.
At Airbnb, we use https://npmjs.com/hypernova as a long-running Node React rendering service (we only use React, but it can render anything) for our Rails app. Web Workers seem like they would make for a much more efficient sandbox than …
@domenic Immutable but not fixed. Strings are moved around by the garbage collector. They need to be copied out before they can be used in another VM. Your point about ArrayBuffers and SharedArrayBuffers is correct though.
As far as I can tell, there are things you can’t do with Realms that you could do with Workers, like limiting memory usage (which you’ll usually want when running untrusted code is your goal).
Effectively what I want is both control over a Worker and over a Realm, but having a Worker gives me a Realm for free (and I assume that once the Realms API lands, I'll be able to use it in conjunction with creating a Worker, whether on the web or in node).
There are a bunch of use-cases that people identified here and if we zoom out, it all comes down to one high-level feature: Map-reduce with efficient data structure sharing. Almost all problems can be reduced to that: bundlers need worker processes that offload parsing/compilation to other threads/processes. Test runners would like to parallelize test runs. Client side frameworks would like the ability to do server-rendering efficiently and in parallel. Almost all of these are CPU bound and currently slowed down by slow IPC. I'm happy to dogfood any implementation proposals in projects that we work on at Facebook.
This may bring us some new programming models, like the actor model of Erlang and Akka. Some libraries are exploring it, even in single-threaded environments.
Today, we process billions of requests. We use Cluster with n-1 workers, so at least one CPU is available to process incoming requests at the OS level. That being said, as Node.js users, we are interested in keeping the event loops that process requests unblocked. In a typical software model, we have a hot path (the critical path to get to a response as fast as possible) and what can be considered background work. The ideal setup is to have worker event loops process the hot path and pass work to a Background Worker whose sole purpose is to process non-critical yet important functions, such as logging, database reads and writes that can be done in the background, etc. This work can be done after the hot path is complete. The event loop processing requests doesn't have to busy itself with that work, since it is an actor concerned with a different objective. Forgive the ASCII graphic... the model I propose looks something like this:
The outcome is that the Cluster Master does its job, round-robining requests without being blocked and healing/spawning the Cluster Workers that process Internet requests. Every Cluster Worker pushes its non-critical background logic to a Background Worker. This frees Cluster Workers to focus solely on throughput. I recommend a … I think this kind of capability and this specific use case would make Node.js software designs generally more efficient.
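The Background Worker part of this model can be sketched with `worker_threads`: request-handling code posts non-critical records to a long-lived worker and moves on without waiting. `createBackgroundWorker`, the message shapes, and the counting stand-in for real logging are hypothetical. The sketch also assumes one background thread per process (so each Cluster Worker would own one), which is one possible reading of the proposal rather than its specification.

```javascript
const { Worker } = require('worker_threads');

// Runs inside the background worker. It only counts records here; real code
// might batch them to a log file or perform deferred database writes.
const backgroundSource = `
  const { parentPort } = require('worker_threads');
  let handled = 0;
  parentPort.on('message', (msg) => {
    if (msg.type === 'log') {
      handled++;
    } else if (msg.type === 'flush') {
      parentPort.postMessage(handled);
    }
  });
`;

function createBackgroundWorker() {
  const worker = new Worker(backgroundSource, { eval: true });
  return {
    // Fire-and-forget: postMessage returns immediately, so the hot path only
    // pays for cloning the record, not for processing it.
    log(record) {
      worker.postMessage({ type: 'log', record });
    },
    // Messages on one port arrive in order, so the reply to 'flush' reflects
    // every record logged before it. The worker is then shut down.
    flush() {
      return new Promise((resolve, reject) => {
        worker.once('error', reject);
        worker.once('message', (handled) => {
          worker.terminate().then(() => resolve(handled), reject);
        });
        worker.postMessage({ type: 'flush' });
      });
    },
  };
}

// Usage: a request handler logs without waiting for the record to be processed.
const background = createBackgroundWorker();
for (let i = 0; i < 3; i++) background.log({ event: 'request served', i });
background.flush().then((handled) => console.log(handled)); // prints 3
```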
I guess I have a use case, and if not, I hope someone can point me in another direction ;) I have some static data that is loaded and indexed (a specialized tree structure). The service spends almost all of its CPU time traversing this (static) tree. The best scenario I can imagine, as someone with zero threads experience, is if a variable could simply be shared across all instances spawned by the cluster module (maybe in a frozen state), but I guess that's not feasible.
I think @cpojer sums this up well: most of the use cases are basically map-reduce. However, in my case (fero), I essentially have a microservice consuming one log of events, processing it, and producing another to fan out to clients. I wanted to split this so the business logic runs in one thread and another thread picks up the output and writes it to the wire. This seems like the perfect use case for shared memory, since the output is already a Buffer, so offloading the … @addaleax, in response to your question, I think some form of …
/cc @mogill
I had an idea to use these for sharded network communication with external services, mainly over WebSocket. I'm not sure what the benefit of offloading JSON/ETF parsing to a worker would be, mainly because I'm not sure of the actual performance of a good structured clone algorithm, but I think use cases like this should be taken into account.
A few years ago I developed a native addon for Node that provides shared-memory parallelism and addresses some of the issues being discussed here. Extended Memory Semantics (EMS) is based on the Tera MTA/Cray XMT shared-memory parallel programming and execution model, so it supports both loop-level parallelism and heterogeneous fork-join multitasking, but the atomic operations are different from those used with a … The shared-memory parallelism use cases discussed here all share the need for some combination of:
EMS addresses the latter two better than the first; with some syntactic sugar, heterogeneous parallelism could be made more idiomatic. The EMS distribution includes examples ranging from MPI-like bulk synchronous parallelism to OpenMP-like parallel loops to a single HTTP request forking parallel workers that generate a single response. Another project implemented a conventional single async front-end process that shared memory with a parallel back-end process. EMS does not prescribe a source of parallelism. True zero-copy data sharing is not possible in Node because, by definition, each JS process runs in an isolated VM and the only way for data to move in/out of the VM is via copying. The ability to reference data in another VM (read: …) By extension, this is also true for any kind of VM (VirtualBox, Python, JVM, containers): if a process is not performing a copy-in/out of data from the VM, it is probably breaking an assumption somewhere in the VM. Although this architectural limitation is unfortunate, it also means inter-language data sharing has no additional cost, and EMS presently implements data sharing between any number of Node, Python, and C/C++ programs. EMS relies on the OS's virtual memory mechanisms to provide data persistence, which is not traditionally considered an aspect of shared memory but will become more so as non-volatile and software-defined main memory becomes mainstream. Conversion/serialization, or at least copying, of data is almost inevitable, but there is a substantial mechanical advantage to communicating data via shared memory instead of via an OS network communication protocol. Specifically, synchronous shared memory access is faster than the overhead needed to make it asynchronous, and is many orders of magnitude faster than exchanging network communication with a server process.
@p3x-robot Check this out: https://gist.github.com/hellerbarde/2843375. Let's assume that the latency for shared memory as discussed here is 100 ns, whilst for Redis over a 1 Gbps network it would be 20,000 ns. Given that the network stack is involved, I unscientifically imagine that Redis on the same box will be somewhere between 3,000 ns and 20,000 ns. That's an enormous difference, and for the sorts of number-crunching tasks that you'd especially want to parallelise, this could be the difference between being able to give an answer within a few seconds and having to wait minutes or hours.
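To put those assumed latencies into the seconds-versus-minutes terms above, here is the arithmetic for a hypothetical workload of ten million sequential (non-overlapping) accesses. The access count is an illustrative assumption, not a measurement, and the latencies are the ones assumed in the comment.

```javascript
// Assumed per-access latencies from the comment above, in nanoseconds.
const LATENCY_NS = { sharedMemory: 100, redisSameBox: 3000, redisOverNetwork: 20000 };
const NS_PER_SECOND = 1e9;

// Total wall-clock time for `accesses` dependent accesses at a fixed latency.
function totalSeconds(accesses, latencyNs) {
  return (accesses * latencyNs) / NS_PER_SECOND;
}

const accesses = 1e7; // hypothetical: ten million sequential lookups
for (const [name, ns] of Object.entries(LATENCY_NS)) {
  console.log(name, totalSeconds(accesses, ns));
}
// prints (one per line): sharedMemory 1, redisSameBox 30, redisOverNetwork 200
```

Under these assumptions the gap is 1 second versus 200 seconds (over three minutes), and it grows linearly with the number of accesses.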
It seems JSC is going to experiment with shared-state threads at the language level: https://webkit.org/blog/7846/concurrent-javascript-it-can-work/. The memory model is outlined in the blog post, but it's essentially 'simple property access is atomic.' Perhaps not relevant to Node.js short-term but still interesting.
From an application programmer's perspective, I believe the most important aspect of the WebKit proposal is that by default all variables are shared unless explicitly made thread-private. This may be desirable for new code, but given JS's history as a devoutly single-threaded execution model, we might expect few existing frameworks and libraries to work in a shared-by-default execution model. It would be better to separate sharing data from the source of parallelism: a shared-nothing webworker/cluster model benefits from a way to share specific variables in the same way that a shared-everything model benefits from a way to make specific variables private. The difference is whether or not legacy JS code can be used in a parallel region. It would be a lost opportunity to require multi-threaded JS applications to deal with the same challenges as languages that do not provide thread-safe standard libraries (i.e., C++ and STL/Boost).
Though JSC is going to experiment with threads in JS, this is far from being a language standard, not even stage 0 (I don't think other JS engine teams agree with this direction). And it seems impossible to switch the VM in Node.js from V8 to JSC. So it is totally irrelevant.
@hax Not quite. One of Node's longer-term goals is VM agnosticism. node-chakra exists and works; a node-jsc is not out of the question if there is enough interest.
While compiling a list of tools for parallel and/or shared-memory JavaScript … Stepping back to look at how explicit parallelism is used in languages that have supported it for decades (i.e., C/C++, Java), the only commonality is that very few programs actually need multiple cores, and the ones that do always find a way regardless of the tooling. Most of those programs use a small degree of heterogeneous parallelism to achieve what Node.js programmers get from asynchronous I/O and the event loop. Vanishingly few programs parallelize loop-level computation across dozens of cores. I wouldn't consider a lack of responses to this issue's root question of use cases as a lack of interest in parallel JS so much as an artifact of interacting with a diverse and fragmented user base.
Without WebWorker APIs, SharedArrayBuffer is useless.
@jokeyrhyme Just one question: how do you use threads in a server farm? Isn't Redis the fastest?
I think the creator was not thinking of threads, but of something a little bigger, don't you think?
Once you are happy that threads are implemented, you will build a bigger system using server clusters of Node.js and Redis (or something faster, but right now that is the best I know of), and you will forget about threads. I think a C++ addon is good in Node.js, but pure JS will be slow either with or without web workers. Functional: JS … Look at IntelliJ: I love it and it's the only one I use. But it is still slow as hell: one process that freezes; yes, tons of threads, and a big indexing run freezes the program. If they just used another process for indexing, communicated with it, and then reloaded the index, it would be as speedy as Chrome and Node.js.
So I've followed these worker implementation issues for quite a while now, and am happy to see an initial implementation on the horizon. I'm the author of the https://xible.io project (https://github.com/spectrumbroad/xible). It's basically a flow-based or graphical programming environment. So far, that's all fine. I actually track startup performance for xible here: https://xible.io/benchmarks. Another possible solution for my 'issue' is nodejs/node#17058. But having threads where required modules are shared would be absolutely amazing in terms of performance. To circumvent this init time somewhat, I created a setting in xible that lets you configure a flow to always have an empty, initialized fork available. This means that when the time comes to start it, a lot of the pre-work has already been done, as the fork is already up and running. This does, however, use resources (predominantly memory) while not actually being of direct use.
@steve2507 It might be good to hear more about this. For now, Worker startup performance might not be significantly better than what child processes deliver, because a lot of work is spent initializing V8 and Node, similar to child processes. So, for now, the recommendation would be to use Worker thread pools. How does that fit into your situation?
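A minimal sketch of the pool approach recommended here: a fixed number of workers is started once, so the V8/Node bootstrap cost is paid up front, and queued tasks are handed to whichever worker is idle. `WorkerPool` and the Fibonacci task are hypothetical illustrations, not a Node API. For brevity, a worker that throws is dropped rather than replaced, and `close()` assumes all tasks have settled.

```javascript
const { Worker } = require('worker_threads');
const os = require('os');

// Code each pooled worker runs. It is loaded once per worker, so startup cost
// is paid once per worker rather than once per task.
const taskSource = `
  const { parentPort } = require('worker_threads');
  // Hypothetical CPU-bound task: naive Fibonacci.
  const fib = (k) => (k < 2 ? k : fib(k - 1) + fib(k - 2));
  parentPort.on('message', (n) => parentPort.postMessage(fib(n)));
`;

class WorkerPool {
  constructor(size = Math.max(1, os.cpus().length)) {
    this.idle = [];
    this.queue = [];
    for (let i = 0; i < size; i++) this.idle.push(new Worker(taskSource, { eval: true }));
  }

  // Queue a task; the promise resolves with the worker's reply.
  run(input) {
    return new Promise((resolve, reject) => {
      this.queue.push({ input, resolve, reject });
      this._dispatch();
    });
  }

  // Hand queued tasks to idle workers, one task per worker at a time.
  _dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { input, resolve, reject } = this.queue.shift();
      const cleanup = () => {
        worker.off('message', onMessage);
        worker.off('error', onError);
      };
      const onMessage = (result) => {
        cleanup();
        this.idle.push(worker); // worker is free again
        resolve(result);
        this._dispatch();
      };
      const onError = (err) => {
        cleanup(); // the crashed worker is not returned to the pool
        reject(err);
      };
      worker.on('message', onMessage);
      worker.on('error', onError);
      worker.postMessage(input);
    }
  }

  // Terminate the (idle) workers so the process can exit.
  close() {
    return Promise.all(this.idle.map((w) => w.terminate()));
  }
}

// Usage: two workers, three tasks.
const pool = new WorkerPool(2);
Promise.all([25, 26, 27].map((n) => pool.run(n)))
  .then((results) => console.log(results)) // prints [ 75025, 121393, 196418 ]
  .finally(() => pool.close());
```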
@addaleax Thank you for your input! :-) Also, while I would see a pooled worker thread as agnostic to which flow it runs, that would miss out on some of the possible optimizations. A flow consists of nodes, which can also … Running with this setting of forking an extra process can be memory-intensive, by the way. For the deployment pipeline use case, this setting would be off because no one cares about that half-second startup time. Even if it were on, memory is not an issue in this area, because such machines usually do not restrict the amount of memory to a point where this becomes problematic. To give you an example of what I mean by a simple scene, here's one that just turns the lights on, changes the color temperature, and changes the color of the lights: … So yeah, shared-memory threads would be absolutely next-level for me. Eliminating the need for xible and node(pack)s to init/require more than once would cut startup time and memory usage by a massive margin. Thanks again! I really do appreciate it!
I’m closing the existing issues here. If you have feedback about the existing Workers implementation in Node.js 10+, please use #6 for that!
I think one of the first things we’ll want to do is to figure out what the actual use cases for (Web)Workers in Node are, so that we have a better idea of the requirements the API and implementation have to fulfill.
I guess Workers could be applied to anything that needs a fast way to share information across some kind of parallel applications, but a lot of that can already be addressed using multiple processes and standard IPC methods, so what’s going to be most interesting is to hear what one cannot do fast right now without them.