let v8 use libuv thread pool #11855

Closed
jiakai1000 opened this issue Mar 15, 2017 · 36 comments
Labels: libuv, question, v8 engine

Comments

@jiakai1000

Is it possible to let V8 use the libuv thread pool? That would reduce the number of extra threads.

@mscdex
Contributor

mscdex commented Mar 15, 2017

If that were possible, doing so would further reduce the performance of fs, dns.lookup() (the default method Node uses internally), etc.

mscdex added the libuv, question, and v8 engine labels on Mar 15, 2017
@jiakai1000
Author

@mscdex We can increase the libuv thread count.
Also, doing so would let us balance the thread pool load better.

@mscdex
Contributor

mscdex commented Mar 15, 2017

Sure, you can increase it, but I would guess most people use the default (currently 4).
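For reference, the pool size is already tunable per process via the UV_THREADPOOL_SIZE environment variable. A minimal sketch (the size of 16 is just illustrative):

```js
// The libuv pool size comes from UV_THREADPOOL_SIZE, read when the pool is
// first used. The usual way to raise it is from the shell:
//
//   UV_THREADPOOL_SIZE=16 node app.js
//
// Setting it at the very top of the entry script generally works too, because
// libuv initializes the pool lazily on the first queued work item.
process.env.UV_THREADPOOL_SIZE = '16'; // must run before any fs/dns/crypto pool work
const fs = require('fs');
fs.stat(__filename, (err, st) => console.log(err || `size: ${st.size}`));
```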

@jiakai1000
Author

Well, we could also increase the default libuv thread count...

@mscdex
Contributor

mscdex commented Mar 15, 2017

I'm not 100% sure but I think 4 may have been chosen as that is/was the typical number of cores/cpus on most machines?

@jasnell
Member

jasnell commented Mar 15, 2017

While increasing the default will hopefully be something we can do at some point, 4 is still the most common average if I'm remembering correctly.

@hashseed
Member

In any case, having V8 use threads outside of the thread pool seems wrong. If 4 is the right default, you would in fact use more than 4 if V8 spins up its own threads. If you expect 4 + additional V8 threads to be a good number, then 4 is chosen too conservatively.

@mscdex
Contributor

mscdex commented Mar 15, 2017

@hashseed I'm not sure what the performance difference is (waiting for a libuv thread vs. OS scheduler), but if V8 were to use the libuv thread pool, some node requests would/could get blocked (even more so than they may be currently), whereas they may not have before.
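An analogous contention is already observable today with CPU-bound work that goes through the shared pool. A minimal sketch, assuming the default pool size of 4 (the pbkdf2 parameters are arbitrary and chosen only to burn CPU):

```js
const crypto = require('crypto');
const fs = require('fs');

// Four expensive key derivations occupy all four default libuv pool threads...
for (let i = 0; i < 4; i++) {
  crypto.pbkdf2('secret', 'salt', 1e6, 64, 'sha512', () => {});
}

// ...so this unrelated stat has to wait for a pool thread to free up.
console.time('fs.stat waited on the pool');
fs.stat(__filename, () => console.timeEnd('fs.stat waited on the pool'));
```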

@hashseed
Member

FWIW V8 hands over thread management to Chrome when embedded in Chrome.

@jiakai1000
Author

@hashseed Could you please explain the reasoning behind what Chrome does?

@mscdex
Contributor

mscdex commented Mar 15, 2017

Node is quite a different thing than Chrome though ;-)

@hashseed
Member

hashseed commented Mar 15, 2017

@jeisinger probably knows more.

IIUC, V8 simply prefers to let the embedder take control. In the case of Chrome, page start-up can be very busy with respect to threading, and Chrome's scheduler likely has a better grasp of priorities than the OS scheduler.

@jeisinger
Contributor

In the past, each V8 isolate would create its own worker pool. To fix that, I moved the worker pool to a single v8::Platform instance.

Furthermore, Chrome uses a central scheduler to, e.g., throttle tasks in background tabs or give rendering-related tasks a higher priority when the frame deadline approaches. To achieve this, it uses a custom v8::Platform implementation.

@bnoordhuis
Member

FWIW, the reason V8 has its own threadpool is that it was the least amount of work. The plan was always to come up with something better.

What constitutes 'better' is something of an open question, though. Blindly shoveling it into the libuv threadpool will probably cause a performance hit.

@jorangreef
Contributor

> I'm not 100% sure but I think 4 may have been chosen as that is/was the typical number of cores/cpus on most machines?

I think the conventional wisdom is to set threads = number of cores, which holds when those threads are used for CPU-intensive tasks: running 4 threads on 4 cores reduces the cost of context switching, and you would then typically pin each thread to a core, etc.

But the threads = number of cores idea only holds when those threads are used for CPU-intensive tasks. In reality, as far as I understand, Node mostly uses the threadpool to simulate non-blocking disk and network operations where the underlying system does not provide a decent non-blocking API, so Node's threads are not "run hot".

Therefore, for Node, it would actually make much more sense to increase the default number of threads available for disk and network operations. This would increase concurrency and reduce head-of-line blocking caused by slow DNS requests or degraded disks. The time-scale of context switches in this async use case is dwarfed by the time-scale of disk or network I/O, and the memory overhead of libuv threads is low: 128 threads should cost about 1 MB in total, according to the libuv docs (http://docs.libuv.org/en/v1.x/threadpool.html).
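A small sketch of the head-of-line blocking being described, assuming the default pool of 4 and a resolver that is slow to answer (the host names are placeholders):

```js
const dns = require('dns');
const fs = require('fs');

// With UV_THREADPOOL_SIZE=4, four slow getaddrinfo() calls tie up every pool
// thread, since dns.lookup() runs on the threadpool...
for (let i = 0; i < 4; i++) {
  dns.lookup(`slow-host-${i}.example.com`, () => {}); // placeholder names
}

// ...and this unrelated disk operation is queued behind them. A larger pool
// would let it start immediately.
console.time('fs.readFile queued behind dns.lookup');
fs.readFile(__filename, () => console.timeEnd('fs.readFile queued behind dns.lookup'));
```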

@bnoordhuis
Member

@jorangreef What you say is true, but there is a caveat: system calls can consume significant kernel-mode CPU time. I've experimented with the approach you suggest, but performance for some operations falls off a cliff when you have many more threads than cores.

Example: stat(2) on deep (or deeply symlinked) file paths. The inode metadata is probably in the dirent cache but traversing it can still be costly. If you have many threads hammering the cache, performance degrades in worse than linear fashion.

Most programs would not exhibit such pathological behavior but a specialized tool like a backup program might (and people do write such things in node.js.)
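A sketch of the pathological pattern being described (the path is a placeholder; a real backup walker would be statting entries it discovers):

```js
const fs = require('fs');

// A backup-style walker firing stats at a very deep tree. With a large
// UV_THREADPOOL_SIZE, many threads walk the same long directory chain at
// once, and the time goes to kernel-mode path traversal rather than useful I/O.
const deepPath = '/srv/backup/' + 'nested/'.repeat(40) + 'file.dat'; // placeholder path
for (let i = 0; i < 10000; i++) {
  fs.stat(deepPath, () => {}); // each call re-resolves every path component
}
```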

@jmshinn

jmshinn commented Jul 18, 2017

@bnoordhuis Would it not then make sense to let the writers of such problematic tools reduce the thread count to equal the core count, rather than force most people, working on simpler things, to figure out that they can increase the thread count to some unstudied number?

@gibfahn
Member

gibfahn commented Jul 18, 2017

@bnoordhuis would it be possible to dynamically scale the thread count without incurring excessive overhead? I'm guessing you could be quite conservative and only change the number when you start to run into heavy performance problems.

I remember @sam-github saying that the node event loop time was a pretty good indicator of how your app was performing, what would the equivalent be for this?

I guess my question is: is it just a matter of doing the work, or is it likely to result in a net performance decrease in most situations?
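For the event-loop-time indicator mentioned above, a rough sketch of how it is usually measured (the interval and threshold are arbitrary); note that it reflects main-thread load rather than threadpool saturation, which is the part with no obvious equivalent:

```js
const INTERVAL = 100; // ms
let last = Date.now();
setInterval(() => {
  const now = Date.now();
  const lag = now - last - INTERVAL; // how late this tick fired
  if (lag > 50) console.warn(`event loop lag ~${lag}ms`);
  last = now;
}, INTERVAL);
```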

@bnoordhuis
Member

Libuv could use the work queue size or the average flight time of a work request as an input signal but that doesn't distinguish between threads waiting for I/O and threads doing computation.

For I/O-bound work loads num_threads > num_cpus is usually acceptable, for CPU-bound loads it's not. Computing bitcoin hashes in 1,000-fold parallel on a machine that only has 8 cores is not efficient.

Libuv knows what category its own work items belong to (fs, dns - all I/O) but it doesn't know that for external work items that come in through uv_queue_work(), which is what crypto.randomBytes() and the zlib functions use, for example.

What's more, libuv only knows about the current process but a common use case with node is to spawn multiple processes. You don't want to get into a perfect storm situation where multiple processes think "hey, throughput times are going up, let's create some more threads."

It's not intractable, but fixing it in either node[1] or libuv will involve a large amount of engineering for an uncertain payoff.

Perhaps determining the ideal thread pool size is more of an ops thing. We should expose hooks but leave the tuning to the programmer or the system administrator.

[1] Node could side-step libuv, use its own thread pool, and orchestrate with other node processes over IPC.
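To illustrate the classification point above: crypto.randomBytes() and the zlib functions reach the pool through uv_queue_work(), so libuv sees them as opaque work items even though they are pure computation. A minimal sketch:

```js
const crypto = require('crypto');
const zlib = require('zlib');

// CPU/entropy work, not I/O, but indistinguishable from fs or dns work to libuv.
crypto.randomBytes(1024, (err, buf) => {});

// CPU-bound compression, likewise queued as an opaque external work item.
zlib.gzip(Buffer.alloc(1 << 20), (err, out) => {});
```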

@jeisinger
Contributor

Note that #14001 switches V8 to libuv's threadpool.

/cc @matthewloring

@TimothyGu
Member

@bnoordhuis Is it currently possible to provide libuv with a hint to limit CPU-heavy threads while allowing more I/O-heavy threads? V8 has an ExpectedRuntime enum we could maybe hook onto.

@bnoordhuis
Member

@TimothyGu Not at the moment.

@jeisinger
Contributor

That enum is not used by V8. We only added it because Chrome used to use the Windows worker pool, and the enum reflects its WT_EXECUTELONGFUNCTION flag.

@jorangreef
Contributor

The problem is that a single threadpool is used for IO and CPU threads.

If there were two threadpools, one could be used for IO (and have many threads) and the other could be used for CPU (and have threads === cores).

This would make tuning possible. Currently, there's no way to tune the threadpool for both use cases.

If libuv knows that all its threads are IO only, then the above can be rolled out as follows:

  1. All AsyncWorker, Node and other userland threads continue to use the current threadpool. This becomes known as the CPU threadpool and keeps the existing 4 thread default.

  2. Add a new "IO" threadpool with a saner, conservative yet tuneable default (perhaps 16).

  3. Libuv moves only its own known IO ops into this new threadpool and exposes a way for userland to do the same if it wants to.
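A purely hypothetical sketch of what tuning might look like under such a split; neither the second pool nor the UV_IO_THREADPOOL_SIZE name exists in libuv or Node today:

```js
// Hypothetical knobs only (not real):
//
//   UV_THREADPOOL_SIZE=4 UV_IO_THREADPOOL_SIZE=16 node app.js
//
// where the existing pool keeps its 4-thread default for CPU-oriented work
// (AsyncWorker, uv_queue_work(), crypto, zlib) and the new, larger pool runs
// libuv's own fs/dns I/O plus any userland work explicitly flagged as I/O-bound.
```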

@bnoordhuis
Member

@jorangreef We've been discussing such schemes in libuv since the thread pool was first added. You can find at least two attempts in my fork, maybe more.

Both attempts foundered on not performing significantly better most of the time while performing significantly worse some of the time. :-/
