-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization: Vendor jobserver impl and rm thread spawning in parallel compile_objects #889
Conversation
This looks great at first glance, but it might not be for a week or two before I can actually review it (it's a big PR). |
That's totally fine, it's huge and change a lot of stuff. I might also polish it a bit in the meantime and added more testing for the parallel |
Yep, feel free. |
Overall I like your approach and I think it solves the issue that we've had with initial channel-based implementation nicely, though I think it would be nice to explain:
I didn't look much into the jobserver reimplementation, but I'm assuming it's a verbatim copy of |
The static token queue is for cases where there's no jobserver configured and we need to create a new jobserver. Instead of porting the code to create a new jobserver, it's much simpler to have a static token queue implemented using atomics.
I think there's some comments in the inherited_jobserver mod.
It's actually based on jobslot, a fork of jobserver. But even so it's not identical apart from try_acquire, I have simplified the code a bit and I will port that back to jobslot later.
Can you please elaborate on which part to split up?
Thank you and thanks for reviewing my code! |
Oh yeah, I've meant that we should probably have comments for the first 2 points; by static I've meant that we have this
I think the async executor should be in a separate module (something like |
That makes sense, I will add that.
That's a good idea! I will extract anything related to it as a separate module. |
cc @nrc since this PR creates a small async executor and it might help your "Async fundamentals initiative: portable and interoperable" in adding an small single-thread async executor to libstd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It seems that this PR have some MSRV issues, will wait for #890 to be merged and rebase. |
Running
The first one is rust-lang/rust#79278 stablised in 1.53, the second one is rust-lang/rust#77853 stablised in 1.51 , the last one is rust-lang/rust#81940 stablised in 1.52 Given that rust 1.53 is released on Jun 18, 2021 and it has been more than two years, @thomcc would it makes sense to bump the MSRV (again)? Also 1.46 has problem with linking on latest MacOS and fetching from crates.io is super slow in CI, so it makes sense to bump MSRV to a newer version. |
I'm fine with it. I'd be okay bumping as new as 1.63 (a bit older than a year, and also compatible with debian stable), but beyond that I don't see a reason for compat with old Rust. (I'll be actually reviewing this this weekend, just was skimming through the comments) |
It supports non-blocking `try_acquire` and is much simpler than the one provided by `jobserver` Signed-off-by: Jiahao XU <[email protected]>
Also fixed compilation errors in mod `job_token` Signed-off-by: Jiahao XU <[email protected]>
Remove use of mpsc since the future is executed on one single thread only. Signed-off-by: Jiahao XU <[email protected]>
The mpsc is stored in a global variable and Rust never calls `Drop::drop` on global variables, so they are never released. This commit removes the mpsc and replaces that with an `AtomicBool` for the implicit token to fix this, also dramatically simplifies the code. Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
Return `Ok(None)` instead of `Err()` if no token is ready. Signed-off-by: Jiahao XU <[email protected]>
`O_RDWR` is a valid access mode for both read and write end of the pipe. Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
Co-authored-by: Piotr Osiewicz <[email protected]>
Co-authored-by: Piotr Osiewicz <[email protected]>
Co-authored-by: Piotr Osiewicz <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really good. Clean and nice to read, especially for code that could be much more complex. I left a lot of comments but I don't think any should be tough to resolve (anything that is a big change we should punt on until after this release, to avoid releasing too many possibly-breaking things at once).
// Note that we could use `num_cpus` here but it's an extra | ||
// dependency that will almost never be used, so | ||
// it's generally not too worth it. | ||
let mut parallelism = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think perhaps we could use (at least as an upper bound) https://doc.rust-lang.org/nightly/std/thread/fn.available_parallelism.html if we bump to 1.59.
Did you say that some folks (@dot-asm? Sorry if I'm misremembering) needed support for older Rust than that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @dot-asm is against bumping msrv past 1.56
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dot-asm can you make a defense as to why? It would be pretty nice to use available_parallelism here to avoid oversaturating the CPU on machines without a lot of concurrency (which includes a lot of vms/containers and machines people use for CI)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't NUM_JOBS
going to be set most of the time when cc
is being invoked through Cargo? Thus making parallelism
's default value less likely to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. This is fine for now then, but I think we should this in a version or so, even if only when parallelism
is active. For now, you're right that it's probably not a big deal.
(I might reach out to @dot-asm to see why they're stuck and how common their case is likely to be though)
Anyway can you change the comment to say that it should use available_parallelism once MSRV is bumped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've added a "TODO" comment.
Co-authored-by: Thom Chiovoloni <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
Since the manual specifies that only `--jobsewrver-auth` will be used and windows does not have the concept of fds anyway. Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
// Note that we could use `num_cpus` here but it's an extra | ||
// dependency that will almost never be used, so | ||
// it's generally not too worth it. | ||
let mut parallelism = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. This is fine for now then, but I think we should this in a version or so, even if only when parallelism
is active. For now, you're right that it's probably not a big deal.
(I might reach out to @dot-asm to see why they're stuck and how common their case is likely to be though)
Anyway can you change the comment to say that it should use available_parallelism once MSRV is bumped?
src/job_token/windows.rs
Outdated
THREAD_SYNCHRONIZE, WAIT_ABANDONED, WAIT_FAILED, WAIT_OBJECT_0, WAIT_TIMEOUT, | ||
}; | ||
|
||
const WAIT_ABANDOEND_ERR_MSG: &str = r#" The specified object is a mutex object that was not released by the thread that owned the mutex object before the owning thread terminated. Ownership of the mutex object is granted to the calling thread and the mutex state is set to nonsignaled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is way more error information than we should provide, and none of it is likely to be useful for us. It's also wrong, because we're waiting on a semaphore object, but it says it was a mutex object, and describes some of the concerns around abandoned mutex objects.
Also, Windows semaphore objects aren't owned by a process/thread (unlike windows mutex objects), so I am pretty sure that WAIT_ABANDONED can never be returned for one anyway.
So we should probably just change this to something like "Wait on jobserver semaphore returned WAIT_ABANDONED"
(it probably doesn't need to be const this way too), and add a comment in try_acquire that we believe this to be impossible for a semaphore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've updated the errmsg and added the comment.
Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work, this is great!
This PR contains a simpler, vendored implementation of jobserver optimized for cc, it supports non-blocking
try_acquire
.By using the non-blocking
try_acquire
and async-as-state-machine, I was able to completely eliminate usage of additional thread inside parallel version ofcompile_objects
.