Optimization: Vendor jobserver impl and rm thread spawning in parallel compile_objects #889

NobodyXu · 2023-10-22T02:06:17Z

This PR contains a simpler, vendored implementation of jobserver optimized for cc, it supports non-blocking try_acquire.

By using the non-blocking try_acquire and async-as-state-machine, I was able to completely eliminate usage of additional thread inside parallel version of compile_objects.

thomcc · 2023-10-22T07:00:00Z

This looks great at first glance, but it might not be for a week or two before I can actually review it (it's a big PR).

NobodyXu · 2023-10-22T07:55:16Z

but it might not be for a week or two before I can actually review it (it's a big PR).

That's totally fine, it's huge and change a lot of stuff.

I might also polish it a bit in the meantime and added more testing for the parallel compile_objects to detect regression we previously seen (e.g. passing more objects than the cores, msrv).

thomcc · 2023-10-22T07:55:59Z

Yep, feel free.

src/job_token.rs

src/lib.rs

osiewicz · 2023-10-24T22:58:09Z

Overall I like your approach and I think it solves the issue that we've had with initial channel-based implementation nicely, though I think it would be nice to explain:

why do we have a static token queue in the first place
why we can't really store the tokens in the queue and have to use an external jobserver for that (that's due to Rust not supporting Drop for static).

I didn't look much into the jobserver reimplementation, but I'm assuming it's a verbatim copy of jobserver with try_acquire slapped on top; if so, I'd suggest to state which version it is based on so that one could verify in the future if there's a good reason (e.g. a bugfix) to sync them again.
I also think that it would be pretty cool to split this up a bit.
Overall though this looks pretty cool to me, good job. =)

NobodyXu · 2023-10-24T23:56:06Z

why do we have a static token queue in the first place

The static token queue is for cases where there's no jobserver configured and we need to create a new jobserver.

Instead of porting the code to create a new jobserver, it's much simpler to have a static token queue implemented using atomics.

why we can't really store the tokens in the queue and have to use an external jobserver for that (that's due to Rust not supporting Drop for static).

I think there's some comments in the inherited_jobserver mod.

I didn't look much into the jobserver reimplementation, but I'm assuming it's a verbatim copy of jobserver with try_acquire slapped on top; if so, I'd suggest to state which version it is based on so that one could verify in the future if there's a good reason (e.g. a bugfix) to sync them again.

It's actually based on jobslot, a fork of jobserver.

But even so it's not identical apart from try_acquire, I have simplified the code a bit and I will port that back to jobslot later.

I also think that it would be pretty cool to split this up a bit.

Can you please elaborate on which part to split up?

Overall though this looks pretty cool to me, good job. =)

Thank you and thanks for reviewing my code!

src/job_token.rs

osiewicz · 2023-10-25T00:09:19Z

Oh yeah, I've meant that we should probably have comments for the first 2 points; by static I've meant that we have this JOBSERVER global that we've introduced in the previous PR; what problems does it solve? Why can't we use the job queue there? The answer to both of these questions is obvious to me and you given that we've cooperated and we have the understanding of the problem, but it's not going to be obvious to someone who reads this code for the first time.

Can you please elaborate on which part to split up?

I think the async executor should be in a separate module (something like executor, idk). Backoff logic could be hidden away as well, as this code (IMO) mixes the job scheduling (high level) with polling for the futures.

NobodyXu · 2023-10-25T00:40:37Z

Oh yeah, I've meant that we should probably have comments for the first 2 points;

That makes sense, I will add that.

I think the async executor should be in a separate module (something like executor, idk). Backoff logic could be hidden away as well, as this code (IMO) mixes the job scheduling (high level) with polling for the futures.

That's a good idea!

I will extract anything related to it as a separate module.

NobodyXu · 2023-10-25T11:45:04Z

cc @nrc since this PR creates a small async executor and it might help your "Async fundamentals initiative: portable and interoperable" in adding an small single-thread async executor to libstd.

osiewicz

LGTM

NobodyXu · 2023-10-27T13:46:33Z

It seems that this PR have some MSRV issues, will wait for #890 to be merged and rebase.

NobodyXu · 2023-11-01T12:39:49Z

Running cargo +1.46 c --all-features gives me the following errors:

error[E0658]: or-patterns syntax is experimental
  --> src/job_token/unix.rs:88:17
   |
88 |                 Some(libc::O_RDONLY) | Some(libc::O_RDWR),
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: see issue #54883 <https://github.com/rust-lang/rust/issues/54883> for more information

error[E0658]: or-patterns syntax is experimental
  --> src/job_token/unix.rs:89:17
   |
89 |                 Some(libc::O_WRONLY) | Some(libc::O_RDWR),
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: see issue #54883 <https://github.com/rust-lang/rust/issues/54883> for more information

error[E0658]: use of unstable library feature 'slice_strip'
  --> src/job_token/unix.rs:28:21
   |
28 |                 arg.strip_prefix(b"--jobserver-fds=")
   |                     ^^^^^^^^^^^^
   |
   = note: see issue #73413 <https://github.com/rust-lang/rust/issues/73413> for more information

error[E0658]: use of unstable library feature 'slice_strip'
  --> src/job_token/unix.rs:29:37
   |
29 |                     .or_else(|| arg.strip_prefix(b"--jobserver-auth="))
   |                                     ^^^^^^^^^^^^
   |
   = note: see issue #73413 <https://github.com/rust-lang/rust/issues/73413> for more information

error[E0658]: use of unstable library feature 'slice_strip'
  --> src/job_token/unix.rs:33:31
   |
33 |         if let Some(fifo) = s.strip_prefix(b"fifo:") {
   |                               ^^^^^^^^^^^^
   |
   = note: see issue #73413 <https://github.com/rust-lang/rust/issues/73413> for more information

error[E0599]: no method named `split_once` found for reference `&str` in the current scope
  --> src/job_token/unix.rs:64:31
   |
64 |         let (read, write) = s.split_once(',')?;
   |                               ^^^^^^^^^^ method not found in `&str`

error: aborting due to 6 previous errors

Some errors have detailed explanations: E0599, E0658.
For more information about an error, try `rustc --explain E0599`.
error: could not compile `cc`.

To learn more, run the command again with --verbose.

The first one is rust-lang/rust#79278 stablised in 1.53, the second one is rust-lang/rust#77853 stablised in 1.51 , the last one is rust-lang/rust#81940 stablised in 1.52

Given that rust 1.53 is released on Jun 18, 2021 and it has been more than two years, @thomcc would it makes sense to bump the MSRV (again)?

Also 1.46 has problem with linking on latest MacOS and fetching from crates.io is super slow in CI, so it makes sense to bump MSRV to a newer version.

thomcc · 2023-11-03T23:09:15Z

would it makes sense to bump the MSRV (again)?

I'm fine with it. I'd be okay bumping as new as 1.63 (a bit older than a year, and also compatible with debian stable), but beyond that I don't see a reason for compat with old Rust.

(I'll be actually reviewing this this weekend, just was skimming through the comments)

It supports non-blocking `try_acquire` and is much simpler than the one provided by `jobserver` Signed-off-by: Jiahao XU <[email protected]>

Also fixed compilation errors in mod `job_token` Signed-off-by: Jiahao XU <[email protected]>

Remove use of mpsc since the future is executed on one single thread only. Signed-off-by: Jiahao XU <[email protected]>

The mpsc is stored in a global variable and Rust never calls `Drop::drop` on global variables, so they are never released. This commit removes the mpsc and replaces that with an `AtomicBool` for the implicit token to fix this, also dramatically simplifies the code. Signed-off-by: Jiahao XU <[email protected]>

Signed-off-by: Jiahao XU <[email protected]>

Return `Ok(None)` instead of `Err()` if no token is ready. Signed-off-by: Jiahao XU <[email protected]>

`O_RDWR` is a valid access mode for both read and write end of the pipe. Signed-off-by: Jiahao XU <[email protected]>

Signed-off-by: Jiahao XU <[email protected]>

Co-authored-by: Piotr Osiewicz <[email protected]>

Signed-off-by: Jiahao XU <[email protected]>

thomcc

This is really good. Clean and nice to read, especially for code that could be much more complex. I left a lot of comments but I don't think any should be tough to resolve (anything that is a big change we should punt on until after this release, to avoid releasing too many possibly-breaking things at once).

src/async_executor.rs

src/job_token/unix.rs

thomcc · 2023-11-05T12:59:29Z

src/job_token.rs

+            // Note that we could use `num_cpus` here but it's an extra
+            // dependency that will almost never be used, so
+            // it's generally not too worth it.
+            let mut parallelism = 4;


I think perhaps we could use (at least as an upper bound) https://doc.rust-lang.org/nightly/std/thread/fn.available_parallelism.html if we bump to 1.59.

Did you say that some folks (@dot-asm? Sorry if I'm misremembering) needed support for older Rust than that?

Yes, @dot-asm is against bumping msrv past 1.56

@dot-asm can you make a defense as to why? It would be pretty nice to use available_parallelism here to avoid oversaturating the CPU on machines without a lot of concurrency (which includes a lot of vms/containers and machines people use for CI)

Isn't NUM_JOBS going to be set most of the time when cc is being invoked through Cargo? Thus making parallelism's default value less likely to be used.

That's a fair point. This is fine for now then, but I think we should this in a version or so, even if only when parallelism is active. For now, you're right that it's probably not a big deal.

(I might reach out to @dot-asm to see why they're stuck and how common their case is likely to be though)

Anyway can you change the comment to say that it should use available_parallelism once MSRV is bumped?

Yes, I've added a "TODO" comment.

src/job_token/unix.rs

src/job_token.rs

src/job_token/unix.rs

src/job_token/windows.rs

src/lib.rs

Co-authored-by: Thom Chiovoloni <[email protected]>

Signed-off-by: Jiahao XU <[email protected]>

Since the manual specifies that only `--jobsewrver-auth` will be used and windows does not have the concept of fds anyway. Signed-off-by: Jiahao XU <[email protected]>

Signed-off-by: Jiahao XU <[email protected]>

thomcc · 2023-11-07T07:50:03Z

src/job_token.rs

+            // Note that we could use `num_cpus` here but it's an extra
+            // dependency that will almost never be used, so
+            // it's generally not too worth it.
+            let mut parallelism = 4;


That's a fair point. This is fine for now then, but I think we should this in a version or so, even if only when parallelism is active. For now, you're right that it's probably not a big deal.

(I might reach out to @dot-asm to see why they're stuck and how common their case is likely to be though)

Anyway can you change the comment to say that it should use available_parallelism once MSRV is bumped?

src/job_token/unix.rs

src/job_token/windows.rs

thomcc · 2023-11-07T08:09:44Z

src/job_token/windows.rs

+    THREAD_SYNCHRONIZE, WAIT_ABANDONED, WAIT_FAILED, WAIT_OBJECT_0, WAIT_TIMEOUT,
+};
+
+const WAIT_ABANDOEND_ERR_MSG: &str = r#" The specified object is a mutex object that was not released by the thread that owned the mutex object before the owning thread terminated. Ownership of the mutex object is granted to the calling thread and the mutex state is set to nonsignaled.


This is way more error information than we should provide, and none of it is likely to be useful for us. It's also wrong, because we're waiting on a semaphore object, but it says it was a mutex object, and describes some of the concerns around abandoned mutex objects.

Also, Windows semaphore objects aren't owned by a process/thread (unlike windows mutex objects), so I am pretty sure that WAIT_ABANDONED can never be returned for one anyway.

So we should probably just change this to something like "Wait on jobserver semaphore returned WAIT_ABANDONED" (it probably doesn't need to be const this way too), and add a comment in try_acquire that we believe this to be impossible for a semaphore.

Thanks, I've updated the errmsg and added the comment.

Signed-off-by: Jiahao XU <[email protected]>

thomcc

Thanks for all the work, this is great!

This was referenced Oct 22, 2023

feat: Use channels to maintain job tokens & reuse the implicit token without dropping it first #878

Merged

cargo build hangs when using cc >= 1.0.80 (workaround in crate version 0.84.4) twistedfall/opencv-rust#480

Closed