QNX Neutrino: exponential backoff when fork/spawn needs a retry #109432

Merged
3 commits merged on Jun 3, 2023

flba-eb
Contributor

@flba-eb flba-eb commented Mar 21, 2023

Fixes #108594: when retrying, sleep for an exponentially increasing duration. If the sleep duration is lower than the minimum possible sleeping time, yield instead (this will not happen often, because the duration grows exponentially).

The minimum possible sleeping time is determined using libc::clock_getres, but only once spawn/fork has failed for the first time in a request. The result is cached using a LazyLock.
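For readers who want to see the shape of the idea, here is a minimal sketch of the yield-or-sleep decision described above. It is not the code from this PR: the names `min_sleep_duration`, `Wait`, `plan_wait`, and `wait_and_double` are hypothetical, the 1 ms resolution is an assumed placeholder for whatever `libc::clock_getres` would report, and `OnceLock` is used as a stand-in for the `LazyLock` cache mentioned in the description.

```rust
use std::sync::OnceLock;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the clock resolution lookup.
/// The value (1 ms) is an assumption for illustration only; the PR
/// description says the real value comes from `libc::clock_getres`.
/// The result is computed once, on first use, and then cached.
fn min_sleep_duration() -> Duration {
    static MIN: OnceLock<Duration> = OnceLock::new();
    *MIN.get_or_init(|| Duration::from_millis(1))
}

/// What to do before the next retry.
#[derive(Debug, PartialEq)]
enum Wait {
    /// The delay is shorter than the shortest possible sleep: just yield.
    Yield,
    /// The delay is long enough to sleep for it.
    Sleep(Duration),
}

/// Decide between yielding and sleeping for the given delay.
fn plan_wait(delay: Duration) -> Wait {
    if delay < min_sleep_duration() {
        Wait::Yield
    } else {
        Wait::Sleep(delay)
    }
}

/// Wait once according to `plan_wait`, then double the delay
/// so that the next retry waits twice as long.
fn wait_and_double(delay: &mut Duration) {
    match plan_wait(*delay) {
        Wait::Yield => thread::yield_now(),
        Wait::Sleep(d) => thread::sleep(d),
    }
    *delay *= 2;
}
```

Because the lookup runs lazily, a process that never sees a failed fork/spawn never pays for it, which matches the intent stated in the description.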

CC @gh-tr

r? @workingjubilee
@rustbot label +O-neutrino

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. O-neutrino OS: QNX Neutrino, a POSIX-compatible real-time operating system labels Mar 21, 2023
@flba-eb flba-eb marked this pull request as ready for review March 21, 2023 18:10
@rustbot
Collaborator

rustbot commented Mar 21, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

Member

@workingjubilee workingjubilee left a comment

Sorry about the long turnaround time, April was apparently not great for thinking.

library/std/src/sys/unix/process/process_unix.rs (outdated, resolved)
library/std/src/sys/unix/process/process_unix.rs (outdated, resolved)
Comment on lines +194 to +202
delay *= 2;
continue;
Member

I would like it if we still bailed for ErrorKind::WouldBlock or ErrorKind::Interrupted after a while, probably if we're getting up to actual seconds between yields.

@workingjubilee
Member

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 5, 2023
@flba-eb
Contributor Author

flba-eb commented Jun 2, 2023

Thank you very much @workingjubilee for the good feedback! Returning ErrorKind::WouldBlock instead of sleeping for seconds sounds like a good idea.

@flba-eb flba-eb force-pushed the 108594_forkspawn_exponential_backoff branch from 9b71cad to 716cc5a Compare June 2, 2023 15:53
@flba-eb
Contributor Author

flba-eb commented Jun 2, 2023

Requested changes have been implemented but should be tested before merging them. Branch is rebased; requested changes are in a separate commit to make the review easier.

@flba-eb
Contributor Author

flba-eb commented Jun 2, 2023

r? @workingjubilee
Requested changes are done and tested on QNX Neutrino.

@rustbot
Collaborator

rustbot commented Jun 2, 2023

Could not assign reviewer from: workingjubilee.
User(s) workingjubilee are either the PR author or are already assigned, and there are no other candidates.
Use r? to specify someone else to assign.

@flba-eb
Contributor Author

flba-eb commented Jun 2, 2023

@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 2, 2023
@flba-eb
Contributor Author

flba-eb commented Jun 2, 2023

@workingjubilee What do you think about giving up after one second? Does it make sense to document this somewhere as QNX Neutrino specific behavior? Where?

@workingjubilee
Member

Yeah, 1 second as the upper bound seems fine. If you wish to document it on the platform support page (for now? maybe we should have a "platform quirks" page somewhere...), then go ahead. It's such an unlikely circumstance, even under kernel stress, that I think having this surface in Rust programs as an unhandled "glitch" is fine, i.e. real programs usually won't hit this. However, I feel I should note that EAGAIN is actually specifically defined as a plausible POSIX error for posix_spawn, because its failure modes include "all the reasons fork/exec would fail", and fork can EAGAIN.
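The give-up rule discussed here (stop retrying once the delay reaches an upper bound and report `ErrorKind::WouldBlock`) can be sketched roughly as follows. This is an illustrative sketch only, not the merged implementation: `retry_with_backoff` is a hypothetical helper, the closure stands in for the real fork/spawn call, the starting delay of 1 ns is an assumption, and the yield-below-clock-resolution step is left out to keep the example short.

```rust
use std::io::{self, ErrorKind};
use std::thread;
use std::time::Duration;

/// Retry `attempt` with a doubling delay for as long as it reports
/// `ErrorKind::WouldBlock`. Once the delay has grown to `max_delay`
/// (for example one second), stop retrying and return `WouldBlock`.
/// Any other result, success or error, is returned immediately.
fn retry_with_backoff<T>(
    mut attempt: impl FnMut() -> io::Result<T>,
    max_delay: Duration,
) -> io::Result<T> {
    // Assumed starting delay; the value is illustrative only.
    let mut delay = Duration::from_nanos(1);
    loop {
        match attempt() {
            Err(e) if e.kind() == ErrorKind::WouldBlock => {
                if delay >= max_delay {
                    // Give up instead of sleeping for seconds.
                    return Err(io::Error::new(
                        ErrorKind::WouldBlock,
                        "operation kept failing; giving up after backoff",
                    ));
                }
                thread::sleep(delay);
                delay *= 2;
            }
            other => return other,
        }
    }
}
```

A caller would pass the operation as a closure, for example `retry_with_backoff(|| try_spawn(), Duration::from_secs(1))`, where `try_spawn` is a placeholder for whatever fallible call is being retried. Since the delays double, the total time spent sleeping before giving up stays below twice the upper bound.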

@workingjubilee
Member

workingjubilee commented Jun 3, 2023

Given that, it might actually make more sense to document this as a possible error for all platforms? But any changes for documenting spawn's errors for all platforms should be in another PR, and you might want to document it as specifically likely for Neutrino anyways.

@flba-eb
Contributor Author

flba-eb commented Jun 3, 2023

Yes, I think this should be documented in another PR. With this patch I was no longer able to reproduce the issue at all (before, the UI test suite always had 4-10 failures). As soon as we're sleeping, the problem basically goes away (even when reducing the give-up threshold to milliseconds). After thinking about it, I would not even mention it in nto-qnx.md anymore. Would you agree?

@workingjubilee
Member

Sure! Then let's go with this.

@bors r+

@bors
Contributor

bors commented Jun 3, 2023

📌 Commit 716cc5a has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 3, 2023
@bors
Contributor

bors commented Jun 3, 2023

⌛ Testing commit 716cc5a with merge 1e17cef...

@bors
Contributor

bors commented Jun 3, 2023

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 1e17cef to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 3, 2023
@bors bors merged commit 1e17cef into rust-lang:master Jun 3, 2023
@rustbot rustbot added this to the 1.72.0 milestone Jun 3, 2023
@rust-timer
Collaborator

Finished benchmarking commit (1e17cef): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.2% | [2.2%, 2.2%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.3% | [-4.3%, -1.3%] | 8 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.3% | [-4.3%, -1.3%] | 8 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 647.289s -> 647.076s (-0.03%)

@flba-eb flba-eb deleted the 108594_forkspawn_exponential_backoff branch June 27, 2023 10:25
Labels
merged-by-bors This PR was explicitly merged by bors. O-neutrino OS: QNX Neutrino, a POSIX-compatible real-time operating system S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

Spawning / Forking a process on QNX Neutrino can fail
5 participants