Nightly installation failures #3390

Closed
JoshLind opened this issue Jun 22, 2023 · 6 comments

@JoshLind

JoshLind commented Jun 22, 2023

Problem

Hi,

We have a number of continuously running CI/CD jobs that install the nightly version of Rust (specifically, so that we can run nightly cargo fmt across our codebase). Recently, these jobs have begun to flake when installing the nightly builds. For example, our scripts run these commands to install Rust nightly:

rustup toolchain install nightly
rustup component add rustfmt --toolchain nightly

Over the last few days, these commands have started failing intermittently with errors like:

+ rustup toolchain install nightly
info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'
info: update not yet available, sorry! try again later
error: toolchain 'nightly-x86_64-unknown-linux-gnu' is not installable
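
Because the error itself says to try again later, one stopgap in CI is to retry the install after a delay. A minimal sketch in shell follows; the `retry` helper and its arguments are hypothetical names introduced here, not part of rustup, and retrying only helps if the failure is transient.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of rustup): runs COMMAND until it succeeds,
# doubling the delay after each failure, and gives up after ATTEMPTS tries.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
    delay=$((delay * 2))
  done
}

# Usage in a CI script (up to 5 tries, starting with a 30-second wait):
#   retry 5 30 rustup toolchain install nightly
#   retry 5 30 rustup component add rustfmt --toolchain nightly
```

The attempt count and initial delay are arbitrary starting points and would need tuning to how long a given CI job can afford to wait.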

We recently upgraded to the Rust 1.70.0 toolchain (which I suspect is when the flakes started). I know we have a couple of options, e.g., pin our nightly build to a known-good date and then rename the toolchain from nightly-DATE to nightly, but we thought it was worth calling out the issue and seeing if folks were aware.

Steps

  1. Use the 1.70.0 Rust toolchain.
  2. Run a script every few hours that calls:
     rustup toolchain install nightly
     rustup component add rustfmt --toolchain nightly
  3. See that some jobs flake with the above error.

Possible Solution(s)

Update all our scripts to pin the nightly installation to a specific date/build and use that?
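
A pinned setup along those lines could look like the sketch below. The function names and the date argument are illustrative assumptions; the date has to be one you have verified ships rustfmt. It installs a dated nightly with the rustfmt component and invokes it explicitly through cargo's `+toolchain` syntax, which sidesteps renaming the toolchain to `nightly`.

```shell
# Hypothetical helpers for pinning CI to one dated nightly.

# install_pinned_nightly YYYY-MM-DD
# Installs nightly-YYYY-MM-DD with the minimal profile plus rustfmt.
install_pinned_nightly() {
  rustup toolchain install "nightly-$1" --profile minimal --component rustfmt
}

# fmt_check_with_nightly YYYY-MM-DD
# Runs the pinned nightly's rustfmt over the workspace without modifying files.
fmt_check_with_nightly() {
  cargo "+nightly-$1" fmt --all -- --check
}

# Usage (the date is a placeholder, not a known-good build):
#   install_pinned_nightly 2023-06-20
#   fmt_check_with_nightly 2023-06-20
```

The trade-off is that the pinned date has to be bumped by hand, so formatting behavior changes only when someone chooses to update it.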

Notes

In the ideal case we'd have an option in rustup to install the most recent nightly that works, e.g., rustup install nightly --any-recent-version, which could essentially just loop back through all the builds until one of them works, but I recognize this is likely a niche ask 😄
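
Until rustup offers something like that, the walk-back can be approximated in a script. This is a sketch under assumptions: `recent_nightlies` and `install_recent_nightly` are made-up names, the `--any-recent-version` flag above is only a proposal and does not exist, and the date arithmetic relies on GNU `date -d` as found on typical Linux runners.

```shell
# recent_nightlies DAYS [START_DATE]
# Prints dated nightly toolchain names, newest first, for DAYS days
# counting back from START_DATE (default: today, UTC). Requires GNU date.
recent_nightlies() {
  local days="$1" start="${2:-$(date -u +%F)}" i=0
  while [ "$i" -lt "$days" ]; do
    printf 'nightly-%s\n' "$(date -u -d "$start - $i days" +%F)"
    i=$((i + 1))
  done
}

# install_recent_nightly [DAYS] [START_DATE]
# Tries each candidate in turn and prints the first toolchain that installs.
# Returns non-zero if none of the candidates could be installed.
install_recent_nightly() {
  local days="${1:-7}" tc
  for tc in $(recent_nightlies "$days" "${2:-}"); do
    # Send rustup's own output to stderr so stdout carries only the name.
    if rustup toolchain install "$tc" --profile minimal --component rustfmt >&2; then
      echo "$tc"
      return 0
    fi
  done
  echo "no installable nightly in the last $days days" >&2
  return 1
}

# Usage:
#   toolchain="$(install_recent_nightly 7)" && cargo "+$toolchain" fmt --all
```

Note that this makes the toolchain in use vary from run to run, so formatting results may differ between jobs that land on different dates.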

Rustup version

info: This is the version for the rustup toolchain manager, not the rustc compiler.
rustup 1.26.0 (5af9b9484 2023-04-05)
info: The currently active `rustc` version is `rustc 1.70.0 (90c541806 2023-05-31)`

Installed toolchains

Default host: x86_64-unknown-linux-gnu
rustup home:  /home/runner/.rustup

@rbtcollins
Contributor

Not a Rustup problem - see dtolnay/rust-toolchain#88

I'm not sure what rust-lang project to move this onto, but I'm inclined to close this particular report - even though rustup is the visible face of it, it isn't the driving factor.

@jdno
Member

jdno commented Jun 23, 2023

Hi folks,

Thanks for raising this and sorry for the inconvenience. This is almost certainly a bug in our infrastructure. We added Fastly as a CDN last week, and it seems we are not invalidating the cache consistently. I've implemented a fix and will roll it out as soon as possible.

If you want to move the issue, rust-lang/infra-team is a good place for it. Otherwise we can leave it open until the fix is confirmed and close it after the weekend.

Here is a link to the discussion in Zulip for context: https://rust-lang.zulipchat.com/#narrow/stream/242791-t-infra/topic/flaky.20rustup.20downloads

@JoshLind
Author

JoshLind commented Jun 23, 2023

Awesome, thanks for the fast replies @rbtcollins and @jdno 😄 Happy to move and/or close this when you feel appropriate.

@jdno
Member

jdno commented Jun 26, 2023

I've deployed a few changes to how we invalidate the cache on Fastly, which should fix the inconsistencies that caused the install failures. I'll check the cache tomorrow morning after the next nightly has been released to confirm this.

If anyone experiences the issue again after tonight's nightly is published, please let me know so that I can investigate. 🙂

@jdno
Member

jdno commented Jun 27, 2023

The cache looks consistent this morning and all Fastly nodes are serving the same content as CloudFront. I expect this to fix the flaky installation issues, but please let me know over the coming days if that isn't the case.

jdno self-assigned this Jun 27, 2023

@jdno
Member

jdno commented Jul 14, 2023

We've received no new reports of this issue since refactoring our cache invalidation two weeks ago, so I'm going to close this issue. If anyone does experience this problem again, please report it in the #t-infra stream on Zulip or here in the issue.

👋
