Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock queue if updating toolchain fails #1307

Merged
merged 1 commit into from
Apr 4, 2021

Conversation

Nemo157
Copy link
Member

@Nemo157 Nemo157 commented Mar 6, 2021

Reduces the impact from failures like #1305, rather than continuing to build crates with the failing toolchain we will just stop building crates until an admin has time to investigate the cause.

Example log when this happens:

web_1         | 2021/03/06 10:38:14 [INFO] docs_rs::docbuilder::rustwide_builder: copying essential files for rustc 1.52.0-nightly (caca2121f 2021-03-05)
web_1         | 2021/03/06 10:38:14 [ERROR] docs_rs::docbuilder::queue: Updating toolchain failed, locking queue: couldn't copy '/opt/docsrs/rustwide/builds/essential-files-20210305-1.52.0-nightly-caca2121f/target/doc/FireSans-Medium.woff2' to '/tmp/essential-filessNMWC1/FireSans-Medium.woff2'
web_1         | 2021/03/06 10:38:14 [ERROR] docs_rs::build_queue: Failed to build package serde-1.0.123 from queue: couldn't copy '/opt/docsrs/rustwide/builds/essential-files-20210305-1.52.0-nightly-caca2121f/target/doc/FireSans-Medium.woff2' to '/tmp/essential-filessNMWC1/FireSans-Medium.woff2'
web_1         | Backtrace:    0: failure::backtrace::internal::InternalBacktrace::new
web_1         |    1: <failure::backtrace::Backtrace as core::default::Default>::default
web_1         |    2: rustwide::build::BuildBuilder::run
web_1         |    3: docs_rs::docbuilder::rustwide_builder::RustwideBuilder::add_essential_files
web_1         |    4: docs_rs::docbuilder::rustwide_builder::RustwideBuilder::update_toolchain
web_1         |    5: docs_rs::build_queue::BuildQueue::process_next_crate
web_1         |    6: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
web_1         |    7: docs_rs::utils::queue_builder::queue_builder
web_1         |    8: std::sys_common::backtrace::__rust_begin_short_backtrace
web_1         |    9: core::ops::function::FnOnce::call_once{{vtable.shim}}
web_1         |   10: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
web_1         |              at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/liballoc/boxed.rs:1081
web_1         |       <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
web_1         |              at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/liballoc/boxed.rs:1081
web_1         |       std::sys::unix::thread::Thread::new::thread_start
web_1         |              at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/sys/unix/thread.rs:87
web_1         |   11: start_thread
web_1         |   12: __clone
web_1         |
web_1         | 2021/03/06 10:38:14 [WARN] docs_rs::utils::queue_builder: Lock file exits, skipping building new crates

Reduces the impact from failures like rust-lang#1305, rather than continuing to
build crates with the failing toolchain we will just stop building
crates until an admin has time to investigate the cause.
Comment on lines +90 to +94
if let Err(err) = builder.update_toolchain() {
log::error!("Updating toolchain failed, locking queue: {}", err);
self.lock()?;
return Err(err);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than calling update_toolchain in more places, do you think it makes sense to change build_package to return an error enum so we can tell where the error came from? The current change seems racy: a new nightly could be published between calling update_toolchain here and calling it in build_package.

I'm also slightly concerned that this doesn't handle errors when an admin runs cratesfyi build crate, but I don't know a simple way to handle that, and it seems rare for that to be the first build with a new toolchain anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the approach I tried first, but I got stuck with working out how to make it work with failure, I could take another attempt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, could you maybe add a new error type and try downcasting to that? Don't spend too much time on it :) this should work fine 99.9% of the time.

@jyn514 jyn514 added A-builds Area: Building the documentation for a crate S-waiting-on-author Status: This PR is incomplete or needs to address review comments labels Mar 6, 2021
@jyn514
Copy link
Member

jyn514 commented Apr 4, 2021

I think this is fine to merge - only the queue needs to know what the error type is, and the edge case for it to be racy is less than a few seconds a day:

let mut conn = self.db.get()?;
if !self.should_build(&mut conn, name, version)? {
return Ok(false);
}

@jyn514 jyn514 merged commit 47ff0a3 into rust-lang:master Apr 4, 2021
@Nemo157 Nemo157 deleted the lock-queue-on-update-failure branch May 31, 2023 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-builds Area: Building the documentation for a crate S-waiting-on-author Status: This PR is incomplete or needs to address review comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants