Skip to content

Commit

Permalink
Auto merge of #2486 - alexcrichton:flock, r=brson
Browse files Browse the repository at this point in the history
Fix running Cargo concurrently

Cargo has historically had no protections against running it concurrently. This
is pretty unfortunate, however, as it essentially just means that you can only
run one instance of Cargo at a time **globally on a system**.

An "easy solution" to this would be the use of file locks, except they need to
be applied judiciously. It'd be a pretty bad experience to just lock the entire
system globally for Cargo (although it would work), but otherwise Cargo must be
principled how it accesses the filesystem to ensure that locks are properly
held. This commit intends to solve all of these problems.

A new utility module is added to cargo, `util::flock`, which contains two types:

* `FileLock` - a locked version of a `File`. This RAII guard will unlock the
  lock on `Drop` and I/O can be performed through this object. The actual
  underlying `Path` can be read from this object as well.
* `Filesystem` - an unlocked representation of a `Path`. There is no "safe"
  method to access the underlying path without locking a file on the filesystem
  first.

Built on the [fs2] library, these locks use the `flock` system call on Unix and
`LockFileEx` on Windows. Although file locking on Unix is [documented as not so
great][unix-bad], but largely only because of NFS, these are just advisory, and
there's no byte-range locking. These issues don't necessarily plague Cargo,
however, so we should try to leverage them. On both Windows and Unix the file
locks are released when the underlying OS handle is closed, which means that
if the process dies the locks are released.

Cargo has a number of global resources which it now needs to lock, and the
strategy is done in a fairly straightforward way:

* Each registry's index contains one lock (a dotfile in the index). Updating the
  index requires a read/write lock while reading the index requires a shared
  lock. This should allow each process to ensure a registry update happens while
  not blocking out others for an unnecessarily long time. Additionally any
  number of processes can read the index.
* When downloading crates, each downloaded crate is individually locked. A lock
  for the downloaded crate implies a lock on the output directory as well.
  Because downloaded crates are immutable, once the downloaded directory exists
  the lock is no longer needed as it won't be modified, so it can be released.
  This granularity of locking allows multiple Cargo instances to download
  dependencies in parallel.
* Git repositories have separate locks for the database and for the project
  checkout. The datbase and checkout are locked for read/write access when an
  update is performed, and the lock of the checkout is held for the entire
  lifetime of the git source. This is done to ensure that any other Cargo
  processes must wait while we use the git repository. Unfortunately there's
  just not that much parallelism here.
* Binaries managed by `cargo install` are locked by the local metadata file that
  Cargo manages. This is relatively straightforward.
* The actual artifact output directory is just globally locked for the entire
  build. It's hypothesized that running Cargo concurrently in *one directory* is
  less of a feature needed rather than running multiple instances of Cargo
  globally (for now at least). It would be possible to have finer grained
  locking here, but that can likely be deferred to a future PR.

So with all of this infrastructure in place, Cargo is now ready to grab some
locks and ensure that you can call it concurrently anywhere at any time and
everything always works out as one might expect.

One interesting question, however, is what does Cargo do on contention? On one
hand Cargo could immediately abort, but this would lead to a pretty poor UI as
any Cargo process on the system could kick out any other. Instead this PR takes
a more nuanced approach.

* First, all locks are attempted to be acquired (a "try lock"). If this
  succeeds, we're done.
* Next, Cargo prints a message to the console that it's going to block waiting
  for a lock. This is done because it's indeterminate how long Cargo will wait
  for the lock to become available, and most long-lasting operations in Cargo
  have a message printed for them.
* Finally, a blocking acquisition of the lock is issued and we wait for it to
  become available.

So all in all this should help Cargo fix any future concurrency bugs with file
locking in a principled fashion while also allowing concurrent Cargo processes
to proceed reasonably across the system.

[fs2]: https://github.com/danburkert/fs2-rs
[unix-bad]: http://0pointer.de/blog/projects/locking.html

Closes #354
  • Loading branch information
bors committed Mar 17, 2016
2 parents 4576ae2 + 8eac1d6 commit c5360c4
Show file tree
Hide file tree
Showing 16 changed files with 860 additions and 137 deletions.
11 changes: 11 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ docopt = "0.6"
env_logger = "0.3"
filetime = "0.1"
flate2 = "0.2"
fs2 = "0.2"
git2 = "0.4"
git2-curl = "0.4"
glob = "0.2"
Expand Down
2 changes: 1 addition & 1 deletion src/bin/cargo.rs
Original file line number Diff line number Diff line change
Expand Up @@ -258,7 +258,7 @@ fn is_executable(metadata: &fs::Metadata) -> bool {
}

fn search_directories(config: &Config) -> Vec<PathBuf> {
let mut dirs = vec![config.home().join("bin")];
let mut dirs = vec![config.home().clone().into_path_unlocked().join("bin")];
if let Some(val) = env::var_os("PATH") {
dirs.extend(env::split_paths(&val));
}
Expand Down
1 change: 1 addition & 0 deletions src/cargo/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ extern crate curl;
extern crate docopt;
extern crate filetime;
extern crate flate2;
extern crate fs2;
extern crate git2;
extern crate glob;
extern crate libc;
Expand Down
78 changes: 43 additions & 35 deletions src/cargo/ops/cargo_install.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ use std::env;
use std::ffi::OsString;
use std::fs::{self, File};
use std::io::prelude::*;
use std::io;
use std::io::SeekFrom;
use std::path::{Path, PathBuf};

use toml;
Expand All @@ -14,10 +14,12 @@ use core::PackageId;
use ops::{self, CompileFilter};
use sources::{GitSource, PathSource, RegistrySource};
use util::{CargoResult, ChainError, Config, human, internal};
use util::{Filesystem, FileLock};

#[derive(RustcDecodable, RustcEncodable)]
enum CrateListing {
V1(CrateListingV1),
Empty,
}

#[derive(RustcDecodable, RustcEncodable)]
Expand Down Expand Up @@ -67,9 +69,15 @@ pub fn install(root: Option<&str>,
specify alternate source"))))
};

let mut list = try!(read_crate_list(&root));
let dst = root.join("bin");
try!(check_overwrites(&dst, &pkg, &opts.filter, &list));
// Preflight checks to check up front whether we'll overwrite something.
// We have to check this again afterwards, but may as well avoid building
// anything if we're gonna throw it away anyway.
{
let metadata = try!(metadata(config, &root));
let list = try!(read_crate_list(metadata.file()));
let dst = metadata.parent().join("bin");
try!(check_overwrites(&dst, &pkg, &opts.filter, &list));
}

let target_dir = if source_id.is_path() {
config.target_dir(&pkg)
Expand All @@ -82,6 +90,11 @@ pub fn install(root: Option<&str>,
found at `{}`", pkg, target_dir.display()))
}));

let metadata = try!(metadata(config, &root));
let mut list = try!(read_crate_list(metadata.file()));
let dst = metadata.parent().join("bin");
try!(check_overwrites(&dst, &pkg, &opts.filter, &list));

let mut t = Transaction { bins: Vec::new() };
try!(fs::create_dir_all(&dst));
for bin in compile.binaries.iter() {
Expand All @@ -103,7 +116,7 @@ pub fn install(root: Option<&str>,
}).extend(t.bins.iter().map(|t| {
t.file_name().unwrap().to_string_lossy().into_owned()
}));
try!(write_crate_list(&root, list));
try!(write_crate_list(metadata.file(), list));

t.bins.truncate(0);

Expand Down Expand Up @@ -230,51 +243,40 @@ fn check_overwrites(dst: &Path,
Ok(())
}

fn read_crate_list(path: &Path) -> CargoResult<CrateListingV1> {
let metadata = path.join(".crates.toml");
let mut f = match File::open(&metadata) {
Ok(f) => f,
Err(e) => {
if e.kind() == io::ErrorKind::NotFound {
return Ok(CrateListingV1 { v1: BTreeMap::new() });
}
return Err(e).chain_error(|| {
human(format!("failed to open crate metadata at `{}`",
metadata.display()))
});
}
};
fn read_crate_list(mut file: &File) -> CargoResult<CrateListingV1> {
(|| -> CargoResult<_> {
let mut contents = String::new();
try!(f.read_to_string(&mut contents));
try!(file.read_to_string(&mut contents));
let listing = try!(toml::decode_str(&contents).chain_error(|| {
internal("invalid TOML found for metadata")
}));
match listing {
CrateListing::V1(v1) => Ok(v1),
CrateListing::Empty => {
Ok(CrateListingV1 { v1: BTreeMap::new() })
}
}
}).chain_error(|| {
human(format!("failed to parse crate metadata at `{}`",
metadata.display()))
human("failed to parse crate metadata")
})
}

fn write_crate_list(path: &Path, listing: CrateListingV1) -> CargoResult<()> {
let metadata = path.join(".crates.toml");
fn write_crate_list(mut file: &File, listing: CrateListingV1) -> CargoResult<()> {
(|| -> CargoResult<_> {
let mut f = try!(File::create(&metadata));
try!(file.seek(SeekFrom::Start(0)));
try!(file.set_len(0));
let data = toml::encode_str::<CrateListing>(&CrateListing::V1(listing));
try!(f.write_all(data.as_bytes()));
try!(file.write_all(data.as_bytes()));
Ok(())
}).chain_error(|| {
human(format!("failed to write crate metadata at `{}`",
metadata.display()))
human("failed to write crate metadata")
})
}

pub fn install_list(dst: Option<&str>, config: &Config) -> CargoResult<()> {
let dst = try!(resolve_root(dst, config));
let list = try!(read_crate_list(&dst));
let dst = try!(metadata(config, &dst));
let list = try!(read_crate_list(dst.file()));
let mut shell = config.shell();
let out = shell.out();
for (k, v) in list.v1.iter() {
Expand All @@ -291,7 +293,8 @@ pub fn uninstall(root: Option<&str>,
bins: &[String],
config: &Config) -> CargoResult<()> {
let root = try!(resolve_root(root, config));
let mut metadata = try!(read_crate_list(&root));
let crate_metadata = try!(metadata(config, &root));
let mut metadata = try!(read_crate_list(crate_metadata.file()));
let mut to_remove = Vec::new();
{
let result = try!(PackageIdSpec::query_str(spec, metadata.v1.keys()))
Expand All @@ -300,7 +303,7 @@ pub fn uninstall(root: Option<&str>,
Entry::Occupied(e) => e,
Entry::Vacant(..) => panic!("entry not found: {}", result),
};
let dst = root.join("bin");
let dst = crate_metadata.parent().join("bin");
for bin in installed.get() {
let bin = dst.join(bin);
if fs::metadata(&bin).is_err() {
Expand Down Expand Up @@ -336,7 +339,7 @@ pub fn uninstall(root: Option<&str>,
installed.remove();
}
}
try!(write_crate_list(&root, metadata));
try!(write_crate_list(crate_metadata.file(), metadata));
for bin in to_remove {
try!(config.shell().status("Removing", bin.display()));
try!(fs::remove_file(bin));
Expand All @@ -345,13 +348,18 @@ pub fn uninstall(root: Option<&str>,
Ok(())
}

fn resolve_root(flag: Option<&str>, config: &Config) -> CargoResult<PathBuf> {
fn metadata(config: &Config, root: &Filesystem) -> CargoResult<FileLock> {
root.open_rw(Path::new(".crates.toml"), config, "crate metadata")
}

fn resolve_root(flag: Option<&str>,
config: &Config) -> CargoResult<Filesystem> {
let config_root = try!(config.get_path("install.root"));
Ok(flag.map(PathBuf::from).or_else(|| {
env::var_os("CARGO_INSTALL_ROOT").map(PathBuf::from)
}).or_else(move || {
config_root.map(|v| v.val)
}).unwrap_or_else(|| {
config.home().to_owned()
}).map(Filesystem::new).unwrap_or_else(|| {
config.home().clone()
}))
}
11 changes: 9 additions & 2 deletions src/cargo/ops/cargo_rustc/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,12 @@ use std::env;
use std::ffi::{OsStr, OsString};
use std::fs;
use std::io::prelude::*;
use std::path::{self, PathBuf};
use std::path::{self, PathBuf, Path};
use std::sync::Arc;

use core::{Package, PackageId, PackageSet, Target, Resolve};
use core::{Profile, Profiles};
use util::{self, CargoResult, human};
use util::{self, CargoResult, human, Filesystem};
use util::{Config, internal, ChainError, profile, join_paths};

use self::job::{Job, Work};
Expand Down Expand Up @@ -85,6 +85,13 @@ pub fn compile_targets<'a, 'cfg: 'a>(pkg_targets: &'a PackagesToBuild<'a>,
layout::Layout::new(config, root, Some(&target), &dest)
});

// For now we don't do any more finer-grained locking on the artifact
// directory, so just lock the entire thing for the duration of this
// compile.
let fs = Filesystem::new(host_layout.root().to_path_buf());
let path = Path::new(".cargo-lock");
let _lock = try!(fs.open_rw(path, config, "build directory"));

let mut cx = try!(Context::new(resolve, packages, config,
host_layout, target_layout,
build_config, profiles));
Expand Down
71 changes: 44 additions & 27 deletions src/cargo/sources/git/source.rs
Original file line number Diff line number Diff line change
@@ -1,14 +1,13 @@
use std::fmt::{self, Debug, Formatter};
use std::hash::{Hash, Hasher, SipHasher};
use std::mem;
use std::path::PathBuf;

use url::{self, Url};

use core::source::{Source, SourceId};
use core::GitReference;
use core::{Package, PackageId, Summary, Registry, Dependency};
use util::{CargoResult, Config, to_hex};
use util::{CargoResult, Config, FileLock, to_hex};
use sources::PathSource;
use sources::git::utils::{GitRemote, GitRevision};

Expand All @@ -17,11 +16,11 @@ use sources::git::utils::{GitRemote, GitRevision};
pub struct GitSource<'cfg> {
remote: GitRemote,
reference: GitReference,
db_path: PathBuf,
checkout_path: PathBuf,
source_id: SourceId,
path_source: Option<PathSource<'cfg>>,
rev: Option<GitRevision>,
checkout_lock: Option<FileLock>,
ident: String,
config: &'cfg Config,
}

Expand All @@ -30,25 +29,9 @@ impl<'cfg> GitSource<'cfg> {
config: &'cfg Config) -> GitSource<'cfg> {
assert!(source_id.is_git(), "id is not git, id={}", source_id);

let reference = match source_id.git_reference() {
Some(reference) => reference,
None => panic!("Not a git source; id={}", source_id),
};

let remote = GitRemote::new(source_id.url());
let ident = ident(source_id.url());

let db_path = config.git_db_path().join(&ident);

let reference_path = match *reference {
GitReference::Branch(ref s) |
GitReference::Tag(ref s) |
GitReference::Rev(ref s) => s.to_string(),
};
let checkout_path = config.git_checkout_path()
.join(&ident)
.join(&reference_path);

let reference = match source_id.precise() {
Some(s) => GitReference::Rev(s.to_string()),
None => source_id.git_reference().unwrap().clone(),
Expand All @@ -57,11 +40,11 @@ impl<'cfg> GitSource<'cfg> {
GitSource {
remote: remote,
reference: reference,
db_path: db_path,
checkout_path: checkout_path,
source_id: source_id.clone(),
path_source: None,
rev: None,
checkout_lock: None,
ident: ident,
config: config,
}
}
Expand Down Expand Up @@ -160,7 +143,34 @@ impl<'cfg> Registry for GitSource<'cfg> {

impl<'cfg> Source for GitSource<'cfg> {
fn update(&mut self) -> CargoResult<()> {
let actual_rev = self.remote.rev_for(&self.db_path, &self.reference);
// First, lock both the global database and checkout locations that
// we're going to use. We may be performing a fetch into these locations
// so we need writable access.
let db_lock = format!(".cargo-lock-{}", self.ident);
let db_lock = try!(self.config.git_db_path()
.open_rw(&db_lock, self.config,
"the git database"));
let db_path = db_lock.parent().join(&self.ident);

let reference_path = match self.source_id.git_reference() {
Some(&GitReference::Branch(ref s)) |
Some(&GitReference::Tag(ref s)) |
Some(&GitReference::Rev(ref s)) => s,
None => panic!("not a git source"),
};
let checkout_lock = format!(".cargo-lock-{}-{}", self.ident,
reference_path);
let checkout_lock = try!(self.config.git_checkout_path()
.join(&self.ident)
.open_rw(&checkout_lock, self.config,
"the git checkout"));
let checkout_path = checkout_lock.parent().join(reference_path);

// Resolve our reference to an actual revision, and check if the
// databaes already has that revision. If it does, we just load a
// database pinned at that revision, and if we don't we issue an update
// to try to find the revision.
let actual_rev = self.remote.rev_for(&db_path, &self.reference);
let should_update = actual_rev.is_err() ||
self.source_id.precise().is_none();

Expand All @@ -169,22 +179,29 @@ impl<'cfg> Source for GitSource<'cfg> {
format!("git repository `{}`", self.remote.url())));

trace!("updating git source `{:?}`", self.remote);
let repo = try!(self.remote.checkout(&self.db_path));
let repo = try!(self.remote.checkout(&db_path));
let rev = try!(repo.rev_for(&self.reference));
(repo, rev)
} else {
(try!(self.remote.db_at(&self.db_path)), actual_rev.unwrap())
(try!(self.remote.db_at(&db_path)), actual_rev.unwrap())
};

try!(repo.copy_to(actual_rev.clone(), &self.checkout_path));
// Copy the database to the checkout location. After this we could drop
// the lock on the database as we no longer needed it, but we leave it
// in scope so the destructors here won't tamper with too much.
try!(repo.copy_to(actual_rev.clone(), &checkout_path));

let source_id = self.source_id.with_precise(Some(actual_rev.to_string()));
let path_source = PathSource::new_recursive(&self.checkout_path,
let path_source = PathSource::new_recursive(&checkout_path,
&source_id,
self.config);

// Cache the information we just learned, and crucially also cache the
// lock on the checkout location. We wouldn't want someone else to come
// swipe our checkout location to another revision while we're using it!
self.path_source = Some(path_source);
self.rev = Some(actual_rev);
self.checkout_lock = Some(checkout_lock);
self.path_source.as_mut().unwrap().update()
}

Expand Down
Loading

0 comments on commit c5360c4

Please sign in to comment.