Bug Report
Description
S3 external outputs in pipelines have been broken since 7211bd0 because of a bug in s3fs (and probably in other filesystems). The failure only occurs when running a stage whose output doesn't exist yet: when initializing the stage, DVC tries to remove the nonexistent output and raises a FileNotFoundError.
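To make the failure mode concrete, here is a minimal local-filesystem sketch of the "remove the old output, then run the command" order described above. The helpers remove_output and run_stage are hypothetical and are not DVC's API. The point is only that output removal has to treat a missing output as "nothing to remove"; otherwise the very first run of a stage fails.

```python
import os
import shutil
import tempfile


def remove_output(path: str) -> None:
    """Hypothetical helper (not DVC's API): clear a stage output before a run.

    A missing output is treated as already removed instead of as an error,
    so the first run of a stage works even though the output never existed.
    """
    try:
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    except FileNotFoundError:
        pass  # output does not exist yet: nothing to clean up


def run_stage(out_path: str) -> None:
    # Same order as described above: remove the old output first,
    # then run the command that (re)creates it.
    remove_output(out_path)
    with open(out_path, "w") as f:
        f.write("foo\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "foo.out")
        run_stage(out)  # first run: output is missing and must not raise
        run_stage(out)  # second run: existing output is replaced
        with open(out) as f:
            print(f.read().strip())
```

Without the except FileNotFoundError clause, the first run_stage call would fail in the same way the S3 stage does.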
Reproduce
dvc repro fails if a stage has an external output that does not exist yet.
In a new repo, using some <s3_path> that doesn't exist yet, do this:
$ echo 'foo' > foo
$ dvc stage add --external -n foo -d foo -O <s3_path> 'aws s3 cp foo <s3_path>'
$ dvc repro -v
Expected
dvc repro shouldn't fail while removing outputs. Here it fails because of what looks like a bug, or at least inconsistent behavior, in fsspec.

As mentioned in #5961 (comment), output.remove for s3fs and other async filesystems calls _expand_path. When the path doesn't exist and recursive=True, _expand_path raises FileNotFoundError; when recursive=False, it returns the path. LocalFileSystem returns the path regardless of the recursive flag, so it is unclear whether raising an error only in this one scenario is intended.
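The inconsistency can be modelled with a toy filesystem. ToyAsyncFS and safe_remove below are hypothetical. They only reproduce the _expand_path behavior described above: recursive expansion of a missing path raises, and non-recursive expansion echoes the path back. They are not the real fsspec or s3fs code. The sketch also shows one possible workaround on the caller's side, which is to treat FileNotFoundError from a recursive remove as "already gone".

```python
from typing import List, Set


class ToyAsyncFS:
    """Hypothetical stand-in for an async fsspec filesystem such as s3fs.

    Models only the _expand_path behaviour described in the issue,
    not the actual fsspec implementation.
    """

    def __init__(self, paths: Set[str]):
        self.paths = set(paths)

    def _expand_path(self, path: str, recursive: bool = False) -> List[str]:
        if recursive:
            prefix = path.rstrip("/") + "/"
            found = sorted(
                p for p in self.paths if p == path or p.startswith(prefix)
            )
            if not found:
                # Described behaviour: recursive expansion of a missing path raises.
                raise FileNotFoundError(path)
            return found
        # Described behaviour: non-recursive expansion returns the path as-is.
        return [path]

    def rm(self, path: str, recursive: bool = False) -> None:
        for p in self._expand_path(path, recursive=recursive):
            self.paths.discard(p)


def safe_remove(fs: ToyAsyncFS, path: str) -> None:
    """Hypothetical workaround: a missing output counts as already removed."""
    try:
        fs.rm(path, recursive=True)
    except FileNotFoundError:
        pass


if __name__ == "__main__":
    fs = ToyAsyncFS({"bucket/data/a", "bucket/data/b"})
    print(fs._expand_path("bucket/new"))  # non-recursive: path echoed back
    try:
        fs.rm("bucket/new", recursive=True)  # recursive: raises for a missing path
    except FileNotFoundError as exc:
        print("FileNotFoundError:", exc)
    safe_remove(fs, "bucket/new")   # tolerated
    safe_remove(fs, "bucket/data")  # removes both objects
    print(sorted(fs.paths))
```

Catching the error at the call site avoids relying on whether the filesystem raises for missing paths. That matters here because the behavior differs between LocalFileSystem and the async filesystems.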