external outputs: broken if pipeline output doesn't exist during stage initialization #8757

@dberenbaum

Bug Report

Description

S3 external outputs have been broken for pipelines since 7211bd0 because of a bug in s3fs (and probably in other filesystems). The failure only occurs when running a stage whose output does not exist yet: while initializing the stage, DVC tries to remove the nonexistent output, and a FileNotFoundError is raised.

Reproduce

dvc repro will break if there is an external output and that output does not exist yet.

In a new repo, using some <s3_path> that doesn't exist yet, do this:

$ echo 'foo' > foo
$ dvc stage add --external -n foo -d foo -O <s3_path> 'aws s3 cp foo <s3_path>'
$ dvc repro -v

Expected

dvc repro shouldn't fail while removing outputs. In this case, it fails because of what seems like a bug, or at least inconsistent behavior, in fsspec. As mentioned in #5961 (comment), output.remove for s3fs and other async filesystems calls _expand_path. When the path doesn't exist and recursive=True, _expand_path raises FileNotFoundError; when recursive=False, it returns the path. For LocalFileSystem, it also returns the path regardless of the value of recursive, so it is unclear whether raising an error only in this specific scenario was intended.
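To make the inconsistency concrete, here is a minimal sketch of the behavior described above and of one way a caller could tolerate it. It does not use s3fs or fsspec: `FakeAsyncFS` is a hypothetical stand-in that only mimics the reported `_expand_path` behavior, and `remove_if_exists` is a hypothetical helper, not DVC's actual code or fix.

```python
class FakeAsyncFS:
    """Hypothetical stand-in that mimics the reported behavior; not s3fs."""

    def __init__(self, paths=()):
        self.paths = set(paths)

    def expand_path(self, path, recursive=False):
        if path in self.paths:
            prefix = path.rstrip("/") + "/"
            children = sorted(p for p in self.paths if p.startswith(prefix))
            return [path, *children] if recursive else [path]
        if recursive:
            # Reported behavior: a missing path with recursive=True raises.
            raise FileNotFoundError(path)
        # Reported behavior: a missing path with recursive=False is returned.
        return [path]

    def rm(self, path, recursive=False):
        for p in self.expand_path(path, recursive=recursive):
            self.paths.discard(p)


def remove_if_exists(fs, path):
    """Remove `path` recursively; treat a missing path as nothing to do.

    Returns False when the filesystem reported the path as missing.
    """
    try:
        fs.rm(path, recursive=True)
    except FileNotFoundError:
        return False
    return True
```

With this sketch, removing an output that was never created becomes a no-op instead of an error, which matches the expectation that stage initialization should not fail on a missing output.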

Labels

p1-important: Important, aka current backlog of things to do
regression: Ohh, we broke something :-(
