This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

fix race condition when extracting files with cached_path #5227

Merged
epwalsh merged 4 commits into main from cached-path-race-condition-fix on May 27, 2021

Conversation

epwalsh
Member

@epwalsh epwalsh commented May 26, 2021

Ran into this when running a T5 experiment on CNN-DM: https://beaker.org/ex/ex_co58l942ntjh/tasks/tk_rd3v6das2fzn.

@epwalsh epwalsh requested a review from dirkgr May 26, 2021 23:07
@@ -325,12 +329,24 @@ def cached_path(

     if extraction_path is not None:
         # If the extracted directory already exists (and is non-empty), then no
-        # need to extract again unless `force_extract=True`.
+        # need to create a lock file and extract again unless `force_extract=True`.
         if os.path.isdir(extraction_path) and os.listdir(extraction_path) and not force_extract:
             return extraction_path

         # Extract it.
         with FileLock(extraction_path + ".lock"):
Member

This will make sure that only one process/thread makes it into the block below, right?

Member Author

yeup
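
For readers following the thread, here is a minimal sketch of the pattern being discussed: check whether the directory is already populated, take a lock named after the extraction path, then check again before extracting. This is not the code from this PR. `file_lock`, `extract_once`, and `extract_fn` are hypothetical names, and the standard library's `fcntl.flock` (POSIX only) stands in for the `FileLock` used in the diff so the example has no third-party dependency.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path (POSIX only).

    Hypothetical stand-in for the FileLock object used in the diff.
    """
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def extract_once(extraction_path, extract_fn, force_extract=False):
    """Call extract_fn(extraction_path) from only one caller at a time."""

    def already_extracted():
        return os.path.isdir(extraction_path) and bool(os.listdir(extraction_path))

    # Fast path: no lock file needed if the directory is already populated.
    if already_extracted() and not force_extract:
        return extraction_path

    with file_lock(extraction_path + ".lock"):
        # Re-check inside the lock: another caller may have finished the
        # extraction while this one was waiting.
        if already_extracted() and not force_extract:
            return extraction_path
        extract_fn(extraction_path)
    return extraction_path
```

The lock serializes callers, and the second check is what keeps a caller that waited on the lock from repeating the work. Note that the unlocked fast path can still observe a partially written directory if `extract_fn` writes files incrementally; extracting to a temporary directory and renaming it into place is one way to avoid that.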

@epwalsh epwalsh merged commit 12155c4 into main May 27, 2021
@epwalsh epwalsh deleted the cached-path-race-condition-fix branch May 27, 2021 17:06
Abhishek-P pushed a commit to Abhishek-P/allennlp that referenced this pull request Aug 11, 2021
* fix race condition when extracting files with cached_path

* add warning when directory already exists