ensure that s3 is connected before using the s3fs api #67
Hey, could you please fix the linter errors and also check whether the tests pass? To run the linter, do the following:
pip install pre-commit
pre-commit run --all
You can run the tests as follows:
pip install -e ".[dev]"
pytest -nlogical
Could this be fixed in fsspec/s3fs?
It could be fixed in fsspec/s3fs. However, their documentation asks users to connect before running multiple jobs, so I believe this is the intended behavior. The issue can also be traced back to botocore (boto/botocore#1780), but that also looks like intended behavior. Would you like me to open an issue in https://github.com/fsspec/s3fs?
@GuichardVictor thanks, it's a good find. Do you know where we call (or …
This is what I found when investigating this issue: I believe that calls to s3fs functions use the following function … The function calls … Due to how the credential provider from … While this may also need to be fixed in s3fs, it looks to be the intended behavior, as …
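The credential-provider failure discussed in this thread (a resolver that records which profiles it has visited but never clears that record between calls) can be reproduced with a small toy model. This is a hypothetical simplification, not botocore's code: the class ToyAssumeRoleResolver, the local InfiniteLoopConfigError stand-in, and the profile names are all made up for illustration.

```python
class InfiniteLoopConfigError(Exception):
    """Stand-in for botocore's exception of the same name (not the real class)."""


class ToyAssumeRoleResolver:
    """Hypothetical, simplified resolver that follows source_profile chains.

    It records visited profiles to detect loops, but only initialises that
    record in the constructor, so the record is never cleared between calls.
    """

    def __init__(self, profiles, profile_name):
        self._profiles = profiles
        self._profile_name = profile_name
        self._visited_profiles = [profile_name]  # set once, never reset

    def load_credentials(self):
        return self._resolve(self._profile_name)

    def _resolve(self, name):
        source = self._profiles[name].get("source_profile")
        if source is None:
            # End of the chain: this profile holds the credentials itself.
            return self._profiles[name]["credentials"]
        if source in self._visited_profiles:
            # On a second call, 'source' is still recorded from the first call.
            raise InfiniteLoopConfigError(f"profile {source!r} already visited")
        self._visited_profiles.append(source)
        return self._resolve(source)


PROFILES = {
    "dev": {"source_profile": "base"},  # role profile that uses a source_profile
    "base": {"credentials": "static-creds"},
}

resolver = ToyAssumeRoleResolver(PROFILES, "dev")
print(resolver.load_credentials())  # the first call succeeds
try:
    resolver.load_credentials()  # a second call on the same instance
except InfiniteLoopConfigError as exc:
    print("second call failed:", exc)
```

Loading only once per resolver avoids the error in this model, which is why creating the session a single time before running jobs concurrently sidesteps the problem.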
Thanks @GuichardVictor for the contributions.
@shcheklein, this happens because we run many jobs in multiple threads or asynchronously without doing … s3fs does … See iterative/dvc#10146.
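Why running many jobs without connecting first leads to repeated credential loading can be shown with a minimal sketch. LazyCredentialClient and run_jobs are hypothetical stand-ins for a filesystem that creates its session lazily on first use; the threading.Barrier is only a demo device that forces every thread into the unguarded initialisation window at once, so the result is deterministic.

```python
import threading


class LazyCredentialClient:
    """Hypothetical stand-in for a filesystem whose session/credentials are
    created lazily on first use, with no locking around that step."""

    def __init__(self, load_barrier=None):
        self._credentials = None
        self._load_barrier = load_barrier  # demo only: forces the race
        self._count_lock = threading.Lock()
        self.load_calls = 0

    def _load_credentials(self):
        with self._count_lock:
            self.load_calls += 1
        if self._load_barrier is not None:
            # Hold every thread inside the load so that all loads overlap.
            self._load_barrier.wait(timeout=5)
        return "creds"

    def connect(self):
        """Create the session eagerly, before any jobs run."""
        self._credentials = self._load_credentials()

    def ls(self, path):
        if self._credentials is None:  # unguarded check-then-act
            self._credentials = self._load_credentials()
        return [path]


def run_jobs(client, n_jobs):
    """Run n_jobs concurrent calls and report how often credentials loaded."""
    threads = [
        threading.Thread(target=client.ls, args=(f"bucket/key{i}",))
        for i in range(n_jobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return client.load_calls


lazy = LazyCredentialClient(load_barrier=threading.Barrier(4))
print("lazy:", run_jobs(lazy, 4))  # every job loads credentials itself

eager = LazyCredentialClient()
eager.connect()
print("connected first:", run_jobs(eager, 4))  # loaded once, in connect()
```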
That's what I was trying to double-check. fsspec is not thread-safe. I think it should be fine; one thing to potentially look out for is not …
Can we add a test for this, btw?
We should not depend on this behavior of s3fs, as other fsspec filesystems could connect eagerly on instantiation (e.g. dvcfs, webhdfs). We try to handle this ourselves by constructing the underlying filesystems lazily.
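One way to construct an underlying filesystem lazily while still guaranteeing it is built only once under concurrent access is double-checked locking. This is a minimal sketch under that assumption; LazyFileSystem and the build factory are hypothetical names, and object() is only a placeholder for a real filesystem instance.

```python
import threading


class LazyFileSystem:
    """Hypothetical wrapper: builds its underlying filesystem on first access,
    exactly once, even if several threads get there at the same time."""

    def __init__(self, factory):
        self._factory = factory  # callable returning the filesystem instance
        self._fs = None
        self._lock = threading.Lock()

    @property
    def fs(self):
        if self._fs is None:  # fast path once the filesystem exists
            with self._lock:
                if self._fs is None:  # re-check: another thread may have won
                    self._fs = self._factory()
        return self._fs


calls = []


def build():
    calls.append(1)  # record each factory invocation
    return object()  # placeholder for a real filesystem instance


wrapper = LazyFileSystem(build)
threads = [threading.Thread(target=lambda: wrapper.fs) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1
```

functools.cached_property looks like a shorter alternative, but since Python 3.12 it no longer takes a lock, so concurrent first accesses can invoke the factory more than once.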
This PR aims to fix the following issue:
Using dvc with aws-vault, assume role and a source_profile occasionally fails with:
Using process authentication, assume role and source_profile in the AWS config, loading the credentials will fail with InfiniteLoopConfigError, as seen in the following example:
This is due to the resolver instance not resetting its seen-profiles state when load_credentials is called; when it runs a second time, the profile is already present in its seen profiles, which raises the error above.
s3fs will load the credentials multiple times if the session is not created before the API is used from multiple threads or multiple async jobs. Enforcing the connection when creating the S3FileSystem instance avoids this issue, as stated in the s3fs documentation:
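The fix of connecting when the filesystem is created can be sketched as a small factory plus a test for it. This is a hedged illustration, not the PR's actual diff: create_connected_filesystem is a hypothetical helper, and a unittest.mock.Mock stands in for s3fs.S3FileSystem so that no AWS access is needed. With the real library the call would look like create_connected_filesystem(s3fs.S3FileSystem, profile="dev"), assuming S3FileSystem.connect(), which creates the session eagerly.

```python
from unittest import mock


def create_connected_filesystem(fs_cls, **fs_args):
    """Hypothetical helper: instantiate the filesystem and connect it right
    away, so the session (and credential loading) happens once, here, before
    any threads or async jobs share the instance."""
    fs = fs_cls(**fs_args)
    fs.connect()  # for s3fs.S3FileSystem this creates the session eagerly
    return fs


def test_filesystem_is_connected_on_creation():
    # A Mock stands in for s3fs.S3FileSystem, so no AWS access is needed.
    fake_cls = mock.Mock()
    fs = create_connected_filesystem(fake_cls, profile="dev")
    fake_cls.assert_called_once_with(profile="dev")
    fs.connect.assert_called_once_with()


test_filesystem_is_connected_on_creation()
```

Such a test only checks that the connection is made up front; reproducing the original race would additionally need concurrent jobs and a credential chain like the one described in the issue.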
Closes iterative/dvc#10146.