Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensure that s3 is connected before using the s3fs api #67

Merged
merged 2 commits into from
Mar 3, 2024

Conversation

GuichardVictor
Copy link
Contributor

@GuichardVictor GuichardVictor commented Mar 1, 2024

This PR aims to fix the following issue:

Using dvc with aws-vault, assume role and a source_profile occasionally fails with:

ERROR: unexpected error - Infinite loop in credential configuration detected. Attempting to load from profile my-profile-2 which has already been visited. Visited profiles: ['profile-1', 'profile-2']

Using process authentification, assume role and source_profile in the aws config, loading the credentials will fail with InfiniteLoopConfigError as seen in the following example:

import botocore.session
import botocore.credentials

session = botocore.session.Session(profile="profile")
resolver = botocore.credentials.create_credential_resolver(session, region_name=None)

credentials = resolver.load_credentials() # <--- works
credentials = resolver.load_credentials() # <--- fails with infinite loop credentials

This is due to the resolver instance not setting its seen profiles state when calling the function load_credentials, thus when running a second time, the profile will already be present in its seen profiles raising the previously stated error.

s3fs will load the credentials multiple times if the session is not created before using the api with multiple threads of multiple async jobs.

Enforcing the connection when creating the S3FileSystem instance avoids this issue as stated in the s3fs documentation:

You must also explicitly await the client creation before making any S3 call.

Closes iterative/dvc#10146.

@skshetry
Copy link
Member

skshetry commented Mar 2, 2024

Hey, could you please fix the linter errors, and also see if the tests passes?

To run linter errors, you need to do following:

pip install pre-commit
pre-commit run --all

You can run the tests as follows:

pip install -e ".[dev]"
pytest -nlogical

@skshetry skshetry self-requested a review March 2, 2024 04:59
@skshetry
Copy link
Member

skshetry commented Mar 2, 2024

Could this be fixed in fsspec/s3fs?

@GuichardVictor
Copy link
Contributor Author

It could be fixed in fsspec/s3fs, however based on their documentation, it is asked to connect before using multiple jobs, so I believe, it is the intended behavior. We could also trace back to botocore: boto/botocore#1780 but it looks like the intended behavior. Would like me to open an issue in https://github.com/fsspec/s3fs?

@shcheklein shcheklein requested a review from rlamy March 2, 2024 17:34
@shcheklein
Copy link
Member

@GuichardVictor thanks, it's a good find.

do you know where do we call (or fsspec does) connect if we don't explicitly do it here? Why do we have two or more connect calls in the first place?

@GuichardVictor
Copy link
Contributor Author

GuichardVictor commented Mar 2, 2024

This is what I found when investigating on this issue:

I believe that calls to s3fs functions use the following function _call_s3 defined here.

The function calls set_session. While they check if the session is already created to avoid creating a new one, when running multiple jobs (which dvc does by default) there might be a race condition.

Due to how the credential provider from botocore works it raises the infinite loop error.

While this may need to be also fixed in s3fs, this looks to be the intended behavior as s3fs explicitly ask to create a session before using it.

Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GuichardVictor for the contributions.

@skshetry
Copy link
Member

skshetry commented Mar 3, 2024

Why do we have two or more connect calls in the first place?

@shcheklein, this happens because we run many jobs in multiple threads or asynchronously without doing .connect() first.

s3fs does connect() lazily (e.g.: if self._s3 is not set), all the jobs will race to connect() and end up calling multiple times.

See iterative/dvc#10146.

@skshetry skshetry merged commit e2c4886 into iterative:main Mar 3, 2024
13 checks passed
@shcheklein
Copy link
Member

s3fs does connect() lazily (e.g.: if self._s3 is not set), all the jobs will race to connect() and end up calling multiple times.

that's what I was trying to double check. fsspec is not thread safe.

I think it should be fine, one thing potentially to look after is not fs method throwing some stuff like "bad credentials" and trying to access eagerly AWS (to get a token). In a context of Studio, and in some other cases it was creating some issues. E.g. commands that don't even depend on remote storages were throwing errors. Or commands that even initialize fs but don't download anything also could now throw errors.

@shcheklein
Copy link
Member

Can we do a test for this btw?

@skshetry
Copy link
Member

skshetry commented Mar 3, 2024

We should not depend on this behavior of s3fs, as other fsspec filesystems could connect eagerly on instantiation (eg: dvcfs, webhdfs, etc).

We try to handle this ourselves by lazily constructing underlying filesystems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

push: Infinite loop in credential configuration detected
3 participants