AWS: Use custom Execution interceptor to support multiple storage credentials #12827
@nastra I'm a little concerned about this approach because we're doing a lot of hand-crafting/manipulation of the request as opposed to using features of the SDK. There are two alternatives I can think of:
While the … I think I prefer #1 above because it is an approach we can replicate for GCSFileIO and ADLSFileIO. The more similar the implementations, the less likely we are to end up with inconsistent handling.
Thanks @danielcweeks, my understanding of the approaches is the following: for 2, the prefix-level credProvider, are we thinking of creating a cred provider per path, where each cred provider runs its own refresh logic and fetches the creds for the path it corresponds to? Wouldn't this trigger a lot of requests to the creds endpoint? I understand the ExecutionInterceptor is very specific to AWS and touches the request at a very low level, but it avoids having to manage a lot of connections to S3 (one per prefix) or make a lot of calls to the creds refresh endpoint. It is nevertheless something S3A supports, though for audit logging rather than cred vending: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/auditing.html Sorry for hopping into this discussion, I wanted to contribute based on what I have seen for solving these cred vending issues in the past!
I don't think we would expect there to be large numbers of prefixes. A table will typically have just a single prefix. We're trying to support cases where there are multiple prefixes but the policy cannot be expressed within the size limits of a single credential. Reusing the client would address these concerns, but I don't think we even need to address that case unless it's trivial to do as part of this support. I feel like the …
I see, thanks for the explanation @danielcweeks. If we are sure that we don't expect a large number of prefixes, and this is only there to support cases where there are multiple prefixes but the policy cannot be expressed within the size limits of a single credential …
The idea of this PR was to just explore how something with an …
This uses a custom ExecutionInterceptor to set the correct headers when storage credentials are configured for S3FileIO. This is an alternative approach to #12799 and is currently WIP (I haven't actually tested this yet to see whether it works properly, and some additional tests are missing).
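
To make the per-prefix lookup discussed in this thread concrete, here is a minimal sketch in plain Java with no AWS SDK dependency. It is not the code in this PR: `PrefixCredential`, `PrefixCredentialResolver`, and the longest-matching-prefix rule are all assumptions made for illustration. An interceptor or a per-prefix client would call something like `resolve` for each request path and then sign the request with the result; that wiring is omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical type: one credential scoped to a single storage prefix.
final class PrefixCredential {
  final String prefix;
  final String accessKeyId; // placeholder for the real credential material

  PrefixCredential(String prefix, String accessKeyId) {
    this.prefix = prefix;
    this.accessKeyId = accessKeyId;
  }
}

// Hypothetical helper: picks the credential for a path.
// Assumption: the longest matching prefix wins.
final class PrefixCredentialResolver {
  private final List<PrefixCredential> byLongestPrefix;

  PrefixCredentialResolver(List<PrefixCredential> credentials) {
    // Copy so the caller's list is untouched, then sort longest prefix first.
    this.byLongestPrefix = new ArrayList<>(credentials);
    this.byLongestPrefix.sort(
        Comparator.comparingInt((PrefixCredential c) -> c.prefix.length()).reversed());
  }

  // Returns the most specific credential whose prefix matches the path, if any.
  Optional<PrefixCredential> resolve(String path) {
    for (PrefixCredential credential : byLongestPrefix) {
      if (path.startsWith(credential.prefix)) {
        return Optional.of(credential);
      }
    }
    return Optional.empty();
  }
}
```

With only a handful of prefixes per table, as expected above, a linear scan like this is cheap; whether the lookup happens inside an interceptor or when choosing a client is exactly the design question under discussion.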