-
Notifications
You must be signed in to change notification settings - Fork 3k
Azure: Support vended credentials refresh in ADLSFileIO. #11577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Azure: Support vended credentials refresh in ADLSFileIO. #11577
Conversation
|
cc// @nastra @jackye1995 @amogh-jahagirdar @munendrasn Can you please help with the review. |
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
| private LoadCredentialsResponse fetchCredentials() { | ||
| Map<String, String> headers = | ||
| RESTUtil.merge( | ||
| configHeaders(properties), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate why this is needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to implement the similar behaviour present in RESTSessionCatalog, where catalog can be configured to pass explicit headers to server by setting the configuration with header. prefix.
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
| private static final long MIN_REFRESH_WAIT_MILLIS = 10; | ||
|
|
||
| public AzureSasCredentialRefresher( | ||
| Supplier<Pair<String, Long>> sasTokenWithExpirationSupplier, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this actually need to be a supplier given that it's being immediately used in L40?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the supplier is being used during initialization and also during each scheduled refresh to fetch the new credentials. Supplier holds the logic to fetch new credentials from the API endpoint, since we are going to use it multiple times I modelled it as supplier instead of single method call. Please suggest if there is a cleaner way to achieve the same.
azure/src/test/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProviderTest.java
Outdated
Show resolved
Hide resolved
| .isEqualTo(credential.config().get(ADLS_SAS_TOKEN_PREFIX + STORAGE_ACCOUNT)); | ||
|
|
||
| Thread.sleep(20); | ||
| // Since expiration time past to current time, the refresh will fall back at minimum 10ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this comment is trying to say. Also what happens if you remove the sleep time above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Since the credential refresh implementation is based on scheduling a new refresh using the server provided expiration time,
- this test is covering the scenario where server responds with already expired token (expires-at-ms < current-time).
- The scheduling delay for consequent refresh is minimum delay of 10ms as per the logic in refreshDelayMillis()
- If we remove the sleep time, the credentials will be fetched from API only once. Depending on the test case execution time (> 10ms) there can be second credential fetch. To keep the test case behaviour consistent I used thread sleep.
|
@ChaladiMohanVamsi thanks for working on this. Do you have a way of actually testing this PR with an ADLS environment and see whether the refreshes work? |
Co-authored-by: Eduard Tudenhoefner <[email protected]>
|
@nastra @amogh-jahagirdar Thanks for the review and suggestions. I have addressed the review comments. I was able to test the credentials refresh logic in ADLS environment, since I don't have the new credentials API endpoint spec implementation in my env I used loadTable endpoint for testing by tweaking few parts of this PR. |
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAdlsCredentialProvider.java
Show resolved
Hide resolved
| if (null == client) { | ||
| synchronized (this) { | ||
| if (null == client) { | ||
| DefaultAuthSession authSession = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to change with the introduction of the new Auth manager, so I would wait until that goes in then switch over to using that approach. See how this is done for S3 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorporated the Auth manager releated changes referring to the changes for S3 FileIO and GCS FileIO.
cc// @adutra can you please review the changes.
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAdlsCredentialProvider.java
Show resolved
Hide resolved
|
@ChaladiMohanVamsi: Are you still working on this PR? @nastra has added 1.9.0 milestone for this. I was thinking a 1.9.0 release cut tomorrow evening if everything is merged. |
Yes @ajantha-bhat, I will try to address the comments by today. |
|
@nastra @danielcweeks can you please review on the latest changes handling review comments. |
| Map<String, String> credentialProviderProperties = Maps.newHashMap(allProperties); | ||
| credentialProviderProperties.put( | ||
| VendedAdlsCredentialProvider.URI, adlsRefreshCredentialsEndpoint); | ||
| Optional.ofNullable(allProperties.get(OAuth2Properties.TOKEN)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this, since we're already copying over allProperties into credentialProviderProperties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed redundant operation of setting token.
| public void close() { | ||
| if (vendedAdlsCredentialProvider != null) { | ||
| vendedAdlsCredentialProvider.close(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: newline after }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
applied
|
|
||
| public VendedAdlsCredentialProvider(Map<String, String> properties) { | ||
| Preconditions.checkArgument(null != properties, "Invalid properties: null"); | ||
| Preconditions.checkArgument(null != properties.get(URI), "Invalid URI: null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please apply the below diff to this class?
--- a/azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAdlsCredentialProvider.java
+++ b/azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAdlsCredentialProvider.java
@@ -28,6 +28,7 @@ import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
+import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.azure.AzureProperties;
import org.apache.iceberg.io.CloseableGroup;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
@@ -48,6 +49,8 @@ public class VendedAdlsCredentialProvider implements Serializable, AutoCloseable
public static final String URI = "credentials.uri";
private final SerializableMap<String, String> properties;
+ private final String credentialsEndpoint;
+ private final String catalogEndpoint;
private transient volatile Map<String, SimpleTokenCache> sasCredentialByAccount;
private transient volatile HTTPClient client;
private transient AuthManager authManager;
@@ -55,8 +58,12 @@ public class VendedAdlsCredentialProvider implements Serializable, AutoCloseable
public VendedAdlsCredentialProvider(Map<String, String> properties) {
Preconditions.checkArgument(null != properties, "Invalid properties: null");
- Preconditions.checkArgument(null != properties.get(URI), "Invalid URI: null");
+ Preconditions.checkArgument(null != properties.get(URI), "Invalid credentials endpoint: null");
+ Preconditions.checkArgument(
+ null != properties.get(CatalogProperties.URI), "Invalid catalog endpoint: null");
this.properties = SerializableMap.copyOf(properties);
+ this.credentialsEndpoint = properties.get(URI);
+ this.catalogEndpoint = properties.get(CatalogProperties.URI);
}
String credentialForAccount(String storageAccount) {
@@ -117,7 +124,7 @@ public class VendedAdlsCredentialProvider implements Serializable, AutoCloseable
synchronized (this) {
if (null == client) {
authManager = AuthManagers.loadAuthManager("adls-credentials-refresh", properties);
- HTTPClient httpClient = HTTPClient.builder(properties).uri(properties.get(URI)).build();
+ HTTPClient httpClient = HTTPClient.builder(properties).uri(catalogEndpoint).build();
authSession = authManager.catalogSession(httpClient, properties);
client = httpClient.withAuthSession(authSession);
}
@@ -130,7 +137,7 @@ public class VendedAdlsCredentialProvider implements Serializable, AutoCloseable
private LoadCredentialsResponse fetchCredentials() {
return httpClient()
.get(
- properties.get(URI),
+ credentialsEndpoint,
null,
This is fixing an issue that we're currently also fixing for S3 / GCP in #12612 / #12638. You will also have to update the tests in a similar way as in those PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied the patch and associated test cases.
| import com.azure.core.http.policy.HttpPipelinePolicy; | ||
| import reactor.core.publisher.Mono; | ||
|
|
||
| public class VendedAzureSasCredentialPolicy implements HttpPipelinePolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should be able to make this non-public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated it to non-public.
nastra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall LGTM once the remaining comments have been applied
azure/src/main/java/org/apache/iceberg/azure/adlsv2/ADLSFileIO.java
Outdated
Show resolved
Hide resolved
….java Co-authored-by: Daniel Weeks <[email protected]>
danielcweeks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 pending checks. Thanks @ChaladiMohanVamsi !
|
thanks @ChaladiMohanVamsi |
| vendedAdlsCredentialProvider.close(); | ||
| } | ||
|
|
||
| DelegateFileIO.super.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be enclosed in finally block so that it is always run ?
Proposed Change
Add support to refresh and consume vended storage credentials for ADLSFileIO.
New Azure properties
client.refresh-credentials-enabledproperty to control credential refresh and defaults to true.client.refresh-credentials-endpointdefines endpoint to fetch the refresh credentials with below spec.Requirements of credential config
adls.sas-token-expire-at-ms.similar to existing sas token prefixadls.sas-token.Similar PRs for other FileIOs