-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-17536. ABFS: Supporting customer provided encryption key #2707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
steveloughran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the need for this.
- One thing that it's not yet covered is the need to get this into delegation tokens. Look at what was done for the S3A DT's there: the encryption settings and secret key is picked up on the client and passed into the cluster so users can submit jobs into a shared cluster, without that shared cluster having the encryption key. Is this needed here? If so AbfsDelegationTokenIdentifier will need to be extended.
- is it ever the case that different stores will have different keys?
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Outdated
Show resolved
Hide resolved
HNS-OAuth[INFO] Results: HNS-SharedKey[INFO] Results: NonHNS-SharedKey[INFO] Results: |
| return requestHeaders; | ||
| } | ||
|
|
||
| private void addServerSideEncryptionHeaders( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addCustomerProvidedKeyHeaders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| private static final int INT_512 = 512; | ||
| private static final int INT_50 = 50; | ||
|
|
||
| private final IdentityTransformerInterface identityTransformer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
| @Test | ||
| public void testAppendWithCPK() throws Exception { | ||
| boolean isWithCPK = true; | ||
| final AzureBlobFileSystem fs = getAbfs(isWithCPK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| .equalsIgnoreCase(headerName)).findFirst(); | ||
| String desc; | ||
| if (isCPKHeaderExpected) { | ||
| desc = "CPK hear should be resent"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| } | ||
|
|
||
| @Test | ||
| public void testAppendWithoutCPK() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have test cases for 2 CPK enabled accounts to ensure all is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Yes, each account can have it's own key. |
|
🎊 +1 overall
This message was automatically generated. |
| properties=("fs.azure.abfs.account.name" "fs.azure.test.namespace.enabled" | ||
| "fs.azure.account.auth.type") | ||
| values=("{account name}.dfs.core.windows.net" "true" "OAuth") | ||
| values=("abfsitgen2.dfs.core.windows.net" "true" "OAuth") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Show resolved
Hide resolved
| .equalsIgnoreCase(headerName)).findFirst(); | ||
| String desc; | ||
| if (isCPKHeaderExpected) { | ||
| desc = "CPK header is expected, but the same is absent."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mention header name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Outdated
Show resolved
Hide resolved
bilaharith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review
| /** Setting this true will make the driver use it's own RemoteIterator implementation */ | ||
| public static final String FS_AZURE_ENABLE_ABFS_LIST_ITERATOR = "fs.azure.enable.abfslistiterator"; | ||
| /** Server side encryption key */ | ||
| public static final String FS_AZURE_CLIENT_PROVIDED_ENCRYPTION_KEY = "fs.azure.client-provided-encryption-key"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dot as a delimiter has been the trend. Any reason to deviate ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not use the dot here. The dot is used to namespace. - is fine else make it one word
| */ | ||
| public class AbfsClient implements Closeable { | ||
| public static final Logger LOG = LoggerFactory.getLogger(AbfsClient.class); | ||
| private static final String SERVER_SIDE_ENCRYPTION_ALGORITHM = "AES256"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these to FileSystemConfigurations.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| public AbfsRestOperation getPathStatus(final String path, final boolean includeProperties) throws AzureBlobFileSystemException { | ||
| final List<AbfsHttpHeader> requestHeaders = createDefaultHeaders(); | ||
| addCustomerProvidedKeyHeaders(requestHeaders); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When includeProperties is true, REST API GetFileProperties is invoked (which has user metadata in response) and is the case where CPK headers are needed.
REST API GetPathStatus doesnt consume CPK headers. Tomorrow if there is a rejection of headers not consumed, requests will fail.
addCPK for includeProperties=true case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| // dump the headers | ||
| AbfsIoUtils.dumpHeadersToDebugLog("Response Headers", | ||
| connection.getHeaderFields()); | ||
| responseHeaders); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change required ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New method is introduced to print response headers which simply accepts the list of response headers
|
|
||
| // Reset URL with configured filesystem | ||
| final String abfsUrl = this.getFileSystemName() + "@" + this.getAccountName(); | ||
| final String abfsUrl = this.getFileSystemName() + "@" + authorityParts[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change required ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was buggy. authorityParts[1] is the actual FS to be loaded, getAccountName returns the FS that is created dynamically.
|
|
||
| private String getUserAgentString(AbfsConfiguration config, | ||
| boolean includeSSLProvider) throws MalformedURLException { | ||
| boolean includeSSLProvider) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change required ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AbfsClient constructor could throw IOException, wiuth the new encoding and hashing logic included
| AbfsClient baseAbfsClientInstance, | ||
| AbfsConfiguration abfsConfig) | ||
| throws AzureBlobFileSystemException { | ||
| AbfsConfiguration abfsConfig) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change required ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AbfsClient constructor could throw IOException, wiuth the new encoding and hashing logic included
| public AbfsRestOperation read(final String path, final long position, final byte[] buffer, final int bufferOffset, | ||
| final int bufferLength, final String eTag, String cachedSasToken) throws AzureBlobFileSystemException { | ||
| final List<AbfsHttpHeader> requestHeaders = createDefaultHeaders(); | ||
| addCustomerProvidedKeyHeaders(requestHeaders); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the purposes of debugging, add a log line if request fails and the sha256 of the key used in request and the server returned sha256 in response are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Server doesn't send back the actual sha of the encryption key with which the data is encrypted if the encryption key sent is wrong
snvijaya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the comments.
snvijaya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 - Some minor comments as discussed. Please address.
bilaharith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @steveloughran Could you please take a look
|
🎊 +1 overall
This message was automatically generated. |
I've been on a little vacation. I did have some review which was unsubmitted...now looks out of date. Will review again |
steveloughran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good test coverage; I do think there is duplication in there of disabling caching and creating new filesystems then invoking an operation to raise an IOE. At the very least, combine the conf.set, filesystem.get and the casting to some createNewABFS(Configuration) method.
If CPK mismatch results in a specific IOE subclass (and it should!) this is what intercept should look for
- Avoids any other reason for the call to fail being interpreted as success in the test.
- guarantees that subclassed exception will be raised when expected.
I'm afraid we also need some documentation somewhere. This is quite a large new feature and needs coverage. Yes software engineers get the help with the documentation too. This can be beneficial: if you find something is hard to explain, it will be hard to use.
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Outdated
Show resolved
Hide resolved
| import org.apache.hadoop.fs.permission.AclEntry; | ||
| import org.apache.hadoop.fs.permission.FsAction; | ||
| import org.apache.hadoop.fs.permission.FsPermission; | ||
| import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, these do need to go up into the "non org-apache section"; relates to how backporting usually needs to revert these back to the com.google package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Outdated
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Show resolved
Hide resolved
| String expectedCPKSha = getCPKSha(fs); | ||
|
|
||
| byte[] fileContent = getRandomBytesArray(FILE_SIZE); | ||
| Path testFilePath = new Path(testFileName+"1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spacing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Show resolved
Hide resolved
bilaharith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting you to review
steveloughran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, some minor test details
+1 pending those changes (or you disagreeing with me about their need)
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Outdated
Show resolved
Hide resolved
| conf.set(FS_AZURE_CLIENT_PROVIDED_ENCRYPTION_KEY + "." + accountName, | ||
| "different-1234567890123456789012"); | ||
| conf.set("fs.abfs.impl.disable.cache", "true"); | ||
| AzureBlobFileSystem fs2 = (AzureBlobFileSystem) FileSystem.get(conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, you can use FileSystem.newInstance()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Outdated
Show resolved
Hide resolved
...tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestCustomerProvidedKey.java
Outdated
Show resolved
Hide resolved
...adoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/constants/TestConfigurationKeys.java
Show resolved
Hide resolved
bilaharith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Steve, Please review.
|
🎊 +1 overall
This message was automatically generated. |
|
+1, merged to trunk. Thank you! |
Contributed by bilahari t h Change-Id: I86216e755b81e9d14f5e87844d9fd58e8940560c
…he#2707) Contributed by bilahari t h
…on key (apache#2707) Note: Contains one file change to "ITestCustomerProvidedKey.java" from "CDPD-29718. HADOOP-17290. ABFS: Add Identifiers to Client Request Header" required for not breaking the build and was previously skipped. Contributed by bilahari t h Change-Id: I86216e755b81e9d14f5e87844d9fd58e8940560c
The data for a particular customer needs to be encrypted on account level. At server side the APIs will start accepting the encryption key as part of request headers. The data will be encrypted/decrypted with the given key at the server.
Since the ABFS FileSystem APIs are implementations for Hadoop FileSystem APIs there is no direct way with which customer can pass the key to ABFS driver. In this case driver should have the following capabilities so that it can accept and pass the encryption key as one of the request headers.
There should be a way to configure the encryption key for different accounts.
If there is a key specified for a particular account, the same needs to be sent along with the request headers.
Config changes
They key for an account can be specified in the core-site as follows.
fs.azure.account.client-provided-encryption-key.{account name}.dfs.core.windows.net