
Conversation

@trakos
Contributor

@trakos trakos commented Mar 3, 2023

The HDFS plugin for snapshots was restricted to the hdfs:// scheme. However, the client library it uses also supports the webhdfs:// and swebhdfs:// schemes.

This PR enables use of the webhdfs and swebhdfs schemes for HDFS snapshot repositories.

It only required allowing the webhdfs and swebhdfs schemes in URI validation, and adding a couple of permissions: doAs is required for (s)webhdfs, and the runtime setFactory permission is required for swebhdfs.
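
To make the validation change concrete, here is a minimal sketch of the loosened scheme check as a free-standing helper. The class, field, and method names here are illustrative, not the plugin's actual ones; the real diff appears further down this thread.

import java.net.URI;
import java.util.Locale;
import java.util.Set;

public final class SchemeValidationSketch {
    // Schemes the snapshot repository would accept after this change.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("hdfs", "webhdfs", "swebhdfs");

    static URI validateUri(String uriSetting) {
        URI uri = URI.create(uriSetting);
        if (uri.getScheme() == null || ALLOWED_SCHEMES.contains(uri.getScheme().toLowerCase(Locale.ROOT)) == false) {
            throw new IllegalArgumentException("Unsupported scheme [" + uri.getScheme() + "] in uri [" + uriSetting + "]");
        }
        return uri;
    }
}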

All other changes are related to tests - I've added plugins:repository-hdfs:yamlRestTestWeb2 and plugins:repository-hdfs:yamlRestTestWeb3 Gradle tasks that use miniHDFS for integration tests of snapshots using the webhdfs protocol.

I've also tested it manually against a Hadoop proxy with SSL enabled (the swebhdfs protocol).

This is related to #24455, though I didn't test a Kerberized webhdfs setup. I think that might be unsupported in the Hadoop client in the first place: https://issues.apache.org/jira/browse/HDFS-15765

The original issue was closed due to low demand, but seeing that this only requires loosening the validation and a couple of permissions, it should be well worth it.

In our case, we prefer to use webhdfs because it's much easier to secure - it's just an HTTP REST API.

@elasticsearchmachine elasticsearchmachine added v8.8.0, needs:triage, and external-contributor labels Mar 3, 2023
@HiDAl HiDAl added :Data Management/Other and Team:Data Management and removed needs:triage labels Mar 6, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@jbaiera jbaiera self-requested a review March 9, 2023 16:07
Member

@jbaiera jbaiera left a comment

Hiya! Thanks for the PR submission. I've been giving it a cursory review as my schedule has permitted. I've left some surface-level comments to start and will be digging a bit deeper soon.

Any time we add HDFS functionality that requires permission changes we need to be pretty careful. The Hadoop project has historically taken liberties on how it modifies the JVM it is running on. This review might be a bit slow, sorry in advance!

        }
        uri = URI.create(uriSetting);
-       if ("hdfs".equalsIgnoreCase(uri.getScheme()) == false) {
+       if (uri.getScheme() == null || allowedSchemes.contains(uri.getScheme().toLowerCase(Locale.ROOT)) == false) {
Member

Let's just check the URI scheme without lower-casing. Is there a case where it makes sense to accept WEBHdfs as the scheme when most would put it as lowercase to begin with?

Contributor Author

@trakos trakos Apr 1, 2023

I've removed the toLowerCase call now, but I want to highlight that equalsIgnoreCase was used for the comparison before my change.

So, I guess that technically this could be a breaking change for someone, however unlikely. Normally, it will just throw an exception if you try to use uppercased HDFS as a scheme:

org.apache.hadoop.fs.UnsupportedFileSystemException: fs.AbstractFileSystem.HDFS.impl=null: No AbstractFileSystem configured for scheme: HDFS

However, technically, you can avoid that error by setting conf.fs.AbstractFileSystem.HDFS.impl to org.apache.hadoop.hdfs.DistributedFileSystem, or maybe even set it to a different, custom implementation class.

So, that's why I just proposed the toLowerCase call initially - I wanted to avoid breaking some edge-case setup. Would this need to be a deprecation first?
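
As an untested illustration of that workaround, the mapping could be supplied through a Hadoop Configuration. Whether the class named above actually satisfies the AbstractFileSystem contract is exactly the uncertainty in the comment; this only shows where the setting would live.

import org.apache.hadoop.conf.Configuration;

public final class UppercaseSchemeWorkaroundSketch {
    // Hypothetical sketch only: shows where the scheme-to-implementation
    // mapping would go, not a claim that this mapping is valid.
    static Configuration build() {
        Configuration conf = new Configuration();
        // Maps the upper-cased "HDFS" scheme key, mirroring the comment above.
        conf.set("fs.AbstractFileSystem.HDFS.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        return conf;
    }
}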

        );
    }
-   if (Strings.hasLength(uri.getPath()) && uri.getPath().equals("/") == false) {
+   if (uri.getScheme().equals("hdfs") && Strings.hasLength(uri.getPath()) && uri.getPath().equals("/") == false) {
Member

I don't think this test is specific to hdfs? If the uri is specified as webhdfs://hostname:9870/path/to/repo I think we would want to intercept and advise on that too.

Contributor Author

Good point, that's a bit of wishful thinking on my part. You see, I have a separate PR for Hadoop, apache/hadoop#5447, that would utilize the base path in webhdfs-scheme URIs. I was actually testing this PR combined with my Hadoop PR at one point.

For some context: webhdfs uses regular HTTP(S) requests to interact with the filesystem. The requests are made to the base scheme URI plus a hardcoded prefix, /webhdfs/v1. However, there are some webhdfs proxies, most notably Apache Knox, that have a different API path. It's also easy to set up a custom proxy, since it's just HTTP.

For Apache Knox, the default path is /gateway/<gateway-name>/webhdfs/v1. Somewhat ironically, Apache Hadoop's client does not currently support this custom path, unlike most third-party clients. In the PR I linked above, apache/hadoop#5447, I proposed adding support for it by utilizing the path in the connection URI. So, while it's not the same as a filesystem base path, there would be a use case where a user would want to provide a separate path in the connection URI, for example webhdfs://hostname:9870/gateway/myname/webhdfs/v1/. That path would only point to the API, and is distinct from the path option.
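
To make that concrete, here is a hedged illustration of how the request URL is assembled in each case; the hostnames and paths below are made up:

public final class WebHdfsUrlSketch {
    public static void main(String[] args) {
        String base = "https://knox.example.com:8443";    // from the connection URI
        String stockPrefix = "/webhdfs/v1";               // hardcoded in Hadoop's client
        String knoxPrefix = "/gateway/myname/webhdfs/v1"; // what a Knox gateway serves
        String file = "/backups/my_repo/some-file";

        // Stock Hadoop client today (misses the gateway segment):
        System.out.println(base + stockPrefix + file + "?op=OPEN");
        // What a Knox deployment needs, per the proposal in apache/hadoop#5447:
        System.out.println(base + knoxPrefix + file + "?op=OPEN");
    }
}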

Anyway, I can make this validation apply to the (s)webhdfs schemes for now if you'd prefer; please let me know. We could then perhaps revisit it if Hadoop accepts the change.

Comment on lines 80 to 84
new AuthPermission("doAs")
};
SWEBHDFS_PERMISSIONS = new Permission[] {
new AuthPermission("doAs"),
new RuntimePermission("setFactory"),
Member

This might be a bit in the weeds as a request, but the above registry of permissions details why they are needed. Can we add a quick blurb for each permission about why they are included?

Contributor Author

Sure, I've added comments to that section (sketched below). For some more details:

  • The doAs permission was required (IIRC) due to the call here. This permission was already required for the HDFS plugin in Elasticsearch, but only for a Kerberized setup. Now it's also required for webhdfs, even without Kerberos.
  • setFactory is required for the call here. It is needed only for the swebhdfs scheme, as webhdfs doesn't use an SSL connection.
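
A rough sketch of what the commented arrays might look like; the enclosing class and exact wording are assumed, not quoted from the PR:

import java.security.Permission;
import javax.security.auth.AuthPermission;

final class HdfsPermissionsSketch {
    // webhdfs wraps its filesystem calls in UserGroupInformation.doAs(...),
    // so "doAs" is needed even without Kerberos.
    static final Permission[] WEBHDFS_PERMISSIONS = new Permission[] { new AuthPermission("doAs") };

    // swebhdfs additionally installs an SSL socket factory for its HTTPS
    // connections, which the JVM guards with RuntimePermission("setFactory").
    static final Permission[] SWEBHDFS_PERMISSIONS = new Permission[] {
        new AuthPermission("doAs"),
        new RuntimePermission("setFactory") };
}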

@trakos
Contributor Author

trakos commented Apr 1, 2023

Thanks for taking a look @jbaiera! I've answered the initial feedback.

This review might be a bit slow, sorry in advance

No worries, take your time! Thanks for being thorough.

@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
@jbaiera jbaiera self-requested a review July 24, 2023 19:36
@jbaiera jbaiera self-assigned this Jul 24, 2023
Member

@jbaiera jbaiera left a comment

Sorry again for the long delay on getting this review up. I've left a couple of comments here, mostly small nits/questions. There is one comment about the permissions that I think we'll want to get some additional approval on regarding setFactory, but otherwise things are looking good so far. I'll enable testing as well.

systemProperty 'tests.rest.suite', 'hdfs_repository_' + hadoopVer + '/10_basic'
}
// HA fixture is unsupported. Don't run them.
tasks.named("yamlRestTestSecure" + hadoopVer).configure {
Member

Small nit: can we move this down a little to rectify the code comment above it? The HA tests are done by the javaRestTest* tasks, and while we may justifiably want to disable all the yamlRestTest* tasks here, the comment placement is inaccurate.

This build file is really complicated; I normally wouldn't bring this kind of thing up otherwise.

tasks.named("yamlRestTestWeb" + hadoopVer).configure {
dependsOn "hdfs" + hadoopVer + "Fixture"
// The normal test runner only runs the standard hdfs rest tests
systemProperty 'tests.rest.suite', 'hdfs_repository_' + hadoopVer
Member

Should this be the webhdfs rest tests instead of the hdfs ones?


// org.apache.hadoop.hdfs.web.SSLConnectionConfigurator
// This is used by swebhdfs connections
permission java.lang.RuntimePermission "setFactory";
Member

So here's the deal with this permission: frustratingly, Java uses it to guard against setting an SSL socket factory on both a connection instance and the global default for all HTTPS connections. In fact, it's used to guard against setting all sorts of weird global state in the running JVM.

I'll ask around with our more security-minded folks and either link them here for their blessing on this, or collect feedback and share it here. I imagine that using webhdfs without SSL support is not exactly an attractive prospect. There is some prior art in the code base that might support this usage, but I need to double-check.
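
For context, a small sketch (assumed, not from this PR) of the kind of process-wide state that RuntimePermission("setFactory") gates when a SecurityManager is installed:

import java.net.URL;
import java.net.URLStreamHandlerFactory;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLSocketFactory;

final class SetFactorySketch {
    // Both calls below trigger SecurityManager.checkSetFactory() and mutate
    // global JVM state, which is why this permission deserves scrutiny.
    static void illustrate(SSLSocketFactory sslFactory, URLStreamHandlerFactory urlFactory) {
        // Replaces the SSL socket factory used by every future HTTPS connection:
        HttpsURLConnection.setDefaultSSLSocketFactory(sslFactory);
        // Replaces how every URL in the JVM resolves its protocol handler
        // (and can only ever be set once per JVM):
        URL.setURLStreamHandlerFactory(urlFactory);
    }
}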

Contributor

While not ideal, the grant here seems reasonable (given the usage).

I'm less concerned with the grant of the setFactory permission than I would be with other, already granted permissions. And we already "trust" several other plugins and their dependencies with this permission.

@jbaiera jbaiera self-requested a review August 2, 2023 21:44
@jbaiera
Member

jbaiera commented Aug 2, 2023

@elasticmachine ok to test

@jbaiera
Member

jbaiera commented Aug 3, 2023

@elasticmachine update branch

}

def getHttpPortForVersion(hadoopVersion) {
return 10004 - (2 * hadoopVersion)
Member

I tried running the tests locally and the fixture does not start due to an already-used port. I think these port assignment functions need to be updated to use an offset of 3 instead of 2 now.
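
For intuition, a worked example of the port arithmetic, assuming each Hadoop version's fixtures reserve a block of consecutive ports counting down from a shared base (the sibling port functions aren't shown in this diff):

final class PortStrideSketch {
    // Hypothetical stand-in for the build file's port assignment functions.
    static int port(int base, int stride, int hadoopVersion) {
        return base - stride * hadoopVersion;
    }

    public static void main(String[] args) {
        // Stride 2: version 2 -> 10000, version 3 -> 9998. A third port for
        // version 2 (10000 - 2 = 9998) collides with version 3's first port.
        System.out.println(port(10004, 2, 2) + " / " + port(10004, 2, 3));
        // Stride 3: version 2 -> 9998, 9997, 9996; version 3 -> 9995, 9994,
        // 9993. The blocks no longer overlap.
        System.out.println(port(10004, 3, 2) + " / " + port(10004, 3, 3));
    }
}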

@quux00 quux00 added v8.11.0 and removed v8.10.0 labels Aug 16, 2023
@mattc58 mattc58 removed the v8.11.0 label Oct 4, 2023
@mattc58 mattc58 added the v8.12.0 label Oct 4, 2023
@jbaiera jbaiera removed the v8.12.0 label Oct 26, 2023
@dakrone dakrone added :Data Management/HDFS and removed :Data Management/Other labels Nov 16, 2023
@jbaiera
Member

jbaiera commented Feb 6, 2024

Since the conversation has come to a bit of a halt, I'm going to close this PR for now. If you would like to continue development on it, please let us know and we can happily reopen it!

@jbaiera jbaiera closed this Feb 6, 2024

Labels

:Data Management/HDFS (HDFS repository issues) · >enhancement · external-contributor (Pull request authored by a developer outside the Elasticsearch team) · Team:Data Management (Meta label for data/management team)
