Skip to content

Conversation

@wForget
Copy link
Member

@wForget wForget commented Aug 27, 2025

Which issue does this PR close?

Closes #2243.

Rationale for this change

I also noticed the Apache OpenDAL project, which supports object_store and many file services. Perhaps we can integrate it to access more file services.

What changes are included in this PR?

add hdfs-opendal feature to support hdfs with opendal

How are these changes tested?

Successfully run CometReadHdfsBenchmark locally (tips: build native enable hdfs-opendal: cd native && cargo build --features hdfs-opendal)

@codecov-commenter
Copy link

codecov-commenter commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.77%. Comparing base (f09f8af) to head (488d649).
⚠️ Report is 434 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2244      +/-   ##
============================================
+ Coverage     56.12%   57.77%   +1.64%     
- Complexity      976     1291     +315     
============================================
  Files           119      145      +26     
  Lines         11743    13360    +1617     
  Branches       2251     2378     +127     
============================================
+ Hits           6591     7719    +1128     
- Misses         4012     4384     +372     
- Partials       1140     1257     +117     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@wForget wForget marked this pull request as ready for review August 27, 2025 09:35
@andygrove
Copy link
Member

@comphead do you remember if we looked at OpenDAL originally for HDFS support?

@comphead
Copy link
Contributor

@comphead do you remember if we looked at OpenDAL originally for HDFS support?

Yeah, the main concern was limited support for HDFS client https://github.com/Kimahriman/hdfs-native?tab=readme-ov-file#supported-hdfs-settings
which can lead to lack of support for IO retries, authentication, etc
but this PR uses openDAL as an optional feature, which IMO should be fine

@comphead
Copy link
Contributor

@wForget is there a real use case for open_dal?

On a separate note the crate is very actively evolving and in future might be a more successful candidate the hdfs

@andygrove
Copy link
Member

@wForget is there a real use case for open_dal?

On a separate note the crate is very actively evolving and in future might be a more successful candidate the hdfs

OpenDAL is governed by the Apache Software Foundation, which is nice too. It also supports object-store as a backend.

@mbutrovich
Copy link
Contributor

@wForget is there a real use case for open_dal?

On a separate note the crate is very actively evolving and in future might be a more successful candidate the hdfs

I think Iceberg-rs is relying on it, so that's a big motivation to contribute to it.

@wForget
Copy link
Member Author

wForget commented Aug 29, 2025

@comphead do you remember if we looked at OpenDAL originally for HDFS support?

Yeah, the main concern was limited support for HDFS client https://github.com/Kimahriman/hdfs-native?tab=readme-ov-file#supported-hdfs-settings which can lead to lack of support for IO retries, authentication, etc but this PR uses openDAL as an optional feature, which IMO should be fine

The dependency you mentioned corresponds to the services-hdfs-native feature of OpenDAL, which is a fully native HDFS client. This PR introduces the service-hdfs feature, which uses hdrs crate, a jvm based libhfs.

@wForget
Copy link
Member Author

wForget commented Aug 29, 2025

@wForget is there a real use case for open_dal?

No, currently we use gluten as spark native engine, but we also use jvm-based libhdfs

@comphead
Copy link
Contributor

Thanks @wForget I was referring to native-hdfs crate as you correctly mentioned.
Reg to hdrs it looks interesting and more supported than hdfs crate with the same architecture and based on libhdfs which based on jvm libs.

Is there an object store implementation based on hdrs? from PR ny understanding object_store_opendal is the implemenation?

@wForget
Copy link
Member Author

wForget commented Aug 29, 2025

Is there an object store implementation based on hdrs? from PR ny understanding object_store_opendal is the implemenation?

It seems to be https://github.com/apache/opendal/tree/main/integrations/object_store

@comphead
Copy link
Contributor

Thanks I'm planning to run some tests this weekend using this feature and local HDFS 3 node cluster

@comphead
Copy link
Contributor

comphead commented Sep 2, 2025

I'm still on it @wForget, the local hdfs cluster setup having some issue

@wForget
Copy link
Member Author

wForget commented Sep 2, 2025

I'm still on it @wForget, the local hdfs cluster setup having some issue

Thank you for your verification and feedback.

@comphead
Copy link
Contributor

comphead commented Sep 3, 2025

I made smoke checks on my local 3 nodes cluster and it works, however I haven't checked auth part, @parthchandra WDYT?

Its probably doesn't make sense of having 2 similar libhdfs clients in the project, however open dal one is more actively supported

@parthchandra
Copy link
Contributor

I made smoke checks on my local 3 nodes cluster and it works, however I haven't checked auth part, @parthchandra WDYT?

I'll try this out (though it may be a couple of days before I get to it). In the meantime, I see no issue with having this in the code base given that it is behind a feature flag (as is the current fs-hdfs based implementation).
I suspect that we will have to make minor changes to use this when accessing object store implementations that already have native implementations (e.g. S3) and I would like to be able to benchmark the various implementations at some point.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @wForget and @parthchandra

@comphead comphead merged commit a74fea1 into apache:main Sep 3, 2025
93 checks passed
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
* feat: Support hdfs with OpenDAL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integrate Apache OpenDAL to support more file serivces

6 participants