Skip to content

Conversation

@alamb
Copy link

@alamb alamb commented Apr 11, 2025

This is part of testing delta.rs with DataFusion 37 (which will, among other things, use a new version of arrow-rs and object_store)

I don't really intend this PR to be merged as is -- it is simply enough modifications to get the code to compile to unblock me testing delta.rs

Notes for anyone who finds this PR:

  1. I could not figure out how to make chrono work correctly with 53/54/55. I suspect what is needed is to feature gate a chrono crate the way you feature gate arrow and pick the right version when needed
  2. the HDFS object store crate uses object_store 0.11.0 so that woudl need to be updated to 0.12.0 before updating this PR

Note this is based on the v0.8.0 tag

//! This module exists to help re-export the version of arrow used by default-engine and other
//! parts of kernel that need arrow
#[cfg(feature = "arrow_53")]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure that simply deleting support for the older arrow releases is not the right approach for this crate but it was the most expedient for my purposes

@github-actions github-actions bot added the breaking-change Change that require a major version bump label Apr 11, 2025
@zachschuermann
Copy link
Member

awesome, thanks @alamb! just saw arrow 55 is out, thanks for starting this - ill pick it up and see if we can support arrow 55 soon! regarding your questions:

  1. I could not figure out how to make chrono work correctly with 53/54/55. I suspect what is needed is to feature gate a chrono crate the way you feature gate arrow and pick the right version when needed

yea we have pinned chrono so that we can support arrow 53 (some context here) - perhaps we can just get by with dropping 53 support now and just do 54/55?

  1. the HDFS object store crate uses object_store 0.11.0 so that woudl need to be updated to 0.12.0 before updating this PR

ah yep - I can try to follow up over there to see if they can bump too!

@alamb
Copy link
Author

alamb commented Apr 11, 2025

yea we have pinned chrono so that we can support arrow 53 (some context #784) - perhaps we can just get by with dropping 53 support now and just do 54/55?

I don't think I am in a position to judge this 🤷 -- that will be on the maintainers of this crate I think

@zachschuermann
Copy link
Member

did the same thing for the hdfs-native-object-store crate to update to object_store 0.12: datafusion-contrib/hdfs-native-object-store#15

@rtyler
Copy link
Member

rtyler commented Apr 12, 2025

@alamb Thanks for taking a crack at this, I noticed 55 yesterday when doing
some unrelated work too and queued this up on my todo list 😄

Personally I think the hdfs-native-object-store is a boat anchor that should be
jettisoned as a dependency. I begrudgingly update the deltalake-hdfs package
because somebody donated that code a while ago and it has been nothing but
trouble.

I'll see what I can do with the chrono juggling to ensure that kernel try to keep support for multiple arrow versions via feature flags. If we can get that to work, then the arrow_55 flag could go into a 0.9.1 without much trouble 🤞

@Kimahriman
Copy link

Personally I think the hdfs-native-object-store is a boat anchor that should be jettisoned as a dependency. I begrudgingly update the deltalake-hdfs package because somebody donated that code a while ago and it has been nothing but trouble.

Donator of the code here. What trouble does it cause? First I'm hearing of any issues. Just something with transitive dependencies?

I didn't notice there was a new object store release, but happy to push a new release out asap for it.

@rtyler
Copy link
Member

rtyler commented Apr 22, 2025

@Kimahriman the challenge with hdfs-native-object-store is not really about the code that's there or how it works, but the dependency treadmill that we can get trapped in.

Every set of arrow/datafusion/object_store releases creates a juggling act that must be done before any new release can be cut. This can mean that in the cases of object_store incompatibilities the delta-rs project is blocked from releasing new code until hdfs-native-object-store is upgraded, a crate outside of our control.

The crate does get updated fairly regularly which is helpful. I have prepared what I hope is an informative graphic for this situation (ref) 😺

hdfsnativeobjectstore

With #873 I'm hoping that we can unlink delta-kernel-rs from hdfs-native-object-store as a hard linked dependency.

@Kimahriman
Copy link

Kimahriman commented Apr 22, 2025

Seems slightly dramatic considering the the big effort it takes for every arrow and datafusion major version bump 😛 , and I can have a new release out in like 30 minutes once I actually know there's a new object store version out there.

I might make an issue on the object store repo seeing if they'd be willing to just merge it into their base to remove that as an issue completely. The whole thing is a very thin wrapper around my actual HDFS library, so it would only need updates for the API changes, and I could get a second set of eyes on how I've implemented the object_store semantics

rtyler pushed a commit that referenced this pull request Apr 27, 2025
## What changes are proposed in this pull request?
TLDR: add support for arrow 55 and drop support for arrow 53 (this let's
us unpin our `chrono` dependency)

Exhaustive list of changes:
1. New `pub` `FileSize` and `FileIndex` which are both `u64` now instead
of `usize`.
2. removing `arrow_53` feature flag
3. removing `object_store` feature flag (now we just 'include it' in
`default-engine`)
4. adding `arrow-55` feature flag
5. renamed `arrow_54` to `arrow-54`
6. adding respective `object_store` to each of `arrow_54` and `arrow_55`
(and re-exporting all of object_store)
7. got rid of `cloud` feature flag: now `aws`, `azure`, and `gcp` are
just on anytime `object_store` is pulled in
8. unpin `chrono` dep
9. handful of conditionally-compiled things to take care of breaking
changes across arrow 54/55


### This PR affects the following public APIs
Adds new `arrow-55` flag (and renames `arrow_54` to `arrow-54`) and
removes `arrow_53`, `cloud`, and `object_store`. Additionally re-exports
all of `object_store` that's compatible with the version of arrow you've
selected. Note there are two additional `pub` types (or rather aliases)
now: `FileSize` and `FileIndex`.

related: #831 

## How was this change tested?
Existing
@zachschuermann
Copy link
Member

this is done in #885 - thanks @alamb for getting it started!!

@alamb
Copy link
Author

alamb commented Apr 30, 2025

Thanks @zachschuermann

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change Change that require a major version bump

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants