Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release DataFusion 37.1.0 (non breaking API release) #9904

Closed
10 tasks done
alamb opened this issue Apr 1, 2024 · 25 comments
Closed
10 tasks done

Release DataFusion 37.1.0 (non breaking API release) #9904

alamb opened this issue Apr 1, 2024 · 25 comments
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Apr 1, 2024

@alamb alamb added the enhancement New feature or request label Apr 1, 2024
@alamb
Copy link
Contributor Author

alamb commented Apr 1, 2024

cc @andygrove

@alamb
Copy link
Contributor Author

alamb commented Apr 11, 2024

I started collecting some potential fixes for this release (basically regressions) in the description

@andygrove
Copy link
Member

Thanks @alamb. I have more bandwidth now to help support maintenance releases. Do you have a target date in mind for 37.1.0?

@alamb
Copy link
Contributor Author

alamb commented Apr 12, 2024

Thanks @alamb. I have more bandwidth now to help support maintenance releases. Do you have a target date in mind for 37.1.0?

Thank you !

I didn't have any date in mind -- how about we try for early next week (of April 15)? I am happy to do the backport PRs -- should we make a maintenance branch? Or maybe we should use https://github.com/apache/arrow-datafusion/tree/branch-37 🤔 ?

@andygrove
Copy link
Member

Thanks @alamb. I have more bandwidth now to help support maintenance releases. Do you have a target date in mind for 37.1.0?

Thank you !

I didn't have any date in mind -- how about we try for early next week (of April 15)? I am happy to do the backport PRs -- should we make a maintenance branch? Or maybe we should use https://github.com/apache/arrow-datafusion/tree/branch-37 🤔 ?

That sounds good. My plan was to use branch-{version} for any maintenance releases, following the pattern used by Apache Spark., so we would release any 37.x.x releases from branch-37.

@alamb
Copy link
Contributor Author

alamb commented Apr 15, 2024

That sounds good. My plan was to use branch-{version} for any maintenance releases, following the pattern used by Apache Spark., so we would release any 37.x.x releases from branch-37.

Makes sense -- I will do that.

I'll plan to make some backport PRs tomorrow morning

@ion-elgreco
Copy link

Heey @alamb, when are you planning this release? :)

@alamb
Copy link
Contributor Author

alamb commented Apr 16, 2024

Heey @alamb, when are you planning this release? :)

@ion-elgreco I hope the end of this week -- I am going to do the backporting now

@alamb
Copy link
Contributor Author

alamb commented Apr 16, 2024

I made the following 4 backport PRs: #10103, #10104, #10105, #10107

I will also send a note to the dev list and on slack/discord in case anyone else wants to backport additional items

@Omega359
Copy link
Contributor

Note that #9900 didn't in fact make all the udf's public - the string and unicode modules do not in fact have the modules changed to pub nor is default implemented for the functions that I spot checked.

@alamb
Copy link
Contributor Author

alamb commented Apr 16, 2024

Note that #9900 didn't in fact make all the udf's public - the string and unicode modules do not in fact have the modules changed to pub nor is default implemented for the functions that I spot checked.

Thank you -- moved the conversation to #10107 (comment)

@alamb
Copy link
Contributor Author

alamb commented Apr 18, 2024

We also added #10123 to this release, to reduce debug stack frame sizes. Thanks @sergiimk and @devinjdangelo

@alamb
Copy link
Contributor Author

alamb commented Apr 18, 2024

I am now beginning the process of creating a release candidate

@alamb
Copy link
Contributor Author

alamb commented Apr 18, 2024

Here is a PR with the changelog / version update: #10128

./dev/release/generate-changelog.py apache/arrow-datafusion 37.0.0 branch-37 > dev/changelog/37.1.0.md

@progval
Copy link
Contributor

progval commented Apr 18, 2024

Would #9770 and #9969 be eligible for backport too?

@alamb
Copy link
Contributor Author

alamb commented Apr 18, 2024

Would #9770 and #9969 be eligible for backport too?

@progval -- If it was a regression or a serious bug I think it would be a good idea to backport. In this case I can't remember the exact sequence of events (as in some types were enabled, and then disabled, but I am not sure what was in the 37.0.0 release)

What exactly do you think we should backport?

@progval
Copy link
Contributor

progval commented Apr 18, 2024

int8/int16/int32/int64 were enabled for a while. #9770 enabled for uint8/uint16/uint32/uint64, then #9969 disabled for int8/int16/uint8/uint16.

So backporting #9770 adds a non-breaking feature and a regression; and backporting #9969 immediately fixes that regression plus an existing bug.

@alamb
Copy link
Contributor Author

alamb commented Apr 18, 2024

I see -- so from that perspective 37.0.0 and earlier have a bug where int8/int16 bloom filters can incorrectly filter out incorrect answers.

I did some archeology and it seems like the in8/int16 bloom filter support was added in #7821 / shipped as part of https://github.com/apache/arrow-datafusion/blob/main/dev/changelog/33.0.0.md

(aka the bug has been present for several months/releases yet)

Thus I think it would be useful to fix but I also don't think it is critical to include in 37.1.0

@alamb
Copy link
Contributor Author

alamb commented Apr 18, 2024

@alamb
Copy link
Contributor Author

alamb commented Apr 18, 2024

Unfortunately it appears I made an error and forgot to update the release version in this RC

I have started a new thread[1] for a second RC https://lists.apache.org/thread/0md6qyhw0hody8p0v9wddvt7vo8r8z2x

@alamb alamb self-assigned this Apr 22, 2024
@alamb
Copy link
Contributor Author

alamb commented Apr 22, 2024

The release was approved and published to crates.io: https://lists.apache.org/thread/v6y745zyljoor471964l3tfq8m37lzh2

The release is available here:
https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-37.1.0

It has also been released to crates.io: https://crates.io/crates/datafusion/37.1.0

Thank you to everyone 🙏

@ion-elgreco
Copy link

Thanks @alamb!

I tried it with 37.1.0 in delta-rs, but we still get this error: internal error: entered unreachable code: NamedStructField should be rewritten in OperatorToFunction, wasn't this regression fixed?

@alamb
Copy link
Contributor Author

alamb commented Apr 22, 2024

Thanks @alamb!

I tried it with 37.1.0 in delta-rs, but we still get this error: internal error: entered unreachable code: NamedStructField should be rewritten in OperatorToFunction, wasn't this regression fixed?

Hi @ion-elgreco -- another reason this could happen is that the array function's haven't been registered with your session context / however you are

Yeah you need to use the FunctionRewrtiter here (with the relevant rewriter registered) https://github.com/apache/arrow-datafusion/blob/0573f78c7e7a4d94c3204cee464b3860479e0afb/datafusion/optimizer/src/analyzer/function_rewrite.rs#L33

There was a similar question in discord here: https://discord.com/channels/885562378132000778/1166447479609376850/1229122082256851054

Can you point me at the code of how you are running your query?

@ion-elgreco
Copy link

ion-elgreco commented Apr 22, 2024

@alamb this is the code:

  let (table, _metrics) = DeltaOps(table)
            .delete()
            .with_predicate("props['a'] = '2021-02-02'")
            .await
            .unwrap();

Which comes from here: https://github.com/delta-io/delta-rs/blob/main/crates%2Fcore%2Fsrc%2Foperations%2Fdelete.rs#L770-L774

@alamb
Copy link
Contributor Author

alamb commented Apr 22, 2024

Filed #10181 to track the issue with internal error: entered unreachable code: NamedStructField should be rewritten in OperatorToFunction

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants