[Website] Add blog post for Arrow 21.0.0 #668
_posts/2025-07-16-21.0.0-release.md
A new feature named Content-Defined Chunking improves deduplication of Parquet
files with mostly identical contents, by choosing data page boundaries based on
actual contents rather than a number of values. For that, it uses a rolling hash
function, and the min and max chunk size can be chosen. The feature is disabled by
default and can be enabled on a per-file basis in the Parquet `WriterProperties`
(GH-45750).
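For readers unfamiliar with the idea, the following is a toy sketch of content-defined boundaries in Python. It is not Parquet's implementation: the gear-style rolling hash, the `cdc_chunks` function, the lookup table seed, the `mask` value, and the byte-based `min_size` and `max_size` parameters are all assumptions chosen for illustration.

```python
import random

# Hypothetical lookup table for a gear-style rolling hash:
# one pseudo-random 32-bit value per possible byte.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data, min_size=64, max_size=1024, mask=0xFF):
    """Split bytes into chunks whose boundaries depend on the content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left pushes old bytes out of the 32-bit state,
        # so the hash only reflects the most recent bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size < min_size:
            continue  # never cut before the minimum chunk size
        # Cut when the hash matches the pattern, or at the maximum size.
        if (h & mask) == 0 or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than min_size
    return chunks


if __name__ == "__main__":
    src = random.Random(1)
    original = bytes(src.getrandbits(8) for _ in range(20_000))
    edited = original[:5_000] + b"a few inserted bytes" + original[5_000:]
    before, after = set(cdc_chunks(original)), set(cdc_chunks(edited))
    print(f"{len(before & after)} of {len(after)} chunks unchanged after the insert")
```

Because each boundary depends only on nearby bytes, an insertion tends to disturb just the chunks around it, whereas cutting after a fixed number of values would shift every later boundary. That locality is what lets mostly identical files share most of their chunks.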
@kszucs Do you think this is a good description?
This looks good to me. @kszucs I may merge this as-is but if you have any edits after the merge feel free to ping me.
Sorry, I hadn't noticed the ping. Yes, it is great, thank you!
Co-authored-by: Rossi Sun <[email protected]>
Co-authored-by: David Li <[email protected]>
Co-authored-by: Sarah Gilmore <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Thanks all for the contributions. I'll merge this tomorrow unless anyone has more updates before then.
No description provided.