Website: Add blog post for arrow-rs 57.0.0 #720
Changes from all commits
7e3b172
481b10f
5f3a997
f0ab8b9
da6782e
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,249 @@ | ||
| --- | ||
| layout: post | ||
| title: "Apache Arrow Rust 57.0.0 Release" | ||
| date: "2025-10-30 00:00:00" | ||
| author: pmc | ||
| categories: [release] | ||
| --- | ||
| <!-- | ||
| {% comment %} | ||
| Licensed to the Apache Software Foundation (ASF) under one or more | ||
| contributor license agreements. See the NOTICE file distributed with | ||
| this work for additional information regarding copyright ownership. | ||
| The ASF licenses this file to you under the Apache License, Version 2.0 | ||
| (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
| {% endcomment %} | ||
| --> | ||
|
|
||
| The Apache Arrow team is pleased to announce that the v57.0.0 release of Apache Arrow | ||
| Rust is now available on crates.io ([arrow] and [parquet]) and as [source download]. | ||
|
|
||
| [arrow]: https://crates.io/crates/arrow | ||
| [parquet]: https://crates.io/crates/parquet | ||
| [source download]: https://dist.apache.org/repos/dist/release/arrow/arrow-rs-57.0.0 | ||
|
|
||
| See the [57.0.0 changelog] for a full list of changes. | ||
|
|
||
| [57.0.0 changelog]: https://github.com/apache/arrow-rs/blob/57.0.0/CHANGELOG.md | ||
|
|
||
|
|
||
| ## New Features | ||
|
|
||
| Note: Arrow Rust hosts the development of the [parquet] crate, a | ||
| high-performance Rust implementation of [Apache Parquet]. | ||
|
|
||
| ### Performance: 4x Faster Parquet Metadata Parsing 🚀 | ||
|
|
||
| Ed Seidl ([@etseidl]) and Jörn Horstmann ([@jhorstmann]) contributed a rewritten | ||
| Thrift metadata parser for Parquet files that is almost 4x faster than the | ||
| previous parser based on the `thrift` crate. This is especially useful for | ||
| low-latency use cases and for reading Parquet files with large amounts of metadata | ||
| (e.g. many row groups or columns). | ||
| See the [blog post about the new Parquet metadata parser] for more details. | ||
|
|
||
| <div style="display: flex; gap: 16px; justify-content: center; align-items: flex-start;"> | ||
| <img src="{{ site.baseurl }}/img/rust-parquet-metadata/results.png" width="100%" class="img-responsive" alt="" aria-hidden="true"> | ||
| </div> | ||
|
|
||
| *Figure 1:* Performance improvements of [Apache Parquet] metadata parsing between version `56.2.0` and `57.0.0`. | ||
|
|
||
|
|
||
| [Apache Parquet]: https://parquet.apache.org/ | ||
| [@etseidl]: https://github.com/etseidl | ||
| [@jhorstmann]: https://github.com/jhorstmann | ||
|
|
||
| [blog post about the new Parquet metadata parser]: https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/ | ||
|
|
||
| ### New `arrow-avro` Crate | ||
|
|
||
| The `57.0.0` release introduces a new [`arrow-avro`] crate contributed by [@jecsand838] | ||
|
Review comment (Contributor, Author): FYI @jecsand838 and @nathaniel-d-ef |
||
| and [@nathaniel-d-ef] that provides much more efficient conversion between | ||
| [Apache Avro](https://avro.apache.org/) and Arrow `RecordBatch`es, as well as broader feature support. | ||
|
|
||
| Previously, Arrow-based systems that read or wrote Avro data | ||
| typically used the general-purpose [apache-avro] crate. While mature and | ||
| feature-complete, its row-oriented API does not support features such as | ||
| projection pushdown or vectorized execution. The new `arrow-avro` crate supports | ||
| these features efficiently by converting Avro data directly into Arrow's | ||
| columnar format. | ||
|
|
||
| See the [blog post about adding arrow-avro] for more details. | ||
|
|
||
| <div style="display: flex; gap: 16px; justify-content: center; align-items: flex-start; padding: 20px 15px;"> | ||
| <img src="{{ site.baseurl }}/img/introducing-arrow-avro/arrow-avro-architecture.svg" | ||
| width="100%" | ||
| alt="High-level `arrow-avro` architecture" | ||
| style="background:#fff"> | ||
| </div> | ||
|
|
||
| *Figure 2:* Architecture of the `arrow-avro` crate. | ||
|
|
||
|
|
||
| [@jecsand838]: https://github.com/jecsand838 | ||
| [@nathaniel-d-ef]: https://github.com/nathaniel-d-ef | ||
| [apache-avro]: https://crates.io/crates/apache-avro | ||
| [`arrow-avro`]: https://crates.io/crates/arrow-avro | ||
|
|
||
| [blog post about adding arrow-avro]: https://arrow.apache.org/blog/2025/10/23/introducing-arrow-avro/ | ||
|
|
||
|
|
||
| ### Parquet Variant Support 🧬 | ||
|
|
||
| The Apache Parquet project recently added a [new `Variant` type] for | ||
| representing semi-structured data. The `57.0.0` release includes support for reading and | ||
| writing both normal and shredded `Variant` values to and from Parquet files. It | ||
| also includes [parquet-variant], a complete library for working with `Variant` | ||
| values; [`VariantArray`], for working with arrays of `Variant` values in Apache | ||
| Arrow; and computation kernels for converting to/from JSON and Arrow types, | ||
| extracting paths, and shredding values. | ||
|
|
||
| [new `Variant` type]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md | ||
| [`VariantArray`]: https://docs.rs/parquet/latest/parquet/variant/struct.VariantArray.html | ||
| [parquet-variant]: https://crates.io/crates/parquet-variant | ||
|
|
||
| ```rust | ||
| // Use the VariantArrayBuilder to build a VariantArray | ||
|
Review comment on lines +113 to +114: Should we add imports to make this runnable? |
||
| let mut builder = VariantArrayBuilder::new(3); | ||
| builder.new_object().with_field("name", "Alice").finish(); // row 1: {"name": "Alice"} | ||
| builder.append_value("such wow"); // row 2: "such wow" (a string) | ||
| let array = builder.build(); | ||
|
|
||
| // Since VariantArray is an ExtensionType, it needs to be converted | ||
| // to an ArrayRef and Field with the appropriate metadata | ||
| // before it can be written to a Parquet file | ||
| let field = array.field("data"); | ||
| let array = ArrayRef::from(array); | ||
| // create a RecordBatch with the VariantArray | ||
| let schema = Schema::new(vec![field]); | ||
| let batch = RecordBatch::try_new(Arc::new(schema), vec![array])?; | ||
|
|
||
| // Now you can write the RecordBatch to the Parquet file, as normal | ||
| let file = std::fs::File::create("variant.parquet")?; | ||
| let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?; | ||
| writer.write(&batch)?; | ||
| writer.close()?; | ||
| ``` | ||
|
|
||
|
|
||
| This support is being integrated into query engines, for example into DataFusion | ||
| via [@friendlymatthew]'s [`datafusion-variant`] crate, and into [delta-rs]. While | ||
| this support is still experimental, we believe the APIs are mostly complete and | ||
| do not expect major changes. Please consider trying it out and providing | ||
| feedback and improvements. | ||
|
|
||
| [`datafusion-variant`]: https://github.com/datafusion-contrib/datafusion-variant | ||
| [delta-rs]: https://github.com/delta-io/delta-rs/issues/3637 | ||
|
|
||
| Thanks to the many contributors who made this possible, including: | ||
| * Ryan Johnson ([@scovich]), Congxian Qiu ([@klion26]), and Liam Bao ([@liamzwbao]) for completing the implementation | ||
|
Review comment (Contributor, Author): FYI @scovich, @klion26, @liamzwbao, @PinkCrow007 @carpecodeum, @mkarbo, @superserious-dev, @friendlymatthew, @micoo227, @Weijun-H, |
||
| * Li Jiaying ([@PinkCrow007]), Aditya Bhatnagar ([@carpecodeum]), and Malthe Karbo ([@mkarbo]) for | ||
| initiating the work | ||
| * Everyone else who has contributed, including [@superserious-dev], [@friendlymatthew], [@micoo227], [@Weijun-H], | ||
| [@harshmotw-db], [@odysa], [@viirya], [@adriangb], [@kosiew], [@codephage2020], | ||
| [@ding-young], [@mbrobbel], [@petern48], [@sdf-jkl], [@abacef], and [@mprammer]. | ||
|
|
||
| [@PinkCrow007]: https://github.com/PinkCrow007 | ||
| [@mkarbo]: https://github.com/mkarbo | ||
| [@carpecodeum]: https://github.com/carpecodeum | ||
| [@scovich]: https://github.com/scovich | ||
| [@superserious-dev]: https://github.com/superserious-dev | ||
| [@friendlymatthew]: https://github.com/friendlymatthew | ||
| [@micoo227]: https://github.com/micoo227 | ||
| [@Weijun-H]: https://github.com/Weijun-H | ||
| [@harshmotw-db]: https://github.com/harshmotw-db | ||
| [@odysa]: https://github.com/odysa | ||
| [@viirya]: https://github.com/viirya | ||
| [@klion26]: https://github.com/klion26 | ||
| [@adriangb]: https://github.com/adriangb | ||
| [@kosiew]: https://github.com/kosiew | ||
| [@liamzwbao]: https://github.com/liamzwbao | ||
| [@codephage2020]: https://github.com/codephage2020 | ||
| [@ding-young]: https://github.com/ding-young | ||
| [@mbrobbel]: https://github.com/mbrobbel | ||
| [@petern48]: https://github.com/petern48 | ||
| [@sdf-jkl]: https://github.com/sdf-jkl | ||
| [@abacef]: https://github.com/abacef | ||
| [@mprammer]: https://github.com/mprammer | ||
|
|
||
| See the ticket [Variant type support in Parquet #6736] for more details. | ||
|
|
||
|
|
||
| [Variant type support in Parquet #6736]: https://github.com/apache/arrow-rs/issues/6736 | ||
|
|
||
|
|
||
| ### Parquet Geometry Support 🗺️ | ||
|
|
||
|
|
||
| The `57.0.0` release also includes support for reading and writing the [Parquet Geometry | ||
| types] `GEOMETRY` and `GEOGRAPHY`, including `GeospatialStatistics`, | ||
| contributed by Kyle Barron ([@kylebarron]), Dewey Dunnington ([@paleolimbot]), | ||
| Kaushik Srinivasan ([@kaushiksrini]), and Blake Orth ([@BlakeOrth]). | ||
|
|
||
| Please see the [Implement Geometry and Geography type support in Parquet] tracking ticket for more details. | ||
|
|
||
| [@kylebarron]: https://github.com/kylebarron | ||
| [@paleolimbot]: https://github.com/paleolimbot | ||
| [@kaushiksrini]: https://github.com/kaushiksrini | ||
| [@BlakeOrth]: https://github.com/BlakeOrth | ||
|
|
||
| [Parquet Geometry types]: https://github.com/apache/parquet-format/blob/master/Geospatial.md | ||
|
|
||
|
|
||
| [Implement Geometry and Geography type support in Parquet]: https://github.com/apache/arrow-rs/issues/8373 | ||
|
|
||
| ## Thanks to Our Contributors | ||
| ```console | ||
| $ git shortlog -sn 56.0.0..57.0.0 | ||
| 36 Matthijs Brobbel | ||
| 20 Andrew Lamb | ||
| 13 Ryan Johnson | ||
| 11 Ed Seidl | ||
| 10 Connor Sanders | ||
| 8 Alex Huang | ||
| 5 Emil Ernerfeldt | ||
| 5 Liam Bao | ||
| 5 Matthew Kim | ||
| 4 nathaniel-d-ef | ||
| 3 Raz Luvaton | ||
| 3 albertlockett | ||
| 3 dependabot[bot] | ||
| 3 mwish | ||
| 2 Ben Ye | ||
| 2 Congxian Qiu | ||
| 2 Dewey Dunnington | ||
| 2 Kyle Barron | ||
| 2 Lilian Maurel | ||
| 2 Mark Nash | ||
| 2 Nuno Faria | ||
| 2 Pepijn Van Eeckhoudt | ||
| 2 Tobias Schwarzinger | ||
| 2 lichuang | ||
| 1 Adam Gutglick | ||
| 1 Adam Reeve | ||
| 1 Alex Stephen | ||
| 1 Chen Chongchen | ||
| 1 Jack | ||
| 1 Jeffrey Vo | ||
| 1 Jörn Horstmann | ||
| 1 Kaushik Srinivasan | ||
| 1 Li Jiaying | ||
| 1 Lin Yihai | ||
| 1 Marco Neumann | ||
| 1 Piotr Findeisen | ||
| 1 Piotr Srebrny | ||
| 1 Samuele Resca | ||
| 1 Van De Bio | ||
| 1 Yan Tingwang | ||
| 1 ding-young | ||
| 1 kosiew | ||
| 1 张林伟 | ||
| ``` | ||
Review comment: FYI @etseidl and @jhorstmann, your names are in lights