diff --git a/dev/changelog/0.5.0.md b/dev/changelog/0.5.0.md new file mode 100644 index 0000000000..af2ae1cfd8 --- /dev/null +++ b/dev/changelog/0.5.0.md @@ -0,0 +1,129 @@ + + +# DataFusion Comet 0.5.0 Changelog + +This release consists of 69 commits from 15 contributors. See credits at the end of this changelog for more information. + +**Fixed bugs:** + +- fix: Unsigned type related bugs [#1095](https://github.com/apache/datafusion-comet/pull/1095) (kazuyukitanimura) +- fix: Use RDD partition index [#1112](https://github.com/apache/datafusion-comet/pull/1112) (viirya) +- fix: Various metrics bug fixes and improvements [#1111](https://github.com/apache/datafusion-comet/pull/1111) (andygrove) +- fix: Don't create CometScanExec for subclasses of ParquetFileFormat [#1129](https://github.com/apache/datafusion-comet/pull/1129) (Kimahriman) +- fix: Fix metrics regressions [#1132](https://github.com/apache/datafusion-comet/pull/1132) (andygrove) +- fix: Enable scenarios accidentally commented out in CometExecBenchmark [#1151](https://github.com/apache/datafusion-comet/pull/1151) (mbutrovich) +- fix: Spark 4.0-preview1 SPARK-47120 [#1156](https://github.com/apache/datafusion-comet/pull/1156) (kazuyukitanimura) +- fix: Document enabling comet explain plan usage in Spark (4.0) [#1176](https://github.com/apache/datafusion-comet/pull/1176) (parthchandra) +- fix: stddev_pop should not directly return 0.0 when count is 1.0 [#1184](https://github.com/apache/datafusion-comet/pull/1184) (viirya) +- fix: fix missing explanation for then branch in case when [#1200](https://github.com/apache/datafusion-comet/pull/1200) (rluvaton) +- fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [#1253](https://github.com/apache/datafusion-comet/pull/1253) (andygrove) +- fix: Fall back to Spark for distinct aggregates [#1262](https://github.com/apache/datafusion-comet/pull/1262) (andygrove) +- fix: disable initCap by default [#1276](https://github.com/apache/datafusion-comet/pull/1276) (kazuyukitanimura) + +**Performance related:** + +- perf: Stop passing Java config map into native createPlan [#1101](https://github.com/apache/datafusion-comet/pull/1101) (andygrove) +- feat: Make native shuffle compression configurable and respect `spark.shuffle.compress` [#1185](https://github.com/apache/datafusion-comet/pull/1185) (andygrove) +- perf: Improve query planning to more reliably fall back to columnar shuffle when native shuffle is not supported [#1209](https://github.com/apache/datafusion-comet/pull/1209) (andygrove) +- feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [#1192](https://github.com/apache/datafusion-comet/pull/1192) (andygrove) +- feat: Implement custom RecordBatch serde for shuffle for improved performance [#1190](https://github.com/apache/datafusion-comet/pull/1190) (andygrove) + +**Implemented enhancements:** + +- feat: support array_insert [#1073](https://github.com/apache/datafusion-comet/pull/1073) (SemyonSinchenko) +- feat: enable decimal to decimal cast of different precision and scale [#1086](https://github.com/apache/datafusion-comet/pull/1086) (himadripal) +- feat: Improve ScanExec native metrics [#1133](https://github.com/apache/datafusion-comet/pull/1133) (andygrove) +- feat: Add Spark-compatible implementation of SchemaAdapterFactory [#1169](https://github.com/apache/datafusion-comet/pull/1169) (andygrove) +- feat: Improve shuffle metrics (second attempt) [#1175](https://github.com/apache/datafusion-comet/pull/1175) (andygrove) +- feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [#1021](https://github.com/apache/datafusion-comet/pull/1021) (Kontinuation) +- feat: Reenable tests for filtered SMJ anti join [#1211](https://github.com/apache/datafusion-comet/pull/1211) (comphead) +- feat: add support for array_remove expression [#1179](https://github.com/apache/datafusion-comet/pull/1179) (jatin510) + +**Documentation updates:** + +- docs: Update documentation for 0.4.0 release [#1096](https://github.com/apache/datafusion-comet/pull/1096) (andygrove) +- docs: Fix readme typo FGPA -> FPGA [#1117](https://github.com/apache/datafusion-comet/pull/1117) (gstvg) +- docs: Add more technical detail and new diagram to Comet plugin overview [#1119](https://github.com/apache/datafusion-comet/pull/1119) (andygrove) +- docs: Add some documentation explaining how shuffle works [#1148](https://github.com/apache/datafusion-comet/pull/1148) (andygrove) +- docs: Update TPC-H benchmark results [#1257](https://github.com/apache/datafusion-comet/pull/1257) (andygrove) + +**Other:** + +- chore: Add changelog for 0.4.0 [#1089](https://github.com/apache/datafusion-comet/pull/1089) (andygrove) +- chore: Prepare for 0.5.0 development [#1090](https://github.com/apache/datafusion-comet/pull/1090) (andygrove) +- build: Skip installation of spark-integration and fuzz testing modules [#1091](https://github.com/apache/datafusion-comet/pull/1091) (parthchandra) +- minor: Add hint for finding the GPG key to use when publishing to maven [#1093](https://github.com/apache/datafusion-comet/pull/1093) (andygrove) +- chore: Include first ScanExec batch in metrics [#1105](https://github.com/apache/datafusion-comet/pull/1105) (andygrove) +- chore: Improve CometScan metrics [#1100](https://github.com/apache/datafusion-comet/pull/1100) (andygrove) +- chore: Add custom metric for native shuffle fetching batches from JVM [#1108](https://github.com/apache/datafusion-comet/pull/1108) (andygrove) +- chore: Remove unused StringView struct [#1143](https://github.com/apache/datafusion-comet/pull/1143) (andygrove) +- test: enable more Spark 4.0 tests [#1145](https://github.com/apache/datafusion-comet/pull/1145) (kazuyukitanimura) +- chore: Refactor cast to use SparkCastOptions param [#1146](https://github.com/apache/datafusion-comet/pull/1146) (andygrove) +- chore: Move more expressions from core crate to spark-expr crate [#1152](https://github.com/apache/datafusion-comet/pull/1152) (andygrove) +- chore: Remove dead code [#1155](https://github.com/apache/datafusion-comet/pull/1155) (andygrove) +- chore: Move string kernels and expressions to spark-expr crate [#1164](https://github.com/apache/datafusion-comet/pull/1164) (andygrove) +- chore: Move remaining expressions to spark-expr crate + some minor refactoring [#1165](https://github.com/apache/datafusion-comet/pull/1165) (andygrove) +- chore: Add ignored tests for reading complex types from Parquet [#1167](https://github.com/apache/datafusion-comet/pull/1167) (andygrove) +- test: enabling Spark tests with offHeap requirement [#1177](https://github.com/apache/datafusion-comet/pull/1177) (kazuyukitanimura) +- minor: move shuffle classes from common to spark [#1193](https://github.com/apache/datafusion-comet/pull/1193) (andygrove) +- minor: refactor to move decodeBatches to broadcast exchange code as private function [#1195](https://github.com/apache/datafusion-comet/pull/1195) (andygrove) +- minor: refactor prepare_output so that it does not require an ExecutionContext [#1194](https://github.com/apache/datafusion-comet/pull/1194) (andygrove) +- minor: remove unused source files [#1202](https://github.com/apache/datafusion-comet/pull/1202) (andygrove) +- chore: Upgrade to DataFusion 44.0.0-rc2 [#1154](https://github.com/apache/datafusion-comet/pull/1154) (andygrove) +- chore: Add safety check to CometBuffer [#1050](https://github.com/apache/datafusion-comet/pull/1050) (viirya) +- chore: Remove unreachable code [#1213](https://github.com/apache/datafusion-comet/pull/1213) (andygrove) +- test: Enable Comet by default except some tests in SparkSessionExtensionSuite [#1201](https://github.com/apache/datafusion-comet/pull/1201) (kazuyukitanimura) +- chore: extract `struct` expressions to folders based on spark grouping [#1216](https://github.com/apache/datafusion-comet/pull/1216) (rluvaton) +- chore: extract static invoke expressions to folders based on spark grouping [#1217](https://github.com/apache/datafusion-comet/pull/1217) (rluvaton) +- chore: Follow-on PR to fully enable onheap memory usage [#1210](https://github.com/apache/datafusion-comet/pull/1210) (andygrove) +- chore: extract agg_funcs expressions to folders based on spark grouping [#1224](https://github.com/apache/datafusion-comet/pull/1224) (rluvaton) +- chore: extract datetime_funcs expressions to folders based on spark grouping [#1222](https://github.com/apache/datafusion-comet/pull/1222) (rluvaton) +- chore: Upgrade to DataFusion 44.0.0 from 44.0.0 RC2 [#1232](https://github.com/apache/datafusion-comet/pull/1232) (rluvaton) +- chore: extract strings file to `strings_func` like in spark grouping [#1215](https://github.com/apache/datafusion-comet/pull/1215) (rluvaton) +- chore: extract predicate_functions expressions to folders based on spark grouping [#1218](https://github.com/apache/datafusion-comet/pull/1218) (rluvaton) +- build(deps): bump protobuf version to 3.21.12 [#1234](https://github.com/apache/datafusion-comet/pull/1234) (wForget) +- chore: extract json_funcs expressions to folders based on spark grouping [#1220](https://github.com/apache/datafusion-comet/pull/1220) (rluvaton) +- test: Enable shuffle by default in Spark tests [#1240](https://github.com/apache/datafusion-comet/pull/1240) (kazuyukitanimura) +- chore: extract hash_funcs expressions to folders based on spark grouping [#1221](https://github.com/apache/datafusion-comet/pull/1221) (rluvaton) +- build: Fix test failure caused by merging conflicting PRs [#1259](https://github.com/apache/datafusion-comet/pull/1259) (andygrove) + +## Credits + +Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor. + +``` + 37 Andy Grove + 10 Raz Luvaton + 7 KAZUYUKI TANIMURA + 3 Liang-Chi Hsieh + 2 Parth Chandra + 1 Adam Binford + 1 Dharan Aditya + 1 Himadri Pal + 1 Jagdish Parihar + 1 Kristin Cowalcijk + 1 Matt Butrovich + 1 Oleks V + 1 Sem + 1 Zhen Wang + 1 gstvg +``` + +Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.