diff --git a/dev/changelog/39.0.0.md b/dev/changelog/39.0.0.md index f94e34592c72..ff27b4ba24f5 100644 --- a/dev/changelog/39.0.0.md +++ b/dev/changelog/39.0.0.md @@ -1,23 +1,25 @@ - -## [39.0.0](https://github.com/apache/datafusion/tree/39.0.0) (2024-06-07) +# Apache DataFusion 39.0.0 Changelog + +This release consists of 234 commits from 59 contributors. See credits at the end of this changelog for more information. **Breaking changes:** @@ -72,16 +74,12 @@ - docs: add documents to substrait type variation consts [#10719](https://github.com/apache/datafusion/pull/10719) (waynexia) - Minor: (Doc) Enable rt-multi-thread feature for sample code [#10770](https://github.com/apache/datafusion/pull/10770) (hsiang-c) -**Merged pull requests:** +**Other:** -- Prepare 38.0.0 release candidate 1 [#10407](https://github.com/apache/datafusion/pull/10407) (andygrove) - Minor: Add more docs and examples for `Expr::unalias` [#10406](https://github.com/apache/datafusion/pull/10406) (alamb) - minor: Remove [RUST][datafusion] from release vote email subject line [#10411](https://github.com/apache/datafusion/pull/10411) (andygrove) -- Remove ScalarFunctionDefinition [#10325](https://github.com/apache/datafusion/pull/10325) (lewiszlw) -- chore(docs): update subquery documentation with more information [#10361](https://github.com/apache/datafusion/pull/10361) (sanderson) - fix dml logical plan output schema [#10394](https://github.com/apache/datafusion/pull/10394) (leoyvens) - [MINOR]: Move transpose code to under common [#10409](https://github.com/apache/datafusion/pull/10409) (mustafasrepo) -- minor: Remove docs archive [#10416](https://github.com/apache/datafusion/pull/10416) (andygrove) - Fix incorrect Schema over aggregate function, Remove unnecessary `exprlist_to_fields_aggregate` [#10408](https://github.com/apache/datafusion/pull/10408) (jonahgao) - Enable user defined display_name for ScalarUDF [#10417](https://github.com/apache/datafusion/pull/10417) (yyy1000) - Fix and improve `CommonSubexprEliminate` rule [#10396](https://github.com/apache/datafusion/pull/10396) (peter-toth) @@ -94,16 +92,10 @@ - Improve flight sql examples [#10432](https://github.com/apache/datafusion/pull/10432) (lewiszlw) - Move Covariance (Population) covar_pop to be a User Defined Aggregate Function [#10418](https://github.com/apache/datafusion/pull/10418) (yyy1000) - Stop copying LogicalPlan and Exprs in `OptimizeProjections` (2% faster planning) [#10405](https://github.com/apache/datafusion/pull/10405) (alamb) -- Minor: format comments in `PushDownFilter` rule [#10437](https://github.com/apache/datafusion/pull/10437) (alamb) - chore: Improve release process for next time [#10447](https://github.com/apache/datafusion/pull/10447) (andygrove) -- Minor: Add usecase to comments in `LogicalPlan::recompute_schema` [#10443](https://github.com/apache/datafusion/pull/10443) (alamb) -- doc: fix old master branch references to main [#10458](https://github.com/apache/datafusion/pull/10458) (Jefffrey) - Move bit_and_or_xor unit tests to slt [#10457](https://github.com/apache/datafusion/pull/10457) (NoeB) -- Introduce user-defined signature [#10439](https://github.com/apache/datafusion/pull/10439) (jayzhan211) -- Remove `AggregateFunctionDefinition::Name` [#10441](https://github.com/apache/datafusion/pull/10441) (lewiszlw) - Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [#10430](https://github.com/apache/datafusion/pull/10430) (alamb) - refactor: Reduce string allocations in Expr::display_name (use write instead of format!) [#10454](https://github.com/apache/datafusion/pull/10454) (erratic-pattern) -- Make `CREATE EXTERNAL TABLE` format options consistent, remove special syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION` [#10404](https://github.com/apache/datafusion/pull/10404) (berkaysynnada) - Add `simplify` method to aggregate function [#10354](https://github.com/apache/datafusion/pull/10354) (milenkovicm) - Add cast array test to sqllogictest [#10474](https://github.com/apache/datafusion/pull/10474) (viirya) - Add `Expr::try_as_col`, deprecate `Expr::try_into_col` (speed up optimizer) [#10448](https://github.com/apache/datafusion/pull/10448) (alamb) @@ -113,21 +105,13 @@ - Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [#10460](https://github.com/apache/datafusion/pull/10460) (ClSlaid) - Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` (4% faster planning) [#10431](https://github.com/apache/datafusion/pull/10431) (alamb) - Improved ergonomy for `CREATE EXTERNAL TABLE OPTIONS`: Don't require quotations for simple namespaced keys like `foo.bar` [#10483](https://github.com/apache/datafusion/pull/10483) (ozankabak) -- feat: allow `array_slice` to take an optional stride parameter [#10469](https://github.com/apache/datafusion/pull/10469) (jonahgao) - Replace `GetFieldAccess` with indexing function in `SqlToRel ` [#10375](https://github.com/apache/datafusion/pull/10375) (jayzhan211) -- fix: make `columnize_expr` resistant to display_name collisions [#10459](https://github.com/apache/datafusion/pull/10459) (jonahgao) - Fix values with different data types caused failure [#10445](https://github.com/apache/datafusion/pull/10445) (b41sh) -- fix: avoid compressed json files repartitioning [#10470](https://github.com/apache/datafusion/pull/10470) (korowa) -- Minor: Improved document string for `LogicalPlanBuilder` [#10496](https://github.com/apache/datafusion/pull/10496) (AbrarNitk) - Fix SortMergeJoin with join filter filtering all rows out [#10495](https://github.com/apache/datafusion/pull/10495) (viirya) - chore: use fullpath in macro to avoid declaring in other module [#10503](https://github.com/apache/datafusion/pull/10503) (jayzhan211) -- Minor: Extend more style of udaf `expr_fn`, Remove order args for`covar_samp` and `covar_pop` [#10492](https://github.com/apache/datafusion/pull/10492) (jayzhan211) - Minor: remove unused source file `udf.rs` [#10497](https://github.com/apache/datafusion/pull/10497) (jonahgao) -- feat: optional args for regexp\_\* UDFs [#10514](https://github.com/apache/datafusion/pull/10514) (Michael-J-Ward) - Support UDAF to align Builtin aggregate function [#10493](https://github.com/apache/datafusion/pull/10493) (jayzhan211) -- Remove `file_type()` from `FileFormat` [#10499](https://github.com/apache/datafusion/pull/10499) (Jefffrey) - Minor: add a test for `current_time` (no args) [#10509](https://github.com/apache/datafusion/pull/10509) (alamb) -- fix: parsing timestamp with date format [#10476](https://github.com/apache/datafusion/pull/10476) (shanretoo) - [MINOR]: Move pipeline checker rule to the end [#10502](https://github.com/apache/datafusion/pull/10502) (mustafasrepo) - Minor: Extract parent/child limit calculation into a function, improve docs [#10501](https://github.com/apache/datafusion/pull/10501) (alamb) - Fix window expr deserialization [#10506](https://github.com/apache/datafusion/pull/10506) (lewiszlw) @@ -135,7 +119,6 @@ - Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) [#10356](https://github.com/apache/datafusion/pull/10356) (alamb) - Implement unparse `IS_NULL` to String and enhance the tests [#10529](https://github.com/apache/datafusion/pull/10529) (goldmedal) - Fix panic in array_agg(distinct) query [#10526](https://github.com/apache/datafusion/pull/10526) (jayzhan211) -- UDAF: Extend more args to `state_fields` and `groups_accumulator_supported` and introduce `ReversedUDAF` [#10525](https://github.com/apache/datafusion/pull/10525) (jayzhan211) - Move min_max unit tests to slt [#10539](https://github.com/apache/datafusion/pull/10539) (xinlifoobar) - Implement unparse `IsNotFalse` to String [#10538](https://github.com/apache/datafusion/pull/10538) (goldmedal) - Implement Unparse TryCast Expr --> String Support [#10542](https://github.com/apache/datafusion/pull/10542) (xinlifoobar) @@ -145,38 +128,30 @@ - Stop most copying LogicalPlan and Exprs in `ScalarSubqueryToJoin` [#10489](https://github.com/apache/datafusion/pull/10489) (alamb) - Example for simple Expr --> SQL conversion [#10528](https://github.com/apache/datafusion/pull/10528) (edmondop) - fix `null_count` on `compute_record_batch_statistics` to report null counts across partitions [#10468](https://github.com/apache/datafusion/pull/10468) (samuelcolvin) -- fix: `array_slice` panics [#10547](https://github.com/apache/datafusion/pull/10547) (jonahgao) - Minor: Add `PullUpCorrelatedExpr::new` and improve documentation [#10500](https://github.com/apache/datafusion/pull/10500) (alamb) - Stop copying LogicalPlan and Exprs in `PushDownLimit` [#10508](https://github.com/apache/datafusion/pull/10508) (alamb) - Break up contributing guide into smaller pages [#10533](https://github.com/apache/datafusion/pull/10533) (alamb) - PhysicalExpr Orderings with Range Information [#10504](https://github.com/apache/datafusion/pull/10504) (berkaysynnada) - Implement unparse `ScalarVariable` to String [#10541](https://github.com/apache/datafusion/pull/10541) (reswqa) -- feat: Expose Parquet Schema Adapter [#10515](https://github.com/apache/datafusion/pull/10515) (HawaiianSpork) - Handle dictionary values in ScalarValue serde [#10563](https://github.com/apache/datafusion/pull/10563) (thinkharderdev) - Improve signature of `get_field` function [#10569](https://github.com/apache/datafusion/pull/10569) (lewiszlw) - Implement Unparse `GroupingSet` Expr --> String Support sql [#10555](https://github.com/apache/datafusion/pull/10555) (xinlifoobar) - Minor: Move proxy to datafusion common [#10561](https://github.com/apache/datafusion/pull/10561) (jayzhan211) - Update prost-build requirement from =0.12.4 to =0.12.6 [#10578](https://github.com/apache/datafusion/pull/10578) (dependabot[bot]) - Add examples of how to convert logical plan to/from sql strings [#10558](https://github.com/apache/datafusion/pull/10558) (xinlifoobar) -- feat: API for collecting statistics/index for metadata of a parquet file + tests [#10537](https://github.com/apache/datafusion/pull/10537) (NGA-TRAN) - Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [#10304](https://github.com/apache/datafusion/pull/10304) (comphead) -- Remove `Expr::GetIndexedField`, replace `Expr::{field,index,range}` with `FieldAccessor`, `IndexAccessor`, and `SliceAccessor` [#10568](https://github.com/apache/datafusion/pull/10568) (jayzhan211) - Minor: Fix `ArrayFunctionRewriter` name reporting [#10581](https://github.com/apache/datafusion/pull/10581) (alamb) - Improve `UserDefinedLogicalNode::from_template` API to return `Result` [#10575](https://github.com/apache/datafusion/pull/10575) (lewiszlw) - Migrate testing optimizer rules to use `rewrite` API [#10576](https://github.com/apache/datafusion/pull/10576) (lewiszlw) -- Improve ContextProvider [#10577](https://github.com/apache/datafusion/pull/10577) (lewiszlw) - test: add more tests for statistics reading [#10592](https://github.com/apache/datafusion/pull/10592) (NGA-TRAN) - refactor: reduce allocations in push down filter [#10567](https://github.com/apache/datafusion/pull/10567) (erratic-pattern) - Fix compilation of datafusion-cli on 32bit targets [#10594](https://github.com/apache/datafusion/pull/10594) (nathaniel-daniel) -- Add to_date function to scalar functions doc [#10601](https://github.com/apache/datafusion/pull/10601) (Omega359) - Rename monotonicity as output_ordering in ScalarUDF's [#10596](https://github.com/apache/datafusion/pull/10596) (berkaysynnada) - Implement Unparser for `UNION ALL` [#10603](https://github.com/apache/datafusion/pull/10603) (phillipleblanc) - Improve `UserDefinedLogicalNodeCore::from_template` API to return Result [#10597](https://github.com/apache/datafusion/pull/10597) (lewiszlw) - Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common [#10574](https://github.com/apache/datafusion/pull/10574) (jayzhan211) - Minor: Consolidate some integration tests into `core_integration` [#10588](https://github.com/apache/datafusion/pull/10588) (alamb) - Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [#10527](https://github.com/apache/datafusion/pull/10527) (appletreeisyellow) -- feat: Add eliminate group by constant optimizer rule [#10591](https://github.com/apache/datafusion/pull/10591) (korowa) -- Docs: Update PR workflow documentation [#10532](https://github.com/apache/datafusion/pull/10532) (alamb) - [MINOR]: Update get range implementation for lead lag window functions [#10614](https://github.com/apache/datafusion/pull/10614) (mustafasrepo) - Minor: Improve documentation in sql_to_plan example [#10582](https://github.com/apache/datafusion/pull/10582) (alamb) - Docs: add examples for `RuntimeEnv::register_object_store`, improve error messages [#10617](https://github.com/apache/datafusion/pull/10617) (aditanase) @@ -184,7 +159,6 @@ - Add to_unixtime function to scalar functions doc [#10620](https://github.com/apache/datafusion/pull/10620) (Omega359) - Test for reading read statistics from parquet files without statistics and boolean & struct data type [#10608](https://github.com/apache/datafusion/pull/10608) (NGA-TRAN) - adding benchmark for extracting arrow statistics from parquet [#10610](https://github.com/apache/datafusion/pull/10610) (Lordworms) -- feat: extend `unnest` to support Struct datatype [#10429](https://github.com/apache/datafusion/pull/10429) (duongcongtoai) - Implement a dialect-specific rule for unparsing an identifier with or without quotes [#10573](https://github.com/apache/datafusion/pull/10573) (goldmedal) - add catalog as part of the table path in plan_to_sql [#10612](https://github.com/apache/datafusion/pull/10612) (y-f-u) - Refactor parquet row group pruning into a struct (use new statistics API, part 1) [#10607](https://github.com/apache/datafusion/pull/10607) (alamb) @@ -205,19 +179,14 @@ - Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) [#10662](https://github.com/apache/datafusion/pull/10662) (alamb) - Add `FileScanConfig::new()` API [#10623](https://github.com/apache/datafusion/pull/10623) (alamb) - Minor: Remove `GetFieldAccessSchema` [#10665](https://github.com/apache/datafusion/pull/10665) (jayzhan211) -- Minor: Use slice in `ConcreteTreeNode` [#10666](https://github.com/apache/datafusion/pull/10666) (peter-toth) - Move Median to `functions-aggregate` and Introduce Numeric signature [#10644](https://github.com/apache/datafusion/pull/10644) (jayzhan211) - Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [#10268](https://github.com/apache/datafusion/pull/10268) (jayzhan211) -- fix: pass `quote` parameter to CSV writer [#10671](https://github.com/apache/datafusion/pull/10671) (DDtKey) - Fix compilation "comparison_binary_numeric_coercion not found" [#10677](https://github.com/apache/datafusion/pull/10677) (alamb) - refactor: simplify converting List DataTypes to `ScalarValue` [#10675](https://github.com/apache/datafusion/pull/10675) (jonahgao) -- feat: add substrait support for Interval types and literals [#10646](https://github.com/apache/datafusion/pull/10646) (waynexia) - Minor: Improve ObjectStoreUrl docs + examples [#10619](https://github.com/apache/datafusion/pull/10619) (alamb) -- fix: CI compilation failed on substrait [#10683](https://github.com/apache/datafusion/pull/10683) (jonahgao) - Add tests for reading numeric limits in parquet statistics [#10642](https://github.com/apache/datafusion/pull/10642) (alamb) - Update nix requirement from 0.28.0 to 0.29.0 [#10684](https://github.com/apache/datafusion/pull/10684) (dependabot[bot]) - refactor: Move SchemaAdapter from parquet module to data source [#10680](https://github.com/apache/datafusion/pull/10680) (HawaiianSpork) -- Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [#10543](https://github.com/apache/datafusion/pull/10543) (peter-toth) - Convert first, last aggregate function to UDAF [#10648](https://github.com/apache/datafusion/pull/10648) (mustafasrepo) - Minor: CastExpr Ordering Handle [#10650](https://github.com/apache/datafusion/pull/10650) (berkaysynnada) - Factor out common datafusion types into another proto file [#10649](https://github.com/apache/datafusion/pull/10649) (mustafasrepo) @@ -238,7 +207,6 @@ - Fix incorrect statistics read for unsigned integers columns in parquet [#10704](https://github.com/apache/datafusion/pull/10704) (xinlifoobar) - Separate `Partitioning` protobuf serialization code [#10708](https://github.com/apache/datafusion/pull/10708) (lewiszlw) - Support consuming Substrait with compound signature function names [#10653](https://github.com/apache/datafusion/pull/10653) (Blizzara) -- Minor: Add examples of using TreeNode with `Expr` [#10686](https://github.com/apache/datafusion/pull/10686) (alamb) - Minor: Add examples of using TreeNode with `LogicalPlan` [#10687](https://github.com/apache/datafusion/pull/10687) (alamb) - Add `ParquetExec::builder()`, deprecate `ParquetExec::new` [#10636](https://github.com/apache/datafusion/pull/10636) (alamb) - feature: Add a WindowUDFImpl::simplify() API [#9906](https://github.com/apache/datafusion/pull/9906) (guojidan) @@ -249,7 +217,6 @@ - CI: Fix complaints from newer Clippy versions [#10725](https://github.com/apache/datafusion/pull/10725) (comphead) - Remove Eager Trait for Joins [#10721](https://github.com/apache/datafusion/pull/10721) (berkaysynnada) - Minor: fix signature `fn octect_length()` [#10726](https://github.com/apache/datafusion/pull/10726) (marvinlanhenke) -- docs: add documents to substrait type variation consts [#10719](https://github.com/apache/datafusion/pull/10719) (waynexia) - Update rstest requirement from 0.19.0 to 0.20.0 [#10734](https://github.com/apache/datafusion/pull/10734) (dependabot[bot]) - Update rstest_reuse requirement from 0.6.0 to 0.7.0 [#10733](https://github.com/apache/datafusion/pull/10733) (dependabot[bot]) - Add example for building an external secondary index for parquet files [#10549](https://github.com/apache/datafusion/pull/10549) (alamb) @@ -262,16 +229,12 @@ - Minor: Split physical_plan/parquet/mod.rs into smaller modules [#10727](https://github.com/apache/datafusion/pull/10727) (alamb) - minor: consolidate unparser integration tests [#10736](https://github.com/apache/datafusion/pull/10736) (devinjdangelo) - Minor: Move aggregate variance to slt [#10750](https://github.com/apache/datafusion/pull/10750) (marvinlanhenke) -- fix: fix string repeat for negative numbers [#10760](https://github.com/apache/datafusion/pull/10760) (tshauck) -- Introduce Sum UDAF [#10651](https://github.com/apache/datafusion/pull/10651) (jayzhan211) - Extract parquet statistics from timestamps with timezones [#10766](https://github.com/apache/datafusion/pull/10766) (xinlifoobar) - Minor: Add tests for extracting dictionary parquet statistics [#10729](https://github.com/apache/datafusion/pull/10729) (alamb) - Update rstest requirement from 0.20.0 to 0.21.0 [#10774](https://github.com/apache/datafusion/pull/10774) (dependabot[bot]) - Minor: Refactor memory size estimation for HashTable [#10748](https://github.com/apache/datafusion/pull/10748) (marvinlanhenke) - Reduce code repetition in `datafusion/functions` mod files [#10700](https://github.com/apache/datafusion/pull/10700) (MohamedAbdeen21) -- Minor: (Doc) Enable rt-multi-thread feature for sample code [#10770](https://github.com/apache/datafusion/pull/10770) (hsiang-c) - Support negatives in split part [#10780](https://github.com/apache/datafusion/pull/10780) (tshauck) -- feat: support unparsing LogicalPlan::Window nodes [#10767](https://github.com/apache/datafusion/pull/10767) (devinjdangelo) - Extract parquet statistics from `LargeUtf8` columns and Add tests for `UTF8` And `LargeUTF8` [#10762](https://github.com/apache/datafusion/pull/10762) (Weijun-H) - Cleanup GetIndexedField [#10769](https://github.com/apache/datafusion/pull/10769) (lewiszlw) - Extract parquet statistics from f16 columns, add `ScalarValue::Float16` [#10763](https://github.com/apache/datafusion/pull/10763) (Lordworms) @@ -284,10 +247,8 @@ - minor: Refactor some unparser methods to improve readability [#10788](https://github.com/apache/datafusion/pull/10788) (devinjdangelo) - Convert variance sample to udaf [#10713](https://github.com/apache/datafusion/pull/10713) (yyin-dev) - Improve docs and fix a typo [#10798](https://github.com/apache/datafusion/pull/10798) (lewiszlw) -- fix: `array_slice` and `array_element` panicked on empty args [#10804](https://github.com/apache/datafusion/pull/10804) (jonahgao) - Avoid the usage of intermediate ScalarValue to improve performance of extracting statistics from parquet files [#10711](https://github.com/apache/datafusion/pull/10711) (xinlifoobar) - SMJ: Add more tests and improve comments [#10784](https://github.com/apache/datafusion/pull/10784) (comphead) -- feat: Update Parquet row filtering to handle type coercion [#10716](https://github.com/apache/datafusion/pull/10716) (jeffreyssmith2nd) - Handle EmptyRelation during SQL unparsing [#10803](https://github.com/apache/datafusion/pull/10803) (goldmedal) - Document Committer and PMC process [#10778](https://github.com/apache/datafusion/pull/10778) (alamb) - Int64 as default type for make_array function empty or null case [#10790](https://github.com/apache/datafusion/pull/10790) (jayzhan211) @@ -301,3 +262,72 @@ - Refactor and simplify the SQL unparser [#10811](https://github.com/apache/datafusion/pull/10811) (goldmedal) - Minor: Remove code duplication in `memory_limit` derivation for datafusion-cli [#10814](https://github.com/apache/datafusion/pull/10814) (comphead) - build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10` [#10765](https://github.com/apache/datafusion/pull/10765) (waynexia) +- chore: Prepare 39.0.0-rc1 [#10828](https://github.com/apache/datafusion/pull/10828) (andygrove) + +## Credits + +Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor. + +``` + 44 Andrew Lamb + 18 Jay Zhan + 14 张林伟 + 11 Andy Grove + 11 Xin Li + 10 Jonah Gao + 8 Jax Liu + 7 Mustafa Akur + 7 Oleks V + 7 dependabot[bot] + 5 Arttu + 5 Berkay Şahin + 5 Marvin Lanhenke + 4 Lordworms + 4 Ruihang Xia + 3 Bruce Ritchie + 3 Devin D'Angelo + 3 Duong Cong Toai + 3 Eduard Karacharov + 3 Junhao Liu + 3 Liang-Chi Hsieh + 3 Mohamed Abdeen + 3 Nga Tran + 3 Peter Toth + 3 Phillip LeBlanc + 2 Abrar Khan + 2 Adam Curtis + 2 Chunchun Ye + 2 Jeffrey Vo + 2 Michael Maletich + 2 QP Hou + 2 Trent Hauck + 2 Weijie Guo + 2 junxiangMu + 2 yfu + 1 Adrian Tanase + 1 Alex Huang + 1 Andrey Koshchiy + 1 Artem Medvedev + 1 ClSlaid + 1 Dan Harris + 1 Edmondo Porcu + 1 Jeffrey Smith II + 1 Kun Liu + 1 Leonardo Yvens + 1 Marko Milenković + 1 Matthew Turner + 1 Mehmet Ozan Kabak + 1 Michael J Ward + 1 NoeB + 1 Samuel Colvin + 1 Scott Anderson + 1 VimT + 1 Yue Yin + 1 baishen + 1 hsiang-c + 1 nathaniel-daniel + 1 shanretoo + 1 tison +``` + +Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release. diff --git a/dev/release/README.md b/dev/release/README.md index 749af8696b0f..c0ba87ad39c6 100644 --- a/dev/release/README.md +++ b/dev/release/README.md @@ -57,7 +57,7 @@ See instructions at https://infra.apache.org/release-signing.html#generate for g Committers can add signing keys in Subversion client with their ASF account. e.g.: -```bash +```shell $ svn co https://dist.apache.org/repos/dist/dev/datafusion $ cd datafusion $ editor KEYS @@ -66,7 +66,7 @@ $ svn ci KEYS Follow the instructions in the header of the KEYS file to append your key. Here is an example: -```bash +```shell (gpg --list-sigs "John Doe" && gpg --armor --export "John Doe") >> KEYS svn commit KEYS -m "Add key for John Doe" ``` @@ -89,35 +89,26 @@ to generate one if you do not already have one. The changelog is generated using a Python script. There is a dependency on `PyGitHub`, which can be installed using pip: -```bash +```shell pip3 install PyGitHub ``` -Run the following command to generate the changelog content. +To generate the changelog, set the `GITHUB_TOKEN` environment variable to a valid token and then run the script +providing two commit ids or tags followed by the version number of the release being created. The following +example generates a change log of all changes between the first commit and the current HEAD revision. -```bash -$ GITHUB_TOKEN= ./dev/release/generate-changelog.py 24.0.0 HEAD > dev/changelog/25.0.0.md +```shell +export GITHUB_TOKEN= +./dev/release/generate-changelog.py 24.0.0 HEAD 25.0.0 > dev/changelog/25.0.0.md ``` This script creates a changelog from GitHub PRs based on the labels associated with them as well as looking for -titles starting with `feat:`, `fix:`, or `docs:` . The script will produce output similar to: - -``` -Fetching list of commits between 24.0.0 and HEAD -Fetching pull requests -Categorizing pull requests -Generating changelog content -``` - -This process is not fully automated, so there are some additional manual steps: +titles starting with `feat:`, `fix:`, or `docs:`. -- Add the ASF header to the generated file -- Add the following content (copy from the previous version's changelog and update as appropriate: - -``` -## [24.0.0](https://github.com/apache/datafusion/tree/24.0.0) (2023-05-06) +Once the change log is generated, run `prettier` to format the document: -[Full Changelog](https://github.com/apache/datafusion/compare/23.0.0...24.0.0) +```shell +prettier -w dev/changelog/25.0.0md ``` ## Prepare release commits and PR @@ -265,7 +256,7 @@ published in the correct order as shown in this diagram. _To update this diagram, manually edit the dependencies in [crate-deps.dot](crate-deps.dot) and then run:_ -```bash +```shell dot -Tsvg dev/release/crate-deps.dot > dev/release/crate-deps.svg ``` @@ -310,7 +301,7 @@ Please visit https://brew.sh/ to obtain Homebrew. In addition to that please che Before running the script make sure that you can run the following command in your bash to make sure that `brew` has been installed and configured properly: -```bash +```shell brew --version ``` @@ -325,7 +316,7 @@ To create a Github Personal Access Token, please visit https://docs.github.com/e After all of the above is complete execute the following command: -```bash +```shell dev/release/publish_homebrew.sh ``` @@ -368,13 +359,13 @@ Release candidates should be deleted once the release is published. Get a list of DataFusion release candidates: -```bash +```shell svn ls https://dist.apache.org/repos/dist/dev/datafusion ``` Delete a release candidate: -```bash +```shell svn delete -m "delete old DataFusion RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-38.0.0-rc1/ ``` @@ -384,13 +375,13 @@ Only the latest release should be available. Delete old releases after publishin Get a list of DataFusion releases: -```bash +```shell svn ls https://dist.apache.org/repos/dist/release/datafusion ``` Delete a release: -```bash +```shell svn delete -m "delete old DataFusion release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-37.0.0 ``` @@ -401,7 +392,7 @@ with a copy of the previous release announcement. Run the following commands to get the number of commits and number of unique contributors for inclusion in the blog post. -```bash +```shell git log --pretty=oneline 37.0.0..38.0.0 datafusion datafusion-cli datafusion-examples | wc -l git shortlog -sn 37.0.0..38.0.0 datafusion datafusion-cli datafusion-examples | wc -l ``` diff --git a/dev/release/generate-changelog.py b/dev/release/generate-changelog.py index 424baece6023..23b594214823 100755 --- a/dev/release/generate-changelog.py +++ b/dev/release/generate-changelog.py @@ -20,7 +20,7 @@ from github import Github import os import re - +import subprocess def print_pulls(repo_name, title, pulls): if len(pulls) > 0: @@ -32,7 +32,7 @@ def print_pulls(repo_name, title, pulls): print() -def generate_changelog(repo, repo_name, tag1, tag2): +def generate_changelog(repo, repo_name, tag1, tag2, version): # get a list of commits between two tags print(f"Fetching list of commits between {tag1} and {tag2}", file=sys.stderr) @@ -52,12 +52,12 @@ def generate_changelog(repo, repo_name, tag1, tag2): all_pulls.append((pull, commit)) # we split the pulls into categories - #TODO: make categories configurable breaking = [] bugs = [] docs = [] enhancements = [] performance = [] + other = [] # categorize the pull requests based on GitHub labels print("Categorizing pull requests", file=sys.stderr) @@ -75,7 +75,6 @@ def generate_changelog(repo, repo_name, tag1, tag2): cc_breaking = parts_tuple[2] == '!' labels = [label.name for label in pull.labels] - #print(pull.number, labels, parts, file=sys.stderr) if 'api change' in labels or cc_breaking: breaking.append((pull, commit)) elif 'bug' in labels or cc_type == 'fix': @@ -84,18 +83,64 @@ def generate_changelog(repo, repo_name, tag1, tag2): performance.append((pull, commit)) elif 'enhancement' in labels or cc_type == 'feat': enhancements.append((pull, commit)) - elif 'documentation' in labels or cc_type == 'docs': + elif 'documentation' in labels or cc_type == 'docs' or cc_type == 'doc': docs.append((pull, commit)) + else: + other.append((pull, commit)) # produce the changelog content print("Generating changelog content", file=sys.stderr) + + # ASF header + print("""\n""") + + print(f"# Apache DataFusion {version} Changelog\n") + + # get the number of commits + commit_count = subprocess.check_output(f"git log --pretty=oneline {tag1}..{tag2} | wc -l", shell=True, text=True).strip() + + # get number of contributors + contributor_count = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2} | wc -l", shell=True, text=True).strip() + + print(f"This release consists of {commit_count} commits from {contributor_count} contributors. " + f"See credits at the end of this changelog for more information.\n") + print_pulls(repo_name, "Breaking changes", breaking) print_pulls(repo_name, "Performance related", performance) print_pulls(repo_name, "Implemented enhancements", enhancements) print_pulls(repo_name, "Fixed bugs", bugs) print_pulls(repo_name, "Documentation updates", docs) - print_pulls(repo_name, "Merged pull requests", all_pulls) + print_pulls(repo_name, "Other", other) + + # show code contributions + credits = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2}", shell=True, text=True).rstrip() + + print("## Credits\n") + print("Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) " + "per contributor.\n") + print("```") + print(credits) + print("```\n") + print("Thank you also to everyone who contributed in other ways such as filing issues, reviewing " + "PRs, and providing feedback on this release.\n") def cli(args=None): """Process command line arguments.""" @@ -103,8 +148,9 @@ def cli(args=None): args = sys.argv[1:] parser = argparse.ArgumentParser() - parser.add_argument("tag1", help="The previous release tag (e.g. 38.0.0)") - parser.add_argument("tag2", help="The current release tag (e.g. HEAD)") + parser.add_argument("tag1", help="The previous commit or tag (e.g. 0.1.0)") + parser.add_argument("tag2", help="The current commit or tag (e.g. HEAD)") + parser.add_argument("version", help="The version number to include in the changelog") args = parser.parse_args() token = os.getenv("GITHUB_TOKEN") @@ -112,7 +158,7 @@ def cli(args=None): g = Github(token) repo = g.get_repo(project) - generate_changelog(repo, project, args.tag1, args.tag2) + generate_changelog(repo, project, args.tag1, args.tag2, args.version) if __name__ == "__main__": cli() \ No newline at end of file