Closed
1589 commits
c73b6e1
bench: add benchmarks for first_value, last_value (#21409)
theirix Apr 7, 2026
7eccbb9
chore(deps): bump the all-other-cargo-deps group with 4 updates (#21435)
dependabot[bot] Apr 7, 2026
94c7bc2
chore(deps): bump taiki-e/install-action from 2.70.3 to 2.74.0 (#21434)
dependabot[bot] Apr 7, 2026
7c3b22c
test: Add `datafusion.format.*` configs test coverage (#21355)
erenavsarogullari Apr 7, 2026
8a48a87
perf: optimize object store requests when reading JSON (#20823)
ariel-miculas Apr 7, 2026
c74ed91
Estimate aggregate output rows using existing NDV statistics (#20926)
buraksenn Apr 8, 2026
944bac2
feat: Propagate orderings through struct-producing projections (#21218)
rkrishn7 Apr 8, 2026
f4e24a5
fix: skips projection pruning for whole subtree (#20545)
Acfboy Apr 8, 2026
603bfb4
Follow-up: remove interleave panic recovery after Arrow 58.1.0 (#21436)
xudong963 Apr 8, 2026
cdddd76
fix: preserve subquery structure when unparsing SubqueryAlias over Ag…
yonatan-sevenai Apr 8, 2026
190b104
perf: Optimize `split_part` for `Utf8View` (#21420)
neilconway Apr 8, 2026
5ba06ac
writing table to parquet followed by read and schema check (#21444)
Rich-T-kid Apr 8, 2026
91c2e04
fix: FilterExec should drop projection when apply projection pushdown…
haohuaijin Apr 8, 2026
4aed81a
fix: preserve duplicate GROUPING SETS rows (#21058)
xiedeyantu Apr 8, 2026
4b1901f
Eliminate outer joins with empty relations via null-padded projection…
SubhamSinghal Apr 8, 2026
8f77a3b
chore(deps): bump cryptography from 46.0.6 to 46.0.7 (#21489)
dependabot[bot] Apr 9, 2026
e1ad871
Preserve logical cast field semantics during physical lowering with f…
kosiew Apr 9, 2026
cc9f869
Optimize `regexp_replace` by stripping trailing .* from anchored patt…
Dandandan Apr 9, 2026
8cf70ec
fix: apply the left side schema on the right side in set expressions …
gruuya Apr 9, 2026
02e4411
Update documentation with recent blogs and events (#21462)
alamb Apr 9, 2026
6cf94c7
Add more regexp_replace test coverage (#21485)
alamb Apr 9, 2026
6c106ba
fix: Use codepoints in `lpad`, `rpad`, `translate` (#21405)
neilconway Apr 9, 2026
6a770aa
feat: add cast_to_type UDF for type-based casting (#21322)
adriangb Apr 9, 2026
249c23c
Introduce Morselizer API, rewrite `ParquetOpener` to `ParquetMorseliz…
alamb Apr 9, 2026
8d91fb0
Update 53 upgrade guide to note release, other changes (#21449)
alamb Apr 9, 2026
fbdf770
chore: create benches small ints for count_distinct (#21521)
coderfender Apr 9, 2026
44af0a1
docs: Incorporate writing table provider blog post to user documentat…
buraksenn Apr 9, 2026
e8d217a
perf: use DynComparator in sort-merge join (SMJ), microbenchmark quer…
mbutrovich Apr 9, 2026
8939726
perf: Optimize NULL handling in `substr` (#21519)
neilconway Apr 9, 2026
e8990c4
perf: replace SMJ's join_filter_not_matched_map HashMap with Vec<Filt…
mbutrovich Apr 9, 2026
4389f14
refactor: extract sort pushdown logic from FileScanConfig into separa…
zhuqi-lucas Apr 10, 2026
b46634c
fix: PostgreSQL dialect can not support tinyint type (#21445)
xiedeyantu Apr 10, 2026
d61be49
perf: Optimize NULL handling in `find_in_set` (#21464)
neilconway Apr 10, 2026
ad7c57a
perf: Optimize NULL handling in `lcm`, `gcd` (#21468)
neilconway Apr 10, 2026
fbb5240
perf: Optimize NULL handling in `arrays_zip` (#21475)
neilconway Apr 10, 2026
42d9835
perf: Optimize NULL handling in `array_remove` (#21532)
neilconway Apr 10, 2026
beed4f0
perf: Optimize NULL handling in `array_slice` (#21482)
neilconway Apr 10, 2026
1929d71
perf: Optimize NULL handling in some datetime functions (#21477)
neilconway Apr 10, 2026
eaf0a41
perf: Optimize NULL handling in `array_has` (#21471)
neilconway Apr 10, 2026
0626ca3
perf: Optimize `Utf8View` string concat (#21535)
neilconway Apr 10, 2026
5e60df6
fix: DataFusion benchmark panicked: failed to cast '2013-07-01' to UI…
xiedeyantu Apr 10, 2026
3911f0c
chore: Add array_slice tests for overlapping nulls across inputs (#21…
neilconway Apr 10, 2026
540d8ec
Migrate PhysicalExprAdapter to unified CastExpr and remove CastColumn…
kosiew Apr 10, 2026
374806c
fix(sql): return planner error for malformed typed literals (#21454)
officialasishkumar Apr 11, 2026
d4e629f
fix: Preserve quoted mixed-case identifiers in the `pivot_unpivot` e…
niebayes Apr 11, 2026
ec00112
Conditionally build page pruning predicates (#21480)
fpetkovski Apr 11, 2026
0ab78e7
fix(spark): array_repeat returns repeated NULLs instead of NULL when …
buraksenn Apr 11, 2026
16e578d
Unify cast handling by removing `CastColumnExpr` branches in pruning …
kosiew Apr 12, 2026
bb1c8e6
remove as_any from TableProvider, SchemaProvider, CatalogProvider, an…
timsaucer Apr 12, 2026
d68373e
fix: grouping with alias (#21438)
timsaucer Apr 12, 2026
c253bfb
feat: Add pluggable StatisticsRegistry for operator-level statistics …
asolimando Apr 13, 2026
29c5dd5
[datafusion-spark] Add Spark-compatible ceil function (#20593)
shivbhatia10 Apr 13, 2026
98d280f
sql: render PostgreSQL array literals as ARRAY[...] in unparser (#21513)
xiedeyantu Apr 13, 2026
70e7730
fix(spark): mod/pmod returns NULL instead of NaN for float division b…
buraksenn Apr 13, 2026
62b8ec4
feat: Add Hash trait to Aggregate enums (#21569)
rluvaton Apr 13, 2026
d3cedb2
port 52.5.0 changelog to main (#21553)
alamb Apr 13, 2026
0143dfe
physical_optimizer: preserve_file_partitions when num file groups < t…
jayshrivastava Apr 13, 2026
cda7b5c
EliminateOuterJoin with Like, IsTrue, IsFalse, IsNotUnknown (#21549)
SubhamSinghal Apr 14, 2026
f1c643a
fix: LazyMemoryExec should produce independent streams per execute() …
viirya Apr 14, 2026
e0fd16e
feat(substrait): support Placeholder <-> DynamicParameter in Substrai…
bvolpato Apr 14, 2026
41dd942
Add `arrow_field(expr)` scalar UDF (#21389)
adriangb Apr 14, 2026
a13c23d
Remove CastColumnExpr and custom_file_casts example; unify on field-a…
kosiew Apr 14, 2026
9dab336
feat: add `with_metadata` scalar UDF to attach Arrow field metadata (…
adriangb Apr 14, 2026
90521e8
chore(deps): bump hashbrown from 0.16.1 to 0.17.0 (#21611)
dependabot[bot] Apr 14, 2026
6db3899
chore(deps): bump ctor from 0.8.0 to 0.10.0 (#21612)
dependabot[bot] Apr 14, 2026
776b723
Rewrite FileStream in terms of Morsel API (#21342)
alamb Apr 14, 2026
961c5fc
perf: Optimize NULL handling in `StringViewArrayBuilder` (#21538)
neilconway Apr 14, 2026
3763ad4
feat: Additional Canonical Extension Types (#21291)
tschwarzinger Apr 14, 2026
a1b536e
Consolidate special case `regexp_match` logic (#21486)
alamb Apr 14, 2026
4d069f9
Reorder `cargo publish` commands by dependency (#21552)
alamb Apr 14, 2026
607f3d2
chore(deps): bump taiki-e/install-action from 2.74.0 to 2.75.10 (#21605)
dependabot[bot] Apr 14, 2026
0073ff2
chore(deps): update jinja2 requirement from <4,>=3.1 to >=3.1.6,<4 in…
dependabot[bot] Apr 14, 2026
e6b32fe
chore(deps): bump the all-other-cargo-deps group across 1 directory w…
dependabot[bot] Apr 14, 2026
2818abb
bench: first_last remove noisy benchmarks, add update_batch (#21487)
theirix Apr 14, 2026
f9239a1
feat: Add memory-limited execution for NestedLoopJoinExec (#21448)
viirya Apr 14, 2026
ccbcded
chore: Fix `typo` problems (#21495)
erenavsarogullari Apr 14, 2026
cc05c3b
chore(deps): update pydata-sphinx-theme requirement from <1,>=0.16 to…
dependabot[bot] Apr 14, 2026
dc973cc
chore(deps-dev): bump follow-redirects from 1.15.6 to 1.16.0 in /data…
dependabot[bot] Apr 14, 2026
244f891
feat(stats): cap NDV at row count in statistics estimation (#21081)
asolimando Apr 15, 2026
26c6121
feat: support `array_compact` builtin function (#21522)
comphead Apr 15, 2026
d0692b8
bench: Scale sort benchmarks to 1M rows to exercise merge path (#21630)
mbutrovich Apr 15, 2026
7b2f284
fix: json scan performance on local files (#21478)
ariel-miculas Apr 15, 2026
edf8ad3
fix(benchmarks): correct TPC-H benchmark SQL (#21615)
kumarUjjawal Apr 15, 2026
f99ba69
Add release management page to the documentation (#21001)
alamb Apr 15, 2026
5c653be
Port filter_pushdown.rs async tests to sqllogictest (#21620)
adriangb Apr 15, 2026
7bf39b5
chore: fix cargo audit and dependencies check on main (#21655)
alamb Apr 15, 2026
240fbdb
fix: suppress nondeterministic metrics in agg_dyn_e2e sqllogictest (#…
mbutrovich Apr 15, 2026
dc6142e
Remove `as_any` on the `PhysicalExpr` trait (#21573)
timsaucer Apr 15, 2026
936db37
Perf: Window topn optimisation (#21479)
SubhamSinghal Apr 16, 2026
cc4717a
fix: Fix compilation error on `main` (#21664)
2010YOUY01 Apr 16, 2026
fc088b7
chore(deps): update setuptools requirement from <83,>=82 to >=82.0.1,…
dependabot[bot] Apr 16, 2026
8dedd12
chore(deps): update maturin requirement from <2,>=1.11 to >=1.13.1,<2…
dependabot[bot] Apr 16, 2026
dbe3395
docs: Update `map_extract` examples (#21360)
nuno-faria Apr 16, 2026
9eefb7c
fix: `median` retract logic for sliding window frames (#21300)
lyne7-sc Apr 16, 2026
bd2af68
Spark make_valid_utf8 function implementation (#20633)
kazantsev-maksim Apr 16, 2026
a0dbbab
fix: Fix Spark `slice` function `Null` type to `GenericListArray` cas…
erenavsarogullari Apr 16, 2026
7bfa3fb
chore(deps): update tokio from 1.51 to 1.52 (#21670)
ahmed-mez Apr 16, 2026
8b47f45
fix: Remove nested async block causing Stacked Borrows violation in P…
mbutrovich Apr 16, 2026
4c7bb08
Use ListArray nullability instead of offsets for `array_element`, `ar…
tabac Apr 16, 2026
3b5008a
fix: impl `handle_child_pushdown_result` for `SortExec` (#21527)
haohuaijin Apr 16, 2026
9873357
chore: breakdown `array.slt` into smaller files (#21658)
comphead Apr 16, 2026
2ef0217
fix: SortMergeJoin full outer join incorrectly matches rows when filt…
mbutrovich Apr 16, 2026
8777251
chore: Add more tests with `GROUP BY` to test spark `collect_set` (#2…
comphead Apr 16, 2026
44625fb
docs: add April 2026 readings and meetup links (#21644)
alamb Apr 16, 2026
7731130
feat: add a config to disable subquery_sort_elimination (#21614)
haohuaijin Apr 16, 2026
93ae1b8
fix: try again to fix Miri in ParquetOpener (#21680)
mbutrovich Apr 16, 2026
ef9a80c
Add strategy-focused InList benchmarks (#21648)
geoffreyclaude Apr 16, 2026
1068686
perf: add fast path for uniform fill values in `array_resize` (#20617)
lyne7-sc Apr 16, 2026
1f0faf9
perf : Optimize count distinct using bitmaps instead of hashsets for …
coderfender Apr 16, 2026
4b8c1d9
Fix massive spill files for StringView/BinaryView columns II (#21633)
adriangb Apr 16, 2026
d1800db
fix: `optimize_projections` failure after mark joins created by `EXIS…
buraksenn Apr 17, 2026
4e015db
fix: import from `datafusion_expr` in `make_valid_utf8` (#21687)
hcrosse Apr 17, 2026
882be98
chore: Backport 53.1.0 changelog (#21686)
comphead Apr 17, 2026
8229509
refactor: Introduce SpillState enum for memory-limited NLJ execution …
viirya Apr 17, 2026
769e214
Support Date32/Date64 in unwrap_cast optimization (#21665)
Dandandan Apr 17, 2026
97172e2
perf: Optimize `left`, `right` to reduce copying (#21442)
neilconway Apr 17, 2026
fab3b71
feat[expr-common]: add REE arithmetic coercion for numeric and decima…
asubiotto Apr 17, 2026
5a427cb
perf: Optimize `substr` for Utf8, LargeUtf8 (#21366)
neilconway Apr 17, 2026
e5966b5
fix: linearized operands in physical binaryexpr protobuf to avoid rec…
haohuaijin Apr 17, 2026
2f0ca3d
feat: extend single ndv optimization to non-arithmetic supporting typ…
buraksenn Apr 17, 2026
4a5d130
fix: remove unnecessary `as_any()` to fix compilation error (#21693)
Jefffrey Apr 17, 2026
6cd7e83
feat: extend interval analysis support for temporal types (#21520)
buraksenn Apr 17, 2026
03b390d
Remove trait function `as_any` from datafusion-datasource (#21576)
timsaucer Apr 17, 2026
afc0784
feat: add sort_pushdown_inexact benchmark for RG reorder (#21674)
zhuqi-lucas Apr 17, 2026
29f1acd
feat: change approx percentile/median UDFs to return floats (#21074)
theirix Apr 18, 2026
8a650f5
Make `test_display_pg_json` pass regardless of build setup and depend…
AdamGS Apr 18, 2026
b75df6f
fix: Prevent CLI crash on wide tables (#21721)
Geethapranay1 Apr 18, 2026
1fbbba5
feat: support '>', '<', '>=', '<=', '<>' in all operator (#21416)
buraksenn Apr 18, 2026
6aa5a7e
refactor: Share left-side spill file across partitions on OOM fallbac…
viirya Apr 19, 2026
d8c9797
Spark is_valid_utf8 function implementation (#21627)
kazantsev-maksim Apr 19, 2026
43d32a8
chore: use bench array helpers from Arrow bench_util (#21544)
theirix Apr 19, 2026
fd882fb
feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys (…
xiedeyantu Apr 19, 2026
7e1a710
fix(unparser): make `BigQueryDialect` more robust (#21296)
sgrebnov Apr 19, 2026
9c0edcc
chore: add count distinct group benchmarks (#21575)
coderfender Apr 19, 2026
90a8117
feat: Add support for `LEFT JOIN LATERAL` (#21352)
neilconway Apr 19, 2026
8614308
perf: Optimize logical optimizer's `OptimizeProjections` pass (#21726)
neilconway Apr 19, 2026
935382f
perf: Optimize `DFSchema::qualified_name` (#21722)
neilconway Apr 19, 2026
675881d
fix: insert placeholder type inference showing wrong type when there …
buraksenn Apr 19, 2026
03ca0aa
perf: Tweak vec capacity in `project_statistics` (#21734)
neilconway Apr 19, 2026
3aaf393
minor: More comments to `read_spill_as_stream` (#21713)
2010YOUY01 Apr 20, 2026
466c3ea
Dynamic work scheduling in FileStream (#21351)
alamb Apr 20, 2026
a311d14
chore: Update Release instructions (#21705)
comphead Apr 20, 2026
7acbe03
test: add tests for spill file sizes to verify View GC (#21750)
RatulDawar Apr 20, 2026
c470bb1
chore: backport version from `branch-53`, update some dependencies (#…
comphead Apr 21, 2026
e524f49
fix: array_concat widens container variant for mixed List/LargeList i…
hcrosse Apr 21, 2026
9b5e43e
feat: Expose used `MemoryPool` details in `ResourcesExhausted` error …
erenavsarogullari Apr 21, 2026
526f0cb
perf: Reduce `Box` and `Arc` allocation churn during tree rewriting (…
neilconway Apr 21, 2026
a737c27
feat: estimate cardinality for semi and anti-joins using distinct cou…
buraksenn Apr 21, 2026
5baa6ef
chore(deps): bump astral-sh/setup-uv from 8.0.0 to 8.1.0 (#21759)
dependabot[bot] Apr 21, 2026
9a851d6
fix: Fix local `datafusion-cli` test failure (#21761)
2010YOUY01 Apr 21, 2026
c641262
chore(deps): bump aws-config from 1.8.15 to 1.8.16 in the all-other-c…
dependabot[bot] Apr 21, 2026
af67cdd
chore(deps): bump github/codeql-action from 4.35.1 to 4.35.2 (#21758)
dependabot[bot] Apr 21, 2026
9a1ed57
chore(deps): bump taiki-e/install-action from 2.75.10 to 2.75.18 (#21…
dependabot[bot] Apr 21, 2026
cfafce4
chore: add `array_remove_*` NULL handling changes to `Upgrade Guide` …
comphead Apr 21, 2026
5f8b131
perf: Implement groups accumulator count distinct primitive types (#2…
coderfender Apr 22, 2026
192ceb6
Snowflake Unparser dialect and UNNEST support (#21593)
yonatan-sevenai Apr 22, 2026
e5d9145
perf: Optimize approx count distinct using bitmaps instead of HLL for…
coderfender Apr 22, 2026
e7f7fa9
fix: Validate spill read schema (#21738)
2010YOUY01 Apr 22, 2026
5d508d3
Skip files outside partition structure in hive-partitioned listing ta…
zhuqi-lucas Apr 22, 2026
64619a6
fix: improve sort pushdown benchmark data and add DESC LIMIT queries …
zhuqi-lucas Apr 22, 2026
8a45d02
feat: support `ListView` and `LargeListView` in `ScalarValue` (#21669)
Jefffrey Apr 22, 2026
8875956
Handle canceled partitioned hash join dynamic filters lazily (#21666)
adriangb Apr 22, 2026
ff844be
Improve ergonomics for ExecutionPlanMetricsSet and MetricsSet (#21762)
gabotechs Apr 22, 2026
4bff17e
[Minor]: unify ANY/ALL planning and align ANY NULL semantics with PG …
buraksenn Apr 22, 2026
4fbdfd0
[Minor]: fix security audit because of rustls-webpki version (#21785)
buraksenn Apr 22, 2026
83c2c01
fix: rebind RecursiveQueryExec batches to the declared output schema …
adriangb Apr 22, 2026
766dff1
chore: Rename concat-specific string builders, make pub(crate) (#21695)
neilconway Apr 22, 2026
eca2e00
refactor: Simplify NLJ re-scans with `ReplayableStreamSource` (#21742)
2010YOUY01 Apr 23, 2026
73ebfcc
docs: fix some comments on query_planning example (#21783)
jotare Apr 23, 2026
82abcbd
ci: permit stale workflow to delete cache (#21772)
Jefffrey Apr 23, 2026
643db7a
feat: add cosine_distance scalar function (#21542)
crm26 Apr 23, 2026
95157ef
Unparser drops ORDER BY alias when flattening Projection through Subq…
yonatan-sevenai Apr 23, 2026
9d73d09
chore: re-enable `add_months` overflow test (#21774)
Jefffrey Apr 23, 2026
067ba4b
chore: add aggregation test for listview types (#21776)
Jefffrey Apr 23, 2026
fd093fb
chore: re-enable `array_union` nested null array edge case test (#21773)
Jefffrey Apr 24, 2026
14c18ec
fix: Enable `arrow-ipc/zstd` in `datasource-arrow` to make `test_spil…
AdamGS Apr 24, 2026
cc67c13
Fix: allow coercion from Binary and LargeBinary into BinaryView (#21800)
bert-beyondloops Apr 24, 2026
1ea328d
chore: leave specialised bench helpers (#21810)
theirix Apr 24, 2026
85e75e2
Add quote style and trimming to csv writier (#20813)
xanderbailey Apr 24, 2026
7d5ddca
perf: Optimize `lower`, `upper` for sliced arrays (#21814)
neilconway Apr 24, 2026
794f30e
perf: Add bulk NULL-aware string builders, use in `lower` and `upper`…
neilconway Apr 24, 2026
89e14f1
chore(deps): bump picomatch from 2.3.1 to 2.3.2 in /datafusion/wasmte…
dependabot[bot] Apr 25, 2026
aab4263
perf(substr_index): speed up scalar and Utf8View (#21754)
kumarUjjawal Apr 25, 2026
6f1040b
Fix PushdownSort dropping LIMIT when eliminating SortExec (#21744)
sgrebnov Apr 25, 2026
65f337d
perf: Use bulk-NULL builder in `uuid` (#21845)
neilconway Apr 25, 2026
7fa6e21
Skip map_expressions rebuild for Extension nodes with empty expressio…
zhuqi-lucas Apr 27, 2026
1897c28
chore: use Arc::unwrap_or_clone in more places (#21823)
Dandandan Apr 27, 2026
0ebecd5
build: explicitly set `publish = false` for internal crates (#21869)
rluvaton Apr 27, 2026
a2c0c8a
Refactor InListExpr into static-filter modules (#21649)
geoffreyclaude Apr 27, 2026
1dbf264
docs: fix typos in documentation (#21875)
jx2lee Apr 27, 2026
a28c099
chore: bump API limit for stale workflow (#21867)
Jefffrey Apr 27, 2026
4e71785
perf: Use bulk-NULL string builder in `initcap` (#21863)
neilconway Apr 27, 2026
d193962
fix: Do not highlight the CLI hint directly (#21858)
nuno-faria Apr 27, 2026
b11b99b
chore: bump `sha` & `md-5` to `0.11.0` (#21840)
Jefffrey Apr 27, 2026
68061a5
docs: refresh CLI usage output in the user guide (#21874)
jx2lee Apr 27, 2026
bff0ffb
docs: clarify ExecutionProps and TaskContext docs (#21872)
alamb Apr 27, 2026
2676297
chore: add internal markdown link check (#21831)
Geethapranay1 Apr 27, 2026
62ad66b
perf: Use bulk-NULL builder in `chr` (#21847)
neilconway Apr 27, 2026
40b209e
feat: remove `__unnest_placeholder` from struct unnest projection (#…
akoshchiy Apr 27, 2026
ca1d39d
perf: implement convert_to_state for SparkAvg (#21548)
azhangd Apr 27, 2026
1bb588e
perf: Implement physical execution of uncorrelated scalar subqueries …
neilconway Apr 27, 2026
95ef332
perf: optimise `first_value`, `last_value` aggregate function (#21383)
theirix Apr 27, 2026
af7904f
feat : ABI upgrade from abi_stabby to stabby since abi_stable is no l…
coderfender Apr 27, 2026
f802ed1
Add protobuf serialization/deserialization support for `EmptyTable` s…
OlegWock Apr 27, 2026
310dd5d
Support Dictionary Arrays in MIN/MAX Aggregates (#21315)
kosiew Apr 28, 2026
6a09260
Fix some GH action permission issues identified by CodeQL (#21838)
Jefffrey Apr 28, 2026
54a5515
perf(spark): use 256-entry byte-pair table in hex encoding (#21836)
Scolliq Apr 28, 2026
bbf67d9
Add lambda support and array_transform udf (#21679)
gstvg Apr 28, 2026
c2c0773
feat(unparser): Keep inner join `Filter → TableScan` predicates to `W…
sgrebnov Apr 28, 2026
22bb4e6
Add support for nested types to nullif. (#21764)
tabac Apr 28, 2026
ec92925
perf: Optimize `substr_index` to use bulk-NULL string builder (#21877)
neilconway Apr 28, 2026
686d617
Update documentation for PhysicalExpr::evaluate_bounds (#21879)
alamb Apr 28, 2026
2a2b5b0
chore(deps): bump taiki-e/install-action from 2.75.18 to 2.75.23 (#21…
dependabot[bot] Apr 28, 2026
288a5f1
chore(deps): update pydata-sphinx-theme requirement from <1,>=0.17.0 …
dependabot[bot] Apr 28, 2026
53bd344
chore(deps): bump libloading from 0.8.9 to 0.9.0 (#21890)
dependabot[bot] Apr 28, 2026
4876cdc
refactor `array_remove` benchmarks & add nested benches (#21834)
Jefffrey Apr 28, 2026
720aaff
perf: Use bulk-NULL builder in `replace` (#21849)
neilconway Apr 28, 2026
8f033e4
feat: minor lambda perf improvements (#21896)
comphead Apr 28, 2026
b9cf885
Update `astral-tokio-tar` to appease cargo_audit (#21902)
alamb Apr 29, 2026
e8a93bb
feat: automatically cast `ListView` to `List` for UDFs (#21855)
Jefffrey Apr 29, 2026
61fe692
Remove unnecessary Mutex in SharedMemoryReservation (#21899)
gabotechs Apr 29, 2026
73ca6a5
ci: add breaking change detector (#21499)
rluvaton Apr 29, 2026
3aefba7
fix: fix elapsed_compute metric in ParquetSink to report encoding tim…
fred1268 Apr 29, 2026
66980e2
Fix GH action permissions in `rust.yml` and `docs.yaml` workflows (#2…
Jefffrey Apr 29, 2026
f0c5306
docs(optimizer): add generated optimizer rules reference (#21824)
kumarUjjawal Apr 29, 2026
5fda216
chore: fix `iff` typos (#21904)
comphead Apr 29, 2026
2b95cde
Add SQL based benchmarking harness, port tpch to use framework (#21707)
Omega359 Apr 29, 2026
985f0a4
Deduplicate InList primitive static filters (#21932)
geoffreyclaude Apr 29, 2026
72fe20b
Fix nesting of permissions block in docs workflow (#21930)
Jefffrey Apr 29, 2026
42cd2fa
dependencies check are now required to merge ci (#21940)
blaginin Apr 29, 2026
0bb17bc
build: allow posting comments on PRs made from forks and fix missing …
rluvaton Apr 30, 2026
0144570
perf: Add `BulkNullStringArrayBuilder` trait, use in `repeat` (#21854)
neilconway Apr 30, 2026
1364286
fix: grouping separator for float and decimal (#20268)
Druva-D Apr 30, 2026
3deeadd
perf: strength reduce hash partition modulo (up to 1.16x faster) (#21…
Dandandan Apr 30, 2026
040c21e
fix: Fix `.gitignore` in `benchmarks/` (#21954)
2010YOUY01 Apr 30, 2026
d09a919
Use shared statistics merge for union stats (#21430)
kumarUjjawal Apr 30, 2026
702f479
Add ClickBench URL pushdown benchmark (#21945)
xudong963 Apr 30, 2026
e18b1cf
add any_match higher-order function (#21903)
LiaCastaneda Apr 30, 2026
c2824b5
test(sqllogictest): stabilize parquet output_rows_skew with WITH ORDE…
RatulDawar Apr 30, 2026
d648982
Skip unnecessary plan rebuild in adjust_input_keys_ordering for non-j…
zhuqi-lucas Apr 30, 2026
9bbc28b
Adding Use of arrow's has_true() / has_false() (#21806)
raushanprabhakar1 Apr 30, 2026
f3cebc5
feat[expr-common]: support regex and LIKE coercion on REE and Dict va…
asubiotto May 1, 2026
e514a01
perf: optimize retract_batch for `median` and `percentile_cont` (#21894)
lyne7-sc May 1, 2026
37dbdaf
feat[expr-common]: support REE in coalesce (#21919)
asubiotto May 1, 2026
ea0928c
feat: support binary arguments for StringConcat operator (#21883)
theirix May 1, 2026
d59bc72
fix(proto): correctly serialize FilterExec empty projection (#21885)
Adez017 May 1, 2026
bb86364
fix: Make conversion from FileDecryptionProperties to ConfigFileDecry…
adamreeve May 1, 2026
948cd09
proto: serialize and dedupe dynamic filters v2 (#21807)
jayshrivastava May 1, 2026
ba038e9
chore: fix `datafusion-spark` substring (#21963)
comphead May 1, 2026
c3aef20
proto: serialize dynamic filters on Sort, Aggregate, HashJoin
jayshrivastava Apr 29, 2026
71 changes: 14 additions & 57 deletions .asf.yaml
@@ -51,12 +51,20 @@ github:
main:
required_pull_request_reviews:
required_approving_review_count: 1
required_status_checks:
contexts:
- "Check License Header"
- "Use prettier to check formatting of documents"
- "Check Markdown Links"
- "Validate required_status_checks in .asf.yaml"
- "Spell Check with Typos"
- "Circular Dependency Check"
- "Detect Unused Dependencies"
# needs to be updated as part of the release process
# .asf.yaml doesn't support wildcard branch protection rules, only exact branch names
# https://github.com/apache/infrastructure-asfyaml?tab=readme-ov-file#branch-protection
# Keeping set of protected branches for future releases
# Meanwhile creating a prerelease script that will update the branch protection names
# automatically. Keep track on it https://github.com/apache/datafusion/issues/17134
# these branch protection blocks are autogenerated during the release process, which is described in
# https://github.com/apache/datafusion/tree/main/dev/release#2-add-a-protection-to-release-candidate-branch
branch-50:
required_pull_request_reviews:
required_approving_review_count: 1
@@ -66,66 +74,15 @@ github:
branch-52:
required_pull_request_reviews:
required_approving_review_count: 1
branch-53:
required_pull_request_reviews:
required_approving_review_count: 1
branch-54:
required_pull_request_reviews:
required_approving_review_count: 1
branch-55:
required_pull_request_reviews:
required_approving_review_count: 1
branch-56:
required_pull_request_reviews:
required_approving_review_count: 1
branch-57:
required_pull_request_reviews:
required_approving_review_count: 1
branch-58:
required_pull_request_reviews:
required_approving_review_count: 1
branch-59:
required_pull_request_reviews:
required_approving_review_count: 1
branch-60:
required_pull_request_reviews:
required_approving_review_count: 1
branch-61:
required_pull_request_reviews:
required_approving_review_count: 1
branch-62:
required_pull_request_reviews:
required_approving_review_count: 1
branch-63:
required_pull_request_reviews:
required_approving_review_count: 1
branch-64:
required_pull_request_reviews:
required_approving_review_count: 1
branch-65:
required_pull_request_reviews:
required_approving_review_count: 1
branch-66:
required_pull_request_reviews:
required_approving_review_count: 1
branch-67:
required_pull_request_reviews:
required_approving_review_count: 1
branch-68:
required_pull_request_reviews:
required_approving_review_count: 1
branch-69:
required_pull_request_reviews:
required_approving_review_count: 1
branch-70:
required_pull_request_reviews:
required_approving_review_count: 1
pull_requests:
# enable updating head branches of pull requests
allow_update_branch: true
allow_auto_merge: true
# auto-delete head branches after being merged
del_branch_on_merge: true

# publishes the content of the `asf-site` branch to
# https://datafusion.apache.org/
publish:
whoami: asf-site

14 changes: 14 additions & 0 deletions .github/actions/setup-builder/action.yaml
@@ -46,3 +46,17 @@ runs:
# https://github.com/actions/checkout/issues/766
shell: bash
run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
- name: Remove unnecessary preinstalled software
shell: bash
run: |
echo "Disk space before cleanup:"
df -h
apt-get clean
# remove tool cache: about 8.5GB (github has host /opt/hostedtoolcache mounted as /__t)
rm -rf /__t/* || true
# remove Haskell runtime: about 6.3GB (host /usr/local/.ghcup)
rm -rf /host/usr/local/.ghcup || true
# remove Android library: about 7.8GB (host /usr/local/lib/android)
rm -rf /host/usr/local/lib/android || true
echo "Disk space after cleanup:"
df -h
20 changes: 19 additions & 1 deletion .github/dependabot.yml
@@ -23,6 +23,7 @@ updates:
interval: weekly
target-branch: main
labels: [auto-dependencies]
open-pull-requests-limit: 15
ignore:
# major version bumps of arrow* and parquet are handled manually
- dependency-name: "arrow*"
@@ -44,10 +45,27 @@ updates:
patterns:
- "prost*"
- "pbjson*"

# Catch-all: group only minor/patch into a single PR,
# excluding deps we want always separate (and excluding arrow/parquet which have their own group)
all-other-cargo-deps:
applies-to: version-updates
patterns:
- "*"
exclude-patterns:
- "arrow*"
- "parquet"
- "object_store"
- "sqlparser"
- "prost*"
- "pbjson*"
update-types:
- "minor"
- "patch"
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "daily"
interval: "weekly"
open-pull-requests-limit: 10
labels: [auto-dependencies]
- package-ecosystem: "pip"
11 changes: 8 additions & 3 deletions .github/workflows/audit.yml
@@ -33,17 +33,22 @@ on:
paths:
- "**/Cargo.toml"
- "**/Cargo.lock"

merge_group:

permissions:
contents: read

jobs:
security_audit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install cargo-audit
uses: taiki-e/install-action@f535147c22906d77695e11cb199e764aa610a4fc # v2.62.46
uses: taiki-e/install-action@481c34c1cf3a84c68b5e46f4eccfc82af798415a # v2.75.23
with:
tool: cargo-audit
- name: Run audit check
# Note: you can ignore specific RUSTSEC issues using the `--ignore` flag, for example:
# run: cargo audit --ignore RUSTSEC-2026-0001
run: cargo audit
142 changes: 142 additions & 0 deletions .github/workflows/breaking_changes_detector.yml
@@ -0,0 +1,142 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Detect semver-incompatible (breaking) API changes in crates modified by a PR.
#
# Only public workspace crates that have file changes are checked.
# Internal crates (benchmarks, test-utils, sqllogictest, doc) are excluded.
#
# This workflow only runs cargo-semver-checks and uploads the result as an
# artifact. The actual PR comment is posted by a companion workflow
# (`breaking_changes_detector_comment.yml`) that picks up the artifact via
# `workflow_run`.
#
# Why split it?
# "The GITHUB_TOKEN has read-only permissions in pull requests from forked
# repositories."
# https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request
# A read-only token cannot post comments, so on fork PRs the previous
# single-workflow design failed with HTTP 403. We can't simply broaden the
# trigger here either: cargo-semver-checks compiles PR code (build.rs, proc
# macros), so granting this job a write token would expose it to any code
# in the PR. And ASF infra policy independently forbids `pull_request_target`
# for any workflow that exposes GITHUB_TOKEN
# (https://infra.apache.org/github-actions-policy.html). The companion
# `workflow_run` workflow runs in the base-repo context with write access
# and never executes PR code.

name: "Detect breaking changes"

on:
pull_request:
branches:
- main

permissions:
contents: read

jobs:
check-semver:
name: Check semver
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0

# For fork PRs, `origin` points to the fork, not the upstream repo.
# Explicitly fetch the base branch from the upstream repo so we have
# a valid baseline ref for both diff and semver-checks.
- name: Fetch base branch
env:
BASE_REF: ${{ github.base_ref }}
REPO: ${{ github.repository }}
run: git fetch "https://github.com/${REPO}.git" "${BASE_REF}:refs/remotes/origin/${BASE_REF}"

- name: Determine changed crates
id: changed_crates
env:
BASE_REF: ${{ github.base_ref }}
run: |
PACKAGES=$(ci/scripts/changed_crates.sh changed-crates "origin/${BASE_REF}")
echo "packages=$PACKAGES" >> "$GITHUB_OUTPUT"
echo "Changed crates: $PACKAGES"

# `datafusion-substrait` (and crates that depend on it via sqllogictest)
# have a build script that calls protoc, which is not preinstalled on
# ubuntu-latest runners.
- name: Install Protobuf Compiler
if: steps.changed_crates.outputs.packages != ''
run: |
sudo apt-get update
sudo apt-get install -y protobuf-compiler

- name: Install cargo-semver-checks
if: steps.changed_crates.outputs.packages != ''
uses: taiki-e/install-action@94cb46f8d6e437890146ffbd78a778b78e623fb2 # v2.74.0
with:
tool: cargo-semver-checks

- name: Run cargo-semver-checks
id: check_semver
if: steps.changed_crates.outputs.packages != ''
env:
BASE_REF: ${{ github.base_ref }}
PACKAGES: ${{ steps.changed_crates.outputs.packages }}
run: |
set +e
# `tee` lets cargo's output stream live into the Actions log
# while we also keep a copy for the PR comment.
ci/scripts/changed_crates.sh semver-check "origin/${BASE_REF}" $PACKAGES \
2>&1 | tee /tmp/semver-output.txt
EXIT_CODE=${PIPESTATUS[0]}
# Pass the result through an output instead of failing the job:
# a detected breaking change should surface as a PR comment, not a
# red check, so PR authors aren't confused by an intentional break.
if [ "$EXIT_CODE" -eq 0 ]; then
echo "result=success" >> "$GITHUB_OUTPUT"
else
echo "result=failure" >> "$GITHUB_OUTPUT"
fi

# Stage the data the companion comment workflow needs into a single
# directory. We default the result to "success" so the comment
# workflow clears any stale comment when the check step is skipped
# (e.g. no published crates changed).
- name: Stage artifact for comment workflow
if: always()
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
CHECK_RESULT: ${{ steps.check_semver.outputs.result || 'success' }}
run: |
mkdir -p semver-artifact
echo "$PR_NUMBER" > semver-artifact/pr_number
echo "$CHECK_RESULT" > semver-artifact/result
if [ -f /tmp/semver-output.txt ]; then
sed 's/\x1b\[[0-9;]*m//g' /tmp/semver-output.txt > semver-artifact/logs
else
: > semver-artifact/logs
fi

- name: Upload artifact
if: always()
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: semver-check-result
path: semver-artifact/
retention-days: 1
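
The "Run cargo-semver-checks" and "Stage artifact for comment workflow" steps above rely on two small bash idioms: reading `PIPESTATUS[0]` after piping through `tee`, and stripping ANSI color codes with `sed` before the log is reused as comment text. The following is a minimal standalone sketch of those idioms, not the workflow's actual script. The helper names `run_and_record` and `strip_ansi`, the temporary directory, and the stand-in command are all hypothetical, and GNU `sed` is assumed (as on `ubuntu-latest`).

```shell
#!/usr/bin/env bash
# Sketch only: run_and_record and strip_ansi are hypothetical helper names.
set +e  # as in the workflow step: a failing command must not abort the script

# Run a command, stream its output to the terminal and to a log file,
# then record success/failure based on the command's own exit status.
run_and_record() {
  local log_file="$1" result_file="$2"
  shift 2
  "$@" 2>&1 | tee "$log_file"
  # PIPESTATUS[0] is the status of the first pipeline stage (the command);
  # $? would report tee's status instead.
  local exit_code=${PIPESTATUS[0]}
  if [ "$exit_code" -eq 0 ]; then
    echo "result=success" >> "$result_file"
  else
    echo "result=failure" >> "$result_file"
  fi
}

# Remove ANSI color (SGR) escape sequences so the log reads as plain text.
# Assumes GNU sed, which understands the \x1b escape.
strip_ansi() {
  sed 's/\x1b\[[0-9;]*m//g' "$1"
}

# Demo with a stand-in command that prints colored text and exits non-zero.
work_dir=$(mktemp -d)
run_and_record "$work_dir/raw.txt" "$work_dir/output" \
  bash -c 'printf "\033[31merror\033[0m: breaking change\n"; exit 1'
strip_ansi "$work_dir/raw.txt" > "$work_dir/logs"
cat "$work_dir/output" "$work_dir/logs"
```

Recording the outcome as a value rather than failing the step matches the intent stated in the workflow comments: a detected breaking change is meant to surface as a PR comment, not as a red check.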