
Serialize dynamic filters on execution plan nodes (HashJoin, Aggregate, Sort)#2

Closed
jayshrivastava wants to merge 1589 commits into main from
js/serialize-dynamic-filters-in-execution-plans-2

Conversation

@jayshrivastava
Owner

@jayshrivastava jayshrivastava commented Feb 20, 2026

Which issue does this PR close?

Informs: datafusion-contrib/datafusion-distributed#180
Follow up for: apache#20416

Rationale for this change

I'm interested in serializing a physical plan (post-physical optimizer) and executing it on a remote node. To do so, I need dynamic filters, and references/pointers to dynamic filters, to be preserved in the plan. Currently, nodes that produce filters, such as HashJoinExec, AggregateExec, and SortExec, do not serialize their dynamic filters.

This change updates the above nodes to serialize their dynamic filters, and adds tests for the scenario above.

What changes are included in this PR?

Proto schema (datafusion.proto)

Added PhysicalExprNode dynamic_filter field to:

  • HashJoinExecNode (tag 11)
  • AggregateExecNode (tag 13)
  • SortExecNode (tag 5)
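As a sketch, the proto additions might look like the following (field name and tags are taken from the list above; the surrounding fields and exact message layouts are elided/assumed):

```proto
// datafusion.proto (sketch; existing fields elided)
message HashJoinExecNode {
  // ... existing fields ...
  PhysicalExprNode dynamic_filter = 11;
}

message AggregateExecNode {
  // ... existing fields ...
  PhysicalExprNode dynamic_filter = 13;
}

message SortExecNode {
  // ... existing fields ...
  PhysicalExprNode dynamic_filter = 5;
}
```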
Plan node public API

Added with_dynamic_filter() and dynamic_filter() to HashJoinExec, AggregateExec, SortExec.

with_dynamic_filter() always:

  • validates that the filter is valid for the plan node's schema
  • resets any internal state related to the dynamic filter
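A minimal, self-contained sketch of that contract, using stand-in types rather than the real DataFusion structs (`SortExecSketch`, `DynamicFilter`, and the string-based `Schema` below are all hypothetical):

```rust
// Stand-in types -- not the actual DataFusion API -- illustrating the
// contract above: with_dynamic_filter() validates the filter against the
// node's schema and resets any filter-derived internal state.

#[derive(Clone, Debug)]
struct Schema { columns: Vec<String> }

#[derive(Clone, Debug)]
struct DynamicFilter { column: String }

#[derive(Debug)]
struct SortExecSketch {
    schema: Schema,
    dynamic_filter: Option<DynamicFilter>,
    // State computed from a previous filter; must be cleared on replacement.
    filter_state: Option<String>,
}

impl SortExecSketch {
    fn new(schema: Schema) -> Self {
        Self { schema, dynamic_filter: None, filter_state: None }
    }

    /// Returns the dynamic filter expression for this node, if set.
    fn dynamic_filter(&self) -> Option<&DynamicFilter> {
        self.dynamic_filter.as_ref()
    }

    /// Installs a dynamic filter, validating it against the node's schema
    /// and resetting any internal state related to the previous filter.
    fn with_dynamic_filter(mut self, filter: DynamicFilter) -> Result<Self, String> {
        if !self.schema.columns.contains(&filter.column) {
            return Err(format!("column {:?} not in node schema", filter.column));
        }
        self.filter_state = None; // reset internal state
        self.dynamic_filter = Some(filter);
        Ok(self)
    }
}
```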
Serde

Using the new plan node public APIs above

  • Each node's try_from_* serialization function now reads dynamic_filter()
    and serializes it via the proto converter
  • Each node's try_into_* deserialization function deserializes the field,
    downcasts to DynamicFilterPhysicalExpr, and sets it on the node
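The serde shape above can be sketched with simplified stand-ins (the trait, `DynamicFilterExpr`, and the string encoding below are hypothetical; the real code goes through the proto converter and `DynamicFilterPhysicalExpr`):

```rust
use std::any::Any;

// Stand-in for PhysicalExpr: encodable and downcastable via Any.
trait PhysicalExpr: Any {
    fn as_any(&self) -> &dyn Any;
    fn encode(&self) -> String; // stand-in for the proto converter
}

#[derive(Debug, PartialEq, Clone)]
struct DynamicFilterExpr { predicate: String }

impl PhysicalExpr for DynamicFilterExpr {
    fn as_any(&self) -> &dyn Any { self }
    fn encode(&self) -> String { format!("dynamic:{}", self.predicate) }
}

// try_from_*: read the node's dynamic_filter() accessor and encode it, if set.
fn serialize_dynamic_filter(filter: Option<&dyn PhysicalExpr>) -> Option<String> {
    filter.map(|f| f.encode())
}

// try_into_*: decode the field, then downcast to the concrete filter type
// before setting it back on the node.
fn deserialize_dynamic_filter(encoded: &str) -> Result<DynamicFilterExpr, String> {
    let decoded: Box<dyn PhysicalExpr> = match encoded.strip_prefix("dynamic:") {
        Some(p) => Box::new(DynamicFilterExpr { predicate: p.to_string() }),
        None => return Err("field did not decode to a dynamic filter".to_string()),
    };
    decoded
        .as_any()
        .downcast_ref::<DynamicFilterExpr>()
        .cloned()
        .ok_or_else(|| "expected DynamicFilterExpr".to_string())
}
```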

Are these changes tested?

  1. Added tests which create the plan below and perform round-trip serialization on it (one test each for HashJoinExec, AggregateExec, and SortExec).

     HashJoinExec ─── dynamic filter
          │               │
          ▼               │ (the optimizer pushes this filter down)
     FilterExec           │
          │               │
          ▼               ▼
     DataSourceExec ─── dynamic filter

  2. Added tests for with_dynamic_filter() and dynamic_filter() on HashJoinExec, AggregateExec, and SortExec.

@jayshrivastava jayshrivastava changed the title wip Serialize dynamic filters on execution plan nodes (HashJoin, Aggregate, Sort) Feb 23, 2026
@jayshrivastava jayshrivastava force-pushed the js/serialize-dynamic-filters-in-execution-plans-2 branch from ff17e8a to 00b2a63 Compare February 23, 2026 19:44
@jayshrivastava
Owner Author

Note for reviewers: I'm unsure if I should be using apply_expressions() or expressions() (see apache#20337) instead of with_dynamic_filter() and dynamic_filter()

Comment on lines +1039 to +1042
/// Returns the dynamic filter expression for this aggregate, if set.
pub fn dynamic_filter(&self) -> Option<&Arc<DynamicFilterPhysicalExpr>> {
self.dynamic_filter.as_ref().map(|df| &df.filter)
}

I think it would be cleaner to use apply_expressions (apache#20337), mainly because it's more generic: you can do basically anything with the PhysicalExprs inside a plan, including detecting dynamic filters, and you wouldn't need to know beforehand which nodes are producers and consumers -- any custom logic can live separately in the proto crate. It would also reduce the overhead for people who want to add a new ExecutionPlan that holds a DynamicFilterPhysicalExpr; with the current approach they'd have to remember to also add the manual dynamic_filter() call here. An apply_expressions implementation would be part of ExecutionPlan and not optional, so users could not forget to provide it for every node.
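A rough sketch of this suggestion, using made-up minimal types (`Expr`, `PlanNode`, and `HashJoinSketch` are hypothetical stand-ins, not the DataFusion traits): a single mandatory hook that exposes every expression lets serde code detect dynamic filters on any node without per-node accessors.

```rust
use std::any::Any;

trait Expr: Any {
    fn as_any(&self) -> &dyn Any;
}

struct DynamicFilter;
impl Expr for DynamicFilter { fn as_any(&self) -> &dyn Any { self } }

struct Literal;
impl Expr for Literal { fn as_any(&self) -> &dyn Any { self } }

trait PlanNode {
    // Mandatory hook: every node exposes all of its expressions, so callers
    // need no prior knowledge of which nodes produce dynamic filters.
    fn expressions(&self) -> Vec<&dyn Expr>;
}

struct HashJoinSketch { exprs: Vec<Box<dyn Expr>> }

impl PlanNode for HashJoinSketch {
    fn expressions(&self) -> Vec<&dyn Expr> {
        self.exprs.iter().map(|e| e.as_ref()).collect()
    }
}

// Generic detection: works for any PlanNode, no per-node dynamic_filter().
fn has_dynamic_filter(node: &dyn PlanNode) -> bool {
    node.expressions()
        .iter()
        .any(|e| e.as_any().downcast_ref::<DynamicFilter>().is_some())
}
```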

Comment on lines +1057 to +1060
pub fn with_dynamic_filter(
mut self,
filter: Arc<DynamicFilterPhysicalExpr>,
) -> Result<Self> {

I see we do something similar for every producer/consumer; a more generic way to modify the expressions would probably be to implement map_expressions on ExecutionPlan, as suggested here.

@jayshrivastava jayshrivastava force-pushed the js/dedupe-dynamic-filter-inner-state branch from c5d0e2f to fef4259 Compare February 26, 2026 18:48
@jayshrivastava jayshrivastava force-pushed the js/serialize-dynamic-filters-in-execution-plans-2 branch 2 times, most recently from 4889d13 to ed4c611 Compare February 26, 2026 18:53
jayshrivastava added a commit that referenced this pull request Feb 26, 2026
Fixups for the cherry-picked commits from PRs apache#19437, apache#20037, apache#20416,
and #2 to work with branch-52's partition-index APIs:

- Update remap_children callers to use instance method signature
- Adapt DynamicFilterUpdate::Global enum for new code paths
- Add missing partitioned_exprs/runtime_partition fields to new constructors
- Remove null_aware field (not on branch-52)
- Replace FilterExecBuilder with FilterExec::try_new
- Remove non-compiling tests that depend on upstream-only APIs
- Fix duplicate imports in roundtrip test file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gene-bordegaray pushed a commit to DataDog/datafusion that referenced this pull request Feb 26, 2026
Fixups for the cherry-picked commits from PRs apache#19437, apache#20037, apache#20416,
and jayshrivastava#2 to work with branch-52's partition-index APIs:

- Update remap_children callers to use instance method signature
- Adapt DynamicFilterUpdate::Global enum for new code paths
- Add missing partitioned_exprs/runtime_partition fields to new constructors
- Remove null_aware field (not on branch-52)
- Replace FilterExecBuilder with FilterExec::try_new
- Remove non-compiling tests that depend on upstream-only APIs
- Fix duplicate imports in roundtrip test file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jayshrivastava jayshrivastava force-pushed the js/dedupe-dynamic-filter-inner-state branch from cb23b01 to 18b0289 Compare March 19, 2026 15:04
theirix and others added 13 commits April 7, 2026 13:16
## Which issue does this PR close?

## Rationale for this change

Spin-off of apache#21383 to have a bench for `First_Value`, `Last_Value`
available before a PR with logic change.

## What changes are included in this PR?

- Add benchmark for `GroupsAccumulator`. It's pretty complicated to benchmark
aggregates with grouping, since many operations are stateful, so I
introduced an end-to-end evaluate bench (to actually exercise taking state) and
a convert_to_state bench (as in other benches)
- A bench for a simple `Accumulator`

## Are these changes tested?

- Manual bench run

## Are there any user-facing changes?

…he#21435)

Bumps the all-other-cargo-deps group with 4 updates:
[indexmap](https://github.com/indexmap-rs/indexmap),
[tokio](https://github.com/tokio-rs/tokio),
[libc](https://github.com/rust-lang/libc) and
[semver](https://github.com/dtolnay/semver).

Updates `indexmap` from 2.13.0 to 2.13.1
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/indexmap-rs/indexmap/blob/main/RELEASES.md">indexmap's
changelog</a>.</em></p>
<blockquote>
<h2>2.13.1 (2026-04-02)</h2>
<ul>
<li>Made some <code>Slice</code> methods <code>const</code>:
<ul>

<li><code>map::Slice::{first,last,split_at,split_at_checked,split_first,split_last}</code></li>

<li><code>set::Slice::{first,last,split_at,split_at_checked,split_first,split_last}</code></li>
</ul>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/indexmap-rs/indexmap/commit/0b2adfe27714f38d159794678d61d310ac521a1a"><code>0b2adfe</code></a>
Merge pull request <a
href="https://github.com/indexmap-rs/indexmap/issues/434">#434</a>
from cuviper/const-slice</li>
<li><a
href="https://github.com/indexmap-rs/indexmap/commit/afa3cafdc81b0b1168417ca042bb6b54496672a0"><code>afa3caf</code></a>
Release 2.13.1</li>
<li><a
href="https://github.com/indexmap-rs/indexmap/commit/906a7ced0af89814e97c5f780915848577e0e660"><code>906a7ce</code></a>
Make <code>Slice::{first,last,split_*}</code> methods
<code>const</code></li>
<li>See full diff in <a
href="https://github.com/indexmap-rs/indexmap/compare/2.13.0...2.13.1">compare
view</a></li>
</ul>
</details>
<br />

Updates `tokio` from 1.50.0 to 1.51.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tokio-rs/tokio/releases">tokio's
releases</a>.</em></p>
<blockquote>
<h2>Tokio v1.51.0</h2>
<h1>1.51.0 (April 3rd, 2026)</h1>
<h3>Added</h3>
<ul>
<li>net: implement <code>get_peer_cred</code> on Hurd (<a
href="https://github.com/tokio-rs/tokio/issues/7989">#7989</a>)</li>
<li>runtime: add <code>tokio::runtime::worker_index()</code> (<a
href="https://github.com/tokio-rs/tokio/issues/7921">#7921</a>)</li>
<li>runtime: add runtime name (<a
href="https://github.com/tokio-rs/tokio/issues/7924">#7924</a>)</li>
<li>runtime: stabilize <code>LocalRuntime</code> (<a
href="https://github.com/tokio-rs/tokio/issues/7557">#7557</a>)</li>
<li>wasm: add wasm32-wasip2 networking support (<a
href="https://github.com/tokio-rs/tokio/issues/7933">#7933</a>)</li>
</ul>
<h3>Changed</h3>
<ul>
<li>runtime: steal tasks from the LIFO slot (<a
href="https://github.com/tokio-rs/tokio/issues/7431">#7431</a>)</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>docs: do not show &quot;Available on non-loom only.&quot; doc label
(<a
href="https://github.com/tokio-rs/tokio/issues/7977">#7977</a>)</li>
<li>macros: improve overall macro hygiene (<a
href="https://github.com/tokio-rs/tokio/issues/7997">#7997</a>)</li>
<li>sync: fix <code>notify_waiters</code> priority in
<code>Notify</code> (<a
href="https://github.com/tokio-rs/tokio/issues/7996">#7996</a>)</li>
<li>sync: fix panic in <code>Chan::recv_many</code> when called with
non-empty vector on closed channel (<a
href="https://github.com/tokio-rs/tokio/issues/7991">#7991</a>)</li>
</ul>
<p><a
href="https://github.com/tokio-rs/tokio/issues/7431">#7431</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7431">tokio-rs/tokio#7431</a>
<a
href="https://github.com/tokio-rs/tokio/issues/7557">#7557</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7557">tokio-rs/tokio#7557</a>
<a
href="https://github.com/tokio-rs/tokio/issues/7921">#7921</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7921">tokio-rs/tokio#7921</a>
<a
href="https://github.com/tokio-rs/tokio/issues/7924">#7924</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7924">tokio-rs/tokio#7924</a>
<a
href="https://github.com/tokio-rs/tokio/issues/7933">#7933</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7933">tokio-rs/tokio#7933</a>
<a
href="https://github.com/tokio-rs/tokio/issues/7977">#7977</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7977">tokio-rs/tokio#7977</a>
<a
href="https://github.com/tokio-rs/tokio/issues/7989">#7989</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7989">tokio-rs/tokio#7989</a>
<a
href="https://github.com/tokio-rs/tokio/issues/7991">#7991</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7991">tokio-rs/tokio#7991</a>
<a
href="https://github.com/tokio-rs/tokio/issues/7996">#7996</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7996">tokio-rs/tokio#7996</a>
<a
href="https://github.com/tokio-rs/tokio/issues/7997">#7997</a>:
<a
href="https://github.com/tokio-rs/tokio/pull/7997">tokio-rs/tokio#7997</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/tokio-rs/tokio/commit/0af06b7bab12c58161b1d0ae79bdf4452305d42f"><code>0af06b7</code></a>
chore: prepare Tokio v1.51.0 (<a
href="https://github.com/tokio-rs/tokio/issues/8005">#8005</a>)</li>
<li><a
href="https://github.com/tokio-rs/tokio/commit/01a7f1dfabc93293743701074752ff0d8e787595"><code>01a7f1d</code></a>
chore: prepare tokio-macros v2.7.0 (<a
href="https://github.com/tokio-rs/tokio/issues/8004">#8004</a>)</li>
<li><a
href="https://github.com/tokio-rs/tokio/commit/eeb55c733ba9a83c51d08b1629dca6a5ec0f4b2b"><code>eeb55c7</code></a>
runtime: steal tasks from the LIFO slot (<a
href="https://github.com/tokio-rs/tokio/issues/7431">#7431</a>)</li>
<li><a
href="https://github.com/tokio-rs/tokio/commit/1fc450aefba4b05cdff9b7825ca5e39cccb3780e"><code>1fc450a</code></a>
runtime: stabilize <code>LocalRuntime</code> (<a
href="https://github.com/tokio-rs/tokio/issues/7557">#7557</a>)</li>
<li><a
href="https://github.com/tokio-rs/tokio/commit/324218f9bbdc26e4bb527d036613826824f3078b"><code>324218f</code></a>
Merge tag 'tokio-1.47.4' (<a
href="https://github.com/tokio-rs/tokio/issues/8003">#8003</a>)</li>
<li><a
href="https://github.com/tokio-rs/tokio/commit/aa65d0d0b8ea6eec80985b9d231390f137493071"><code>aa65d0d</code></a>
chore: prepare Tokio v1.47.4 (<a
href="https://github.com/tokio-rs/tokio/issues/8002">#8002</a>)</li>
<li><a
href="https://github.com/tokio-rs/tokio/commit/bf18ed452d6aae438e84ae008a01a74776abdc19"><code>bf18ed4</code></a>
sync: fix panic in <code>Chan::recv_many</code> when called with
non-empty vector on clo...</li>
<li><a
href="https://github.com/tokio-rs/tokio/commit/43134f1e5784993eb4fb3863933d74ac9e28f598"><code>43134f1</code></a>
wasm: add wasm32-wasip2 networking support (<a
href="https://github.com/tokio-rs/tokio/issues/7933">#7933</a>)</li>
<li><a
href="https://github.com/tokio-rs/tokio/commit/b4c3246d330379430937bdbb5e1b0c37282ae23e"><code>b4c3246</code></a>
macros: improve overall macro hygiene (<a
href="https://github.com/tokio-rs/tokio/issues/7997">#7997</a>)</li>
<li><a
href="https://github.com/tokio-rs/tokio/commit/7947fa4bd79d7345aa7e6b189fc1fbb6983a4351"><code>7947fa4</code></a>
rt: add runtime name (<a
href="https://github.com/tokio-rs/tokio/issues/7924">#7924</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/tokio-rs/tokio/compare/tokio-1.50.0...tokio-1.51.0">compare
view</a></li>
</ul>
</details>
<br />

Updates `libc` from 0.2.183 to 0.2.184
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/rust-lang/libc/releases">libc's
releases</a>.</em></p>
<blockquote>
<h2>0.2.184</h2>
<h3>MSRV</h3>
<p>This release increases the MSRV of <code>libc</code> to 1.65. With
this update, you can now always use the
<code>core::ffi::c_*</code> types with <code>libc</code> definitions,
since <code>libc</code> has been changed to reexport from
<code>core</code> rather than redefining them. (This <em>usually</em>
worked before but had edge cases.)
(<a
href="https://github.com/rust-lang/libc/pull/4972">#4972</a>)</p>
<h3>Added</h3>
<ul>
<li>BSD: Add <code>IP_MINTTL</code> to bsd (<a
href="https://github.com/rust-lang/libc/pull/5026">#5026</a>)</li>
<li>Cygwin: Add <code>TIOCM_DSR</code> (<a
href="https://github.com/rust-lang/libc/pull/5031">#5031</a>)</li>
<li>FreeBSD: Added <code>xfile</code> struct and file descriptor types
(<a
href="https://github.com/rust-lang/libc/pull/5002">#5002</a>)</li>
<li>Linux: Add CAN netlink bindings (<a
href="https://github.com/rust-lang/libc/pull/5011">#5011</a>)</li>
<li>Linux: Add <code>struct ethhdr</code> (<a
href="https://github.com/rust-lang/libc/pull/4239">#4239</a>)</li>
<li>Linux: Add <code>struct ifinfomsg</code> (<a
href="https://github.com/rust-lang/libc/pull/5012">#5012</a>)</li>
<li>Linux: Define <code>max_align_t</code> for riscv64 (<a
href="https://github.com/rust-lang/libc/pull/5029">#5029</a>)</li>
<li>NetBSD: Add missing <code>CLOCK_</code> constants (<a
href="https://github.com/rust-lang/libc/pull/5020">#5020</a>)</li>
<li>NuttX: Add <code>_SC_HOST_NAME_MAX</code> (<a
href="https://github.com/rust-lang/libc/pull/5004">#5004</a>)</li>
<li>VxWorks: Add <code>flock</code> and <code>F_*LCK</code> constants
(<a
href="https://github.com/rust-lang/libc/pull/4043">#4043</a>)</li>
<li>WASI: Add all <code>_SC_*</code> sysconf constants (<a
href="https://github.com/rust-lang/libc/pull/5023">#5023</a>)</li>
</ul>
<h3>Deprecated</h3>
<p>The remaining fixed-width integer aliases, <code>__uint128_t</code>,
<code>__uint128</code>, <code>__int128_t</code>, and
<code>__int128</code>,
have been deprecated. Use <code>i128</code> and <code>u128</code>
instead. (<a
href="https://github.com/rust-lang/libc/pull/4343">#4343</a>)</p>
<h3>Fixed</h3>
<ul>
<li><strong>breaking</strong> Redox: Fix signal action constant types
(<a
href="https://github.com/rust-lang/libc/pull/5009">#5009</a>)</li>
<li>EspIDF: Correct the value of <code>DT_*</code> constants (<a
href="https://github.com/rust-lang/libc/pull/5034">#5034</a>)</li>
<li>Redox: Fix locale values and add <code>RTLD_NOLOAD</code>, some TCP
constants (<a
href="https://github.com/rust-lang/libc/pull/5025">#5025</a>)</li>
<li>Various: Use <code>Padding::new(&lt;zeroed&gt;)</code> rather than
<code>Padding::uninit()</code> (<a
href="https://github.com/rust-lang/libc/pull/5036">#5036</a>)</li>
</ul>
<h3>Changed</h3>
<ul>
<li><strong>potentially breaking</strong> Linux: Add new fields to
<code>struct ptrace_syscall_info</code> (<a
href="https://github.com/rust-lang/libc/pull/4966">#4966</a>)</li>
<li>Re-export <code>core::ffi</code> integer types rather than
redefining (<a
href="https://github.com/rust-lang/libc/pull/5015">#5015</a>)</li>
<li>Redox: Update <code>F_DUPFD</code>, <code>IP</code>, and
<code>TCP</code> constants to match relibc (<a
href="https://github.com/rust-lang/libc/pull/4990">#4990</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/rust-lang/libc/commit/b1fd610c7eb6026c108f318874283525871b0e77"><code>b1fd610</code></a>
chore: Release libc 0.2.184</li>
<li><a
href="https://github.com/rust-lang/libc/commit/f596819d7c309f9de20ace14532d37d94ae48380"><code>f596819</code></a>
ci: Don't enforce cargo-semver-checks</li>
<li><a
href="https://github.com/rust-lang/libc/commit/4645f60c3a289aaf7d7fe08e2de66a1acd63a97c"><code>4645f60</code></a>
linux: update ptrace_syscall_info struct</li>
<li><a
href="https://github.com/rust-lang/libc/commit/14cbbec35360179b68947183d3ba618fa78acba2"><code>14cbbec</code></a>
types: Remove <code>Padding::uninit</code></li>
<li><a
href="https://github.com/rust-lang/libc/commit/b5dcda885fbf89e39e6a8fb80ee46f90284a6d4a"><code>b5dcda8</code></a>
pthread: Use <code>Padding::new(\&lt;zeroed&gt;)</code> rather than
<code>Padding::uninit()</code></li>
<li><a
href="https://github.com/rust-lang/libc/commit/bbb1c5d350e010760c4ebdbc2bb499b2e0faff76"><code>bbb1c5d</code></a>
types: Add a <code>new</code> function to <code>Padding</code></li>
<li><a
href="https://github.com/rust-lang/libc/commit/df06e43309c93a6dc5ea210d72f0284d945c7d61"><code>df06e43</code></a>
Fix locale values and add RTLD_NOLOAD, some TCP constants</li>
<li><a
href="https://github.com/rust-lang/libc/commit/078f5c6b3c7c3a51deba2c52c3d00b93cbb48557"><code>078f5c6</code></a>
newlib/espidf: Move DT_* to espidf/mod.rs</li>
<li><a
href="https://github.com/rust-lang/libc/commit/d32b83db3c0e078e0a8b094d9dfbd41f87c7a20f"><code>d32b83d</code></a>
Add IP_MINTTL to bsd</li>
<li><a
href="https://github.com/rust-lang/libc/commit/939e0ec2a8c3234424286719405cb708e9b8062b"><code>939e0ec</code></a>
Define max_align_t for riscv64-linux</li>
<li>Additional commits viewable in <a
href="https://github.com/rust-lang/libc/compare/0.2.183...0.2.184">compare
view</a></li>
</ul>
</details>
<br />

Updates `semver` from 1.0.27 to 1.0.28
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/semver/releases">semver's
releases</a>.</em></p>
<blockquote>
<h2>1.0.28</h2>
<ul>
<li>Documentation improvements</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/dtolnay/semver/commit/7625c7aa3f0e8ba21e099d1765bcebcb72aa8816"><code>7625c7a</code></a>
Release 1.0.28</li>
<li><a
href="https://github.com/dtolnay/semver/commit/fd404d082c2666b3df87c6229b85201a8533adda"><code>fd404d0</code></a>
Merge pull request 351 from czy-29/master</li>
<li><a
href="https://github.com/dtolnay/semver/commit/f75f26e98469c637ebb45baaa9c9694fc235f80b"><code>f75f26e</code></a>
The <code>doc_auto_cfg</code> and <code>doc_cfg</code> features have
been merged</li>
<li><a
href="https://github.com/dtolnay/semver/commit/9e2bfa2ec874e1d9fc1abe7b109dd212a6fd85c2"><code>9e2bfa2</code></a>
Enable <code>serde</code> on <code>docs.rs</code> and automatically add
<code>serde</code> flag to the docs</li>
<li><a
href="https://github.com/dtolnay/semver/commit/8591f2344b52b31d85b538de58b76a676fe9ff90"><code>8591f23</code></a>
Unpin CI miri toolchain</li>
<li><a
href="https://github.com/dtolnay/semver/commit/66bdd2ce5fb40d435677a03aaaaa60c569e8932c"><code>66bdd2c</code></a>
Pin CI miri to nightly-2026-02-11</li>
<li><a
href="https://github.com/dtolnay/semver/commit/324ffce5d914778062136c9744ffdf53523c9fa2"><code>324ffce</code></a>
Switch from cargo bench to criterion</li>
<li><a
href="https://github.com/dtolnay/semver/commit/34133a568a2fd0d9f10ef45bbf12d280e795c03e"><code>34133a5</code></a>
Update actions/upload-artifact@v5 -&gt; v6</li>
<li><a
href="https://github.com/dtolnay/semver/commit/7f935ffc7235e20864e7cba882077c9d8ad65f7c"><code>7f935ff</code></a>
Update actions/upload-artifact@v4 -&gt; v5</li>
<li><a
href="https://github.com/dtolnay/semver/commit/c07fb913535b7f12d4780fbcc9fef0e0bb6fc836"><code>c07fb91</code></a>
Switch from test::black_box to std::hint::black_box</li>
<li>Additional commits viewable in <a
href="https://github.com/dtolnay/semver/compare/1.0.27...1.0.28">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.


---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…e#21434)

Bumps
[taiki-e/install-action](https://github.com/taiki-e/install-action) from
2.70.3 to 2.74.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's
releases</a>.</em></p>
<blockquote>
<h2>2.74.0</h2>
<ul>
<li>
<p>Support <code>cargo-deb</code>. (<a
href="https://github.com/taiki-e/install-action/pull/1669">#1669</a>)</p>
</li>
<li>
<p>Update <code>just@latest</code> to 1.49.0.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.4.</p>
</li>
</ul>
<h2>2.73.0</h2>
<ul>
<li>
<p>Introduce <a
href="https://blog.yossarian.net/2025/11/21/We-should-all-be-using-dependency-cooldowns">dependency
cooldown</a> when installing with
<code>taiki-e/install-action@&lt;tool_name&gt;</code>, <code>tool:
&lt;tool_name&gt;@latest</code>, or <code>tool:
&lt;tool_name&gt;@&lt;omitted_version&gt;</code> to mitigate the risk of
supply chain attacks by default. (<a
href="https://github.com/taiki-e/install-action/pull/1666">#1666</a>)</p>
<p>This action without this cooldown already takes a few hours to a few
days for new releases to be reflected (as with other common package
managers that verify checksums or signatures), so this should not affect
most users.</p>
<p>See the <a
href="https://github.com/taiki-e/install-action#security">&quot;Security&quot;
section in readme</a> for more details.</p>
</li>
<li>
<p>Improve robustness for network failure.</p>
</li>
<li>
<p>Documentation improvements.</p>
</li>
</ul>
<h2>2.72.0</h2>
<ul>
<li>
<p>Support <code>cargo-xwin</code>. (<a
href="https://github.com/taiki-e/install-action/pull/1659">#1659</a>,
thanks <a
href="https://github.com/daxpedda"><code>@​daxpedda</code></a>)</p>
</li>
<li>
<p>Support trailing comma in <code>tool</code> input option.</p>
</li>
<li>
<p>Update <code>tombi@latest</code> to 0.9.14.</p>
</li>
</ul>
<h2>2.71.3</h2>
<ul>
<li>
<p>Update <code>wasm-tools@latest</code> to 1.246.2.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.3.</p>
</li>
</ul>
<h2>2.71.2</h2>
<ul>
<li>
<p>Implement workaround for <a
href="https://github.com/actions/partner-runner-images/issues/169">windows-11-arm
runner bug</a> which sometimes causes installation failure. (<a
href="https://github.com/taiki-e/install-action/pull/1657">#1657</a>)</p>
<p>This addresses an issue that was attempted to be worked around in
2.71.0 but was insufficient.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.1.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.11.3.</p>
</li>
</ul>
<h2>2.71.1</h2>
<ul>
<li>
<p>Fix a regression that caused an execution policy violation on
self-hosted Windows runner due to use of non-default
<code>powershell</code> shell, introduced in 2.71.0.</p>
</li>
<li>
<p>Update <code>dprint@latest</code> to 0.53.2.</p>
</li>
</ul>
<h2>2.71.0</h2>
<ul>
<li>
<p>Support <code>wasm-tools</code>. (<a
href="https://github.com/taiki-e/install-action/pull/1642">#1642</a>,
thanks <a
href="https://github.com/crepererum"><code>@​crepererum</code></a>)</p>
</li>
<li>
<p>Support <code>covgate</code>. (<a
href="https://github.com/taiki-e/install-action/pull/1613">#1613</a>,
thanks <a
href="https://github.com/jesse-black"><code>@​jesse-black</code></a>)</p>
</li>
<li>
<p>Implement potential workaround for <a
href="https://github.com/actions/partner-runner-images/issues/169">windows-11-arm
runner bug</a> which sometimes causes issue that the action successfully
completes but the tool is not installed. (<a
href="https://github.com/taiki-e/install-action/pull/1647">#1647</a>)</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<p>All notable changes to this project will be documented in this
file.</p>
<p>This project adheres to <a href="https://semver.org">Semantic
Versioning</a>.</p>
<!-- raw HTML omitted -->
<h2>[Unreleased]</h2>
<ul>
<li>Update <code>tombi@latest</code> to 0.9.15.</li>
</ul>
<h2>[2.74.0] - 2026-04-06</h2>
<ul>
<li>
<p>Support <code>cargo-deb</code>. (<a
href="https://github.com/taiki-e/install-action/pull/1669">#1669</a>)</p>
</li>
<li>
<p>Update <code>just@latest</code> to 1.49.0.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.4.</p>
</li>
</ul>
<h2>[2.73.0] - 2026-04-05</h2>
<ul>
<li>
<p>Introduce <a
href="https://blog.yossarian.net/2025/11/21/We-should-all-be-using-dependency-cooldowns">dependency
cooldown</a> when installing with
<code>taiki-e/install-action@&lt;tool_name&gt;</code>, <code>tool:
&lt;tool_name&gt;@latest</code>, or <code>tool:
&lt;tool_name&gt;@&lt;omitted_version&gt;</code> to mitigate the risk of
supply chain attacks by default. (<a
href="https://github.com/taiki-e/install-action/pull/1666">#1666</a>)</p>
<p>This action without this cooldown already takes a few hours to a few
days for new releases to be reflected (as with other common package
managers that verify checksums or signatures), so this should not affect
most users.</p>
<p>See the <a
href="https://github.com/taiki-e/install-action#security">&quot;Security&quot;
section in readme</a> for more details.</p>
</li>
<li>
<p>Improve robustness for network failure.</p>
</li>
<li>
<p>Documentation improvements.</p>
</li>
</ul>
<h2>[2.72.0] - 2026-04-04</h2>
<ul>
<li>
<p>Support <code>cargo-xwin</code>. (<a
href="https://github.com/taiki-e/install-action/pull/1659">#1659</a>,
thanks <a
href="https://github.com/daxpedda"><code>@​daxpedda</code></a>)</p>
</li>
<li>
<p>Support trailing comma in <code>tool</code> input option.</p>
</li>
<li>
<p>Update <code>tombi@latest</code> to 0.9.14.</p>
</li>
</ul>
<h2>[2.71.3] - 2026-04-04</h2>
<ul>
<li>
<p>Update <code>wasm-tools@latest</code> to 1.246.2.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2026.4.3.</p>
</li>
</ul>
<h2>[2.71.2] - 2026-04-02</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/taiki-e/install-action/commit/94cb46f8d6e437890146ffbd78a778b78e623fb2"><code>94cb46f</code></a>
Release 2.74.0</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/7fef44e1953572bcd24693fc866ad446fb1b4057"><code>7fef44e</code></a>
Update changelog</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/3bf2282bfd15630bbf9543653d4132bc64c9ca89"><code>3bf2282</code></a>
Update mise manifest</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/223b1d599eeacab3f4361624d257a1d50a152a7c"><code>223b1d5</code></a>
Update tombi manifest</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/fdcd834b4f2d5c0d663395c561633bbe19ecb08d"><code>fdcd834</code></a>
Update <code>just@latest</code> to 1.49.0</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/b45e8d6c436517e3d00a29c621a3534a176e4706"><code>b45e8d6</code></a>
Update <code>mise@latest</code> to 2026.4.4</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/4eac87a84609e7a285bcfd82df34e948017a9fcb"><code>4eac87a</code></a>
ci: Update config</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/5b413367489ec0bfe059fd6482a23cc544ed613e"><code>5b41336</code></a>
Add issue template</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/55a981690b2670493d925900a2569e5065371d31"><code>55a9816</code></a>
Support cargo-deb</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/7a562dfa955aa2e4d5b0fd6ebd57ff9715c07b0b"><code>7a562df</code></a>
Release 2.73.0</li>
<li>Additional commits viewable in <a
href="https://github.com/taiki-e/install-action/compare/6ef672efc2b5aabc787a9e94baf4989aa02a97df...94cb46f8d6e437890146ffbd78a778b78e623fb2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=2.70.3&new-version=2.74.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?
- Closes apache#21354.

## Rationale for this change
Currently, DataFusion supports 9 `datafusion.format.*` configs, but their
test coverage appears to be missing, so this PR adds comprehensive
test coverage for them. This is a follow-up to recent config framework
improvements: apache#20372 and
apache#20816.

## What changes are included in this PR?
New test coverage is being added for `datafusion.format.*` configs.

## Are these changes tested?
Yes, new test coverage is being added for `datafusion.format.*` configs.

## Are there any user-facing changes?
No
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

This is an alternative approach to
- apache#19687

Instead of reading the entire range in the JSON FileOpener, implement an
AlignedBoundaryStream which scans the range for newlines as the
FileStream requests data, by wrapping the original stream returned by
the ObjectStore.

This eliminates the overhead of the two extra get_opts requests needed
by calculate_range and, more importantly, it allows for efficient
read-ahead implementations by the underlying ObjectStore. Previously
this was inefficient because the streams opened by calculate_range
included one from `(start - 1)` to the end of the file and another from
`(end - 1)` to the end of the file, just to find the two relevant
newlines.


## What changes are included in this PR?
Added the AlignedBoundaryStream, which wraps a stream returned by the
object store and finds the delimiting newlines for a particular file
range. Notably, it doesn't do any standalone reads (unlike the
calculate_range function), eliminating two calls to get_opts.
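
The alignment rule can be sketched on an in-memory buffer (a minimal sketch; the real AlignedBoundaryStream applies the same rule incrementally to chunks as the ObjectStore yields them, and `aligned_lines` is an illustrative name, not the PR's API):

```rust
/// Newline alignment for a byte range `[start, end)` of a line-delimited
/// file: a range owns the lines that begin inside it, so skip up to and
/// past the first newline at or after `start - 1` (unless `start == 0`),
/// and keep reading past `end - 1` until the newline that terminates the
/// last owned line.
fn aligned_lines(data: &[u8], start: usize, end: usize) -> &[u8] {
    // First byte of the first line owned by this range.
    let begin = if start == 0 {
        0
    } else {
        match data[start - 1..].iter().position(|&b| b == b'\n') {
            Some(p) => start + p, // byte after the newline at start - 1 + p
            None => return &[],   // no line begins inside this range
        }
    };
    // One past the newline terminating the last line owned by this range.
    let finish = match data[end - 1..].iter().position(|&b| b == b'\n') {
        Some(p) => end + p,
        None => data.len(),
    };
    &data[begin..finish.max(begin)]
}
```

Splitting `b"aaa\nbbb\nccc\n"` at offset 6 this way yields `"aaa\nbbb\n"` and `"ccc\n"` with no overlap or gap, without any standalone reads.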

## Are these changes tested?
Yes, added unit tests.
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?
No

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…20926)

## Which issue does this PR close?
Part of apache#20766

## Rationale for this change
Grouped aggregations currently estimate output rows as input_rows,
ignoring available NDV statistics. Spark's AggregateEstimation and
Trino's AggregationStatsRule both use NDV products to tighten this
estimate. This PR is highly referenced by both.


- [Spark
reference](https://github.com/apache/spark/blob/e8d8e6a8d040d26aae9571e968e0c64bda0875dc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala#L38-L61)
- [Trino
reference](https://github.com/trinodb/trino/blob/43c8c3ba8bff814697c5926149ce13b9532f030b/core/trino-main/src/main/java/io/trino/cost/AggregationStatsRule.java#L92-L101)

## What changes are included in this PR?
- Estimate aggregate output rows as min(input_rows, product(NDV_i +
null_adj_i) * grouping_sets)
- Cap by the Top-K limit when active, since the output row count cannot
exceed K
- Propagate distinct_count from child stats to group-by output columns
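
The estimate above can be sketched as a small helper (names and the `(NDV, null adjustment)` pair encoding are illustrative, not the PR's actual signatures):

```rust
/// Sketch of: min(input_rows, grouping_sets * prod(NDV_i + null_adj_i)),
/// optionally capped by a Top-K limit. `saturating_mul` keeps the product
/// from overflowing when per-column NDVs are large.
fn estimate_agg_output_rows(
    input_rows: u64,
    ndv_and_null_adj: &[(u64, u64)], // per group-by column: (NDV, 0 or 1)
    grouping_sets: u64,
    top_k: Option<u64>,
) -> u64 {
    let product = ndv_and_null_adj
        .iter()
        .fold(grouping_sets, |acc, &(ndv, adj)| acc.saturating_mul(ndv + adj));
    let est = product.min(input_rows);
    top_k.map_or(est, |k| est.min(k))
}
```

For example, grouping 1M rows by two columns with NDVs 10 and 20 (the second nullable) gives min(1M, 10 * 21) = 210 estimated output rows.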

## Are these changes tested?
Yes existing and new tests that cover different scenarios and edge cases


## Are there any user-facing changes?
No
…e#21218)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#21217

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Adds `ScalarUDFImpl::struct_field_mapping`
- Adds logic in `ProjectionMapping` to decompose struct-producing
functions into their field-level mapping entries so that orderings
propagate through struct projections
- Adds unit tests/SLT

## Are these changes tested?

Yes.

## Are there any user-facing changes?

N/A

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#18816 .

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
In `UserDefinedLogicalNodeCore`, the default implementation of
`necessary_children_exprs` returns `None`, which signals to the
optimizer that it cannot determine which columns are required from the
child.

The optimizer takes a conservative approach and skips projection pruning
for that node, leading to complex and redundant plans in the subtree.
However, it would make more sense to assume all columns are required and
let the optimizer proceed, rather than giving up on the subtree
entirely.

## What changes are included in this PR?

```rust
LogicalPlan::Extension(extension) => {
    if let Some(necessary_children_indices) =
        extension.node.necessary_children_exprs(indices.indices())
    {
        ...
    } else {
        // Requirements from parent cannot be routed down to user defined logical plan safely
        // Assume it requires all input exprs here
        plan.inputs()
            .into_iter()
            .map(RequiredIndices::new_for_all_exprs)
            .collect()
    }
}
```

instead of 


https://github.com/apache/datafusion/blob/b6d46a63824f003117297848d8d83b659ac2e759/datafusion/optimizer/src/optimize_projections/mod.rs#L331-L337
<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

Yes.

In addition to unit tests, I've also added a complete end-to-end
integration test that reproduces the full scenario in the issue. This
might seem redundant, bloated, or even unnecessary. Please let me know
if I should remove these tests.

An existing test is modified, but I think the newer behavior is
expected.
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

Yes. But I think the new implementation is the expected behavior.
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
…e#21436)

## Which issue does this PR close?

- Closes #.

## Rationale for this change

`Fix sort merge interleave overflow` (apache#20922) added a temporary
`catch_unwind` shim around Arrow's `interleave` call because the
upstream implementation still panicked on offset overflow at the time.

Arrow 58.1.0 includes apache/arrow-rs#9549, which returns
`ArrowError::OffsetOverflowError` directly instead of panicking.
DataFusion main now depends on that release, so the panic recovery path
is no longer needed and only broadens the set of failures we might
accidentally treat as recoverable.

## What changes are included in this PR?

- Remove the temporary panic-catching wrapper from
  `BatchBuilder::try_interleave_columns`.
- Keep the existing retry logic, but trigger it only from the returned
  `OffsetOverflowError`.
- Replace the panic-specific unit tests with a direct error-shape
assertion.
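
The control-flow change can be sketched with stand-in types (the real code matches on Arrow's `ArrowError::OffsetOverflowError`; the enum and function below are illustrative only):

```rust
/// Stand-in for the Arrow error type; only the overflow variant triggers
/// the retry, everything else propagates instead of being caught.
#[derive(Debug, PartialEq)]
enum InterleaveError {
    OffsetOverflow,
    Other(String),
}

/// Run the interleave once at full batch size; on an offset overflow,
/// retry once with a smaller batch. No `catch_unwind`: panics are no
/// longer treated as recoverable.
fn interleave_with_retry<F>(mut run: F) -> Result<Vec<u8>, InterleaveError>
where
    F: FnMut(usize) -> Result<Vec<u8>, InterleaveError>,
{
    match run(usize::MAX) {
        Err(InterleaveError::OffsetOverflow) => run(usize::MAX / 2),
        other => other,
    }
}
```

This narrows the recovery path to exactly the error variant the upstream kernel now returns, rather than any panic that happens to occur inside it.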

## Are these changes tested?

Yes.

- `cargo test -p datafusion-physical-plan sorts::builder -- --nocapture`
- `cargo test -p datafusion-physical-plan sorts:: -- --nocapture`
- `./dev/rust_lint.sh`

## Are there any user-facing changes?

No.
apache#21099)

When the SQL unparser encountered a SubqueryAlias node whose direct
child was an Aggregate (or other clause-building plan like Window, Sort,
Limit, Union), it would flatten the subquery into a simple table alias,
losing the aggregate entirely.

For example, a plan representing:
SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id =
b.m

would unparse to:
  SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m

dropping the MAX aggregate and the subquery.

Root cause: the SubqueryAlias handler in select_to_sql_recursively would
call subquery_alias_inner_query_and_columns (which only unwraps
Projection children) and unparse_table_scan_pushdown (which only handles
TableScan/SubqueryAlias/Projection). When both returned nothing useful
for an Aggregate child, the code recursed directly into the Aggregate,
merging its GROUP BY into the outer SELECT instead of wrapping it in a
derived subquery.

The fix adds an early check: if the SubqueryAlias's direct child is a
plan type that builds its own SELECT clauses (Aggregate, Window, Sort,
Limit, Union), emit it as a derived subquery via self.derive() with the
alias always attached, rather than falling through to the recursive path
that would flatten it.

## Which issue does this PR close?
- Closes apache#21098 

## Rationale for this change

The SQL unparser silently drops subquery structure when a SubqueryAlias
node directly wraps an Aggregate (or Window, Sort, Limit, Union). For
example, a plan representing
```sql 
SELECT ... FROM j1 JOIN (SELECT max(id) FROM j2) AS b ...
``` 
unparses to 
```sql
SELECT ... FROM j1 JOIN j2 AS b ...
```
losing the aggregate entirely. This produces semantically incorrect SQL.

## What changes are included in this PR?

In the SubqueryAlias handler within select_to_sql_recursively
(`datafusion/sql/src/unparser/plan.rs`):
- Added an early check: if the SubqueryAlias's direct child is a plan
type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit,
Union) and cannot be reduced to a table scan, emit it as a derived
subquery (SELECT ...) AS alias via self.derive() instead of recursing
into the child and flattening it.
- Added a helper requires_derived_subquery() that identifies plan types
requiring this treatment.

## Are these changes tested?

Yes. A new test, `test_unparse_manual_join_with_subquery_aggregate`, is
added that constructs a SubqueryAlias > Aggregate plan (without an
intermediate Projection) and asserts the unparsed SQL preserves the
MAX() aggregate function call. This test fails without the fix. All
current unparser tests succeed without modification.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Which issue does this PR close?

- Closes apache#21410.

## Rationale for this change

When `split_part` is invoked with a `StringViewArray`, we can avoid
copying when constructing the return value by instead returning pointers
into the view buffers of the input `StringViewArray`.

This PR only applies this optimization to the code path for scalar
`delimiter` and `position`, because that's the most common usage mode in
practice. We could also optimize the array-args case but it didn't seem
worth the extra code.
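
The zero-copy idea can be sketched with a plain `(offset, len)` pair standing in for Arrow's 16-byte view structs (all names below are illustrative; the PR's scalar fast path does the equivalent against real `StringViewArray` buffers): the returned part is a new view into the same shared buffer, with no bytes copied.

```rust
/// A "view" into a shared byte buffer, standing in for an Arrow string view.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// split_part for a positive, 1-indexed `position` and a single-byte
/// delimiter: return a view pointing into `buf` rather than copying the
/// selected part. Negative positions are out of scope for this sketch.
fn split_part_view(buf: &[u8], v: View, delim: u8, position: usize) -> Option<View> {
    let s = &buf[v.offset..v.offset + v.len];
    let mut start = 0;
    let mut part = 0;
    for i in 0..=s.len() {
        // A part ends at each delimiter and at the end of the string.
        if i == s.len() || s[i] == delim {
            part += 1;
            if part == position {
                return Some(View { offset: v.offset + start, len: i - start });
            }
            start = i + 1;
        }
    }
    None
}
```

On `"a,bb,ccc"`, asking for part 2 yields a view of length 2 at offset 2 ("bb") while the underlying bytes stay where they are.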

Benchmarks (M4 Max):

  - scalar_utf8view_very_long_parts/pos_first: 102 µs → 68 µs (-33%)
  - scalar_utf8view_long_parts/pos_middle: 164 µs → 137 µs (-15%)
  - scalar_utf8_single_char/pos_first: 42.5 µs → 42.9 µs (no change)
  - scalar_utf8_single_char/pos_middle: 96.5 µs → 99.5 µs (noise)
  - scalar_utf8_single_char/pos_negative: 48.3 µs → 48.6 µs (no change)
  - scalar_utf8_multi_char/pos_middle: 132 µs → 132 µs (no change)
  - scalar_utf8_long_strings/pos_middle: 1.06 ms → 1.08 ms (noise)
  - array_utf8_single_char/pos_middle: 355 µs → 365 µs (noise)
  - array_utf8_multi_char/pos_middle: 357 µs → 360 µs (no change)

## What changes are included in this PR?

* Implement optimization
* Add benchmark that covers this case
* Improve SLT test coverage for this code path

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.
)

## Which issue does this PR close?
This attempts to bridge the missing test coverage mentioned by @alamb
in issue apache#8791.
<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?
Yes, the changes are tested.
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?
no
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
…apache#21460)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#21459 

## Rationale for this change

When a `ProjectionExec` sits on top of a `FilterExec` that already
carries an explicit projection, the `ProjectionPushdown` optimizer
attempts to swap them via `try_swapping_with_projection`. The swap
replaces the `FilterExec`'s input with the narrower `ProjectionExec`,
but `FilterExecBuilder::from(self)` carried over the old projection
indices (e.g. `[0, 1, 2]`). After the swap, the new input only has the
columns selected by the `ProjectionExec` (e.g. 2 columns), so `.build()`
tries to validate the stale projection against the narrower schema and
panics with "project index 2 out of bounds, max field 2".

## What changes are included in this PR?

In `FilterExec::try_swapping_with_projection`, after replacing the input
with the narrower `ProjectionExec`, clear the `FilterExec`'s own
projection via `.apply_projection(None)`. The `ProjectionExec` that is
now the input already handles column selection, so the `FilterExec` no
longer needs its own projection.



## Are these changes tested?

Yes, a test case is added.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
blaginin and others added 18 commits April 29, 2026 21:52
- Closes apache#21938

See
apache#21938 (comment)

I feel like this is quite a useful check - and it's relatively small -
let's run it always?
…protobuf (apache#21913)

## Which issue does this PR close?

- apache#21911

## Rationale for this change

The breaking-change detector added in apache#21499 fails on fork PRs with HTTP
403:

> The GITHUB_TOKEN has read-only permissions in pull requests from
forked repositories.
>
> From [GitHub
Docs](https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request)

A read-only token can't post the sticky comment, so the workflow errors
out at the `gh api … POST /comments` call.

We can't switch to `pull_request_target` either - ASF infra policy
forbids it for any workflow exposing `GITHUB_TOKEN`
(https://infra.apache.org/github-actions-policy.html), and
`cargo-semver-checks` compiles fork-controlled code (`build.rs`, proc
macros) anyway, so granting it a write token would be unsafe.

## What changes are included in this PR?

Split the comment posting into a companion `workflow_run` workflow:

- `breaking_changes_detector.yml` keeps the `pull_request` trigger but
only stages the result (`pr_number`, `result`, `logs`) and uploads it as
an artifact. No write token, no comment posting from this workflow.
- `breaking_changes_detector_comment.yml` triggers on `workflow_run`,
runs in the base-repo context with `pull-requests: write`, downloads the
artifact, validates the inputs, and upserts/deletes the sticky comment
via `actions-cool/maintain-one-comment`. Never checks out PR code.

The comment workflow uses a runtime-randomized heredoc delimiter when
piping the fork-controlled logs into `$GITHUB_OUTPUT`, to stop log
content from closing the heredoc early and overwriting the validated
`result` output (or injecting other keys).

Drops the now-unused `comment` subcommand from
`ci/scripts/changed_crates.sh`.

----

Also installed protobuf, as I noticed this failed when building
substrait in:
- apache#15591

## Are these changes tested?

No, can't really test it.

## Are there any user-facing changes?

No

---------

Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
…#21854)

Add a DataFusion-side trait that abstracts over the bulk-NULL string
array builders (GenericStringArrayBuilder<O> and
StringViewArrayBuilder), so that functions which dispatch over
Utf8/LargeUtf8/Utf8View can adopt the new builders without giving up
their single-bodied generic implementation.

Convert `repeat` as the first call site. The output is null iff either
input is null, so the per-row null match becomes a single
NullBuffer::union over the input null buffers, evaluated once before the
loop.

Also mark the inherent append_value/append_placeholder methods on the
new builders as #[inline]; without this, calls through the trait wrapper
end up going through a non-inlined inherent and slow down small-output
paths.
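
The trait shape can be sketched without the Arrow builder types (the trait and method names come from the PR; `VecBuilder` and `repeat_kernel` are illustrative stand-ins for the real Arrow builders and kernel):

```rust
/// One generic function body serves every string flavor; rows already
/// known to be null get a cheap placeholder slot instead of a per-row
/// null check inside the hot loop.
trait BulkNullStringArrayBuilder {
    fn append_value(&mut self, v: &str);
    fn append_placeholder(&mut self); // slot for a row pre-marked as null
}

/// Toy builder standing in for GenericStringArrayBuilder / StringViewArrayBuilder.
struct VecBuilder(Vec<Option<String>>);

impl BulkNullStringArrayBuilder for VecBuilder {
    #[inline]
    fn append_value(&mut self, v: &str) {
        self.0.push(Some(v.to_string()));
    }
    #[inline]
    fn append_placeholder(&mut self) {
        self.0.push(None);
    }
}

/// repeat-style kernel: output nullness is decided once, before the loop,
/// from the union of the input null masks (here a precomputed bool slice).
fn repeat_kernel<B: BulkNullStringArrayBuilder>(
    builder: &mut B,
    values: &[&str],
    times: &[usize],
    combined_nulls: &[bool], // true = output row is null
) {
    for i in 0..values.len() {
        if combined_nulls[i] {
            builder.append_placeholder();
        } else {
            builder.append_value(&values[i].repeat(times[i]));
        }
    }
}
```

The `#[inline]` attributes mirror the PR's observation: trait-object-free generic dispatch only pays off if the inherent methods actually inline into the loop.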

## Which issue does this PR close?

- Closes apache#21853.

## Rationale for this change

Optimize NULL handling in `repeat` using the bulk-NULL string builders
that have recently been added. This requires adding
`BulkNullStringArrayBuilder`, a trait that is similar in spirit to
Arrow's `StringLikeArrayBuilder`.

Benchmarks:

- repeat_string overflow [size=1024, repeat_times=1073741824]: 1022.5ns
→ 1054.5ns (+3.13%)
- repeat_string overflow [size=4096, repeat_times=1073741824]: 1016.6ns
→ 1055.3ns (+3.81%)
- repeat_large_string [size=1024, repeat_times=3]: 32.4µs → 26.6µs
(−17.90%)
- repeat_large_string [size=4096, repeat_times=3]: 127.4µs → 104.0µs
(−18.37%)
  - repeat_string [size=1024, repeat_times=3]: 32.6µs → 26.8µs (−17.79%)
- repeat_string [size=4096, repeat_times=3]: 127.4µs → 105.5µs (−17.19%)
- repeat_string_view [size=1024, repeat_times=3]: 37.3µs → 31.7µs
(−15.01%)
- repeat_string_view [size=4096, repeat_times=3]: 146.5µs → 124.5µs
(−15.02%)
- repeat_large_string [size=1024, repeat_times=30]: 82.0µs → 80.4µs
(−1.95%)
- repeat_large_string [size=4096, repeat_times=30]: 344.2µs → 338.7µs
(−1.60%)
  - repeat_string [size=1024, repeat_times=30]: 81.7µs → 79.7µs (−2.45%)
- repeat_string [size=4096, repeat_times=30]: 352.2µs → 334.7µs (−4.97%)
- repeat_string_view [size=1024, repeat_times=30]: 88.1µs → 83.1µs
(−5.68%)
- repeat_string_view [size=4096, repeat_times=30]: 368.8µs → 342.6µs
(−7.10%)
  - repeat/scalar_utf8: 174.7ns → 179.2ns (+2.58%)
  - repeat/scalar_utf8view: 174.5ns → 180.5ns (+3.44%)

## What changes are included in this PR?

* Add `BulkNullStringArrayBuilder`
* Optimize `repeat` using `BulkNullStringArrayBuilder`
* Inline some functions in GenericStringBuilder; benchmarking suggests
this is a win

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#20266


<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?
Created a new function that inserts the separator (`,`) into numbers
when the flag is enabled.
<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
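
Digit grouping of this kind can be sketched as follows (a minimal sketch assuming grouping every three digits on a plain digit string; sign, decimals, locale handling, and the actual `to_char` integration are out of scope, and the function name is illustrative):

```rust
/// Insert a ',' before every group of three digits, counting from the
/// right: "1234567" -> "1,234,567".
fn insert_thousands_separator(digits: &str) -> String {
    let bytes = digits.as_bytes();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, &b) in bytes.iter().enumerate() {
        // A separator goes before any position where the number of
        // remaining digits is a positive multiple of three.
        if i > 0 && (bytes.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(b as char);
    }
    out
}
```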

## Are these changes tested?

Tests passed locally.
Added unit tests to verify functionality.
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?
No

No changes to the public API.

## Additional Info
Claude was used to assist in identifying the source of the issue.
…che#21900)

## Which issue does this PR close?

- Closes apache#21843.

## Rationale for this change

Performance improvement on large hash-repartitions.

TPC-H at bigger scale factor shows biggest benefit:

<details>

```
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                               HEAD ┃ perf-strength-reduce-hash-partition ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │  327.64 / 329.66 ±1.44 / 331.84 ms │   327.82 / 330.00 ±1.68 / 331.48 ms │     no change │
│ QQuery 2  │  131.35 / 138.32 ±3.91 / 143.24 ms │   125.94 / 126.49 ±0.61 / 127.42 ms │ +1.09x faster │
│ QQuery 3  │  286.47 / 300.24 ±8.19 / 308.99 ms │   273.76 / 276.01 ±1.80 / 277.98 ms │ +1.09x faster │
│ QQuery 4  │  158.30 / 160.81 ±2.21 / 163.54 ms │   137.56 / 138.79 ±0.98 / 139.80 ms │ +1.16x faster │
│ QQuery 5  │  428.90 / 437.68 ±4.45 / 440.83 ms │   390.52 / 396.51 ±4.30 / 403.67 ms │ +1.10x faster │
│ QQuery 6  │  131.88 / 132.83 ±1.17 / 134.81 ms │   133.06 / 134.70 ±1.22 / 135.83 ms │     no change │
│ QQuery 7  │  541.09 / 545.88 ±4.21 / 552.67 ms │  508.51 / 531.75 ±16.82 / 548.21 ms │     no change │
│ QQuery 8  │  467.86 / 476.44 ±6.95 / 483.87 ms │   427.19 / 439.03 ±9.88 / 453.56 ms │ +1.09x faster │
│ QQuery 9  │ 649.16 / 660.07 ±10.12 / 676.72 ms │   605.25 / 611.87 ±5.72 / 620.70 ms │ +1.08x faster │
│ QQuery 10 │  327.64 / 339.90 ±6.92 / 348.85 ms │   321.66 / 330.67 ±4.76 / 334.89 ms │     no change │
│ QQuery 11 │  104.93 / 107.54 ±1.71 / 110.18 ms │   92.80 / 101.35 ±12.27 / 125.63 ms │ +1.06x faster │
│ QQuery 12 │  198.96 / 202.37 ±2.45 / 206.21 ms │   195.07 / 197.77 ±4.26 / 206.26 ms │     no change │
│ QQuery 13 │  300.44 / 312.37 ±6.87 / 321.90 ms │  291.85 / 308.47 ±10.23 / 317.55 ms │     no change │
│ QQuery 14 │  188.06 / 193.71 ±4.81 / 200.69 ms │   182.89 / 186.72 ±3.67 / 192.75 ms │     no change │
│ QQuery 15 │  334.88 / 339.95 ±5.81 / 350.78 ms │   330.79 / 336.21 ±4.31 / 342.71 ms │     no change │
│ QQuery 16 │     78.38 / 81.25 ±2.51 / 84.55 ms │      74.35 / 76.61 ±2.80 / 81.94 ms │ +1.06x faster │
│ QQuery 17 │ 744.08 / 761.70 ±12.84 / 781.69 ms │  703.40 / 724.66 ±23.35 / 770.05 ms │     no change │
│ QQuery 18 │ 760.17 / 782.23 ±12.12 / 796.85 ms │  725.45 / 744.71 ±15.59 / 765.59 ms │     no change │
│ QQuery 19 │ 267.90 / 280.99 ±14.61 / 306.80 ms │  275.58 / 298.23 ±27.69 / 351.75 ms │  1.06x slower │
│ QQuery 20 │ 311.46 / 323.12 ±10.13 / 341.26 ms │   312.13 / 319.42 ±4.39 / 324.46 ms │     no change │
│ QQuery 21 │ 816.40 / 837.33 ±19.78 / 870.18 ms │   766.23 / 778.58 ±8.98 / 792.31 ms │ +1.08x faster │
│ QQuery 22 │     81.46 / 84.94 ±2.58 / 88.20 ms │      75.31 / 77.73 ±1.39 / 79.55 ms │ +1.09x faster │
└───────────┴────────────────────────────────────┴─────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                  │ 7829.34ms │
│ Total Time (perf-strength-reduce-hash-partition)   │ 7466.29ms │
│ Average Time (HEAD)                                │  355.88ms │
│ Average Time (perf-strength-reduce-hash-partition) │  339.38ms │
│ Queries Faster                                     │        10 │
│ Queries Slower                                     │         1 │
│ Queries with No Change                             │        11 │
│ Queries with Failure                               │         0 │
└────────────────────────────────────────────────────┴───────────┘
```

</details>

## What changes are included in this PR?

Use strength-reduce to speed up hash % partition
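
To illustrate the class of trick involved (the PR title suggests the `strength_reduce` crate's precomputed-divisor modulo; the multiply-shift range reduction below is a hedged stand-in showing how a divide can be avoided, not necessarily the exact method used):

```rust
/// Multiply-shift range reduction (Lemire's trick): maps `hash`
/// uniformly into 0..num_partitions with one widening multiply instead
/// of a hardware divide. Same goal as a strength-reduced `hash % n`,
/// though the resulting partition assignment differs from true modulo.
fn fast_range(hash: u64, num_partitions: u64) -> u64 {
    ((hash as u128 * num_partitions as u128) >> 64) as u64
}

fn main() {
    for h in [0u64, 1, 42, u64::MAX] {
        let p = fast_range(h, 7);
        assert!(p < 7); // always lands in a valid partition
    }
}
```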

## Are these changes tested?

Existing tests

## Are there any user-facing changes?

A small change to `new_hash_partitioner` to return a `Result` instead of
panic during runtime
## Which issue does this PR close?

- Closes #.

## Rationale for this change

### Reproducer
Under `benchmarks/`, run `./bench.sh run tpch`, the generated result
file won't get ignored by git

### Reason
A recent PR has changed one entry in `.gitignore` and caused the issue
-
https://github.com/apache/datafusion/pull/21707/changes#diff-8ef3f336d18af2c481452ec156ec35b744a9c459c4e11f4bd72ceeb75ea6b6d3

### PR
This PR reverted this entry to the previous version.

Test: the above reproducer is working as expected after the change

## What changes are included in this PR?


## Are these changes tested?


## Are there any user-facing changes?

## Which issue does this PR close?


- Part of apache#8229

## Rationale for this change

DataFusion already has shared logic for merging `Statistics`, but
`UnionExec` and `InterleaveExec` still used their own local merge code.

That left a duplicated path in the codebase and kept the behavior less
consistent than the other statistics aggregation paths.

## What changes are included in this PR?

- Reuse `Statistics::try_merge_iter` for `UnionExec` statistics merging
- Reuse the same shared path for `InterleaveExec` statistics merging
- Remove the local union-specific statistics merge helpers
- Add tests for union and interleave statistics merging
- Add a test for interleave partition-level statistics merging


## Are these changes tested?

Yes


## Are there any user-facing changes?

No

## Which issue does this PR close?

N/A. This is a benchmark follow-up for apache#21637.

## Rationale for this change

This adds a ClickBench extended query that exercises Parquet filter
pushdown when row group statistics can prove a string range predicate
matches every row in the row group.

This case is useful for validating the optimization in apache#21637: when
Parquet statistics prove a row group is fully matched, DataFusion can
avoid evaluating the pushed-down RowFilter for that row group.

## What changes are included in this PR?

- Add `benchmarks/queries/clickbench/extended/q13.sql`.
- Document Q13 in the ClickBench query README.

## Are these changes tested?


I ran a local synthetic-data comparison for this query. With
`target_partitions=1`, the apache#21637 branch reduced scan processing time
from about 85.82ms to 24.89ms, reduced `bytes_scanned` from 26.12M to
400.6K, and reduced `row_pushdown_eval_time` from 4.12ms to effectively
zero.


## Are there any user-facing changes?

No public API changes. This adds a benchmark query and benchmark
documentation.
## Which issue does this PR close?

- Closes #.

## Rationale for this change

`array_any_match` is a commonly supported higher-order function in
systems like Spark (`exists`), Trino (`any_match`) among other engines.
It seems like a natural first addition alongside `array_transform`, and
worth upstreaming.


## What changes are included in this PR?

Adds `array_any_match(array, predicate)` as a new higher-order function
(with aliases `any_match` and `list_any_match`). It returns:

- `true` if any element satisfies the predicate
- `false` if no element does (including empty arrays)
- `null` if the predicate returns null for some elements and false for
  all others
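
The three-valued semantics above can be sketched over plain `Option<bool>` predicate results (illustrative only, not DataFusion's internal evaluation path):

```rust
/// SQL-style EXISTS/any_match over per-element predicate results, where
/// `None` models a NULL predicate result.
fn any_match(results: &[Option<bool>]) -> Option<bool> {
    let mut saw_null = false;
    for r in results {
        match r {
            Some(true) => return Some(true), // short-circuit on first match
            Some(false) => {}
            None => saw_null = true,
        }
    }
    // No element matched: NULL if any predicate result was NULL,
    // otherwise false (which also covers empty arrays).
    if saw_null { None } else { Some(false) }
}

fn main() {
    assert_eq!(any_match(&[Some(false), Some(true)]), Some(true));
    assert_eq!(any_match(&[]), Some(false));
    assert_eq!(any_match(&[Some(false), None]), None);
}
```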

## Are these changes tested?

Yes I added unit tests and sqllogic tests

## Are there any user-facing changes?

Yes -- new SQL functions array_any_match, any_match, and list_any_match
are available.

---------

Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
apache#21898)

## Summary
Adds `WITH ORDER (x)` to `CREATE EXTERNAL TABLE skew_parquet` /
`skew_parquet_single` in `explain_analyze.slt` so `FileScanConfig`
preserves scan ordering (`preserve_order`), keeping per-partition
`output_rows` stable under dynamic file scheduling (PR apache#21351).

## Related
- Follow-up to flaky skew assertions discussed around apache#21866 / apache#21850.

## Testing
- `cargo test -p datafusion-sqllogictest --test sqllogictests --
explain_analyze` (recommended before merge)

Sqllogictest-only change.

Made with [Cursor](https://cursor.com)

---------

Co-authored-by: Yongting You <2010youy01@gmail.com>
Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com>
…oin plans (apache#21947)

## Which issue does this PR close?

Closes apache#21946

## Rationale for this change

`adjust_input_keys_ordering` returns `Transformed::yes` unconditionally
in the default else branch, even when `requirements.data` is empty and
no changes were made. This triggers unnecessary `with_new_children`
rebuilds on every node in the plan tree for non-join/non-aggregate
queries.

For plans with custom `ExecutionPlan` nodes whose `with_new_children` is
expensive (e.g. nodes that re-evaluate cost functions on rebuild), this
causes significant overhead.

## What changes are included in this PR?

Add an early return with `Transformed::no` when
`requirements.data.is_empty()` in the default else branch of
`adjust_input_keys_ordering`. This skips the unnecessary plan tree
rebuild for simple scan/filter/limit plans that have no join key
reordering requirements.
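
A simplified model of the early-return pattern, with `Transformed` standing in for DataFusion's `datafusion_common::tree_node::Transformed` and the function body reduced to the control flow only:

```rust
// Minimal stand-in for the real Transformed<T> tree-node wrapper.
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn yes(data: T) -> Self { Self { data, transformed: true } }
    fn no(data: T) -> Self { Self { data, transformed: false } }
}

/// Hypothetical shape of the fix: report "no change" when there are no
/// join-key requirements, so the caller skips the expensive
/// with_new_children rebuild for this node.
fn adjust_input_keys_ordering(
    requirements: Vec<usize>,
    plan: &str,
) -> Transformed<String> {
    if requirements.is_empty() {
        return Transformed::no(plan.to_string());
    }
    Transformed::yes(format!("{plan} [children rebuilt]"))
}

fn main() {
    let t = adjust_input_keys_ordering(vec![], "ParquetScan");
    assert!(!t.transformed); // bare scan: no rebuild triggered
    assert_eq!(t.data, "ParquetScan");
}
```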

## Are these changes tested?

Yes, two unit tests added:
- `adjust_input_keys_ordering_no_transform_for_scan` — verifies a bare
parquet scan returns `Transformed::no`
- `adjust_input_keys_ordering_no_transform_for_filter_scan` — verifies a
filter→scan tree returns `Transformed::no` via `transform_down`

## Are there any user-facing changes?

No. This is a performance optimization that does not change query
results or plan structure.
## Which issue does this PR close?
Closes apache#21784

## Rationale for this change
Apache Arrow added `BooleanArray::has_true()` and `has_false()` so
callers can answer “any true/false?” without a full bit count. That can
short-circuit and avoid unnecessary work compared to patterns like
`true_count() == 0` or `true_count() > 0`.
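
The difference is easiest to see on plain slices (arrow's real versions operate on packed bitmaps, but the short-circuit argument is the same):

```rust
/// Full scan: must visit every value to produce the count.
fn true_count(values: &[bool]) -> usize {
    values.iter().filter(|&&b| b).count()
}

/// Existence check: stops at the first `true` it finds.
fn has_true(values: &[bool]) -> bool {
    values.iter().any(|&b| b)
}

fn main() {
    let v = vec![false, true, false, false];
    // Same answer as `true_count(&v) > 0`, but can bail out early.
    assert!(has_true(&v));
    assert_eq!(has_true(&v), true_count(&v) > 0);
}
```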

This PR applies those APIs across DataFusion where the logic is purely
existential (or equivalent via null-safe “all true” / “no true” checks),
matching the audit suggested in the issue.

## What changes are included in this PR?
- Replace hot-path checks that only needed existence or emptiness with
`has_true()` / `has_false()` (and `null_count()` where needed),
including:
- Nested/array helpers (`array_has`, list replace), Spark
`array_contains` null-semantics fast path
- Physical expressions: `evaluate_selection`, binary AND/OR
short-circuit, CASE/IN list loops
  - `scatter` fast paths
- Top-K filter handling, sort-merge join filter, nested-loop join bitmap
checks
  - Parquet column stats (`metadata.rs`, `has_any_exact_match`)
- Keep `true_count()` / `false_count()` where an actual count is
required (row counts, metrics, selectivity, `to_array(n)`, etc.)
- Import `arrow::array::Array` where `null_count()` is used on
`BooleanArray` in trait-heavy paths

## Are these changes tested?
Existing tests cover this behavior; the edits are semantics-preserving
refactors (same conditions, cheaper primitives). No new tests were
added.

## Are there any user-facing changes?
No. Behavior should be unchanged; this is an internal
performance/clarity improvement.

---------

Co-authored-by: Raushan Prabhakar <ros@Raushans-MacBook-Air.local>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
…lue types that require an extra coercion step (apache#21924)

The coerce_fn applied to REE values needs to be the higher-level coerce
function so that any REE value can be coerced (not just primitive
types).

## Which issue does this PR close?

- Closes apache#21923 

## Rationale for this change

Queries were failing

## What changes are included in this PR?

Correct unwrapping of REE values for regex/LIKE coercion

## Are these changes tested?

Yes, with slt tests.

## Are there any user-facing changes?

Queries that would previously error now pass.

---------

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
…he#21894)

## Which issue does this PR close?

- Closes #.

## Rationale for this change

For sliding window aggregation, `retract_batch` removes outgoing rows
from the aggregate state on every window slide. `median` and
`percentile_cont` store primitive numeric values internally, but their
retract paths converted values through `ScalarValue` before matching
them.

This PR keeps retract matching on native Arrow values, reducing
conversion and hashing overhead in that hot path.
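
In spirit, keying the retractable state by the native value's bit pattern looks like this (a simplified stand-in for DataFusion's `Hashable<T::Native>` wrapper; `f64` is not `Hash`/`Eq`, so its bits serve as the key):

```rust
use std::collections::HashMap;

/// f64 keyed by its bit pattern so it can live in a HashMap.
#[derive(Hash, PartialEq, Eq)]
struct HashableF64(u64);

impl From<f64> for HashableF64 {
    fn from(v: f64) -> Self { HashableF64(v.to_bits()) }
}

/// Multiset of the values currently inside the sliding window.
#[derive(Default)]
struct SlidingValues {
    counts: HashMap<HashableF64, usize>,
}

impl SlidingValues {
    fn update(&mut self, v: f64) {
        *self.counts.entry(v.into()).or_insert(0) += 1;
    }
    /// Remove one occurrence of `v` as the window slides past it —
    /// no ScalarValue conversion on this hot path.
    fn retract(&mut self, v: f64) {
        if let Some(c) = self.counts.get_mut(&v.into()) {
            *c -= 1;
            if *c == 0 {
                self.counts.remove(&v.into());
            }
        }
    }
    fn contains(&self, v: f64) -> bool {
        self.counts.contains_key(&v.into())
    }
}

fn main() {
    let mut w = SlidingValues::default();
    w.update(1.5);
    w.update(1.5);
    w.retract(1.5);
    assert!(w.contains(1.5)); // one copy still in the window
    w.retract(1.5);
    assert!(!w.contains(1.5));
}
```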

## What changes are included in this PR?

- Optimize `median` and `percentile_cont` `retract_batch` using
`Hashable<T::Native>` keys.
- Add sliding-window benchmarks for `median` and `percentile_cont` with
window sizes `256`, `4096`, and `16384`.

### Benchmarks
```
group                                                              main                                   optimized
-----                                                              ----                                   ---------
median sliding_window f64 no_nulls window_size=16384               2.38      3.3±0.06ms        ? ?/sec    1.00  1396.6±36.31µs        ? ?/sec
median sliding_window f64 no_nulls window_size=256                 2.73   781.3±20.80µs        ? ?/sec    1.00   286.0±10.52µs        ? ?/sec
median sliding_window f64 no_nulls window_size=4096                2.11  1052.2±27.13µs        ? ?/sec    1.00   499.3±19.44µs        ? ?/sec
median sliding_window f64 with_nulls window_size=16384             2.52      3.0±0.06ms        ? ?/sec    1.00  1173.1±36.86µs        ? ?/sec
median sliding_window f64 with_nulls window_size=256               2.67   728.6±20.07µs        ? ?/sec    1.00   272.8±12.90µs        ? ?/sec
median sliding_window f64 with_nulls window_size=4096              2.11   954.8±27.37µs        ? ?/sec    1.00   452.6±13.08µs        ? ?/sec
percentile_cont sliding_window f64 no_nulls window_size=16384      3.86     10.7±0.24ms        ? ?/sec    1.00      2.8±0.05ms        ? ?/sec
percentile_cont sliding_window f64 no_nulls window_size=256        2.49   797.8±25.51µs        ? ?/sec    1.00   320.1±58.86µs        ? ?/sec
percentile_cont sliding_window f64 no_nulls window_size=4096       3.44      3.2±0.12ms        ? ?/sec    1.00   928.2±42.15µs        ? ?/sec
percentile_cont sliding_window f64 with_nulls window_size=16384    3.72      6.7±0.90ms        ? ?/sec    1.00  1790.9±22.20µs        ? ?/sec
percentile_cont sliding_window f64 with_nulls window_size=256      2.51   721.0±25.52µs        ? ?/sec    1.00   286.7±30.34µs        ? ?/sec
percentile_cont sliding_window f64 with_nulls window_size=4096     3.34      2.2±0.14ms        ? ?/sec    1.00   667.1±20.87µs        ? ?/sec
```

## Are these changes tested?

Yes, existing SLT tests pass.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
We were missing a couple of branches to unwrap REE in
type_union_resolution_coercion.

## Which issue does this PR close?

- Closes apache#21918 

## Rationale for this change

Fix an unexpected error

## What changes are included in this PR?

Type coercion match arms for REE

## Are these changes tested?

Yes, via sql logic tests
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

Queries that errored now complete successfully

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
## Which issue does this PR close?

- Refers apache#12709.

## Rationale for this change

Binary arguments are supported for concat UDFs, but not for the pipe
operator (`||`), which supports only text.

## What changes are included in this PR?

- Support binary concat by providing specialised kernels for pure binary
operations. Avoid support of mixed string/binary arguments as it doesn't
match the behaviour of major DBs, except for Postgres (see the table in
the linked ticket).
- Add `concat_elements_binary_view_array` kernel
- Refactor private `binary_coercion` to support symmetric BinaryLike +
BinaryLike - required for the new codeflow

Concat UDFs are out of scope and supported separately.
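
Elementwise binary concatenation with SQL null semantics can be sketched as follows (plain byte slices here; the real kernel operates on Arrow BinaryView arrays):

```rust
/// Concatenate two binary columns elementwise; a null on either side
/// yields a null result (SQL || semantics).
fn concat_elements_binary(
    a: &[Option<&[u8]>],
    b: &[Option<&[u8]>],
) -> Vec<Option<Vec<u8>>> {
    a.iter()
        .zip(b)
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => {
                let mut out = Vec::with_capacity(x.len() + y.len());
                out.extend_from_slice(x);
                out.extend_from_slice(y);
                Some(out)
            }
            _ => None, // null in -> null out
        })
        .collect()
}

fn main() {
    // x'636166c3a9' || x'68656c6c6f' from the description: "café" || "hello"
    let a: Vec<Option<&[u8]>> = vec![Some(b"caf\xc3\xa9"), None];
    let b: Vec<Option<&[u8]>> = vec![Some(b"hello"), Some(b"x")];
    let out = concat_elements_binary(&a, &b);
    assert_eq!(out[0].as_deref(), Some(&b"caf\xc3\xa9hello"[..]));
    assert!(out[1].is_none());
}
```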

## Are these changes tested?

- Existing SLTs
- Moved a few tests to a more appropriate `binary.slt`
- Added new unit tests

## Are there any user-facing changes?

Concatenation `||` operator now allows binary+binary concatenation
(`SELECT x'636166c3a9' || x'68656c6c6f'`), but denies mixed
string+binary concatenation `SELECT x'636166c3a9' || 'hello'`

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
…1885)

## Which issue does this PR close?

- Closes apache#21871 

## Rationale for this change

FilterExec supports two semantically different projection states:

- `None` → return all columns (full projection)
- `Some(vec![])` → return no columns (empty projection)

However, both cases were being serialized identically as an empty vector
in the proto representation. During deserialization, an empty vector was
always mapped back to None, meaning an empty projection would silently
become a full projection after a serde round-trip.
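
The round-trip hazard and the fix can be sketched with an explicit presence flag (field names are illustrative, not the actual proto schema — proto3 field presence serves the same role):

```rust
/// Wire representation: a bare Vec cannot distinguish None from
/// Some(vec![]), so an explicit presence flag is carried alongside it.
struct Proto {
    has_projection: bool,
    projection: Vec<usize>,
}

fn serialize(p: &Option<Vec<usize>>) -> Proto {
    Proto {
        has_projection: p.is_some(),
        projection: p.clone().unwrap_or_default(),
    }
}

fn deserialize(p: &Proto) -> Option<Vec<usize>> {
    // Without the flag, an empty vector would always map back to None,
    // silently turning an empty projection into a full one.
    p.has_projection.then(|| p.projection.clone())
}

fn main() {
    for p in [None, Some(vec![]), Some(vec![0, 2])] {
        assert_eq!(deserialize(&serialize(&p)), p); // lossless round-trip
    }
}
```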

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?

No
…ptionProperties fallible (apache#21603)

## Which issue does this PR close?

- Closes apache#21602.

## Rationale for this change

Fail quickly with a helpful error if we're unable to represent a
`FileDecryptionProperties` instance as `ConfigFileDecryptionProperties`

## What changes are included in this PR?

* Change the implementation of `From<&Arc<FileDecryptionProperties>>`
for `ConfigFileDecryptionProperties` to `TryFrom`.
* Fail the conversion if we can't get the footer key from the
`FileDecryptionProperties` with empty metadata

## Are these changes tested?

Yes I've added a new unit test.

I also tested this with a branch of delta-rs that uses Datafusion with
Parquet encryption, and this required only minor changes to tests and
examples:
corwinjoy/delta-rs@file_format_options_squashed...adamreeve:delta-rs:test-datafusion-change

## Are there any user-facing changes?

Yes, this is a breaking API change.

---------

Co-authored-by: Kumar Ujjawal <ujjawalpathak6@gmail.com>
Co-authored-by: Nuno Faria <nunofpfaria@gmail.com>
@jayshrivastava jayshrivastava changed the base branch from js/dedupe-dynamic-filter-inner-state to js/dedupe-dynamic-filter-inner-state-v2 May 1, 2026 14:43
jayshrivastava and others added 3 commits May 1, 2026 17:22
## Which issue does this PR close?

Informs:
datafusion-contrib/datafusion-distributed#180
Closes: apache#20418

## Rationale for this change

Consider you have a plan with a `HashJoinExec` and `DataSourceExec`
```
HashJoinExec(dynamic_filter_1 on a@0)
  (...left side of join)
  ProjectionExec(a := Column("a", source_index))
    DataSourceExec
      ParquetSource(predicate = dynamic_filter_2)
```

You serialize the plan, deserialize it, and execute it. What should
happen is that the dynamic filter should "work", meaning:
1. When you deserialize the plan, both the `HashJoinExec` and
`DataSourceExec` should have pointers to the same
`DynamicFilterPhysicalExpr`
2. The `DynamicFilterPhysicalExpr` should be updated during execution by
the `HashJoinExec` and the `DataSourceExec` should filter out rows

This does not happen today for a few reasons, a couple of which this PR
aims to address
1. `DynamicFilterPhysicalExpr` does not survive round-tripping. The
internal exprs get inlined (e.g. one may be serialized as a `Literal`)
due to the `PhysicalExpr::snapshot()` API
2. Even if `DynamicFilterPhysicalExpr` survives round-tripping, the one
pushed down to the `DataSourceExec` often has different children. In
this case, you have two `DynamicFilterPhysicalExpr` instances which
do not survive deduping, causing referential integrity to be lost.


## What changes are included in this PR?

This PR aims to fix those problems by:
1. Removing the `snapshot()` call from the serialization process
2. Adding protos for `DynamicFilterPhysicalExpr` so it can be serialized
and deserialized
3. Removing `Arc`-based deduplication. We now only dedupe on
   `expression_id` if the `PhysicalExpr` reports an `expression_id`.
After this change, only `DynamicFilterPhysicalExpr` reports an
   `expression_id` to be deduped.
4. `expression_id` is now just a random u64. Since a given query likely
   only has a few `DynamicFilterPhysicalExpr` instances, the odds of a
   collision are very low
5. There's no need for a `DedupingSerializer` anymore since the
   `expression_id` is already stored in the dynamic filter proto itself 
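
The dedupe-by-`expression_id` scheme can be sketched as an interner consulted during deserialization (types here are simplified stand-ins for the real plan/expr types):

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for DynamicFilterPhysicalExpr.
struct DynamicFilter {
    expression_id: u64,
}

/// Maps expression_id -> the one shared instance, so every plan node
/// that serialized the same id gets the same Arc back after round-trip.
#[derive(Default)]
struct Interner {
    seen: HashMap<u64, Arc<DynamicFilter>>,
}

impl Interner {
    fn intern(&mut self, id: u64) -> Arc<DynamicFilter> {
        Arc::clone(
            self.seen
                .entry(id)
                .or_insert_with(|| Arc::new(DynamicFilter { expression_id: id })),
        )
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern(7); // e.g. attached to HashJoinExec
    let b = interner.intern(7); // e.g. pushed into DataSourceExec
    assert!(Arc::ptr_eq(&a, &b)); // referential integrity preserved
    assert_eq!(a.expression_id, 7);
}
```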

Future work:
1. Serialize dynamic filters in `HashJoinExec`, `AggregateExec` and
`SortExec`
2. Add tests which actually execute plans after deserialization and
assert that dynamic filtering is functional
3. Add proto converters to the `PhysicalExtensionCodec` trait so
implementors can utilize deduping logic

## Are these changes tested?

- adds tests which roundtrip dynamic filters and assert that referential
  integrity is maintained
- removes tests that test `Arc`-based deduplication and session id
  rotation since we don't support that anymore

## Are there any user-facing changes?

- The default codec does not call `snapshot()` on `PhysicalExpr` during
serialization anymore. This means that `DynamicFilterPhysicalExpr` are
now serialized and deserialized without snapshotting.
- `PhysicalExpr`s in general are no longer deduped; only
`DynamicFilterPhysicalExpr` is

---------

Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
## Which issue does this PR close?

- Closes #.

## Rationale for this change

Fix negative cases with `substring`; some existing tests were incorrect.

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?

Builds on the prior `DynamicFilterPhysicalExpr` proto serialization +
dedupe work so plan-node references to a shared dynamic filter survive
roundtrip.

- Adds `dynamic_filter` to the proto messages for `SortExec`,
  `AggregateExec`, and `HashJoinExec` and wires it through
  to/from-proto.
- Exposes `dynamic_filter()` / `with_dynamic_filter()` on those plan
  nodes so the dedupe deserializer can reattach the shared
  `DynamicFilterPhysicalExpr` after roundtrip.
- Extracts `supported_accumulators_info()` on `AggregateExec` and uses
  it from `init_dynamic_filter` and `with_dynamic_filter`.
- Adds `test_hash_join_with_dynamic_filter_roundtrip`,
  `test_aggregate_with_dynamic_filter_roundtrip`, and
  `test_sort_topk_with_dynamic_filter_roundtrip` to verify that the
  plan node and the pushdown-target `ParquetSource` predicate end up
  pointing at the same `expression_id` after roundtrip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jayshrivastava jayshrivastava force-pushed the js/serialize-dynamic-filters-in-execution-plans-2 branch from 90abf76 to c3aef20 Compare May 4, 2026 17:47
@jayshrivastava jayshrivastava changed the base branch from js/dedupe-dynamic-filter-inner-state-v2 to main May 4, 2026 17:47