@ding-young
Contributor

@ding-young ding-young commented Aug 4, 2025

Which issue does this PR close?

Rationale for this change

In multi-level merge, we first reserve the estimated memory needed to merge the sorted spill files, and then bypass the global memory pool when creating the SortPreservingMergeStream (SPM for short). The purpose is to guarantee that the SPM step can finish without running out of memory, by holding the worst-case reservation until SPM ends.

Details
  • grow merge_reservation based on max batch memory per spill file

    let mut memory_reservation = self.reservation.new_empty();
    // Don't account for existing streams memory
    // as we are not holding the memory for them
    let mut sorted_streams = mem::take(&mut self.sorted_streams);
    let (sorted_spill_files, buffer_size) = self
        .get_sorted_spill_files_to_merge(
            2,
            // we must have at least 2 streams to merge
            2_usize.saturating_sub(sorted_streams.len()),
            &mut memory_reservation,
        )?;

  • bypass global buffer pool (use unbounded memory pool)

    if !all_in_memory {
        // Don't track memory used by this stream, as we reserve that memory for the worst-case scenario
        // (reserving memory for the biggest batch in each stream)
        // TODO - avoid this hack as this can be broken easily when `SortPreservingMergeStream`
        // changes the implementation to use more/less memory
        builder = builder.with_bypass_mempool();
    } else {
        // If we are only merging in-memory streams, we need to use the memory reservation
        // because we don't know the maximum size of the batches in the streams
        builder = builder.with_reservation(self.reservation.new_empty());
    }

Because we use UnboundedMemoryPool as a trick, nothing validates that memory_reservation is a true upper bound on memory use during the SPM step of a multi-level merge. We therefore need to validate that the memory consumed by SPM does not exceed the size of memory_reservation.

What changes are included in this PR?

This PR adds a check to SpillReadStream so that, whenever a spill stream is polled, the memory size of the batch being read must not exceed max_record_batch_memory + margin. This lets us detect cases where the memory reservation was underestimated, for example when a batch consumes more memory after the write-read cycle than originally expected.
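As a rough illustration of the check described above, here is a minimal std-only sketch. The function name `validate_spilled_batch` and its signature are hypothetical and are not taken from the PR; only the comparison against max_record_batch_memory + margin comes from the description.

```rust
/// Hypothetical helper (not DataFusion's API): checks one batch read back
/// from a spill file against the maximum recorded when the file was written.
fn validate_spilled_batch(
    batch_bytes: usize,
    max_record_batch_memory: usize,
    margin: usize,
) -> Result<(), String> {
    // Tolerate a small, fixed amount of extra allocation overhead.
    let allowed = max_record_batch_memory.saturating_add(margin);
    if batch_bytes > allowed {
        return Err(format!(
            "batch of {} bytes exceeds max_record_batch_memory {} + margin {}",
            batch_bytes, max_record_batch_memory, margin
        ));
    }
    Ok(())
}
```

In this sketch a batch exactly at the limit passes; whether the real check is strict or inclusive is not stated in the description.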

This PR also creates a separate GreedyMemoryPool, sized to memory_reservation, instead of using UnboundedMemoryPool when merging spill files (and in-memory streams) in a multi-level merge.
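To make the difference between the two pools concrete, here is a toy model written with the standard library only. `ToyPool` is a hypothetical stand-in and does not implement DataFusion's `MemoryPool` trait; it only shows why an unbounded pool hides an overshoot that a pool capped at the reservation would surface.

```rust
/// Hypothetical stand-in for a memory pool; not DataFusion's `MemoryPool`.
#[derive(Debug)]
struct ToyPool {
    /// `None` models an unbounded pool, `Some(n)` a pool capped at `n` bytes.
    limit: Option<usize>,
    used: usize,
    peak: usize,
}

impl ToyPool {
    fn unbounded() -> Self {
        ToyPool { limit: None, used: 0, peak: 0 }
    }

    fn bounded(limit: usize) -> Self {
        ToyPool { limit: Some(limit), used: 0, peak: 0 }
    }

    /// Grows the tracked usage, failing only when a limit is set and exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let new_used = self.used + bytes;
        if let Some(limit) = self.limit {
            if new_used > limit {
                return Err(format!(
                    "requested {} bytes with {} in use, limit is {}",
                    bytes, self.used, limit
                ));
            }
        }
        self.used = new_used;
        self.peak = self.peak.max(new_used);
        Ok(())
    }
}
```

With a reservation of 100 bytes, `ToyPool::unbounded()` accepts two 60-byte allocations silently, while `ToyPool::bounded(100)` rejects the second one, which is the kind of signal this sanity check is after.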

Are these changes tested?

Yes, and the following spilling-related tests fail 😢
Maybe our previous worst-case memory estimation was wrong, but I don't understand why at this point. This needs more investigation. I'll put more details in the comments.

Are there any user-facing changes?

@github-actions github-actions bot added execution Related to the execution crate physical-plan Changes to the physical-plan crate labels Aug 4, 2025
@ding-young
Contributor Author

List of failing tests

Details
// fuzz_cases 
fuzz_cases::aggregate_fuzz::test_single_mode_aggregate_single_mode_aggregate_with_spill
fuzz_cases::spilling_fuzz_in_memory_constrained_env::test_aggregate_with_high_cardinality_with_limited_memory_and_different_sizes_of_record_batch
fuzz_cases::spilling_fuzz_in_memory_constrained_env::test_aggregate_with_high_cardinality_with_limited_memory_and_different_sizes_of_record_batch_and_changing_memory_reservation
fuzz_cases::spilling_fuzz_in_memory_constrained_env::test_aggregate_with_high_cardinality_with_limited_memory_and_different_sizes_of_record_batch_and_take_all_memory
fuzz_cases::spilling_fuzz_in_memory_constrained_env::test_aggregate_with_high_cardinality_with_limited_memory_and_large_record_batch
fuzz_cases::spilling_fuzz_in_memory_constrained_env::test_sort_with_limited_memory_and_large_record_batch
// memory_limit
memory_limit::test_stringview_external_sort 

Possible cause?

Two problems were discovered while working on this PR.

First, when we build a SortPreservingMergeStream from both in-memory streams and sorted spill files, we don't reserve memory for the in-memory streams (since we don't know their batch sizes without polling them). The merge_reservation only accounts for the worst-case memory usage of the sorted spill files, so we bypass the global memory pool without checking the memory consumed by the in-memory streams.

Second, the actual memory usage reported by the local memory pool seems to exceed the precomputed memory_reservation even when there are only sorted spill files to merge. The failing tests indicate this, and when I printed the usage of UnboundedMemoryPool, it looked like we had underestimated the worst-case memory consumption of the merge.

Details

There are 6 spill files to merge, and we reserved 3676800 bytes based on the max record batch size of each spill file, but the peak usage of UnboundedMemoryPool is 28280376 bytes. Note that this test, test_stringview_external_sort, uses FairSpillPool as the global memory pool, so I assume the usage log for UnboundedMemoryPool reflects the SPM in the multi-level merge.

// modify `with_bypass_mempool` in this PR to use UnboundedMemoryPool
// RUST_LOG=debug cargo test -p datafusion memory_limit::test_stringview_external_sort -- --exact --nocapture
[merge_sorted_runs_within_mem_limit] spill6, inmem0
[merge_sorted_runs_within_mem_limit] memory_reservation:3676800
create new merge sort with bypass mempool
[mem pool] 1394496 used
[mem pool] 2788992 used
[mem pool] 3639880 used
[mem pool] 4570768 used
[mem pool] 5965448 used
[mem pool] 7360128 used
[mem pool] 8754744 used
[mem pool] 10149360 used
[mem pool] 11543832 used
[mem pool] 12938304 used
[mem pool] 14332872 used
[mem pool] 15727440 used
[mem pool] 17122440 used
[mem pool] 18517440 used
[mem pool] 19912376 used
[mem pool] 21307312 used
[mem pool] 22701912 used
[mem pool] 24096512 used
[mem pool] 25491152 used
[mem pool] 26885792 used
[mem pool] 28280376 used
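To put the two numbers from this log side by side, a small arithmetic sketch follows. The helper names are made up for illustration; the inputs are the reserved bytes, the peak bytes, and the spill file count reported above.

```rust
/// Peak usage as a percentage of the reserved bytes (integer division).
fn peak_as_percent_of_reservation(reserved: u64, peak: u64) -> u64 {
    peak * 100 / reserved
}

/// Average reservation per spill file, in bytes (integer division).
fn average_reservation_per_file(reserved: u64, spill_files: u64) -> u64 {
    reserved / spill_files
}
```

With `reserved = 3_676_800` and `peak = 28_280_376`, the first helper returns 769, so the peak is roughly 7.7 times the reservation. With 6 spill files, the second returns 612_800 bytes per file on average.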

@2010YOUY01 @rluvaton
Could you help me check if the issue is really with estimation or memory management, rather than me logging things incorrectly? I'm not sure where this discrepancy is coming from, since it's not only the string array-related tests that are failing.

@rluvaton
Member

rluvaton commented Aug 4, 2025

Note: sort spills the batches remaining in memory while row_hash does not. Sort also accounts for the memory used by the SortPreservingMergeStream, which row_hash does not.

How large is the difference for the fuzz tests I added that check memory-constrained environments? They only test a couple of simple columns, which are easier to reason about.

The kernels used in the sort stream might overestimate. Also note that even if you request X capacity, you might get more than that.

@rluvaton
Member

rluvaton commented Aug 4, 2025

I saw that SortPreservingMergeStream uses a different way of calculating memory, which does not take the sliced data into account; see

let mut counted_buffers: HashSet<NonNull<u8>> = HashSet::new();

I think this is it
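For readers unfamiliar with this style of accounting, the sketch below contrasts two ways of sizing a set of slices: summing only the bytes each slice covers, versus counting each distinct underlying buffer once at its full size, deduplicated by pointer. The `Slice` type and both functions are hypothetical and written with the standard library only; they are an analogy, not the Arrow or DataFusion implementation.

```rust
use std::collections::HashSet;
use std::ptr::NonNull;
use std::rc::Rc;

/// Hypothetical stand-in for an array slice: a window into a shared buffer.
struct Slice {
    buffer: Rc<Vec<u8>>,
    /// Number of bytes of `buffer` that this slice actually covers.
    len: usize,
}

/// Sums only the bytes that each slice covers.
fn sliced_size(slices: &[Slice]) -> usize {
    slices.iter().map(|s| s.len).sum()
}

/// Sums whole underlying buffers, counting each distinct allocation once.
fn deduplicated_buffer_size(slices: &[Slice]) -> usize {
    let mut counted_buffers: HashSet<NonNull<u8>> = HashSet::new();
    let mut total = 0;
    for s in slices {
        // A Vec's data pointer is never null, even for an empty Vec.
        let ptr = NonNull::new(s.buffer.as_ptr() as *mut u8)
            .expect("Vec data pointers are non-null");
        // `insert` returns false when this buffer was already counted.
        if counted_buffers.insert(ptr) {
            total += s.buffer.len();
        }
    }
    total
}
```

Two small slices of one 1000-byte buffer report 300 bytes under the first method and 1000 bytes under the second, so the two numbers can diverge widely for heavily sliced data.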

@2010YOUY01
Contributor

If it fails, I think this approach will make debugging very painful.

I have an alternative idea to make this validation more fine-grained:
Let's say there are 3 spills to merge, with estimated max batch sizes of 10M, 15M, and 12M.
Then during merging we only check that each stream's batch size is always less than its limit in [10M, 15M, 12M].

This approach is less comprehensive and can be a bit hacky to implement (it directly extends the operator for this check), but it makes troubleshooting much easier.
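The per-stream idea can be sketched as follows. This is only an illustration: `check_stream_batch` is a hypothetical name, "10M" is assumed to mean 10 × 1,000,000 bytes, and a batch exactly at its limit is treated as acceptable.

```rust
/// Assumption: "10M" is read as 10 * 1_000_000 bytes.
const MB: usize = 1_000_000;

/// Hypothetical check: `limits[i]` is the estimated max batch size of spill
/// stream `i`; the error message names the offending stream.
fn check_stream_batch(
    limits: &[usize],
    stream_idx: usize,
    batch_bytes: usize,
) -> Result<(), String> {
    let limit = match limits.get(stream_idx) {
        Some(limit) => *limit,
        None => return Err(format!("unknown stream {}", stream_idx)),
    };
    if batch_bytes > limit {
        return Err(format!(
            "stream {}: batch of {} bytes exceeds estimated max {}",
            stream_idx, batch_bytes, limit
        ));
    }
    Ok(())
}
```

Because the error carries the stream index, a failure points directly at the spill file whose estimate was too low, which is the troubleshooting benefit described above.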

@github-actions github-actions bot added the core Core DataFusion crate label Aug 8, 2025
@ding-young
Contributor Author

Update: I took an alternative approach similar to what @2010YOUY01 suggested.

> I have an alternative idea to make this validation more fine-grained: Let's say there are 3 spills to merge, with estimated max batch sizes of 10M, 15M, and 12M. Then during merging we only check that each stream's batch size is always less than its limit in [10M, 15M, 12M].

I switched back to using UnboundedMemoryPool, but added a check to SpillReadStream so that, whenever a spill stream is polled, the memory size of the batch being read must not exceed max_record_batch_memory. This lets us detect cases where the memory reservation was underestimated, for example when a batch consumes more memory after the write-read cycle than originally expected.

There is a slight discrepancy due to minor vector allocations, so I added a margin to the check. Fortunately, in most cases, the validation passes. However, for external sorting with string views, the validation currently fails, so further investigation is needed.

@ding-young
Contributor Author

ding-young commented Aug 8, 2025

> How large is the difference for the fuzz tests I added that check memory-constrained environments? They only test a couple of simple columns, which are easier to reason about.

I also did some additional debugging to understand why SortPreservingMergeStream ends up using more memory than the pre-reserved amount. The root causes I identified are as follows:

Apart from slight discrepancies due to vector allocation overhead, I found several key sources of memory underestimation:

  1. In-memory streams are not accounted for in SPM:

When performing SPM (SortPreservingMerge) over both spill files and in-memory streams, we only reserve get_reserved_byte_for_record_batch_size(spill.max_record_batch_memory * buffer_len) per spill. If there are 2 spill streams and 1 in-memory stream, the in-memory stream is not included in the reservation at all.

  2. Incorrect logic in get_reserved_byte_for_record_batch_size:

The estimation was based on a fixed 2× multiplier, without considering the differences between sort key and sort payload columns or their data types. In reality the required memory varies significantly, and the current logic often underestimates (or overestimates). This is a known issue (#14748) and I'm actively working on it.

  3. SPM buffers up to 2 × buffer_len × (batch and cursor) per stream:

Looking at the implementation, SPM can hold both the previous and the current (cursor, batch) for each stream simultaneously. See

batches: Vec::with_capacity(stream_count * 2),

and

self.prev_cursors[stream_idx] = self.cursors[stream_idx].take();

That means that, in the worst case, SPM can use up to 2 × get_reserved_byte_for_record_batch_size(spill.max_record_batch_memory * buffer_len) per stream, so I think we should double the reservation per stream to be safe.
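The effect of doubling can be shown with a small calculation. In this sketch, `estimate_reserved_bytes` is a hypothetical stand-in that applies the fixed 2× multiplier mentioned in point 2, and `copies_per_stream` is an illustrative parameter (1 for the current reservation, 2 for the proposed one); neither is DataFusion's actual code.

```rust
/// Assumed stand-in for the fixed 2x estimate described in point 2.
fn estimate_reserved_bytes(batch_bytes: usize) -> usize {
    batch_bytes * 2
}

/// Total merge reservation over all spill streams.
/// `copies_per_stream` is 1 if only the current (cursor, batch) is counted,
/// and 2 if the previous one is assumed to be held at the same time.
fn merge_reservation(
    max_batch_bytes_per_stream: &[usize],
    buffer_len: usize,
    copies_per_stream: usize,
) -> usize {
    max_batch_bytes_per_stream
        .iter()
        .map(|&max_batch| copies_per_stream * estimate_reserved_bytes(max_batch * buffer_len))
        .sum()
}
```

For two streams with max batch sizes of 100 and 200 bytes and `buffer_len = 1`, this gives 600 bytes with one copy per stream and 1200 bytes with two. In-memory streams are still left out, as noted in point 1.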

@ding-young ding-young force-pushed the verify-mem-multi-level branch from 5ec8edd to 68cffc6 Compare August 8, 2025 05:31
@github-actions github-actions bot removed the core Core DataFusion crate label Aug 8, 2025
Contributor

@2010YOUY01 2010YOUY01 left a comment

Thank you, this approach is a good idea.

I think this PR is ready, however I think we should work on fixing the failures first, then come back to this PR.

@2010YOUY01
Contributor

For point 1: I vaguely remember that multi-level merge has logic to re-spill in-memory batches before the final merge, so that we don't need special handling for the mixed in-memory + spill case 🤔 If I'm misremembering, or we have missed some edge cases, we should do that for now (spill all in-memory batches before the final merge) for simplicity.

For point 2: I expected this would be better done after #15380, but that optimization seems to be stuck. I'll look into this issue in the next few days.

@ding-young
Contributor Author

> For point 1: I vaguely remember that multi-level merge has logic to re-spill in-memory batches before the final merge, so that we don't need special handling for the mixed in-memory + spill case 🤔 If I'm misremembering, or we have missed some edge cases, we should do that for now (spill all in-memory batches before the final merge) for simplicity.

After taking another look, it seems that the in-mem + spill case only happens in the first-round merge. After that, everything gets spilled. So while it's true that this case may use more memory than the reservation, it doesn't seem to be the major case, and I’ll hold off on addressing it for now.

> For point 2: I expected this would be better done after #15380, but that optimization seems to be stuck. I'll look into this issue in the next few days.

I’ve opened a new PR to address it. Would appreciate it if you could take a look :)

Besides that, just as a side note: I’m currently looking into a failing test case in this PR (memory validation). It’s related to StringViewArray, and I’m digging into why get_array_memory_size and get_sliced_size are so different even after running gc() before spilling.

@ding-young
Contributor Author

Update: even with the correct memory size calculation from PR #17315 above, the StringViewArray-related test still fails, and the size of a record batch blows up (e.g. 850907 -> 932176) after an IPC write/read round trip. I'll take a quick look and open a ticket in arrow-rs as well.
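To quantify that example, the growth can be expressed in basis points (hundredths of a percent). The helper below is purely illustrative and uses integer arithmetic only.

```rust
/// Relative growth from `before` to `after` in basis points (1 bp = 0.01%),
/// using integer division. Returns 0 if the size did not grow.
fn growth_basis_points(before: u64, after: u64) -> u64 {
    after.saturating_sub(before) * 10_000 / before
}
```

For the sizes quoted above, `growth_basis_points(850_907, 932_176)` returns 955, which is about 9.5% growth across the round trip, so a margin meant only for minor vector allocations would not be expected to absorb it.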

@ding-young ding-young force-pushed the verify-mem-multi-level branch from 26ba0e5 to 221af34 Compare September 16, 2025 13:11
@ding-young ding-young marked this pull request as ready for review September 16, 2025 13:13
@ding-young ding-young force-pushed the verify-mem-multi-level branch from 221af34 to d9fa760 Compare September 16, 2025 13:15
@ding-young
Contributor Author

After rebasing, I saw no memory validation errors with cargo test. However, since there may still be memory accounting or blowup errors, this PR logs a warning instead of returning Err directly.

@ding-young ding-young force-pushed the verify-mem-multi-level branch from d9fa760 to 5a510cc Compare September 16, 2025 13:19
@github-actions github-actions bot removed the execution Related to the execution crate label Sep 16, 2025
Contributor

@2010YOUY01 2010YOUY01 left a comment

LGTM, thank you!

@2010YOUY01 2010YOUY01 added this pull request to the merge queue Sep 18, 2025
Merged via the queue into apache:main with commit 13208e6 Sep 18, 2025
28 checks passed
blaginin added a commit to blaginin/datafusion that referenced this pull request Nov 9, 2025
* Validate the memory consumption in SPM created by multi level merge (#17029)

* use GreedyMemoryPool for sanity check

* validate whether batch read from spill exceeds max_record_batch_mem

* replace err with warn log

* Use github link instead of relative link to optimizer_rule.rs in query-optimizer.md (#17723)

* Move misplaced upgrading entry about MSRV (#17727)

* Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API (#17536)

* Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API

* Add to roundtrip proto tests

* Support `WHERE`, `ORDER BY`, `LIMIT`, `SELECT`, `EXTEND` pipe operators (#17278)

* support WHERE pipe operator

* support order by

* support limit

* select pipe

* extend support

* document supported pipe operators in user guide

* fmt

* fix where pipe before extend

* don't rebind

* remove clone

* move docs into select.md

* avoid confusion by removing `>` in examples

---------

Co-authored-by: Jeffrey Vo <[email protected]>

* doc: add missing examples for multiple math functions (#17018)

* Update Scalar_functions.md

* prettier fix

* Updated files

* Updated Scalar functions

* Update datafusion/functions/src/math/log.rs

Co-authored-by: Jeffrey Vo <[email protected]>

* Update datafusion/functions/src/math/monotonicity.rs

Co-authored-by: Jeffrey Vo <[email protected]>

* Update datafusion/functions/src/math/monotonicity.rs

Co-authored-by: Jeffrey Vo <[email protected]>

* Update datafusion/functions/src/math/nans.rs

Co-authored-by: Jeffrey Vo <[email protected]>

* Update datafusion/functions/src/math/nanvl.rs

Co-authored-by: Jeffrey Vo <[email protected]>

* Fix tanh example to be tanh not trunc

* Run update_function_docs.sh

---------

Co-authored-by: Jeffrey Vo <[email protected]>

* feat: support for null, date, and timestamp types in approx_distinct (#17618)

* feat: let approx_distinct handle null, date and timestamp types

Signed-off-by: Dennis Zhuang <[email protected]>

* chore: update testing submodule

Signed-off-by: Dennis Zhuang <[email protected]>

* feat: supports time type and refactor NullHLLAccumulator

Signed-off-by: Dennis Zhuang <[email protected]>

* bump arrow-testing submodule

---------

Signed-off-by: Dennis Zhuang <[email protected]>
Co-authored-by: Jefffrey <[email protected]>

* fix(agg/corr): return NULL when variance is zero or samples < 2 (#17621)

Signed-off-by: Dennis Zhuang <[email protected]>

* chore(deps): bump taiki-e/install-action from 2.62.1 to 2.62.4 (#17739)

Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.1 to 2.62.4.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/install-action/compare/d6912b47771be2c443ec90dbb3d28e023987e782...5597bc27da443ba8bf9a3bc4e5459ea59177de42)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-version: 2.62.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump tempfile from 3.22.0 to 3.23.0 (#17741)

Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.22.0 to 3.23.0.
- [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Stebalien/tempfile/compare/v3.22.0...v3.23.0)

---
updated-dependencies:
- dependency-name: tempfile
  dependency-version: 3.23.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore: make `LimitPushPastWindows` public (#17736)

* fix: Remove parquet encryption feature from root deps (#17700)

This fix relates to issue #16650 by completing #16649 .

* fix: Remove datafusion-macros's dependency on datafusion-expr  (#17688)

* Remove datafusion-macros's dependency on datafusion-expr

* Re-export

* chore: remove homebrew publish instructions from release steps (#17735)

* minor: create `OptimizerContext` with provided `ConfigOptions` (#17742)

* Improve documentation for ordered set aggregate functions (#17744)

* docs: fix sidebar overlapping table on configuration page on website (#17738)

* solved bug

* fix:modified css for table overlapping

* Add support for calling async UDF as aggregation expression (#17620)

* Add support for calling async UDF as aggregation expression

Fixes https://github.com/apache/datafusion/issues/17619

* add explain plans

* chore(deps): bump taiki-e/install-action from 2.62.4 to 2.62.5 (#17750)

Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.4 to 2.62.5.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/install-action/compare/5597bc27da443ba8bf9a3bc4e5459ea59177de42...6f69ec9970ed0c500b1b76d648e05c4c7e0e5671)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-version: 2.62.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* (fix): Lag function creates unwanted projection   (#17630) (#17639)

* fix: Not adding generated window expr resulting column twice (#17630)

* Making clippy happier

* Support `LargeList` in `array_has` simplification to `InList` (#17732)

* Support `LargeList` in `array_has` simplification to `InList`

* refactoring

* chore(deps): bump wasm-bindgen-test from 0.3.51 to 0.3.53 (#17642)

* chore(deps): bump wasm-bindgen-test from 0.3.51 to 0.3.53

Bumps [wasm-bindgen-test](https://github.com/wasm-bindgen/wasm-bindgen) from 0.3.51 to 0.3.53.
- [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases)
- [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md)
- [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits)

---
updated-dependencies:
- dependency-name: wasm-bindgen-test
  dependency-version: 0.3.53
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* testing setting WASM_BINDGEN_TEST_TIMEOUT

* more testing

* more testing

* more testing

* more testing

* more testing

* testing

* testing

* testing

* testing

* whoops

* whoops

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* problem commit

* please let this work

* oops

* test 0.3.53

* fix

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeffrey Vo <[email protected]>

* feat: support `Utf8View` for more args of `regexp_replace` (#17195)

* Stash changes.

* Signature cleanup, more test scenarios.

* Minor test renaming.

* Simplify signature.

* Update tests.

* Signature change for binary input support.

* Return type changes for binary.

* Stash.

* Stash.

* Stash.

* Stash.

* Fix regx bench.

* Clippy.

* Fix bench regx.

* Refactor signature. I need to remove the match arms that aren't used anymore, update the .slt test for string_view.slt, and understand why String(3) and String(4) is not equivalent to this.

* Remove unnecessary match arms.

* Update string_view slt test.

* Reduce diff by returning to single function with a match arm instead of two.

* Simplify template args.

* Fix benchmark compilation.

* Address PR feedback.

* feat(spark): implement Spark `map` function `map_from_arrays` (#17456)

* feat(spark): implement Spark `map` function `map_from_arrays`

* chore: add test with nested `map_from_arrays` calls, refactor map_deduplicate_keys to remove unnecessary variables and array slices

* fix: clippy warning

* fix: null and different size input lists treatment, chore: move common map funcs to utils.rs, add more tests

* fix: typo

* fix: clippy docstring warning

* chore: move more helpers needed for multiple map functions to utils

* chore: add multi-row tests

* fix: null values treatment

* fix: docstring warnings

* chore(deps): bump object_store from 0.12.3 to 0.12.4 (#17753)

Bumps [object_store](https://github.com/apache/arrow-rs-object-store) from 0.12.3 to 0.12.4.
- [Changelog](https://github.com/apache/arrow-rs-object-store/blob/main/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs-object-store/compare/v0.12.3...v0.12.4)

---
updated-dependencies:
- dependency-name: object_store
  dependency-version: 0.12.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update `arrow` / `parquet` to  56.2.0 (#17631)

* temp update to arrow 56.2.0 pin

* Update to 56.2.0

* Use released arrow

* Update cargo.lock

* fix lock

* chore(deps): bump taiki-e/install-action from 2.62.5 to 2.62.6 (#17766)

Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.5 to 2.62.6.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/install-action/compare/6f69ec9970ed0c500b1b76d648e05c4c7e0e5671...4575ae687efd0e2c78240087f26013fb2484987f)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-version: 2.62.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Keep aggregate udaf schema names unique when missing an order-by (#17731)

* test: reproducer of bug

* fix: make schema names unique for approx_percentile_cont

* test: regression test is now resolved

* feat : Display function alias in output column name (#17690)

* display function's alias name in output  column

* Update function.rs

* updated verbose name format

* simplify alias logic and removing args clone

* Support join cardinality estimation less conservatively (#17476)

* Support join cardinality estimation if distinct_count is set

Currently we require max and min to be set, as they might be used to
estimate the distinct count. This is unnecessarily conservative if
distinct_count has actually been provided, in which case max and min
won't be used at all and the presence of max or min has no influence
over how good of an estimate it is.
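
A minimal sketch of the idea, not the code in `datafusion/physical-plan/src/joins/utils.rs`: `ColumnStats`, `distinct_estimate`, and `inner_join_cardinality` are hypothetical names, and the formula `left_rows * right_rows / max(ndv_left, ndv_right)` is the textbook equi-join estimate, assumed here for illustration. It shows that when `distinct_count` is provided, `min` and `max` are never consulted.

```rust
/// Hypothetical, simplified column statistics; `None` means "unknown".
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    distinct_count: Option<u64>,
    min: Option<i64>,
    max: Option<i64>,
}

/// Estimate the number of distinct values in a column.
/// A provided `distinct_count` wins; `min`/`max` are only a fallback.
fn distinct_estimate(stats: &ColumnStats, num_rows: u64) -> Option<u64> {
    if let Some(d) = stats.distinct_count {
        // min and max are not needed at all on this path.
        return Some(d.min(num_rows));
    }
    match (stats.min, stats.max) {
        (Some(min), Some(max)) if max >= min => {
            // Width of the value range, capped by the row count.
            let range = (max as i128 - min as i128 + 1) as u128;
            Some(range.min(num_rows as u128) as u64)
        }
        _ => None,
    }
}

/// Textbook inner equi-join estimate (assumed formula, see lead-in).
fn inner_join_cardinality(
    left_rows: u64,
    left: &ColumnStats,
    right_rows: u64,
    right: &ColumnStats,
) -> Option<u64> {
    let l = distinct_estimate(left, left_rows)?;
    let r = distinct_estimate(right, right_rows)?;
    let denom = l.max(r);
    if denom == 0 {
        return Some(0);
    }
    let product = left_rows as u128 * right_rows as u128;
    let quotient = product / denom as u128;
    // Saturate instead of truncating if the estimate exceeds u64.
    Some(quotient.min(u64::MAX as u128) as u64)
}

fn main() {
    // distinct_count is set, min/max are missing: an estimate is still possible.
    let left = ColumnStats { distinct_count: Some(100), min: None, max: None };
    let right = ColumnStats { distinct_count: Some(50), min: None, max: None };
    println!("{:?}", inner_join_cardinality(1000, &left, 500, &right));
}
```

This sketch returns `None` when neither a distinct count nor a min/max range is known, so it covers only the first commit in this list, not the later "Calculate cardinality even if distinct or min/max not provided" change.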

* Update datafusion/physical-plan/src/joins/utils.rs

Co-authored-by: Piotr Findeisen <[email protected]>

* Update tests

* Calculate cardinality even if distinct or min/max not provided

---------

Co-authored-by: Piotr Findeisen <[email protected]>

* chore(deps): bump libc from 0.2.175 to 0.2.176 (#17767)

Bumps [libc](https://github.com/rust-lang/libc) from 0.2.175 to 0.2.176.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Changelog](https://github.com/rust-lang/libc/blob/0.2.176/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.175...0.2.176)

---
updated-dependencies:
- dependency-name: libc
  dependency-version: 0.2.176
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump postgres-types from 0.2.9 to 0.2.10 (#17768)

Bumps [postgres-types](https://github.com/rust-postgres/rust-postgres) from 0.2.9 to 0.2.10.
- [Release notes](https://github.com/rust-postgres/rust-postgres/releases)
- [Commits](https://github.com/rust-postgres/rust-postgres/compare/postgres-types-v0.2.9...postgres-types-v0.2.10)

---
updated-dependencies:
- dependency-name: postgres-types
  dependency-version: 0.2.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Use `Expr::qualified_name()` and `Column::new()` to extract partition keys from window and aggregate operators (#17757)

* Use `Expr::qualified_name()` and `Column::new()` to extract partition keys

Using `Expr::schema_name()` and `Column::from_qualified_name()` could
incorrectly parse the column name.
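
The failure mode can be sketched with a deliberately naive stand-in. The `Column` type below is hypothetical, and `from_flat_name` splits on the first `.`, which is an assumption made for illustration rather than how DataFusion's `Column::from_qualified_name()` is implemented. The point is only that flattening a qualifier and a name into one string and re-parsing it loses information, while passing the two parts separately does not.

```rust
/// Hypothetical stand-in for a column reference: optional qualifier plus name.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    relation: Option<String>,
    name: String,
}

impl Column {
    /// Build a column from parts that are already separated.
    fn new(relation: Option<&str>, name: &str) -> Self {
        Column {
            relation: relation.map(str::to_string),
            name: name.to_string(),
        }
    }

    /// Flatten to a single string, losing the boundary between the parts.
    fn flat_name(&self) -> String {
        match &self.relation {
            Some(r) => format!("{r}.{}", self.name),
            None => self.name.clone(),
        }
    }

    /// Naive re-parse (illustrative assumption): split at the first '.'.
    fn from_flat_name(flat: &str) -> Self {
        match flat.split_once('.') {
            Some((rel, name)) => Column::new(Some(rel), name),
            None => Column::new(None, flat),
        }
    }
}

fn main() {
    // An unqualified column whose name itself contains a dot.
    let original = Column::new(None, "a.b");

    // Round-tripping through the flat string invents a qualifier "a".
    let reparsed = Column::from_flat_name(&original.flat_name());
    println!("reparsed:   {:?}", reparsed);

    // Keeping the parts separate preserves the column as written.
    let direct = Column::new(original.relation.as_deref(), &original.name);
    println!("from parts: {:?}", direct);
}
```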

* Use `Expr::qualified_name()` to extract group by keys

* Retrain dataframe tests with filters and aggregates

* Prevent exponential planning time for Window functions - v2 (#17684)

* fix

* Update mod.rs

* Update mod.rs

* Update mod.rs

* tests copied from v1 pr

* test case from review comment

https://github.com/apache/datafusion/pull/17684#discussion_r2366146307

* one more test case

* Update mod.rs

* Update datafusion/physical-plan/src/windows/mod.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update datafusion/physical-plan/src/windows/mod.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update mod.rs

* Update mod.rs

---------

Co-authored-by: Piotr Findeisen <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>

* docs: add Ballista link to landing page (#17746) (#17775)

* docs: add Ballista link to landing page (#17746)

This adds a link and description for DataFusion Ballista to the landing page, as suggested in issue #17746. Ballista is a distributed compute platform built on top of DataFusion.

Closes: #17746

* fix(docs): update Ballista link

* updated theory part

* chore(deps): bump taiki-e/install-action from 2.62.6 to 2.62.8 (#17781)

Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.6 to 2.62.8.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/install-action/compare/4575ae687efd0e2c78240087f26013fb2484987f...ea0eda622640ac23a17ba349cf09e2709d58f5e1)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-version: 2.62.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump wasm-bindgen-test from 0.3.53 to 0.3.54 (#17784)

Bumps [wasm-bindgen-test](https://github.com/wasm-bindgen/wasm-bindgen) from 0.3.53 to 0.3.54.
- [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases)
- [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md)
- [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits)

---
updated-dependencies:
- dependency-name: wasm-bindgen-test
  dependency-version: 0.3.54
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore: Action some old TODOs in github actions (#17694)

* chore: Action some old TODOs in github actions

* Update Cargo.toml

* testing

* Revert changing cli test runner to use container

* Remove sccache

* dev: Add benchmark for compilation profiles (#17754)

* Add benchmark for compilation profiles

* add apache header

* add apache header

* chore(deps): bump tokio-postgres from 0.7.13 to 0.7.14 (#17785)

Bumps [tokio-postgres](https://github.com/rust-postgres/rust-postgres) from 0.7.13 to 0.7.14.
- [Release notes](https://github.com/rust-postgres/rust-postgres/releases)
- [Commits](https://github.com/rust-postgres/rust-postgres/compare/tokio-postgres-v0.7.13...tokio-postgres-v0.7.14)

---
updated-dependencies:
- dependency-name: tokio-postgres
  dependency-version: 0.7.14
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump serde from 1.0.226 to 1.0.227 (#17783)

Bumps [serde](https://github.com/serde-rs/serde) from 1.0.226 to 1.0.227.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.226...v1.0.227)

---
updated-dependencies:
- dependency-name: serde
  dependency-version: 1.0.227
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump regex from 1.11.2 to 1.11.3 (#17782)

Bumps [regex](https://github.com/rust-lang/regex) from 1.11.2 to 1.11.3.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.11.2...1.11.3)

---
updated-dependencies:
- dependency-name: regex
  dependency-version: 1.11.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Support `CAST` from temporal to `Utf8View` (#17535)

* Add case expr simplifiers for literal comparisons (#17743)

* Add case expr simplifiers for literal comparisons

* Update datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Avoid expr clones

---------

Co-authored-by: Andrew Lamb <[email protected]>

* chore: dependabot to run weekly (#17797)

* [DOCS] Add dbt Fusion engine and R2 Query Engine to "Known Users" (#17793)

* Add dbt Fusion engine and R2 Query Engine

* Update docs/source/user-guide/introduction.md

* Update docs/source/user-guide/introduction.md

* feat: change `datafusion-proto` to use `TaskContext` rather than`SessionContext` for physical plan serialization (#17601)

* change session context to task context in physical proto ...

* fix compilation issue

* remove `RuntimeEnv` from few function arguments

* update upgrading guide

* display window function's alias name in output (#17788)

* docs: update wasmtest README with instructions for Apple silicon (#17755)

* chore(deps): bump sysinfo from 0.37.0 to 0.37.1 (#17800)

Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.37.0 to 0.37.1.
- [Changelog](https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/GuillaumeGomez/sysinfo/compare/v0.37.0...v0.37.1)

---
updated-dependencies:
- dependency-name: sysinfo
  dependency-version: 0.37.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump taiki-e/install-action from 2.62.8 to 2.62.9 (#17799)

Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.8 to 2.62.9.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/install-action/compare/ea0eda622640ac23a17ba349cf09e2709d58f5e1...71d339ebf191fcbc3d49cd04b9484a4261f29975)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-version: 2.62.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(spark): implement Spark `make_dt_interval` function (#17728)

* feat(spark): implement Spark make_dt_interval function

* fmt

* delete pub

* test slt

* fmt

* overflow -> null

* suggested changes

* fmt

* only res in slt

* null not void type

* explain types

* explain types fix url

* better comment

* Fix potential overflow when we print verbose physical plan (#17798)

* change debug to trace for potential overflow

* fix comments.

* fix

* Add SedonaDB as known user to Apache DataFusion (#17806)

* Extend datatype semantic equality check to include timestamps (#17777)

* Extend datatype semantic equality to include timestamps

* test

* Respond to comments

* cargo fmt

---------

Co-authored-by: Shiv Bhatia <[email protected]>

* fix: Filter out nulls properly in approx_percentile_cont_with_weight (#17780)

* chore: refactor usage of `reassign_predicate_columns` (#17703)

* chore: refactor usage of `reassign_predicate_columns`

* chore: Address PR comments

---------

Co-authored-by: Andrew Lamb <[email protected]>

* dev: Add Apache license check to the lint script (#17787)

* Add license checker CI script

* fix the deliberately added bad license header

* review: use dev profile and pin the version

* Fix: common_sub_expression_eliminate optimizer rule failed (#16066)

The `common_sub_expression_eliminate` rule failed with the error
`SchemaError(FieldNotFound {field: <name>}, valid_fields: []})`
because the schema was changed by the second application of
`find_common_exprs`.

As I understand it, the problem came from calling `find_common_exprs`
twice in sequence. The first call returned the original names as
`aggr_expr` and the changed names as `new_aggr_expr`. The second call
only takes `new_aggr_expr` into account, so if the first call had
already changed the names, it returns the changed names as `aggr_expr`
(as if they were the original ones) and puts them into the Projection
logic.

I used the NamePreserver mechanism to restore the original schema names
and generate a Projection with the original names at the end of the
aggregate optimization.

Co-authored-by: Andrew Lamb <[email protected]>

* feat: support multi-threaded writing of Parquet files with modular encryption (#16738)

* Initial commit

diff --git c/Cargo.lock i/Cargo.lock
index 749971532..f0b9d0a5f 100644
--- c/Cargo.lock
+++ i/Cargo.lock
@@ -246,52 +246,62 @@ checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"

 [[package]]
 name = "arrow"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "fd798aea3553913a5986813e9c6ad31a2d2b04e931fe8ea4a37155eb541cebb5"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
- "arrow-arith",
- "arrow-array",
- "arrow-buffer",
- "arrow-cast",
+ "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "arrow-csv",
- "arrow-data",
- "arrow-ipc",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "arrow-json",
- "arrow-ord",
+ "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "arrow-pyarrow",
- "arrow-row",
- "arrow-schema",
- "arrow-select",
- "arrow-string",
+ "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "half",
  "rand 0.9.2",
 ]

 [[package]]
 name = "arrow-arith"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "508dafb53e5804a238cab7fd97a59ddcbfab20cc4d9814b1ab5465b9fa147f2e"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
- "arrow-array",
- "arrow-buffer",
- "arrow-data",
- "arrow-schema",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "chrono",
+ "num",
+]
+
+[[package]]
+name = "arrow-arith"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+dependencies = [
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
  "chrono",
  "num",
 ]

 [[package]]
 name = "arrow-array"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "e2730bc045d62bb2e53ef8395b7d4242f5c8102f41ceac15e8395b9ac3d08461"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
  "ahash 0.8.12",
- "arrow-buffer",
- "arrow-data",
- "arrow-schema",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "chrono",
  "chrono-tz",
  "half",
@@ -299,11 +309,35 @@ dependencies = [
  "num",
 ]

+[[package]]
+name = "arrow-array"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+dependencies = [
+ "ahash 0.8.12",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "chrono",
+ "half",
+ "hashbrown 0.15.4",
+ "num",
+]
+
 [[package]]
 name = "arrow-buffer"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "54295b93beb702ee9a6f6fbced08ad7f4d76ec1c297952d4b83cf68755421d1d"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+dependencies = [
+ "bytes",
+ "half",
+ "num",
+]
+
+[[package]]
+name = "arrow-buffer"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
 dependencies = [
  "bytes",
  "half",
@@ -312,15 +346,14 @@ dependencies = [

 [[package]]
 name = "arrow-cast"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "67e8bcb7dc971d779a7280593a1bf0c2743533b8028909073e804552e85e75b5"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
- "arrow-array",
- "arrow-buffer",
- "arrow-data",
- "arrow-schema",
- "arrow-select",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "atoi",
  "base64 0.22.1",
  "chrono",
@@ -332,14 +365,32 @@ dependencies = [
 ]

 [[package]]
-name = "arrow-csv"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "673fd2b5fb57a1754fdbfac425efd7cf54c947ac9950c1cce86b14e248f1c458"
+name = "arrow-cast"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
 dependencies = [
- "arrow-array",
- "arrow-cast",
- "arrow-schema",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "atoi",
+ "base64 0.22.1",
+ "chrono",
+ "half",
+ "lexical-core",
+ "num",
+ "ryu",
+]
+
+[[package]]
+name = "arrow-csv"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+dependencies = [
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "chrono",
  "csv",
  "csv-core",
@@ -348,33 +399,42 @@ dependencies = [

 [[package]]
 name = "arrow-data"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "97c22fe3da840039c69e9f61f81e78092ea36d57037b4900151f063615a2f6b4"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
- "arrow-buffer",
- "arrow-schema",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "half",
+ "num",
+]
+
+[[package]]
+name = "arrow-data"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+dependencies = [
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
  "half",
  "num",
 ]

 [[package]]
 name = "arrow-flight"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "6808d235786b721e49e228c44dd94242f2e8b46b7e95b233b0733c46e758bfee"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
 dependencies = [
- "arrow-arith",
- "arrow-array",
- "arrow-buffer",
- "arrow-cast",
- "arrow-data",
- "arrow-ipc",
- "arrow-ord",
- "arrow-row",
- "arrow-schema",
- "arrow-select",
- "arrow-string",
+ "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
  "base64 0.22.1",
  "bytes",
  "futures",
@@ -382,35 +442,45 @@ dependencies = [
  "paste",
  "prost",
  "prost-types",
- "tonic",
+ "tonic 0.12.3",
 ]

 [[package]]
 name = "arrow-ipc"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "778de14c5a69aedb27359e3dd06dd5f9c481d5f6ee9fbae912dba332fd64636b"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
- "arrow-array",
- "arrow-buffer",
- "arrow-data",
- "arrow-schema",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "flatbuffers",
  "lz4_flex",
  "zstd",
 ]

 [[package]]
-name = "arrow-json"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3860db334fe7b19fcf81f6b56f8d9d95053f3839ffe443d56b5436f7a29a1794"
+name = "arrow-ipc"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
 dependencies = [
- "arrow-array",
- "arrow-buffer",
- "arrow-cast",
- "arrow-data",
- "arrow-schema",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "flatbuffers",
+]
+
+[[package]]
+name = "arrow-json"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+dependencies = [
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "chrono",
  "half",
  "indexmap 2.10.0",
@@ -424,78 +494,130 @@ dependencies = [

 [[package]]
 name = "arrow-ord"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "425fa0b42a39d3ff55160832e7c25553e7f012c3f187def3d70313e7a29ba5d9"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
- "arrow-array",
- "arrow-buffer",
- "arrow-data",
- "arrow-schema",
- "arrow-select",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+]
+
+[[package]]
+name = "arrow-ord"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+dependencies = [
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
 ]

 [[package]]
 name = "arrow-pyarrow"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "d944d8ae9b77230124e6570865b570416c33a5809f32c4136c679bbe774e45c9"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
- "arrow-array",
- "arrow-data",
- "arrow-schema",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "pyo3",
 ]

 [[package]]
 name = "arrow-row"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "df9c9423c9e71abd1b08a7f788fcd203ba2698ac8e72a1f236f1faa1a06a7414"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
- "arrow-array",
- "arrow-buffer",
- "arrow-data",
- "arrow-schema",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "half",
+]
+
+[[package]]
+name = "arrow-row"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+dependencies = [
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
  "half",
 ]

 [[package]]
 name = "arrow-schema"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "85fa1babc4a45fdc64a92175ef51ff00eba5ebbc0007962fecf8022ac1c6ce28"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
  "bitflags 2.9.1",
  "serde",
  "serde_json",
 ]

+[[package]]
+name = "arrow-schema"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+
 [[package]]
 name = "arrow-select"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "d8854d15f1cf5005b4b358abeb60adea17091ff5bdd094dca5d3f73787d81170"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
  "ahash 0.8.12",
- "arrow-array",
- "arrow-buffer",
- "arrow-data",
- "arrow-schema",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "num",
+]
+
+[[package]]
+name = "arrow-select"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+dependencies = [
+ "ahash 0.8.12",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
  "num",
 ]

 [[package]]
 name = "arrow-string"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2c477e8b89e1213d5927a2a84a72c384a9bf4dd0dbf15f9fd66d821aafd9e95e"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
 dependencies = [
- "arrow-array",
- "arrow-buffer",
- "arrow-data",
- "arrow-schema",
- "arrow-select",
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "memchr",
+ "num",
+ "regex",
+ "regex-syntax",
+]
+
+[[package]]
+name = "arrow-string"
+version = "55.2.0"
+source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+dependencies = [
+ "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
  "memchr",
  "num",
  "regex",
@@ -567,6 +689,28 @@ dependencies = [
  "syn 2.0.106",
 ]

+[[package]]
+name = "async-stream"
+version = "0.3.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0b5a71a6f37880a80d1d7f19efd781e4b5de42c88f0722cc13bcb6cc2cfe8476"
+dependencies = [
+ "async-stream-impl",
+ "futures-core",
+ "pin-project-lite",
+]
+
+[[package]]
+name = "async-stream-impl"
+version = "0.3.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c7c24de15d275a1ecfd47a380fb4d5ec9bfe0933f309ed5e705b775596a3574d"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn 2.0.104",
+]
+
 [[package]]
 name = "async-trait"
 version = "0.1.89"
@@ -827,7 +971,7 @@ dependencies = [
  "rustls-native-certs",
  "rustls-pki-types",
  "tokio",
- "tower",
+ "tower 0.5.2",
  "tracing",
 ]

@@ -948,18 +1092,19 @@ dependencies = [

 [[package]]
 name = "axum"
-version = "0.8.4"
+version = "0.7.9"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "021e862c184ae977658b36c4500f7feac3221ca5da43e3f25bd04ab6c79a29b5"
+checksum = "edca88bc138befd0323b20752846e6587272d3b03b0343c8ea28a6f819e6e71f"
 dependencies = [
- "axum-core",
+ "async-trait",
+ "axum-core 0.4.5",
  "bytes",
  "futures-util",
  "http 1.3.1",
  "http-body 1.0.1",
  "http-body-util",
  "itoa",
- "matchit",
+ "matchit 0.7.3",
  "memchr",
  "mime",
  "percent-encoding",
@@ -967,7 +1112,53 @@ dependencies = [
  "rustversion",
  "serde",
  "sync_wrapper",
- "tower",
+ "tower 0.5.2",
+ "tower-layer",
+ "tower-service",
+]
+
+[[package]]
+name = "axum"
+version = "0.8.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "021e862c184ae977658b36c4500f7feac3221ca5da43e3f25bd04ab6c79a29b5"
+dependencies = [
+ "axum-core 0.5.2",
+ "bytes",
+ "futures-util",
+ "http 1.3.1",
+ "http-body 1.0.1",
+ "http-body-util",
+ "itoa",
+ "matchit 0.8.4",
+ "memchr",
+ "mime",
+ "percent-encoding",
+ "pin-project-lite",
+ "rustversion",
+ "serde",
+ "sync_wrapper",
+ "tower 0.5.2",
+ "tower-layer",
+ "tower-service",
+]
+
+[[package]]
+name = "axum-core"
+version = "0.4.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "09f2bd6146b97ae3359fa0cc6d6b376d9539582c7b4220f041a33ec24c226199"
+dependencies = [
+ "async-trait",
+ "bytes",
+ "futures-util",
+ "http 1.3.1",
+ "http-body 1.0.1",
+ "http-body-util",
+ "mime",
+ "pin-project-lite",
+ "rustversion",
+ "sync_wrapper",
  "tower-layer",
  "tower-service",
 ]
@@ -1818,8 +2009,8 @@ name = "datafusion"
 version = "49.0.1"
 dependencies = [
  "arrow",
- "arrow-ipc",
- "arrow-schema",
+ "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "async-trait",
  "bytes",
  "bzip2 0.6.0",
@@ -1996,7 +2187,7 @@ dependencies = [
  "ahash 0.8.12",
  "apache-avro",
  "arrow",
- "arrow-ipc",
+ "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "base64 0.22.1",
  "chrono",
  "half",
@@ -2176,7 +2367,7 @@ version = "49.0.1"
 dependencies = [
  "arrow",
  "arrow-flight",
- "arrow-schema",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "async-trait",
  "base64 0.22.1",
  "bytes",
@@ -2197,7 +2388,7 @@ dependencies = [
  "tempfile",
  "test-utils",
  "tokio",
- "tonic",
+ "tonic 0.13.1",
  "tracing",
  "tracing-subscriber",
  "url",
@@ -2264,7 +2455,7 @@ version = "49.0.1"
 dependencies = [
  "abi_stable",
  "arrow",
- "arrow-schema",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "async-ffi",
  "async-trait",
  "datafusion",
@@ -2284,7 +2475,7 @@ name = "datafusion-functions"
 version = "49.0.1"
 dependencies = [
  "arrow",
- "arrow-buffer",
+ "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "base64 0.22.1",
  "blake2",
  "blake3",
@@ -2347,7 +2538,7 @@ name = "datafusion-functions-nested"
 version = "49.0.1"
 dependencies = [
  "arrow",
- "arrow-ord",
+ "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "criterion",
  "datafusion-common",
  "datafusion-doc",
@@ -2517,8 +2708,8 @@ version = "49.0.1"
 dependencies = [
  "ahash 0.8.12",
  "arrow",
- "arrow-ord",
- "arrow-schema",
+ "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "async-trait",
  "chrono",
  "criterion",
@@ -2589,7 +2780,7 @@ name = "datafusion-pruning"
 version = "49.0.1"
 dependencies = [
  "arrow",
- "arrow-schema",
+ "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "datafusion-common",
  "datafusion-datasource",
  "datafusion-expr",
@@ -4157,6 +4348,12 @@ dependencies = [
  "pkg-config",
 ]

+[[package]]
+name = "matchit"
+version = "0.7.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0e7465ac9959cc2b1404e8e2367b43684a6d13790fe23056cc8c6c5a6b7bcb94"
+
 [[package]]
 name = "matchit"
 version = "0.8.4"
@@ -4529,18 +4726,17 @@ dependencies = [

 [[package]]
 name = "parquet"
-version = "56.0.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "c7288a07e…
Labels

physical-plan Changes to the physical-plan crate

Development

Successfully merging this pull request may close these issues.

Validate the memory consumption in SortPreservingMergeStream

3 participants