@LiaCastaneda

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

adriangb and others added 4 commits December 19, 2025 17:46
…nfrastructure (apache#18449)

This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.

A "target state" is tracked in
apache#18393.
There is a series of PRs to get us to this target state in smaller, more
reviewable changes that are still valuable on their own:
- apache#18448
- (This PR): apache#18449 (depends on
apache#18448)
- apache#18451

- Enhance InListExpr to efficiently store homogeneous lists as arrays
and avoid a conversion to Vec<PhysicalExpr>
  by adding an internal InListStorage enum with Array and Exprs variants
- Re-use existing hashing and comparison utilities to support Struct
arrays and other complex types
- Add public function `in_list_from_array(expr, list_array, negated)`
for creating InList from arrays

Although the diff looks large, most of it is actually tests and docs. I
think the actual code change is a negative LOC change, or at least
negative complexity (it eliminates a trait, a macro, and matching on data
types).

---------

Co-authored-by: David Hewitt <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
(cherry picked from commit 486c5d8)
## Which issue does this PR close?


- Closes apache#18330 .

## Rationale for this change


Reduce code duplication.

## What changes are included in this PR?


A utility function that replaces many call sites which duplicated the same code.

## Are these changes tested?


No logic should change whatsoever, so each area that now uses this code
is still covered by its own tests and benchmarks, which are unmodified.

## Are there any user-facing changes?


Yes, there is now a new pub function.
No other changes to API.

---------

Co-authored-by: Martin Grigorov <[email protected]>
(cherry picked from commit 76b4156)
…for more precise filters (apache#18451)

## Background

This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.

A "target state" is tracked in
apache#18393.
There is a series of PRs to get us to this target state in smaller, more
reviewable changes that are still valuable on their own:
- apache#18448
- apache#18449 (depends on
apache#18448)
- (This PR): apache#18451

## Changes in this PR

This PR refactors state management in HashJoinExec to make filter
pushdown more efficient and prepare for pushing down membership tests.

- Refactor internal data structures to clean up state management and
make usage more idiomatic (use `Option` instead of comparing integers,
etc.)
- Use CASE expressions to evaluate pushed-down filters selectively by
partition. Example: `CASE hash_repartition % N WHEN partition_id THEN
condition ELSE false END`
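
To make the CASE routing concrete, here is a small standalone sketch of the idea, assuming one optional condition per partition. The `Condition` alias, `eval_routed`, and the use of closures in place of pushed-down filter expressions are hypothetical names for this example only and do not reflect the actual HashJoinExec code.

```rust
/// Hypothetical stand-in for a pushed-down filter condition.
type Condition = Box<dyn Fn(i64) -> bool>;

/// Mirrors `CASE hash % N WHEN partition_id THEN condition ELSE false END`:
/// the row's hash picks one partition, and only that partition's condition
/// is evaluated. A partition with no condition falls through to `false`.
fn eval_routed(key_hash: u64, value: i64, conditions: &[Option<Condition>]) -> bool {
    let n = conditions.len() as u64;
    if n == 0 {
        // No partitions at all: nothing can match.
        return false;
    }
    let partition_id = (key_hash % n) as usize;
    match &conditions[partition_id] {
        Some(condition) => condition(value), // WHEN partition_id THEN condition
        None => false,                       // ELSE false
    }
}
```

The point of the routing in this sketch is that each row is checked against a single partition's condition rather than against all of them.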

---------

Co-authored-by: Lía Adriana <[email protected]>
(cherry picked from commit 5b0aa37)
… on the size of the build side (apache#18393)

This PR is part of an EPIC to push down hash table references from
HashJoinExec into scans. The EPIC is tracked in
apache#17171.

A "target state" is tracked in
apache#18393 (*this PR*).
There is a series of PRs to get us to this target state in smaller, more
reviewable changes that are still valuable on their own:
- apache#18448
- apache#18449 (depends on
apache#18448)
- apache#18451

As those are merged I will rebase this PR to keep track of the
"remaining work", and we can use this PR to explore big picture ideas or
benchmarks of the final state.

(cherry picked from commit c0e8bb5)
@LiaCastaneda LiaCastaneda changed the title Lia/in list cherry picks on top of v51 Bring IN LIST Dynamic Filtering work Dec 22, 2025
@LiaCastaneda LiaCastaneda force-pushed the lia/IN-LIST-cherry-picks-on-top-of-v51 branch from 3824132 to 3a23f98 Compare December 22, 2025 09:25
@LiaCastaneda LiaCastaneda changed the base branch from branch-51-test to branch-51 December 23, 2025 08:49
@LiaCastaneda LiaCastaneda changed the base branch from branch-51 to branch-51-test December 23, 2025 08:50
@LiaCastaneda LiaCastaneda changed the base branch from branch-51-test to branch-51 December 24, 2025 11:41
@LiaCastaneda LiaCastaneda marked this pull request as ready for review January 7, 2026 17:34
…ache#19300)

*errors* when serializing now, and would break any users using joins +
protobuf.
@github-actions github-actions bot added the proto label Jan 12, 2026