Skip to content

fix(rust/sedona-spatial-join): prevent filter pushdown past KNN joins#611

Merged
Kontinuation merged 5 commits into
apache:mainfrom
Kontinuation:dont-pushdown-filters-for-knnjoin
Feb 18, 2026
Merged

fix(rust/sedona-spatial-join): prevent filter pushdown past KNN joins#611
Kontinuation merged 5 commits into
apache:mainfrom
Kontinuation:dont-pushdown-filters-for-knnjoin

Conversation

@Kontinuation
Copy link
Copy Markdown
Member

@Kontinuation Kontinuation commented Feb 13, 2026

Summary

KNN joins have different semantics than regular spatial joins — pushing filters to the object (build) side changes which objects are the k nearest neighbors, producing incorrect results. DataFusion's builtin PushDownFilter optimizer rule doesn't know this and incorrectly pushes filters through KNN joins.

This PR adds a KnnJoinEarlyRewrite optimizer rule that converts KNN joins to SpatialJoinPlanNode extension nodes before DataFusion's PushDownFilter rule runs. Extension nodes naturally block filter pushdown via prevent_predicate_push_down_columns() returning all columns.

Changes

  • New KnnJoinEarlyRewrite optimizer rule — handles two patterns:
    1. Join(filter=ST_KNN(...)) — when the ON clause has only the spatial predicate
    2. Filter(ST_KNN(...), Join(on=[...])) — when the ON clause also has equi-join conditions (DataFusion's SQL planner separates these)
  • Positional rule insertionMergeSpatialProjectionIntoJoin and KnnJoinEarlyRewrite are inserted before PushDownFilter, while SpatialJoinLogicalRewrite (for non-KNN joins) remains after so non-KNN joins still benefit from filter pushdown
  • Updated SpatialJoinLogicalRewrite — skips KNN joins (already handled by the early rewrite)
  • Integration tests verifying that object-side filters are NOT pushed down for KNN joins, but ARE pushed down for non-KNN spatial joins

Rule ordering

... → MergeSpatialProjectionIntoJoin → KnnJoinEarlyRewrite → PushDownFilter → ... → SpatialJoinLogicalRewrite

Follow-ups

  1. We don't enforce ST_KNN to appear first in the chain of AND expressions. For instance, ST_KNN(L.geom, R.geom, 5) AND L.id = R.id has the same semantics as L.id = R.id AND ST_KNN(L.geom, R.geom, 5). This seems to be unnatural. Optimization rule does not seem to be a good place to enforce this, so we leave it to future patches that work on SQL parser and ASTs.
  2. We don't allow any filter pushdown for ST_KNN for now. Actually filters applied to the query side could be pushed down without any problem, we need to implement such rules ourselves in future patches.

TODO

prevent_predicate_push_down_columns method seems to do the trick. I'll experiment with it. Hopefully we can implement query side filter pushdown easily.
UPDATE: No. It is a terrible idea. There's no shortcut. We have to implement the optimization rule ourselves.

Closes #605

KNN joins have different semantics than regular spatial joins — pushing
filters to the object (build) side changes which objects are the k
nearest neighbors, producing incorrect results.

Add KnnJoinEarlyRewrite optimizer rule that converts KNN joins to
SpatialJoinPlanNode extension nodes before DataFusion's PushDownFilter
rule runs, since extension nodes naturally block filter pushdown via
prevent_predicate_push_down_columns().

Rule ordering: MergeSpatialProjectionIntoJoin → KnnJoinEarlyRewrite →
PushDownFilter → ... → SpatialJoinLogicalRewrite (for non-KNN joins).

Closes apache#605
@Kontinuation Kontinuation force-pushed the dont-pushdown-filters-for-knnjoin branch from cde2b7f to d6bc0d7 Compare February 13, 2026 13:54
Comment on lines +109 to +116
fn necessary_children_exprs(&self, _output_columns: &[usize]) -> Option<Vec<Vec<usize>>> {
// Request all columns from both children. This ensures the optimizer
// recurses into children while preserving all columns needed by the
// join filter and output schema.
let left_indices: Vec<usize> = (0..self.left.schema().fields().len()).collect();
let right_indices: Vec<usize> = (0..self.right.schema().fields().len()).collect();
Some(vec![left_indices, right_indices])
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for working around a bug in DataFusion. I'll submit a patch later.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an early logical optimizer rewrite to ensure KNN joins are converted into SpatialJoinPlanNode extension nodes before DataFusion’s PushDownFilter runs, preventing incorrect filter pushdown onto the KNN build side.

Changes:

  • Insert MergeSpatialFilterIntoJoin + new KnnJoinEarlyRewrite before PushDownFilter, and run SpatialJoinLogicalRewrite after it for non-KNN joins.
  • Remove is_spatial_predicate and update predicate-name collection tests accordingly.
  • Add integration tests asserting object-side filter pushdown is blocked for KNN joins but allowed for non-KNN spatial joins; add necessary_children_exprs to the extension node.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
rust/sedona-spatial-join/src/planner/optimizer.rs Adds KnnJoinEarlyRewrite, reorders optimizer rules around PushDownFilter, refactors join→extension rewrite helper.
rust/sedona-spatial-join/src/planner/logical_plan_node.rs Adds necessary_children_exprs implementation for SpatialJoinPlanNode.
rust/sedona-spatial-join/src/planner/spatial_expr_utils.rs Removes is_spatial_predicate and updates tests to validate collect_spatial_predicate_names.
rust/sedona-spatial-join/tests/spatial_join_integration.rs Expands KNN filter correctness coverage and adds physical-plan assertions for filter pushdown behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rust/sedona-spatial-join/src/planner/optimizer.rs Outdated
Comment thread rust/sedona-spatial-join/tests/spatial_join_integration.rs
Comment thread rust/sedona-spatial-join/tests/spatial_join_integration.rs
Comment thread rust/sedona-spatial-join/tests/spatial_join_integration.rs Outdated
Comment thread rust/sedona-spatial-join/src/planner/optimizer.rs
@Kontinuation Kontinuation force-pushed the dont-pushdown-filters-for-knnjoin branch from e8c0c85 to b4381c3 Compare February 17, 2026 07:43
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@Kontinuation Kontinuation merged commit 55b822f into apache:main Feb 18, 2026
17 checks passed
@paleolimbot paleolimbot added this to the 0.3.0 milestone Feb 19, 2026
github-merge-queue Bot pushed a commit to apache/datafusion that referenced this pull request Mar 6, 2026
…nTableProvider::scan (#20393)

## Which issue does this PR close?

N/A (newly discovered bug)

This is originally found in apache/sedona-db when working on a custom
plan node:
apache/sedona-db#611 (comment)

## Rationale for this change

`ForeignTableProvider::scan()` converts a `None` projection (meaning
"return all columns") into an empty `RVec<usize>` before passing it
across the FFI boundary. On the receiving side, `scan_fn_wrapper` always
wraps the received `RVec` in `Some(...)`, passing `Some(&vec![])` to the
inner `TableProvider::scan()`. This means "project zero columns" — the
exact opposite of the intended "project all columns."

The root cause is that the `FFI_TableProvider::scan` function signature
uses `RVec<usize>` for the projections parameter. Since `RVec<usize>`
cannot represent `None`, the `None` vs. empty-vec distinction is lost at
the FFI boundary.

## What changes are included in this PR?

Three coordinated changes in `datafusion/ffi/src/table_provider.rs`:

1. **FFI struct definition**: Changed `scan` function pointer signature
from `RVec<usize>` to `ROption<RVec<usize>>` for the projections
parameter, matching how `limit` already uses `ROption<usize>` for the
same `None`-vs-value distinction.

2. **Receiver side** (`scan_fn_wrapper`): Converts
`ROption<RVec<usize>>` via `.into_option().map(...)` and passes
`projections.as_ref()` to the inner provider, preserving `None`
semantics.

3. **Sender side** (`ForeignTableProvider::scan`): Converts
`Option<&Vec<usize>>` to `ROption<RVec<usize>>` via `.into()` instead of
using `unwrap_or_default()`.

Plus a new unit test
`test_scan_with_none_projection_returns_all_columns` that directly
exercises the FFI round-trip with `projection=None` and verifies all 3
columns are returned.

Also fixed the existing `test_aggregation` test to set
`library_marker_id = mock_foreign_marker_id` so it actually exercises
the FFI path instead of taking the local bypass.

## How are these changes tested?

- New test `test_scan_with_none_projection_returns_all_columns`: creates
a 3-column MemTable, wraps it through FFI with the foreign marker set,
calls `scan(None)`, and asserts 3 columns are returned (previously
returned 0).

## Are these changes safe?

This is a **breaking FFI ABI change** to the `FFI_TableProvider::scan`
function pointer signature. However:

- The `abi_stable` crate's `#[derive(StableAbi)]` generates layout
checks at dylib load time, so mismatched dylibs will be caught at load
rather than causing silent corruption.
- All existing providers construct `FFI_TableProvider` via `::new()` or
`::new_with_ffi_codec()`, which internally wire up `scan_fn_wrapper` —
nobody constructs the `scan` function pointer manually.
alamb pushed a commit to alamb/datafusion that referenced this pull request Mar 12, 2026
…nTableProvider::scan (apache#20393)

## Which issue does this PR close?

N/A (newly discovered bug)

This is originally found in apache/sedona-db when working on a custom
plan node:
apache/sedona-db#611 (comment)

## Rationale for this change

`ForeignTableProvider::scan()` converts a `None` projection (meaning
"return all columns") into an empty `RVec<usize>` before passing it
across the FFI boundary. On the receiving side, `scan_fn_wrapper` always
wraps the received `RVec` in `Some(...)`, passing `Some(&vec![])` to the
inner `TableProvider::scan()`. This means "project zero columns" — the
exact opposite of the intended "project all columns."

The root cause is that the `FFI_TableProvider::scan` function signature
uses `RVec<usize>` for the projections parameter. Since `RVec<usize>`
cannot represent `None`, the `None` vs. empty-vec distinction is lost at
the FFI boundary.

## What changes are included in this PR?

Three coordinated changes in `datafusion/ffi/src/table_provider.rs`:

1. **FFI struct definition**: Changed `scan` function pointer signature
from `RVec<usize>` to `ROption<RVec<usize>>` for the projections
parameter, matching how `limit` already uses `ROption<usize>` for the
same `None`-vs-value distinction.

2. **Receiver side** (`scan_fn_wrapper`): Converts
`ROption<RVec<usize>>` via `.into_option().map(...)` and passes
`projections.as_ref()` to the inner provider, preserving `None`
semantics.

3. **Sender side** (`ForeignTableProvider::scan`): Converts
`Option<&Vec<usize>>` to `ROption<RVec<usize>>` via `.into()` instead of
using `unwrap_or_default()`.

Plus a new unit test
`test_scan_with_none_projection_returns_all_columns` that directly
exercises the FFI round-trip with `projection=None` and verifies all 3
columns are returned.

Also fixed the existing `test_aggregation` test to set
`library_marker_id = mock_foreign_marker_id` so it actually exercises
the FFI path instead of taking the local bypass.

## How are these changes tested?

- New test `test_scan_with_none_projection_returns_all_columns`: creates
a 3-column MemTable, wraps it through FFI with the foreign marker set,
calls `scan(None)`, and asserts 3 columns are returned (previously
returned 0).

## Are these changes safe?

This is a **breaking FFI ABI change** to the `FFI_TableProvider::scan`
function pointer signature. However:

- The `abi_stable` crate's `#[derive(StableAbi)]` generates layout
checks at dylib load time, so mismatched dylibs will be caught at load
rather than causing silent corruption.
- All existing providers construct `FFI_TableProvider` via `::new()` or
`::new_with_ffi_codec()`, which internally wire up `scan_fn_wrapper` —
nobody constructs the `scan` function pointer manually.
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
…nTableProvider::scan (apache#20393)

## Which issue does this PR close?

N/A (newly discovered bug)

This is originally found in apache/sedona-db when working on a custom
plan node:
apache/sedona-db#611 (comment)

## Rationale for this change

`ForeignTableProvider::scan()` converts a `None` projection (meaning
"return all columns") into an empty `RVec<usize>` before passing it
across the FFI boundary. On the receiving side, `scan_fn_wrapper` always
wraps the received `RVec` in `Some(...)`, passing `Some(&vec![])` to the
inner `TableProvider::scan()`. This means "project zero columns" — the
exact opposite of the intended "project all columns."

The root cause is that the `FFI_TableProvider::scan` function signature
uses `RVec<usize>` for the projections parameter. Since `RVec<usize>`
cannot represent `None`, the `None` vs. empty-vec distinction is lost at
the FFI boundary.

## What changes are included in this PR?

Three coordinated changes in `datafusion/ffi/src/table_provider.rs`:

1. **FFI struct definition**: Changed `scan` function pointer signature
from `RVec<usize>` to `ROption<RVec<usize>>` for the projections
parameter, matching how `limit` already uses `ROption<usize>` for the
same `None`-vs-value distinction.

2. **Receiver side** (`scan_fn_wrapper`): Converts
`ROption<RVec<usize>>` via `.into_option().map(...)` and passes
`projections.as_ref()` to the inner provider, preserving `None`
semantics.

3. **Sender side** (`ForeignTableProvider::scan`): Converts
`Option<&Vec<usize>>` to `ROption<RVec<usize>>` via `.into()` instead of
using `unwrap_or_default()`.

Plus a new unit test
`test_scan_with_none_projection_returns_all_columns` that directly
exercises the FFI round-trip with `projection=None` and verifies all 3
columns are returned.

Also fixed the existing `test_aggregation` test to set
`library_marker_id = mock_foreign_marker_id` so it actually exercises
the FFI path instead of taking the local bypass.

## How are these changes tested?

- New test `test_scan_with_none_projection_returns_all_columns`: creates
a 3-column MemTable, wraps it through FFI with the foreign marker set,
calls `scan(None)`, and asserts 3 columns are returned (previously
returned 0).

## Are these changes safe?

This is a **breaking FFI ABI change** to the `FFI_TableProvider::scan`
function pointer signature. However:

- The `abi_stable` crate's `#[derive(StableAbi)]` generates layout
checks at dylib load time, so mismatched dylibs will be caught at load
rather than causing silent corruption.
- All existing providers construct `FFI_TableProvider` via `::new()` or
`::new_with_ffi_codec()`, which internally wire up `scan_fn_wrapper` —
nobody constructs the `scan` function pointer manually.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve clarify of filter after KNN join

3 participants