Skip to content

Conversation

@yhuang-db
Copy link
Contributor

@yhuang-db yhuang-db commented Oct 20, 2025

What changes were proposed in this pull request?

As a follow-up of #52655, add NULL handling in approx_top_k_accumulate/estimate/combine.

Why are the changes needed?

Does this PR introduce any user-facing change?

No

How was this patch tested?

New unit tests on null handling for accumulate, combine and estimate.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Oct 20, 2025
}
}

def updateSketchBuffer(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused functions as update and evaluation are handled by ApproxTopKAggregateBuffer, not on ItemsSketch anymore.

}

test("SPARK-53960: combine and estimate count NULL values") {
sql(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's add withView before each test case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added to all tests with views.

checkAnswer(est, Row(Seq(Row(null, 4), Row("b", 3))))
}

test("SPARK-53960: combine with a sketch of all nulls") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's test combining two sketches of all nulls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except minor comments

@gengliangwang
Copy link
Member

Thanks, merging to master

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…e NULLs

### What changes were proposed in this pull request?

As a follow-up of apache#52655, add NULL handling in approx_top_k_accumulate/estimate/combine.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New unit tests on null handling for accumulate, combine and estimate.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#52673 from yhuang-db/accumulate_estimate_count_null.

Authored-by: yhuang-db <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants