feat: Proof-of-concept of cost-based optimization [WIP / Experimental] #2906

andygrove · 2025-12-14T18:36:46Z

Which issue does this PR close?

Rationale for this change

I would like to start experimenting with CBO and allow users to also experiment by providing their own cost models, tailored to their specific workloads.
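To make the idea of user-supplied cost models concrete, here is a minimal sketch of one possible extension point. Everything in it is an assumption, not the PR's code. `PlanNode` is a simplified stand-in for Spark's physical plan. `CostModel` is a hypothetical trait that returns an estimated Comet speedup. Models are looked up in a small registry by a hypothetical config value.

```scala
// Hypothetical, simplified stand-in for a physical plan node (not Spark's SparkPlan).
case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

// Hypothetical extension point: returns the estimated speedup of running the
// plan with Comet (> 1.0 means Comet is expected to be faster than Spark).
trait CostModel {
  def estimateAcceleration(plan: PlanNode): Double
}

// Default model: assumes no speedup, i.e. neutral.
class NeutralCostModel extends CostModel {
  override def estimateAcceleration(plan: PlanNode): Double = 1.0
}

// Example of a user-supplied model tailored to a scan-heavy workload.
class ScanFriendlyCostModel extends CostModel {
  override def estimateAcceleration(plan: PlanNode): Double =
    if (plan.name == "Scan") 2.0 else 1.0
}

object CostModels {
  // Registry keyed by a hypothetical config value such as "neutral".
  private val registry: Map[String, () => CostModel] = Map(
    "neutral" -> (() => new NeutralCostModel),
    "scan-friendly" -> (() => new ScanFriendlyCostModel)
  )

  // Fall back to the neutral model when the configured name is unknown.
  def forName(name: String): CostModel = {
    val factory = registry.getOrElse(name, () => new NeutralCostModel)
    factory()
  }
}
```

A registry keeps the sketch self-contained. For models shipped outside the project, loading a class by its fully qualified name via reflection (`Class.forName`) would be the more flexible alternative.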

What changes are included in this PR?

Add a starting point for development and experimentation.

This is intentionally simple/crude: it just estimates Comet acceleration per operator (we should use the microbenchmarks to determine sensible estimates) and then averages that over the plan. Statistics such as row count are not taken into account yet, but we can incorporate them later.
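The approach described above can be sketched as follows. The sketch assumes a simplified `PlanNode` tree and a hypothetical table of per-operator speedup estimates. The numbers are placeholders, not microbenchmark results. The names are illustrative and are not taken from `CometCostModel.scala`.

```scala
// Hypothetical, simplified plan node; `name` identifies the operator type.
case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

object AveragingCostModel {
  // Placeholder per-operator speedup estimates (Comet vs. Spark). Real values
  // would come from microbenchmarks; these numbers are purely illustrative.
  val operatorAcceleration: Map[String, Double] = Map(
    "Scan" -> 2.0,
    "Filter" -> 1.5,
    "Project" -> 1.2,
    "HashAggregate" -> 3.0
  )

  // Operators without an estimate are assumed to run at Spark speed (1.0).
  def accelerationFor(node: PlanNode): Double =
    operatorAcceleration.getOrElse(node.name, 1.0)

  // All nodes in the plan, parent before children.
  def allNodes(node: PlanNode): Seq[PlanNode] =
    node +: node.children.flatMap(allNodes)

  // Unweighted mean of the per-operator estimates; row counts are ignored.
  def estimateAcceleration(plan: PlanNode): Double = {
    val estimates = allNodes(plan).map(accelerationFor)
    estimates.sum / estimates.size
  }
}
```

Because every operator counts equally, a cheap `Project` moves the average as much as a large scan. Incorporating row-count statistics would address that gap.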

How are these changes tested?

Not at all yet. I will leave this as a draft until it is further along and has some tests, but I would like to get feedback early since this is a proof-of-concept.

andygrove · 2025-12-14T18:43:04Z

@mbutrovich I expect that you may have some opinions on this one!

codecov-commenter · 2025-12-14T19:16:30Z

Codecov Report

❌ Patch coverage is 13.33333% with 91 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.08%. Comparing base (f09f8af) to head (54e1a33).
⚠️ Report is 768 commits behind head on main.

Files with missing lines	Patch %	Lines
...n/scala/org/apache/comet/cost/CometCostModel.scala	0.00%	78 Missing ⚠️
...n/scala/org/apache/comet/rules/CometExecRule.scala	25.00%	8 Missing and 4 partials ⚠️
.../main/scala/org/apache/comet/DataTypeSupport.scala	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2906      +/-   ##
============================================
+ Coverage     56.12%   59.08%   +2.95%     
- Complexity      976     1381     +405     
============================================
  Files           119      168      +49     
  Lines         11743    15448    +3705     
  Branches       2251     2578     +327     
============================================
+ Hits           6591     9127    +2536     
- Misses         4012     5032    +1020     
- Partials       1140     1289     +149

coderfender · 2025-12-14T19:34:15Z

spark/src/main/scala/org/apache/comet/cost/CometCostModel.scala

+      operatorCount += 1
+
+      // Recursively process children
+      node.children.foreach(collectOperatorCosts)


Perhaps we could remove the usage of vars and let the function return the totalAcceleration and operator count itself?

Something like :

def countItems(list: List[Any], accumulator: Int = 0): Int = {
  list match {
    case head :: tail => countItems(tail, accumulator + 1)
    case Nil => accumulator
  }
}

val myList = List(1, 2, 3, 4)
val count = countItems(myList) // result: 4
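Applied to the plan walk in the diff above, the reviewer's suggestion might look like the following sketch. Instead of mutating `totalAcceleration` and `operatorCount` vars, the recursive function returns both values as a tuple and folds over the children. `PlanNode` and the `acceleration` function parameter are hypothetical stand-ins, not the PR's types.

```scala
// Hypothetical, simplified plan node standing in for Spark's SparkPlan.
case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

object PlanCost {
  // Returns (sum of per-operator acceleration, number of operators) for the
  // subtree rooted at `node`, without any mutable state.
  def collectOperatorCosts(node: PlanNode, acceleration: PlanNode => Double): (Double, Int) =
    node.children.foldLeft((acceleration(node), 1)) { case ((total, count), child) =>
      val (childTotal, childCount) = collectOperatorCosts(child, acceleration)
      (total + childTotal, count + childCount)
    }

  // Average acceleration over all operators in the plan.
  def averageAcceleration(plan: PlanNode, acceleration: PlanNode => Double): Double = {
    val (total, count) = collectOperatorCosts(plan, acceleration)
    total / count
  }
}
```

Unlike `countItems`, this recursion is not tail-recursive, because a plan node can have several children. Plan trees are usually shallow enough that this is unlikely to matter.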

andygrove · 2025-12-15

I don't know how much of this code will still exist by the time the proof-of-concept is working and ready for detailed code review, so I'll hold off from making these changes now.

I am really looking for high-level feedback on the general approach at the moment.

andygrove added 4 commits December 14, 2025 11:09

fe8cfdd  Add trait
f05a629  format
6662c96  configs
7f5996d  integrate with Spark AQE

andygrove added 2 commits December 14, 2025 11:52

fde7f35  walk plan
3f964c6  Save

coderfender reviewed Dec 14, 2025

andygrove added 9 commits December 14, 2025 15:23

95551f1  more
e3efad5  add TODO
6417e54  fix
7227f25  format
927edc5  test
bc2ced7  test
8adc080  test
162ee47  test
63eeebc  test

andygrove changed the title ~~feat: Proof-of-concept of AQE cost-based optimization~~ feat: Proof-of-concept of AQE cost-based optimization [WIP / Experimental] Dec 15, 2025

andygrove added 2 commits December 14, 2025 18:06

14ff5a5  test
304e9f7  remove AQE integration

andygrove changed the title ~~feat: Proof-of-concept of AQE cost-based optimization [WIP / Experimental]~~ feat: Proof-of-concept of cost-based optimization [WIP / Experimental] Dec 15, 2025

andygrove added 3 commits December 15, 2025 15:05

a7acf9a  use configs
f60d72a  remove debug logging
54e1a33  remove nonsense test