@andygrove
Member

@andygrove andygrove commented Dec 14, 2025

Which issue does this PR close?

Closes #2785

Rationale for this change

I would like to start experimenting with cost-based optimization (CBO) and also let users experiment by providing their own cost models, tailored to their specific workloads.

What changes are included in this PR?

Add a starting point for development and experimentation.

This is intentionally simple and crude: it estimates Comet acceleration per operator (we should use the microbenchmarks to determine sensible estimates) and then averages those estimates over the plan. Statistics such as row count are not taken into account yet, but we can incorporate them later.
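To make the idea concrete, here is a minimal sketch of a per-operator estimate averaged over a plan, behind a pluggable interface. It is not the code in this PR: `PlanNode`, `CostModel`, `DefaultCostModel`, and the acceleration numbers are all hypothetical stand-ins, and a real implementation would work against Spark's physical plan nodes and use estimates derived from the microbenchmarks.

```scala
// Hypothetical stand-in for a physical plan node; not Comet's or Spark's actual types.
final case class PlanNode(name: String, children: Seq[PlanNode] = Seq.empty)

// A pluggable cost model: users could supply their own implementation.
trait CostModel {
  // Estimated speedup from running this operator natively (1.0 = no change).
  def acceleration(node: PlanNode): Double
}

// Illustrative numbers only; sensible values would come from microbenchmarks.
object DefaultCostModel extends CostModel {
  private val estimates: Map[String, Double] = Map(
    "Filter" -> 2.0,
    "Project" -> 1.5,
    "HashAggregate" -> 3.0
  )

  // Operators without an estimate are assumed to gain nothing.
  def acceleration(node: PlanNode): Double = estimates.getOrElse(node.name, 1.0)
}

// Flatten the plan into a sequence of operators (pre-order traversal).
def operators(node: PlanNode): Seq[PlanNode] =
  node +: node.children.flatMap(operators)

// Average the per-operator estimates over the whole plan.
def averageAcceleration(plan: PlanNode, model: CostModel): Double = {
  val ops = operators(plan)
  ops.map(model.acceleration).sum / ops.size
}

val plan = PlanNode("HashAggregate", Seq(
  PlanNode("Project", Seq(
    PlanNode("Filter", Seq(
      PlanNode("Scan")))))))

// (3.0 + 1.5 + 2.0 + 1.0) / 4 = 1.875 with the illustrative numbers above.
val avg = averageAcceleration(plan, DefaultCostModel)
```

Because the average ignores row counts, a cheap operator counts as much as an expensive one. Weighting each estimate by statistics would be one way to incorporate them later.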

How are these changes tested?

Not at all yet. I will leave this as a draft until it is further along and has some tests, but I would like to get feedback early since this is a proof of concept.

@andygrove
Member Author

@mbutrovich I expect that you may have some opinions on this one!

@codecov-commenter

codecov-commenter commented Dec 14, 2025

Codecov Report

❌ Patch coverage is 13.33333% with 91 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.08%. Comparing base (f09f8af) to head (54e1a33).
⚠️ Report is 768 commits behind head on main.

Files with missing lines                                  Patch %   Lines
...n/scala/org/apache/comet/cost/CometCostModel.scala     0.00%     78 Missing ⚠️
...n/scala/org/apache/comet/rules/CometExecRule.scala     25.00%    8 Missing and 4 partials ⚠️
.../main/scala/org/apache/comet/DataTypeSupport.scala     0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2906      +/-   ##
============================================
+ Coverage     56.12%   59.08%   +2.95%     
- Complexity      976     1381     +405     
============================================
  Files           119      168      +49     
  Lines         11743    15448    +3705     
  Branches       2251     2578     +327     
============================================
+ Hits           6591     9127    +2536     
- Misses         4012     5032    +1020     
- Partials       1140     1289     +149     

operatorCount += 1

// Recursively process children
node.children.foreach(collectOperatorCosts)
Contributor

Perhaps we could remove the usage of vars and let the function itself return the totalAcceleration and the operator count?

Something like:

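// Counts items by passing the running total as an argument instead of mutating a var.
def countItems(list: List[Any], accumulator: Int = 0): Int = {
  list match {
    case _ :: tail => countItems(tail, accumulator + 1) // one more item seen
    case Nil => accumulator // end of list: return the total
  }
}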

val myList = List(1, 2, 3, 4)
val count = countItems(myList)  // result: 4
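Applied to a tree rather than a list, the same idea might look like the sketch below. `Node` is a hypothetical simplified type with a precomputed `acceleration` field, not the plan node type used in the PR, and this `collectOperatorCosts` only borrows the name from the diff. It returns the total and the count as a tuple, so no outer vars are needed.

```scala
// Hypothetical simplified node type; the PR operates on Spark plan nodes instead.
final case class Node(acceleration: Double, children: List[Node] = Nil)

// Returns (totalAcceleration, operatorCount) for the subtree rooted at `node`,
// combining child results with foldLeft instead of mutating outer vars.
def collectOperatorCosts(node: Node): (Double, Int) =
  node.children.foldLeft((node.acceleration, 1)) { case ((total, count), child) =>
    val (childTotal, childCount) = collectOperatorCosts(child)
    (total + childTotal, count + childCount)
  }

val tree = Node(3.0, List(Node(1.5, List(Node(1.0))), Node(2.0)))

// The caller derives the average from the returned pair.
val (totalAcceleration, operatorCount) = collectOperatorCosts(tree)
val average = totalAcceleration / operatorCount
```

Unlike the list example, this recursion is not tail-recursive, since each call has to combine the results from its children. Its depth is bounded by the depth of the plan.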

Member Author

I don't know how much of this code will still exist by the time the proof of concept is working and ready for detailed code review, so I'll hold off on making these changes for now.

I am really looking for high-level feedback on the general approach at the moment.

@andygrove changed the title from "feat: Proof-of-concept of AQE cost-based optimization" to "feat: Proof-of-concept of AQE cost-based optimization [WIP / Experimental]" on Dec 15, 2025
@andygrove changed the title from "feat: Proof-of-concept of AQE cost-based optimization [WIP / Experimental]" to "feat: Proof-of-concept of cost-based optimization [WIP / Experimental]" on Dec 15, 2025

Development

Successfully merging this pull request may close these issues.

Add simple cost-based optimizer and pluggable cost model

3 participants