-
Notifications
You must be signed in to change notification settings - Fork 260
feat: Proof-of-concept of cost-based optimization [WIP / Experimental] #2906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@mbutrovich I expect that you may have some opinions on this one! |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2906 +/- ##
============================================
+ Coverage 56.12% 59.08% +2.95%
- Complexity 976 1381 +405
============================================
Files 119 168 +49
Lines 11743 15448 +3705
Branches 2251 2578 +327
============================================
+ Hits 6591 9127 +2536
- Misses 4012 5032 +1020
- Partials 1140 1289 +149 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| operatorCount += 1 | ||
|
|
||
| // Recursively process children | ||
| node.children.foreach(collectOperatorCosts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could remove usage of vars and let the function return the totalAcceleration and operator count itself ?
Something like :
def countItems(list: List[Any], accumulator: Int = 0): Int = {
list match {
case head :: tail => countItems(tail, accumulator + 1)
case Nil => accumulator
}
}
val myList = List(1, 2, 3, 4)
val count = countItems(myList) // result: 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how much of this code will still exist by the time the proof-of-concept is working and ready for detailed code review, so I'll hold off from making these changes now.
I am really looking for high-level feedback on the general approach at the moment.
Which issue does this PR close?
Closes #2785
Rationale for this change
I would like to start experimenting with CBO and allow users to also experiment by providing their own cost models, tailored to their specific workloads.
What changes are included in this PR?
Add a starting point for development and experimentation.
This is intentional simple/crude and just estimates Comet acceleration per operator (we should use the microbenchmarks to determing sensible estimates) and then averages that over the plan. Statistics such as row count are not taken into account yet, but we can incorporate that later.
How are these changes tested?
Not at all yet. I will leave this as draft until it is farther along and has some tests, but I would like to get feedback early since this is a proof-of-concept.