-
Notifications
You must be signed in to change notification settings - Fork 4k
kv: Record a new AdmittedWorkDoneEvent #158020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
57f2021 to
0019e89
Compare
|
The idea for this PR is from this commit sumeerbhola@82402f7 |
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sumeerbhola reviewed 4 of 13 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @angles-n-daemons and @DrewKimball)
yuzefovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuzefovich reviewed 13 of 13 files at r1, all commit messages.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @angles-n-daemons and @DrewKimball)
pkg/kv/kvserver/kvadmission/kvadmission.go line 316 at r1 (raw file):
kvflowHandle, found := n.kvflowHandles.LookupReplicationAdmissionHandle(ba.RangeID) if !found { return Handle{}, nil
nit: should we populate cpuStart here as well, for completeness?
pkg/sql/execstats/traceanalyzer.go line 196 at r1 (raw file):
s.ContentionEvents = append(s.ContentionEvents, other.ContentionEvents...) s.RUEstimate += other.RUEstimate s.SQLCPUTime += other.SQLCPUTime
nit: would be good to move this line further up to KVCPUTime.
pkg/sql/opt/exec/explain/output.go line 454 at r1 (raw file):
// independent of platform because the grunning library isn't currently // supported on all platforms. func (ob *OutputBuilder) AddCPUTime(cpuTime time.Duration) {
nit: we should rename this method to AddSQLCPUTime.
angles-n-daemons
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this looks good on the obs front - just a couple questions about the cpuStart measurement.
| cpuStart time.Duration | ||
| // Used for measuring post-admission CPU, even for cases that were not | ||
| // subject to admission control. | ||
| cpuStart time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I understand better, can you spot check my understanding of this field?
There's some work from: kvBatchServer.Batch -> KV Admitted, which we want to omit from our KV work time reporting, this work is stored as cpuStart on the Handler, the time spent on the cpu by the goroutine up to the point where this field is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, i'm actually not too sure, but i've asked in KV [1] what the difference is between measuring cpu from the potential call sites (node.Batch , ReplicaSend , and finally this here AdmitKVWork -> AdmittedKVWorkDone)
[1] https://cockroachlabs.slack.com/archives/C0KB9Q03D/p1763667326272419
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, nice thanks for the context
| ) (_ Handle, retErr error) { | ||
| if n.kvAdmissionQ == nil { | ||
| return Handle{}, nil | ||
| return Handle{cpuStart: grunning.Time()}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set cpuStart on line 316 as well (valid return condition)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done now that we're using a defer.
|
Oh, how does this change interact with #156895? That change collects KV CPU time for the purposes of top-level query stats reporting (i.e. SQL execution stats) unconditionally whereas here we include things into the trace and include into EXPLAIN ANALYZE output when Structured recording is enabled? Can we not use the collected things from #156895 to do the same though? (BatchResponse fields maybe already show up in the trace (?), and we could use data from |
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @DrewKimball, and @yuzefovich)
pkg/kv/kvserver/kvadmission/kvadmission.go line 316 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: should we populate
cpuStarthere as well, for completeness?
good catch. We could lift the setting of cpuStart to a defer statement.
Nope, your understanding is correct! I'm asking in KV [1] if they'd want this Potentially we could close this PR and just put the [1] https://cockroachlabs.slack.com/archives/C0KB9Q03D/p1763667326272419 |
0019e89 to
c246a3c
Compare
alyshanjahani-crl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @angles-n-daemons, @DrewKimball, @sumeerbhola, and @yuzefovich)
pkg/kv/kvserver/kvadmission/kvadmission.go line 316 at r1 (raw file):
Previously, sumeerbhola wrote…
good catch. We could lift the setting of
cpuStartto a defer statement.
Ack, now using a defer to set cpuStart when AdmitKVWork is returning successfully (nil error)
| ) (_ Handle, retErr error) { | ||
| if n.kvAdmissionQ == nil { | ||
| return Handle{}, nil | ||
| return Handle{cpuStart: grunning.Time()}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done now that we're using a defer.
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
|
oh dang nice, AI review caught the bug after i modified to use defer, didn't catch the named return value! From the AI review output: This doesn't work as intended because: Handle is returned by value (not as a named return value - note the _ in the signature) |
This commit creates a new kvpb.AdmittedWorkDoneEvent which the kvadmission controller uses to record total CPU time consumed by KV. This is done via span.RecordStructured. The cpu time is added to the execstats.QueryLevelStats object and reported in explain analyze. Note that CPUTime has been renamed to SQLCPUTime to be more accurate and better distinguish the various timings recorded on this object. Fixes: https://cockroachlabs.atlassian.net/browse/CRDB-55927 Release note (sql change): KV cpu time is now included in traces and EXPLAIN ANALYZE.
c246a3c to
ec81dce
Compare
This commit creates a new kvpb.AdmittedWorkDoneEvent which the kvadmission controller uses to record total CPU time consumed by KV.
This is done via span.RecordStructured.
The cpu time is added to the execstats.QueryLevelStats object and reported in explain analyze. Note that CPUTime has been renamed to SQLCPUTime to be more accurate and better distinguish the various timings recorded on this object.
Fixes: https://cockroachlabs.atlassian.net/browse/CRDB-55927
Release note (sql change): KV cpu time is now included in traces and EXPLAIN ANALYZE.