-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit cadence worker's hardware utilization inside worker once per host #1260
Conversation
internal/internal_worker_base.go
Outdated
@@ -209,6 +215,9 @@ func (bw *baseWorker) Start() { | |||
bw.shutdownWG.Add(1) | |||
go bw.runTaskDispatcher() | |||
|
|||
bw.shutdownWG.Add(1) | |||
bw.EmitHardwareUsage() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if that's the best practice of using waitgroup and go routine (since the go routine is started inside this function) but the test is not reporting and goleak.
internal/internal_worker_base.go
Outdated
@@ -27,7 +27,9 @@ import ( | |||
"context" | |||
"errors" | |||
"fmt" | |||
"github.com/shirou/gopsutil/cpu" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Standart imports should be separated from external. I think goimports should do the trick
internal/internal_worker_base.go
Outdated
@@ -54,6 +57,8 @@ var ( | |||
|
|||
var errShutdown = errors.New("worker shutting down") | |||
|
|||
var emitOnce sync.Once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest extracting this into WorkerResult as a separate dependency, instead of trying to fit into the workers with sync.Once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by WorkerResult? Are you referring to the baseWorker struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh well, I thought we provided fx.Module for this as we do in the monorepo. There is a NewCadenceWorkers that returns a single result that can be a good place to have this component.
If we want to care about monorepo - then it should be part of monorepo.
If we want to have this in OSS, then, I think, it should be a separated component that is optional and could be used by developers as an extra option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not even sure that we want to have this in OSS, since a developer could use something like https://github.com/prometheus/client_golang/blob/main/prometheus/go_collector.go#L216
So it will expose data for collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to do it in OSS, we cannot use go_collector that only works for prometheus. Cadence use tally on top of underlying metrics package.
I'm not sure I understand what you mean by worker utilization. If we track cpu/memory, then it should be dominated by a workload that is unrelated to the worker itself. |
internal/internal_worker_base.go
Outdated
@@ -54,6 +57,8 @@ var ( | |||
|
|||
var errShutdown = errors.New("worker shutting down") | |||
|
|||
var emitOnce sync.Once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to do it in OSS, we cannot use go_collector that only works for prometheus. Cadence use tally on top of underlying metrics package.
internal/internal_worker_base.go
Outdated
cpuCores, _ := cpu.Counts(false) | ||
|
||
scope.Gauge(metrics.NumCPUCores).Update(float64(cpuCores)) | ||
scope.Gauge(metrics.CPUPercentage).Update(cpuPercent[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might still have a nil pointer if cpu.Percent(0, false) returns error. Either need to handle error or do a length check.
internal/internal_worker_base.go
Outdated
@@ -209,6 +215,12 @@ func (bw *baseWorker) Start() { | |||
bw.shutdownWG.Add(1) | |||
go bw.runTaskDispatcher() | |||
|
|||
// We want the emit function run once per host instead of run once per worker | |||
emitOnce.Do(func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about move emitOnce to bw.emitHardwareUsage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried that, but that results in go routine leak unfortunately
What changed?
We can now emit cadence worker's hardware utilization while polling for decision/activity task
Why?
This enable us to monitor worker's utilization
How did you test it?
Tested on dev environment
Potential risks
No risk since it does not affect overall logic and are just added read-only functions