Fixes a CPUUtilization metrics calculation bug for Windows clusters. #1219
Conversation
Hey, thanks for opening this PR.
Can you describe the bug and how you were running into it? Were you seeing issues with cluster metrics or with service metrics?
Hi @samuelkarp. You can see how the values don't make any sense. Here's a snapshot of the 'service' after applying the change in this PR, which now shows the percentage accurately. This snapshot properly shows the CPU bursting and the various autoscaling events happening at > 80% CPUUtilization.
Another snapshot demonstrating both the cluster and the service with incorrect metrics on 1.16.1. And one more after the fix in this PR, showing correctly reported metrics for the cluster and the service using autoscaling on a cluster of r4.2xlarge instances.
@bboerst I verified that this is actually a bug on Windows, thanks for the fix. Can you confirm that this contribution is under the terms of the Apache 2.0 license? Thanks.
@richardpen Confirmed. This contribution is submitted by me under the terms of the Apache 2.0 license.
b402122 to b6ac749
#1224 has been merged, but the rebase did not close this PR automatically, so I am closing it manually.
Summary
This PR fixes a calculation error observed when collecting CPUUsage.TotalUsage on Windows. The existing calculation sent an unusable value to AWS as 'CPUUtilization', which made it impossible to base scaling policies on the metric. Converting the value to a percentage during stat collection now yields accurate metrics.
Implementation details
cpuUsage is converted to a percentage (* 100) and then divided by numCores inside the dockerStatsToContainerStats function.
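The arithmetic described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: normalizeCPUUsage is a hypothetical helper name, the input values are made up, and the units of the raw usage figure are not stated in this PR, so the numbers only show the scaling and the per-core division.

```go
package main

import "fmt"

// normalizeCPUUsage is a hypothetical helper (not part of the agent).
// It mirrors the described calculation: scale the raw usage by 100,
// then divide by the number of cores. A zero core count returns 0
// to avoid a division by zero; that guard is an assumption of this sketch.
func normalizeCPUUsage(cpuUsage, numCores uint64) uint64 {
	if numCores == 0 {
		return 0
	}
	return (cpuUsage * 100) / numCores
}

func main() {
	// Made-up raw usage value, evaluated for two different core counts
	// to show that the result scales inversely with the number of cores.
	const rawUsage uint64 = 4000
	for _, cores := range []uint64{2, 8} {
		fmt.Printf("cores=%d normalized=%d\n", cores, normalizeCPUUsage(rawUsage, cores))
	}
}
```

Multiplying before dividing keeps precision under integer arithmetic; the trade-off is that a very large raw value could overflow uint64 when multiplied by 100.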
Testing
This change was compiled and tested on a Windows ECS cluster running several instance types with different vCPU counts in order to validate the new calculation.
(make release)
(go build -out amazon-ecs-agent.exe ./agent)
(make test) pass
(go test -timeout=25s ./agent/...) pass
(make run-integ-tests) pass
(.\scripts\run-integ-tests.ps1) pass
(make run-functional-tests) pass
(.\scripts\run-functional-tests.ps1) pass
New tests cover the changes: no
Description for the changelog
Licensing
This contribution is under the terms of the Apache 2.0 License: yes