-
Notifications
You must be signed in to change notification settings - Fork 7k
[core][stats-die/03bis] improve scheduler_placement_time_s metric #58217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f8a9b04 to
6805be5
Compare
b075c33 to
b950cff
Compare
6805be5 to
b2c67f5
Compare
src/ray/common/metrics.h
Outdated
| "resolved to when it actually reserves resources on a node to run.", | ||
| /*unit=*/"s", | ||
| /*unit=*/"ms", | ||
| /*boundaries=*/{0.1, 1, 10, 100, 1000, 10000}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these seem like reasonable frames? 0.1 and 1ms seem unlikely right? This timer probably encapsulates a few RPC's?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right — 0.1 ms is unlikely. Based on the current data, most data points are under 100 ms, but we don’t have much visibility below that range. So, we might want to go two levels deeper, just in case, say, 10 and 1 ms. If there turn out to be no data points below 1 ms, that’s actually helpful information, since it tells us 1 ms is the lower boundary.
The upper bound of 10 s looks good, the data already shows that it’s the upper boundary. There are no data points above 10 s, which in itself is useful information.
b2c67f5 to
6223c2b
Compare
Signed-off-by: Cuong Nguyen <[email protected]>
6223c2b to
b5708bd
Compare
…y-project#58217) Change the unit of `scheduler_placement_time` from seconds to mili-seconds. The current bucket is in the range of 0.1s to 2.5 hours which doesn't make sense. According to a sample of data, the range we are interested in would be from us to s. Thanks @ZacAttack for pointing this out. ``` Note: This is an internal (non–public-facing) metric, so we only need to update its usage within Ray (e.g., the dashboard). A simple code change should suffice. ``` <img width="1609" height="421" alt="505491038-c5d81017-b86c-406f-acf4-614560752062" src="https://github.com/user-attachments/assets/cc647b97-42ec-42eb-bf01-4d1867940207" /> Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>
…y-project#58217) Change the unit of `scheduler_placement_time` from seconds to mili-seconds. The current bucket is in the range of 0.1s to 2.5 hours which doesn't make sense. According to a sample of data, the range we are interested in would be from us to s. Thanks @ZacAttack for pointing this out. ``` Note: This is an internal (non–public-facing) metric, so we only need to update its usage within Ray (e.g., the dashboard). A simple code change should suffice. ``` <img width="1609" height="421" alt="505491038-c5d81017-b86c-406f-acf4-614560752062" src="https://github.com/user-attachments/assets/cc647b97-42ec-42eb-bf01-4d1867940207" /> Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>
…y-project#58217) Change the unit of `scheduler_placement_time` from seconds to mili-seconds. The current bucket is in the range of 0.1s to 2.5 hours which doesn't make sense. According to a sample of data, the range we are interested in would be from us to s. Thanks @ZacAttack for pointing this out. ``` Note: This is an internal (non–public-facing) metric, so we only need to update its usage within Ray (e.g., the dashboard). A simple code change should suffice. ``` <img width="1609" height="421" alt="505491038-c5d81017-b86c-406f-acf4-614560752062" src="https://github.com/user-attachments/assets/cc647b97-42ec-42eb-bf01-4d1867940207" /> Test: - CI Signed-off-by: Cuong Nguyen <[email protected]> Signed-off-by: Aydin Abiar <[email protected]>
…y-project#58217) Change the unit of `scheduler_placement_time` from seconds to mili-seconds. The current bucket is in the range of 0.1s to 2.5 hours which doesn't make sense. According to a sample of data, the range we are interested in would be from us to s. Thanks @ZacAttack for pointing this out. ``` Note: This is an internal (non–public-facing) metric, so we only need to update its usage within Ray (e.g., the dashboard). A simple code change should suffice. ``` <img width="1609" height="421" alt="505491038-c5d81017-b86c-406f-acf4-614560752062" src="https://github.com/user-attachments/assets/cc647b97-42ec-42eb-bf01-4d1867940207" /> Test: - CI Signed-off-by: Cuong Nguyen <[email protected]> Signed-off-by: YK <[email protected]>
…y-project#58217) Change the unit of `scheduler_placement_time` from seconds to mili-seconds. The current bucket is in the range of 0.1s to 2.5 hours which doesn't make sense. According to a sample of data, the range we are interested in would be from us to s. Thanks @ZacAttack for pointing this out. ``` Note: This is an internal (non–public-facing) metric, so we only need to update its usage within Ray (e.g., the dashboard). A simple code change should suffice. ``` <img width="1609" height="421" alt="505491038-c5d81017-b86c-406f-acf4-614560752062" src="https://github.com/user-attachments/assets/cc647b97-42ec-42eb-bf01-4d1867940207" /> Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>
Change the unit of
scheduler_placement_timefrom seconds to mili-seconds. The current bucket is in the range of 0.1s to 2.5 hours which doesn't make sense. According to a sample of data, the range we are interested in would be from us to s. Thanks @ZacAttack for pointing this out.Test: