[core][stats-die/03bis] improve scheduler_placement_time_s metric #58217

can-anyscale · 2025-10-27T17:29:48Z

Change the unit of `scheduler_placement_time` from seconds to milliseconds. The current buckets span 0.1 s to 2.5 hours, which doesn't make sense for this metric. Based on a sample of data, the range we care about runs from microseconds to seconds. Thanks @ZacAttack for pointing this out.
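The PR does not show the call site that records this value. The sketch below assumes the elapsed time is measured with `std::chrono` and shows how it could be converted to fractional milliseconds before recording. `ToMilliseconds` and `TimeSomethingMs` are hypothetical helpers, not Ray's API, and the recording call is left out.

```cpp
#include <chrono>

// Hypothetical helper (not Ray's API): converts any std::chrono duration to
// fractional milliseconds, the unit the histogram expects after this change.
template <typename Rep, typename Period>
double ToMilliseconds(std::chrono::duration<Rep, Period> d) {
  return std::chrono::duration<double, std::milli>(d).count();
}

// Hypothetical example of timing a section of work. The actual metric
// recording call is omitted because the PR does not show Ray's recording API.
inline double TimeSomethingMs() {
  const auto start = std::chrono::steady_clock::now();
  // ... placement work would run here ...
  const auto elapsed = std::chrono::steady_clock::now() - start;
  return ToMilliseconds(elapsed);
}
```

Converting to a `double` keeps sub-millisecond precision. For example, 1500 µs becomes 1.5 ms instead of being truncated to 1.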

Note: This is an internal (non–public-facing) metric, so we only need to update its usage within Ray (e.g., the dashboard). A simple code change should suffice.


Test:

- CI

ZacAttack · 2025-10-29T17:27:07Z

src/ray/common/metrics.h

```diff
       "resolved to when it actually reserves resources on a node to run.",
-      /*unit=*/"s",
+      /*unit=*/"ms",
       /*boundaries=*/{0.1, 1, 10, 100, 1000, 10000},
```
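Read in milliseconds, the six boundaries in the diff above define seven buckets, with the last one being an implicit overflow (+Inf) bucket. The sketch below maps an observation to its bucket. It assumes Prometheus/OpenTelemetry-style upper-inclusive buckets, where each bucket counts values less than or equal to its upper boundary. Other backends may assign a value that sits exactly on a boundary to the next bucket instead. `BucketIndex` and `kPlacementBoundariesMs` are hypothetical names, not Ray code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Boundaries from the diff, interpreted in milliseconds (hypothetical name).
const std::vector<double> kPlacementBoundariesMs = {0.1, 1, 10, 100, 1000, 10000};

// Returns the bucket index for a value, assuming upper-inclusive buckets:
// index i covers (boundaries[i-1], boundaries[i]], index 0 covers values up
// to boundaries[0], and index boundaries.size() is the overflow (+Inf) bucket.
// `boundaries` must be sorted in ascending order.
std::size_t BucketIndex(const std::vector<double>& boundaries, double value_ms) {
  // lower_bound finds the first boundary >= value_ms, i.e. the smallest
  // upper edge that still contains the value.
  auto it = std::lower_bound(boundaries.begin(), boundaries.end(), value_ms);
  return static_cast<std::size_t>(it - boundaries.begin());
}
```

Under these assumptions, a placement that takes 3 ms lands in the (1, 10] bucket. Anything above 10,000 ms (10 s) falls into the overflow bucket.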


Do these seem like reasonable bucket boundaries? 0.1 ms and 1 ms seem unlikely, right? This timer probably encapsulates a few RPCs.

You're right, 0.1 ms is unlikely. Based on the current data, most data points are under 100 ms, but we don't have much visibility below that range. So we might want to go two levels deeper, just in case: say, 10 ms and 1 ms. If it turns out there are no data points below 1 ms, that's actually helpful information, since it tells us 1 ms is the lower boundary.

The upper bound of 10 s looks good; the data already shows that it's the upper boundary. There are no data points above 10 s, which in itself is useful information.
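The reasoning above treats empty edge buckets as information: an empty lowest bucket bounds the data from below, and an empty overflow bucket bounds it from above. The sketch below infers the observed range from per-bucket counts. It assumes non-cumulative counts, upper-inclusive buckets, and one overflow bucket after the last boundary. `InferObservedRange` and `ObservedRange` are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Observed range of a histogram, expressed as bucket edges in milliseconds.
struct ObservedRange {
  double lower_ms;  // lower edge of the lowest non-empty bucket
  double upper_ms;  // upper edge of the highest non-empty bucket (+inf if overflow)
};

// Hypothetical helper. `counts` holds per-bucket (non-cumulative) counts with
// counts.size() == boundaries.size() + 1; the last entry is the overflow bucket.
// Bucket i is assumed to cover (boundaries[i-1], boundaries[i]].
std::optional<ObservedRange> InferObservedRange(
    const std::vector<double>& boundaries,
    const std::vector<std::uint64_t>& counts) {
  if (counts.size() != boundaries.size() + 1) return std::nullopt;
  std::size_t first = 0;
  std::size_t last = 0;
  bool any = false;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    if (counts[i] == 0) continue;
    if (!any) first = i;  // remember the lowest non-empty bucket
    last = i;             // keep updating the highest non-empty bucket
    any = true;
  }
  if (!any) return std::nullopt;  // no observations at all
  ObservedRange r;
  // Durations are non-negative, so the first bucket's lower edge is 0.
  r.lower_ms = first == 0 ? 0.0 : boundaries[first - 1];
  r.upper_ms = last == boundaries.size()
                   ? std::numeric_limits<double>::infinity()
                   : boundaries[last];
  return r;
}
```

With the boundaries from the diff, per-bucket counts of {0, 0, 5, 12, 3, 0, 0} give a range of 1 ms to 1000 ms. Empty (0.1, 1] and overflow buckets are what make both edges visible.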

Signed-off-by: Cuong Nguyen <[email protected]>


can-anyscale marked this pull request as ready for review October 27, 2025 17:34

can-anyscale requested a review from a team as a code owner October 27, 2025 17:34


ray-gardener bot added the core (Issues that should be addressed in Ray Core) and observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling) labels Oct 27, 2025

edoakes assigned ZacAttack Oct 27, 2025

can-anyscale force-pushed the can-statdie03-bis branch from f8a9b04 to 6805be5 Compare October 27, 2025 22:22

can-anyscale mentioned this pull request Oct 27, 2025

[core][stats-die/03] kill STATS in core worker component #58060

Merged

can-anyscale force-pushed the can-statdie03 branch from b075c33 to b950cff Compare October 28, 2025 19:33

can-anyscale force-pushed the can-statdie03-bis branch from 6805be5 to b2c67f5 Compare October 28, 2025 19:33

ZacAttack reviewed Oct 29, 2025


can-anyscale force-pushed the can-statdie03-bis branch from b2c67f5 to 6223c2b Compare October 29, 2025 18:46

can-anyscale requested a review from ZacAttack October 29, 2025 18:46

Base automatically changed from can-statdie03 to master October 29, 2025 21:26

[core][stats-die/03bis] improve scheduler_placement_time_s metric

b5708bd

Signed-off-by: Cuong Nguyen <[email protected]>

can-anyscale force-pushed the can-statdie03-bis branch from 6223c2b to b5708bd Compare October 29, 2025 21:26

can-anyscale added the go (add ONLY when ready to merge, run all tests) label Oct 29, 2025

Merge branch 'master' into can-statdie03-bis

1a2b573

ZacAttack approved these changes Nov 7, 2025


can-anyscale merged commit 50ffca4 into master Nov 7, 2025
6 checks passed

can-anyscale deleted the can-statdie03-bis branch November 7, 2025 19:59

3 participants

[core][stats-die/03bis] improve scheduler_placement_time_s metric #58217

[core][stats-die/03bis] improve scheduler_placement_time_s metric #58217

Uh oh!

Conversation

can-anyscale commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

ZacAttack Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

can-anyscale Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

can-anyscale commented Oct 27, 2025 •

edited

Loading