Skip to content

Conversation

@can-anyscale
Copy link
Contributor

This PR replace STATS with Metric as a way to define metric inside ray (as a unification effort) in all object-manager components. Normally, metrics are defined at the top-level component and passed down to sub-components. However, in this case, because object manager is used as an API across, doing so would feel unnecessarily cumbersome. I decided to define the metrics inline within each client and server class instead.

Note that the metric classes (Metric, Gauge, Sum, etc.) are simply wrappers around static OpenCensus/OpenTelemetry entities.

Details
Full context of this refactoring work.

  • Each component (e.g., gcs, raylet, core_worker, etc.) now has a metrics.h file located in its top-level directory. This file defines all metrics for that component.
  • In most cases, metrics are defined once in the main entry point of each component (gcs/gcs_server_main.cc for GCS, raylet/main.cc for Raylet, core_worker/core_worker_process.cc for the Core Worker, etc.). These metrics are then passed down to subcomponents via the ray::observability::MetricInterface.
  • This approach significantly reduces rebuild time when metric infrastructure changes. Previously, a change would trigger a full Ray rebuild; now, only the top-level entry points of each component need rebuilding.
  • There are a few exceptions where metrics are tracked inside object libraries (e.g., task_specification). In these cases, metrics are defined within the library itself, since there is no corresponding top-level entry point.
  • Finally, the obsolete metric_defs.h and metric_defs.cc files can now be completely removed. This paves the way for further dead code cleanup in a future PR.

Test:

  • CI

@can-anyscale can-anyscale force-pushed the can-statdie02 branch 2 times, most recently from f3e327b to cccbc85 Compare October 21, 2025 23:49
@can-anyscale can-anyscale added the go add ONLY when ready to merge, run all tests label Oct 22, 2025
@can-anyscale can-anyscale marked this pull request as ready for review October 22, 2025 14:34
@can-anyscale can-anyscale requested a review from a team as a code owner October 22, 2025 14:34
cursor[bot]

This comment was marked as outdated.

@can-anyscale can-anyscale merged commit 2489adb into master Oct 22, 2025
6 checks passed
@can-anyscale can-anyscale deleted the can-statdie02 branch October 22, 2025 18:34
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…ect#57974)

This PR replace STATS with Metric as a way to define metric inside ray
(as a unification effort) in all object-manager components. Normally,
metrics are defined at the top-level component and passed down to
sub-components. However, in this case, because object manager is used as
an API across, doing so would feel unnecessarily cumbersome. I decided
to define the metrics inline within each client and server class
instead.

Note that the metric classes (Metric, Gauge, Sum, etc.) are simply
wrappers around static OpenCensus/OpenTelemetry entities.


**Details**
Full context of this refactoring work.
- Each component (e.g., gcs, raylet, core_worker, etc.) now has a
metrics.h file located in its top-level directory. This file defines all
metrics for that component.
- In most cases, metrics are defined once in the main entry point of
each component (gcs/gcs_server_main.cc for GCS, raylet/main.cc for
Raylet, core_worker/core_worker_process.cc for the Core Worker, etc.).
These metrics are then passed down to subcomponents via the
ray::observability::MetricInterface.
- This approach significantly reduces rebuild time when metric
infrastructure changes. Previously, a change would trigger a full Ray
rebuild; now, only the top-level entry points of each component need
rebuilding.
- There are a few exceptions where metrics are tracked inside object
libraries (e.g., task_specification). In these cases, metrics are
defined within the library itself, since there is no corresponding
top-level entry point.
- Finally, the obsolete metric_defs.h and metric_defs.cc files can now
be completely removed. This paves the way for further dead code cleanup
in a future PR.

Test:
- CI

Signed-off-by: Cuong Nguyen <[email protected]>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…ect#57974)

This PR replace STATS with Metric as a way to define metric inside ray
(as a unification effort) in all object-manager components. Normally,
metrics are defined at the top-level component and passed down to
sub-components. However, in this case, because object manager is used as
an API across, doing so would feel unnecessarily cumbersome. I decided
to define the metrics inline within each client and server class
instead.

Note that the metric classes (Metric, Gauge, Sum, etc.) are simply
wrappers around static OpenCensus/OpenTelemetry entities.

**Details**
Full context of this refactoring work.
- Each component (e.g., gcs, raylet, core_worker, etc.) now has a
metrics.h file located in its top-level directory. This file defines all
metrics for that component.
- In most cases, metrics are defined once in the main entry point of
each component (gcs/gcs_server_main.cc for GCS, raylet/main.cc for
Raylet, core_worker/core_worker_process.cc for the Core Worker, etc.).
These metrics are then passed down to subcomponents via the
ray::observability::MetricInterface.
- This approach significantly reduces rebuild time when metric
infrastructure changes. Previously, a change would trigger a full Ray
rebuild; now, only the top-level entry points of each component need
rebuilding.
- There are a few exceptions where metrics are tracked inside object
libraries (e.g., task_specification). In these cases, metrics are
defined within the library itself, since there is no corresponding
top-level entry point.
- Finally, the obsolete metric_defs.h and metric_defs.cc files can now
be completely removed. This paves the way for further dead code cleanup
in a future PR.

Test:
- CI

Signed-off-by: Cuong Nguyen <[email protected]>
Signed-off-by: Aydin Abiar <[email protected]>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…ect#57974)

This PR replace STATS with Metric as a way to define metric inside ray
(as a unification effort) in all object-manager components. Normally,
metrics are defined at the top-level component and passed down to
sub-components. However, in this case, because object manager is used as
an API across, doing so would feel unnecessarily cumbersome. I decided
to define the metrics inline within each client and server class
instead.

Note that the metric classes (Metric, Gauge, Sum, etc.) are simply
wrappers around static OpenCensus/OpenTelemetry entities.

**Details**
Full context of this refactoring work.
- Each component (e.g., gcs, raylet, core_worker, etc.) now has a
metrics.h file located in its top-level directory. This file defines all
metrics for that component.
- In most cases, metrics are defined once in the main entry point of
each component (gcs/gcs_server_main.cc for GCS, raylet/main.cc for
Raylet, core_worker/core_worker_process.cc for the Core Worker, etc.).
These metrics are then passed down to subcomponents via the
ray::observability::MetricInterface.
- This approach significantly reduces rebuild time when metric
infrastructure changes. Previously, a change would trigger a full Ray
rebuild; now, only the top-level entry points of each component need
rebuilding.
- There are a few exceptions where metrics are tracked inside object
libraries (e.g., task_specification). In these cases, metrics are
defined within the library itself, since there is no corresponding
top-level entry point.
- Finally, the obsolete metric_defs.h and metric_defs.cc files can now
be completely removed. This paves the way for further dead code cleanup
in a future PR.

Test:
- CI

Signed-off-by: Cuong Nguyen <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants