[core][stats-die/04] kill STATS in the common component (#58299)
This PR replaces STATS with Metric as the way to define metrics inside Ray
(as a unification effort) in all common components. Normally, metrics
are defined at the top-level component and passed down to
sub-components. However, because the common component is
used as an API across components, doing so would be unnecessarily cumbersome. I
decided to define the metrics inline within each client and server class
instead.
Note that the metric classes (Metric, Gauge, Sum, etc.) are simply
wrappers around static OpenCensus/OpenTelemetry entities.
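As a rough illustration of the inline approach, the sketch below shows a class in a shared component owning its metric as a member rather than receiving it from a top-level entry point. `FakeGauge` and `ExampleRpcClient` are hypothetical names invented for this example; they are not Ray's actual `Gauge` class or any real client, and the real wrappers may have a different constructor and `Record` signature.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a metric wrapper (NOT Ray's actual Gauge class).
// It only remembers the last recorded value so the example stays self-contained.
class FakeGauge {
 public:
  explicit FakeGauge(std::string name) : name_(std::move(name)) {}
  void Record(double value) { last_value_ = value; }
  double last_value() const { return last_value_; }
  const std::string &name() const { return name_; }

 private:
  std::string name_;
  double last_value_ = 0.0;
};

// Hypothetical client living in a shared component. The metric is defined
// inline as a member, so callers never have to construct or pass it in.
class ExampleRpcClient {
 public:
  void OnRequestSent() { inflight_.Record(++num_inflight_); }
  void OnReplyReceived() { inflight_.Record(--num_inflight_); }
  const FakeGauge &inflight_gauge() const { return inflight_; }

 private:
  int num_inflight_ = 0;
  FakeGauge inflight_{"example_inflight_requests"};
};
```

The trade-off is that the class header now depends on the metric type directly, which is acceptable here because every component already uses the common clients and servers as an API.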
**Details**
Full context of this refactoring work:
- Each component (e.g., gcs, raylet, core_worker, etc.) now has a
metrics.h file located in its top-level directory. This file defines all
metrics for that component.
- In most cases, metrics are defined once in the main entry point of
each component (gcs/gcs_server_main.cc for GCS, raylet/main.cc for
Raylet, core_worker/core_worker_process.cc for the Core Worker, etc.).
These metrics are then passed down to subcomponents via the
ray::observability::MetricInterface.
- This approach significantly reduces rebuild time when metric
infrastructure changes. Previously, a change would trigger a full Ray
rebuild; now, only the top-level entry points of each component need
rebuilding.
- There are a few exceptions where metrics are tracked inside object
libraries (e.g., task_specification). In these cases, metrics are
defined within the library itself, since there is no corresponding
top-level entry point.
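The general pattern described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Ray code: the `sketch::MetricInterface` below is a hypothetical one-method interface (the real `ray::observability::MetricInterface` may look different), and `RecordingCounter`, `TaskManager`, and `RunComponent` are invented names.

```cpp
#include <string>
#include <utility>

namespace sketch {

// Hypothetical minimal interface. The real ray::observability::MetricInterface
// may expose different methods; this is an assumption for illustration.
class MetricInterface {
 public:
  virtual ~MetricInterface() = default;
  virtual void Record(double value) = 0;
};

// Hypothetical concrete metric. In the pattern above, only the top-level
// entry point would need to see this type.
class RecordingCounter : public MetricInterface {
 public:
  explicit RecordingCounter(std::string name) : name_(std::move(name)) {}
  void Record(double value) override { total_ += value; }
  double total() const { return total_; }

 private:
  std::string name_;
  double total_ = 0.0;
};

// Hypothetical subcomponent. It depends only on the interface, so changes to
// the concrete metric implementation do not force it to be rebuilt.
class TaskManager {
 public:
  explicit TaskManager(MetricInterface &tasks_finished)
      : tasks_finished_(tasks_finished) {}
  void FinishTask() { tasks_finished_.Record(1.0); }

 private:
  MetricInterface &tasks_finished_;
};

// Stand-in for a component's main entry point: define the metric once,
// then pass it down to the subcomponent by reference.
double RunComponent(int num_tasks) {
  RecordingCounter tasks_finished("example_tasks_finished");
  TaskManager manager(tasks_finished);
  for (int i = 0; i < num_tasks; ++i) {
    manager.FinishTask();
  }
  return tasks_finished.total();
}

}  // namespace sketch
```

Because `TaskManager` includes only the interface, swapping `RecordingCounter` for a mock in tests is also straightforward.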
Test:
- CI
Signed-off-by: Cuong Nguyen <[email protected]>