[AMD] Fix CI RuntimeError: opentelemetry package is not installed #23940
Code Review
This pull request updates python/pyproject_other.toml to include sglang[tracing] in the all_hip dependency group. The reviewer suggests extending this change to other platform-specific groups (all_hpu, all_musa, all_mps) and the test extra to ensure consistency and prevent CI failures across different environments.
  ]

- all_hip = ["sglang[srt_hip]", "sglang[diffusion_hip]"]
+ all_hip = ["sglang[srt_hip]", "sglang[diffusion_hip]", "sglang[tracing]"]
While adding sglang[tracing] to all_hip correctly addresses the issue for AMD ROCm CI, the same inconsistency exists for other platform-specific 'all' extras in this file (all_hpu, all_musa, all_mps). To ensure feature parity and prevent similar CI failures on these platforms, consider adding sglang[tracing] to them as well.
Additionally, since the failure occurred during tracing-related tests, it might be beneficial to include sglang[tracing] in the test extra (line 157) to ensure these dependencies are always available for testing environments, regardless of the platform-specific 'all' extra used.
@amd-bot ci-status
CI Status for PR #23940
PR: [AMD] Fix CI RuntimeError: opentelemetry package is not installed
AMD: 8 failures (0 likely related) | Others: 0 failures
The PR change is verified working; installer now logs
AMD CI Failures
Details
The PR's intended fix is verified working. In baseline scheduled run 25053383235, the
The single 🟡 row is
Bottom line: the PR fix is correct and minimally scoped. None of the 8 red jobs implicates the PR's diff. Recommend rerunning the AMD jobs (especially
Tests passed. Can be merged.
…l-project#23940) Co-authored-by: Bingxu Chen <bingxche@amd.com>

Motivation
The AMD ROCm CI is failing on all tracing-related tests across multiple jobs:
- multimodal-gen-test-1-gpu-amd: test_spans_exported, test_spans_without_traceparent, test_batch_requests in test_tracing.py
- multimodal-gen-test-2-gpu-amd: TestDisaggZImageTracing::test_disagg_spans_share_trace_id in test_disagg_server.py
All fail with the same root cause, the RuntimeError named in the title: opentelemetry package is not installed.
This was introduced when OpenTelemetry tracing was added to the diffusion pipeline (#21254) and tracing CI tests were added (#21740), but the tracing optional dependency group was never wired into the ROCm install path.
The CUDA pyproject.toml correctly includes sglang[tracing] in its all extra (line 169), but the ROCm-specific pyproject_other.toml does not include it in all_hip.
Modifications
Single-line change in python/pyproject_other.toml: add "sglang[tracing]" to the all_hip optional dependency group.
The tracing extra was already defined in the same file (lines 84-89) with the correct packages:
- opentelemetry-sdk
- opentelemetry-api
- opentelemetry-exporter-otlp
- opentelemetry-exporter-otlp-proto-grpc
The ROCm Dockerfile (docker/rocm.Dockerfile, line 256) installs python[all_hip], so this change automatically pulls in the tracing packages during image build without any Dockerfile modifications.
Accuracy Tests
N/A - dependency-only change, no model or kernel code affected.
Speed Tests and Profiling
N/A - no runtime code changes.
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci