Initial distributed tracing instrumentation#48

Merged
vMaroon merged 8 commits into
llm-d:mainfrom
sallyom:tracing
Feb 25, 2026
@sallyom
Contributor

@sallyom sallyom commented Jul 9, 2025

Initial distributed tracing instrumentation

@sallyom sallyom force-pushed the tracing branch 2 times, most recently from 3342ece to 74dac2d Compare July 30, 2025 00:53
Comment thread pkg/kvcache/indexer.go Outdated
Comment thread pkg/kvcache/indexer.go
func (k *Indexer) GetPodScores(ctx context.Context, renderReq *preprocessing.ApplyChatTemplateRequest, prompt, modelName string,
podIdentifiers []string,
) (map[string]float64, error) {
// Start tracing span for main operation
Member

+1 to getting tracing encapsulated. Some sparse inline tracing code is fine, but the bulk should be encapsulated. I would argue for keeping the inline bits and wrapping the indexer and scorer with tracing, either in this PR or in a follow-up. It's much cleaner that way and avoids needing another refactor.

@github-actions

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

Comment thread pkg/kvcache/indexer.go Outdated
sallyom and others added 7 commits February 24, 2026 19:36
- Use global otel.Tracer() in library code for scheduler integration
- Add telemetry initialization for standalone examples
- Add gRPC interceptors for trace context propagation

Co-Authored-By: Claude <noreply@anthropic.com>

Signed-off-by: sallyom <somalley@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
…ationName

Pass ctx through the Score interface so traced scorer spans are children
of the parent get_scores span instead of orphaned traces. Export a single
telemetry.InstrumentationName constant used by all tracing wrappers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: sallyom <somalley@redhat.com>
liu-cong previously approved these changes Feb 25, 2026
Member

@liu-cong liu-cong left a comment

/lgtm

/approve

I left a couple of nits for the naming

Comment thread pkg/kvcache/kvblock/traced_index.go Outdated
Signed-off-by: greg pereira <grpereir@redhat.com>
@vMaroon
Member

vMaroon commented Feb 25, 2026

/lgtm
/approve

@github-actions github-actions Bot added the lgtm Looks good to me, indicates that a PR is ready to be merged. label Feb 25, 2026
@vMaroon vMaroon merged commit 4130fbb into llm-d:main Feb 25, 2026
12 of 15 checks passed
zdtsw pushed a commit to zdtsw-forking/llm-d-kv-cache that referenced this pull request Mar 3, 2026
* Add OpenTelemetry manual instrumentation for distributed tracing

- Use global otel.Tracer() in library code for scheduler integration
- Add telemetry initialization for standalone examples
- Add gRPC interceptors for trace context propagation

Co-Authored-By: Claude <noreply@anthropic.com>

Signed-off-by: sallyom <somalley@redhat.com>

* fix: resolve lint errors in tracing instrumentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* fix: use url.Parse in stripScheme for robustness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* tracing: use block-level hit ratio instead of pod-level

Signed-off-by: sallyom <somalley@redhat.com>

* encapsulate the tracing logic

Signed-off-by: greg pereira <grpereir@redhat.com>

* Add context.Context to KVBlockScorer.Score and consolidate InstrumentationName

Pass ctx through the Score interface so traced scorer spans are children
of the parent get_scores span instead of orphaned traces. Export a single
telemetry.InstrumentationName constant used by all tracing wrappers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* fix lint errors

Signed-off-by: sallyom <somalley@redhat.com>

* cong feedback about span naming conventions

Signed-off-by: greg pereira <grpereir@redhat.com>

---------

Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greg pereira <grpereir@redhat.com>
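
One of the commits listed above switches stripScheme to url.Parse, but the page does not show the function body. The following is only a guess at what such a helper might look like, assuming its job is to reduce an endpoint like `http://collector:4317` to the `host:port` form. The behavior for inputs without a scheme and for unparseable inputs is an assumption, not the PR's implementation.

```go
package main

import (
	"net/url"
	"strings"
)

// stripScheme returns the host[:port] part of an endpoint such as
// "http://collector:4317". Hypothetical sketch; the real helper may differ.
func stripScheme(endpoint string) string {
	endpoint = strings.TrimSpace(endpoint)

	// A bare "host:port" has no "://". Parsing it as a URL would treat the
	// text before the colon as a scheme, so return it unchanged.
	if !strings.Contains(endpoint, "://") {
		return endpoint
	}

	u, err := url.Parse(endpoint)
	if err != nil || u.Host == "" {
		// Assumption: fall back to the input rather than failing.
		return endpoint
	}

	// u.Host keeps the port and drops the scheme, path, and query.
	return u.Host
}
```

Compared with trimming a fixed `http://` or `https://` prefix, parsing also discards any trailing path and handles schemes that were not anticipated.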
alexxfan pushed a commit to red-hat-data-services/llm-d-kv-cache that referenced this pull request Mar 3, 2026
guygir pushed a commit to guygir/llm-d-kv-cache-manager that referenced this pull request Apr 20, 2026
fix: nodeport service overwritten
5 participants