
feat: update ModelExpress metadata API to SourceIdentity-based schema#21222

Open
AndyDai-nv wants to merge 7 commits into sgl-project:main from AndyDai-nv:andy/mx-metadata-support

Conversation


@AndyDai-nv (Contributor) commented Mar 23, 2026

Summary

Update the ModelExpress integration to match the new SourceIdentity-based metadata API. This builds on the initial MX integration merged in #19920, adapting sglang to the redesigned metadata schema where sources are keyed by a SHA256 hash of their identity (model name, framework, parallelism, dtype, quantization) rather than plain model names.

Related: #19920 -- initial ModelExpress integration for remote instance weight loading

Motivation

The ModelExpress server redesigned its P2P metadata API to support:

  • SourceIdentity-based keying: sources are identified by a deterministic hash of their full configuration (model, framework, TP/PP/EP size, dtype, quantization), preventing mismatches between incompatible instances
  • Per-worker publish/get: each GPU worker publishes and is queried independently via (mx_source_id, worker_id) instead of bulk model-level operations
  • Explicit status management: update_status() replaces the removed publish_ready(), with clear lifecycle states (Initializing → Ready → Stale)
  • Two-step discovery: list_sources() returns lightweight refs, get_metadata() fetches full tensor data only for the chosen worker

The old API (publish_metadata(model_name, workers), wait_for_ready(), get_metadata(model_name)) has been removed from the MX server.
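To make the keying scheme concrete, here is a minimal sketch of how a deterministic SHA256 key over the full source configuration could work. This is an illustration only, not the actual MX proto or hashing scheme; the function name and field names are assumptions.

```python
import hashlib
import json
from typing import Optional


def source_identity_hash(model: str, framework: str, tp_size: int,
                         pp_size: int, ep_size: int, dtype: str,
                         quantization: Optional[str]) -> str:
    """Deterministic SHA256 key over the full source configuration.

    Two instances hash to the same key only if every field matches, so a
    client built with a different TP size or dtype can never be paired
    with an incompatible seed.
    """
    canonical = json.dumps(
        {
            "model": model,
            "framework": framework,
            "tp_size": tp_size,
            "pp_size": pp_size,
            "ep_size": ep_size,
            "dtype": dtype,
            "quantization": quantization,
        },
        sort_keys=True,  # canonical field order keeps the hash stable
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because parallelism and dtype are part of the key, changing e.g. --tp 8 to --tp 4 produces a different key, and stale sources from the old configuration simply never match.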

Changes

  • load_config.py: Add modelexpress_tp_size, modelexpress_pp_size, modelexpress_ep_size, modelexpress_dtype, modelexpress_quantization fields for building SourceIdentity on the client side
  • model_runner.py (seed side):
    • Build SourceIdentity proto with model config (framework=SGLANG, TP/PP/EP size, dtype, quantization)
    • Generate UUID worker_id per worker instance
    • Call publish_metadata(identity, worker, worker_id) → returns mx_source_id
    • Call update_status(mx_source_id, worker_id, worker_rank, READY) replacing removed publish_ready()
  • loader.py (client side):
    • Build matching SourceIdentity from LoadConfig fields
    • Call list_sources(identity, status_filter=READY) to discover seed workers, with fail-fast gRPC error handling
    • Filter by worker_rank == tp_rank to find the matching peer
    • Call get_metadata(mx_source_id, worker_id) for the specific worker (returns single response.worker instead of a worker list)
    • Fix weight info mismatch for quantized models by calling process_weights_after_loading() before memory registration
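The client-side steps above compose into roughly the flow below. The stub interface and SourceRef type are hypothetical stand-ins for the real gRPC client, shown only to illustrate the two-step discovery, rank filtering, and fail-fast error handling.

```python
from dataclasses import dataclass


@dataclass
class SourceRef:
    """Lightweight ref returned by list_sources (illustrative, not the real proto)."""
    mx_source_id: str
    worker_id: str
    worker_rank: int


def discover_peer(stub, identity, tp_rank: int):
    """Two-step discovery: list READY refs, pick the rank-matched worker,
    then fetch full tensor metadata for that worker only."""
    try:
        refs = stub.list_sources(identity, status_filter="READY")
    except Exception as e:  # grpc.RpcError in the real client; fail fast
        raise RuntimeError(f"ModelExpress list_sources failed: {e}") from e
    matches = [r for r in refs if r.worker_rank == tp_rank]
    if not matches:
        raise RuntimeError(f"no READY seed worker for tp_rank={tp_rank}")
    ref = matches[0]
    # Full tensor metadata is fetched only for the chosen worker.
    return stub.get_metadata(ref.mx_source_id, ref.worker_id)
```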

API Migration

| Old API (removed) | New API |
| --- | --- |
| publish_metadata(model_name, [workers]) | publish_metadata(identity, worker, worker_id) → mx_source_id |
| publish_ready(model_name, worker_id, ...) | update_status(mx_source_id, worker_id, worker_rank, READY) |
| wait_for_ready(model_name, worker_id) | list_sources(identity, status_filter=READY) + filter by rank |
| get_metadata(model_name) → response.workers | get_metadata(mx_source_id, worker_id) → response.worker |
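On the seed side, the new calls in the migration table compose into the flow below. The client object and status string are illustrative assumptions, not the real MX stub.

```python
import uuid


def publish_seed_worker(client, identity, worker_meta, worker_rank: int) -> str:
    """Publish one GPU worker under a SourceIdentity, then mark it READY.

    Replaces the old model-level publish_metadata(model_name, workers)
    plus publish_ready(...) pair.
    """
    worker_id = str(uuid.uuid4())  # unique per worker instance
    # publish_metadata now returns the SHA256-keyed source id
    mx_source_id = client.publish_metadata(identity, worker_meta, worker_id)
    # explicit lifecycle: Initializing -> Ready -> Stale
    client.update_status(mx_source_id, worker_id, worker_rank, "READY")
    return mx_source_id
```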

CLI

Unchanged from #19920:

Seed:

python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 --tp 8 --port 30000 \
  --load-format auto \
  --remote-instance-weight-loader-start-seed-via-transfer-engine \
  --modelexpress-config '{"url": "localhost:8001", "source": true}'

Target:

python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 --tp 8 --port 30001 \
  --load-format remote_instance \
  --remote-instance-weight-loader-backend modelexpress \
  --modelexpress-config '{"url": "localhost:8001"}'

Test plan

  • Qwen2.5-7B (BF16, TP=1): seed publish + client discovery + RDMA transfer
  • DeepSeek-V3 (FP8, TP=8): multi-GPU weight transfer via MX coordination
  • Backward compat: existing NCCL and TransferEngine backends unaffected

Backward compatibility

Fully backward compatible. All changes are within the modelexpress code path -- existing NCCL and TransferEngine+HTTP paths are untouched.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ModelExpress metadata API to provide more comprehensive and robust information for RDMA transfers. By incorporating detailed SourceIdentity attributes and unique worker identifiers, it improves the ability of clients to accurately discover and coordinate with model instances in a distributed setup, leading to more reliable model loading and operation.

Highlights

  • Enhanced ModelExpress Metadata: Introduced new configuration fields and updated the ModelExpress metadata API to include detailed SourceIdentity information, such as tensor, pipeline, and expert parallel sizes, data type, and quantization.
  • Robust Seed Discovery: Refactored the client-side ModelExpress metadata loading process to use the new SourceIdentity for more precise matching and discovery of seed instances, improving robustness in distributed environments.
  • Unique Worker Identification: Implemented the generation and publishing of a unique worker_id for each running instance, enabling better tracking and management of individual model workers.



@gemini-code-assist (bot) left a comment


Code Review

The pull request refactors ModelExpress integration by introducing SourceIdentity for more structured metadata publishing and discovery. This includes adding new configuration fields for parallelism, data types, and quantization, and updating both the seed's metadata publication and the client's discovery mechanism to use this identity and unique worker IDs. Review comments suggest improving code organization by moving an import uuid statement to the top of its file and enhancing error handling by including the specific grpc.RpcError message in the raised RuntimeError for better context.

@AndyDai-nv AndyDai-nv changed the title feat: support robust ModelExpress metadata API for RDMA transfer feat: update ModelExpress metadata API to SourceIdentity-based schema Apr 6, 2026
@AndyDai-nv AndyDai-nv marked this pull request as ready for review April 6, 2026 17:43
@AndyDai-nv
Contributor Author

@amysaq2023 @Fridge003 @ishandhanani Could you review this PR when you get a chance?
