
feat: update ModelExpress metadata API to SourceIdentity-based schema#21222

Open
AndyDai-nv wants to merge 7 commits into sgl-project:main from AndyDai-nv:andy/mx-metadata-support

Conversation


@AndyDai-nv (Contributor) commented Mar 23, 2026

Summary

Update the ModelExpress integration to match the new SourceIdentity-based metadata API. This builds on the initial MX integration merged in #19920, adapting sglang to the redesigned metadata schema where sources are keyed by a SHA256 hash of their identity (model name, framework, parallelism, dtype, quantization) rather than plain model names.

Related: #19920 -- initial ModelExpress integration for remote instance weight loading

Motivation

The ModelExpress server redesigned its P2P metadata API to support:

  • SourceIdentity-based keying: sources are identified by a deterministic hash of their full configuration (model, framework, TP/PP/EP size, dtype, quantization), preventing mismatches between incompatible instances
  • Per-worker publish/get: each GPU worker publishes and is queried independently via (mx_source_id, worker_id) instead of bulk model-level operations
  • Explicit status management: update_status() replaces the removed publish_ready(), with clear lifecycle states (Initializing → Ready → Stale)
  • Two-step discovery: list_sources() returns lightweight refs, get_metadata() fetches full tensor data only for the chosen worker

The old API (publish_metadata(model_name, workers), wait_for_ready(), get_metadata(model_name)) has been removed from the MX server.
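To make the keying scheme concrete, here is a minimal sketch of how a deterministic SHA256 key over the full source configuration could work. This is an illustration only, not the actual MX proto or hashing scheme; the function name and field names are assumptions.

```python
import hashlib
import json
from typing import Optional


def source_identity_hash(model: str, framework: str, tp_size: int,
                         pp_size: int, ep_size: int, dtype: str,
                         quantization: Optional[str]) -> str:
    """Deterministic SHA256 key over the full source configuration.

    Two instances hash to the same key only if every field matches, so a
    client built with a different TP size or dtype can never be paired
    with an incompatible seed.
    """
    canonical = json.dumps(
        {
            "model": model,
            "framework": framework,
            "tp_size": tp_size,
            "pp_size": pp_size,
            "ep_size": ep_size,
            "dtype": dtype,
            "quantization": quantization,
        },
        sort_keys=True,  # canonical field order keeps the hash stable
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because parallelism and dtype are part of the key, changing e.g. --tp 8 to --tp 4 produces a different key, and stale sources from the old configuration simply never match.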

Changes

  • load_config.py: Add modelexpress_tp_size, modelexpress_pp_size, modelexpress_ep_size, modelexpress_dtype, modelexpress_quantization fields for building SourceIdentity on the client side
  • model_runner.py (seed side):
    • Build SourceIdentity proto with model config (framework=SGLANG, TP/PP/EP size, dtype, quantization)
    • Generate UUID worker_id per worker instance
    • Call publish_metadata(identity, worker, worker_id) → returns mx_source_id
    • Call update_status(mx_source_id, worker_id, worker_rank, READY) replacing removed publish_ready()
  • loader.py (client side):
    • Build matching SourceIdentity from LoadConfig fields
    • Call list_sources(identity, status_filter=READY) to discover seed workers, with fail-fast gRPC error handling
    • Filter by worker_rank == tp_rank to find the matching peer
    • Call get_metadata(mx_source_id, worker_id) for the specific worker (returns single response.worker instead of a worker list)
    • Fix weight info mismatch for quantized models by calling process_weights_after_loading() before memory registration
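The client-side steps above compose into roughly the flow below. The stub interface and SourceRef type are hypothetical stand-ins for the real gRPC client, shown only to illustrate the two-step discovery, rank filtering, and fail-fast error handling.

```python
from dataclasses import dataclass


@dataclass
class SourceRef:
    """Lightweight ref returned by list_sources (illustrative, not the real proto)."""
    mx_source_id: str
    worker_id: str
    worker_rank: int


def discover_peer(stub, identity, tp_rank: int):
    """Two-step discovery: list READY refs, pick the rank-matched worker,
    then fetch full tensor metadata for that worker only."""
    try:
        refs = stub.list_sources(identity, status_filter="READY")
    except Exception as e:  # grpc.RpcError in the real client; fail fast
        raise RuntimeError(f"ModelExpress list_sources failed: {e}") from e
    matches = [r for r in refs if r.worker_rank == tp_rank]
    if not matches:
        raise RuntimeError(f"no READY seed worker for tp_rank={tp_rank}")
    ref = matches[0]
    # Full tensor metadata is fetched only for the chosen worker.
    return stub.get_metadata(ref.mx_source_id, ref.worker_id)
```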

API Migration

| Old API (removed) | New API |
| --- | --- |
| publish_metadata(model_name, [workers]) | publish_metadata(identity, worker, worker_id) → mx_source_id |
| publish_ready(model_name, worker_id, ...) | update_status(mx_source_id, worker_id, worker_rank, READY) |
| wait_for_ready(model_name, worker_id) | list_sources(identity, status_filter=READY) + filter by rank |
| get_metadata(model_name) → response.workers | get_metadata(mx_source_id, worker_id) → response.worker |
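On the seed side, the new calls in the migration table compose into the flow below. The client object and status string are illustrative assumptions, not the real MX stub.

```python
import uuid


def publish_seed_worker(client, identity, worker_meta, worker_rank: int) -> str:
    """Publish one GPU worker under a SourceIdentity, then mark it READY.

    Replaces the old model-level publish_metadata(model_name, workers)
    plus publish_ready(...) pair.
    """
    worker_id = str(uuid.uuid4())  # unique per worker instance
    # publish_metadata now returns the SHA256-keyed source id
    mx_source_id = client.publish_metadata(identity, worker_meta, worker_id)
    # explicit lifecycle: Initializing -> Ready -> Stale
    client.update_status(mx_source_id, worker_id, worker_rank, "READY")
    return mx_source_id
```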

CLI

Unchanged from #19920:

Seed:

python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 --tp 8 --port 30000 \
  --load-format auto \
  --remote-instance-weight-loader-start-seed-via-transfer-engine \
  --modelexpress-config '{"url": "localhost:8001", "source": true}'

Target:

python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 --tp 8 --port 30001 \
  --load-format remote_instance \
  --remote-instance-weight-loader-backend modelexpress \
  --modelexpress-config '{"url": "localhost:8001"}'

Test plan

  • Qwen2.5-7B (BF16, TP=1): seed publish + client discovery + RDMA transfer
  • DeepSeek-V3 (FP8, TP=8): multi-GPU weight transfer via MX coordination
  • Backward compat: existing NCCL and TransferEngine backends unaffected

Backward compatibility

Fully backward compatible. All changes are within the modelexpress code path -- existing NCCL and TransferEngine+HTTP paths are untouched.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ModelExpress metadata API to provide more comprehensive and robust information for RDMA transfers. By incorporating detailed SourceIdentity attributes and unique worker identifiers, it improves the ability of clients to accurately discover and coordinate with model instances in a distributed setup, leading to more reliable model loading and operation.

Highlights

  • Enhanced ModelExpress Metadata: Introduced new configuration fields and updated the ModelExpress metadata API to include detailed SourceIdentity information, such as tensor, pipeline, and expert parallel sizes, data type, and quantization.
  • Robust Seed Discovery: Refactored the client-side ModelExpress metadata loading process to use the new SourceIdentity for more precise matching and discovery of seed instances, improving robustness in distributed environments.
  • Unique Worker Identification: Implemented the generation and publishing of a unique worker_id for each running instance, enabling better tracking and management of individual model workers.



@gemini-code-assist (bot) left a comment


Code Review

The pull request refactors ModelExpress integration by introducing SourceIdentity for more structured metadata publishing and discovery. This includes adding new configuration fields for parallelism, data types, and quantization, and updating both the seed's metadata publication and the client's discovery mechanism to use this identity and unique worker IDs. Review comments suggest improving code organization by moving an import uuid statement to the top of its file and enhancing error handling by including the specific grpc.RpcError message in the raised RuntimeError for better context.

@AndyDai-nv AndyDai-nv changed the title feat: support robust ModelExpress metadata API for RDMA transfer feat: update ModelExpress metadata API to SourceIdentity-based schema Apr 6, 2026
@AndyDai-nv AndyDai-nv marked this pull request as ready for review April 6, 2026 17:43
@AndyDai-nv
Contributor Author

@amysaq2023 @Fridge003 @ishandhanani Could you review this PR when you get a chance?
