feat: update ModelExpress metadata API to SourceIdentity-based schema #21222
AndyDai-nv wants to merge 7 commits into sgl-project:main
Conversation
Code Review
The pull request refactors the ModelExpress integration by introducing SourceIdentity for more structured metadata publishing and discovery. This includes adding new configuration fields for parallelism, data types, and quantization, and updating both the seed's metadata publication and the client's discovery mechanism to use this identity and unique worker IDs. Review comments suggest improving code organization by moving an `import uuid` statement to the top of its file, and enhancing error handling by including the specific `grpc.RpcError` message in the raised `RuntimeError` for better context.
@amysaq2023 @Fridge003 @ishandhanani Could you review this PR when you get a chance?
Summary
Update the ModelExpress integration to match the new SourceIdentity-based metadata API. This builds on the initial MX integration merged in #19920, adapting sglang to the redesigned metadata schema where sources are keyed by a SHA256 hash of their identity (model name, framework, parallelism, dtype, quantization) rather than plain model names.
Related: #19920 -- initial ModelExpress integration for remote instance weight loading
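To make the keying scheme concrete, here is a minimal sketch of deriving an `mx_source_id` by hashing the identity fields listed above. The field names, ordering, and JSON canonicalization are illustrative assumptions; the real MX server may serialize the proto differently, only the idea of a deterministic SHA256 key over the identity is taken from this PR.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class SourceIdentity:
    """Identity fields that key a metadata source (names are illustrative)."""
    model_name: str
    framework: str
    tp_size: int
    pp_size: int
    ep_size: int
    dtype: str
    quantization: Optional[str]


def mx_source_id(identity: SourceIdentity) -> str:
    # Serialize deterministically, then hash. The real server's canonical
    # encoding (e.g. proto bytes) may differ; only the principle is the same.
    canonical = json.dumps(asdict(identity), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


identity = SourceIdentity(
    model_name="deepseek-ai/DeepSeek-V3",
    framework="SGLANG",
    tp_size=8,
    pp_size=1,
    ep_size=1,
    dtype="bfloat16",
    quantization=None,
)
print(mx_source_id(identity))  # 64 hex chars, stable across processes
```

Under this scheme, a client configured with different parallelism, dtype, or quantization computes a different key and simply never matches the seed, rather than exchanging incompatible tensors.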
Motivation
The ModelExpress server redesigned its P2P metadata API to support:
- Per-worker operations keyed by `(mx_source_id, worker_id)` instead of bulk model-level operations
- `update_status()` replaces the removed `publish_ready()`, with clear lifecycle states (Initializing → Ready → Stale)
- `list_sources()` returns lightweight refs; `get_metadata()` fetches full tensor data only for the chosen worker

The old API (`publish_metadata(model_name, workers)`, `wait_for_ready()`, `get_metadata(model_name)`) has been removed from the MX server.

Changes
- `load_config.py`: add `modelexpress_tp_size`, `modelexpress_pp_size`, `modelexpress_ep_size`, `modelexpress_dtype`, and `modelexpress_quantization` fields for building `SourceIdentity` on the client side
- `model_runner.py` (seed side):
  - Build the `SourceIdentity` proto from the model config (framework=SGLANG, TP/PP/EP size, dtype, quantization)
  - Generate a unique `worker_id` per worker instance
  - `publish_metadata(identity, worker, worker_id)` → returns `mx_source_id`
  - `update_status(mx_source_id, worker_id, worker_rank, READY)`, replacing the removed `publish_ready()`
- `loader.py` (client side):
  - Build `SourceIdentity` from `LoadConfig` fields
  - `list_sources(identity, status_filter=READY)` to discover seed workers, with fail-fast gRPC error handling
  - Match `worker_rank == tp_rank` to find the matching peer
  - `get_metadata(mx_source_id, worker_id)` for the specific worker (returns a single `response.worker` instead of a worker list)
  - `process_weights_after_loading()` before memory registration

API Migration
| Old API | New API |
| --- | --- |
| `publish_metadata(model_name, [workers])` | `publish_metadata(identity, worker, worker_id)` → `mx_source_id` |
| `publish_ready(model_name, worker_id, ...)` | `update_status(mx_source_id, worker_id, worker_rank, READY)` |
| `wait_for_ready(model_name, worker_id)` | `list_sources(identity, status_filter=READY)` + filter by rank |
| `get_metadata(model_name)` → `response.workers` | `get_metadata(mx_source_id, worker_id)` → `response.worker` |

CLI
Unchanged from #19920:
Seed:
```
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 --tp 8 --port 30000 \
  --load-format auto \
  --remote-instance-weight-loader-start-seed-via-transfer-engine \
  --modelexpress-config '{"url": "localhost:8001", "source": true}'
```

Target:
```
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 --tp 8 --port 30001 \
  --load-format remote_instance \
  --remote-instance-weight-loader-backend modelexpress \
  --modelexpress-config '{"url": "localhost:8001"}'
```

Test plan
Backward compatibility
Fully backward compatible. All changes are within the `modelexpress` code path -- existing NCCL and TransferEngine+HTTP paths are untouched.
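To make the migrated call sequence above concrete, here is a minimal sketch of the seed-side publish and client-side discovery flow against an in-memory stand-in for the MX server. The class, method signatures, and data shapes are assumptions that only mirror the call names in this PR; they are not the real gRPC stubs.

```python
import hashlib
import uuid

READY = "Ready"


class FakeMxServer:
    """In-memory stand-in for the ModelExpress metadata service (illustrative only)."""

    def __init__(self):
        self._workers = {}   # mx_source_id -> {worker_id: worker_metadata}
        self._status = {}    # (mx_source_id, worker_id) -> (worker_rank, status)
        self._identity = {}  # mx_source_id -> identity

    def publish_metadata(self, identity, worker, worker_id):
        # Key the source by a SHA256 hash of its identity fields.
        canonical = repr(sorted(identity.items()))
        sid = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        self._workers.setdefault(sid, {})[worker_id] = worker
        self._identity[sid] = identity
        return sid

    def update_status(self, mx_source_id, worker_id, worker_rank, status):
        self._status[(mx_source_id, worker_id)] = (worker_rank, status)

    def list_sources(self, identity, status_filter):
        # Lightweight refs only -- no tensor metadata.
        return [
            {"mx_source_id": sid, "worker_id": wid, "worker_rank": rank}
            for (sid, wid), (rank, status) in self._status.items()
            if self._identity.get(sid) == identity and status == status_filter
        ]

    def get_metadata(self, mx_source_id, worker_id):
        # Full metadata for one worker only.
        return self._workers[mx_source_id][worker_id]


server = FakeMxServer()
identity = {"model": "deepseek-ai/DeepSeek-V3", "framework": "SGLANG", "tp": 2}

# Seed side: publish each worker under a unique worker_id, then mark it READY.
for rank in range(2):
    worker_id = str(uuid.uuid4())
    sid = server.publish_metadata(identity, {"tensors": f"rank{rank}-meta"}, worker_id)
    server.update_status(sid, worker_id, rank, READY)

# Client side: list READY sources, match worker_rank == tp_rank, fetch one worker.
tp_rank = 1
refs = server.list_sources(identity, status_filter=READY)
peer = next(r for r in refs if r["worker_rank"] == tp_rank)
meta = server.get_metadata(peer["mx_source_id"], peer["worker_id"])
print(meta)  # {'tensors': 'rank1-meta'}
```

Note how the client fetches metadata for exactly one worker rather than the whole model, which is the point of the per-worker `(mx_source_id, worker_id)` keying.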