
expose TransferQueueClient #3

Merged
0oshowero0 merged 1 commit into TransferQueue:main from ji-huazhong:main on Sep 24, 2025

Conversation

@ji-huazhong
Collaborator

What does this PR do?

As per the title.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with commas, like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes, if at all, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@0oshowero0 requested a review from Copilot on September 24, 2025 03:36

Copilot AI left a comment


Pull Request Overview

This PR exposes the TransferQueueClient classes by adding a complete client implementation for the transfer queue system. Its primary purpose is to provide both synchronous and asynchronous client interfaces for interacting with transfer queue controllers and storage units via ZMQ messaging.

Key changes:

  • Adds a comprehensive client implementation with async operations (get_meta, put, get_data, clear)
  • Provides a synchronous wrapper for easier integration
  • Implements dynamic socket management with automatic connection handling
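A synchronous wrapper over an async client is a common way to let blocking code reuse async I/O logic. The sketch below shows one such pattern, assuming a private event loop per wrapper. AsyncDemoClient and SyncDemoClient are hypothetical stand-ins that use an in-memory dict instead of ZMQ; they are not the PR's AsyncTransferQueueClient and TransferQueueClient.

```python
import asyncio
from typing import Any


class AsyncDemoClient:
    """Hypothetical async client standing in for AsyncTransferQueueClient."""

    def __init__(self) -> None:
        self._store: dict[int, Any] = {}

    async def put(self, index: int, value: Any) -> None:
        await asyncio.sleep(0)  # placeholder for a ZMQ request/response round trip
        self._store[index] = value

    async def get_data(self, indexes: list[int]) -> list[Any]:
        await asyncio.sleep(0)  # placeholder for a ZMQ request/response round trip
        return [self._store[i] for i in indexes]


class SyncDemoClient:
    """Blocking facade that runs each async call to completion on a private event loop."""

    def __init__(self, async_client: AsyncDemoClient) -> None:
        self._client = async_client
        self._loop = asyncio.new_event_loop()

    def put(self, index: int, value: Any) -> None:
        self._loop.run_until_complete(self._client.put(index, value))

    def get_data(self, indexes: list[int]) -> list[Any]:
        return self._loop.run_until_complete(self._client.get_data(indexes))

    def close(self) -> None:
        self._loop.close()
```

Because run_until_complete cannot be called while another event loop is already running in the same thread, a wrapper like this only suits plain synchronous callers; code that is already async should await the async client directly.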

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
verl/experimental/transfer_queue/client.py | New file containing the AsyncTransferQueueClient and TransferQueueClient classes, with ZMQ-based communication
verl/experimental/transfer_queue/storage.py | Copyright year update from 2024 to 2025
verl/experimental/transfer_queue/init.py | Copyright year update from 2024 to 2025


logger.info(f"[{self.client_id}]: get data response from storage unit {target_storage}: {response_msg}")

if response_msg.request_type == ZMQRequestType.GET_DATA_RESPONSE:
    # 返回该存储单元的数据和索引信息

Copilot AI Sep 24, 2025


Comment contains Chinese text. Should be translated to English: '# Return data and index information from this storage unit'

Suggested change
# 返回该存储单元的数据和索引信息
# Return data and index information from this storage unit

Copilot uses AI. Check for mistakes.
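The reviewed branch only unpacks a storage unit's reply when its type is GET_DATA_RESPONSE. A minimal sketch of that dispatch, assuming an error reply type and a reply body that carries the data plus its indexes; RequestType, Message, unpack_get_data_response, GET_DATA_ERROR, and the body keys are hypothetical and not taken from the PR:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class RequestType(Enum):
    # Hypothetical subset of message types; the real ZMQRequestType may differ.
    GET_DATA = "GET_DATA"
    GET_DATA_RESPONSE = "GET_DATA_RESPONSE"
    GET_DATA_ERROR = "GET_DATA_ERROR"


@dataclass
class Message:
    request_type: RequestType
    sender_id: str
    body: dict[str, Any] = field(default_factory=dict)


def unpack_get_data_response(msg: Message) -> tuple[Any, list[int]]:
    """Return (data, global_indexes) from a storage unit's reply, or raise on any other reply type."""
    if msg.request_type == RequestType.GET_DATA_RESPONSE:
        # Return the data and index information from this storage unit.
        return msg.body["data"], msg.body["global_indexes"]
    reason = msg.body.get("message", msg.request_type)
    raise RuntimeError(f"Storage unit {msg.sender_id} failed to return data: {reason}")
```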

for info in server_infos.values():
    if not isinstance(info, ZMQServerInfo):
        raise ValueError(f"Invalid server info for {role} {id}")

Copilot AI Sep 24, 2025


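Variable 'id' is not defined in this scope, so it falls back to Python's built-in id() function and the message would contain '<built-in function id>' instead of the server ID. Should use 'info.id' instead.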

Suggested change
raise ValueError(f"Invalid server info for {role} {id}")
raise ValueError(f"Invalid server info for {role} {info.id}")

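One caveat on the suggested fix: it reads info.id from an object that has just failed the isinstance check, so it may raise AttributeError instead of the intended ValueError. Iterating over items() and reporting the dict key avoids that. A sketch assuming the dict is keyed by server ID; ServerInfo and validate_server_infos are hypothetical stand-ins for ZMQServerInfo and the loop in the PR:

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-in for ZMQServerInfo; only the fields used below are modeled.
@dataclass
class ServerInfo:
    role: str
    id: str
    address: str


def validate_server_infos(role: str, server_infos: dict[Any, Any]) -> dict[Any, ServerInfo]:
    """Raise ValueError if any value is not a ServerInfo, naming the offending dict key."""
    for key, info in server_infos.items():
        if not isinstance(info, ServerInfo):
            # Report the key: an invalid `info` may not have an `.id` attribute at all.
            raise ValueError(f"Invalid server info for {role} {key}: got {type(info).__name__}")
    return server_infos
```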
return result


def process_zmq_server_info(handlers: dict[Any, Union[TransferQueueController, TransferQueueStorageSimpleUnit]]): # noqa: UP007

Copilot AI Sep 24, 2025


[nitpick] The noqa comment 'UP007' suggests this is suppressing a ruff rule about Union syntax. Consider using the modern pipe syntax 'TransferQueueController | TransferQueueStorageSimpleUnit' instead of Union, or remove the noqa if the suppression is no longer needed.

Suggested change
def process_zmq_server_info(handlers: dict[Any, Union[TransferQueueController, TransferQueueStorageSimpleUnit]]): # noqa: UP007
def process_zmq_server_info(handlers: dict[Any, TransferQueueController | TransferQueueStorageSimpleUnit]):

@0oshowero0 merged commit 8aa4bb2 into TransferQueue:main on Sep 24, 2025
0oshowero0 pushed a commit that referenced this pull request Sep 28, 2025
ji-huazhong added a commit that referenced this pull request Nov 18, 2025
…oller

* Support storage unit in TransferQueue

* Fix import error

* Support controller in TransferQueue (#2)

* Support controller in TransferQueue

* Fix import

* Fix comments

---------

Co-authored-by: liuximeng <13073314+liuximeng18772102439@user.noreply.gitee.com>

* expose TransferQueueClient (#3)

* Add copyright and license information

Added copyright and licensing information to the controller.py file.

* update client docstring (#5)

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* merge TransferQueue utils (#4)

* [fix] Fix n_sample related problems (#8)

* update client docstring

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* fix n_sample related problems

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

---------

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* expose TransferQueue client/controller UT (#6)

* Add metadata.py and test_simple_storage_unit.py (#9)

* Add metadata.py and test_simple_storage_unit.py

* Add copyright and license information to test_simple_storage_unit.py

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Han Zhenyu 韩振宇 <o0shower0o@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add reorder function to BatchMeta (#13)

Co-authored-by: liuximeng <13073314+liuximeng18772102439@user.noreply.gitee.com>

* [recipe, data] feat: TransferQueue - Support managing multiple data partitions for Train/Val/Test in controller (#45)

Co-authored-by: liuximeng <13073314+liuximeng18772102439@user.noreply.gitee.com>

* delete TQ source codes

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* update docs

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* update performance

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* fix

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

---------

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
Co-authored-by: FightingZhen <295632982@qq.com>
Co-authored-by: Han Zhenyu 韩振宇 <o0shower0o@outlook.com>
Co-authored-by: LLLLxmmm <130739718+LLLLxmmm@users.noreply.github.com>
Co-authored-by: liuximeng <13073314+liuximeng18772102439@user.noreply.gitee.com>
Co-authored-by: Han Zhenyu 韩振宇 <hanzy19@tsinghua.org.cn>
Co-authored-by: zhabuye <74179177+zhabuye@users.noreply.github.com>
Co-authored-by: Jianjun Zhong <87791082+jianjunzhong@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Labels

None yet

Projects

None yet


3 participants