Introduce async scheduler implementation with mixin pattern#941
Introduce async scheduler implementation with mixin pattern#941GOavi101 wants to merge 1 commit intotorch-spyre:mainfrom
Conversation
|
👋 Hi! Thank you for contributing. We also recommend installing prek and configuring it to check your code before every local commit. |
1a3ecbb to
b0e8e83
Compare
| SchedulerOutput = None | ||
|
|
||
| logger = init_logger(__name__) | ||
| from vllm_spyre.v1.core.scheduler_impl import ( |
There was a problem hiding this comment.
@GOavi101 it looks like most of this file has been deleted and moved to scheduler_impl. Can you put the implementation back in this file so that reviewers can see what's changed?
There was a problem hiding this comment.
Thanks, I've looked through the tests but I'll wait to review the code changes until after this diff is in nicer shape- I don't really want to try to recreate the diff myself 😉
Replace _create_pooling_scheduler() and _create_chunked_prefill_scheduler() factory functions with PoolingSpyreMixin and ChunkedPrefillSpyreMixin classes. Each mixin uses _is_async_scheduler() (isinstance check) to detect the concrete base class at runtime and adjust behaviour accordingly, instead of capturing is_async via a closure variable. Concrete classes use simple multiple inheritance: class PoolingSpyreScheduler(PoolingSpyreMixin, Scheduler): pass class AsyncPoolingSpyreScheduler(PoolingSpyreMixin, AsyncScheduler): pass class ChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, Scheduler): pass class AsyncChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, AsyncScheduler): pass Side effects: - __module__/__name__/__qualname__ fixup blocks removed (no longer needed) - _async_warning_logged flag removed (debug log emitted each call is fine) - TYPE_CHECKING import removed (unused after refactor) Signed-off-by: Avishek Goswami <avishek.goswami@ibm.com>
b0e8e83 to
d71cfb3
Compare
| return EMPTY_MODEL_RUNNER_OUTPUT | ||
| cached = self._last_execute_model_output | ||
| self._last_execute_model_output = None | ||
| return cached if cached is not None else EMPTY_MODEL_RUNNER_OUTPUT |
There was a problem hiding this comment.
Ideally we would actually run the sampling here - see related comment on the structured output PR: #903 (comment)
I'm fine with leaving this as-is and then fixing it to work with both async scheduling and structured outputs in a followup. Issue opened here: #947
| Key behaviours under test: | ||
| - _is_async_scheduler() correctly identifies async vs sync instances | ||
| - PoolingSpyreMixin.schedule() applies warmup-shape constraints in both modes | ||
| - ChunkedPrefillSpyreMixin.schedule() bypasses Spyre constraints in async mode |
There was a problem hiding this comment.
This statement seems incorrect- we definitely can't just bypass spyre constraints because there are hard limits to what we can run on the cards. What's really going on?
| is_pooling=True, | ||
| ) | ||
| # Set as string path for vLLM's resolution (matches upstream behavior) | ||
| # Only convert to string if it's not already a string |
There was a problem hiding this comment.
a class should be fine to pass here though, what goes wrong?
| # The mixin's pre-filter pattern is not safe under that run-ahead scenario. | ||
| # For TP=1 (UniProcExecutor), futures are immediately done so it's safe. | ||
| if parallel_config.world_size > 1: | ||
| scheduler_config.async_scheduling = False |
There was a problem hiding this comment.
Interesting- if we wanted to support this feature then it would likely need to work with TP=4 which is how we run most models. I thought this was only incompatible with pipeline parallel upstream - does it also not work with tensor parallel?
|
Thanks @GOavi101! A few notes:
|
Description
Introduce async scheduler implementation with mixin pattern for cleaner architecture.
New Implementation (mixins)
PoolingSpyreMixinandChunkedPrefillSpyreMixinclasses_is_async_scheduler()(isinstance check)class PoolingSpyreScheduler(PoolingSpyreMixin, Scheduler):class AsyncPoolingSpyreScheduler(PoolingSpyreMixin, AsyncScheduler):class ChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, Scheduler):class AsyncChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, AsyncScheduler):Related Issues
Test Plan
tests/v1/core/test_async_scheduler.py(16 tests):TestIsAsyncScheduler: Verifies_is_async_scheduler()detection (4 tests)TestPoolingSpyreMixinSchedule: Tests warmup-shape constraints in sync/async modes (4 tests)TestChunkedPrefillSpyreMixinSchedule: Verifies constraint bypass in async mode (3 tests)TestChunkedPrefillSpyreMixinUpdateFromOutput: Tests scheduler output filtering in async mode (5 tests)Checklist
bash format.sh)Signed-off-by:line (DCO compliance)