
Introduce async scheduler implementation with mixin pattern#941

Draft
GOavi101 wants to merge 1 commit into torch-spyre:main from GOavi101:feature/async-scheduler-mixin-pattern

Conversation


@GOavi101 GOavi101 commented Apr 21, 2026

Description

Introduce async scheduler implementation with mixin pattern for cleaner architecture.

New Implementation (mixins)

  • PoolingSpyreMixin and ChunkedPrefillSpyreMixin classes
  • Runtime detection via _is_async_scheduler() (isinstance check)
  • Simple multiple inheritance for concrete classes:
    • class PoolingSpyreScheduler(PoolingSpyreMixin, Scheduler):
    • class AsyncPoolingSpyreScheduler(PoolingSpyreMixin, AsyncScheduler):
    • class ChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, Scheduler):
    • class AsyncChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, AsyncScheduler):
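The mixin pattern described above can be sketched as follows. This is a minimal, self-contained illustration only: the real vLLM `Scheduler`/`AsyncScheduler` base classes are replaced with stand-ins, and the `schedule()` body is a placeholder.

```python
# Minimal sketch of the mixin pattern, with stand-ins for the vLLM base
# classes so the example is self-contained.

class Scheduler:                     # stand-in for vLLM's sync scheduler
    pass

class AsyncScheduler(Scheduler):     # stand-in for vLLM's async scheduler
    pass

class PoolingSpyreMixin:
    def _is_async_scheduler(self) -> bool:
        # Runtime detection: the concrete class's MRO decides the mode,
        # so no is_async flag has to be captured at construction time.
        return isinstance(self, AsyncScheduler)

    def schedule(self) -> str:
        # Placeholder body: the real mixin adjusts scheduling behaviour here.
        if self._is_async_scheduler():
            return "async schedule"
        return "sync schedule"

# Concrete classes are plain multiple inheritance, as in the PR description.
class PoolingSpyreScheduler(PoolingSpyreMixin, Scheduler):
    pass

class AsyncPoolingSpyreScheduler(PoolingSpyreMixin, AsyncScheduler):
    pass
```

Because the mixin comes first in the bases list, its `schedule()` shadows the base scheduler's, while `isinstance(self, AsyncScheduler)` still sees the concrete class's other base.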

Related Issues

Test Plan

  • Added comprehensive unit tests in tests/v1/core/test_async_scheduler.py (16 tests):
    • TestIsAsyncScheduler: Verifies _is_async_scheduler() detection (4 tests)
    • TestPoolingSpyreMixinSchedule: Tests warmup-shape constraints in sync/async modes (4 tests)
    • TestChunkedPrefillSpyreMixinSchedule: Verifies constraint bypass in async mode (3 tests)
    • TestChunkedPrefillSpyreMixinUpdateFromOutput: Tests scheduler output filtering in async mode (5 tests)

Checklist

  • I have read the contributing guidelines
  • My code follows the project's code style (run bash format.sh)
  • I have added tests for my changes (if applicable)
  • I have updated the documentation (if applicable)
  • My commits include a Signed-off-by: line (DCO compliance)

@GOavi101 GOavi101 requested review from dilipgb and joerunde April 21, 2026 08:10
@github-actions

👋 Hi! Thank you for contributing.
Just a reminder: make sure your code passes all the linting checks; otherwise your PR cannot be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

@GOavi101 GOavi101 force-pushed the feature/async-scheduler-mixin-pattern branch 15 times, most recently from 1a3ecbb to b0e8e83 Compare April 22, 2026 17:20
SchedulerOutput = None

logger = init_logger(__name__)
from vllm_spyre.v1.core.scheduler_impl import (

@joerunde joerunde Apr 22, 2026


@GOavi101 it looks like most of this file has been deleted and moved to scheduler_impl. Can you put the implementation back in this file so that reviewers can see what's changed?

Collaborator Author


sure joe

Collaborator


Thanks, I've looked through the tests, but I'll wait to review the code changes until after this diff is in nicer shape; I don't really want to try to recreate the diff myself 😉

Replace _create_pooling_scheduler() and _create_chunked_prefill_scheduler()
factory functions with PoolingSpyreMixin and ChunkedPrefillSpyreMixin classes.

Each mixin uses _is_async_scheduler() (isinstance check) to detect the concrete
base class at runtime and adjust behaviour accordingly, instead of capturing
is_async via a closure variable.

Concrete classes use simple multiple inheritance:

  class PoolingSpyreScheduler(PoolingSpyreMixin, Scheduler): pass
  class AsyncPoolingSpyreScheduler(PoolingSpyreMixin, AsyncScheduler): pass
  class ChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, Scheduler): pass
  class AsyncChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, AsyncScheduler): pass

Side effects:
- __module__/__name__/__qualname__ fixup blocks removed (no longer needed)
- _async_warning_logged flag removed (debug log emitted each call is fine)
- TYPE_CHECKING import removed (unused after refactor)

Signed-off-by: Avishek Goswami <avishek.goswami@ibm.com>
@GOavi101 GOavi101 force-pushed the feature/async-scheduler-mixin-pattern branch from b0e8e83 to d71cfb3 Compare April 22, 2026 17:34
return EMPTY_MODEL_RUNNER_OUTPUT
cached = self._last_execute_model_output
self._last_execute_model_output = None
return cached if cached is not None else EMPTY_MODEL_RUNNER_OUTPUT
Collaborator


Ideally we would actually run the sampling here - see related comment on the structured output PR: #903 (comment)

I'm fine with leaving this as-is and then fixing it to work with both async scheduling and structured outputs in a followup. Issue opened here: #947

Key behaviours under test:
- _is_async_scheduler() correctly identifies async vs sync instances
- PoolingSpyreMixin.schedule() applies warmup-shape constraints in both modes
- ChunkedPrefillSpyreMixin.schedule() bypasses Spyre constraints in async mode
Collaborator


This statement seems incorrect: we definitely can't just bypass Spyre constraints, because there are hard limits to what we can run on the cards. What's really going on?

Comment thread vllm_spyre/platform.py
is_pooling=True,
)
# Set as string path for vLLM's resolution (matches upstream behavior)
# Only convert to string if it's not already a string
Collaborator


a class should be fine to pass here though, what goes wrong?

Comment thread vllm_spyre/platform.py
# The mixin's pre-filter pattern is not safe under that run-ahead scenario.
# For TP=1 (UniProcExecutor), futures are immediately done so it's safe.
if parallel_config.world_size > 1:
scheduler_config.async_scheduling = False
Collaborator


Interesting: if we wanted to support this feature, then it would likely need to work with TP=4, which is how we run most models. I thought this was only incompatible with pipeline parallel upstream. Does it also not work with tensor parallel?
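The guard quoted in this thread can be sketched as below. The config classes are stand-ins for vLLM's real `ParallelConfig`/`SchedulerConfig`; only the shape of the check is taken from the diff context above.

```python
# Sketch of the platform guard under discussion: the mixin's pre-filter
# pattern assumes executor futures resolve immediately, which only holds
# for the single-process executor (world_size == 1), so async scheduling
# is disabled for multi-worker setups.
from dataclasses import dataclass

@dataclass
class ParallelConfig:          # stand-in for vLLM's parallel config
    world_size: int = 1

@dataclass
class SchedulerConfig:         # stand-in for vLLM's scheduler config
    async_scheduling: bool = True

def check_async_support(parallel_config: ParallelConfig,
                        scheduler_config: SchedulerConfig) -> SchedulerConfig:
    if parallel_config.world_size > 1:
        # Multi-worker executors may run one step ahead of results;
        # force sync scheduling rather than risk the run-ahead scenario.
        scheduler_config.async_scheduling = False
    return scheduler_config
```

As the review comment notes, a guard like this would rule out the common TP=4 deployment, which is why the restriction is being questioned.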

@joerunde

Thanks @GOavi101!

A few notes:

  1. If this can't be done with tensor parallel, then maybe it's not worth pursuing. Is that a hard blocker?
  2. We need to have an end-to-end test that shows this working, i.e. using an LLM with async scheduling enabled. It would also be good to include an illustrative test at the engine level (see https://github.com/torch-spyre/sendnn-inference/blob/main/tests/e2e/test_spyre_pc_scheduler_steps.py) that shows the effects of async scheduling. From my quick skim it sounds like the engine speculatively schedules batches one step ahead, so we should see a "dead token" in some cases where the engine schedules a decode past the end of a sequence.
  3. It would be really great to see a profile of this in action, or at least some minimal vllm bench results showing what kind of performance improvement we can expect.

