
Introduce async scheduler implementation with mixin pattern#941

Draft
GOavi101 wants to merge 1 commit into torch-spyre:main from GOavi101:feature/async-scheduler-mixin-pattern

Conversation


@GOavi101 GOavi101 commented Apr 21, 2026

Description

Introduce async scheduler implementation with mixin pattern for cleaner architecture.

New Implementation (mixins)

  • PoolingSpyreMixin and ChunkedPrefillSpyreMixin classes
  • Runtime detection via _is_async_scheduler() (isinstance check)
  • Simple multiple inheritance for concrete classes:
    • class PoolingSpyreScheduler(PoolingSpyreMixin, Scheduler):
    • class AsyncPoolingSpyreScheduler(PoolingSpyreMixin, AsyncScheduler):
    • class ChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, Scheduler):
    • class AsyncChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, AsyncScheduler):
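The mixin pattern described above can be sketched as follows. This is a minimal, self-contained illustration only: the real vLLM `Scheduler`/`AsyncScheduler` base classes are replaced with stand-ins, and the `schedule()` body is a placeholder.

```python
# Minimal sketch of the mixin pattern, with stand-ins for the vLLM base
# classes so the example is self-contained.

class Scheduler:                     # stand-in for vLLM's sync scheduler
    pass

class AsyncScheduler(Scheduler):     # stand-in for vLLM's async scheduler
    pass

class PoolingSpyreMixin:
    def _is_async_scheduler(self) -> bool:
        # Runtime detection: the concrete class's MRO decides the mode,
        # so no is_async flag has to be captured at construction time.
        return isinstance(self, AsyncScheduler)

    def schedule(self) -> str:
        # Placeholder body: the real mixin adjusts scheduling behaviour here.
        if self._is_async_scheduler():
            return "async schedule"
        return "sync schedule"

# Concrete classes are plain multiple inheritance, as in the PR description.
class PoolingSpyreScheduler(PoolingSpyreMixin, Scheduler):
    pass

class AsyncPoolingSpyreScheduler(PoolingSpyreMixin, AsyncScheduler):
    pass
```

Because the mixin comes first in the bases list, its `schedule()` shadows the base scheduler's, while `isinstance(self, AsyncScheduler)` still sees the concrete class's other base.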

Related Issues

Test Plan

  • Added comprehensive unit tests in tests/v1/core/test_async_scheduler.py (16 tests):
    • TestIsAsyncScheduler: Verifies _is_async_scheduler() detection (4 tests)
    • TestPoolingSpyreMixinSchedule: Tests warmup-shape constraints in sync/async modes (4 tests)
    • TestChunkedPrefillSpyreMixinSchedule: Verifies constraint bypass in async mode (3 tests)
    • TestChunkedPrefillSpyreMixinUpdateFromOutput: Tests scheduler output filtering in async mode (5 tests)

Checklist

  • I have read the contributing guidelines
  • My code follows the project's code style (run bash format.sh)
  • I have added tests for my changes (if applicable)
  • I have updated the documentation (if applicable)
  • My commits include a Signed-off-by: line (DCO compliance)

@GOavi101 GOavi101 requested review from dilipgb and joerunde April 21, 2026 08:10
@github-actions

👋 Hi! Thank you for contributing.
Just a reminder: make sure your code passes all the linting checks; otherwise your PR cannot be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

@GOavi101 GOavi101 force-pushed the feature/async-scheduler-mixin-pattern branch 15 times, most recently from 1a3ecbb to b0e8e83 Compare April 22, 2026 17:20
SchedulerOutput = None

logger = init_logger(__name__)
from vllm_spyre.v1.core.scheduler_impl import (

@joerunde joerunde Apr 22, 2026


@GOavi101 it looks like most of this file has been deleted and moved to scheduler_impl. Can you put the implementation back in this file so that reviewers can see what's changed?

Collaborator Author


sure joe

Collaborator


Thanks, I've looked through the tests, but I'll wait to review the code changes until after this diff is in nicer shape; I don't really want to try to recreate the diff myself 😉

Replace _create_pooling_scheduler() and _create_chunked_prefill_scheduler()
factory functions with PoolingSpyreMixin and ChunkedPrefillSpyreMixin classes.

Each mixin uses _is_async_scheduler() (isinstance check) to detect the concrete
base class at runtime and adjust behaviour accordingly, instead of capturing
is_async via a closure variable.

Concrete classes use simple multiple inheritance:

  class PoolingSpyreScheduler(PoolingSpyreMixin, Scheduler): pass
  class AsyncPoolingSpyreScheduler(PoolingSpyreMixin, AsyncScheduler): pass
  class ChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, Scheduler): pass
  class AsyncChunkedPrefillSpyreScheduler(ChunkedPrefillSpyreMixin, AsyncScheduler): pass

Side effects:
- __module__/__name__/__qualname__ fixup blocks removed (no longer needed)
- _async_warning_logged flag removed (debug log emitted each call is fine)
- TYPE_CHECKING import removed (unused after refactor)

Signed-off-by: Avishek Goswami <avishek.goswami@ibm.com>
@GOavi101 GOavi101 force-pushed the feature/async-scheduler-mixin-pattern branch from b0e8e83 to d71cfb3 Compare April 22, 2026 17:34
return EMPTY_MODEL_RUNNER_OUTPUT
cached = self._last_execute_model_output
self._last_execute_model_output = None
return cached if cached is not None else EMPTY_MODEL_RUNNER_OUTPUT
Collaborator


Ideally we would actually run the sampling here - see related comment on the structured output PR: #903 (comment)

I'm fine with leaving this as-is and then fixing it to work with both async scheduling and structured outputs in a followup. Issue opened here: #947

Key behaviours under test:
- _is_async_scheduler() correctly identifies async vs sync instances
- PoolingSpyreMixin.schedule() applies warmup-shape constraints in both modes
- ChunkedPrefillSpyreMixin.schedule() bypasses Spyre constraints in async mode
Collaborator


This statement seems incorrect: we definitely can't just bypass Spyre constraints, because there are hard limits to what we can run on the cards. What's really going on?

Comment thread vllm_spyre/platform.py
is_pooling=True,
)
# Set as string path for vLLM's resolution (matches upstream behavior)
# Only convert to string if it's not already a string
Collaborator


a class should be fine to pass here though, what goes wrong?

Comment thread vllm_spyre/platform.py
# The mixin's pre-filter pattern is not safe under that run-ahead scenario.
# For TP=1 (UniProcExecutor), futures are immediately done so it's safe.
if parallel_config.world_size > 1:
scheduler_config.async_scheduling = False
Collaborator


Interesting: if we wanted to support this feature, then it would likely need to work with TP=4, which is how we run most models. I thought this was only incompatible with pipeline parallel upstream. Does it also not work with tensor parallel?
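The guard quoted in this thread can be sketched as below. The config classes are stand-ins for vLLM's real `ParallelConfig`/`SchedulerConfig`; only the shape of the check is taken from the diff context above.

```python
# Sketch of the platform guard under discussion: the mixin's pre-filter
# pattern assumes executor futures resolve immediately, which only holds
# for the single-process executor (world_size == 1), so async scheduling
# is disabled for multi-worker setups.
from dataclasses import dataclass

@dataclass
class ParallelConfig:          # stand-in for vLLM's parallel config
    world_size: int = 1

@dataclass
class SchedulerConfig:         # stand-in for vLLM's scheduler config
    async_scheduling: bool = True

def check_async_support(parallel_config: ParallelConfig,
                        scheduler_config: SchedulerConfig) -> SchedulerConfig:
    if parallel_config.world_size > 1:
        # Multi-worker executors may run one step ahead of results;
        # force sync scheduling rather than risk the run-ahead scenario.
        scheduler_config.async_scheduling = False
    return scheduler_config
```

As the review comment notes, a guard like this would rule out the common TP=4 deployment, which is why the restriction is being questioned.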

@joerunde

Thanks @GOavi101!

A few notes:

  1. If this can't be done with tensor parallel, then maybe it's not worth pursuing. Is that a hard blocker?
  2. We need to have an end-to-end test that shows this working, i.e. using an LLM with async scheduling enabled. It would also be good to include an illustrative test at the engine level (see https://github.com/torch-spyre/sendnn-inference/blob/main/tests/e2e/test_spyre_pc_scheduler_steps.py) that shows the effects of async scheduling. From my quick skim it sounds like the engine speculatively schedules batches one step ahead, so we should see a "dead token" in some cases where the engine schedules a decode past the end of a sequence.
  3. It would be really great to see a profile of this in action, or at least some minimal vllm bench results showing what kind of performance improvement we can expect.

