[core] Introduce MemoryPoolConfigurator class hierarchy #22389

Merged
ispobock merged 9 commits into main from lsyin/pool-configurator-v2 on Apr 9, 2026

@hnyls2002
Collaborator

@hnyls2002 hnyls2002 commented Apr 8, 2026

Summary

  • Introduce MemoryPoolConfigurator base class with unified coeff+bias interface (calculate_pool_sizes / calculate_pool_sizes_from_max_tokens)
  • Add DefaultPoolConfigurator for MHA/MLA/NSA/FP4 — absorbs get_cell_size_per_token, num_layers deduction, DFLASH scaling
  • Add HybridSWAPoolConfigurator for Gemma2/Command-R/MiMo — absorbs resolve_hybrid_swa_tokens with full/swa pool splitting
  • Add create_memory_pool_configurator() factory
  • Rewrite _resolve_memory_pool_config to use configurator flow
  • Delete absorbed methods: profile_max_num_token, _resolve_hybrid_swa_tokens
  • Move MemoryPoolConfig from model_runner_kv_cache_mixin.py to pool_configurator.py
  • Page alignment now owned by configurator; removed from _apply_token_constraints
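
The coeff+bias interface described above can be read as a linear memory model: bytes used is roughly coeff * tokens + bias. The following is a minimal sketch under that assumption, not the code from this PR. The names ToyPoolConfig, ToyPoolConfigurator, from_memory, and from_max_tokens are hypothetical stand-ins, and the real signatures of calculate_pool_sizes / calculate_pool_sizes_from_max_tokens may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyPoolConfig:
    # Hypothetical stand-in for MemoryPoolConfig; field names are assumptions.
    max_total_num_tokens: int
    max_running_requests: Optional[int] = None  # left unset by the configurator


class ToyPoolConfigurator:
    """Toy configurator that models memory use as coeff * tokens + bias (bytes)."""

    def __init__(self, coeff: int, bias: int, page_size: int):
        self.coeff = coeff          # bytes needed per cached token
        self.bias = bias            # fixed bytes, independent of token count
        self.page_size = page_size  # tokens per page

    def _align(self, tokens: int) -> int:
        # Clamp at zero, then round down to a whole number of pages.
        return max(tokens, 0) // self.page_size * self.page_size

    def from_memory(self, available_bytes: int) -> ToyPoolConfig:
        # Invert the linear model: tokens = (available - bias) / coeff.
        tokens = (available_bytes - self.bias) // self.coeff
        return ToyPoolConfig(self._align(tokens))

    def from_max_tokens(self, max_tokens: int) -> ToyPoolConfig:
        # A user-supplied token cap skips the memory model but is still page-aligned.
        return ToyPoolConfig(self._align(max_tokens))


cfg = ToyPoolConfigurator(coeff=1024, bias=4096, page_size=16).from_memory(1_000_000)
cfg.max_running_requests = 64  # filled in by the consumer afterwards (arbitrary value)
```

Keeping page alignment inside the configurator, as the last bullet describes, means both entry points return sizes that are already whole pages, so callers need no separate rounding step.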

Follows up on #22384. Mamba configurator is a separate follow-up.

Behavioral changes

  • Fix hybrid SWA _cell_size to use ratio-weighted formula (F*nf + r*S*ns), so --max-total-tokens correctly constrains full_tokens rather than inflating it through a memory-budget round-trip (pre-existing issue in the old code)
  • Configurator returns MemoryPoolConfig directly; max_running_requests changed from a required int to Optional[int] = None (filled in by the consumer after the configurator runs)
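
To illustrate the ratio-weighted formula F*nf + r*S*ns, here is a small sketch. The symbol meanings are assumptions, since the description does not define them: F and S are taken as bytes per token per full-attention and per SWA layer, nf and ns as the layer counts, and r as the number of SWA tokens kept per full token. The function name split_hybrid_tokens and its parameters are hypothetical and not taken from the PR.

```python
from typing import Optional, Tuple


def split_hybrid_tokens(
    budget_bytes: int,
    F: int, nf: int,   # assumed: bytes/token/layer and layer count, full attention
    S: int, ns: int,   # assumed: bytes/token/layer and layer count, sliding window
    r: float,          # assumed: SWA tokens kept per full token
    page_size: int,
    max_total_tokens: Optional[int] = None,
) -> Tuple[int, int]:
    # Cost of one full token, including its proportional share of the SWA pool.
    cell_size = F * nf + r * S * ns
    full_tokens = int(budget_bytes // cell_size)
    if max_total_tokens is not None:
        # The cap applies to full_tokens directly, with no detour through bytes.
        full_tokens = min(full_tokens, max_total_tokens)
    full_tokens = full_tokens // page_size * page_size
    swa_tokens = int(full_tokens * r) // page_size * page_size
    return full_tokens, swa_tokens


full, swa = split_hybrid_tokens(400_000, F=100, nf=2, S=100, ns=4, r=0.5, page_size=16)
capped_full, capped_swa = split_hybrid_tokens(
    400_000, F=100, nf=2, S=100, ns=4, r=0.5, page_size=16, max_total_tokens=512
)
```

With these toy numbers the cell size is 400 bytes, so the uncapped call gives 992 full tokens and 496 SWA tokens, while the capped call gives 512 and 256. Because the cell size already accounts for the SWA share, a token cap limits full_tokens without being converted to a byte budget and back.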

Test plan

  • /rerun-stage stage-a-test-1
  • /rerun-stage stage-b-test-small-1-gpu
  • /rerun-stage stage-b-test-large-1-gpu

Base automatically changed from lsyin/pool-configurator to main April 8, 2026 23:13
@hnyls2002 hnyls2002 force-pushed the lsyin/pool-configurator-v2 branch from 85165a2 to d235d6d April 8, 2026 23:16
@hnyls2002
Collaborator Author

/rerun-test test_swa_unittest.py test_mimo_models.py test_deepseek_v3_mtp.py test_dsa_models_mtp.py test_qwen3_next_models_mtp.py test_qwen35_models.py test_triton_sliding_window.py test_mamba_unittest.py test_mamba2_mixer.py test_nvidia_nemotron_nano_v2.py test_nvidia_nemotron_3_super_bf16.py test_mla_deepseek_v3.py test_generation_models.py

@github-actions
Contributor

github-actions Bot commented Apr 9, 2026

1-gpu-h100 (4 tests): View workflow run

cd test/ && python3 registered/unit/mem_cache/test_swa_unittest.py
cd test/ && python3 registered/attention/test_triton_sliding_window.py
cd test/ && python3 registered/mla/test_mla_deepseek_v3.py
cd test/ && python3 registered/models/test_generation_models.py

8-gpu-h200 (3 tests): View workflow run

cd test/ && python3 registered/8-gpu-models/test_mimo_models.py
cd test/ && python3 registered/8-gpu-models/test_dsa_models_mtp.py
cd test/ && python3 registered/8-gpu-models/test_nvidia_nemotron_3_super_bf16.py

4-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/4-gpu-models/test_qwen3_next_models_mtp.py

4-gpu-b200 (1 test): View workflow run

cd test/ && python3 registered/4-gpu-models/test_qwen35_models.py

1-gpu-5090 (1 test): View workflow run

cd test/ && python3 registered/unit/mem_cache/test_mamba_unittest.py

2-gpu-h100 (2 tests): View workflow run

cd test/ && python3 registered/layers/mamba/test_mamba2_mixer.py
cd test/ && python3 registered/models/test_nvidia_nemotron_nano_v2.py

test_deepseek_v3_mtp.py: Ambiguous filename test_deepseek_v3_mtp.py — matched 2 files:

  • test/registered/8-gpu-models/test_deepseek_v3_mtp.py
  • test/registered/amd/test_deepseek_v3_mtp.py

Please provide the full path, e.g. /rerun-test test/registered/8-gpu-models/test_deepseek_v3_mtp.py

Comment thread python/sglang/srt/model_executor/pool_configurator.py
Comment thread python/sglang/srt/model_executor/pool_configurator.py
@ispobock ispobock merged commit de441ac into main Apr 9, 2026
194 of 247 checks passed
@ispobock ispobock deleted the lsyin/pool-configurator-v2 branch April 9, 2026 07:29
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
@hnyls2002 hnyls2002 mentioned this pull request Apr 29, 2026
2 participants