
Fix test cases #6

Merged
merrymercy merged 5 commits into main from fix on Jan 15, 2024
Conversation

@merrymercy
Contributor

No description provided.

@merrymercy merrymercy merged commit 4bd8233 into main Jan 15, 2024
@merrymercy merrymercy deleted the fix branch January 15, 2024 09:15
@Rookie-Kai Rookie-Kai mentioned this pull request Aug 14, 2024
Ying1123 pushed a commit that referenced this pull request Sep 13, 2024
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Feb 20, 2025
sgl-project#6)

* Optimize all_reduce by porting the shm kernel of deepspeed

* Fix rebase: use get_tp_group in sglang.srt.distributed

* Fix rebase: directly modify tensor_model_parallel_all_reduce in sglang
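The commit messages above describe porting DeepSpeed's shared-memory all_reduce kernel and directly rebinding `tensor_model_parallel_all_reduce` in sglang. A minimal sketch of that override pattern follows; apart from `tensor_model_parallel_all_reduce` itself, every name here is a hypothetical stand-in (the real gating checks tensor device and size, and the real fast path is the ported shm kernel):

```python
# Sketch of the dispatch-and-rebind pattern described above. The two
# implementations are stand-ins that only count invocations; the real
# code would call the stock collective or the ported shm kernel.
calls = {"shm": 0, "orig": 0}

def original_all_reduce(tensor):
    # stand-in for the stock tensor-parallel all_reduce
    calls["orig"] += 1
    return tensor

def shm_all_reduce(tensor):
    # stand-in for the shared-memory kernel ported from DeepSpeed
    calls["shm"] += 1
    return tensor

def shm_applicable(tensor):
    # hypothetical gate: only small payloads take the shm fast path
    return isinstance(tensor, list) and len(tensor) <= 4

def patched_all_reduce(tensor):
    if shm_applicable(tensor):
        return shm_all_reduce(tensor)
    return original_all_reduce(tensor)

# "directly modify tensor_model_parallel_all_reduce in sglang":
# the module-level symbol is rebound to the patched dispatcher.
tensor_model_parallel_all_reduce = patched_all_reduce
```

Rebinding the module-level symbol avoids touching every call site, at the cost of depending on where callers imported the name from.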
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 11, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 14, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 14, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 14, 2025
yichiche pushed a commit to yichiche/sglang that referenced this pull request Aug 11, 2025
@dhh123 dhh123 mentioned this pull request Aug 31, 2025
Xia-Weiwen pushed a commit to Xia-Weiwen/sglang that referenced this pull request Sep 5, 2025
someoneexistsontheinternet pushed a commit to someoneexistsontheinternet/sglang that referenced this pull request Oct 23, 2025
kalyank007 pushed a commit to kalyank007/sglang that referenced this pull request Nov 7, 2025
Co-authored-by: svc_repro_tool <svc_repro_tool@habana.ai>
fstandhartinger pushed a commit to fstandhartinger/sglang that referenced this pull request Nov 11, 2025
nithinsubbiah pushed a commit to nithinsubbiah/sglang that referenced this pull request Nov 21, 2025
Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Add wave extend attention kernel

Signed-off-by: Harsh Menon <harsh@nod-labs.com>

[Wave] Adding logit_cap and layer scaling to API

Also add support for the Wave backend to the model runner, and use Triton decode kernels for now.
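The logit_cap mentioned above refers to soft-capping attention scores. A common formulation (used by Gemma-style attention; assumed here to be what the Wave kernel computes) is `cap * tanh(score / cap)`, with a non-positive cap conventionally disabling the clamp:

```python
import math

def soft_cap(score: float, logit_cap: float) -> float:
    # Soft-cap: squashes scores smoothly into (-logit_cap, logit_cap),
    # staying near-identity for |score| << logit_cap.
    # A logit_cap <= 0 disables capping (assumed convention).
    if logit_cap <= 0:
        return score
    return logit_cap * math.tanh(score / logit_cap)
```

Unlike a hard clamp, the mapping is differentiable everywhere, which matters if the same formula is used at training time.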

[Wave] Run chunked prefill for perf comparison on Wave test

The non-chunked (regular) prefill version needs to be renamed; otherwise rpd will treat it as the same kernel.

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Cache the function that loads the wave kernel

Also maintain a global kernel hash to avoid
recomputing the hash on every call.

[Wave] Don't specify block size and enable buffer ops

[Wave] Enable wave runtime and update scheduling API

[Wave] Update API to use wave_compile & WaveCompileOptions

[Wave] Update wave backend and extend attention to latest

[Wave] Add speculative decode kernel

Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>

cache kernels using lru_cache

Update WaveBackend to use Wave Decode  (sgl-project#6)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Revert "Update WaveBackend to use Wave Decode  (sgl-project#6)" (sgl-project#7)

This reverts commit eac4599.

Wave Backend decode (sgl-project#8)

* align shapes

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

* fix

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

---------

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Wave backend fixes (sgl-project#10)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

More fixes to Wave decode (sgl-project#12)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

is_causal

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Enable the grok in3 model (sgl-project#14)

Set unique cache dir for each worker (sgl-project#16)

update kernel (sgl-project#18)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

updated spec decode test as per wave

Signed-off-by: xintin <gaurav.verma@amd.com>

fix extend (sgl-project#23)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Refactor paged decode intermediate arrays shapes (sgl-project#24)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

remove dyn symbols (sgl-project#26)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

cleanup shapes (sgl-project#27)

Some fields were removed from `paged_decode_attention_shape`.

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Remove `mha` param from Wave decode attention kernel (sgl-project#28)

Depends on iree-org/iree-turbine#1039

Signed-off-by: Paul Zhang <paul.zhang@amd.com>

nfc: fix problems reported by linting

update references from iree.turbine to wave_lang
chz34 added a commit to chz34/sglang that referenced this pull request Dec 4, 2025
yhyang201 pushed a commit that referenced this pull request Dec 13, 2025
* Replace `type` checks with `isinstance`

* Check --encode-urls

* Add async lock for rid

* Move thread logic into mm_receiver
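The "Add async lock for rid" item above suggests serializing concurrent coroutines that touch the same request id. A minimal sketch of a per-rid lock registry (all names hypothetical):

```python
import asyncio

# Hypothetical per-request-id lock registry: coroutines handling the
# same rid serialize on a shared asyncio.Lock.
_rid_locks: dict = {}

def get_rid_lock(rid: str) -> asyncio.Lock:
    lock = _rid_locks.get(rid)
    if lock is None:
        lock = asyncio.Lock()
        _rid_locks[rid] = lock
    return lock

async def handle(rid: str, log: list) -> None:
    async with get_rid_lock(rid):
        log.append(("start", rid))
        await asyncio.sleep(0)  # yield control while holding the lock
        log.append(("end", rid))

async def demo() -> list:
    log: list = []
    # Two concurrent handlers for the same rid run strictly one after the other.
    await asyncio.gather(handle("r1", log), handle("r1", log))
    return log
```

Because the registry is only touched from the event loop thread, plain dict access needs no extra locking here.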
triple-mu pushed a commit to triple-mu/sglang that referenced this pull request Jan 1, 2026
triple-mu pushed a commit to triple-mu/sglang that referenced this pull request Jan 1, 2026
# This is the 1st commit message:

rebase

# This is the commit message sgl-project#2:

remove duplicated code

# This is the commit message sgl-project#3:

add type hints

# This is the commit message sgl-project#4:

add clear cache for benchmark alignment

# This is the commit message sgl-project#5:

remove unused arg

# This is the commit message sgl-project#6:

clear cache once

# This is the commit message sgl-project#7:

simplified VAE cache logic for qwenimage and wan

# This is the commit message sgl-project#8:

remove duplicated code
Garrybest pushed a commit to Garrybest/sglang that referenced this pull request Jan 9, 2026
* add get_default_sampling_params definition

* Merge pull request sgl-project#6 from primatrix/feat/align-sampling-for-tunix

align sampling param support according to the RFC

* add multinomial_with_seed for sampler and test_sampler.py (sgl-project#12)

* update flax

fix duplicate register pytree and use nnx.data to wrap FlashAttentionMetadata

* extract scheduler thread

* add event loop

* fix duplicate params

* use server parameters

* add tree_flatten & tree_unflatten

* with mesh

---------

Co-authored-by: aolemila <aolemilaluo@gmail.com>
Co-authored-by: pathfinder-fp <aaaabbbbbb@163.com>
Co-authored-by: aolemila <aolemila@primatrix.ai>
Co-authored-by: pathfinder-fp <slackexplorer@gmail.com>
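The `multinomial_with_seed` item in the list above points at reproducible sampling: the same probabilities and seed should always yield the same draw. A minimal pure-Python sketch of that idea (the real sampler operates on device tensors; names here are illustrative):

```python
import random

def multinomial_with_seed(probs: list, seed: int) -> int:
    # Hypothetical sketch: a seeded categorical draw, so the same
    # (probs, seed) pair is reproducible across calls and workers.
    # A fresh Random instance avoids perturbing global RNG state.
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Using a per-call generator, rather than seeding the global RNG, keeps the determinism local to the sampling path.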
MatejKosec added a commit to MatejKosec/sglang that referenced this pull request Feb 25, 2026
- Validate alloc reply_id matches request_id (sgl-project#3)
- Remove dead variable num_gen_tokens (sgl-project#4)
- Move inline imports to top level (sgl-project#5)
- Replace hasattr guards with proper None checks (sgl-project#6)
- Demote per-request logs to DEBUG, keep milestones at INFO (sgl-project#11)
- Remove unused tree_cache param from start_kv_return_receiver (sgl-project#14)
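The "Replace hasattr guards with proper None checks" item above reflects a common Python cleanup: `hasattr` reports True even when the attribute was initialized to None, and silently masks typos in the attribute name. A hypothetical illustration (the `tree_cache` name is borrowed from the list; the class is invented):

```python
class Receiver:
    def __init__(self):
        # attribute always exists, but may not be populated yet
        self.tree_cache = None

def ready_hasattr(r) -> bool:
    # hasattr-style guard: True even while the attribute is still None
    return hasattr(r, "tree_cache")

def ready_none_check(r) -> bool:
    # explicit None check: only True once the field is populated
    return r.tree_cache is not None
```

The None check also fails loudly with AttributeError if the attribute name is misspelled, where `hasattr` would silently return False.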
MatejKosec added a commit to MatejKosec/sglang that referenced this pull request Feb 26, 2026
@alisonshao alisonshao mentioned this pull request Mar 1, 2026
lawrence-harmonic added a commit to lawrence-harmonic/sglang that referenced this pull request Mar 19, 2026
…ject#6)

Four fixes for PD disaggregation hangs around weight updates and transient failures:

1. DecodePreallocQueue timeout: requests stuck in the prealloc queue with
   waiting_for_input=True but insufficient KV cache memory now time out
   after SGLANG_DISAGGREGATION_TRANSFER_TIMEOUT (default 600s) instead of
   hanging indefinitely. This closes a gap where no existing timeout
   covered this state.

2. Pre-aborted bootstrap rooms: if an abort arrives on the prefill side
   before the corresponding request enters the bootstrap queue, the
   bootstrap_room is recorded. When the request later arrives, it is
   immediately aborted instead of entering the queue and potentially
   hanging.

3. Pause/resume queue draining: in PD disaggregation, prefill and decode
   event loops now continue advancing already-admitted bootstrap and
   transfer queues while /pause_generation mode=in_place is active.
   This prevents in-flight requests from sitting until the 1800s
   disaggregation timeout fires.

4. Decode PREBUILT preservation on pause: if pause_generation lands after
   a decode PREBUILT batch has left waiting_queue but before it is merged
   into running_batch, the batch is now committed before pause state is
   finalized. This prevents a small number of requests from disappearing
   and timing out at the client.

Also updates the PD pause/resume regression test documentation to cover
both pause-related failure modes.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: chatgpt-codex-connector[bot] <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
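Fix 1 above adds a timeout for requests stuck in the DecodePreallocQueue. A minimal sketch of such a sweep, under assumed structure (entries tracked as dicts with an enqueue timestamp and the `waiting_for_input` flag; the real queue also inspects KV-cache capacity and reads the limit from SGLANG_DISAGGREGATION_TRANSFER_TIMEOUT):

```python
import time

TRANSFER_TIMEOUT_S = 600.0  # default per SGLANG_DISAGGREGATION_TRANSFER_TIMEOUT

def sweep_prealloc_queue(queue: list, now: float = None):
    """Split queue entries into (alive, timed_out).

    An entry times out when it is still waiting for input and has sat
    in the queue longer than the transfer timeout -- loosely mirroring
    the stuck state described in fix 1.
    """
    now = time.monotonic() if now is None else now
    alive, timed_out = [], []
    for entry in queue:
        stuck = (
            entry["waiting_for_input"]
            and (now - entry["enqueue_ts"]) > TRANSFER_TIMEOUT_S
        )
        (timed_out if stuck else alive).append(entry)
    return alive, timed_out
```

Passing `now` explicitly keeps the sweep deterministic in tests; production callers would let it default to the monotonic clock.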
apinge pushed a commit to apinge/sglang that referenced this pull request Mar 31, 2026
* Initial commit

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Apply review comments

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Debug

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Add model test

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* add CI script permission to be executable

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Update model path, fix aiter path in original docker image

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Disable cudagraph for debug

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* AOT Prebuild aiter gemma rmsnorm fusion kernel

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Comment out curl single test temporarily

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Comment out curl single test temporarily

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Enable cuda graph

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Fix launch server crash issue

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Update GPU_ARCHS and PYTORCH_ROCM_ARCH

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Fix bug

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Fix

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Fix

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Fix

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Fix

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Fix curl test

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Clean up build cache & images

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Clean up build cache & images

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

* Fix format

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

---------

Signed-off-by: Xiake Sun <xiake.sun@amd.com>
