
@Alcanderian (Collaborator) commented May 11, 2025

Motivation

DeepSeek-V3-0324 GSM8K accuracy: 0.951

After this upgrade, JIT compilation with both NVCC and NVRTC is dramatically faster. NVRTC takes about 1 s to compile each kernel, and NVCC has dropped from 4 s per kernel previously to 1.2-1.3 s per kernel. We no longer have to endure long waits for precompilation!
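To make the effect on precompilation concrete, here is a rough back-of-the-envelope sketch. It only uses the per-kernel timings quoted above (taking 1.25 s as the midpoint of the 1.2-1.3 s NVCC range); the kernel count and the `workers` parallelism knob are hypothetical illustration parameters, not values from this PR or from DeepGEMM, and the model assumes kernels compile independently with no other overhead.

```python
# Per-kernel JIT compile times quoted in this PR (seconds).
# For NVCC after the upgrade, the midpoint of the reported 1.2-1.3 s range is used.
PER_KERNEL_SECONDS = {
    "nvcc_before": 4.0,
    "nvcc_after": 1.25,
    "nvrtc": 1.0,
}


def precompile_seconds(num_kernels: int, per_kernel: float, workers: int = 1) -> float:
    """Estimate wall-clock precompilation time (hypothetical model).

    Assumes every kernel takes `per_kernel` seconds, kernels compile
    independently, and they are split evenly across `workers` parallel jobs.
    """
    if num_kernels < 0 or workers < 1:
        raise ValueError("num_kernels must be >= 0 and workers >= 1")
    batches = -(-num_kernels // workers)  # ceiling division
    return batches * per_kernel


if __name__ == "__main__":
    n = 1000  # hypothetical kernel count, not taken from this PR
    for name, t in PER_KERNEL_SECONDS.items():
        print(f"{name:12s} {precompile_seconds(n, t) / 60:6.1f} min")
```

Under this simple model the saving scales linearly with the number of kernels, so the absolute time saved grows with however many shapes need to be precompiled.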

According to deepseek-ai/DeepGEMM@d75b218, NVRTC may lose performance in some cases, and NVCC JIT compilation is now also 4x faster. So this PR keeps using NVCC.

@zhyncs Dependency pipeline:

  1. Merge "chore: upgrade deepgemm" #6073
  2. Release a new version of sgl-kernel and upload the wheel to PyPI
  3. Update srt's sgl-kernel tag
  4. Merge this PR

Modifications

Checklist

@zhyncs marked this pull request as ready for review on May 11, 2025, 10:15
@Alcanderian (Collaborator, Author) commented:

Merged into #6196.
