[CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 #31025
Conversation
Code Review
This pull request adds new performance benchmarks for Intel Gaudi 3, specifically for DeepSeek-R1, Llama-4-Maverick-17B-128E-Instruct-FP8, and Qwen-3-8B models. The changes look good overall, but I've found a few critical issues in the benchmark configuration files that could cause failures or incorrect results. These include missing quantization settings for FP8 models, incorrect model names for client tokenizers, and typos in model identifiers. Please see the detailed comments for suggestions on how to fix them.
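The issues flagged above (missing quantization settings for FP8 models and mismatched client tokenizer names) can be caught mechanically. The following is a minimal sketch of such a check, assuming a hypothetical flat dict shape with `model`, `quantization`, and `tokenizer` keys; it is not the actual vLLM benchmark-suite schema.

```python
# Hedged sketch: validate benchmark config entries for the kinds of issues
# noted in the review. The dict keys used here ("model", "quantization",
# "tokenizer") are assumptions, not the real vLLM config schema.

def find_config_issues(config: dict) -> list[str]:
    """Return a list of human-readable problems found in one config entry."""
    issues = []
    model = config.get("model", "")

    # FP8 checkpoints generally need an explicit quantization setting.
    if "FP8" in model and not config.get("quantization"):
        issues.append(f"{model}: missing quantization setting for FP8 model")

    # The client-side tokenizer should match the served model name.
    tokenizer = config.get("tokenizer")
    if tokenizer and tokenizer != model:
        issues.append(f"{model}: tokenizer {tokenizer!r} does not match model")

    return issues
```

A config that names an FP8 model without a `quantization` key would be reported, while a consistent entry passes cleanly.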
Force-pushed from cc5e80a to d4e41d9
Force-pushed from 7f3ad25 to 9513d3b
Force-pushed from 9513d3b to f737407
- DeepSeek-R1
- Llama-4-Maverick-17B-128E-Instruct-FP8
- Qwen-3-8B

Signed-off-by: Szymon Reginis <sreginis@habana.ai>
Force-pushed from f737407 to 92c454c
This is a continuation of @jakub-sochacki's PR #26919. The related PR in pytorch-integration-testing is merged. @xuechendi @jikunshang Please review.
@huydhn 👆
jikunshang left a comment
LGTM! Triggering HPU CI to check status.
…llm-project#31025) Signed-off-by: Szymon Reginis <sreginis@habana.ai> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Purpose
Add new performance benchmarks for the Intel Gaudi 3 accelerator, including latency, throughput, and serving test suites for the DeepSeek-R1, Llama-4-Maverick-17B-128E-Instruct-FP8, and Qwen-3-8B models, with HPU-specific optimizations.
Test Plan
Models tested: DeepSeek-R1, Llama-4-Maverick-17B-128E-Instruct-FP8, Qwen-3-8B
Scenarios: throughput, latency, and serving
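The plan above implies a full cross-product of models and scenarios. A small sketch of that matrix in plain Python (the names mirror the PR description; this is illustrative data, not a real config file from the PR):

```python
# Hedged sketch: the benchmark matrix implied by the test plan.
# Model and scenario names are taken from the PR description.
MODELS = [
    "DeepSeek-R1",
    "Llama-4-Maverick-17B-128E-Instruct-FP8",
    "Qwen-3-8B",
]
SCENARIOS = ["latency", "throughput", "serving"]


def benchmark_matrix() -> list[tuple[str, str]]:
    """Enumerate every (model, scenario) combination to be benchmarked."""
    return [(model, scenario) for model in MODELS for scenario in SCENARIOS]
```

With three models and three scenarios, this yields nine benchmark runs in total.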
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.