
[CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 #31025

Merged
jikunshang merged 4 commits into vllm-project:main from simonreginis:sreginis/new_benchmarks_dec on Mar 3, 2026

Conversation

@simonreginis (Contributor) commented on Dec 19, 2025

Purpose

Add new performance benchmarks for the Intel Gaudi 3 accelerator, including latency, throughput, and serving test suites for the DeepSeek-R1, Llama-4-Maverick-17B-128E-Instruct-FP8, and Qwen-3-8B models, with HPU-specific optimizations. A sketch of the test-file format is shown below.
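For context, these suites are driven by JSON test files (latency-tests-hpu.json, throughput-tests-hpu.json, serving-tests-hpu.json). The sketch below is a minimal illustration, not the contents of this PR: the field names follow the pattern of vLLM's existing latency-tests.json, and the test name and values are assumptions. A latency entry might look like:

```json
[
  {
    "test_name": "latency_qwen3_8b_tp1",
    "parameters": {
      "model": "Qwen/Qwen3-8B",
      "tensor_parallel_size": 1,
      "load_format": "dummy",
      "num_iters_warmup": 5,
      "num_iters": 15
    }
  }
]
```

In the upstream suite, the benchmark harness expands each key under "parameters" into a CLI flag for the latency benchmark script, so adding a model or an HPU-specific knob is a data-only change.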

Test Plan

Models tested: DeepSeek-R1, Llama-4-Maverick-17B-128E-Instruct-FP8, Qwen-3-8B
Scenarios: throughput, latency, and serving

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

The mergify bot added the ci/build and performance labels on Dec 19, 2025.
@gemini-code-assist bot left a comment


Code Review

This pull request adds new performance benchmarks for Intel Gaudi 3, specifically for DeepSeek-R1, Llama-4-Maverick-17B-128E-Instruct-FP8, and Qwen-3-8B models. The changes look good overall, but I've found a few critical issues in the benchmark configuration files that could cause failures or incorrect results. These include missing quantization settings for FP8 models, incorrect model names for client tokenizers, and typos in model identifiers. Please see the detailed comments for suggestions on how to fix them.
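To make the flagged failure modes concrete, here is a hypothetical serving entry, not the PR's actual diff; the field names mirror the shape of vLLM's existing serving-tests.json, and all values are illustrative assumptions. The server and client model names must agree:

```json
{
  "test_name": "serving_llama4_maverick_fp8_tp8",
  "qps_list": [1, 4, 16, "inf"],
  "server_parameters": {
    "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    "tensor_parallel_size": 8,
    "disable_log_stats": ""
  },
  "client_parameters": {
    "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    "backend": "vllm",
    "dataset_name": "sharegpt",
    "num_prompts": 200
  }
}
```

A mismatched model name under "client_parameters" makes the benchmark client load the wrong tokenizer (or fail outright), skewing the reported token counts; a pre-quantized FP8 checkpoint may additionally need an explicit quantization setting on HPU, which is the class of issue listed above.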

Comment threads:
- .buildkite/performance-benchmarks/tests/latency-tests-hpu.json (1 thread)
- .buildkite/performance-benchmarks/tests/serving-tests-hpu.json (4 threads, 1 outdated)
- .buildkite/performance-benchmarks/tests/throughput-tests-hpu.json (2 threads)
@simonreginis force-pushed the sreginis/new_benchmarks_dec branch from cc5e80a to d4e41d9 on December 19, 2025 11:32
@github-actions bot commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, a small and essential subset of CI tests that quickly catches errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@simonreginis force-pushed the sreginis/new_benchmarks_dec branch 3 times, most recently from 7f3ad25 to 9513d3b on December 23, 2025 11:11
@simonreginis force-pushed the sreginis/new_benchmarks_dec branch from 9513d3b to f737407 on February 4, 2026 12:12
- DeepSeek-R1
- Llama-4-Maverick-17B-128E-Instruct-FP8
- Qwen-3-8B

Signed-off-by: Szymon Reginis <sreginis@habana.ai>
@simonreginis force-pushed the sreginis/new_benchmarks_dec branch from f737407 to 92c454c on February 4, 2026 12:56
@simonreginis marked this pull request as ready for review on February 4, 2026 12:58
@simonreginis (Contributor, Author) commented

This is a continuation of @jakub-sochacki's PR #26919.

The related PR in pytorch-integration-testing is merged:
pytorch/pytorch-integration-testing#121

@xuechendi @jikunshang Please review.

@PatrykWo (Contributor) commented

@huydhn 👆

@jikunshang (Collaborator) left a comment


LGTM! Triggering HPU CI to check status.

@jikunshang added the ready label on Feb 24, 2026.
@jikunshang merged commit 4beebfd into vllm-project:main on Mar 3, 2026.
13 checks passed
Copilot AI pushed a commit to machov/vllm that referenced this pull request on Mar 10, 2026:
[CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 (vllm-project#31025)
Signed-off-by: Szymon Reginis <sreginis@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request on Mar 12, 2026:
[CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 (vllm-project#31025)
Signed-off-by: Szymon Reginis <sreginis@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request on Mar 18, 2026:
[CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 (vllm-project#31025)
Signed-off-by: Szymon Reginis <sreginis@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Labels

ci/build, performance, ready
