[FIX_FOR_VLLM_CUSTOM=d3af8c18317c0dc008d42e4367fbb9045cfb7bf6] Add hf_config parameter to HPU quantization config overrides by pawel-olejniczak · Pull Request #1349 · vllm-project/vllm-gaudi

pawel-olejniczak · 2026-04-14T19:09:48Z

Summary

Fixes a regression introduced by upstream vLLM that breaks all quantization tests using HPU-specific GPTQ and AWQ backends (e.g. run_qwen3_inc_dynamic_load_generate_test).

Changes

1. Add an hf_config parameter to override_quantization_method() in GPTQHPUConfig and AWQHPUConfig. Upstream changed the call site in vllm/config/model.py to pass hf_config=self.hf_config, but the plugin implementations still used the old two-parameter signature, which raised a TypeError.
2. Re-enable the build_nixl_dockerfile CI test in the pre-merge workflow.
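
The signature mismatch described in the first change can be reproduced in isolation. The following is a minimal sketch, not the actual vllm_gaudi code: `OldStyleConfig`, `NewStyleConfig`, `call_like_upstream`, and the returned string `"example_hpu_method"` are hypothetical stand-ins, and the only detail taken from this PR is that the caller passes `hf_config` as a keyword argument.

```python
from typing import Any, Optional


class OldStyleConfig:
    """Hypothetical stand-in for a plugin config with the old two-parameter hook."""

    @classmethod
    def override_quantization_method(
        cls, hf_quant_cfg: dict, user_quant: Optional[str]
    ) -> Optional[str]:
        # Illustrative logic only; the real selection rules are not shown in this PR.
        return "example_hpu_method" if hf_quant_cfg.get("quant_method") == "gptq" else None


class NewStyleConfig:
    """Same hook, extended with an optional hf_config parameter for compatibility."""

    @classmethod
    def override_quantization_method(
        cls, hf_quant_cfg: dict, user_quant: Optional[str], hf_config: Any = None
    ) -> Optional[str]:
        # hf_config is accepted but deliberately unused in this sketch.
        return "example_hpu_method" if hf_quant_cfg.get("quant_method") == "gptq" else None


def call_like_upstream(config_cls, hf_quant_cfg, user_quant, hf_config):
    """Mimics a call site that passes hf_config as a keyword argument."""
    return config_cls.override_quantization_method(
        hf_quant_cfg, user_quant, hf_config=hf_config
    )


if __name__ == "__main__":
    cfg = {"quant_method": "gptq"}
    print(call_like_upstream(NewStyleConfig, cfg, None, hf_config=object()))
    try:
        call_like_upstream(OldStyleConfig, cfg, None, hf_config=object())
    except TypeError as exc:
        print(f"old signature fails: {exc}")
```

Giving the new parameter a default of `None` keeps the hook callable by older callers that still pass only two arguments.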

Upstream PR that introduced the regression

[Quantization] [Refactor] Create special "GptOssMxfp4MoeMethod" vllm#39604: added the hf_config keyword argument to the override_quantization_method() call and updated all upstream implementations, but the plugin implementations were not updated.
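
One way a plugin could catch this kind of drift earlier is to inspect its hook signatures for the keyword that the caller now passes. This is a sketch under assumptions: `accepts_keyword`, `find_outdated`, and the two config classes are hypothetical helpers written for illustration and are not part of vLLM or vllm-gaudi. Only the standard-library `inspect` module is used.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with name=... as a keyword argument."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if param is not None and param.kind in keyword_kinds:
        return True
    # A **kwargs catch-all also absorbs the new keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class LegacyPluginConfig:
    """Hypothetical config that still has the two-parameter hook."""

    @classmethod
    def override_quantization_method(cls, hf_quant_cfg, user_quant):
        return None


class UpdatedPluginConfig:
    """Hypothetical config whose hook already accepts hf_config."""

    @classmethod
    def override_quantization_method(cls, hf_quant_cfg, user_quant, hf_config=None):
        return None


def find_outdated(config_classes, keyword="hf_config"):
    """List the class names whose hook would reject the given keyword."""
    return [
        c.__name__
        for c in config_classes
        if not accepts_keyword(c.override_quantization_method, keyword)
    ]


if __name__ == "__main__":
    print(find_outdated([LegacyPluginConfig, UpdatedPluginConfig]))
```

A check like this could run as a unit test in the plugin, so a changed upstream call site surfaces as one targeted failure rather than as broken quantization tests.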

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>

Copilot

Pull request overview

Fixes an upstream-compatibility regression where vLLM now passes hf_config= into override_quantization_method(), causing the Gaudi HPU GPTQ/AWQ plugin implementations (still using the old signature) to raise a TypeError.

Changes:

Extend GPTQHPUConfig.override_quantization_method() to accept an optional hf_config parameter for API compatibility.
Extend AWQHPUConfig.override_quantization_method() to accept an optional hf_config parameter for API compatibility.
Re-enable the build_nixl_dockerfile CI job when the NIXL Dockerfile is modified.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `vllm_gaudi/ops/hpu_gptq.py` | Updates the override hook signature to accept the new upstream `hf_config` kwarg. |
| `vllm_gaudi/ops/hpu_awq.py` | Updates the override hook signature to accept the new upstream `hf_config` kwarg. |
| `.github/workflows/pre-merge.yaml` | Restores Dockerfile build validation for `.cd/Dockerfile.ubuntu.pytorch.vllm.nixl.latest` changes. |

github-actions · 2026-04-14T22:26:56Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
d3af8c18317c0dc008d42e4367fbb9045cfb7bf6

…_config parameter to HPU quantization config overrides (vllm-project#1349) ## Summary Fixes a regression introduced by upstream vLLM that breaks all quantization tests using HPU-specific GPTQ and AWQ backends (e.g. `run_qwen3_inc_dynamic_load_generate_test`). ## Changes 1. **Add `hf_config` parameter to `override_quantization_method()` in `GPTQHPUConfig` and `AWQHPUConfig`** — upstream changed the call site in `vllm/config/model.py` to pass `hf_config=self.hf_config`, but plugin implementations still used the old 2-parameter signature, causing `TypeError`. 2. **Re-enable `build_nixl_dockerfile` CI test** in pre-merge workflow. ## Upstream PR that introduced the regression - vllm-project/vllm#39604 — added `hf_config` keyword argument to `override_quantization_method()` call and updated all upstream implementations, but plugin implementations were not updated. --------- Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com> Signed-off-by: bmyrcha <bartosz.myrcha@intel.com>

…_config parameter to HPU quantization config overrides (vllm-project#1349) ## Summary Fixes a regression introduced by upstream vLLM that breaks all quantization tests using HPU-specific GPTQ and AWQ backends (e.g. `run_qwen3_inc_dynamic_load_generate_test`). ## Changes 1. **Add `hf_config` parameter to `override_quantization_method()` in `GPTQHPUConfig` and `AWQHPUConfig`** — upstream changed the call site in `vllm/config/model.py` to pass `hf_config=self.hf_config`, but plugin implementations still used the old 2-parameter signature, causing `TypeError`. 2. **Re-enable `build_nixl_dockerfile` CI test** in pre-merge workflow. ## Upstream PR that introduced the regression - vllm-project/vllm#39604 — added `hf_config` keyword argument to `override_quantization_method()` call and updated all upstream implementations, but plugin implementations were not updated. --------- Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com> Signed-off-by: Yeonsil Yoon <yeon.sil.yoon@intel.com>


pawel-olejniczak added 2 commits April 14, 2026 21:59

Add hf_config parameter to HPU quantization config overrides

53262a1

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>

Enable build_nixl_dockerfile test

bb19a33

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>

Copilot AI review requested due to automatic review settings April 14, 2026 19:09

pawel-olejniczak requested review from PatrykWo, adobrzyn, afierka-intel, iboiko-habana, kamil-kaczor, ksmusz, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners April 14, 2026 19:09

Copilot started reviewing on behalf of pawel-olejniczak April 14, 2026 19:10

Copilot AI reviewed Apr 14, 2026


iboiko-habana approved these changes Apr 14, 2026


github-actions Bot mentioned this pull request Apr 14, 2026

🚦 Team Review Dashboard #701

Open

tzielinski-habana merged commit 8a9d698 into vllm-project:main Apr 15, 2026
73 of 74 checks passed

tzielinski-habana merged 2 commits into vllm-project:main from pawel-olejniczak:fix/vllm-hourly-14-4


4 participants
