
[FIX_FOR_VLLM_CUSTOM=d3af8c18317c0dc008d42e4367fbb9045cfb7bf6] Add hf_config parameter to HPU quantization config overrides#1349

Merged
tzielinski-habana merged 2 commits into vllm-project:main from pawel-olejniczak:fix/vllm-hourly-14-4 on Apr 15, 2026
Conversation

@pawel-olejniczak
Collaborator

Summary

Fixes a regression introduced by upstream vLLM that breaks all quantization tests using HPU-specific GPTQ and AWQ backends (e.g. run_qwen3_inc_dynamic_load_generate_test).

Changes

  1. Add hf_config parameter to override_quantization_method() in GPTQHPUConfig and AWQHPUConfig — upstream changed the call site in vllm/config/model.py to pass hf_config=self.hf_config, but plugin implementations still used the old 2-parameter signature, causing TypeError.
  2. Re-enable build_nixl_dockerfile CI test in pre-merge workflow.
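The plugin-side fix can be sketched as follows. The class name, method name, and the `hf_config` keyword come from this PR; the other parameter names and the method body are illustrative assumptions, not the actual `vllm_gaudi/ops/hpu_gptq.py` code:

```python
# Illustrative sketch of the fix (stand-in for the real vllm-gaudi hook).
# Accepting an optional hf_config keyword makes the plugin hook tolerate the
# new upstream call site; parameter names other than hf_config are assumptions.
from typing import Any, Optional


class GPTQHPUConfig:

    @classmethod
    def override_quantization_method(cls,
                                     hf_quant_cfg: Any,
                                     user_quant: Optional[str],
                                     hf_config: Optional[Any] = None) -> Optional[str]:
        # hf_config is accepted purely for API compatibility with upstream
        # vllm/config/model.py, which now passes hf_config=self.hf_config.
        # Defaulting it to None keeps older 2-argument call sites working too.
        if user_quant == "gptq_hpu":
            return "gptq_hpu"
        return None
```

Because `hf_config` defaults to `None`, both the old two-argument call and the new keyword-passing call site succeed; the HPU backend simply ignores the extra argument.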

Upstream PR that introduced the regression

  • vllm-project/vllm#39604 — added the hf_config keyword argument to the override_quantization_method() call and updated all upstream implementations, but plugin implementations were not updated.

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

Fixes an upstream-compatibility regression where vLLM now passes hf_config= into override_quantization_method(), causing the Gaudi HPU GPTQ/AWQ plugin implementations (still using the old signature) to raise a TypeError.
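The failure mode is an ordinary Python keyword-argument mismatch; a minimal reproduction with a hypothetical stand-in class (not the real plugin code):

```python
# Minimal reproduction of the regression: a hook that still has the old
# 2-parameter signature rejects the new hf_config keyword with a TypeError.
class OldStylePluginConfig:

    @classmethod
    def override_quantization_method(cls, hf_quant_cfg, user_quant):
        return None


try:
    # Mirrors the updated upstream call site, which now passes hf_config=...
    OldStylePluginConfig.override_quantization_method({}, "gptq_hpu", hf_config={})
except TypeError as exc:
    print(f"TypeError: {exc}")
```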

Changes:

  • Extend GPTQHPUConfig.override_quantization_method() to accept an optional hf_config parameter for API compatibility.
  • Extend AWQHPUConfig.override_quantization_method() to accept an optional hf_config parameter for API compatibility.
  • Re-enable the build_nixl_dockerfile CI job when the NIXL Dockerfile is modified.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `vllm_gaudi/ops/hpu_gptq.py` | Updates the override hook signature to accept the new upstream `hf_config` kwarg. |
| `vllm_gaudi/ops/hpu_awq.py` | Updates the override hook signature to accept the new upstream `hf_config` kwarg. |
| `.github/workflows/pre-merge.yaml` | Restores Dockerfile build validation for `.cd/Dockerfile.ubuntu.pytorch.vllm.nixl.latest` changes. |

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
d3af8c18317c0dc008d42e4367fbb9045cfb7bf6

@tzielinski-habana tzielinski-habana merged commit 8a9d698 into vllm-project:main Apr 15, 2026
73 of 74 checks passed
bmyrcha pushed a commit to bmyrcha/vllm-gaudi that referenced this pull request Apr 17, 2026
…_config parameter to HPU quantization config overrides (vllm-project#1349)

## Summary

Fixes a regression introduced by upstream vLLM that breaks all
quantization tests using HPU-specific GPTQ and AWQ backends (e.g.
`run_qwen3_inc_dynamic_load_generate_test`).

## Changes

1. **Add `hf_config` parameter to `override_quantization_method()` in
`GPTQHPUConfig` and `AWQHPUConfig`** — upstream changed the call site in
`vllm/config/model.py` to pass `hf_config=self.hf_config`, but plugin
implementations still used the old 2-parameter signature, causing
`TypeError`.
2. **Re-enable `build_nixl_dockerfile` CI test** in pre-merge workflow.

## Upstream PR that introduced the regression

- vllm-project/vllm#39604 — added `hf_config`
keyword argument to `override_quantization_method()` call and updated
all upstream implementations, but plugin implementations were not
updated.

---------

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
Signed-off-by: bmyrcha <bartosz.myrcha@intel.com>
yeonsily pushed a commit to yeonsily/vllm-gaudi that referenced this pull request Apr 21, 2026
…_config parameter to HPU quantization config overrides (vllm-project#1349)

bmyrcha pushed a commit to bmyrcha/vllm-gaudi that referenced this pull request Apr 22, 2026
…_config parameter to HPU quantization config overrides (vllm-project#1349)

