lm_eval to 0.4.9.1 and support for new args by 12010486 · Pull Request #2193 · huggingface/optimum-habana

12010486 · 2025-08-06T15:18:40Z

Argument parsing and evaluation configuration:

Added new command-line arguments in run_lm_eval.py for controlling evaluation, including support for generation kwargs, few-shot and multi-turn settings, metadata, system instructions, chat template application, and sample selection. This allows for more granular and customizable evaluation runs.
Added try_parse_json to robustly handle either JSON or plain-string input for generation arguments.
Updated main evaluation logic to pass new arguments through to the evaluator, and added validation for combinations of options (e.g., requiring chat template when using multi-turn few-shot).
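
For readers who want a concrete picture, here is a minimal sketch of the two ideas in the list above: a JSON-or-string argument parser and a check on option combinations. It is not the code from this PR. The flag names `--gen_kwargs`, `--apply_chat_template`, and `--fewshot_as_multiturn` are assumed to mirror the lm-eval CLI, and `build_parser` and `validate` are hypothetical helpers.

```python
import argparse
import json
from typing import Any, Optional


def try_parse_json(value: Optional[str]) -> Any:
    """Return parsed JSON when possible, otherwise the raw string."""
    if value is None:
        return None
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        # Input that contains a brace but fails to parse is most likely
        # a quoting mistake, so report it instead of passing it through.
        if "{" in value:
            raise argparse.ArgumentTypeError(
                f"Invalid JSON: {value!r}. Hint: use double quotes inside JSON."
            )
        return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real script defines many more options.
    parser = argparse.ArgumentParser()
    parser.add_argument("--gen_kwargs", type=try_parse_json, default=None)
    parser.add_argument("--apply_chat_template", action="store_true")
    parser.add_argument("--fewshot_as_multiturn", action="store_true")
    return parser


def validate(args: argparse.Namespace) -> argparse.Namespace:
    # Multi-turn few-shot prompts only make sense with a chat template.
    if args.fewshot_as_multiturn and not args.apply_chat_template:
        raise ValueError(
            "--fewshot_as_multiturn requires --apply_chat_template"
        )
    return args


if __name__ == "__main__":
    args = validate(build_parser().parse_args())
    print(args.gen_kwargs)
```

With this sketch, `--gen_kwargs '{"temperature": 0.7}'` yields a dict, while `--gen_kwargs temperature=0.7,top_p=0.9` is passed through as a string for downstream parsing.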

Model adapter enhancements:

Added support for softmax_dtype, think_end_token, enable_thinking, and chat_template_args in HabanaModelAdapter initialization, enabling more advanced generation and prompt formatting features.
Improved bucket selection logic for static shape generation, and switched to using max_new_tokens for generation instead of max_length, aligning with HuggingFace's recommended API usage.
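
As a rough illustration of the two points above (not the adapter's actual code), the sketch below picks the smallest static-shape bucket that fits a prompt and computes a `max_new_tokens` budget that never goes negative. The function names `pick_bucket` and `new_token_budget`, and the bucket values used in the usage note, are hypothetical.

```python
import bisect
from typing import List


def pick_bucket(length: int, buckets: List[int]) -> int:
    """Return the smallest bucket size that can hold `length` tokens."""
    ordered = sorted(buckets)
    idx = bisect.bisect_left(ordered, length)
    if idx == len(ordered):
        raise ValueError(f"Input of {length} tokens exceeds largest bucket")
    return ordered[idx]


def new_token_budget(max_gen_toks: int, context_len: int, max_model_len: int) -> int:
    """Number of tokens that can still be generated after the context.

    The result is meant to be passed as `max_new_tokens`, which counts only
    generated tokens, unlike `max_length`, which also counts the prompt.
    """
    room = max_model_len - context_len
    # Clamp at zero so an over-long context never produces a negative budget.
    return max(0, min(max_gen_toks, room))
```

For example, with buckets `[16, 32, 64, 128]` a 100-token prompt would be padded to 128, and a 2000-token context in a 2048-token window leaves room for at most 48 new tokens.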

Dependency update:

Upgraded the lm-eval package to version 0.4.9.1 to support new features and bug fixes.

General improvements:

Minor refactoring for imports and typing to support new features.

Fix utils package

A previous PR [1] introduced a new package `optimum.habana.utils` which created a conflict with the `optimum/habana/utils.py` file. This commit adds the missing `utils/__init__.py` file and moves the `utils.py` file into the `utils/` module (renamed as `misc.py`).

[1] huggingface#1926

Co-authored-by: Piotr Bielak <pbielak@habana.ai>

* Align VideoLlavaProcessor with Transformers v4.51.3

  The GaudiVideoLlavaProcessor has been removed from optimum-habana as its functionality is now fully aligned with the upstream Transformers implementation. No custom logic is required, and maintaining a separate class is redundant.

* Update GaudiVideoLlavaForConditionalGeneration

  This update aligns the GaudiVideoLlavaForConditionalGeneration implementation with the v4.51.3 transformers changes while retaining `token_idx` argument for compatibility with Gaudi optimizations.

With Llama 4 support in Transformers 4.51, the `Pipeline` class changed [1] so that the pipeline device is set to `self.model.device`. In the case of Mllama, DeepSpeed is used to create the `.language_model` on HPU, whereas the rest of the model stays on CPU [2]. As a result, `self.model.device` is always CPU, which causes the whole model to be placed back on CPU. This commit explicitly moves the model to HPU, so the pipeline is also placed on HPU.

[1] https://github.com/huggingface/transformers/pull/37307/files#diff-441f558737166b045444da9c4be81f566b3d69054e8f20e288aed746a691fa61
[2] https://github.com/huggingface/optimum-habana/blob/v1.18.0/examples/image-to-text/run_pipeline.py#L360

…itialization. (huggingface#2126)

* Move import to local scope in run_lm_eval, to allow prior env vars initialization.
* Address PR comment: Adding comment explaining delayed import.

Signed-off-by: Artur Kloniecki <aklonieckix@habana.ai>
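
The delayed-import pattern mentioned in this commit can be illustrated with a self-contained sketch. Everything here is a stand-in: `demo_backend` and `DEMO_LAZY_MODE` are hypothetical names for a library that reads its configuration once at import time, and this is not the code from run_lm_eval.py.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Stand-in for a library that reads an environment variable at import time.
MODULE_SOURCE = textwrap.dedent(
    """
    import os
    MODE = os.environ.get("DEMO_LAZY_MODE", "unset")
    """
)


def main() -> str:
    # 1. Initialize the environment first.
    os.environ["DEMO_LAZY_MODE"] = "1"

    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_backend.py"), "w") as handle:
            handle.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # 2. Import inside the function, after the variable is set.
            #    A module-level import would have run before step 1.
            import demo_backend

            return demo_backend.MODE
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(main())
```

The design point is only the ordering: because the import sits in local scope, the module sees the environment as the script configured it rather than as it was when the script was first loaded.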

…ngface#2136)

add param DATASET_CONFIG
add param DATASET_CONFIG to makefile
add param DATASET_CONFIG to makefile - part.2
add log msg
update path
update path - part.2
update path - part.3
update path - part.4
update path - part.5
Return empty args for unsupported examples

* Ported arctic instruct code
* Make style and resolve GenerationMixin warnings
* Fixed tokenization imports
* Updated requirements for Arctic Model
* Apply fix for ArcticRMSNorm from Llama
* Use customized rope
* Better try imports, unified RoPE implementation
* Fix typo
* Added mark step after decoder layers
* Using gaudi mixtral MOE impl
* Changed to gaudi repeat_kv and rope impls
* Add missing rope scaling to config
* Fix repeat_kv signature
* Revert "Using gaudi mixtral MOE impl". This reverts commit 9c390e7.
* Remove other attention impls
* Add initial KV cache support
* Add fixed moe from mixtral
* Integrate KV cache
* Updated docs
* Apply fixes from huggingface#1705
* Resolve GenerationMixin and LlamaRotaryEmbedding warnings
* Cleanup unused code
* make style
* Mirror rope usage in llama

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Daniel Huang <tianmu.li@intel.com>

12010486 · 2025-08-06T16:12:14Z

Examples of new command lines:

PT_HPU_LAZY=1 python run_lm_eval.py --model_name_or_path meta-llama/Llama-3.2-1B-Instruct --attn_softmax_bf16 --bf16 --batch_size 16 -o temp.txt --tasks mmlu --num_fewshot 5 --metadata '{"max_length":496}'

Note: metadata is fully supported in 0.4.9, while we are still on 0.4.8. I will update it shortly

12010486 · 2025-08-07T08:15:47Z

Apologies, we decided to add even more args. I'm marking it as draft until all the args are in

12010486 · 2025-08-08T10:50:09Z

When #2197 is merged, the lm_eval generate tasks will be able to run also in eager mode

regisss · 2025-08-08T14:55:21Z

> When #2197 is merged, the lm_eval generate tasks will be able to run also in eager mode

#2197 should be merged before this one?

12010486 · 2025-08-08T14:59:34Z

No need. The only limitation of this PR is that it works in lazy mode only. Ideally, both will be merged soon.

regisss

LGTM, is this ready to be merged?

karol-brejna-i · 2025-08-20T12:14:18Z

@12010486 we won't be able to include this one in the OH 1.19.0 release (as it is happening "right now").
Would you mind targeting these changes to main? That way we could include them in the earliest new release (either a dot release or the 1.20 release...)

12010486 · 2025-08-26T14:02:24Z

@karol-brejna-i Following your advice, I created a new PR to merge these changes into main.
Please note that main does not include 8f7fecd yet, which I believe would be needed for RnD CI.

CC: @regisss, @sungwook-son

Closing the PR due to the above reason

astachowiczhabana and others added 30 commits July 11, 2025 10:40

Release 1.19 only: QA changes to examples

b5a6aa0

Upgrade to lm_eval==4.8.0 (huggingface#2082)

4f7b3cc

Add support for setting --junitxml output via JUNITXML_DIR environmen…

6f1cfce

…t variable (huggingface#2084)

Bitsandbytes installation for qlora tests (huggingface#1951)

67c662e

Temporarily revert SD quant files to fix promotion (huggingface#2069)

76f6bd7

Update readme files for explicit lazy mode (huggingface#1921)

0955a80

Integrated NF4 inference tests to text-generation (huggingface#2058)

b017b2b

Remove bitsandbytes monkey-patching (II) (huggingface#2114)

54aded1

Add groups to slow_tests_image_to_text_example (huggingface#2008)

c03dbea

Lm eval accuracy regression fix (huggingface#2105)

61be32b

Skip unnecessary padding in text generation task (huggingface#2055)

2b66513

Reduce index_copy to fp8 in llama2 - QDQ flow huggingface#2065

1cf2032

Unify SetTrueOrFalseOrNone and StoreTrueFalseAction (huggingface#2119)

62b45d7

* Unify SetTrueOrFalseOrNone and StoreTrueFalseAction
* Fix style

Fix profiler (huggingface#2134)

a5500ea

Fix missing openorca dataset (huggingface#2133)

ccf00fb

Add boft support in stable-diffusion (huggingface#1295)

4ba841d

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Add Qwen classification (huggingface#2062)

1c0d4c0

Co-authored-by: Yuan Tian <tian.yuan@intel.com>

Merge branch 'main' into v1.19-release

efc1e29

Synapse 1.22 Optimum Habana 1.19 (huggingface#2137)

f10d7d9

* Synapse 1.22 Optimum Habana 1.19
* Fix deepspeed version to 1.21.0

Pin opencv version to 4.10.0.84 and remove pinned numpy version. (hug…

c3e3adb

…gingface#2145) Signed-off-by: Artur Kloniecki <aklonieckix@habana.ai>

Use eval_strategy in sentence-transformers-training (huggingface#2151)

a07fa48

Add sentencepiece to setup.py (huggingface#2153)

4a37df6

With the merge of [1], the patching in `modeling_utils.py` causes a series of imports which eventually tries to import `sentencepiece`. Hence, `sentencepiece` is now a required package.

[1] huggingface#1719

Diffusers 0.34.0 (huggingface#2152)

77a0f37

Use profiler in text-generation-pipeline (huggingface#2154)

9e79289

12010486 requested a review from regisss as a code owner August 6, 2025 15:18

12010486 changed the title ~~Adding more relevant args to lm_eval~~ Adding more args to lm_eval Aug 6, 2025

Partial support of metadata

dc88d4e

12010486 assigned astachowiczhabana Aug 6, 2025

sungwook-son and others added 2 commits August 6, 2025 21:59

Add: adding appy_chat_template config

888d827

[Add] apply_chat_template CLI option

0cee5df

[Add] apply_chat_template CLI option

12010486 marked this pull request as draft August 7, 2025 08:15

12010486 added 4 commits August 7, 2025 09:35

Support latest lm_eval + samples and metadata args fully

acccc5f

Add system_instruction

433f568

Add gen_kwargs

bfc5e85

Add HabanaModelAdapter attributes + _model_generate() improv

63ef628

12010486 changed the title ~~Adding more args to lm_eval~~ lm_eval to 0.4.9.1 and support for new args Aug 7, 2025

12010486 marked this pull request as ready for review August 7, 2025 14:36

Fix for negative max_gen_toks (e.g. in gsm8k)

c9a4e74

Added to run HumanEval

80a3733

regisss approved these changes Aug 11, 2025

astachowiczhabana force-pushed the v1.19-release branch from 061905e to 13c4731 Compare August 22, 2025 14:05

astachowiczhabana requested review from libinta, mandy-li and vivekgoe as code owners August 22, 2025 14:05

12010486 mentioned this pull request Aug 26, 2025

lm_eval to 0.4.9.1 and support for new args - rebased #2228

Merged

12010486 closed this Aug 26, 2025

12010486 deleted the lm_eval_args branch September 3, 2025 08:49

12010486 wants to merge 66 commits into huggingface:v1.19-release from 12010486:lm_eval_args

13 participants
