lm_eval to 0.4.9.1 and support for new args#2193

Closed
12010486 wants to merge 66 commits into huggingface:v1.19-release from 12010486:lm_eval_args

@12010486
Contributor

@12010486 12010486 commented Aug 6, 2025

Argument parsing and evaluation configuration:

  • Added new command-line arguments in run_lm_eval.py for controlling evaluation, including support for generation kwargs, few-shot and multi-turn settings, metadata, system instructions, chat template application, and sample selection. This allows for more granular and customizable evaluation runs.
  • Added try_parse_json to robustly handle JSON or string input for generation arguments.
  • Updated main evaluation logic to pass new arguments through to the evaluator, and added validation for combinations of options (e.g., requiring chat template when using multi-turn few-shot).
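
The parsing and validation described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the flag names (`--gen_kwargs`, `--metadata`, `--apply_chat_template`, `--fewshot_as_multiturn`) and the `build_parser` and `validate_args` helpers are assumptions modelled on the lm-eval command line, and the fallback behaviour of `try_parse_json` is a guess at what "JSON or string input" means.

```python
import argparse
import json


def try_parse_json(value):
    """Return parsed JSON if `value` is valid JSON, otherwise the raw string."""
    if value is None:
        return None
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        # Not JSON (e.g. "temperature=0.7,top_p=0.9"): keep the string as-is.
        return value


def build_parser():
    # Hypothetical subset of the arguments; names are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--gen_kwargs", type=try_parse_json, default=None)
    parser.add_argument("--metadata", type=json.loads, default=None)
    parser.add_argument("--apply_chat_template", action="store_true")
    parser.add_argument("--fewshot_as_multiturn", action="store_true")
    return parser


def validate_args(args):
    """Reject option combinations that cannot work together."""
    if args.fewshot_as_multiturn and not args.apply_chat_template:
        raise ValueError(
            "--fewshot_as_multiturn requires --apply_chat_template"
        )
```

With this sketch, `--gen_kwargs '{"temperature": 0.7}'` yields a dict, while `--gen_kwargs temperature=0.7` is passed through as a string for the evaluator to interpret.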

Model adapter enhancements:

  • Added support for softmax_dtype, think_end_token, enable_thinking, and chat_template_args in HabanaModelAdapter initialization, enabling more advanced generation and prompt formatting features.
  • Improved bucket selection logic for static shape generation, and switched to using max_new_tokens for generation instead of max_length, aligning with HuggingFace's recommended API usage.
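
To illustrate the second point, here is a rough sketch of bucket selection for static shapes and of the relationship between `max_length` and `max_new_tokens`. The helper names `pick_bucket` and `max_new_tokens_for` and the bucket sizes are hypothetical and are not taken from HabanaModelAdapter; the only fact relied on is that, for decoder-only generation in Transformers, `max_length` counts prompt plus generated tokens while `max_new_tokens` counts generated tokens only.

```python
import bisect


def pick_bucket(length, buckets):
    """Return the smallest bucket that can hold `length` tokens.

    `buckets` must be sorted in ascending order. Padding every input up to
    one of a few fixed sizes keeps tensor shapes static across batches.
    """
    idx = bisect.bisect_left(buckets, length)
    if idx == len(buckets):
        raise ValueError(f"length {length} exceeds largest bucket {buckets[-1]}")
    return buckets[idx]


def max_new_tokens_for(max_length, prompt_len):
    """Convert a total-length budget into a generated-token budget."""
    # max_length = prompt tokens + new tokens, so subtract the prompt.
    return max(max_length - prompt_len, 0)
```

For example, with buckets `[16, 32, 64]` a 17-token prompt would be padded to 32, and a total budget of 496 tokens with a 400-token prompt leaves 96 new tokens.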

Dependency update:

  • Upgraded the lm-eval package to version 0.4.9.1 to support new features and bug fixes.

General improvements:

  • Minor refactoring for imports and typing to support new features.

astachowiczhabana and others added 30 commits July 11, 2025 10:40
* Unify SetTrueOrFalseOrNone and StoreTrueFalseAction

* Fix style
Fix utils package

A previous PR [1] introduced a new package `optimum.habana.utils`
which created a conflict with the `optimum/habana/utils.py` file.
This commit adds the missing `utils/__init__.py` file and moves the
`utils.py` file into the `utils/` module (renamed as `misc.py`).

[1] huggingface#1926

Co-authored-by: Piotr Bielak <pbielak@habana.ai>
* Align VideoLlavaProcessor with Transformers v4.51.3

The GaudiVideoLlavaProcessor has been removed from optimum-habana
as its functionality is now fully aligned with the upstream
Transformers implementation. No custom logic is required, and
maintaining a separate class is redundant.

* Update GaudiVideoLlavaForConditionalGeneration

This update aligns the GaudiVideoLlavaForConditionalGeneration
implementation with the v4.51.3 transformers changes while retaining
`token_idx` argument for compatibility with Gaudi optimizations.
With Llama 4 support in Transformers 4.51, there was a change in the
`Pipeline` class [1], which causes the pipeline device to be set to
`self.model.device`. In the case of Mllama, DeepSpeed is used to create
the `.language_model` on HPU, whereas the rest of the model stays
on CPU [2]. Hence, always `self.model.device = CPU`, which causes the
whole model to be placed back on CPU. This commit explicitly moves the
model to HPU, so the pipeline will be also placed on HPU.

[1] https://github.com/huggingface/transformers/pull/37307/files#diff-441f558737166b045444da9c4be81f566b3d69054e8f20e288aed746a691fa61
[2] https://github.com/huggingface/optimum-habana/blob/v1.18.0/examples/image-to-text/run_pipeline.py#L360
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Yuan Tian <tian.yuan@intel.com>
* Synapse 1.22 Optimum Habana 1.19

* Fix deepspeed version to 1.21.0
…itialization. (huggingface#2126)

* Move import to local scope in run_lm_eval, to allow prior env vars initialization.

Signed-off-by: Artur Kloniecki <aklonieckix@habana.ai>

* Address PR comment: Adding comment explaining delayed import.

Signed-off-by: Artur Kloniecki <aklonieckix@habana.ai>

---------

Signed-off-by: Artur Kloniecki <aklonieckix@habana.ai>
…ngface#2136)

add param DATASET_CONFIG

add param DATASET_CONFIG to makefile

add param DATASET_CONFIG to makefile - part.2

add log msg

update path

update path - part.2

update path - part.3

update path - part.4

update path - part.5

Return empty args for unsupported examples
* Ported arctic instruct code

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Make style and resolve GenerationMixin warnings

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Fixed tokenization imports

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Updated requirements for Arctic Model

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Apply fix for ArcticRMSNorm from Llama

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Use customized rope

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Better try imports, unified RoPE implementation

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Fix typo

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Added mark step after decoder layers

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Using gaudi mixtral MOE impl

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Changed to gaudi repeat_kv and rope impls

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Add missing rope scaling to config

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Fix repeat_kv signature

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Revert "Using gaudi mixtral MOE impl"

This reverts commit 9c390e7.

* Remove other attention impls

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Add initial KV cache support

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Add fixed moe from mixtral

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Integrate KV cache

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Updated docs

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Apply fixes from huggingface#1705

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Resolve GenerationMixin and LlamaRotaryEmbedding warnings

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Cleanup unused code

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* make style

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Mirror rope usage in llama

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

---------

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Daniel Huang <tianmu.li@intel.com>
With the merge of [1], the patching in `modeling_utils.py` causes a
series of imports which eventually tries to import `sentencepiece`.
Hence, `sentencepiece` is now a required package.

[1] huggingface#1719
@12010486 12010486 requested a review from regisss as a code owner August 6, 2025 15:18
@12010486 12010486 changed the title Adding more relevant args to lm_eval Adding more args to lm_eval Aug 6, 2025
@12010486
Contributor Author

12010486 commented Aug 6, 2025

Examples of new command lines:

PT_HPU_LAZY=1 python run_lm_eval.py --model_name_or_path meta-llama/Llama-3.2-1B-Instruct --attn_softmax_bf16 --bf16 --batch_size 16 -o temp.txt --tasks mmlu --num_fewshot 5 --metadata '{"max_length":496}'

Note: metadata is fully supported in 0.4.9, while we are still on 0.4.8. I will update it shortly

@12010486
Contributor Author

12010486 commented Aug 7, 2025

Apologies, we decided to add even more args. I'm marking it as draft until all the args are in

@12010486 12010486 marked this pull request as draft August 7, 2025 08:15
@12010486 12010486 changed the title Adding more args to lm_eval lm_eval to 0.4.9.1 and support for new args Aug 7, 2025
@12010486 12010486 marked this pull request as ready for review August 7, 2025 14:36
@12010486
Contributor Author

12010486 commented Aug 8, 2025

When #2197 is merged, the lm_eval generate tasks will also be able to run in eager mode.

@regisss
Collaborator

regisss commented Aug 8, 2025

> When #2197 is merged, the lm_eval generate tasks will be able to run also in eager mode

Should #2197 be merged before this one?

@12010486
Contributor Author

12010486 commented Aug 8, 2025

No need. The only limitation of this one is that it works in lazy mode only. Ideally, both will be merged soon.

Collaborator

@regisss regisss left a comment

LGTM, is this ready to be merged?

@karol-brejna-i
Collaborator

@12010486 we won't be able to include this one in OH 1.19.0 release (as it is happening "right now").
Would you mind targeting these changes to main? This way we could include them in the earliest new release (either a dot release or the 1.20 release).

@12010486
Contributor Author

@karol-brejna-i I created a new PR after your advice to merge these changes on main.
Please note that main does not include 8f7fecd yet, which I believe would be needed for RnD CI.

CC: @regisss, @sungwook-son

Closing the PR due to the above reason

@12010486 12010486 closed this Aug 26, 2025
@12010486 12010486 deleted the lm_eval_args branch September 3, 2025 08:49