lm_eval to 0.4.9.1 and support for new args - rebased by 12010486 · Pull Request #2228 · huggingface/optimum-habana

12010486 · 2025-08-26T13:54:07Z

Cherry-picked from #2193 to have it on main.

Changes are as follows:
Argument parsing and evaluation configuration:

Added new command-line arguments in run_lm_eval.py for controlling evaluation, including support for generation kwargs, few-shot and multi-turn settings, metadata, system instructions, chat template application, and sample selection. This allows for more granular and customizable evaluation runs.
Added try_parse_json to robustly handle JSON or string input for generation arguments.
Updated main evaluation logic to pass new arguments through to the evaluator, and added validation for combinations of options (e.g., requiring chat template when using multi-turn few-shot).
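
As a rough illustration of the JSON-or-string handling and the option validation described above, here is a minimal sketch. It is not the code from this PR: the body of `try_parse_json` is an assumption modeled on the description, and `validate_eval_args` with its parameter names is a hypothetical helper introduced only for this example.

```python
import argparse
import json
from typing import Union


def try_parse_json(value: Union[str, None]) -> Union[str, dict, None]:
    """Return parsed JSON when possible, otherwise the raw string.

    Assumption: a plain "key=value,key=value" string is passed through
    unchanged so that a later step can parse it.
    """
    if value is None:
        return None
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        # A brace suggests the user meant JSON but got the syntax wrong.
        if "{" in value:
            raise argparse.ArgumentTypeError(
                f"Invalid JSON: {value}. Hint: use double quotes for JSON strings."
            )
        return value


def validate_eval_args(
    fewshot_as_multiturn: bool, apply_chat_template: Union[bool, str]
) -> None:
    """Hypothetical check: multi-turn few-shot needs a chat template."""
    if fewshot_as_multiturn and apply_chat_template is False:
        raise ValueError(
            "Multi-turn few-shot requires a chat template to be applied."
        )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # The flag name mirrors the feature described above; the exact CLI
    # spelling used in run_lm_eval.py is an assumption.
    parser.add_argument("--gen_kwargs", type=try_parse_json, default=None)
    args = parser.parse_args(["--gen_kwargs", '{"temperature": 0.0}'])
    print(args.gen_kwargs)
```

Using the function as an argparse `type=` callable means malformed JSON is reported as a normal command-line error instead of a traceback.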

Model adapter enhancements:

Added support for softmax_dtype, think_end_token, enable_thinking, and chat_template_args in HabanaModelAdapter initialization, enabling more advanced generation and prompt formatting features.
Improved bucket selection logic for static shape generation, and switched to using max_new_tokens for generation instead of max_length, aligning with HuggingFace's recommended API usage.
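
To make the static-shape and max_new_tokens points concrete, the sketch below shows one way bucket selection can work. It is an illustration only: `find_bucket`, `pad_to_bucket`, `generation_budget`, the right-padding choice and the error on oversized inputs are all assumptions for this example, not the logic in HabanaModelAdapter.

```python
from bisect import bisect_left
from typing import Dict, List


def find_bucket(length: int, buckets: List[int]) -> int:
    """Return the smallest bucket size that can hold `length` tokens."""
    ordered = sorted(buckets)
    idx = bisect_left(ordered, length)
    if idx == len(ordered):
        # Assumption: inputs longer than the largest bucket are rejected.
        raise ValueError(f"length {length} exceeds largest bucket {ordered[-1]}")
    return ordered[idx]


def pad_to_bucket(token_ids: List[int], buckets: List[int], pad_id: int = 0) -> List[int]:
    """Pad a prompt to its bucket so the model sees a fixed set of shapes."""
    target = find_bucket(len(token_ids), buckets)
    # Right padding is an arbitrary choice for this sketch.
    return token_ids + [pad_id] * (target - len(token_ids))


def generation_budget(prompt_len: int, max_new_tokens: int) -> Dict[str, int]:
    """Show how max_new_tokens relates to a total-length limit.

    max_new_tokens counts only generated tokens, so the budget does not
    shrink as the (padded) prompt grows; the equivalent total length is
    prompt_len + max_new_tokens.
    """
    return {
        "max_new_tokens": max_new_tokens,
        "equivalent_max_length": prompt_len + max_new_tokens,
    }


if __name__ == "__main__":
    buckets = [16, 32, 64, 128]
    prompt = list(range(20))
    padded = pad_to_bucket(prompt, buckets)
    print(len(padded))
    print(generation_budget(len(padded), max_new_tokens=256))
```

Expressing the limit as max_new_tokens keeps the generation budget independent of how much padding a bucket adds to the prompt.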

Dependency update:

Upgraded the lm-eval package to version 0.4.9.1 to support new features and bug fixes.

General improvements:

Minor refactoring for imports and typing to support new features.

HuggingFaceDocBuilderDev · 2025-08-26T13:58:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

AKloniecki

We're currently working on running these changes in our CI, to check results after the upgrade. Will report back as soon as we get the results.

AKloniecki · 2025-08-27T10:47:55Z

    def get_model_info(self) -> dict:
        """
        Patched method to get Hugging Face model information for experiment reproducibility.


Please update the get_model_info() function to match the current version at https://github.com/EleutherAI/lm-evaluation-harness/blob/v0.4.9.1/lm_eval/models/huggingface.py#L1522
It seems to have changed since our last update.

@AKloniecki, thank you very much for the review. On this specific point, we may be able to remove get_model_info() entirely, since it was added only to work around a glitch in neural_compressor_pt. I'm checking in the background whether that was fixed in v3.3.1, the latest version on SynapseAI 1.21.

I can confirm that the error that required the get_model_info() patch is gone, so my last commit removes the function.

regisss

LGTM!

12010486 and others added 10 commits August 26, 2025 12:59

Adding more relevant args to lm_eval

e37f799

Partial support of metadata

39d9e81

Add: adding appy_chat_template config

f8dc9ce

Support latest lm_eval + samples and metadata args fully

7076d9c

Add system_instruction

0b356be

Add gen_kwargs

f8e030e

Add HabanaModelAdapter attributes + _model_generate() improv

5e94dda

Fix for negative max_gen_toks (e.g. in gsm8k)

c09d044

Added to run HumanEval

3388c85

Fix after rebase

53c9468

12010486 requested a review from regisss as a code owner August 26, 2025 13:54

karol-brejna-i self-assigned this Aug 27, 2025

AKloniecki reviewed Aug 27, 2025

12010486 added 3 commits August 27, 2025 13:48

List removal & other improvements

d77cbea

Removed patched get_model_info()

f38ff1d

Merge branch 'main' into lm_eval_new_args_rebased

d2765d9

12010486 requested a review from AKloniecki August 27, 2025 15:00

AKloniecki reviewed Aug 28, 2025

Comment thread examples/text-generation/run_lm_eval.py Outdated

12010486 added 2 commits August 28, 2025 17:56

Fix redundant args

e86a26f

Left out in the rebase - Fix

b12bd06

regisss approved these changes Sep 2, 2025

regisss merged commit 54ddb64 into huggingface:main Sep 2, 2025
2 of 5 checks passed

astachowiczhabana pushed a commit that referenced this pull request Sep 2, 2025

lm_eval to 0.4.9.1 and support for new args - rebased (#2228)

0996c42

12010486 deleted the lm_eval_new_args_rebased branch September 3, 2025 07:19

astachowiczhabana pushed a commit that referenced this pull request Sep 17, 2025

lm_eval to 0.4.9.1 and support for new args - rebased (#2228)

34410ed

gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025

lm_eval to 0.4.9.1 and support for new args - rebased (huggingface#2228…

96508ad

…) (huggingface#641) Co-authored-by: Silvia Colabrese <silvia.colabrese@intel.com>

regisss merged 15 commits into huggingface:main from 12010486:lm_eval_new_args_rebased

6 participants
