Lm eval accuracy regression fix by 12010486 · Pull Request #2105 · huggingface/optimum-habana

12010486 · 2025-07-03T15:25:01Z

What does this PR do?

Fixes a small accuracy regression (about 0.39%) in some of the fp8 eval harness runs.

For example, if you execute:

PT_HPU_LAZY_MODE=1 HF_DATASETS_TRUST_REMOTE_CODE=true QUANT_CONFIG=/root/optimum-habana/examples/text-generation//quantization_config//maxabs_measure.json TQDM_DISABLE=1 python3 run_lm_eval.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --warmup 0 --use_hpu_graphs -o test_results_measure.json --bf16 --batch_size 1 --use_kv_cache --trim_logits --attn_softmax_bf16 --bucket_size=128 --bucket_internal --trust_remote_code --tasks hellaswag lambada_openai piqa winogrande mathqa pubmedqa arc_easy arc_challenge

and

PT_HPU_LAZY_MODE=1 HF_DATASETS_TRUST_REMOTE_CODE=true QUANT_CONFIG=/root/optimum-habana/examples/text-generation//quantization_config//act_maxabs_pow2_weights_pcs_opt_pow2_quant.json TQDM_DISABLE=1 python3 run_lm_eval.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --warmup 0 --use_hpu_graphs -o test_results_quant.json --bf16 --batch_size 1 --use_kv_cache --trim_logits --attn_softmax_bf16 --bucket_size=128 --bucket_internal --trust_remote_code --tasks hellaswag lambada_openai piqa winogrande mathqa pubmedqa arc_easy arc_challenge --show_config

the average accuracy with this fix is 66.683, whereas before the fix it had dropped to 66.422.
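As a quick sanity check on the numbers above, the two quoted averages can be compared directly. This is only a sketch: the helper names `absolute_drop` and `relative_drop_percent` are hypothetical and not part of the repository, and it assumes the ~0.39% figure is meant as a drop relative to the fixed score, which the PR does not state explicitly.

```python
def absolute_drop(fixed: float, regressed: float) -> float:
    """Difference in accuracy points between the fixed and regressed runs."""
    return fixed - regressed


def relative_drop_percent(fixed: float, regressed: float) -> float:
    """Drop expressed as a percentage of the fixed score (assumed interpretation)."""
    return (fixed - regressed) / fixed * 100.0


if __name__ == "__main__":
    fixed_avg = 66.683      # average accuracy with this fix (from the PR description)
    regressed_avg = 66.422  # average accuracy before the fix (from the PR description)

    print(f"absolute drop: {absolute_drop(fixed_avg, regressed_avg):.3f} points")
    print(f"relative drop: {relative_drop_percent(fixed_avg, regressed_avg):.2f}%")
```

Under that assumption the difference is 0.261 accuracy points, or about 0.39% relative to the fixed average, which is consistent with the figure quoted at the top of the description.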

* Fix for accuracy regression
* Fix for utf-8 encoding issue

Co-authored-by: Silvia Colabrese <silvia.colabrese@intel.com>

12010486 added 2 commits July 2, 2025 13:38

Fix for accuracy regression

2c6c562

Fix for utf-8 encoding issue

19944d3

12010486 requested a review from regisss as a code owner July 3, 2025 15:25

12010486 requested review from astachowiczhabana and removed request for regisss July 3, 2025 15:25

astachowiczhabana approved these changes Jul 7, 2025

astachowiczhabana merged commit e4ed6c6 into huggingface:v1.19-release Jul 7, 2025
1 check passed

12010486 deleted the lm_eval_regression_fix branch July 7, 2025 13:39

astachowiczhabana added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 8, 2025

Lm eval accuracy regression fix (huggingface#2105)

662a1b1

astachowiczhabana added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 8, 2025

Lm eval accuracy regression fix (huggingface#2105)

5b27411

astachowiczhabana added a commit that referenced this pull request Jul 8, 2025

Lm eval accuracy regression fix (#2105)

946bd74

astachowiczhabana added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 10, 2025

Lm eval accuracy regression fix (huggingface#2105)

8e5ab97

astachowiczhabana added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 11, 2025

Lm eval accuracy regression fix (huggingface#2105)

61be32b

gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025

Lm eval accuracy regression fix (huggingface#2105) (#353)

777918c

* Fix for accuracy regression
* Fix for utf-8 encoding issue

Co-authored-by: Silvia Colabrese <silvia.colabrese@intel.com>

astachowiczhabana merged 2 commits into huggingface:v1.19-release from 12010486:lm_eval_regression_fix