
Lm eval accuracy regression fix #2105

Merged
astachowiczhabana merged 2 commits into huggingface:v1.19-release from 12010486:lm_eval_regression_fix
Jul 7, 2025

Conversation

@12010486
Contributor

@12010486 12010486 commented Jul 3, 2025

What does this PR do?

Fixes a small accuracy regression (roughly -0.39%) seen in some fp8 runs of the lm-eval harness.

For example, first run the measurement step:

PT_HPU_LAZY_MODE=1 HF_DATASETS_TRUST_REMOTE_CODE=true QUANT_CONFIG=/root/optimum-habana/examples/text-generation/quantization_config/maxabs_measure.json TQDM_DISABLE=1 python3 run_lm_eval.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --warmup 0 --use_hpu_graphs -o test_results_measure.json --bf16 --batch_size 1 --use_kv_cache --trim_logits --attn_softmax_bf16 --bucket_size=128 --bucket_internal --trust_remote_code --tasks hellaswag lambada_openai piqa winogrande mathqa pubmedqa arc_easy arc_challenge

and then the quantized evaluation:

PT_HPU_LAZY_MODE=1 HF_DATASETS_TRUST_REMOTE_CODE=true QUANT_CONFIG=/root/optimum-habana/examples/text-generation/quantization_config/act_maxabs_pow2_weights_pcs_opt_pow2_quant.json TQDM_DISABLE=1 python3 run_lm_eval.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --warmup 0 --use_hpu_graphs -o test_results_quant.json --bf16 --batch_size 1 --use_kv_cache --trim_logits --attn_softmax_bf16 --bucket_size=128 --bucket_internal --trust_remote_code --tasks hellaswag lambada_openai piqa winogrande mathqa pubmedqa arc_easy arc_challenge --show_config

With this fix, the average accuracy across these tasks is 66.683; before the fix it had dropped to 66.422.
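As a quick sanity check on the numbers above, the sketch below computes the change between the two reported averages. It assumes the ~-0.39% figure is a relative change with respect to the fixed result, which is what the arithmetic suggests; the helper name `relative_change_pct` is illustrative only and is not part of optimum-habana or lm-eval.

```python
def relative_change_pct(reference: float, observed: float) -> float:
    """Return the change from `reference` to `observed` as a percentage of `reference`."""
    return (observed - reference) / reference * 100.0


# Average accuracies reported in this PR description.
acc_with_fix = 66.683      # after the fix
acc_before_fix = 66.422    # regressed value before the fix

# Absolute difference in accuracy points, and the same drop in relative terms.
absolute_drop = acc_before_fix - acc_with_fix
relative_drop = relative_change_pct(acc_with_fix, acc_before_fix)

print(f"absolute change: {absolute_drop:.3f} points")   # -0.261
print(f"relative change: {relative_drop:.2f}%")         # -0.39
```

The regression is therefore about 0.26 accuracy points, which corresponds to the ~0.39% quoted when expressed relative to the fixed average.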

@12010486 12010486 requested a review from regisss as a code owner July 3, 2025 15:25
@12010486 12010486 requested review from astachowiczhabana and removed request for regisss July 3, 2025 15:25
@astachowiczhabana astachowiczhabana merged commit e4ed6c6 into huggingface:v1.19-release Jul 7, 2025
1 check passed
@12010486 12010486 deleted the lm_eval_regression_fix branch July 7, 2025 13:39
astachowiczhabana added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 8, 2025
astachowiczhabana added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 8, 2025
astachowiczhabana added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 10, 2025
astachowiczhabana added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jul 11, 2025
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
* Fix for accuracy regression

* Fix for utf-8 encoding issue

Co-authored-by: Silvia Colabrese <silvia.colabrese@intel.com>
2 participants