Enable mixtral 8x7b accuracy evaluation by rbogdano · Pull Request #1986 · huggingface/optimum-habana

rbogdano · 2025-05-19T14:25:48Z

This commit implements accuracy.json generation.
To generate inference responses for a custom dataset, run the run_generation script with two parameters: --dataset custom_dataset.pkl and --dataset_name custom.
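
As a rough illustration of the invocation described above, the sketch below assembles the command line in Python. Only the two flags `--dataset` and `--dataset_name` come from the description. The script path is the one listed among this PR's changed files, the helper name `build_generation_command` is hypothetical, and any model or device arguments a real run needs are deliberately left out.

```python
import shlex

# Script path as listed among the files changed in this PR.
SCRIPT = "examples/text-generation/run_generation.py"


def build_generation_command(dataset_path, dataset_name="custom"):
    """Hypothetical helper: return the argv list for a custom-dataset run.

    Only --dataset and --dataset_name are taken from the PR description;
    model and device arguments are intentionally omitted.
    """
    return [
        "python",
        SCRIPT,
        "--dataset", dataset_path,
        "--dataset_name", dataset_name,
    ]


cmd = build_generation_command("custom_dataset.pkl")
# Print a shell-safe rendering that can be inspected before running it.
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the remaining arguments for the target model have been appended.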

Additionally, you can set up an environment for MBXP dataset evaluation and evaluate the responses using the MLCommons scripts.

Results from evaluating the MLCommons dataset, which is a combination of OpenOrca, GSM8K, and MBXP:
{
'rouge1': 45.4708,
'rouge2': 23.2887,
'rougeL': 30.3478,
'rougeLsum': 42.4501,
'gsm8k': 74.16,
'mbxp': 60.36,
'gen_len': 4243067,
'gen_num': 15000,
'gen_tok_len': 2808861,
'tokens_per_sample': 187.3
}
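
A quick sanity check on the reported numbers, assuming `tokens_per_sample` is the mean of `gen_tok_len` over `gen_num` (the PR does not state the formula, but the figures are consistent with it). The helper name `mean_tokens_per_sample` is hypothetical.

```python
# Values copied from the results reported above.
results = {
    "gen_num": 15000,
    "gen_tok_len": 2808861,
    "tokens_per_sample": 187.3,
}


def mean_tokens_per_sample(gen_tok_len, gen_num):
    """Assumed definition: total generated tokens divided by sample count."""
    return round(gen_tok_len / gen_num, 1)


derived = mean_tokens_per_sample(results["gen_tok_len"], results["gen_num"])
# Compare the derived value with the one reported in the results.
print(derived, derived == results["tokens_per_sample"])
```

The unrounded ratio is about 187.257, which rounds to the reported 187.3 at one decimal place.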

vidyasiv

Thanks for your PR @rbogdano

Could you provide me with a custom pickle file for testing this PR?
Please also run make style on this PR and fix any issues (I see 2 at my end which need manual fixing).

vidyasiv · 2025-05-27T16:16:59Z

Contacted author offline, they're busy now and will respond in 1-2 weeks.

HuggingFaceDocBuilderDev · 2025-06-02T11:40:41Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vidyasiv

LGTM, thanks for the prompt updates @rbogdano

rbogdano · 2025-06-02T19:11:52Z

> LGTM, thanks for the prompt updates @rbogdano

Thanks for the really good review :)

astachowiczhabana · 2025-06-03T07:13:57Z

@regisss can we merge this to main/1.18? The code is separate and shouldn't disrupt current test results.

regisss

LGTM

Co-authored-by: Rafal <rbogdanowicz@habana.ai>

* Merge v1.18-release

* Hot fix regional compilation (huggingface#2005)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Enable mixtral 8x7b accuracy evaluation (huggingface#1986)

Co-authored-by: Rafal <rbogdanowicz@habana.ai>

* Update readme files for explicit lazy mode (huggingface#1921)

Co-authored-by: Karol Brejna <karol.brejna@intel.com>
Co-authored-by: Piotr Bielak <piotr.bielak@intel.com>

* [llama-vision] Remove token_idx_cpu parameter (huggingface#2018)

Integer parameter token_idx_cpu passed to mllama's forward() method caused an issue with hpu graph cache which led to performance drop.

Signed-off-by: Urszula <urszula.golowicz@intel.com>

* Update README examples (huggingface#2020)

* Fix examples in README

audio-classification:
- add space between "False" and backslash

image-to-text:
- add "datasets" to requirements.txt

pytorch-image-models:
- add "datasets" to requirements.txt

sentence-transformers-training/nli:
- add command to properly discover HABANA_VISIBLE_MODULES

sentence-transformers-training/sts:
- add command to properly discover HABANA_VISIBLE_MODULES

speech-recognition:
- add `--trust_remote_code` for seq2seq examples

stable-diffusion/training:
- add missing OpenCV requirement for ControlNet Training

Co-authored-by: Karol Brejna <karol.brejna@intel.com>

* Review fixes: remove grabbing all modules

---------

Co-authored-by: Karol Brejna <karol.brejna@intel.com>
Co-authored-by: karol-brejna-i <karolbrejna@apache.org>

* Pin latest optimum to force mutual updates (huggingface#2016)

pin latest optimum to force mutual updates

* Fix FP8 support and address related issues (huggingface#2010)

- Resolve bugs related to FP8 (floating point 8-bit) computation
- Improve stability and correctness of FP8 operations
- Add/fix tests to validate FP8 functionality
- Update relevant documentation and comments

Co-authored-by: IlyasMoutawwakil

---------

Signed-off-by: Urszula <urszula.golowicz@intel.com>
Co-authored-by: Adam Stachowicz <astachowicz@habana.ai>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Rafal Bogdanowicz <rafal.bogdanowicz@intel.com>
Co-authored-by: Rafal <rbogdanowicz@habana.ai>
Co-authored-by: Jan Kamiński <jkaminski@habana.ai>
Co-authored-by: Karol Brejna <karol.brejna@intel.com>
Co-authored-by: Piotr Bielak <piotr.bielak@intel.com>
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Piotr Bielak <pbielak@users.noreply.github.com>
Co-authored-by: karol-brejna-i <karolbrejna@apache.org>

Rafal added 2 commits May 19, 2025 17:16

implemented accuracy.json generation and added evaluation setup scripts

03e5b7e

added necessary information to readme

5206f63

rbogdano requested a review from regisss as a code owner May 19, 2025 14:25

kwisniewski98 approved these changes May 20, 2025

rbogdano mentioned this pull request May 21, 2025

enabling accuracy tests for mbxp gsm8k datasets (#178) #1840

Closed

vidyasiv suggested changes May 21, 2025

Comment thread examples/text-generation/README.md

Comment thread examples/text-generation/README.md Outdated

Comment thread examples/text-generation/README.md Outdated

Comment thread examples/text-generation/run_generation.py

readme improvement

d6b188f

vidyasiv suggested changes May 29, 2025

Comment thread examples/text-generation/run_generation.py

Comment thread examples/text-generation/README.md

vidyasiv reviewed May 30, 2025

Comment thread examples/text-generation/run_generation.py Outdated

Rafal added 2 commits June 2, 2025 13:13

updated run_generation.py script to new argument name mlcommons-dataset

de2f681

updated README with new argment name mlcommons-dataset

70bba7d

rbogdano requested a review from vidyasiv June 2, 2025 10:22

astachowiczhabana added the synapse 1.21 label Jun 2, 2025

astachowiczhabana requested changes Jun 2, 2025

Comment thread examples/text-generation/mbxp_evaluation/evaluation_setup/ubuntu.sh

vidyasiv suggested changes Jun 2, 2025

Rafal added 2 commits June 2, 2025 20:08

Review changes in README and run_generation.py

eb1a063

Lowered batch size

0269d13

vidyasiv approved these changes Jun 2, 2025

regisss approved these changes Jun 3, 2025

regisss merged commit 2d71a45 into huggingface:main Jun 3, 2025
4 checks passed

regisss pushed a commit that referenced this pull request Jun 3, 2025

Enable mixtral 8x7b accuracy evaluation (#1986)

5f2bb76

Co-authored-by: Rafal <rbogdanowicz@habana.ai>

astachowiczhabana pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jun 5, 2025

Enable mixtral 8x7b accuracy evaluation (huggingface#1986)

e9f2a76

Co-authored-by: Rafal <rbogdanowicz@habana.ai>

astachowiczhabana pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jun 10, 2025

Enable mixtral 8x7b accuracy evaluation (huggingface#1986)

80b25b4

Co-authored-by: Rafal <rbogdanowicz@habana.ai>

6 participants
