fea(): WA for run_clm config imports by imangohari1 · Pull Request #2232 · huggingface/optimum-habana

imangohari1 · 2025-08-27T22:05:16Z

What does this PR do?

There is a bug in run_clm.py: instead of loading the local Optimum Habana (OH) configuration, it loads the upstream Hugging Face (HF) one, here:

MODEL_CONFIG_CLASSES = list(MODEL_FOR_CAUSAL_LM_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

This PR adds a workaround (WA) to fix it.
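The failure mode of the two lines above can be illustrated in plain Python. This is a hypothetical sketch, not the PR's actual change. `UpstreamMistralConfig`, `LocalMistralConfig`, `CAUSAL_LM_MAPPING`, `LOCAL_OVERRIDES`, `build_config` and `with_local_overrides` are all illustrative names. The sketch assumes only that the mapping is keyed by the upstream config classes. Any lookup that walks those keys instantiates the upstream class, even when a local subclass with the same `model_type` exists. Swapping in the local subclass by `model_type` before resolving a config is one possible remedy.

```python
# Hypothetical stand-ins for the upstream (HF) and local (OH) Mistral configs.
class UpstreamMistralConfig:
    model_type = "mistral"

    def __init__(self, **kwargs):
        self.source = "upstream"


class LocalMistralConfig(UpstreamMistralConfig):
    """Local subclass that extends the upstream config, as OH does."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.source = "local"


# Like MODEL_FOR_CAUSAL_LM_MAPPING, this mapping is keyed by the upstream class.
CAUSAL_LM_MAPPING = {UpstreamMistralConfig: "MistralForCausalLM"}

MODEL_CONFIG_CLASSES = list(CAUSAL_LM_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

# Hypothetical registry of local config overrides, keyed by model_type.
LOCAL_OVERRIDES = {"mistral": LocalMistralConfig}


def build_config(model_type, config_classes):
    """Instantiate the first config class whose model_type matches."""
    for conf in config_classes:
        if conf.model_type == model_type:
            return conf()
    raise ValueError(f"unknown model_type: {model_type}")


def with_local_overrides(config_classes, overrides):
    """Replace each upstream class with its local subclass when one is registered."""
    return [overrides.get(conf.model_type, conf) for conf in config_classes]


if __name__ == "__main__":
    print(MODEL_TYPES)  # ('mistral',)
    print(build_config("mistral", MODEL_CONFIG_CLASSES).source)  # upstream
    patched = with_local_overrides(MODEL_CONFIG_CLASSES, LOCAL_OVERRIDES)
    print(build_config("mistral", patched).source)  # local
```

Because the local class subclasses the upstream one, code that only checks `isinstance(config, UpstreamMistralConfig)` keeps working after the swap.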

Reproducer

Apply this patch:

diff --git a/optimum/habana/transformers/models/mistral/configuration_mistral.py b/optimum/habana/transformers/models/mistral/configuration_mistral.py
index 9af0c897..e273509b 100644
--- a/optimum/habana/transformers/models/mistral/configuration_mistral.py
+++ b/optimum/habana/transformers/models/mistral/configuration_mistral.py
@@ -56,6 +56,7 @@ class MistralConfig(MistralConfig):
             **kwargs,
         )
 
+        breakpoint()
         self.rope_scaling = rope_scaling
 
         # Validate the correctness of rotary position embeddings parameters

and run the command:

make test_installs; python3 examples/language-modeling/run_clm.py --model_name_or_path mistralai/Mistral-7B-v0.1 --gaudi_config_name Habana/gpt2 --dataset_name wikitext --do_train --output_dir /tmp/tmp3vtnetji --overwrite_output_dir --learning_rate 0.0002 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --num_train_epochs 2 --use_habana --throughput_warmup_steps 3 --save_strategy no --use_lazy_mode --do_eval --dataset_config_name wikitext-2-raw-v1 --use_hpu_graphs_for_inference

Results

With this PR: the code loads the local OH Mistral configuration and stops at the breakpoint (expected).
Without this PR: the code loads the upstream HF Mistral configuration and never reaches the breakpoint (not expected).
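The temporary `breakpoint()` can also be replaced by a non-interactive check on where the loaded config's class is defined. This is a minimal sketch under one assumption: the local classes live under the `optimum.habana.` package, as the patched file path suggests. `is_local_config` is a hypothetical helper. The two classes are stand-ins that set `__module__` explicitly, so the snippet runs without transformers installed. In a real run, the object checked would be the config that run_clm.py builds.

```python
# Stand-in for the upstream class; pretend it lives in the transformers package.
class UpstreamConfig:
    __module__ = "transformers.models.mistral.configuration_mistral"


# Stand-in for the local OH subclass, defined in the file patched above.
class LocalConfig(UpstreamConfig):
    __module__ = "optimum.habana.transformers.models.mistral.configuration_mistral"


def is_local_config(config, local_prefix="optimum.habana."):
    """Return True if the config's class is defined under the local OH package."""
    return type(config).__module__.startswith(local_prefix)


if __name__ == "__main__":
    print(is_local_config(LocalConfig()))  # True
    print(is_local_config(UpstreamConfig()))  # False
```

Checking the defining module rather than using `isinstance` matters here, because the local class subclasses the upstream one. An upstream-only check would pass for both.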

Notes:

I've checked the models that have a configuration override and cross-referenced them with the HF CLM and MLM model mappings. We only have configuration overrides for CLM models.
This issue went unnoticed because none of the config variable changes are used directly in the modeling_XX.py files.
There might be a better, more comprehensive solution to this, so this PR starts as a draft.

--
Co-authored by Yaser Afshar (@yafshar)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

imangohari1 · 2025-08-27T22:08:12Z

CC: @libinta
Hi @regisss,
can you take a look at this and suggest the best way to handle it? The current PR works fine, but I would love to hear your input. Thank you.


regisss · 2025-08-28T07:42:27Z

I think the current change is fine

imangohari1 · 2025-08-28T14:10:55Z

I think the current change is fine

@regisss Thanks. I've marked this PR as ready for review.

HuggingFaceDocBuilderDev · 2025-08-28T14:32:09Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss

LGTM


fea(): WA for run_clm config imports

c7eb851

imangohari1 referenced this pull request in imangohari1/optimum-habana Aug 27, 2025

fea(swa): splitted the attention_mask for swa and none-swa. fea(): WA for run_clm. fea(gemma2/3): remoevd the configuration changes and simplied the swa implementation

50a9d71

imangohari1 marked this pull request as ready for review August 28, 2025 14:10

imangohari1 requested a review from regisss as a code owner August 28, 2025 14:10

regisss approved these changes Aug 28, 2025


regisss merged commit 5eb8842 into huggingface:main Aug 28, 2025
2 of 4 checks passed

astachowiczhabana pushed a commit that referenced this pull request Aug 29, 2025

fea(): WA for run_clm config imports (#2232)

73f095b

astachowiczhabana pushed a commit that referenced this pull request Sep 2, 2025

fea(): WA for run_clm config imports (#2232)

c34a34b

astachowiczhabana pushed a commit that referenced this pull request Sep 17, 2025

fea(): WA for run_clm config imports (#2232)

c1a3e98

gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025

fea(): WA for run_clm config imports (huggingface#2232) (huggingface#636)

042a99c

Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>

regisss merged 1 commit into huggingface:main from imangohari1:ig/clm-import-fix