fea(): WA for run_clm config imports #2232

Merged
regisss merged 1 commit into huggingface:main from imangohari1:ig/clm-import-fix on Aug 28, 2025

@imangohari1
Contributor

What does this PR do?

There is a bug in run_clm.py: instead of loading the local Optimum Habana (OH) configuration, it loads the upstream Hugging Face (HF) one here:

MODEL_CONFIG_CLASSES = list(MODEL_FOR_CAUSAL_LM_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

This PR is a workaround (WA) to fix it.
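
To make the failure mode concrete, here is a minimal sketch with stand-in classes. It is not the change made in this PR and it does not use transformers or optimum-habana: `UpstreamMistralConfig`, `LocalMistralConfig`, `UPSTREAM_MAPPING`, `LOCAL_OVERRIDES` and `resolve_config_class` are all hypothetical names. It only shows that anything derived from a mapping keyed by upstream classes yields upstream classes unless a local override is consulted first.

```python
# Illustrative stand-ins only: these classes and tables are hypothetical and
# do not come from transformers or optimum-habana.


class UpstreamMistralConfig:
    """Plays the role of the upstream HF configuration class."""

    model_type = "mistral"
    source = "upstream"


class LocalMistralConfig(UpstreamMistralConfig):
    """Plays the role of the local OH override for the same model type."""

    source = "local"


# A mapping keyed by upstream config classes, in the spirit of
# MODEL_FOR_CAUSAL_LM_MAPPING.
UPSTREAM_MAPPING = {UpstreamMistralConfig: "UpstreamMistralForCausalLM"}

# Same pattern as the two lines quoted above.
MODEL_CONFIG_CLASSES = list(UPSTREAM_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

# Everything derived from the upstream mapping points at upstream classes.
by_type = {conf.model_type: conf for conf in MODEL_CONFIG_CLASSES}

# Hypothetical override table that is consulted before the upstream mapping.
LOCAL_OVERRIDES = {"mistral": LocalMistralConfig}


def resolve_config_class(model_type):
    """Prefer a local override and fall back to the upstream class."""
    if model_type in LOCAL_OVERRIDES:
        return LOCAL_OVERRIDES[model_type]
    return by_type[model_type]


print(by_type["mistral"].source)               # class found via the upstream mapping
print(resolve_config_class("mistral").source)  # class found via the override table
```

In this toy setup the first lookup never reaches the local class, which mirrors the behaviour described above; how the real script resolves the class may differ.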

Reproducer

Apply this patch:

diff --git a/optimum/habana/transformers/models/mistral/configuration_mistral.py b/optimum/habana/transformers/models/mistral/configuration_mistral.py
index 9af0c897..e273509b 100644
--- a/optimum/habana/transformers/models/mistral/configuration_mistral.py
+++ b/optimum/habana/transformers/models/mistral/configuration_mistral.py
@@ -56,6 +56,7 @@ class MistralConfig(MistralConfig):
             **kwargs,
         )
 
+        breakpoint()
         self.rope_scaling = rope_scaling
 
         # Validate the correctness of rotary position embeddings parameters

and run the following command:

make test_installs; python3 examples/language-modeling/run_clm.py --model_name_or_path mistralai/Mistral-7B-v0.1 --gaudi_config_name Habana/gpt2 --dataset_name wikitext --do_train --output_dir /tmp/tmp3vtnetji --overwrite_output_dir --learning_rate 0.0002 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --num_train_epochs 2 --use_habana --throughput_warmup_steps 3 --save_strategy no --use_lazy_mode --do_eval --dataset_config_name wikitext-2-raw-v1 --use_hpu_graphs_for_inference

Results

  • with this PR: the code loads the local OH Mistral configuration and stops at the breakpoint (expected)
  • without this PR: the code loads the upstream HF Mistral configuration and never reaches the breakpoint (not expected)
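
As a non-interactive alternative to `breakpoint()`, one could check which package defines the class of the loaded config object. The helper below is a hypothetical sketch (`config_origin` is not part of run_clm.py or any library); it relies only on the standard `__module__` attribute of Python classes.

```python
def config_origin(obj):
    """Return the top-level package that defines the class of `obj`."""
    # type(obj).__module__ is the dotted module path of the class,
    # for example "package.subpackage.module"; keep only the first part.
    return type(obj).__module__.split(".")[0]
```

Assuming the config object is an instance of a class defined under `optimum/habana/...` (as in the patch above) or under the upstream `transformers` package, this would return `"optimum"` in the first case and `"transformers"` in the second.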

Notes:

  • I've checked the models where we have the configuration overwrite and cross-referenced them with HF CLM and MLM. We only have configuration overwrite for CLM models.
  • This issue has gone unnoticed because none of the changed config variables are used directly in the modeling_XX.py files.
  • There might be a better, more comprehensive solution to this, so this PR starts as a draft.

--
co-authored by Yaser Afshar @yafshar

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@imangohari1
Contributor Author

CC: @libinta
Hi @regisss,
Could you take a look at this and suggest the best way to handle it? The current PR works fine, but I would love to hear your input. Thank you.

imangohari1 referenced this pull request in imangohari1/optimum-habana Aug 27, 2025
… for run_clm. fea(gemma2/3): remoevd the configuration changes and simplied the swa implementation
@regisss
Collaborator

regisss commented Aug 28, 2025

I think the current change is fine

@imangohari1 imangohari1 marked this pull request as ready for review August 28, 2025 14:10
@imangohari1 imangohari1 requested a review from regisss as a code owner August 28, 2025 14:10
@imangohari1
Contributor Author

I think the current change is fine

@regisss thanks. I have marked this PR as ready for review.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@regisss regisss left a comment

LGTM

@regisss regisss merged commit 5eb8842 into huggingface:main Aug 28, 2025
2 of 4 checks passed
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
)

Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>