lm_eval to 0.4.9.1 and support for new args #2193
* Unify SetTrueOrFalseOrNone and StoreTrueFalseAction
* Fix style
Fix utils package
A previous PR [1] introduced a new package `optimum.habana.utils`, which created a conflict with the `optimum/habana/utils.py` file. This commit adds the missing `utils/__init__.py` file and moves the `utils.py` file into the `utils/` module (renamed as `misc.py`).
[1] huggingface#1926
Co-authored-by: Piotr Bielak <pbielak@habana.ai>
* Align VideoLlavaProcessor with Transformers v4.51.3
  The GaudiVideoLlavaProcessor has been removed from optimum-habana as its functionality is now fully aligned with the upstream Transformers implementation. No custom logic is required, and maintaining a separate class is redundant.
* Update GaudiVideoLlavaForConditionalGeneration
  This update aligns the GaudiVideoLlavaForConditionalGeneration implementation with the v4.51.3 transformers changes while retaining the `token_idx` argument for compatibility with Gaudi optimizations.
With Llama 4 support in Transformers 4.51, the `Pipeline` class changed [1] so that the pipeline device is set to `self.model.device`. In the case of Mllama, DeepSpeed is used to create the `.language_model` on HPU, whereas the rest of the model stays on CPU [2]. Hence `self.model.device` is always CPU, which causes the whole model to be placed back on CPU. This commit explicitly moves the model to HPU, so that the pipeline is also placed on HPU.
[1] https://github.com/huggingface/transformers/pull/37307/files#diff-441f558737166b045444da9c4be81f566b3d69054e8f20e288aed746a691fa61
[2] https://github.com/huggingface/optimum-habana/blob/v1.18.0/examples/image-to-text/run_pipeline.py#L360
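The device issue described in this commit message can be illustrated with a toy sketch. `ToyModel` and `ToyPipeline` are hypothetical stand-ins written for this illustration, not the real Transformers or optimum-habana classes, and devices are plain strings; the only behavior assumed is the one stated above, namely that the pipeline adopts whatever device the model reports.

```python
class ToyModel:
    """Hypothetical stand-in for a model whose top-level device is reported as CPU."""

    def __init__(self) -> None:
        # Only a sub-module lives on HPU in the scenario above, so the
        # model as a whole still reports "cpu".
        self.device = "cpu"

    def to(self, device: str) -> "ToyModel":
        # Explicitly move the whole model to the given device.
        self.device = device
        return self


class ToyPipeline:
    """Hypothetical stand-in for a pipeline that adopts the model's device."""

    def __init__(self, model: ToyModel) -> None:
        self.model = model
        self.device = model.device


model = ToyModel()
before = ToyPipeline(model).device  # pipeline follows the CPU-reporting model

model.to("hpu")  # the explicit move this commit describes
after = ToyPipeline(model).device   # pipeline now follows the moved model

print(before, after)
```

The point of the sketch is only the ordering: the explicit move has to happen before the pipeline is constructed, because the pipeline reads the device from the model at that moment.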
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Yuan Tian <tian.yuan@intel.com>
* Synapse 1.22 Optimum Habana 1.19
* Fix deepspeed version to 1.21.0
…itialization. (huggingface#2126)
* Move import to local scope in run_lm_eval, to allow prior env vars initialization.
* Address PR comment: Adding comment explaining delayed import.
Signed-off-by: Artur Kloniecki <aklonieckix@habana.ai>
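The delayed-import idea in this commit can be sketched in a self-contained way. Everything named here is hypothetical and exists only for the illustration: `demo_env_reader` is a throwaway module written to a temporary directory, and `DEMO_MODE` is a made-up environment variable. The sketch assumes the imported module reads its configuration from the environment once, at import time, which is why the import has to wait until the variable is set.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical module that reads an environment variable at import time.
MODULE_SOURCE = textwrap.dedent(
    """
    import os
    MODE = os.environ.get("DEMO_MODE", "default")
    """
)

# Write the throwaway module to a temporary directory and make it importable.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "demo_env_reader.py"), "w") as f:
    f.write(MODULE_SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()


def run(mode: str) -> str:
    """Configure the environment first, then import the module locally."""
    os.environ["DEMO_MODE"] = mode
    # Delayed import: placed in local scope so that it runs only after the
    # environment variable above has been initialized.
    import demo_env_reader

    return demo_env_reader.MODE


result = run("lazy")
print(result)
```

Had the import been at the top of the file, the module would have captured the environment before `run` had a chance to set it.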
…gingface#2145) Signed-off-by: Artur Kloniecki <aklonieckix@habana.ai>
…ngface#2136)
add param DATASET_CONFIG
add param DATASET_CONFIG to makefile
add param DATASET_CONFIG to makefile - part.2
add log msg
update path
update path - part.2
update path - part.3
update path - part.4
update path - part.5
Return empty args for unsupported examples
* Ported arctic instruct code
* Make style and resolve GenerationMixin warnings
* Fixed tokenization imports
* Updated requirements for Arctic Model
* Apply fix for ArcticRMSNorm from Llama
* Use customized rope
* Better try imports, unified RoPE implementation
* Fix typo
* Added mark step after decoder layers
* Using gaudi mixtral MOE impl
* Changed to gaudi repeat_kv and rope impls
* Add missing rope scaling to config
* Fix repeat_kv signature
* Revert "Using gaudi mixtral MOE impl" This reverts commit 9c390e7.
* Remove other attention impls
* Add initial KV cache support
* Add fixed moe from mixtral
* Integrate KV cache
* Updated docs
* Apply fixes from huggingface#1705
* Resolve GenerationMixin and LlamaRotaryEmbedding warnings
* Cleanup unused code
* make style
* Mirror rope usage in llama
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Daniel Huang <tianmu.li@intel.com>
With the merge of [1], the patching in `modeling_utils.py` triggers a series of imports that eventually tries to import `sentencepiece`. Hence, `sentencepiece` is now a required package.
[1] huggingface#1719
Examples of new command lines: Note:
[Add] apply_chat_template CLI option
Apologies, we decided to add even more args. I'm marking it as draft until all the args are in.
When #2197 is merged, the lm_eval generate tasks will also be able to run in eager mode.
No need; this one is limited to Lazy mode only. Ideally we will have both soon.
regisss left a comment
LGTM, is this ready to be merged?
@12010486 we won't be able to include this one in the OH 1.19.0 release (as it is happening "right now").
061905e to 13c4731
@karol-brejna-i I created a new PR after your advice to merge these changes on
CC: @regisss, @sungwook-son
Closing the PR due to the above reason.
Argument parsing and evaluation configuration:
* New arguments in `run_lm_eval.py` for controlling evaluation, including support for generation kwargs, few-shot and multi-turn settings, metadata, system instructions, chat template application, and sample selection. This allows for more granular and customizable evaluation runs.
* `try_parse_json`, to robustly handle JSON or string input for generation arguments.
Model adapter enhancements:
* Support for `softmax_dtype`, `think_end_token`, `enable_thinking`, and `chat_template_args` in `HabanaModelAdapter` initialization, enabling more advanced generation and prompt formatting features.
* Use of `max_new_tokens` for generation instead of `max_length`, aligning with HuggingFace's recommended API usage.
Dependency update:
* `lm-eval` package updated to version 0.4.9.1 to support new features and bug fixes.
General improvements:
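As an illustration of the "JSON or string" handling mentioned under argument parsing above, here is a minimal sketch. The PR's actual `try_parse_json` is not shown on this page, so the function below is a hypothetical version (named `try_parse_json_sketch` to make that explicit) and its behavior is an assumption: return the parsed value when the input is valid JSON, otherwise hand the original string back unchanged.

```python
import json
from typing import Any


def try_parse_json_sketch(value: Any) -> Any:
    """Hypothetical helper: parse `value` as JSON if possible, else return it as-is."""
    if not isinstance(value, str):
        # Already-parsed values (dict, None, ...) pass through untouched.
        return value
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        # Not valid JSON: keep the raw string so a later stage can
        # interpret it, e.g. as a comma-separated key=value list.
        return value


parsed = try_parse_json_sketch('{"temperature": 0.7, "do_sample": true}')
raw = try_parse_json_sketch("temperature=0.7,do_sample=True")
print(parsed)
print(raw)
```

Falling back to the raw string rather than raising keeps both input styles usable from the command line, at the cost of deferring error reporting for malformed JSON to whatever consumes the string.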