Untangle config inheritance#41541
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Oh no, I caused 100+ merge conflicts with another PR 🙃
pad_token_id: Optional[int] = None,
bos_token_id: Optional[int] = None,
The modular file shows that these were supposed to be deleted with `del bos_token_id`, but the conversion script could not handle it correctly.
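The pattern being discussed can be sketched as follows. This is an illustrative reconstruction, not the actual transformers classes: a subclass config drops an inherited token-id attribute with `del`, which a code-generation step then has to translate faithfully.

```python
# Illustrative sketch (hypothetical class names): a base config sets token-id
# attributes, and a derived config expresses "this model has no BOS token"
# by deleting the attribute in __init__.

class BaseTextConfig:
    def __init__(self, pad_token_id=None, bos_token_id=None, **kwargs):
        self.pad_token_id = pad_token_id
        self.bos_token_id = bos_token_id


class VisionOnlyConfig(BaseTextConfig):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # The delete must survive conversion; if it is dropped, the attribute
        # silently reappears on the derived config.
        del self.bos_token_id


cfg = VisionOnlyConfig()
assert not hasattr(cfg, "bos_token_id")
assert hasattr(cfg, "pad_token_id")
```

If the conversion step loses the `del`, the derived config keeps an attribute it was never meant to have, which matches the reviewer's observation above.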
Full CI didn't show any test errors, so I will be wrapping this up today or tomorrow.
[For maintainers] Suggested jobs to run (before merge) run-slow: afmoe, aimv2, albert, align, altclip, apertus, arcee, aria, audioflamingo3, auto, bamba, bark, bart, bert, bert_generation, big_bird |
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41541&sha=0e9d3d |
Thanks, now I will go back to the hub validation and making configs a `dataclass`.
* remove from base
* delete
* fetcher fix
* missing values
* update
* is decoder missing
* forgot to add
* add special tokens with default `None` in text models
* fsmt has unused subconfig, fix it!
* update
* fix
* add missing token id defaults
* fix more tests
* tie_word_embeddings
* tiny fixes
* more test fixes
* fix docstrings
* fix copies
* fix style?
* fix copied again
* fix copies
* fix examples
* delete left over print stmt
* splitnter
* this defi will fix a bunch decoder-only models
* make style
* fix copies
* not all models are supposed to have an attr for `tie_word_embeddings`!
* comment out
* fix
* more fixes
* fix copies
* docstring and non-model tests
* update
* fix repo consistency
* style
* fix
* revert
* remove unused attr
* fix repo
* fix test
* fix a few tests, more tests
* fix gemma & llava
* style
* gemma3n also
* new models as well
* skip the test

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
PR huggingface#41541 refactored `tie_word_embeddings` handling (among other things), which subtly broke the detection of T5 v1.1 vs. the original T5. As a consequence, decoder output scaling was always applied, regardless of T5 version. This is resolved by using the correct value for `tie_word_embeddings`.

**Testing:** This was not covered by the tests, since the tests instantiate the config once and then modify attributes on it. This is problematic because all the decision logic happens in `T5Config.__init__`. This was addressed by adding a dedicated `get_config_v1_1` method that initializes the config as if it came from a v1.1 model (e.g., flan-t5).
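The detection logic and the test fix described above can be sketched like this. This is a hedged, self-contained reconstruction under the stated assumption that original T5 ties word embeddings and scales decoder outputs by `d_model ** -0.5`, while v1.1 does not; the class and helper names are illustrative, not the actual transformers code.

```python
# Illustrative sketch of T5-version-dependent decoder output scaling.
# T5LikeConfig and scale_decoder_output are hypothetical stand-ins.

class T5LikeConfig:
    def __init__(self, d_model=512, tie_word_embeddings=True):
        self.d_model = d_model
        self.tie_word_embeddings = tie_word_embeddings


def scale_decoder_output(hidden, config):
    # The scaling decision must key off the actual config value: if a
    # refactor stops propagating tie_word_embeddings, every checkpoint is
    # treated as original T5 and scaled unconditionally -- the bug above.
    if config.tie_word_embeddings:
        return hidden * (config.d_model ** -0.5)
    return hidden


def get_config_v1_1():
    # Mirrors the dedicated test helper mentioned above: construct the config
    # as a v1.1 checkpoint would, instead of mutating attributes after
    # construction (which bypasses decisions made inside __init__).
    return T5LikeConfig(tie_word_embeddings=False)


assert scale_decoder_output(1.0, T5LikeConfig()) == 512 ** -0.5
assert scale_decoder_output(1.0, get_config_v1_1()) == 1.0
```

Building the config through a dedicated constructor-path helper is what lets the test exercise the `__init__`-time logic rather than a post-hoc attribute override.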
## Summary

Fix multiple failing monkey_patch tests for transformers v5. The tests failed because of huggingface/transformers#41541. These changes are backward compatible.

This change fixes the following failing tests:

```
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe_for_conditional_generation - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe_text - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_llama4_for_conditional_generation - AttributeError: 'Llama4Config' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_glm4v - AttributeError: 'Glm4vTextConfig' object has no attribute 'pad_token_id'
```

Fixes #1059.

## Testing Done

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
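A backward-compatible way to survive the `AttributeError`s above is to read the token id defensively, since post-#41541 text sub-configs may simply not define the attribute. The sketch below is illustrative, not the actual liger-kernel implementation.

```python
# Illustrative sketch of a backward-compatible config read (hypothetical
# helper name): works whether or not the config class still carries
# pad_token_id as an attribute.

def get_pad_token_id(config):
    # Pre-v5 configs: attribute always present (possibly None).
    # v5 configs: attribute may be absent entirely; fall back to None.
    return getattr(config, "pad_token_id", None)


class OldStyleConfig:          # attribute present, as before the refactor
    pad_token_id = 0


class NewStyleConfig:          # no pad_token_id attribute at all
    pass


assert get_pad_token_id(OldStyleConfig()) == 0
assert get_pad_token_id(NewStyleConfig()) is None
```

Both code paths reduce to the same `None` default, which is why the change can remain backward compatible.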
* Fix T5 v1.1 detection

PR #41541 refactored `tie_word_embeddings` handling (among other things), which subtly broke the detection of T5 v1.1 vs. the original T5. As a consequence, decoder output scaling was always applied, regardless of T5 version. This is resolved by using the correct value for `tie_word_embeddings`.

**Testing:** This was not covered by the tests, since the tests instantiate the config once and then modify attributes on it. This is problematic because all the decision logic happens in `T5Config.__init__`. This was addressed by adding a dedicated `get_config_v1_1` method that initializes the config as if it came from a v1.1 model (e.g., flan-t5).

* Make repo consistent
* Make repo consistent
* mt5 isn't copied from t5 anymore

---------

Co-authored-by: nemo <git@ningu.net>
Co-authored-by: raushan <raushan@huggingface.co>
What does this PR do?
As per the title, this deletes base config attributes that are not actually universal across all models (e.g., token ids are used only by text models, and weight tying is possible only if the model has embedding layers).
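The motivation can be illustrated with a minimal before/after sketch, assuming a simplified config hierarchy (the class names are illustrative, not the actual transformers hierarchy): when token ids live on the universal base, even a vision-only model exposes meaningless pad/bos ids; after the untangling, only text configs declare them.

```python
# Illustrative before/after of moving token ids off the universal base config.

class PretrainedConfigBefore:
    def __init__(self, **kwargs):
        # Before: every model inherited these, text or not.
        self.pad_token_id = kwargs.pop("pad_token_id", None)
        self.bos_token_id = kwargs.pop("bos_token_id", None)


class PretrainedConfigAfter:
    def __init__(self, **kwargs):
        pass  # After: no token ids on the base.


class TextConfigAfter(PretrainedConfigAfter):
    def __init__(self, pad_token_id=None, bos_token_id=None, **kwargs):
        super().__init__(**kwargs)
        # Text models opt in, with a default of None.
        self.pad_token_id = pad_token_id
        self.bos_token_id = bos_token_id


assert hasattr(PretrainedConfigBefore(), "pad_token_id")      # spurious on vision models
assert not hasattr(PretrainedConfigAfter(), "pad_token_id")   # cleaned up
assert TextConfigAfter().pad_token_id is None                 # still available for text
```

This is also why downstream code that reads `pad_token_id` off arbitrary configs needs a defensive fallback after the change.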
After this PR, we need to clean up generation-related params from config classes (#41695), and then we can easily turn config classes into dataclasses. Using `dataclass` is one of the requirements for the HF Hub type validation I am currently working on.
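A minimal sketch of that end state, assuming illustrative field names: once every attribute on a config is genuinely applicable to the model, the config can become a plain `dataclass`, whose typed, declared fields are exactly what hub-side type validation can check.

```python
# Illustrative dataclass-style config (hypothetical fields, not the actual
# transformers schema): every field is typed, declared, and defaulted.

from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TextModelConfig:
    hidden_size: int = 768
    num_layers: int = 12
    pad_token_id: Optional[int] = None  # text models opt in explicitly


cfg = TextModelConfig(hidden_size=1024)
assert asdict(cfg) == {
    "hidden_size": 1024,
    "num_layers": 12,
    "pad_token_id": None,
}
```

Untangled inheritance is a precondition for this: a `dataclass` cannot cleanly express "inherit this field but then delete it".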