forked from huggingface/transformers
IFU-master-2022-05-05 #11
Merged
Conversation
If `global_attention_mask` is found in the model's inputs (it is used by certain models, like LED), the `prediction_step` method of `Seq2SeqTrainer` adds it to the `gen_kwargs` that are passed to `model.generate()`. This allows the global attention to be set properly when generating predictions.
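A minimal sketch of the idea, assuming the usual `prediction_step` shape; the names mirror `Seq2SeqTrainer` but this is illustrative rather than the exact upstream diff:

```python
# Illustrative sketch: inside Seq2SeqTrainer.prediction_step, forward the
# global_attention_mask used by LED-style models into the generation kwargs.
def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None):
    gen_kwargs = {
        "max_length": self.model.config.max_length,
        "num_beams": self.model.config.num_beams,
    }
    # Pass the global attention pattern through to generation if the model uses it.
    if "global_attention_mask" in inputs:
        gen_kwargs["global_attention_mask"] = inputs["global_attention_mask"]
    generated_tokens = self.model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **gen_kwargs,
    )
    ...
```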
* [benchmark tool] trainer-benchmark.py
* improve
* massive rework/expansion
* fix
* mucho improved
* improved
* fix prefix
* fix
* fix diff calculation
* address suggestions
* 📝 add image/vision classification and asr
* 🖍 minor formatting fixes
* Fixed a typo in legacy seq2seq_trainer.py (huggingface#16531)
* Add ONNX export for BeiT (huggingface#16498)
* Add beit onnx conversion support
* Updated docs
* Added cross reference to ViT ONNX config
* call on_train_end when trial is pruned (huggingface#16536)
* Type hints added (huggingface#16529)
* Fix Bart type hints (huggingface#16297)
* Add type hints to PLBart PyTorch
* Remove pending merge conflicts
* Fix PLBart Type Hints
* Add changes from review
* Add VisualBert type hints (huggingface#16544)
* Adding missing type hints for mBART model (PyTorch) (huggingface#16429)
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent models coherent Co-authored-by: matt <[email protected]>
* Remove MBart subclass of XLMRoberta in tokenizer docs (huggingface#16546)
* Remove MBart subclass of XLMRoberta in tokenizer
* Fix style
* Copy docs from MBart50 tokenizer
* Use random_attention_mask for TF tests (huggingface#16517)
* use random_attention_mask for TF tests
* Fix for TFCLIP test (for now). Co-authored-by: ydshieh <[email protected]>
* Improve code example (huggingface#16450) Co-authored-by: Niels Rogge <[email protected]>
* Pin tokenizers version <0.13 (huggingface#16539)
* Pin tokenizers version <0.13
* Style
* Add code samples for TF speech models (huggingface#16494) Co-authored-by: ydshieh <[email protected]>
* [FlaxSpeechEncoderDecoder] Fix dtype bug (huggingface#16581)
* [FlaxSpeechEncoderDecoder] Fix dtype bug
* more fixes
* Making the impossible to connect error actually report the right URL. (huggingface#16446)
* Fix flax import in __init__.py: modeling_xglm -> modeling_flax_xglm (huggingface#16556)
* Add utility to find model labels (huggingface#16526)
* Add utility to find model labels
* Use it in the Trainer
* Update src/transformers/utils/generic.py Co-authored-by: Matt <[email protected]>
* Quality Co-authored-by: Matt <[email protected]>
* Enable doc in Spanish (huggingface#16518)
* Reorganize doc for multilingual support
* Fix style
* Style
* Toc trees
* Adapt templates
* Add use_auth to load_datasets for private datasets to PT and TF examples (huggingface#16521)
* fix formatting and remove use_auth
* Add use_auth_token to Flax examples
* add a test checking the format of `convert_tokens_to_string`'s output (huggingface#16540)
* add new tests
* add comment to overridden tests
* TF: Finalize `unpack_inputs`-related changes (huggingface#16499)
* Add unpack_inputs to remaining models
* removed kwargs to `call()` in TF models
* fix TF T5 tests
* [SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output (huggingface#16586)
* initialize the default rank set on TrainerState (huggingface#16530)
* initialize the default rank set on TrainerState
* fix style
* Trigger doc build
* Fix CI: test_inference_for_pretraining in ViTMAEModelTest (huggingface#16591) Co-authored-by: ydshieh <[email protected]>
* add a template to add missing tokenization test (huggingface#16553)
* add a template to add missing tokenization test
* add cookiecutter setting
* improve doc
* Update templates/adding_a_missing_tokenization_test/README.md Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]>
* made _load_pretrained_model_low_mem static + bug fix (huggingface#16548)
* handle torch_dtype in low cpu mem usage (huggingface#16580)
* [Doctests] Correct filenaming (huggingface#16599)
* [Doctests] Correct filenaming
* improve quicktour
* make style
* Adding new train_step logic to make things less confusing for users (huggingface#15994)
* Adding new train_step logic to make things less confusing for users
* DO NOT ASK WHY WE NEED THAT SUBCLASS
* Metrics now working, at least for single-output models with type annotations!
* Updates and TODOs for the new train_step
* Make fixup
* Temporary test workaround until T5 has types
* Temporary test workaround until T5 has types
* I think this actually works! Needs a lot of tests though
* Make style/quality
* Revert changes to T5 tests
* Deleting the aforementioned unmentionable subclass
* Deleting the aforementioned unmentionable subclass
* Adding a Keras API test
* Style fixes
* Removing unneeded TODO and comments
* Update test_step too
* Stop trying to compute metrics with the dummy_loss, patch up test
* Make style
* make fixup
* Docstring cleanup
* make fixup
* make fixup
* Stop expanding 1D input tensors when using dummy loss
* Adjust T5 test given the new compile()
* make fixup
* Skipping test for convnext
* Removing old T5-specific Keras test now that we have a common one
* make fixup
* make fixup
* Only skip convnext test on CPU
* Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <[email protected]>
* Avoiding TF import issues
* make fixup
* Update compile() to support TF 2.3
* Skipping model.fit() on template classes for now
* Skipping model.fit() on template class tests for now
* Replace ad-hoc solution with find_labels
* make fixup Co-authored-by: Sylvain Gugger <[email protected]>
* Adding missing type hints for BigBird model (huggingface#16555)
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent models coherent
* Type hints for BigBird
* removing typos Co-authored-by: matt <[email protected]>
* [deepspeed] fix typo, adjust config name (huggingface#16597)
* 🖍 apply feedback
Co-authored-by: Cathy <[email protected]> Co-authored-by: Jim Rohrer <[email protected]> Co-authored-by: Ferdinand Schlatt <[email protected]> Co-authored-by: Dahlbomii <[email protected]> Co-authored-by: Gunjan Chhablani <[email protected]> Co-authored-by: Rishav Chandra Varma <[email protected]> Co-authored-by: matt <[email protected]> Co-authored-by: Yih-Dar <[email protected]> Co-authored-by: ydshieh <[email protected]> Co-authored-by: NielsRogge <[email protected]> Co-authored-by: Niels Rogge <[email protected]> Co-authored-by: Lysandre Debut <[email protected]> Co-authored-by: Patrick von Platen <[email protected]> Co-authored-by: Nicolas Patry <[email protected]> Co-authored-by: Daniel Stancl <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Matt <[email protected]> Co-authored-by: Karim Foda <[email protected]> Co-authored-by: SaulLu <[email protected]> Co-authored-by: Joao Gante <[email protected]> Co-authored-by: Sanchit Gandhi <[email protected]> Co-authored-by: Andres Codas <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Francesco Saverio Zuppichini <[email protected]> Co-authored-by: Suraj Patil <[email protected]> Co-authored-by: Stas Bekman <[email protected]>
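Among the commits above is the new `find_labels` utility; a quick usage sketch (the helper lives in `src/transformers/utils/generic.py` per the commit, and the exact return value shown is an assumption):

```python
from transformers import BertForSequenceClassification
from transformers.utils.generic import find_labels

# find_labels inspects the model class's forward signature and returns the
# argument names the Trainer should treat as labels.
print(find_labels(BertForSequenceClassification))  # expected: ["labels"]
```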
* Completed documentation of CTRL
* Missing optional None
* Added return types
* updated imports
* Update modeling_ctrl.py
* fix bart and mbart
* add ckpt names as variables
* fix mbart
* fix plbart
* use variable for ckpt name
…16609)
* Use CLIP model's config for some fields (if specified) instead of those of vision & text components. Co-authored-by: ydshieh <[email protected]>
* [Speech2Text Doc] Fix docs
* apply ydshieh's suggestions
Co-authored-by: ydshieh <[email protected]>
This reverts commit b1a7dfe.
* refactor TF beam search
* refactored generate can now properly use attention masks
* add force bos/eos logit processors
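The force-BOS/EOS processors mentioned above compose like any other logits processors; a small sketch with placeholder token ids:

```python
from transformers import (
    ForcedBOSTokenLogitsProcessor,
    ForcedEOSTokenLogitsProcessor,
    LogitsProcessorList,
)

# Token ids below are placeholders; use the ids from your model's config.
processors = LogitsProcessorList([
    ForcedBOSTokenLogitsProcessor(bos_token_id=0),                 # force BOS as the first generated token
    ForcedEOSTokenLogitsProcessor(max_length=20, eos_token_id=2),  # force EOS once max_length is reached
])
```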
* Update modeling_mpnet.py
* Update modeling_ctrl.py
* formatting
* Formatting
* Formatting
* annotated FSMT
* Added annotations for LED
* Added Annotations for M2M
* Added annotations for nystromformer
* Added annotations for OpenAI
* Added annotations for RAG
* Removed unused imports
* fix isort errors
* Removed inputs_embeds docstring, corrected original
* flake8 fixes
* doc-builder fixes
…gface#16617)
Adds logging and save/loading to the Accelerate scripts
Co-authored-by: Sylvain Gugger <[email protected]>
* Fix doc
* Make fixup
Co-authored-by: Niels Rogge <[email protected]>
* Add inputs vector to calculate metric method
* Include inputs for evaluation metrics with backwards compatibility
* Prevent inputs from creating OOM issues; documentation details
* Update style and code documentation
* Fix style formatting issues
* Update files format with make style
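In released transformers this surfaced as an opt-in training argument so that inputs are only gathered when a metric actually needs them (addressing the OOM concern above); a sketch, assuming the `include_inputs_for_metrics` flag name:

```python
from transformers import TrainingArguments

def compute_metrics(eval_pred):
    # eval_pred.inputs is only populated when include_inputs_for_metrics=True.
    predictions, labels, inputs = eval_pred.predictions, eval_pred.label_ids, eval_pred.inputs
    ...

args = TrainingArguments(
    output_dir="out",
    include_inputs_for_metrics=True,  # opt-in, so inputs are not gathered by default
)
```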
…ate_dict (huggingface#16643)
* Updated _load_pretrained_model_low_mem to check if keys are in the stored state_dict
* update after conversions
* Update README.md Support Image
Updates the Support image linking to our EAP page (to give it a refresh + help avoid image fatigue). Slack thread checking in with #open-source-internal on this update (https://huggingface.slack.com/archives/C021H1P1HKR/p1648838903316709)
* Compressed Updated Support image
* Improves Support Image Logo + Height
Updated the image based on logo + size feedback. Big thanks to Bibi for making quick edits to this image.
* base model done
* make style
* done
* added files
* Apply suggestions from code review Co-authored-by: NielsRogge <[email protected]>
* Apply suggestions from code review Co-authored-by: Sylvain Gugger <[email protected]>
* Trigger doc build
* resolved conversations
* resolved conversations
* seer models
* minor changes
* minor changes
* make fixup
* glob variables
* minor changes
* fix copies
* config when possible
* resolved conflicts
* resolved conflicts
* resolved conflicts
* CI
* conversion script for 10b param
* fixed for 10b model
* minor updates in the doc + make style
* removed unused code
* Apply suggestions from code review Co-authored-by: NielsRogge <[email protected]>
* removed unused code
* removed unused code
* updated modeling_utils from main
Co-authored-by: NielsRogge <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]>
* Skip RoFormer ONNX test if rjieba not installed
* Update deps table
* Skip RoFormer serialization test
* Fix RoFormer vocab
* Add rjieba to CircleCI
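The skip is the standard optional-dependency guard; a generic sketch of the pattern (not the exact decorator transformers uses):

```python
import importlib.util
import unittest

def require_rjieba(test_case):
    # Skip the decorated test when the optional rjieba dependency is missing.
    return unittest.skipUnless(
        importlib.util.find_spec("rjieba") is not None, "test requires rjieba"
    )(test_case)

class RoFormerOnnxTest(unittest.TestCase):
    @require_rjieba
    def test_export(self):
        ...
```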
* Add masked image modelling to task mapping
* Refactor ONNX features to be listed alphabetically
* Add warning about BEiT masked image modeling
Co-authored-by: Sylvain Gugger <[email protected]>
…ingface#17063)
* Make sure telemetry arguments are not returned as unused kwargs
* Fix test
* add utilities till TFData2VecVisionLayer.
* chore: pass window_size to attention layer.
* feat: add TFData2VecVisionRelativePositionBias.
* feat: initial implementation ready for tf data2vec.
* fix: relative position bias index, table to be fixed.
* chore: implementation added, tests remaining.
* add: tests, other PR files.
* fix: code quality.
* fix: import structure in init.
* chore: run make fix-copies.
* chore: address PR feedback (round I).
* chore: styling nit.
* fix: tests due to removal of to_2tuple().
* chore: rebase with upstream main and move the test.
* Update src/transformers/models/auto/modeling_tf_auto.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/auto/modeling_tf_auto.py Co-authored-by: Sylvain Gugger <[email protected]>
* fix: layer call.
* chore: remove from_pt=True and rerun test.
* chore: remove cast and tf.divide.
* chore: minor edits to the test script.
* Update src/transformers/models/data2vec/modeling_tf_data2vec_vision.py Co-authored-by: Matt <[email protected]>
* fix: expand() on TF tensors with broadcast_to().
* fix: test import.
Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Matt <[email protected]>
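One fix in that list swaps PyTorch-style `expand()` for `tf.broadcast_to()`, since TF tensors have no `expand` method; a tiny illustration:

```python
import tensorflow as tf

# tf.broadcast_to expands a tensor to a target shape, the TF counterpart of
# torch.Tensor.expand.
rel_bias = tf.constant([[1.0], [2.0]])        # shape (2, 1)
expanded = tf.broadcast_to(rel_bias, (2, 3))  # shape (2, 3)
```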
…#16635)
Bumps [notebook](http://jupyter.org) from 6.4.1 to 6.4.10.
---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ert (huggingface#16634)
Bumps [notebook](http://jupyter.org) from 6.4.1 to 6.4.10.
---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Type hint complete Albert model file.
* Update typing.
* Update src/transformers/models/albert/modeling_albert.py Co-authored-by: Matt <[email protected]>
* Deprecate model templates
* Address review comments
…ce#16886)
* CLIP Serving
* Add type hints per code review
* Use black, flake8, and isort
* Update src/transformers/models/clip/modeling_tf_clip.py Co-authored-by: Joao Gante <[email protected]>
* Rollback serving_output and add TODO
* Remove irrelevant portions of failing tests
* Revert "Rollback serving_output and add TODO" This reverts commit a4abfa6ba3b7875a13538dbc2ddc4eb17dfcca8d.
* Rollback to original test/serving_output
* Fix unused var
* Apply suggestions from code review
* Update formatting with black
* Fix style again from rebase
* Update tests/models/clip/test_modeling_tf_clip.py Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: Joao Gante <[email protected]> Co-authored-by: Sean Moriarity <[email protected]> Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>
* Added Spanish translation of autoclass_tutorial. Added 'local' and 'title' fields for autoclass_tutorial.
* Fixed autoclass_tutorial title in _toctree.yml and autoclass_tutorial.mdx
* type hints for pytorch models
* fixed import error
* fixed some errors
Added type hints for the BertGenerationEncoder and BertGenerationDecoder classes.
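For flavor, the general shape these type-hint commits add to a `forward` signature (illustrative only, not the exact upstream annotations):

```python
from typing import Optional, Tuple, Union

import torch
from transformers.modeling_outputs import BaseModelOutputWithPastAndCrossAttentions

# Excerpted from a model class; shown standalone for brevity.
def forward(
    self,
    input_ids: Optional[torch.Tensor] = None,
    attention_mask: Optional[torch.Tensor] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
    ...
```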
…gface#17091)
* Fix use of mlflow.active_run() and add proper support for MLFLOW_EXPERIMENT_NAME
* Fix code style (make style)
Co-authored-by: ydshieh <[email protected]>
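The gist of the MLflow fix, sketched against the public mlflow API (the callback internals are simplified here):

```python
import os
import mlflow

# MLFLOW_EXPERIMENT_NAME selects which experiment the callback logs to.
os.environ["MLFLOW_EXPERIMENT_NAME"] = "my-finetuning-runs"

# Reuse a run that was started outside the Trainer instead of always
# starting a new one.
if mlflow.active_run() is None:
    mlflow.start_run()
```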
The documentation is not available anymore as the PR was closed or merged.
amathews-amd (Collaborator) approved these changes on May 11, 2022, leaving a comment:
LGTM
Cemberk pushed a commit that referenced this pull request on May 9, 2024
* Cohere Model Release (#1) Cohere Model Release
* Remove unnecessary files and code (#2) Some cleanup
* Delete cohere-model directory (#3)
* Make Fix (#5)
* Pr fixes (#6)
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (#8)
* tokenizer test
* format fix
* Adding Docs and other minor changes (#7)
* Add modeling tests (#9)
* Smol Fix (#11)
* tokenization tests are fixed
* format fixes
* fix pr doc tests
* fix pr doc tests
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (#13)
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
* fix modeling cohere (#14)
* Update chat templates to use the new API (#15)
---------
Co-authored-by: ahmetustun <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: Matt <[email protected]>
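Once merged, the model loads through the standard auto classes; the checkpoint id below is assumed for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-v01"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```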
Cemberk pushed a commit that referenced this pull request on Mar 19, 2025
* move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file
* refactor
* NOTHING. add space to rerun github actions tests
* remove it...
* `UniversalSpeculativeDecodingGenerator`
* Use `UniversalSpeculativeDecodingGenerator` when `generation_config.do_sample=True`
* assistant tokenizes only the target's new suffix
* formatting
* fix code
* fix code
* formatting
* add `TestGenerateWithDifferentModels`
* `TestGenerateWithDifferentModels` parameterize on `do_sample`
* `AssistantVocabMapping` & `AssistantVocabMappingCache`
* formatting
* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`
* improve `_get_assistant_to_target_input_ids` & formatting
* renaming
* WIP: debugging `min_new_tokens`
* fix get_target_ids
* `UniversalSpeculativeDecodingGenerator`
* assistant tokenizes only the target's new suffix
* formatting
* fix code
* fix code
* formatting
* `TestGenerateWithDifferentModels` parameterize on `do_sample`
* `AssistantVocabMapping` & `AssistantVocabMappingCache`
* formatting
* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`
* improve `_get_assistant_to_target_input_ids` & formatting
* renaming
* WIP: debugging `min_new_tokens`
* fix get_target_ids
* fix device issue
* fix get_assistant_input_ids
* add `TestAssistedCandidateGeneratorDifferentTokenizers`
* formatting
* `AssistantVocabTranslatorCache` refactor & tests
* revert changes in `src/transformers/generation/logits_process.py`
* refactor `AssistedCandidateGenerator`
* refactor `AssistedCandidateGeneratorDifferentTokenizers`
* formatting
* refactor `UniversalSpeculativeDecodingGenerator`
* fix negative value for max_new_tokens
* fix generation length target + attention_mask vs. assistant + attent
* fix device
* fix negative max_new_tokens bug
* fix UAG
* minor
* formatting
* `AssistedCandidateGeneratorDifferentTokenizers` `lookbehind`s init
* resolve conflict & formatting
* rerun CI tests
* remove space...
* remove old code
* fix candidate_input_ids device
* minor
* formatting
* Fix prepare + apply (#7)
* fix prepare + apply
* move to cpu
* simplify suppress_tokens
* fix bugs and refactoring
* device move
* handle self.config.vocab_size > len(target_tokenizer.get_vocab())
* no need to normalize in candidate_generator
* address Nadav's comments + minor
* optimize device move + SuppressTokensLogitsProcessor
* AssistantToTargetTranslator, SuppressTokensLogitsProcessor and tokenizers mapping improvements
* padding size
* padding improvement
* fix and simplify get_target_logits
* renaming in get_target_logits
* minor
* add filter_value and suppress_tokens_id
* style + rename
* remove TODO
* restore original SelectTokensLogitsProcessor with modification
* fix style
* fix _update_past_and_masks and optimize code
* remove assistant_vocab_size arg
* fix attention_mask
* call _prepare_attention_mask also if not has_past_key_values
* handling attention mask for first generation
* comment
* restore test
* remove SelectTokensLogitsProcessor
* _update_past_and_masks implementation for USD
* Add unittests for Universal Assisted generation
* fix style
* update tests
* Remove unused import and fix `test_speculation_depth` test
* exclude special and reserved tokens from tokenizer for UAG
* mv `test_universal_assisted_generation.py` to `generation/test_candidate_generator.py`
* Remove unused imports and fix style using `make style` (#9)
* formatting
* Swap gated `meta-llama/llama-3.2` with `allenai/llama` (#10)
* Fix space sign disagreement (#12)
* default values for AssistantToTargetTranslator fields
* fix space sign
* minor
* fix test + style
* Default values for some fields of assistant to target translator (#11)
* default values for AssistantToTargetTranslator fields
* fix
* add support to empty logit_processors
* Update candidate_generator.py (#15) fix typo
* BUG fix in _prepare_assistant_input_ids (#14)
* fix _prepare_assistant_input_ids
* target_to_assistant_input_ids
* Update src/transformers/generation/candidate_generator.py Co-authored-by: Nadav Timor <[email protected]>
---------
Co-authored-by: Nadav Timor <[email protected]>
* typo (`target_to_assistant_input_ids`)
* formatting
* merge upstream/main
* Fix minor review comments (#16)
* Fix: `token_ids.to(torch.int64)` (#18)
* tok ids to `torch.int64` (reference: https://huggingface.co/docs/transformers.js/en/api/tokenizers)
* `LongTensor`
* fix dtype
* `assistant_input_ids.to(dtype=torch.long)`
* Remove unused import from test_candidate_generator.py
* Remove unused import from test_candidate_generator.py
* Remove `numpy` import
* resolve pr comments (#19)
* `AssistantToTargetTranslator` docstring
* (per gante's comment) `filter_value` and `suppress_tokens_id` to class constants
* update `AssistantToTargetTranslator` docstring
* (gante's comment) replace `match-case`
* formatting
* Fix Joao's comments (#21)
* remove threading
* fix logits_processor
* fix test device
* fix style (#23)
* Move atm (#24)
* move AssistantToTargetTranslator
* fixup
* fix logit_processor
* add atm_translator test
* refactor test
* remove threading from test
* add require_torch in tests
* move AssistantVocabTranslatorCache + add tests
* ruff fix
---------
Co-authored-by: jmamou <[email protected]> Co-authored-by: Gaurav <[email protected]> Co-authored-by: Gaurav Jain <[email protected]> Co-authored-by: gauravjain14 <[email protected]>
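Rough shape of the resulting API, per the universal assisted decoding docs (checkpoint ids are placeholders; the `tokenizer`/`assistant_tokenizer` kwargs are what let the assistant use a different vocabulary than the target):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

target = AutoModelForCausalLM.from_pretrained("big-target-model")      # placeholder id
assistant = AutoModelForCausalLM.from_pretrained("small-draft-model")  # placeholder id
tokenizer = AutoTokenizer.from_pretrained("big-target-model")
assistant_tokenizer = AutoTokenizer.from_pretrained("small-draft-model")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
out = target.generate(
    **inputs,
    assistant_model=assistant,  # draft model with a different tokenizer
    tokenizer=tokenizer,
    assistant_tokenizer=assistant_tokenizer,
)
```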
Cemberk pushed a commit that referenced this pull request on Jul 17, 2025
* Gemma 3n
* initial commit of Gemma 3n scaffold
* Fixing param pass through on Gemm3p5RMSNorm
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)
* Adding gemma3p5 text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <[email protected]>
* Removing altup configs to accept the suggested configs
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <[email protected]>
* Updating altup config
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3n (#3)
* Initial Gemm3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <[email protected]>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3.5
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: SindhuRaghuram97 <[email protected]>
* Normalizing on altup_num_inputs config option
* regenerating modeling file after syncing to HEAD
* Use torch.std(..., unbiased=False) for activation sparsity (#8)
* Refactoring to a single QVK Norm (#13)
* AltUp: support scale_corrected_output (#14)
* Converts einsums to nn.Linear (#7)
* Converts einsums to nn.Linear
* Removing unused variables
* Aligning SharedKVCache with HybridCache (#11)
* Aligning SharedKVStore with HybridCache
* Remove KVStore. Refactor apply_rotary_pos_emb for sharing
* Addressing review comments
* Supporting split modality embeddings in Gemma3n (#10)
* Adding the Embedder class
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation
* Apply suggestions from code review Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments, prop drilling audio and vision configs to the text config
* Removing TODO's that have been addressed
* Simplify Embedder init and add audio embeddings
* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder
* Refactoring vision and audio embeddings into ConditionalGeneration model
---------
Co-authored-by: Ryan Mullins <[email protected]> Co-authored-by: Ryan Mullins <[email protected]>
* Updating attention mask for Gemma 3.5 (#15)
* xxx_token_index to xxx_token_id
* removing deprecated last_cache_position
* Removing references to SigLIP
* Always init per-layer inputs
* Using torch.finfo().min for epsilon_tensor
* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas
* fix modular GEMMA3N_INPUTS_DOCSTRING
* Gemma3nAttention inherits from Gemma3Attention
* Modular inheritance fixes
* CausalLM conversion script for 4B model (#16)
* Add Gemma3n Audio Encoder (#6)
* initial commit of Gemma 3.5 scaffold
* Fixing param pass through on Gemm3nRMSNorm
* Adds Einsum layer to Gemma 3.5
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)
* Adding gemma3n text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Removing altup configs to accept the suggested configs
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Updating altup config
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3.5 (#3)
* Initial Gemm3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3.5
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <[email protected]>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3.5
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: SindhuRaghuram97 <[email protected]>
* Normalizing on altup_num_inputs config option
* Adding audio encoder config
* Adds high-level components for Audio Encoder
* Implement uniform reducer for Audio Encoder
* Adding placeholders for Conformer components in Audio Encoder
* Adding placeholders for SubSampleConvProjection components in Audio Encoder
* Adding SequenceLayer component placeholders
* Implementing Gemma3nAudioEncoder with nn.Sequential
* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
* Implementing Conformer model with SequenceLayers
* Use OrderedDict in nn.Sequential initializers
* Implements sl.Residual in Torch with nn.Sequential and OrderedDict
* Adopting a base SequenceLayer class with default forward() method
* Implementing sl.GatedLinearUnit in Torch
* Implementing sl.Swish in Torch
* Implementing sl.ReLU in Torch
* Implementing sl.Scale in Torch
* Removing sl.Dropout after tree-shaking
* Implementing sl.RMSNorm in Torch with fake shape
* Implementing sl.GroupNorm in Torch
* Implementing sl.Conv2d in Torch
* Implementing sl.Dense in Torch
* Removing sl.Delay layers, which act as pass-throughs
* Connecting shapes to configs in initializers
* Removing sl.Emit
* Implementing sl.ExpandDims in Torch
* Adding sl.GradientClipping to Torch
* Implementing sl.DenseShaped in Torch
* Implementing sl.LDPA in Torch
* Removing unused sl.CombinedQKVProj class
* Fixing erroneous type hint
* Implementing sl.DepthwiseConv1D in Torch
* Implementing sl.MaskInvalid in Torch
* Fixes for initialization
* Fixes for saving weights
* Removing einsums per feedback from HF staff
* Removing Sequence Layers idioms from audio encoder
* Fixes for reviewer comments
* CausalLM conversion script for 4B model
* inv_timescales to non-persistent buffer
* Addressing audio encoder Attention feedback
* Addressing Gemma3nAudioSSCPConvBlock feedback
* Addressing Gemma3nAudioConformerAttention feedback
* Addressing padding feedback
* Weights conversion loads audio state dict
* Always use vision_config so saving works
* Token id updates for configs
* Stubs for interleaving audio embs
* Addressing reviewer feedback
---------
Co-authored-by: SindhuRaghuram97 <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]>
* Fixing cache access error
* Removing duplicate code from a bad merge
* Gemma 3n Text + Vision Part 1 (#17)
* testing utilities for numerics comparisons
* Corrected einsum to nn.Linear weights conversion
* Inherit scaled word embs from Gemma3 not Bart
* Fixing transposes for collapsed linears
* More transpose fixes
* numpy api fix
* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True
* Force AltUp to float32
* Updating debugging script for AudioEncoder debugging
* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs
* Correcting attention einsum conversions
* RMSNorm in type of x
* Fixing duplicate laurel norm/gating
* KV sharing using the right previous indices
* Refactor kv shared index computation. Correct frac_shared_layers
* Use num_shared_layers instead of inferring from a fraction
* fixing a bug for logging
* Fix shared data_ptrs in altup inits
* rope: adjust proj -> norm -> rope to preserve computation (#20)
* rope: adjust proj -> norm -> rope to preserve computation
* Removing some breaking language model fluff in ConditionalGeneration
* Consolidate query_states transforms
---------
Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Ryan Mullins <[email protected]>
* Vectorize the loops in AltUp (#19)
* Vectorize the loops in AltUp
* fix typo
* Expanding to support batched inputs
* remove extra debug script
* Fix AltUp.forward
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel
* Convert norm to 1/sqrt (#21)
* Convert norm to 1/sqrt
* Scale shift change per Phil's rec
* Adding default activation sparsity
* Fixing 2B config in weights conversion script
* Fixing RMSNorm parameters - adding scale_shift and with_scale
* Correcting query pre-attention scaling
* Adding query_rescale_scalar to text config
* Adding layer_idx to MLP
* Permafix for input_layernorm
* Use 1/sqrt instead of rsqrt in DecoderLayer
* Fix o_proj conversion
* Conversion script update for vision encoder
* Removing logging for debugging timm model
* Fixing bugs in Gemma3nForConditionalGeneration for text generation
* Generating the modeling_gemma3n.py file
* Removing the addition of an erroneous line in the modeling file
* Adding gemma3n text model to modeling_auto
* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds
* Updating the modeling file with the latest bugfix changes
* Updating models/auto for Gemma 3n
* using AutoTokenizer in forward test
* Adding processing_gemma3n.py
* Gemma 3n configured for AutoModel. Conversion script updated.
* Removing errant merge artifacts
---------
Co-authored-by: Mayank Chaturvedi <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Xuan-Son Nguyen <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]>
* Removing errant debugging statements from Gemma 3
* Gemma3n audio model (#18)
* testing utilities for numerics comparisons
* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock
* Add audio version of forward script based on RyanMullins' implementation
* Updating to match encoder tests. WIP: config question needs resolving
* Updates to audio classes to enable end-to-end running
* Removing vestigial classes, cleaning up print statements
* Adding SiLU / Swish to audio conformer feed forward block
* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio
* Adding outputs to audio test
* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model
* Update forward test to load from local weights
* Update conversion to process / output audio layers
* Update __all__ to export audio encoder
* AutoModel registration for Gemma 3n Audio
* Use AutoModel for ConditionalGeneration.audio_tower
* Fixing input_proj_linear transpose
* Fixing Gemma3NanoAudioConformerAttention.post conversion
* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion
* Correcting indentation issue on Gemma3p5RMSNorm
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Text + Vision Part 2 (#23)
* Updates for ConditionalGeneration.get_image_features
* Adding a WIP draft of image_processing_gemma3p5.py
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: SindhuRaghuram97 <[email protected]>
* Modular conversion after github suggested change
* Text + image gives good results
* Fixing image size preset
* Updating configs for the 2B variant in the conversion script
* Using final generation config in conversion script
---------
Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: SindhuRaghuram97 <[email protected]>
* Audio Integration (#12)
* initial commit of Gemma 3n scaffold
* Fixing param pass through on Gemm3nRMSNorm
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)
* Adding Gemma 3n text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Removing altup configs to accept the suggested configs
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Updating altup config
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3n (#3)
* Initial Gemma3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <[email protected]>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3n
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: SindhuRaghuram97 <[email protected]>
* Normalizing on altup_num_inputs config option
* Adding audio encoder config
* Adds high-level components for Audio Encoder
* Implement uniform reducer for Audio Encoder
* Adding placeholders for Conformer components in Audio Encoder
* Adding placeholders for SubSampleConvProjection components in Audio Encoder
* Adding SequenceLayer component placeholders
* Implementing Gemma3nAudioEncoder with nn.Sequential
* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
* Implementing Conformer model with SequenceLayers
* Use OrderedDict in nn.Sequential initializers
* Implements sl.Residual in Torch with nn.Sequential and OrderedDict
* Adopting a base SequenceLayer class with default forward() method
* Implementing sl.GatedLinearUnit in Torch
* Implementing sl.Swish in Torch
* Implementing sl.ReLU in Torch
* Implementing sl.Scale in Torch
* Removing sl.Dropout after tree-shaking
* Implementing sl.RMSNorm in Torch with fake shape
* Implementing sl.GroupNorm in Torch
* Implementing sl.Conv2d in Torch
* Implementing sl.Dense in Torch
* Removing sl.Delay layers, which act as pass-throughs
* Connecting shapes to configs in initializers
* Removing sl.Emit
* Implementing sl.ExpandDims in Torch
* Adding sl.GradientClipping to Torch
* Implementing sl.DenseShaped in Torch
* Implementing sl.LDPA in Torch
* Removing unused sl.CombinedQKVProj class
* Fixing erroneous type hint
* Implementing sl.DepthwiseConv1D in Torch
* Implementing sl.MaskInvalid in Torch
* Fixes for initialization
* Fixes for saving weights
* Removing einsums per feedback from HF staff
* Removing Sequence Layers idioms from audio encoder
* Fixes for reviewer comments
* Converting sl.Frontend to FeatureExtractor
* Updates for ConditionalGeneration.get_image_features
* Adding a WIP draft of image_processing_gemma3n.py
* Update modular Co-authored-by: SindhuRaghuram97 <[email protected]>
* Modular conversion after github suggested change
* Text + image gives good results
* Fixing image size preset
* Draft of audio data in chat template
* Removing image processing. Using SigLIP instead.
* Audio input going end-to-end
* Fixing dtype issues in audio encoder
* x-lib formatting consistency
* Adding example data
* Save preprocessor_config.json from conversion script
* Instrumentation for debugging
* Additional instrumentation for preprocessing debugging
* Updates to preprocessor, padding; produces correct end-to-end results on sample
* Tackling configuration TODOs
* Start of feature extractor refactor
* Adds Numpy version of USM extractor, removes Torch version and dependencies
* Fixing AltUp.correct coef permute
* Supporting batches of single audio segment inputs
* Docstrings updates for config
* In-lining audio feature extraction
* Adjustments to conversion script and smoke test script
---------
Co-authored-by: SindhuRaghuram97 <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: pculliton <[email protected]>
* Gemma 3n renaming
* Removing test data and utilities
* Renaming test files
* Gemma 3n refactor
* Fix tokenizer config in conversion script
* Address reviewer feedback
* FeatureExtractor returns float32 by default
* Adding basic tests for audio, and input name for audio encoder
* Audio integration test, updates to model_id for other integration tests
* Use scales for q and k norms (#26)
* Update audio integration test to use HF dataset
* Reviewer feedback
* Expand embedding table to full vocab size in weights conversion
* Mix-n-match MatFormers for Gemma 3n (#25)
* Remove in-place operations (#30)
* chore: removing inplace ops
* remove [tensor] * n pattern
* chore: reviewer feedback in AudioEncoder and AltUp
* More grad clipping
* Dynamo compatibility
* fix: cache slicing error
* chore: simplify shared kv cache slicing
* chore: vision encoder rename in timm
* fix: image processor do_normalize=False
* fixup: style
* chore: model_doc
* fix: docs for code quality
* chore: repo consistency
* fix: RMSNorm in float as in prior Gemmas
* fix: per_layer_inputs = None
* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint
* chore: repo consistency
* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)
* Add initial unit tests for Gemma3nAudioFeatureExtractor
* Add basic unit tests for Gemma3nProcessor (#28) Co-authored-by: Douglas Reid <[email protected]>
* parameterize tests
---------
Co-authored-by: Douglas Reid <[email protected]>
* chore: code style
* fix: test cases
* style and consistency
* fix config in the test to be coherent with layer cache sharing
* fix hidden states in tests and code
* inits and mappings
* fix modality prefixes
* test order and prefixes
* fix test exception
* fix class order and reduce model size for faster tests
* restore _checkpoint_conversion_mapping to load Causal from Conditional
* fix config mapping!
* fix: reviewer feedback
---------
Co-authored-by: SindhuRaghuram97 <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: raushan <[email protected]> Co-authored-by: Mayank Chaturvedi <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Xuan-Son Nguyen <[email protected]> Co-authored-by: pculliton <[email protected]> Co-authored-by: Aritra Roy Gosthipaty <[email protected]> Co-authored-by: Cyril Vallez <[email protected]>
* fix import test
* add model args
* auto_docstring
* replace test path
* consistency
* skip tests for now
* fix docstring for doc builder
* skip unused attr
---------
Co-authored-by: SindhuRaghuram97 <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: raushan <[email protected]> Co-authored-by: Mayank Chaturvedi <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Xuan-Son Nguyen <[email protected]> Co-authored-by: pculliton <[email protected]> Co-authored-by: Aritra Roy Gosthipaty <[email protected]> Co-authored-by: Cyril Vallez <[email protected]> Co-authored-by: Arthur <[email protected]>
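End to end, the new classes load through the standard API; a sketch with an assumed checkpoint id:

```python
from transformers import AutoProcessor, Gemma3nForConditionalGeneration

model_id = "google/gemma-3n-E2B-it"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3nForConditionalGeneration.from_pretrained(model_id)
```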
Cemberk pushed a commit that referenced this pull request on Nov 13, 2025
* merge opensource_hunyuan
* add head_dim
* fix assertion error
* fix seen_tokens
* ready_for_upstream (merge request !17) Squash merge branch 'ready_for_upstream' into 'main'
* fix configuration type&docstring
* fix style
* ready_for_upstream (merge request !18) Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring
* rename base model
* remove assert
* update
* remove tiktoken
* update
* fix moe and code style (#3)
* update
* fix format
* update
* revert makefile
* fix moe config
* fix numel()
* remove prepare_inputs_for_generation
* fix kv_seq_len
* add docs/toctree
* remove unused parameter&add licence
* add licence
* remove unused parameter
* fix code
* dense modular update import fix fix use mistralmodel fix qknorm add sliding_window make style fix dense done hunyuan moe fix import fix modular fixup fixup
* update model path
* fix mlp_bias
* fix modular
* Fix modeling (#5)
* fix attention
* use llamamodel
* fix code
* Fix qk (#6)
* fix qk_norm
* fix
* fix modular
* Fix moe (#7)
* fix some moe code
* fix einsum
* try top1
* use top1
* Fix rotary (#8)
* fix rotary
* fix modeling
* fix modular
* fix testcode
* remove A13B unit test
* Fix moe v1 (#9) fix moe & gate
* Fix gate norm (#10)
* add norm_topk_prob
* Fix testcase (#11)
* fix&skip test
* Fix testcase (#12)
* skip testcase
* Fix norm topk (#13)
* hardcode norm_topk_prob
* fix testcase
---------
Co-authored-by: pridejcyang <[email protected]> Co-authored-by: Mingji Han <[email protected]>
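A minimal sketch of top-k MoE gating with the `norm_topk_prob` renormalization mentioned in the log above (illustrative, not the upstream implementation):

```python
import torch
import torch.nn.functional as F

def route(hidden, gate_weight, top_k=1, norm_topk_prob=True):
    # hidden: (tokens, dim); gate_weight: (dim, n_experts)
    logits = hidden @ gate_weight
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(top_k, dim=-1)
    if norm_topk_prob:
        # Renormalize so the selected experts' weights sum to 1.
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_probs, topk_idx
```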
This PR integrates the latest commits from upstream huggingface/transformers into the ROCm fork.