forked from huggingface/transformers
IFU-master-2022-05-05 #11
Merged
Conversation
If `global_attention_mask` is found in the model's inputs (it is used by certain models, like LED), the `prediction_step` method of `Seq2SeqTrainer` adds it to the `gen_kwargs` that are passed to `model.generate()`. This allows the global attention to be set properly when generating predictions.
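A minimal sketch of the idea, assuming the usual `prediction_step` shape; the names mirror `Seq2SeqTrainer` but this is illustrative rather than the exact upstream diff:

```python
# Illustrative sketch: inside Seq2SeqTrainer.prediction_step, forward the
# global_attention_mask used by LED-style models into the generation kwargs.
def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None):
    gen_kwargs = {
        "max_length": self.model.config.max_length,
        "num_beams": self.model.config.num_beams,
    }
    # Pass the global attention pattern through to generation if the model uses it.
    if "global_attention_mask" in inputs:
        gen_kwargs["global_attention_mask"] = inputs["global_attention_mask"]
    generated_tokens = self.model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **gen_kwargs,
    )
    ...
```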
* [benchmark tool] trainer-benchmark.py
* improve
* massive rework/expansion
* fix
* mucho improved
* improved
* fix prefix
* fix
* fix diff calculation
* address suggestions
* 📝 add image/vision classification and asr
* 🖍 minor formatting fixes
* Fixed a typo in legacy seq2seq_trainer.py (huggingface#16531)
* Add ONNX export for BeiT (huggingface#16498)
* Add beit onnx conversion support
* Updated docs
* Added cross reference to ViT ONNX config
* call on_train_end when trial is pruned (huggingface#16536)
* Type hints added (huggingface#16529)
* Fix Bart type hints (huggingface#16297)
* Add type hints to PLBart PyTorch
* Remove pending merge conflicts
* Fix PLBart Type Hints
* Add changes from review
* Add VisualBert type hints (huggingface#16544)
* Adding missing type hints for mBART model (PyTorch) (huggingface#16429)
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent models coherent Co-authored-by: matt <[email protected]>
* Remove MBart subclass of XLMRoberta in tokenizer docs (huggingface#16546)
* Remove MBart subclass of XLMRoberta in tokenizer
* Fix style
* Copy docs from MBart50 tokenizer
* Use random_attention_mask for TF tests (huggingface#16517)
* use random_attention_mask for TF tests
* Fix for TFCLIP test (for now). Co-authored-by: ydshieh <[email protected]>
* Improve code example (huggingface#16450) Co-authored-by: Niels Rogge <[email protected]>
* Pin tokenizers version <0.13 (huggingface#16539)
* Pin tokenizers version <0.13
* Style
* Add code samples for TF speech models (huggingface#16494) Co-authored-by: ydshieh <[email protected]>
* [FlaxSpeechEncoderDecoder] Fix dtype bug (huggingface#16581)
* [FlaxSpeechEncoderDecoder] Fix dtype bug
* more fixes
* Making the impossible to connect error actually report the right URL. (huggingface#16446)
* Fix flax import in __init__.py: modeling_xglm -> modeling_flax_xglm (huggingface#16556)
* Add utility to find model labels (huggingface#16526)
* Add utility to find model labels
* Use it in the Trainer
* Update src/transformers/utils/generic.py Co-authored-by: Matt <[email protected]>
* Quality Co-authored-by: Matt <[email protected]>
* Enable doc in Spanish (huggingface#16518)
* Reorganize doc for multilingual support
* Fix style
* Style
* Toc trees
* Adapt templates
* Add use_auth to load_datasets for private datasets to PT and TF examples (huggingface#16521)
* fix formatting and remove use_auth
* Add use_auth_token to Flax examples
* add a test checking the format of `convert_tokens_to_string`'s output (huggingface#16540)
* add new tests
* add comment to overridden tests
* TF: Finalize `unpack_inputs`-related changes (huggingface#16499)
* Add unpack_inputs to remaining models
* removed kwargs to `call()` in TF models
* fix TF T5 tests
* [SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output (huggingface#16586)
* initialize the default rank set on TrainerState (huggingface#16530)
* initialize the default rank set on TrainerState
* fix style
* Trigger doc build
* Fix CI: test_inference_for_pretraining in ViTMAEModelTest (huggingface#16591) Co-authored-by: ydshieh <[email protected]>
* add a template to add missing tokenization test (huggingface#16553)
* add a template to add missing tokenization test
* add cookiecutter setting
* improve doc
* Update templates/adding_a_missing_tokenization_test/README.md Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]>
* made _load_pretrained_model_low_mem static + bug fix (huggingface#16548)
* handle torch_dtype in low cpu mem usage (huggingface#16580)
* [Doctests] Correct filenaming (huggingface#16599)
* [Doctests] Correct filenaming
* improve quicktour
* make style
* Adding new train_step logic to make things less confusing for users (huggingface#15994)
* Adding new train_step logic to make things less confusing for users
* DO NOT ASK WHY WE NEED THAT SUBCLASS
* Metrics now working, at least for single-output models with type annotations!
* Updates and TODOs for the new train_step
* Make fixup
* Temporary test workaround until T5 has types
* Temporary test workaround until T5 has types
* I think this actually works! Needs a lot of tests though
* Make style/quality
* Revert changes to T5 tests
* Deleting the aforementioned unmentionable subclass
* Deleting the aforementioned unmentionable subclass
* Adding a Keras API test
* Style fixes
* Removing unneeded TODO and comments
* Update test_step too
* Stop trying to compute metrics with the dummy_loss, patch up test
* Make style
* make fixup
* Docstring cleanup
* make fixup
* make fixup
* Stop expanding 1D input tensors when using dummy loss
* Adjust T5 test given the new compile()
* make fixup
* Skipping test for convnext
* Removing old T5-specific Keras test now that we have a common one
* make fixup
* make fixup
* Only skip convnext test on CPU
* Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <[email protected]>
* Avoiding TF import issues
* make fixup
* Update compile() to support TF 2.3
* Skipping model.fit() on template classes for now
* Skipping model.fit() on template class tests for now
* Replace ad-hoc solution with find_labels
* make fixup Co-authored-by: Sylvain Gugger <[email protected]>
* Adding missing type hints for BigBird model (huggingface#16555)
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent models coherent
* Type hints for BigBird
* removing typos Co-authored-by: matt <[email protected]>
* [deepspeed] fix typo, adjust config name (huggingface#16597)
* 🖍 apply feedback
Co-authored-by: Cathy <[email protected]> Co-authored-by: Jim Rohrer <[email protected]> Co-authored-by: Ferdinand Schlatt <[email protected]> Co-authored-by: Dahlbomii <[email protected]> Co-authored-by: Gunjan Chhablani <[email protected]> Co-authored-by: Rishav Chandra Varma <[email protected]> Co-authored-by: matt <[email protected]> Co-authored-by: Yih-Dar <[email protected]> Co-authored-by: ydshieh <[email protected]> Co-authored-by: NielsRogge <[email protected]> Co-authored-by: Niels Rogge <[email protected]> Co-authored-by: Lysandre Debut <[email protected]> Co-authored-by: Patrick von Platen <[email protected]> Co-authored-by: Nicolas Patry <[email protected]> Co-authored-by: Daniel Stancl <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Matt <[email protected]> Co-authored-by: Karim Foda <[email protected]> Co-authored-by: SaulLu <[email protected]> Co-authored-by: Joao Gante <[email protected]> Co-authored-by: Sanchit Gandhi <[email protected]> Co-authored-by: Andres Codas <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Francesco Saverio Zuppichini <[email protected]> Co-authored-by: Suraj Patil <[email protected]> Co-authored-by: Stas Bekman <[email protected]>
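Among the commits above is the new `find_labels` utility; a quick usage sketch (the helper lives in `src/transformers/utils/generic.py` per the commit, and the exact return value shown is an assumption):

```python
from transformers import BertForSequenceClassification
from transformers.utils.generic import find_labels

# find_labels inspects the model class's forward signature and returns the
# argument names the Trainer should treat as labels.
print(find_labels(BertForSequenceClassification))  # expected: ["labels"]
```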
* Completed documentation of CTRL
* Missing optional None
* Added return types
* updated imports
* Update modeling_ctrl.py
* fix bart and mbart
* add ckpt names as variables
* fix mbart
* fix plbart
* use variable for ckpt name
…16609)
* Use CLIP model's config for some fields (if specified) instead of those of vision & text components. Co-authored-by: ydshieh <[email protected]>
* [Speech2Text Doc] Fix docs
* apply ydshieh's suggestions
Co-authored-by: ydshieh <[email protected]>
This reverts commit b1a7dfe.
* refactor TF beam search
* refactored generate can now properly use attention masks
* add force bos/eos logit processors
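The force-BOS/EOS processors mentioned above compose like any other logits processors; a small sketch with placeholder token ids:

```python
from transformers import (
    ForcedBOSTokenLogitsProcessor,
    ForcedEOSTokenLogitsProcessor,
    LogitsProcessorList,
)

# Token ids below are placeholders; use the ids from your model's config.
processors = LogitsProcessorList([
    ForcedBOSTokenLogitsProcessor(bos_token_id=0),                 # force BOS as the first generated token
    ForcedEOSTokenLogitsProcessor(max_length=20, eos_token_id=2),  # force EOS once max_length is reached
])
```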
* Update modeling_mpnet.py
* Update modeling_ctrl.py
* formatting
* Formatting
* Formatting
* annotated FSMT
* Added annotations for LED
* Added Annotations for M2M
* Added annotations for nystromformer
* Added annotations for OpenAI
* Added annotations for RAG
* Removed unused imports
* fix isort errors
* Removed inputs_embeds docstring, corrected original
* flake8 fixes
* doc-builder fixes
…gface#16617)
Adds logging and save/loading to the Accelerate scripts
Co-authored-by: Sylvain Gugger <[email protected]>
* Fix doc
* Make fixup
Co-authored-by: Niels Rogge <[email protected]>
* Add inputs vector to calculate metric method
* Include inputs for evaluation metrics with backwards compatibility
* Prevent inputs from creating OOM issues; documentation details
* Update style and code documentation
* Fix style formatting issues
* Update files format with make style
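In released transformers this surfaced as an opt-in training argument so that inputs are only gathered when a metric actually needs them (addressing the OOM concern above); a sketch, assuming the `include_inputs_for_metrics` flag name:

```python
from transformers import TrainingArguments

def compute_metrics(eval_pred):
    # eval_pred.inputs is only populated when include_inputs_for_metrics=True.
    predictions, labels, inputs = eval_pred.predictions, eval_pred.label_ids, eval_pred.inputs
    ...

args = TrainingArguments(
    output_dir="out",
    include_inputs_for_metrics=True,  # opt-in, so inputs are not gathered by default
)
```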
…ate_dict (huggingface#16643)
* Updated _load_pretrained_model_low_mem to check if keys are in the stored state_dict
* update after conversions
* Update README.md Support Image
Updates the Support image linking to our EAP page (to give it a refresh + help avoid image fatigue). Slack thread checking in with #open-source-internal on this update (https://huggingface.slack.com/archives/C021H1P1HKR/p1648838903316709)
* Compressed Updated Support image
* Improves Support Image Logo + Height
Updated the image based on logo + size feedback. Big thanks to Bibi for making quick edits to this image.
* base model done
* make style
* done
* added files
* Apply suggestions from code review Co-authored-by: NielsRogge <[email protected]>
* Apply suggestions from code review Co-authored-by: Sylvain Gugger <[email protected]>
* Trigger doc build
* resolved conversations
* resolved conversations
* seer models
* minor changes
* minor changes
* make fixup
* glob variables
* minor changes
* fix copies
* config when possible
* resolved conflicts
* resolved conflicts
* resolved conflicts
* CI
* conversion script for 10b param
* fixed for 10b model
* minor updates in the doc + make style
* removed unused code
* Apply suggestions from code review Co-authored-by: NielsRogge <[email protected]>
* removed unused code
* removed unused code
* updated modeling_utils from main
Co-authored-by: NielsRogge <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]>
* Skip RoFormer ONNX test if rjieba not installed
* Update deps table
* Skip RoFormer serialization test
* Fix RoFormer vocab
* Add rjieba to CircleCI
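The skip is the standard optional-dependency guard; a generic sketch of the pattern (not the exact decorator transformers uses):

```python
import importlib.util
import unittest

def require_rjieba(test_case):
    # Skip the decorated test when the optional rjieba dependency is missing.
    return unittest.skipUnless(
        importlib.util.find_spec("rjieba") is not None, "test requires rjieba"
    )(test_case)

class RoFormerOnnxTest(unittest.TestCase):
    @require_rjieba
    def test_export(self):
        ...
```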
* Add masked image modelling to task mapping
* Refactor ONNX features to be listed alphabetically
* Add warning about BEiT masked image modeling
Co-authored-by: Sylvain Gugger <[email protected]>
…ingface#17063)
* Make sure telemetry arguments are not returned as unused kwargs
* Fix test
* add utilities till TFData2VecVisionLayer.
* chore: pass window_size to attention layer.
* feat: add TFData2VecVisionRelativePositionBias.
* feat: initial implementation ready for tf data2vec.
* fix: relative position bias index, table to be fixed.
* chore: implementation added, tests remaining.
* add: tests, other PR files.
* fix: code quality.
* fix: import structure in init.
* chore: run make fix-copies.
* chore: address PR feedback (round I).
* chore: styling nit.
* fix: tests due to removal of to_2tuple().
* chore: rebase with upstream main and move the test.
* Update src/transformers/models/auto/modeling_tf_auto.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/auto/modeling_tf_auto.py Co-authored-by: Sylvain Gugger <[email protected]>
* fix: layer call.
* chore: remove from_pt=True and rerun test.
* chore: remove cast and tf.divide.
* chore: minor edits to the test script.
* Update src/transformers/models/data2vec/modeling_tf_data2vec_vision.py Co-authored-by: Matt <[email protected]>
* fix: expand() on TF tensors with broadcast_to().
* fix: test import.
Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Matt <[email protected]>
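One fix in that list swaps PyTorch-style `expand()` for `tf.broadcast_to()`, since TF tensors have no `expand` method; a tiny illustration:

```python
import tensorflow as tf

# tf.broadcast_to expands a tensor to a target shape, the TF counterpart of
# torch.Tensor.expand.
rel_bias = tf.constant([[1.0], [2.0]])        # shape (2, 1)
expanded = tf.broadcast_to(rel_bias, (2, 3))  # shape (2, 3)
```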
…#16635)
Bumps [notebook](http://jupyter.org) from 6.4.1 to 6.4.10.
---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ert (huggingface#16634)
Bumps [notebook](http://jupyter.org) from 6.4.1 to 6.4.10.
---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Type hint complete Albert model file.
* Update typing.
* Update src/transformers/models/albert/modeling_albert.py Co-authored-by: Matt <[email protected]>
* Deprecate model templates
* Address review comments
…ce#16886)
* CLIP Serving
* Add type hints per code review
* Use black, flake8, and isort
* Update src/transformers/models/clip/modeling_tf_clip.py Co-authored-by: Joao Gante <[email protected]>
* Rollback serving_output and add TODO
* Remove irrelevant portions of failing tests
* Revert "Rollback serving_output and add TODO" This reverts commit a4abfa6ba3b7875a13538dbc2ddc4eb17dfcca8d.
* Rollback to original test/serving_output
* Fix unused var
* Apply suggestions from code review
* Update formatting with black
* Fix style again from rebase
* Update tests/models/clip/test_modeling_tf_clip.py Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: Joao Gante <[email protected]> Co-authored-by: Sean Moriarity <[email protected]> Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>
* Added Spanish translation of autoclass_tutorial. Added 'local' and 'title' fields for autoclass_tutorial.
* Fixed autoclass_tutorial title in _toctree.yml and autoclass_tutorial.mdx
* type hints for pytorch models
* fixed import error
* fixed some errors
Added type hints for the BertGenerationEncoder and BertGenerationDecoder classes.
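For flavor, the general shape these type-hint commits add to a `forward` signature (illustrative only, not the exact upstream annotations):

```python
from typing import Optional, Tuple, Union

import torch
from transformers.modeling_outputs import BaseModelOutputWithPastAndCrossAttentions

# Excerpted from a model class; shown standalone for brevity.
def forward(
    self,
    input_ids: Optional[torch.Tensor] = None,
    attention_mask: Optional[torch.Tensor] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
    ...
```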
…gface#17091)
* Fix use of mlflow.active_run() and add proper support for MLFLOW_EXPERIMENT_NAME
* Fix code style (make style)
Co-authored-by: ydshieh <[email protected]>
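The gist of the MLflow fix, sketched against the public mlflow API (the callback internals are simplified here):

```python
import os
import mlflow

# MLFLOW_EXPERIMENT_NAME selects which experiment the callback logs to.
os.environ["MLFLOW_EXPERIMENT_NAME"] = "my-finetuning-runs"

# Reuse a run that was started outside the Trainer instead of always
# starting a new one.
if mlflow.active_run() is None:
    mlflow.start_run()
```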
The documentation is not available anymore as the PR was closed or merged.
amathews-amd (Collaborator) approved these changes on May 11, 2022, leaving a comment:
LGTM
Cemberk pushed a commit that referenced this pull request on May 9, 2024
* Cohere Model Release (#1) Cohere Model Release
* Remove unnecessary files and code (#2) Some cleanup
* Delete cohere-model directory (#3)
* Make Fix (#5)
* Pr fixes (#6)
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (#8)
* tokenizer test
* format fix
* Adding Docs and other minor changes (#7)
* Add modeling tests (#9)
* Smol Fix (#11)
* tokenization tests are fixed
* format fixes
* fix pr doc tests
* fix pr doc tests
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (#13)
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
* fix modeling cohere (#14)
* Update chat templates to use the new API (#15)
---------
Co-authored-by: ahmetustun <[email protected]> Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: Matt <[email protected]>
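Once merged, the model loads through the standard auto classes; the checkpoint id below is assumed for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-v01"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```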
Cemberk pushed a commit that referenced this pull request on Mar 19, 2025
* move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file
* refactor
* NOTHING. add space to rerun github actions tests
* remove it...
* `UniversalSpeculativeDecodingGenerator`
* Use `UniversalSpeculativeDecodingGenerator` when `generation_config.do_sample=True`
* assistant tokenizes only the target's new suffix
* formatting
* fix code
* fix code
* formatting
* add `TestGenerateWithDifferentModels`
* `TestGenerateWithDifferentModels` parameterize on `do_sample`
* `AssistantVocabMapping` & `AssistantVocabMappingCache`
* formatting
* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`
* improve `_get_assistant_to_target_input_ids` & formatting
* renaming
* WIP: debugging `min_new_tokens`
* fix get_target_ids
* `UniversalSpeculativeDecodingGenerator`
* assistant tokenizes only the target's new suffix
* formatting
* fix code
* fix code
* formatting
* `TestGenerateWithDifferentModels` parameterize on `do_sample`
* `AssistantVocabMapping` & `AssistantVocabMappingCache`
* formatting
* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`
* improve `_get_assistant_to_target_input_ids` & formatting
* renaming
* WIP: debugging `min_new_tokens`
* fix get_target_ids
* fix device issue
* fix get_assistant_input_ids
* add `TestAssistedCandidateGeneratorDifferentTokenizers`
* formatting
* `AssistantVocabTranslatorCache` refactor & tests
* revert changes in `src/transformers/generation/logits_process.py`
* refactor `AssistedCandidateGenerator`
* refactor `AssistedCandidateGeneratorDifferentTokenizers`
* formatting
* refactor `UniversalSpeculativeDecodingGenerator`
* fix negative value for max_new_tokens
* fix generation length target + attention_mask vs. assistant + attent
* fix device
* fix negative max_new_tokens bug
* fix UAG
* minor
* formatting
* `AssistedCandidateGeneratorDifferentTokenizers` `lookbehind`s init
* resolve conflict & formatting
* rerun CI tests
* remove space...
* remove old code
* fix candidate_input_ids device
* minor
* formatting
* Fix prepare + apply (#7)
* fix prepare + apply
* move to cpu
* simplify suppress_tokens
* fix bugs and refactoring
* device move
* handle self.config.vocab_size > len(target_tokenizer.get_vocab())
* no need to normalize in candidate_generator
* address Nadav's comments + minor
* optimize device move + SuppressTokensLogitsProcessor
* AssistantToTargetTranslator, SuppressTokensLogitsProcessor and tokenizers mapping improvements
* padding size
* padding improvement
* fix and simplify get_target_logits
* renaming in get_target_logits
* minor
* add filter_value and suppress_tokens_id
* style + rename
* remove TODO
* restore original SelectTokensLogitsProcessor with modification
* fix style
* fix _update_past_and_masks and optimize code
* remove assistant_vocab_size arg
* fix attention_mask
* call _prepare_attention_mask also if not has_past_key_values
* handling attention mask for first generation
* comment
* restore test
* remove SelectTokensLogitsProcessor
* _update_past_and_masks implementation for USD
* Add unittests for Universal Assisted generation
* fix style
* update tests
* Remove unused import and fix `test_speculation_depth` test
* exclude special and reserved tokens from tokenizer for UAG
* mv `test_universal_assisted_generation.py` to `generation/test_candidate_generator.py`
* Remove unused imports and fix style using `make style` (#9)
* formatting
* Swap gated `meta-llama/llama-3.2` with `allenai/llama` (#10)
* Fix space sign disagreement (#12)
* default values for AssistantToTargetTranslator fields
* fix space sign
* minor
* fix test + style
* Default values for some fields of assistant to target translator (#11)
* default values for AssistantToTargetTranslator fields
* fix
* add support to empty logit_processors
* Update candidate_generator.py (#15) fix typo
* BUG fix in _prepare_assistant_input_ids (#14)
* fix _prepare_assistant_input_ids
* target_to_assistant_input_ids
* Update src/transformers/generation/candidate_generator.py Co-authored-by: Nadav Timor <[email protected]>
---------
Co-authored-by: Nadav Timor <[email protected]>
* typo (`target_to_assistant_input_ids`)
* formatting
* merge upstream/main
* Fix minor review comments (#16)
* Fix: `token_ids.to(torch.int64)` (#18)
* tok ids to `torch.int64` (reference: https://huggingface.co/docs/transformers.js/en/api/tokenizers)
* `LongTensor`
* fix dtype
* `assistant_input_ids.to(dtype=torch.long)`
* Remove unused import from test_candidate_generator.py
* Remove unused import from test_candidate_generator.py
* Remove `numpy` import
* resolve pr comments (#19)
* `AssistantToTargetTranslator` docstring
* (per gante's comment) `filter_value` and `suppress_tokens_id` to class constants
* update `AssistantToTargetTranslator` docstring
* (gante's comment) replace `match-case`
* formatting
* Fix Joao's comments (#21)
* remove threading
* fix logits_processor
* fix test device
* fix style (#23)
* Move atm (#24)
* move AssistantToTargetTranslator
* fixup
* fix logit_processor
* add atm_translator test
* refactor test
* remove threading from test
* add require_torch in tests
* move AssistantVocabTranslatorCache + add tests
* ruff fix
---------
Co-authored-by: jmamou <[email protected]> Co-authored-by: Gaurav <[email protected]> Co-authored-by: Gaurav Jain <[email protected]> Co-authored-by: gauravjain14 <[email protected]>
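Rough shape of the resulting API, per the universal assisted decoding docs (checkpoint ids are placeholders; the `tokenizer`/`assistant_tokenizer` kwargs are what let the assistant use a different vocabulary than the target):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

target = AutoModelForCausalLM.from_pretrained("big-target-model")      # placeholder id
assistant = AutoModelForCausalLM.from_pretrained("small-draft-model")  # placeholder id
tokenizer = AutoTokenizer.from_pretrained("big-target-model")
assistant_tokenizer = AutoTokenizer.from_pretrained("small-draft-model")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
out = target.generate(
    **inputs,
    assistant_model=assistant,  # draft model with a different tokenizer
    tokenizer=tokenizer,
    assistant_tokenizer=assistant_tokenizer,
)
```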
Cemberk pushed a commit that referenced this pull request on Jul 17, 2025
* Gemma 3n
* initial commit of Gemma 3n scaffold
* Fixing param pass through on Gemm3p5RMSNorm
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)
* Adding gemma3p5 text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <[email protected]>
* Removing altup configs to accept the suggested configs
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <[email protected]>
* Updating altup config
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3n (#3)
* Initial Gemm3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <[email protected]>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3.5
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: SindhuRaghuram97 <[email protected]>
* Normalizing on altup_num_inputs config option
* regenerating modeling file after syncing to HEAD
* Use torch.std(..., unbiased=False) for activation sparsity (#8)
* Refactoring to a single QVK Norm (#13)
* AltUp: support scale_corrected_output (#14)
* Converts einsums to nn.Linear (#7)
* Converts einsums to nn.Linear
* Removing unused variables
* Aligning SharedKVCache with HybridCache (#11)
* Aligning SharedKVStore with HybridCache
* Remove KVStore. Refactor apply_rotary_pos_emb for sharing
* Addressing review comments
* Supporting split modality embeddings in Gemma3n (#10)
* Adding the Embedder class
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation
* Apply suggestions from code review Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments, prop drilling audio and vision configs to the text config
* Removing TODO's that have been addressed
* Simplify Embedder init and add audio embeddings
* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder
* Refactoring vision and audio embeddings into ConditionalGeneration model
---------
Co-authored-by: Ryan Mullins <[email protected]> Co-authored-by: Ryan Mullins <[email protected]>
* Updating attention mask for Gemma 3.5 (#15)
* xxx_token_index to xxx_token_id
* removing deprecated last_cache_position
* Removing references to SigLIP
* Always init per-layer inputs
* Using torch.finfo().min for epsilon_tensor
* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas
* fix modular GEMMA3N_INPUTS_DOCSTRING
* Gemma3nAttention inherits from Gemma3Attention
* Modular inheritance fixes
* CausalLM conversion script for 4B model (#16)
* Add Gemma3n Audio Encoder (#6)
* initial commit of Gemma 3.5 scaffold
* Fixing param pass through on Gemm3nRMSNorm
* Adds Einsum layer to Gemma 3.5
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)
* Adding gemma3n text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Removing altup configs to accept the suggested configs
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Updating altup config
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3.5 (#3)
* Initial Gemm3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3.5
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <[email protected]>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3.5
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: SindhuRaghuram97 <[email protected]>
* Normalizing on altup_num_inputs config option
* Adding audio encoder config
* Adds high-level components for Audio Encoder
* Implement uniform reducer for Audio Encoder
* Adding placeholders for Conformer components in Audio Encoder
* Adding placeholders for SubSampleConvProjection components in Audio Encoder
* Adding SequenceLayer component placeholders
* Implementing Gemma3nAudioEncoder with nn.Sequential
* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
* Implementing Conformer model with SequenceLayers
* Use OrderedDict in nn.Sequential initializers
* Implements sl.Residual in Torch with nn.Sequential and OrderedDict
* Adopting a base SequenceLayer class with default forward() method
* Implementing sl.GatedLinearUnit in Torch
* Implementing sl.Swish in Torch
* Implementing sl.ReLU in Torch
* Implementing sl.Scale in Torch
* Removing sl.Dropout after tree-shaking
* Implementing sl.RMSNorm in Torch with fake shape
* Implementing sl.GroupNorm in Torch
* Implementing sl.Conv2d in Torch
* Implementing sl.Dense in Torch
* Removing sl.Delay layers, which act as pass-throughs
* Connecting shapes to configs in initializers
* Removing sl.Emit
* Implementing sl.ExpandDims in Torch
* Adding sl.GradientClipping to Torch
* Implementing sl.DenseShaped in Torch
* Implementing sl.LDPA in Torch
* Removing unused sl.CombinedQKVProj class
* Fixing erroneous type hint
* Implementing sl.DepthwiseConv1D in Torch
* Implementing sl.MaskInvalid in Torch
* Fixes for initialization
* Fixes for saving weights
* Removing einsums per feedback from HF staff
* Removing Sequence Layers idioms from audio encoder
* Fixes for reviewer comments
* CausalLM conversion script for 4B model
* inv_timescales to non-persistent buffer
* Addressing audio encoder Attention feedback
* Addressing Gemma3nAudioSSCPConvBlock feedback
* Addressing Gemma3nAudioConformerAttention feedback
* Addressing padding feedback
* Weights conversion loads audio state dict
* Always use vision_config so saving works
* Token id updates for configs
* Stubs for interleaving audio embs
* Addressing reviewer feedback
---------
Co-authored-by: SindhuRaghuram97 <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]>
* Fixing cache access error
* Removing duplicate code from a bad merge
* Gemma 3n Text + Vision Part 1 (#17)
* testing utilities for numerics comparisons
* Corrected einsum to nn.Linear weights conversion
* Inherit scaled word embs from Gemma3 not Bart
* Fixing transposes for collapsed linears
* More transpose fixes
* numpy api fix
* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True
* Force AltUp to float32
* Updating debugging script for AudioEncoder debugging
* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs
* Correcting attention einsum conversions
* RMSNorm in type of x
* Fixing duplicate laurel norm/gating
* KV sharing using the right previous indices
* Refactor kv shared index computation. Correct frac_shared_layers
* Use num_shared_layers instead of inferring from a fraction
* fixing a bug for logging
* Fix shared data_ptrs in altup inits
* rope: adjust proj -> norm -> rope to preserve computation (#20)
* rope: adjust proj -> norm -> rope to preserve computation
* Removing some breaking language model fluff in ConditionalGeneration
* Consolidate query_states transforms
---------
Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Ryan Mullins <[email protected]>
* Vectorize the loops in AltUp (#19)
* Vectorize the loops in AltUp
* fix typo
* Expanding to support batched inputs
* remove extra debug script
* Fix AltUp.forward
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel
* Convert norm to 1/sqrt (#21)
* Convert norm to 1/sqrt
* Scale shift change per Phil's rec
* Adding default activation sparsity
* Fixing 2B config in weights conversion script
* Fixing RMSNorm parameters - adding scale_shift and with_scale
* Correcting query pre-attention scaling
* Adding query_rescale_scalar to text config
* Adding layer_idx to MLP
* Permafix for input_layernorm
* Use 1/sqrt instead of rsqrt in DecoderLayer
* Fix o_proj conversion
* Conversion script update for vision encoder
* Removing logging for debugging timm model
* Fixing bugs in Gemma3nForConditionalGeneration for text generation
* Generating the modeling_gemma3n.py file
* Removing the addition of an erroneous line in the modeling file
* Adding gemma3n text model to modeling_auto
* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds
* Updating the modeling file with the latest bugfix changes
* Updating models/auto for Gemma 3n
* using AutoTokenizer in forward test
* Adding processing_gemma3n.py
* Gemma 3n configured for AutoModel. Conversion script updated.
* Removing errant merge artifacts
---------
Co-authored-by: Mayank Chaturvedi <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Xuan-Son Nguyen <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]>
* Removing errant debugging statements from Gemma 3
* Gemma3n audio model (#18)
* testing utilities for numerics comparisons
* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock
* Add audio version of forward script based on RyanMullins' implementation
* Updating to match encoder tests. WIP: config question needs resolving
* Updates to audio classes to enable end-to-end running
* Removing vestigial classes, cleaning up print statements
* Adding SiLU / Swish to audio conformer feed forward block
* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio
* Adding outputs to audio test
* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model
* Update forward test to load from local weights
* Update conversion to process / output audio layers
* Update __all__ to export audio encoder
* AutoModel registration for Gemma 3n Audio
* Use AutoModel for ConditionalGeneration.audio_tower
* Fixing input_proj_linear transpose
* Fixing Gemma3NanoAudioConformerAttention.post conversion
* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion
* Correcting indentation issue on Gemma3p5RMSNorm
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Text + Vision Part 2 (#23)
* Updates for ConditionalGeneration.get_image_features
* Adding a WIP draft of image_processing_gemma3p5.py
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: SindhuRaghuram97 <[email protected]>
* Modular conversion after github suggested change
* Text + image gives good results
* Fixing image size preset
* Updating configs for the 2B variant in the conversion script
* Using final generation config in conversion script
---------
Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: SindhuRaghuram97 <[email protected]>
* Audio Integration (#12)
* initial commit of Gemma 3n scaffold
* Fixing param pass through on Gemm3nRMSNorm
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)
* Adding Gemma 3n text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Removing altup configs to accept the suggested configs
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Updating altup config
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <[email protected]>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3n (#3)
* Initial Gemma3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update modular Co-authored-by: Ryan Mullins <[email protected]>
* Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <[email protected]>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3n
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: SindhuRaghuram97 <[email protected]>
* Normalizing on altup_num_inputs config option
* Adding audio encoder config
* Adds high-level components for Audio Encoder
* Implement uniform reducer for Audio Encoder
* Adding placeholders for Conformer components in Audio Encoder
* Adding placeholders for SubSampleConvProjection components in Audio Encoder
* Adding SequenceLayer component placeholders
* Implementing Gemma3nAudioEncoder with nn.Sequential
* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
* Implementing Conformer model with SequenceLayers
* Use OrderedDict in nn.Sequential initializers
* Implements sl.Residual in Torch with nn.Sequential and OrderedDict
* Adopting a base SequenceLayer class with default forward() method
* Implementing sl.GatedLinearUnit in Torch
* Implementing sl.Swish in Torch
* Implementing sl.ReLU in Torch
* Implementing sl.Scale in Torch
* Removing sl.Dropout after tree-shaking
* Implementing sl.RMSNorm in Torch with fake shape
* Implementing sl.GroupNorm in Torch
* Implementing sl.Conv2d in Torch
* Implementing sl.Dense in Torch
* Removing sl.Delay layers, which act as pass-throughs
* Connecting shapes to configs in initializers
* Removing sl.Emit
* Implementing sl.ExpandDims in Torch
* Adding sl.GradientClipping to Torch
* Implementing sl.DenseShaped in Torch
* Implementing sl.LDPA in Torch
* Removing unused sl.CombinedQKVProj class
* Fixing erroneous type hint
* Implementing sl.DepthwiseConv1D in Torch
* Implementing sl.MaskInvalid in Torch
* Fixes for initialization
* Fixes for saving weights
* Removing einsums per feedback from HF staff
* Removing Sequence Layers idioms from audio encoder
* Fixes for reviewer comments
* Converting sl.Frontend to FeatureExtractor
* Updates for ConditionalGeneration.get_image_features
* Adding a WIP draft of image_processing_gemma3n.py
* Update modular Co-authored-by: SindhuRaghuram97 <[email protected]>
* Modular conversion after github suggested change
* Text + image gives good results
* Fixing image size preset
* Draft of audio data in chat template
* Removing image processing. Using SigLIP instead.
* Audio input going end-to-end
* Fixing dtype issues in audio encoder
* x-lib formatting consistency
* Adding example data
* Save preprocessor_config.json from conversion script
* Instrumentation for debugging
* Additional instrumentation for preprocessing debugging
* Updates to preprocessor, padding; produces correct end-to-end results on sample
* Tackling configuration TODOs
* Start of feature extractor refactor
* Adds Numpy version of USM extractor, removes Torch version and dependencies
* Fixing AltUp.correct coef permute
* Supporting batches of single audio segment inputs
* Docstrings updates for config
* In-lining audio feature extraction
* Adjustments to conversion script and smoke test script
---------
Co-authored-by: SindhuRaghuram97 <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: pculliton <[email protected]>
* Gemma 3n renaming
* Removing test data and utilities
* Renaming test files
* Gemma 3n refactor
* Fix tokenizer config in conversion script
* Address reviewer feedback
* FeatureExtractor returns float32 by default
* Adding basic tests for audio, and input name for audio encoder
* Audio integration test, updates to model_id for other integration tests
* Use scales for q and k norms (#26)
* Update audio integration test to use HF dataset
* Reviewer feedback
* Expand embedding table to full vocab size in weights conversion
* Mix-n-match MatFormers for Gemma 3n (#25)
* Remove in-place operations (#30)
* chore: removing inplace ops
* remove [tensor] * n pattern
* chore: reviewer feedback in AudioEncoder and AltUp
* More grad clipping
* Dynamo compatibility
* fix: cache slicing error
* chore: simplify shared kv cache slicing
* chore: vision encoder rename in timm
* fix: image processor do_normalize=False
* fixup: style
* chore: model_doc
* fix: docs for code quality
* chore: repo consistency
* fix: RMSNorm in float as in prior Gemmas
* fix: per_layer_inputs = None
* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint
* chore: repo consistency
* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)
* Add initial unit tests for Gemma3nAudioFeatureExtractor
* Add basic unit tests for Gemma3nProcessor (#28) Co-authored-by: Douglas Reid <[email protected]>
* parameterize tests
---------
Co-authored-by: Douglas Reid <[email protected]>
* chore: code style
* fix: test cases
* style and consistency
* fix config in the test to be coherent with layer cache sharing
* fix hidden states in tests and code
* inits and mappings
* fix modality prefixes
* test order and prefixes
* fix test exception
* fix class order and reduce model size for faster tests
* restore _checkpoint_conversion_mapping to load Causal from Conditional
* fix config mapping!
* fix: reviewer feedback
---------
Co-authored-by: SindhuRaghuram97 <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: raushan <[email protected]> Co-authored-by: Mayank Chaturvedi <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Xuan-Son Nguyen <[email protected]> Co-authored-by: pculliton <[email protected]> Co-authored-by: Aritra Roy Gosthipaty <[email protected]> Co-authored-by: Cyril Vallez <[email protected]>
* fix import test
* add model args
* auto_docstring
* replace test path
* consistency
* skip tests for now
* fix docstring for doc builder
* skip unused attr
---------
Co-authored-by: SindhuRaghuram97 <[email protected]> Co-authored-by: Sindhu Raghuram <[email protected]> Co-authored-by: raushan <[email protected]> Co-authored-by: Mayank Chaturvedi <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Douglas Reid <[email protected]> Co-authored-by: Xuan-Son Nguyen <[email protected]> Co-authored-by: pculliton <[email protected]> Co-authored-by: Aritra Roy Gosthipaty <[email protected]> Co-authored-by: Cyril Vallez <[email protected]> Co-authored-by: Arthur <[email protected]>
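End to end, the new classes load through the standard API; a sketch with an assumed checkpoint id:

```python
from transformers import AutoProcessor, Gemma3nForConditionalGeneration

model_id = "google/gemma-3n-E2B-it"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3nForConditionalGeneration.from_pretrained(model_id)
```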
Cemberk pushed a commit that referenced this pull request on Nov 13, 2025
* merge opensource_hunyuan
* add head_dim
* fix assertion error
* fix seen_tokens
* ready_for_upstream (merge request !17) Squash merge branch 'ready_for_upstream' into 'main'
* fix configuration type&docstring
* fix style
* ready_for_upstream (merge request !18) Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring
* rename base model
* remove assert
* update
* remove tiktoken
* update
* fix moe and code style (#3)
* update
* fix format
* update
* revert makefile
* fix moe config
* fix numel()
* remove prepare_inputs_for_generation
* fix kv_seq_len
* add docs/toctree
* remove unused parameter&add licence
* add licence
* remove unused parameter
* fix code
* dense modular update import fix fix use mistralmodel fix qknorm add sliding_window make style fix dense done hunyuan moe fix import fix modular fixup fixup
* update model path
* fix mlp_bias
* fix modular
* Fix modeling (#5)
* fix attention
* use llamamodel
* fix code
* Fix qk (#6)
* fix qk_norm
* fix
* fix modular
* Fix moe (#7)
* fix some moe code
* fix einsum
* try top1
* use top1
* Fix rotary (#8)
* fix rotary
* fix modeling
* fix modular
* fix testcode
* remove A13B unit test
* Fix moe v1 (#9) fix moe & gate
* Fix gate norm (#10)
* add norm_topk_prob
* Fix testcase (#11)
* fix&skip test
* Fix testcase (#12)
* skip testcase
* Fix norm topk (#13)
* hardcode norm_topk_prob
* fix testcase
---------
Co-authored-by: pridejcyang <[email protected]> Co-authored-by: Mingji Han <[email protected]>
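A minimal sketch of top-k MoE gating with the `norm_topk_prob` renormalization mentioned in the log above (illustrative, not the upstream implementation):

```python
import torch
import torch.nn.functional as F

def route(hidden, gate_weight, top_k=1, norm_topk_prob=True):
    # hidden: (tokens, dim); gate_weight: (dim, n_experts)
    logits = hidden @ gate_weight
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(top_k, dim=-1)
    if norm_topk_prob:
        # Renormalize so the selected experts' weights sum to 1.
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_probs, topk_idx
```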
This PR integrates the latest commits from upstream huggingface/transformers into the ROCm fork.