Conversation

@AdrianAbeyta

IFU on 09/28/2023 from the main branch of upstream transformers.

susnato and others added 30 commits September 5, 2023 11:07
rename doanloading to downloading
* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>
…ce#25953)

Update README.md with correct path to examples/seq2seq
This CL iterates through a list of keys rather than dict items while updating the dict elements. Fixes the following error:
File "..../transformers/training_args.py", line 1544, in __post_init__
for k, v in self.fsdp_config.items():
RuntimeError: dictionary keys changed during iteration
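For illustration, here is a minimal sketch of the pattern (made-up dict contents, not the actual `training_args.py` code): renaming keys while looping over `.items()` mutates the dict mid-iteration and raises the error above, whereas looping over a snapshot of the keys does not.

```python
# Illustrative sketch only; the keys and values are hypothetical.
fsdp_config = {"min_num_params": 0, "xla": False}

# Buggy pattern: popping and re-adding keys while iterating over .items()
# changes the dict's keys mid-iteration and raises
# "RuntimeError: dictionary keys changed during iteration".
# for k, v in fsdp_config.items():
#     fsdp_config[f"fsdp_{k}"] = fsdp_config.pop(k)

# Fixed pattern: iterate over a snapshot of the keys instead.
for k in list(fsdp_config.keys()):
    fsdp_config[f"fsdp_{k}"] = fsdp_config.pop(k)

print(fsdp_config)  # {'fsdp_min_num_params': 0, 'fsdp_xla': False}
```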
* update

* update

* fix

---------

Co-authored-by: ydshieh <[email protected]>
* patch with accelerate xpu

* patch with accelerate xpu

* formatting

* fix tests

* revert ruff unrelated fixes

* revert ruff unrelated fixes

* revert ruff unrelated fixes

* fix test

* review fixes

* review fixes

* black fixed

* review commits

* review commits

* style fix

* use pytorch_utils

* revert markuplm test
* no_split_modules

* no_split_modules

* inputs_embeds+pos same device

* update _no_split_modules

* update _no_split_modules
* Add TFDebertaV2ForMultipleChoice

* Import newer model in main init

* Fix import issues

* Fix copies

* Add doc

* Fix tests

* Fix copies

* Fix docstring
…imizer and HF scheduler (huggingface#25863)

* Add support for deepspeed optimizer and HF scheduler

* fix bug

* fix the import

* fix issue with deepspeed scheduler saving for hf optim + hf scheduler scenario

* fix loading of hf scheduler when loading deepspeed checkpoint

* fix import of `DeepSpeedSchedulerWrapper`

* add tests

* add the comment and skip the failing tests

* address comment
* [Wav2Vec2 Conformer] Fix inference float16

* fix test

* fix test more

* clean pipe test
* docs: feat: model resources for llama

* fix: resolve suggestion

Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: Jungnerd <[email protected]>
Co-authored-by: Wonhyeong Seo <[email protected]>

---------

Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: Jungnerd <[email protected]>
Co-authored-by: Wonhyeong Seo <[email protected]>
* start with error too

* fix ?

* start with nit

* one more path

* use `job_name`

* mark pipeline test as slow
)

* add: potential fix to mega chunking in decoder only model bug

* add: decoder with chunking test

* add: input_mask passed with input_ids
…5950)

* fix convert megatron model too large

* fix convert megatron model too large
* Fix revision propagation

* Cleaner
* stash commit

* More OPT updates

* Update src/transformers/models/opt/modeling_tf_opt.py

Co-authored-by: amyeroberts <[email protected]>

---------

Co-authored-by: amyeroberts <[email protected]>
* fix some small bugs in readme

* Update docs/README.md

Co-authored-by: amyeroberts <[email protected]>

---------

Co-authored-by: amyeroberts <[email protected]>
* docs: ko: llm_tutorial.md

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

* fix: resolve suggestions
Remove falcon from undocumented list
* add new arg for gptq

* add tests

* add min version autogptq

* fix order

* skip test

* fix

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <[email protected]>

* fix style

* change model path

---------

Co-authored-by: Arthur <[email protected]>
* Fix err

* Use version check
* Add tgs metrics

* bugfix and black formatting

* workaround for tokens counting

* formatting and bugfix

* Fix

* Add opt-in for tgs metrics

* make style and fix error

* Fix doc

* fix docbuild

* hf-doc-build

* fix

* test

* Update src/transformers/training_args.py

renaming

Co-authored-by: Zach Mueller <[email protected]>

* Update src/transformers/training_args.py

renaming

Co-authored-by: Zach Mueller <[email protected]>

* Fix some symbol

* test

* Update src/transformers/trainer_utils.py

match naming patterns

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/training_args.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/trainer.py

nice

Co-authored-by: amyeroberts <[email protected]>

* Fix reviews

* Fix

* Fix black

---------

Co-authored-by: Zach Mueller <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
* fix tokenizer

* make bs even

* fix multi gpu test

* style

* model forward

* fix torch import

* revert tok pin
ArthurZucker and others added 4 commits October 4, 2023 17:47
* fix wav2vec2 doctest

* suggestion

* fix

* final fix

* revert since we need AddedTokens
* translate installation to zh

* fix translation typo
@etemadiamd commented Oct 4, 2023

@amathews-amd @AdrianAbeyta pyt_huggingface_gpt2 test:
1. Observed OOM when testing pyt_huggingface_gpt2 with this PR on an MI250x Hayabusa system. Decreased the batch size from 22 to 18 and observed a 34% performance drop compared to the internal transformers.
2. Observed "NameError: name 'stable_train_metrics' is not defined" when testing IFU 2023-09-28 #26.
Here is the data:

[screenshot of the performance data]

ArthurZucker and others added 23 commits October 5, 2023 09:38
…es) (huggingface#25830)

* Faster rotary embedding for GPTNeoX

* there might be unnecessary moves from device

* fixup

* fix dtype issue

* add copied from statements

* fix copies

* oupsy

* add copied from Llama for scaled ones as well

* fixup

* fix

* fix copies
…s on `use_cache` (huggingface#26328)

* Set `presents=None` when `use_cache` is set to False for activation ckpt

* Update modeling_falcon.py

* fix black
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>
* Make `ModelOutput` serializable

Original PR from diffusers : huggingface/diffusers#5234

* Black
* fix silent bug `keep_in_fp32` modules

* final fix

* added a common test.

* Trigger CI

* revert
* feat: close huggingface#26566, changed model & config files to accept arbitrary in and out channels

* updated docstrings

* fix: linter error

* fix: update Copy docstrings

* fix: linter update

* fix: rename num_channels_in to num_channels to prevent breaking changes

* fix: make num_channels_out None per default

* Update src/transformers/models/swin2sr/configuration_swin2sr.py

Co-authored-by: Arthur <[email protected]>

* fix: update tests to include num_channels_out

* fix:linter

* fix: remove normalization with precomputed rgb values when #input_channels!=#output_channels

---------

Co-authored-by: marvingabler <[email protected]>
Co-authored-by: Arthur <[email protected]>
)

don't close clearml task if it was created externally
* build the table in index.md with links to the model_doc

* removed list generation on index.md

* fixed missing models

* make style
* Remove unnecessary `view` of `position_ids` in `modeling_llama`

When `position_ids` is `None`, its value is generated using
`torch.arange`, which creates a tensor of size `(seq_length +
past_key_values_length) - past_key_values_length = seq_length`. The
tensor is then unsqueezed, resulting in a tensor of shape `(1,
seq_length)`. This means that the last `view` to a tensor of shape
`(-1, seq_length)` is a no-op.

This commit removes the unnecessary view.
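A shape-only sketch of the argument (hypothetical lengths, not the actual modeling code): once `position_ids` comes out of `torch.arange(...).unsqueeze(0)` it already has shape `(1, seq_length)`, so the trailing `view(-1, seq_length)` cannot change anything.

```python
import torch

# Hypothetical values, for shape checking only.
past_key_values_length = 3
seq_length = 5

position_ids = torch.arange(
    past_key_values_length, seq_length + past_key_values_length, dtype=torch.long
)
position_ids = position_ids.unsqueeze(0)  # shape: (1, seq_length)

# The view removed by this commit is a no-op on a (1, seq_length) tensor.
assert position_ids.view(-1, seq_length).shape == position_ids.shape
```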

* Remove no-op `view` of `position_ids` in rest of transformer models
* Update tokenization_code_llama_fast.py

* Update test_tokenization_code_llama.py

* Update test_tokenization_code_llama.py
…huggingface#26162)

* remove unnecessary unsqueeze-squeeze in llama

* correct other models

* fix

* revert gpt_neox_japanese

* fix copies

* fix test
…26586)

* fix

* fix

* Fix

* Fix

---------

Co-authored-by: ydshieh <[email protected]>
* remove SharedDDP as it was deprecated

* apply review suggestion

* make style

* Oops, forgot to remove the compute_loss context manager in Seq2SeqTrainer.

* remove the unnecessary conditional statement

* keep the logic of IPEX

* clean code

* mixed precision setup & make fixup

---------

Co-authored-by: statelesshz <[email protected]>
…ggingface#26606)

* make sure eos and bos are properly handled for fast tokenizer

* fix code llama as well

* nits

* fix the conversion script as well

* fix failing test
Cemberk pushed a commit that referenced this pull request Jul 17, 2025
* Gemma 3n

* initial commit of Gemma 3n scaffold

* Fixing param pass through on Gemma3p5RMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)

* Adding gemma3p5 text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <[email protected]>

* Removing altup configs to accept the suggested configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <[email protected]>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <[email protected]>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <[email protected]>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <[email protected]>
Co-authored-by: SindhuRaghuram97 <[email protected]>

* Normalizing on altup_num_inputs config option

* regenerating modeling file after syncing to HEAD

* Use torch.std(..., unbiased=False) for activation sparsity (#8)

* Refactoring to a single QVK Norm (#13)

* AltUp: support scale_corrected_output (#14)

* Converts einsums to nn.Linear (#7)

* Converts einsums to nn.Linear

* Removing unused variables

* Aligning SharedKVCache with HybridCache (#11)

* Aligning SharedKVStore with HybridCache

* Remove KVStore. Refactor apply_rotary_pos_emb for sharing

* Addressing review comments

* Supporting split modality embeddings in Gemma3n (#10)

* Adding the Embedder class

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation

* Apply suggestions from code review

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Addressing review comments, prop drilling audio and vision configs to the text config

* Removing TODO's that have been addressed

* Simplify Embedder init and add audio embeddings

* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder

* Refactoring vision and audio embeddings into ConditionalGeneration model

---------

Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>

* Updating attention mask for Gemma 3.5 (#15)

* xxx_token_index to xxx_token_id

* removing deprecated last_cache_position

* Removing references to SigLIP

* Always init per-layer inputs

* Using torch.finfo().min for epsilon_tensor

* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas

* fix modular GEMMA3N_INPUTS_DOCSTRING

* Gemma3nAttention inherits from Gemma3Attention

* Modular inheritance fixes

* CausalLM conversion script for 4B model (#16)

* Add Gemma3n Audio Encoder (#6)

* initial commit of Gemma 3.5 scaffold

* Fixing param pass through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)

* Adding gemma3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <[email protected]>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3.5 (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <[email protected]>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <[email protected]>
Co-authored-by: SindhuRaghuram97 <[email protected]>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* CausalLM conversion script for 4B model

* inv_timescales to non-persistent buffer

* Addressing audio encoder Attention feedback

* Addressing Gemma3nAudioSSCPConvBlock feedback

* Addressing Gemma3nAudioConformerAttention feedback

* Addressing padding feedback

* Weights conversion loads audio state dict

* Always use vision_config so saving works

* Token id updates for configs

* Stubs for interleaving audio embs

* Addressing reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <[email protected]>
Co-authored-by: Sindhu Raghuram <[email protected]>

* Fixing cache access error

* Removing duplicate code from a bad merge

* Gemma 3n Text + Vision Part 1 (#17)

* testing utilities for numerics comparisons

* Corrected einsum to nn.Linear weights conversion

* Inherit scaled word embs from Gemma3 not Bart

* Fixing transposes for collapsed linears

* More transpose fixes

* numpy api fix

* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True

* Force AltUp  to float32

* Updating debugging script for AudioEncoder debugging

* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs

* Correcting attention einsum conversions

* RMSNorm in type of x

* Fixing duplicate laurel norm/gating

* KV sharing using the right previous indices

* Refactor kv shared index computation. Correct frac_shared_layers

* Use num_shared_layers instead of inferring from a fraction

* fixing a bug for logging

* Fix shared data_ptrs in altup inits

* rope: adjust proj -> norm -> rope to preserve computation (#20)

* rope: adjust proj -> norm -> rope to preserve computation

* Removing some breaking language model fluff in ConditionalGeneration

* Consolidate query_states transforms

---------

Co-authored-by: Douglas Reid <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>

* Vectorize the loops in AltUp (#19)

* Vectorize the loops in AltUp

* fix typo

* Expanding to support batched inputs

* remove extra debug script

* Fix AltUp.forward

---------

Co-authored-by: Ryan Mullins <[email protected]>

* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel

* Convert norm to 1/sqrt (#21)

* Convert norm to 1/sqrt

* Scale shift change per Phil's rec

* Adding default activation sparsity

* Fixing 2B config in weights conversion script

* Fixing RMSNorm parameters - adding scale_shift and with_scale

* Correcting query pre-attention scaling

* Adding query_rescale_scalar to text config

* Adding layer_idx to MLP

* Permafix for input_layernorm

* Use 1/sqrt instead of rsqrt in DecoderLayer

* Fix o_proj conversion

* Conversion script update for vision encoder

* Removing logging for debugging timm model

* Fixing bugs in Gemma3nForConditionalGeneration for text generation

* Generating the modeling_gemma3n.py file

* Removing the addition of an erroneous line in the modeling file

* Adding gemma3n text model to modeling_auto

* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds

* Updating the modeling file with the latest bugfix changes

* Updating models/auto for Gemma 3n

* using AutoTokenizer in forward test

* Adding processing_gemma3n.py

* Gemma 3n configured for AutoModel. Conversion script updated.

* Removing errant merge artifacts

---------

Co-authored-by: Mayank Chaturvedi <[email protected]>
Co-authored-by: Douglas Reid <[email protected]>
Co-authored-by: Douglas Reid <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
Co-authored-by: Sindhu Raghuram <[email protected]>

* Removing errant debugging statements from Gemma 3

* Gemma3n audio model (#18)

* testing utilities for numerics comparisons

* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock

* Add audio version of forward script based on RyanMullins' implementation

* Updating to match encoder tests. WIP: config question needs resolving

* Updates to audio classes to enable end-to-end running

* Removing vestigial classes, cleaning up print statements

* Adding SiLU / Swish to audio conformer feed forward block

* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio

* Adding outputs to audio test

* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model

* Update forward test to load from local weights

* Update conversion to process / output audio layers

* Update __all__ to export audio encoder

* AutoModel registration for Gemma 3n Audio

* Use AutoModel for ConditionalGeneration.audio_tower

* Fixing input_proj_linear transpose

* Fixing Gemma3NanoAudioConformerAttention.post conversion

* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion

* Correcting indentation issue on Gemma3p5RMSNorm

---------

Co-authored-by: Ryan Mullins <[email protected]>

* Text + Vision Part 2 (#23)

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3p5.py

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: SindhuRaghuram97 <[email protected]>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Updating configs for the 2B variant in the conversion script

* Using final generation config in conversion script

---------

Co-authored-by: Sindhu Raghuram <[email protected]>
Co-authored-by: SindhuRaghuram97 <[email protected]>

* Audio Integration (#12)

* initial commit of Gemma 3n scaffold

* Fixing param pass through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)

* Adding Gemma 3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <[email protected]>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update modular

Co-authored-by: Ryan Mullins <[email protected]>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <[email protected]>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3n

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <[email protected]>
Co-authored-by: SindhuRaghuram97 <[email protected]>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* Converting sl.Frontend to FeatureExtractor

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3n.py

* Update modular

Co-authored-by: SindhuRaghuram97 <[email protected]>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Draft of audio data in chat template

* Removing image processing. Using SigLIP instead.

* Audio input going end-to-end

* Fixing dtype issues in audio encoder

* x-lib formatting consistency

* Adding example data

* Save preprocessor_config.json from conversion script

* Instrumentation for debugging

* Additional instrumentation for preprocessing debugging

* Updates to preprocessor, padding; produces correct end-to-end results on sample

* Tackling configuration TODOs

* Start of feature extractor refactor

* Adds Numpy version of USM extractor, removes Torch version and dependencies

* Fixing AltUp.correct coef permute

* Supporting batches of single audio segment inputs

* Docstrings updates for config

* In-lining audio feature extraction

* Adjustments to conversion script and smoke test script

---------

Co-authored-by: SindhuRaghuram97 <[email protected]>
Co-authored-by: Sindhu Raghuram <[email protected]>
Co-authored-by: pculliton <[email protected]>

* Gemma 3n renaming

* Removing test data and utilities

* Renaming test files

* Gemma 3n refactor

* Fix tokenizer config in conversion script

* Address reviewer feedback

* FeatureExtractor returns float32 by default

* Adding basic tests for audio, and input name for audio encoder

* Audio integration test, updates to model_id for other integration tests

* Use scales for q and k norms (#26)

* Update audio integration test to use HF dataset

* Reviewer feedback

* Expand embedding table to full vocab size in weights conversion

* Mix-n-match MatFormers for Gemma 3n (#25)

* Remove in-place operations (#30)

* chore: removing inplace ops

* remove [tensor] * n pattern

* chore: reviewer feedback in AudioEncoder and AltUp

* More grad clipping

* Dynamo compatibility

* fix: cache slicing error

* chore: simplify shared kv cache slicing

* chore: vision encoder rename in timm

* fix: image processor do_normalize=False

* fixup: style

* chore: model_doc

* fix: docs for code quality

* chore: repo consistency

* fix: RMSNorm in float as in prior Gemmas

* fix: per_layer_inputs = None

* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint

* chore: repo consistency

* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)

* Add initial unit tests for Gemma3nAudioFeatureExtractor

* Add basic unit tests for Gemma3nProcessor (#28)

Co-authored-by: Douglas Reid <[email protected]>

* parameterize tests

---------

Co-authored-by: Douglas Reid <[email protected]>

* chore: code style

* fix: test cases

* style and consistency

* fix config in the test to be coherent with layer cache sharing

* fix hidden states in tests and code

* inits and mappings

* fix modality prefixes

* test order and prefixes

* fix test exception

* fix class order and reduce model size for faster tests

* restore _checkpoint_conversion_mapping to load Causal from Conditional

* fix config mapping!

* fix: reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <[email protected]>
Co-authored-by: Sindhu Raghuram <[email protected]>
Co-authored-by: raushan <[email protected]>
Co-authored-by: Mayank Chaturvedi <[email protected]>
Co-authored-by: Douglas Reid <[email protected]>
Co-authored-by: Douglas Reid <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
Co-authored-by: pculliton <[email protected]>
Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>

* fix import test

* add model args

* auto_docstring

* replace test path

* consistency

* skip tests for now

* fix docstring for doc builder

* skip unused attr

---------

Co-authored-by: SindhuRaghuram97 <[email protected]>
Co-authored-by: Sindhu Raghuram <[email protected]>
Co-authored-by: raushan <[email protected]>
Co-authored-by: Mayank Chaturvedi <[email protected]>
Co-authored-by: Douglas Reid <[email protected]>
Co-authored-by: Douglas Reid <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
Co-authored-by: pculliton <[email protected]>
Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Arthur <[email protected]>