Cemberk commented Sep 19, 2024

This PR was created automatically by the Fork Maintenance System to sync changes from the downstream main into downstream develop.

muellerzr and others added 30 commits August 16, 2024 12:35
…enamed, and provide a step forward (huggingface#32656)

* Fin

* Modify msg

* Finish up nits
…uggingface#32674)

* Fix beam_constraints.Constraint.advance() docstring

* Update src/transformers/generation/beam_constraints.py

Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
* tfmsenv restored in main

* installed flax

* forward pass done and all tests passed

* make fix-copies and cleaning the scripts

* fixup attempt 1

* fixup attempt 2

* fixup third attempt

* fixup attempt 4

* fixup attempt 5

* dinov2 doc fixed

* FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE

* external pos_encoding layer removed

* fixup attempt 6

* fixed integration test values

* fixup attempt 7

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: amyeroberts <[email protected]>

* comments removed

* comment removed from the test

* fixup

* Update src/transformers/models/dinov2/modeling_flax_dinov2.py

Co-authored-by: Sanchit Gandhi <[email protected]>

* new fixes 1

* interpolate_pos_encoding function removed

* droppath rng fixed, pretrained beit copied-from still not working

* modeling_flax_dinov2.py reformatted

* Update tests/models/dinov2/test_modeling_flax_dinov2.py

Co-authored-by: Sanchit Gandhi <[email protected]>

* added Copied from, to the tests

* copied from statements removed from tests

* fixed copied from statements in the tests

* [run_slow] dinov2

---------

Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Sanchit Gandhi <[email protected]>
* dac model

* original dac works

* add dac model

* dac can be instantiated

* add forward pass

* load weights

* all weights are used

* convert checkpoint script ready

* test

* add feature extractor

* up

* make style

* apply cookiecutter

* fix tests

* iterate on FeatureExtractor

* nit

* update dac doc

* replace nn.Sequential with nn.ModuleList

* nit

* apply review suggestions 1/2

* Update src/transformers/models/dac/modeling_dac.py

Co-authored-by: Sanchit Gandhi <[email protected]>

* up

* apply review suggestions 2/2

* update padding in FeatureExtractor

* apply review suggestions

* iterate on design and tests

* add integration tests

* feature extractor tests

* make style

* all tests pass

* make style

* fixup

* apply review suggestions

* fix-copies

* apply review suggestions

* apply review suggestions

* Update docs/source/en/model_doc/dac.md

Co-authored-by: Yoach Lacombe <[email protected]>

* Update docs/source/en/model_doc/dac.md

Co-authored-by: Yoach Lacombe <[email protected]>

* anticipate transfer weights to descript

* up

* make style

* apply review suggestions

* update slow test values

* update slow tests

* update test values

* update with CI values

* update with vorace values

* update test with slice

* make style

---------

Co-authored-by: Sanchit Gandhi <[email protected]>
Co-authored-by: Yoach Lacombe <[email protected]>
* Add representation for Conv1D, for better output info.

* code format for Conv1D

* We add a __repr__ function to Conv1D so that printing a model (or its submodules) gives a more descriptive summary for Conv1D layers; a minimal sketch follows below.
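
For context, a minimal sketch of what such a `__repr__` could look like on the GPT-2-style Conv1D layer (the `nf`/`nx` attribute names follow the transformers layer; the exact string is illustrative, not the merged implementation):

```python
import torch
import torch.nn as nn


class Conv1D(nn.Module):
    """GPT-2 style Conv1D: a linear layer with transposed weights."""

    def __init__(self, nf: int, nx: int):
        super().__init__()
        self.nf = nf
        self.nx = nx
        self.weight = nn.Parameter(torch.empty(nx, nf))
        self.bias = nn.Parameter(torch.zeros(nf))
        nn.init.normal_(self.weight, std=0.02)

    def __repr__(self) -> str:
        # Without this, printing a model shows only "Conv1D()" with no shape info.
        return f"Conv1D(nf={self.nf}, nx={self.nx})"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        size_out = x.size()[:-1] + (self.nf,)
        x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
        return x.view(size_out)


print(Conv1D(nf=2304, nx=768))  # -> Conv1D(nf=2304, nx=768)
```
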
* Support save/load ckpt for XLA FSDP

* Fix bug for save

* Fix style

* reserve sharded ckpt and better file naming

* minor fix

Co-authored-by: Zach Mueller <[email protected]>

* add is_fsdp_xla_v1_enabled

---------

Co-authored-by: Zach Mueller <[email protected]>
* fix: Parameterized norm freezing

For the R18 model, the authors don't freeze norms in the backbone.

* Update src/transformers/models/rt_detr/configuration_rt_detr.py

Co-authored-by: Pavel Iakubovskii <[email protected]>

---------

Co-authored-by: Pavel Iakubovskii <[email protected]>
* fix gguf config vocab size

* minor fix

* link issue
* fix mamba left padding

* Apply suggestions from code review

Co-authored-by: Pablo Montalvo <[email protected]>

* fix copies

* test with `inputs_embeds`

* Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py

Co-authored-by: Arthur <[email protected]>

* copies

* clarify

* fix last comments

* remove

---------

Co-authored-by: Pablo Montalvo <[email protected]>
Co-authored-by: Arthur <[email protected]>
…uggingface#32694)

* fix cache when using input embeddings

* simplify check: we can always add the input ids sequence length, since it's 0 in the first pass
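
A tiny sketch of the simplified bookkeeping described above (names are illustrative): when generating from `inputs_embeds`, `input_ids` is empty on the first forward pass, so its length can always be added without a special case.

```python
def total_cached_length(past_length: int, input_ids) -> int:
    # input_ids.shape[1] is 0 on the embeddings-only first pass, so no branch is needed.
    return past_length + input_ids.shape[1]
```
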
Fixed whisper-large-v2 model link in docs.
* support head dim

* fix the doc

* fixup

* add oproj

Co-authored-by: Suhara <[email protected]>

* update

Co-authored-by: bzantium <[email protected]>

* Co-authored-by: suhara <[email protected]>

* Update

Co-authored-by: Yoshi Suhara <[email protected]>

---------

Co-authored-by: bzantium <[email protected]>
Co-authored-by: Yoshi Suhara <[email protected]>
* Update min version of accelerate to 0.26.0

* dev-ci

* update min version in import

* remove useless check

* dev-ci

* style

* dev-ci

* dev-ci
* mamba2 uses norm_before_gate=False

* small nit

* remove norm_before_gate flag and follow False path only
…nsformer (huggingface#32903)

Bump nltk in /examples/research_projects/decision_transformer

Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](nltk/nltk@3.7...3.9)

---
updated-dependencies:
- dependency-name: nltk
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…xport (huggingface#32887)

* Replace .norm() with a decomposed version for executorch export (see the sketch below)

* [run_slow] clip
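
The rewrite presumably looks something like the following sketch: spell out the L2 norm with primitive ops (multiply, sum, sqrt) that the executorch export path handles, instead of calling `.norm()` directly. The helper name is an illustrative assumption.

```python
import torch


def l2_norm_decomposed(x: torch.Tensor, dim: int = -1, keepdim: bool = True) -> torch.Tensor:
    # Equivalent to x.norm(p=2, dim=dim, keepdim=keepdim), written with basic ops.
    return torch.sqrt(torch.sum(x * x, dim=dim, keepdim=keepdim))


x = torch.randn(4, 16)
assert torch.allclose(x.norm(p=2, dim=-1, keepdim=True), l2_norm_decomposed(x), atol=1e-6)
```
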
* link for optimizer names

Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring.

* make fixup
* Update README.md

* Update README.md

* Add README_ar.md to i18n/README_de.md

* Add README_ar.md to i18n/README_es.md

* Add README_ar.md to i18n/README_fr.md

* Add README_ar.md to i18n/README_hd.md

* Add README_ar.md to i18n/README_ja.md

* Add README_ar.md to i18n/README_ko.md

* Add README_ar.md to i18n/README_pt-br.md

* Add README_ar.md to i18n/README_ru.md

* Add README_ar.md to i18n/README_te.md

* Add README_ar.md to i18n/README_vi.md

* Add README_ar.md to i18n/README_vi.md

* Add README_ar.md to i18n/README_zh-hans.md

* Add README_ar.md to i18n/README_zh-hant.md

* Create README_ar.md
… when `return_timestamps` is not passed to `generate` function (huggingface#31296)

[whisper] don't overwrite return_timestamps when not passed to generate
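
A hedged sketch of the pattern this fix describes (not the actual Whisper source): only override the generation config when the caller passed the flag explicitly, so a config default is not silently reset.

```python
def resolve_return_timestamps(passed_value, generation_config):
    # None means "not passed": keep whatever the generation config already says.
    if passed_value is not None:
        generation_config.return_timestamps = passed_value
    return generation_config.return_timestamps
```
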
* try test updates

* a few more changes

* a few more changes

* a few more changes

* [run slow] jamba

* skip logits checks on older gpus

* [run slow] jamba

* oops

* [run slow] jamba

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <[email protected]>

---------

Co-authored-by: amyeroberts <[email protected]>
…ngface#32891)

Added missing huggingface_hub installation to workflows.
maxwbuckley and others added 28 commits September 17, 2024 13:32
Update chameleon.md

Fix error

RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same
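
A minimal sketch of the fix the updated docs point to, assuming a model loaded in bfloat16: cast the (float32) pixel values to the model's dtype and device before the forward pass so inputs and conv biases match.

```python
import torch


def match_model_dtype(pixel_values: torch.Tensor, model: torch.nn.Module) -> torch.Tensor:
    # Processors typically return float32 pixel values; feeding them to a bf16 model
    # raises "Input type (float) and bias type (c10::BFloat16) should be the same".
    param = next(model.parameters())
    return pixel_values.to(device=param.device, dtype=param.dtype)
```
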
* Add explicit example for RAG chat templating

* Add Tip box and reformulate

Co-authored-by: Matt <[email protected]>

---------

Co-authored-by: Matt <[email protected]>
* move runners

* move runners

* move runners
…ce#33316)

* fix to jamba config, asserting attention and expert offset

* fix formatting

* fix formatting

* fix formatting

* changed the assertion to an explicit error raise and added unit tests (see the sketch below)

* fix

* changed t_ to property_

* changed t_ to property_

* quickfix

* ran code styler
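
A sketch of the assert-to-raise pattern mentioned above (the attribute names are illustrative, not the exact Jamba config fields): config validation raises a ValueError with a useful message instead of relying on `assert`, which is stripped under `python -O`.

```python
class ToyMoEConfig:
    def __init__(self, attn_layer_offset: int = 4, attn_layer_period: int = 8):
        if not 0 <= attn_layer_offset < attn_layer_period:
            # Explicit error instead of `assert`, so misconfigurations fail loudly
            # even when assertions are disabled.
            raise ValueError(
                f"attn_layer_offset ({attn_layer_offset}) must be in [0, {attn_layer_period})."
            )
        self.attn_layer_offset = attn_layer_offset
        self.attn_layer_period = attn_layer_period
```
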
…gingface#32970)

* added sequences_scores to the output

* added beam_indices to output

* added test to check for beam_indices, sequences_scores and their shape

* removed redundant whitespaces

* make fixup
* add uniformized pixtral and kwargs

* update doc

* fix _validate_images_text_input_order

* nit
* add revision to trainer push_to_hub (see the sketch below)

* apply suggestions

* add test for revision

* apply ruff format

* reorganize imports

* change test trainer path
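
Assuming the new argument is ultimately forwarded to the Hub upload, the underlying idea is roughly this sketch (helper name and branch handling are illustrative, built on the public huggingface_hub API):

```python
from huggingface_hub import HfApi


def push_folder_to_hub(repo_id: str, folder: str, revision: str | None = None, token: str | None = None):
    """Upload a folder to a specific branch (e.g. "v2-weights") instead of the default branch."""
    api = HfApi(token=token)
    if revision is not None:
        # Create the branch if it does not exist yet; exist_ok makes reruns safe.
        api.create_branch(repo_id=repo_id, branch=revision, exist_ok=True)
    return api.upload_folder(repo_id=repo_id, folder_path=folder, revision=revision)
```
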
huggingface#33499)

* fix incorrect patch_attention_mask setting, which leads to different generated text when batch > 1

Signed-off-by: Wang, Yi <[email protected]>

* fix format

Signed-off-by: Wang, Yi <[email protected]>

* [run_slow] idefics2

---------

Signed-off-by: Wang, Yi <[email protected]>
* add llava-ov-chat

* uncomment
…ggingface#32564)

* _decode signature change and quick return

* added bunch of decoding tests

* signature match and return

* added tests for decoding

* merged decoding test

* more tests for special tokens

* cosmetics

* fixed param

* ruffed the file

* refinement for single special tokens

* added test for single special tokens

* slight change to test name

Co-authored-by: Ita Zaporozhets <[email protected]>

* minor change test name for skip tokens

Co-authored-by: Ita Zaporozhets <[email protected]>

* killed already defined var

Co-authored-by: Ita Zaporozhets <[email protected]>

* minor update with vars

Co-authored-by: Ita Zaporozhets <[email protected]>

* killed already defined var once more

Co-authored-by: Ita Zaporozhets <[email protected]>

---------

Co-authored-by: Ita Zaporozhets <[email protected]>
)

* fix

* add tests

* fix tests

* Update tests/models/llava/test_processor_llava.py

Co-authored-by: amyeroberts <[email protected]>

* fix

* fix tests

* update tests

---------

Co-authored-by: amyeroberts <[email protected]>
* Urdu docs added

* fixed the misaligned issue.
* fix the wandb logging issue

* handle ConfigError in WandbCallback; move import to local scope (see the sketch below)

* update integration_utils.py; move import of ConfigError

* Update integration_utils.py: remove trailing whitespace
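
A hedged sketch of the pattern described above (the ConfigError import path is an assumption and may vary across wandb versions): import lazily inside the callback and swallow config conflicts instead of crashing the run.

```python
import logging

logger = logging.getLogger(__name__)


def safe_wandb_config_update(run, combined_dict: dict) -> None:
    # Lazy import keeps wandb optional at module import time; the exact location
    # of ConfigError is assumed here and may differ between wandb releases.
    try:
        from wandb.sdk.lib.config_util import ConfigError
    except ImportError:
        ConfigError = Exception

    try:
        run.config.update(combined_dict, allow_val_change=True)
    except ConfigError as exc:
        logger.warning("W&B config was not fully updated: %s", exc)
```
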
…ingface#33554)

* Added support for bfloat16 to zero-shot classification pipeline (see the sketch below)

* Ensure support for TF.

Co-authored-by: Matt <[email protected]>

* Remove dependency on `torch`.

Co-authored-by: Matt <[email protected]>

---------

Co-authored-by: Matt <[email protected]>
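
A small sketch of the kind of change this implies (framework-agnostic, so it keeps no hard `torch` dependency): upcast half/bfloat16 logits before converting to NumPy, since NumPy has no bfloat16 dtype. The helper name is an assumption.

```python
import numpy as np


def logits_to_float32(logits) -> np.ndarray:
    # bfloat16 cannot be represented in NumPy, so upcast first.
    # `.float()` covers torch-style tensors; anything else goes through np.asarray.
    if hasattr(logits, "float"):
        logits = logits.float()
    return np.asarray(logits, dtype=np.float32)
```
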
* enforce original size to be a list

* formatting

* apply datatype change to unpad_image in llava_next
* modify rt detr to improve inference times when compiled

* Remove redundant "to"

* Fix conditional lru_cache and missing shapes_list (see the sketch below)

* nit unnecessary list creation

* Fix compile error when ninja not available and custom kernel activated
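
A sketch of one way to implement the "conditional lru_cache" mentioned above (the compile-detection call and cache size are assumptions, not the exact RT-DETR code): cache shape-dependent tensors across eager calls, but bypass the cache while torch.compile is tracing so cached tensors are not baked into the graph.

```python
from functools import lru_cache, wraps

import torch


def conditional_lru_cache(maxsize: int = 32):
    def decorator(fn):
        cached_fn = lru_cache(maxsize=maxsize)(fn)

        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Under torch.compile, skip the Python-level cache entirely.
            if torch.compiler.is_compiling():
                return fn(*args, **kwargs)
            return cached_fn(*args, **kwargs)

        return wrapper

    return decorator


@conditional_lru_cache()
def generate_anchor_grid(height: int, width: int) -> torch.Tensor:
    # Expensive, shape-dependent tensor worth caching across eager forward passes.
    ys, xs = torch.meshgrid(torch.arange(height), torch.arange(width), indexing="ij")
    return torch.stack([xs, ys], dim=-1).float()
```
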
* clean mimi commit

* some nits suggestions from Arthur

* make fixup

* rename repo id + change readme

* Update docs/source/en/model_doc/mimi.md

Co-authored-by: amyeroberts <[email protected]>

* add flaky flag to batching equivalence due to audio_codes failing sometimes

---------

Co-authored-by: amyeroberts <[email protected]>
* load and save from video-processor folder

* Update src/transformers/models/llava_onevision/processing_llava_onevision.py

Co-authored-by: amyeroberts <[email protected]>

---------

Co-authored-by: amyeroberts <[email protected]>
* add tests

* fix whisper

* update

* nit

* add qwen2-vl

* more updates!

* better this way

* fix this one

* fix more tests

* fix final tests, hope so

* fix led

* Update tests/generation/test_utils.py

Co-authored-by: Joao Gante <[email protected]>

* pr comments

* do not pass pixel values and extras for low-mem tests; very flaky because of the vision tower

---------

Co-authored-by: Joao Gante <[email protected]>
Cemberk closed this Sep 19, 2024
Cemberk pushed a commit that referenced this pull request Aug 20, 2025
* fix

* nice

* where i am at

* Bro this works

* Update src/transformers/integrations/tensor_parallel.py

* cleanups

* yups that was breaking

* Update src/transformers/models/openai_moe/modeling_openai_moe.py

* gather on experts and not mlp

* add changes for latest convert branch

* adds options to get output_router_logits from config

* bring chat template + special tokens back into the script.

* initial commit

* update

* working with shards

* add model.safetensors.index.json

* fix

* fix

* mxfp4 flag

* rm print

* Fix PAD/EOS/BOS (#18)

* fix pad/eos/bos

* base model maybe one day

* add some doc

* special tokens based on harmony.

* add in tokenizer config as well.

* prepare for rebase with main

* Fix for initialize_tensor_parallelism  now returning 4-tuple

```
[rank0]:   File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]:     model = AutoModelForCausalLM.from_pretrained(
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]:     tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```
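
The fix is presumably a one-line change at the call site shown in the traceback: unpack the fourth return value as well. A self-contained illustration of the pattern (the stub and the name of the extra value are assumptions for the sketch):

```python
# The helper started returning four values, so the caller must unpack four,
# otherwise Python raises "ValueError: too many values to unpack (expected 3)".
def initialize_tensor_parallelism_stub(tp_plan, tp_size=None):
    return tp_plan, {"": 0}, None, tp_size or 1  # tp_plan, device_map, device_mesh, tp_size


tp_plan, device_map, device_mesh, tp_size = initialize_tensor_parallelism_stub("auto", tp_size=None)
```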

* mxfp4

* mxfp4 draft

* fix

* fix import

* draft

* draft impl

* finally working !

* simplify

* add import

* working version

* consider blocks and scales

* device mesh fix

* initial commit

* add working dequant + quant logic

* update

* non nan, gibberish output

* working EP + quantization finally !

* start cleaning

* remove reversing process

* style

* some cleaning

* initial commit

* more cleaning

* more cleaning

* simplify

* more cleaning

* rm duplicated function

* changing tp_plan

* update tp plan check

* add loading attribute

* dequantizing logic

* use subfunctions

* import cleaning

* update_param_name

* adds clamped swiglu

* add clamping to training path
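
A sketch of a clamped SwiGLU activation in the spirit of these two commits (the clamp limit and the sigmoid scaling factor are illustrative assumptions, not the exact kernel): bounding the gate/up projections keeps low-precision training numerically stable.

```python
import torch


def clamped_swiglu(gate: torch.Tensor, up: torch.Tensor, limit: float = 7.0, alpha: float = 1.702) -> torch.Tensor:
    # Clamp before the gated product so activation outliers cannot explode in bf16/mxfp4.
    gate = gate.clamp(max=limit)
    up = up.clamp(min=-limit, max=limit)
    glu = gate * torch.sigmoid(alpha * gate)
    return (up + 1) * glu
```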

* simplify dequant logic

* update

* Bad merge

* more simplifications & tests

* fix !

* fix registering custom attention

* fix order

* fixes

* some test nits

* nits

* nit

* fix

* Clamp sink logits

* Clean

* Soft-max trick

* Clean up

* p

* fix deepspeed

* update both modeling and modular for cleanup

* contiguous

* update tests

* fix top_k router call

* revert renaming

* test nits

* small fixes for EP

* fix path for our local tests

* update as I should not have broken that!

* fix the loss of mixtral

* revert part of the changes related to router_scores, kernel probably not ready for that!

* deleting a small nit

* update arch

* fix post processing

* update

* running version but not expected output

* moving to cuda

* initial commit

* revert

* erroring when loading on cpu

* updates

* del blocks, scales

* fix

* style

* rm comm

* comment

* add comment

* style

* remove duplicated lines

* Fix minor issue with weight_map conversion script

* fix sampling params

* rename to final name

* update pre-final version of template

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* fix batched inference

* serve fixes

* swizzle !

* update final chat template by Matt.

* fix responses; pin oai

* simplify

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <[email protected]>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Matt <[email protected]>

* fix

* Use ROCm kernels from HUB

* Make kernel modes explicit

* update final chat template by Matt. x2

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <[email protected]>

* Fix installation

* Update setup.py

Co-authored-by: Ákos Hadnagy <[email protected]>

* allow no content

* fix: update message handling in write_tokenizer function

* Fix template logic for user message role

* last nits for CB and flash_paged!

* there was one bad merge

* fix CB (hardcode for now, it's just using kv groups instead)

* fix

* better fix for device_map

* minor device fix

* Fix flash paged

* updates

* Revert "remove dtensors, not explicit (huggingface#39840)"

This reverts commit 6dfd561.

* update

* Revert "remove dtensors, not explicit (huggingface#39840)"

This reverts commit 6dfd561.

* fix merge

* fix

* Fix line break with custom model identity

* nits testing

* to locals first and pass sliding window to flash paged

* register modes for MegaBlocksMoeMlp

* add integration test in fixtures -> now update the tests to use it!

* update integration tests

* initial fix

* style and update tests

* fix

* chore(gpt oss): remove mlp_bias from configuration

It was just a leftover.

* stats

* Integration tests

* whoops

* Shouldn't move model

* Ensure assistant messages without thinking always go to "final" channel

* More checks to ensure expected format

* Add pad_token_id to model configuration in write_model function (#51)

* Add oai fix fast tests (#59)

* Fix some fast tests

* Force some updates

* Remove unnecessary fixes

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <[email protected]>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <[email protected]>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* reasoning -> Reasoning

* Add additional integration tests

* fixup

* Slight fixes

* align chat template with harmony

* simplify

* Add comment

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* Revert fixup

* skip 2 test remove todo

* merge

* padding side should be left for integration tests

* fix modular wrt to changes made to modeling

* style

* isort

* fix copies for the loss

* mmmm

---------

Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: edbeeching <[email protected]>
Co-authored-by: Vaibhavs10 <[email protected]>
Co-authored-by: MekkCyber <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Edward Beeching <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Lewis Tunstall <[email protected]>
Co-authored-by: Zhuohan Li <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Rocketknight1 <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Akos Hadnagy <[email protected]>
Co-authored-by: Ákos Hadnagy <[email protected]>
Co-authored-by: Alvaro Moran <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: Matt <[email protected]>