forked from huggingface/transformers
Automated PR: Downstream develop rebase new changes #59
Status: Closed

Commits synced by this PR:
…enamed, and provide a step forward (huggingface#32656) * Fin * Modify msg * Finish up nits
…uggingface#32674) * Fix beam_constraints.Constraint.advance() docstring * Update src/transformers/generation/beam_constraints.py Co-authored-by: Steven Liu <[email protected]> --------- Co-authored-by: Joao Gante <[email protected]> Co-authored-by: Steven Liu <[email protected]>
* tfmsenv restored in main
* installed flax
* forward pass done and all tests passed
* make fix-copies and cleaning the scripts
* fixup attempts 1–5
* dinov2 doc fixed
* FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE
* external pos_encoding layer removed
* fixup attempt 6
* fixed integration test values
* fixup attempt 7
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py (review suggestions, ×16) Co-authored-by: amyeroberts <[email protected]>
* comments removed
* comment removed from the test
* fixup
* Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <[email protected]>
* new fixes 1
* interpolate_pos_encoding function removed
* droppath rng fixed, pretrained beit copied-from still not working
* modeling_flax_dinov2.py reformatted
* Update tests/models/dinov2/test_modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <[email protected]>
* added Copied from, to the tests
* copied from statements removed from tests
* fixed copied from statements in the tests
* [run_slow] dinov2
---------
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Sanchit Gandhi <[email protected]>
* dac model
* original dac works
* add dac model
* dac can be instantiated
* add forward pass
* load weights
* all weights are used
* convert checkpoint script ready
* test
* add feature extractor
* up
* make style
* apply cookiecutter
* fix tests
* iterate on FeatureExtractor
* nit
* update dac doc
* replace nn.Sequential with nn.ModuleList
* nit
* apply review suggestions 1/2
* Update src/transformers/models/dac/modeling_dac.py Co-authored-by: Sanchit Gandhi <[email protected]>
* up
* apply review suggestions 2/2
* update padding in FeatureExtractor
* apply review suggestions
* iterate on design and tests
* add integration tests
* feature extractor tests
* make style
* all tests pass
* make style
* fixup
* apply review suggestions
* fix-copies
* apply review suggestions (×2)
* Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <[email protected]> (×2)
* anticipate transfer weights to descript
* up
* make style
* apply review suggestions
* update slow test values
* update slow tests
* update test values
* update with CI values
* update with vorace values
* update test with slice
* make style
---------
Co-authored-by: Sanchit Gandhi <[email protected]>
Co-authored-by: Yoach Lacombe <[email protected]>
* Add representation for Conv1D, for better output info. * code format for Conv1D * We add a __repr__ func for Conv1D so that printing a model gives a more informative description of its Conv1D layers.
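For context, a minimal sketch of what such a `__repr__` looks like on a GPT-2-style Conv1D module; the `nf`/`nx` attribute names follow transformers' Conv1D, but the exact string in the merged change may differ:

```python
import torch
import torch.nn as nn

class Conv1D(nn.Module):
    """GPT-2-style Conv1D: a linear layer with transposed weights."""

    def __init__(self, nf: int, nx: int):
        super().__init__()
        self.nf, self.nx = nf, nx
        self.weight = nn.Parameter(torch.empty(nx, nf))
        self.bias = nn.Parameter(torch.zeros(nf))

    def __repr__(self) -> str:
        # Without this, printing a model shows a bare "Conv1D()" with no shape info.
        return f"Conv1D(nf={self.nf}, nx={self.nx})"

print(Conv1D(nf=2304, nx=768))  # -> Conv1D(nf=2304, nx=768)
```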
* Support save/load ckpt for XLA FSDP * Fix bug for save * Fix style * reserve sharded ckpt and better file naming * minor fix Co-authored-by: Zach Mueller <[email protected]> * add is_fsdp_xla_v1_enabled --------- Co-authored-by: Zach Mueller <[email protected]>
* fix: Parameterized norm freezing For the R18 model, the authors don't freeze norms in the backbone. * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: Pavel Iakubovskii <[email protected]> --------- Co-authored-by: Pavel Iakubovskii <[email protected]>
* fix gguf config vocab size * minor fix * link issue
* fix mamba left padding * Apply suggestions from code review Co-authored-by: Pablo Montalvo <[email protected]> * fix copies * test with `inputs_embeds` * Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py Co-authored-by: Arthur <[email protected]> * copies * clarify * fix last comments * remove --------- Co-authored-by: Pablo Montalvo <[email protected]> Co-authored-by: Arthur <[email protected]>
…uggingface#32694) * fix cache when using input embeddings * simplify check: we can always add the input_ids sequence length, since it's 0 on the first pass
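The reasoning in that second commit, shown as an illustrative sketch (the names here are ours, not the merged code): on the first pass the prompt lives in `inputs_embeds` and `input_ids` is empty, so adding its length unconditionally is safe.

```python
import torch

def total_seq_length(input_ids: torch.Tensor, past_length: int) -> int:
    # On the first pass input_ids has shape [batch, 0] (the prompt is carried
    # by inputs_embeds), so adding its length is a no-op; on later passes it
    # holds the newly generated tokens.
    return past_length + input_ids.shape[1]

print(total_seq_length(torch.empty(1, 0, dtype=torch.long), past_length=0))  # 0
print(total_seq_length(torch.tensor([[42]]), past_length=7))                 # 8
```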
Fixed whisper-large-v2 model link in docs.
* support head dim * fix the doc * fixup * add oproj Co-authored-by: Suhara <[email protected]> * update Co-authored-by: bzantium <[email protected]> * Co-authored-by: suhara <[email protected]> * Update Co-authored-by: Yoshi Suhara <[email protected]> --------- Co-authored-by: bzantium <[email protected]> Co-authored-by: Yoshi Suhara <[email protected]>
* Update min version of accelerate to 0.26.0 * dev-ci * update min version in import * remove useless check * dev-ci * style * dev-ci * dev-ci
Co-authored-by: Gal Cohen <[email protected]>
* mamba2 uses norm_before_gate=False * small nit * remove norm_before_gate flag and follow False path only
…nsformer (huggingface#32903) Bump nltk in /examples/research_projects/decision_transformer Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9. - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](nltk/nltk@3.7...3.9) --- updated-dependencies: - dependency-name: nltk dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…xport (huggingface#32887) * Replace .norm() with decomposed version for executorch export * [run_slow] clip
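A sketch of the decomposition pattern, assuming the goal is an export-friendly L2 norm; the merged CLIP change may differ in detail:

```python
import torch

def l2_norm(x: torch.Tensor) -> torch.Tensor:
    # Same result as x.norm(p=2, dim=-1, keepdim=True), but built from primitive
    # ops that export backends such as ExecuTorch lower more reliably.
    return x.pow(2).sum(dim=-1, keepdim=True).sqrt()

x = torch.randn(2, 8)
assert torch.allclose(l2_norm(x), x.norm(p=2, dim=-1, keepdim=True))
```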
* link for optimizer names Add a note and link showing where users can find more optimizer names, since many more optimizers exist than are mentioned in the docstring. * make fixup
* Update README.md * Update README.md * Add README_ar.md to i18n/README_de.md * Add README_ar.md to i18n/README_es.md * Add README_ar.md to i18n/README_fr.md * Add README_ar.md to i18n/README_hd.md * Add README_ar.md to i18n/README_ja.md * Add README_ar.md to i18n/README_ko.md * Add README_ar.md to i18n/README_pt-br.md * Add README_ar.md to i18n/README_ru.md * Add README_ar.md to i18n/README_te.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_zh-hans.md * Add README_ar.md to i18n/README_zh-hant.md * Create README_ar.md
… when `return_timestamps` is not passed to `generate` function (huggingface#31296) [whisper] don't overwrite return_timestamps when not passed to generate
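The "don't overwrite when not passed" pattern, as an illustrative sketch rather than the merged Whisper code:

```python
from types import SimpleNamespace

def resolve_return_timestamps(generation_config, return_timestamps=None):
    # Only override when the caller explicitly passed a value; otherwise the
    # value already stored on the generation config is preserved.
    if return_timestamps is not None:
        generation_config.return_timestamps = return_timestamps
    return generation_config

cfg = SimpleNamespace(return_timestamps=True)
resolve_return_timestamps(cfg)                            # stays True
resolve_return_timestamps(cfg, return_timestamps=False)   # explicit override wins
```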
* try test updates * a few more changes * a few more changes * a few more changes * [run slow] jamba * skip logits checks on older gpus * [run slow] jamba * oops * [run slow] jamba * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <[email protected]> --------- Co-authored-by: amyeroberts <[email protected]>
…ngface#32891) Added missing huggingface_hub installation to workflows.
Update chameleon.md: fix `RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same`
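The fix boils down to casting inputs to the model's dtype. A hedged sketch of the pattern (checkpoint id and prompt format taken from Chameleon's docs; details may differ from the merged doc change):

```python
import torch
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")  # any RGB image
inputs = processor(text="What is in this image?<image>", images=image, return_tensors="pt")
# Cast the floating-point inputs (pixel_values) to bfloat16 to match the model and
# avoid "Input type (float) and bias type (c10::BFloat16) should be the same".
inputs = inputs.to(model.device, dtype=torch.bfloat16)
out = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(out[0], skip_special_tokens=True))
```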
* Add explicit example for RAG chat templating * Add Tip box and reformulate Co-authored-by: Matt <[email protected]> --------- Co-authored-by: Matt <[email protected]>
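For orientation, a sketch of what RAG chat templating looks like via the `documents` argument of `apply_chat_template` (Command-R is used here because its template renders documents natively; the example added by the commit may differ):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
messages = [{"role": "user", "content": "What's tomorrow's forecast?"}]
documents = [{"title": "Weather report", "text": "Rain is expected tomorrow."}]
prompt = tok.apply_chat_template(
    messages, documents=documents, add_generation_prompt=True, tokenize=False
)
print(prompt)  # the documents are rendered into the grounded-generation prompt
```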
* move runners * move runners * move runners
…ce#33316) * fix to jamba config, asserting attention and expert offset * fix formatting (×3) * changed to error raise instead of assertion, added unittests * fix * changed t_ to property_ (×2) * quickfix * ran code styler
…gingface#32970) * added sequences_scores to the output * added beam_indices to output * added test to check for beam_indices, sequences_scores and their shape * removed redundant whitespaces * make fixup
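Usage sketch for the exposed fields, using plain beam search on gpt2 as a small stand-in checkpoint; the commits above concern how these fields reach the returned output:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(
    **inputs,
    num_beams=4,
    num_return_sequences=2,
    max_new_tokens=8,
    return_dict_in_generate=True,
    output_scores=True,
)
print(out.sequences_scores)  # one aggregate log-prob per returned sequence
print(out.beam_indices)      # which beam produced each generated token
```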
* add uniformized pixtral and kwargs * update doc * fix _validate_images_text_input_order * nit
* add revision to trainer push_to_hub * apply suggestions * add test for revision * apply ruff format * reorganize imports * change test trainer path
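A hedged usage sketch, assuming the new keyword lands as `revision` on `Trainer.push_to_hub` (the repo id and branch name below are hypothetical):

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
args = TrainingArguments(output_dir="out", hub_model_id="your-username/demo-model")
trainer = Trainer(model=model, args=args)
# Target a specific branch instead of main ("experiment-1" is a hypothetical
# branch name; omitting revision keeps the previous behavior).
trainer.push_to_hub(revision="experiment-1")
```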
huggingface#33499) * fix incorrect patch_attention_mask setting, which leads to different generated text when batch > 1 Signed-off-by: Wang, Yi <[email protected]> * fix format Signed-off-by: Wang, Yi <[email protected]> * [run_slow] idefics2 --------- Signed-off-by: Wang, Yi <[email protected]>
* add llava-ov-chat * uncomment
* Decorator for tool building
…ggingface#32564) * _decode signature change and quick return * added bunch of decoding tests * signature match and return * added tests for decoding * merged decoding test * more tests for special tokens * cosmetics * fixed param * ruffed the file * refinement for single special tokens * added test for single special tokens * slight change to test name Co-authored-by: Ita Zaporozhets <[email protected]> * minor change test name for skip tokens Co-authored-by: Ita Zaporozhets <[email protected]> * killed already defined var Co-authored-by: Ita Zaporozhets <[email protected]> * minor update with vars Co-authored-by: Ita Zaporozhets <[email protected]> * killed already defined var once more Co-authored-by: Ita Zaporozhets <[email protected]> --------- Co-authored-by: Ita Zaporozhets <[email protected]>
) * fix * add tests * fix tests * Update tests/models/llava/test_processor_llava.py Co-authored-by: amyeroberts <[email protected]> * fix * fix tests * update tests --------- Co-authored-by: amyeroberts <[email protected]>
fix missing head_dim in llama config from gguf
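Presumably the standard LLaMA relation serves as the fallback; a one-function sketch (our naming, not the merged code):

```python
def infer_head_dim(hidden_size: int, num_attention_heads: int) -> int:
    # Fallback when GGUF metadata omits head_dim.
    return hidden_size // num_attention_heads

assert infer_head_dim(4096, 32) == 128  # e.g. Llama-2-7B dimensions
```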
* Urdu docs added * fixed the misalignment issue.
* fix the wandb logging issue * handle ConfigError in WandbCallback; move import to local scope * update integration_utils.py; move import of ConfigError * Update integration_utils.py: remove trailing whitespace
…ingface#33554) * Added support for bfloat16 to zero-shot classification pipeline * Ensure support for TF. Co-authored-by: Matt <[email protected]> * Remove dependency on `torch`. Co-authored-by: Matt <[email protected]> --------- Co-authored-by: Matt <[email protected]>
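Usage sketch (bart-large-mnli is the task's usual default checkpoint, named here explicitly):

```python
import torch
from transformers import pipeline

clf = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    torch_dtype=torch.bfloat16,  # the dtype this commit adds support for
)
print(clf("one day I will see the world", candidate_labels=["travel", "cooking"]))
```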
…33509) return attention mask in ASR pipeline
* enforce original size to be a list * formatting * apply datatype change to unpad_image in llava_next
* modify rt detr to improve inference times when compiled * Remove redundant "to" * Fix conditional lru_cache and missing shapes_list * nit unnecessary list creation * Fix compile error when ninja not available and custom kernel activated
* clean mimi commit * some nits suggestions from Arthur * make fixup * rename repo id + change readme * Update docs/source/en/model_doc/mimi.md Co-authored-by: amyeroberts <[email protected]> * add flaky flag to batching equivalence due to audio_codes failing sometimes --------- Co-authored-by: amyeroberts <[email protected]>
* load and save from video-processor folder * Update src/transformers/models/llava_onevision/processing_llava_onevision.py Co-authored-by: amyeroberts <[email protected]> --------- Co-authored-by: amyeroberts <[email protected]>
* add tests * fix whisper * update * nit * add qwen2-vl * more updates! * better this way * fix this one * fix more tests * fix final tests, hope so * fix led * Update tests/generation/test_utils.py Co-authored-by: Joao Gante <[email protected]> * pr comments * do not pass pixels and extras for low-memory tests; very flaky because of the vision tower --------- Co-authored-by: Joao Gante <[email protected]>
Cemberk pushed a commit that referenced this pull request on Aug 20, 2025:
* fix
* nice
* where i am at
* Bro this works
* Update src/transformers/integrations/tensor_parallel.py
* cleanups
* yups that was breaking
* Update src/transformers/models/openai_moe/modeling_openai_moe.py
* gather on experts and not mlp
* add changes for latest convert branch
* adds options to get output_router_logits from config
* bring chat template + special tokens back into the script
* initial commit
* update
* working with shards
* add model.safetensors.index.json
* fix (×2)
* mxfp4 flag
* rm print
* Fix PAD/EOS/BOS (#18)
* fix pad/eos/bos
* base model maybe one day
* add some doc
* special tokens based on harmony
* add in tokenizer config as well
* prepare for rebase with main
* Fix for initialize_tensor_parallelism now returning a 4-tuple:

```
[rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]:     model = AutoModelForCausalLM.from_pretrained(
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]:     tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```

* mxfp4
* mxfp4 draft
* fix
* fix import
* draft
* draft impl
* finally working!
* simplify
* add import
* working version
* consider blocks and scales
* device mesh fix
* initial commit
* add working dequant + quant logic
* update
* non-nan, gibberish output
* working EP + quantization finally!
* start cleaning
* remove reversing process
* style
* some cleaning
* initial commit
* more cleaning (×2)
* simplify
* more cleaning
* rm duplicated function
* changing tp_plan
* update tp plan check
* add loading attribute
* dequantizing logic
* use subfunctions
* import cleaning
* update_param_name
* adds clamped swiglu
* add clamping to training path
* simplify dequant logic
* update
* Bad merge
* more simplifications & tests
* fix!
* fix registering custom attention
* fix order
* fixes
* some test nits
* nits
* nit
* fix
* Clamp sink logits
* Clean
* Soft-max trick
* Clean up
* p
* fix deepspeed
* update both modeling and modular for cleanup
* contiguous
* update tests
* fix top_k router call
* revert renaming
* test nits
* small fixes for EP
* fix path for our local tests
* update as I should not have broken that!
* fix the loss of mixtral
* revert part of the changes related to router_scores; kernel probably not ready for that!
* deleting a small nit
* update arch
* fix post processing
* update
* running version but not expected output
* moving to cuda
* initial commit
* revert
* erroring when loading on cpu
* updates
* del blocks, scales
* fix
* style
* rm comm
* comment
* add comment
* style
* remove duplicated lines
* Fix minor issue with weight_map conversion script
* fix sampling params
* rename to final name
* update pre-final version of template
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* fix batched inference
* serve fixes
* swizzle!
* update final chat template by Matt
* fix responses; pin oai
* simplify
* Thanks Matt for his tireless efforts! Co-authored-by: Rocketknight1 <[email protected]>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Matt <[email protected]>
* fix
* Use ROCm kernels from HUB
* Make kernel modes explicit
* update final chat template by Matt, x2
* Thanks Matt for his tireless efforts! Co-authored-by: Rocketknight1 <[email protected]>
* Fix installation
* Update setup.py Co-authored-by: Ákos Hadnagy <[email protected]>
* allow no content
* fix: update message handling in write_tokenizer function
* Fix template logic for user message role
* last nits for CB and flash_paged!
* there was one bad merge
* fix CB (hardcode for now; it's just using kv groups instead)
* fix
* better fix for device_map
* minor device fix
* Fix flash paged
* updates
* Revert "remove dtensors, not explicit (huggingface#39840)" (reverts commit 6dfd561)
* update
* Revert "remove dtensors, not explicit (huggingface#39840)" (reverts commit 6dfd561)
* fix merge
* fix
* Fix line break when custom model identity
* nits testing
* to locals first and pass sliding window to flash paged
* register modes for MegaBlocksMoeMlp
* add integration test in fixtures -> now update the tests to use it!
* update integration tests
* initial fix
* style and update tests
* fix
* chore(gpt oss): remove mlp_bias from configuration (it was just a leftover)
* stats
* Integration tests
* whoops
* Shouldn't move model
* Ensure assistant messages without thinking always go to "final" channel
* More checks to ensure expected format
* Add pad_token_id to model configuration in write_model function (#51)
* Add oai fix fast tests (#59)
* Fix some fast tests
* Force some updates
* Remove unnecessary fixes
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <[email protected]> (×2)
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* reasoning -> Reasoning
* Add additional integration tests
* fixup
* Slight fixes
* align chat template with harmony
* simplify
* Add comment
* torch testing assert close (×6)
* Revert fixup
* skip 2 tests, remove todo
* merge
* padding side should be left for integration tests
* fix modular wrt the changes made to modeling
* style
* isort
* fix copies for the loss
* mmmm
---------
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: edbeeching <[email protected]>
Co-authored-by: Vaibhavs10 <[email protected]>
Co-authored-by: MekkCyber <[email protected]>
Co-authored-by: Edward Beeching <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Lewis Tunstall <[email protected]>
Co-authored-by: Zhuohan Li <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Rocketknight1 <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Akos Hadnagy <[email protected]>
Co-authored-by: Ákos Hadnagy <[email protected]>
Co-authored-by: Alvaro Moran <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: Matt <[email protected]>
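For orientation, a hedged usage sketch of the model family this commit adds; the openai/gpt-oss-20b checkpoint id is assumed, and hardware/kernel availability determines whether the MXFP4 path is used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain MXFP4 quantization in one sentence."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tok.decode(model.generate(inputs, max_new_tokens=64)[0]))
```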
This PR was created automatically by the Fork Maintenance System to sync changes from the downstream main into downstream develop.