Add jamba #29943
Merged
Commits (78):
16b561d (tomeras91): Add jamba arch
2e7fbe4 (tomeras91): Merge branch 'main' into add-jamba
5b84cbe (tomeras91): apply "make fix-copies" changes
b2f12fc (tomeras91): fix link to model in JambaConfig docstring
5f48e7b (tomeras91): Add n_ctx in modeling file because repo-consistency wants that
f2bbe6d (tomeras91): Add jamba to flash attention and sdpa documentation
5ec508e (tomeras91): mamba dt_proj quant fix now works for LoRA as well
35caa4f (tomeras91): Merge branch 'main' into add-jamba
240c577 (tomeras91): override test_left_padding_compatibility and use a more permissive to…
783a1ac (tomeras91): add jamba to tokenization auto
b0c9d7c (tomeras91): Merge branch 'main' into add-jamba
56183b4 (tomeras91): fix comments of shape (PR #24 in the model page: https://huggingface.…
59d832a (tomeras91): simple PR fixes
ce8b476 (tomeras91): remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMa…
810dfbf (tomeras91): remove the LoRA hack for the mamba dt_proj bias. It was solved in hug…
b03a83d (tomeras91): Add copied comment on JambaMLP (it's the same as MixtralMLP)
9bd48ef (tomeras91): remove padding_mask warnings. It's not supported anymore
9c164dc (tomeras91): fix docstring. Float instead of int
3a1ef30 (tomeras91): A few more minor PR fixes
16b397f (tomeras91): (1) lowercase names for mamba layernorms (2) remove _apply_inner_laye…
a272515 (tomeras91): Return None attention weights from mamba layers. Append to all attent…
16cff22 (tomeras91): remove some leftover jamba archive lists
4c044b2 (tomeras91): Merge branch 'main' into add-jamba
f833e25 (tomeras91): Better separation between expert vs non-expert layers. non-expert lay…
f368f8d (tomeras91): no need to take router_logits at config.expert_layer_offset anymore. …
a9342a2 (tomeras91): Add Jamba paper on READMEs
40432cf (tomeras91): (1) rename n_ctx -> max_position_embeddings (2) don't use it in the m…
c0ef620 (tomeras91): Add copied from comment
f980caa (tomeras91): remove the code path for apply_inner_layernorms=False. Jamba always h…
f9573b9 (tomeras91): clearer docstring for _convert_to_standard_cache
21c43bd (tomeras91): style fixes
a425233 (tomeras91): Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (in…
2090176 (tomeras91): Merge branch 'main' into add-jamba
12d9914 (tomeras91): rename test so it still overrides what it's meant to override
c53439c (ArthurZucker): draft
e3801cb (ArthurZucker): oops
8f7f1ad (ArthurZucker): nit
c9c254a (ArthurZucker): remove more complex logic
574e68e (ArthurZucker): fix names used in config
b3d37a1 (ArthurZucker): fix fix fix
5e9523c (ArthurZucker): style
0898ddc (ArthurZucker): fix some more failing tests
65bfbee (ArthurZucker): generate did not init the cache 🙃
3764de0 (ArthurZucker): more small nits
d7d64a7 (ArthurZucker): typo
e1ada1d (ArthurZucker): config.mamba_expand * config.hidden_size for the intermediate size o…
73603a2 (ArthurZucker): fix init of pkv with torch.tensor()
a0f92cb (ArthurZucker): empty tensor
9cce32b (ArthurZucker): fix some init issues
61ab3bc (ArthurZucker): stupid changes required by generate because it does not even support …
6c01417 (ArthurZucker): Merge branch 'main' of github.com:huggingface/transformers into updat…
a8982c5 (ArthurZucker): more fixes
ebbace3 (tomeras91): Merge branch 'main' into add-jamba
82be569 (gante): fix general assisted gen cache_position bug
d7594d6 (gante): tests passing
bb5266a (tomeras91): Merge branch 'update-jamba' into add-jamba
7e8ac81 (tomeras91): Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_att…
997be2c (tomeras91): fix reorder_cache to reorder mamba states and override some more func…
1f475b2 (tomeras91): no need to override test_past_key_values_format() and _check_past_key…
a252fe0 (tomeras91): fix docstrings and typehints for past_key_values
c9f094a (tomeras91): style fixes
5aace7c (tomeras91): fix docs
1b3f224 (tomeras91): change typehint due to copy from Mixtral
1e87c88 (tomeras91): forgot import
ae7f7fb (tomeras91): import order
e71421c (tomeras91): Merge branch 'main' into add-jamba
5e0244d (tomeras91): Add configuration_jamba and modeling_jamba to not_doctested because t…
5c03163 (tomeras91): Add integration test with tiny random Jamba model on hub
7b15866 (tomeras91): fix flash attention cache shapes
e9d227b (tomeras91): bring back forgotten hidden states
d1ae4fd (tomeras91): rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous…
a3e8094 (tomeras91): align integration test after modeling fixes
a0a8d8c (tomeras91): bugfix - mamba can use precomputed states only if forward pass is on …
122c696 (tomeras91): bugfix - mamba can use precomputed states only if they match the batc…
ab2a0d3 (tomeras91): typo
6252603 (tomeras91): Merge branch 'main' into add-jamba
aabe99d (tomeras91): remove making _prepare_4d_causal_attention_mask a leaf function
886e8c8 (tomeras91): stop using past_seq_len.get_seq_length(). Use cache positions instead…
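Commit a425233 replaces a boolean calc_logits_for_entire_prompt with an integer num_logits_to_keep. The idea, sketched below under assumptions (project_logits and the tensor shapes are illustrative, not the exact Jamba code), is that during prefill the caller can request logits for only the last N positions instead of the whole prompt, which avoids materializing a (batch, seq_len, vocab) tensor when only the final position is needed for sampling.

```python
import torch

# Hedged sketch: slice hidden states before the LM head so logits are only
# computed for the last `num_logits_to_keep` positions. By the convention in
# the commit message, 0 means "keep logits for every position".
def project_logits(
    hidden_states: torch.Tensor,       # (batch, seq_len, hidden_size)
    lm_head: torch.nn.Linear,          # hidden_size -> vocab_size
    num_logits_to_keep: int = 0,
) -> torch.Tensor:
    if num_logits_to_keep > 0:
        hidden_states = hidden_states[:, -num_logits_to_keep:, :]
    return lm_head(hidden_states)

hidden = torch.randn(2, 10, 16)                 # toy batch of 2, prompt of 10
head = torch.nn.Linear(16, 32, bias=False)      # toy vocab of 32
all_logits = project_logits(hidden, head, num_logits_to_keep=0)
last_logits = project_logits(hidden, head, num_logits_to_keep=1)
print(tuple(all_logits.shape))   # (2, 10, 32)
print(tuple(last_logits.shape))  # (2, 1, 32)
```

An int generalizes the old bool: 0 behaves like calc_logits_for_entire_prompt=True, while 1 covers the common greedy/sampling decode case.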
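Several commits (f833e25, f368f8d, 7e8ac81) deal with Jamba's interleaving of layer types via offset/period config attributes. A minimal sketch of that selection rule, assuming the "index matches offset plus a multiple of the period" interpretation (the helper name and the example values are illustrative, not taken from the PR):

```python
# Hedged sketch of an offset/period layer-pattern check: layer `layer_idx`
# gets the special flavor (attention instead of mamba, or experts instead of
# a plain MLP) when its index is `offset`, `offset + period`, `offset + 2*period`, ...
def is_periodic_layer(layer_idx: int, offset: int, period: int) -> bool:
    return layer_idx >= offset and (layer_idx - offset) % period == 0

num_layers = 8  # toy depth
attn_layers = [i for i in range(num_layers) if is_periodic_layer(i, offset=4, period=8)]
expert_layers = [i for i in range(num_layers) if is_periodic_layer(i, offset=1, period=2)]
print(attn_layers)    # [4]
print(expert_layers)  # [1, 3, 5, 7]
```

This also explains commit 7e8ac81: config attributes that encode a pattern (offsets and periods) rather than a plain hyperparameter need to be whitelisted in the repo's config-attribute consistency check.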
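Commit e1ada1d derives the mamba block's intermediate size as config.mamba_expand * config.hidden_size instead of storing it separately. A toy config illustrating that convention (the class and property names here are illustrative, only the two field names come from the commit message):

```python
from dataclasses import dataclass

# Hedged sketch: the mamba intermediate size is a derived quantity, so it
# cannot drift out of sync with hidden_size. Default values are illustrative.
@dataclass
class TinyJambaLikeConfig:
    hidden_size: int = 4096
    mamba_expand: int = 2

    @property
    def mamba_intermediate_size(self) -> int:
        return self.mamba_expand * self.hidden_size

cfg = TinyJambaLikeConfig()
print(cfg.mamba_intermediate_size)  # 8192
```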