llama: add initial support for Falcon-H1 model family #14534
Changes from 70 commits
gguf-py/gguf/constants.py:

```diff
@@ -172,6 +172,7 @@ class SSM:
     TIME_STEP_RANK = "{arch}.ssm.time_step_rank"
     GROUP_COUNT = "{arch}.ssm.group_count"
     DT_B_C_RMS = "{arch}.ssm.dt_b_c_rms"
+    HEAD_DIM = "{arch}.ssm.head_dim"

 class WKV:
     HEAD_SIZE = "{arch}.wkv.head_size"
```
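The new `HEAD_DIM` key follows the same `{arch}.ssm.*` template as the existing SSM keys. As a minimal sketch of how a converter could emit it, using the generic `GGUFWriter.add_uint32` API from gguf-py (the output path and the head-dim value of 64 are illustrative, not taken from this PR):

```python
# Minimal sketch: write the new SSM head-dim key into a GGUF file.
# The path and the value 64 are illustrative placeholders.
from gguf import GGUFWriter
from gguf.constants import Keys

writer = GGUFWriter("falcon-h1.gguf", arch="falcon_h1")
writer.add_uint32(Keys.SSM.HEAD_DIM.format(arch="falcon_h1"), 64)
```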
```diff
@@ -288,6 +289,7 @@ class MODEL_ARCH(IntEnum):
     LLAMA4 = auto()
     DECI = auto()
     FALCON = auto()
+    FALCON_H1 = auto()
     BAICHUAN = auto()
     GROK = auto()
     GPT2 = auto()
```
```diff
@@ -660,6 +662,7 @@ class MODEL_TENSOR(IntEnum):
     MODEL_ARCH.DOTS1: "dots1",
     MODEL_ARCH.ARCEE: "arcee",
     MODEL_ARCH.ERNIE4_5: "ernie4_5",
+    MODEL_ARCH.FALCON_H1: "falcon_h1",
 }

 VISION_PROJECTOR_TYPE_NAMES: dict[VISION_PROJECTOR_TYPE, str] = {
```
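The string on the right-hand side is what a converted checkpoint stores in its GGUF header as `general.architecture`. A quick way to sanity-check a converted file, sketched with gguf-py's reader (the file name is illustrative, and the field decoding follows `GGUFReader`'s `parts`/`data` layout):

```python
# Sketch: read back the architecture name from a converted file.
from gguf import GGUFReader

reader = GGUFReader("falcon-h1.gguf")  # illustrative path
field = reader.get_field("general.architecture")
if field is not None:
    print(field.parts[field.data[0]].tobytes().decode("utf-8"))  # -> "falcon_h1"
```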
```diff
@@ -2211,6 +2214,40 @@ class MODEL_TENSOR(IntEnum):
         MODEL_TENSOR.FFN_DOWN,
         MODEL_TENSOR.FFN_UP,
     ],
+    MODEL_ARCH.FALCON_H1: [
+        # Token embedding
+        MODEL_TENSOR.TOKEN_EMBD,
+
+        # Input layernorm
+        MODEL_TENSOR.ATTN_NORM,
+
+        # Attention components
+        MODEL_TENSOR.ATTN_Q,    # Query projection
+        MODEL_TENSOR.ATTN_K,    # Key projection
+        MODEL_TENSOR.ATTN_V,    # Value projection
+        MODEL_TENSOR.ATTN_OUT,  # Output projection
+
+        # SSM components (Mamba2 specific)
+        MODEL_TENSOR.SSM_IN,      # Input projection for SSM
+        MODEL_TENSOR.SSM_CONV1D,  # Convolution layer
+        MODEL_TENSOR.SSM_DT,      # Delta time projection
+        MODEL_TENSOR.SSM_A,       # A parameter (log form)
+        MODEL_TENSOR.SSM_D,       # D parameter
+        MODEL_TENSOR.SSM_NORM,    # Normalization in SSM
+        MODEL_TENSOR.SSM_OUT,     # Output projection
+
+        # Pre-feedforward layernorm
+        MODEL_TENSOR.FFN_PRE_NORM,
+
+        # Feed-forward network components
+        MODEL_TENSOR.FFN_GATE,  # Gate projection (SwiGLU)
+        MODEL_TENSOR.FFN_DOWN,  # Down projection
+        MODEL_TENSOR.FFN_UP,    # Up projection
+
+        # Post-feedforward layernorm
+        MODEL_TENSOR.OUTPUT_NORM,  # Final layer norm
+        MODEL_TENSOR.OUTPUT,       # Output projection (lm_head)
+    ],
     # TODO
 }
```
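Each `MODEL_TENSOR` entry in the list above expands to a concrete per-layer tensor name through the `TENSOR_NAMES` templates in `gguf.constants` (e.g. `blk.{bid}.attn_q`). A small sketch that lists the names the `FALCON_H1` entry produces for one layer (layer index 0 is illustrative):

```python
# Sketch: expand the FALCON_H1 tensor list into concrete GGUF tensor names.
from gguf.constants import MODEL_ARCH, MODEL_TENSORS, TENSOR_NAMES

for tensor in MODEL_TENSORS[MODEL_ARCH.FALCON_H1]:
    # token_embd has no {bid} placeholder, so format() leaves it unchanged
    print(TENSOR_NAMES[tensor].format(bid=0))  # e.g. blk.0.attn_q, blk.0.ssm_in, ...
```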
Reviewer: Why do we have multiple hashes here? This section should be generated by the convert_hf_to_gguf_update.py script, and it will be overwritten the next time we run it.
Author: The reason we have multiple hashes here is that we use a different tokenizer for each model size, which leads to a different hash for each size.
Reviewer: Should we try to add all the models in the update script? The idea is not to edit this block manually, because it will eventually get overwritten when the update script is executed.
Author: We quickly tried to adapt that script and got this diff:

[screenshot: diff produced by the adapted update script]

We'll probably need to register one different name per model size (four in total). We're not sure what the preferred approach for llama.cpp is: if that's the approach you want, we'll update the script accordingly; otherwise we can add a comment explaining why there are four hashes here and add them by hand.
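For illustration, registering one entry per size in convert_hf_to_gguf_update.py could look like the sketch below, following the format of the script's existing `models` list; the entry names and repo URLs follow the public Falcon-H1 naming scheme and are assumptions, not copied from this PR:

```python
# Hypothetical additions to the models list in convert_hf_to_gguf_update.py,
# one entry per Falcon-H1 size, since each size ships its own vocabulary.
models = [
    # ... existing entries ...
    {"name": "falcon-h1-0.5b", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/tiiuae/Falcon-H1-0.5B-Base"},
    {"name": "falcon-h1-1.5b", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/tiiuae/Falcon-H1-1.5B-Base"},
    {"name": "falcon-h1-7b",   "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/tiiuae/Falcon-H1-7B-Base"},
    {"name": "falcon-h1-34b",  "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/tiiuae/Falcon-H1-34B-Base"},
]
```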
Reviewer: It's generally fine to make this change. Just to make sure that we are on the same page, getting different hashes here generally means one of two things:
- the models use genuinely different tokenizers (e.g. different pre-tokenization), or
- the models use the same tokenizer, but different tokens are present in their vocabularies.
The second option is OK. However, the first option is not OK. I haven't looked at what the case is for Falcon-H1, but if you can confirm that the reason for the different hashes is the second case (i.e. different tokens being present in the different models, while they in fact use the same tokenizer), we should be good.
Author: Thank you for explaining this. I can confirm it's the second case: we use the same tokenization algorithm (BPE), but the vocab size and the tokens inside the vocabulary are different for each model size.
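This explains the four hashes: the hash is computed over the token IDs that the tokenizer produces for a fixed check string, so vocabularies with different sizes and contents yield different IDs, and therefore different hashes, even under the same BPE algorithm. A rough sketch of the computation as convert_hf_to_gguf.py performs it (the repo name is an assumption, and the real check string lives in the script):

```python
# Sketch: the pre-tokenizer hash (chkhsh) is a sha256 over the token IDs
# produced for a fixed check string.
from hashlib import sha256
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon-H1-0.5B-Base")  # assumed repo name
chktxt = "..."  # stand-in for the fixed multilingual check string in the script
chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()
print(chkhsh)  # differs across Falcon-H1 sizes because their vocabs differ
```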
Reviewer: By the way, about the removed lines in the diff: do not commit those. They are most likely missing because you don't have access to the respective HF repos. Only commit the new lines for Falcon-H1.