Conversation

cebtenzzre (Member)

Building on ggml-org/llama.cpp#4978 and ggml-org/llama.cpp#5650, I was finally able to implement a version of ggml-org/llama.cpp#3626 that upstream accepted, in ggml-org/llama.cpp#5670.

Now MPT Chat has gone from 3.64 GiB to 3.54 GiB on disk, without breaking upstream compatibility in either direction.

cebtenzzre requested a review from manyoso on February 22, 2024 at 22:20
cebtenzzre (Member, Author)

Due to the model3.json change, I'll hold off on merging this until we're ready to make a new release.
