Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) #22575
Merged
54 commits
af055f3 Add model with cli tool (jlamypoirier)
0f08a89 Remove unwanted stuff (jlamypoirier)
244b060 Add new code (jlamypoirier)
ec9e830 Remove inference runner (jlamypoirier)
5eb70cd Style (jlamypoirier)
a268f64 Fix checks (jlamypoirier)
f56869e Test updates (jlamypoirier)
9754129 make fixup (younesbelkada)
0a9b98c fix docs (younesbelkada)
a929a88 fix doc (younesbelkada)
323d337 fix test (younesbelkada)
3a99205 Merge remote-tracking branch 'upstream/main' into HEAD (younesbelkada)
7f2703b hopefully fix pipeline tests (younesbelkada)
e75785a refactor (younesbelkada)
ecca83d fix CIs (younesbelkada)
379f286 add comment (younesbelkada)
8d78a6a rename to `GPTBigCodeForCausalLM` (younesbelkada)
6b1df7e correct readme (younesbelkada)
be2fd2f make fixup + docs (younesbelkada)
9a5e58c make fixup (younesbelkada)
361c4c5 fixes (younesbelkada)
e4e289d fixes (younesbelkada)
efa631d Remove pruning (jlamypoirier)
f8d8946 Remove import (jlamypoirier)
c96304d Doc updates (jlamypoirier)
a89d08f More pruning removal (jlamypoirier)
7f703c1 Combine copies (jlamypoirier)
755088f Single MQA implementation, remove kv cache pre-allocation and padding (jlamypoirier)
7218190 Update doc (jlamypoirier)
b2a9e70 Revert refactor to match gpt2 style (jlamypoirier)
4323e75 Merge back key and value caches, fix some type hints (jlamypoirier)
2d389b3 Update doc (jlamypoirier)
3f4f9d1 Fix position ids with padding (PR 21080) (jlamypoirier)
19d93f1 Add conversion script temporarily (jlamypoirier)
194e92d Update conversion script (jlamypoirier)
b53b92d Remove checkpoint conversion (jlamypoirier)
928da1d New model (jlamypoirier)
da87548 Fix MQA test (jlamypoirier)
09e5460 Fix copies (jlamypoirier)
013ef2b try fix tests (younesbelkada)
48bdd68 Merge remote-tracking branch 'upstream/main' into HEAD (younesbelkada)
656aed5 FIX TEST!! (younesbelkada)
16338e4 remove `DoubleHeadsModel` (younesbelkada)
0e2c0db add MQA tests (younesbelkada)
b3eb76a add slow tests (younesbelkada)
a3693ae clean up (younesbelkada)
5316d8c add CPU checker (younesbelkada)
0057454 final fixes (younesbelkada)
d58614e fixes (younesbelkada)
3b6faf4 fix final issue (younesbelkada)
7888b6e Simplify and comment baddbmm fix (jlamypoirier)
49d556c Remove unnecessary code (jlamypoirier)
2b64dc2 Transpose tweaks (jlamypoirier)
fc6a6f4 Use beta=1 on cpu, improve tests (jlamypoirier)
Maybe not the most appropriate, since it's not really a new model and the code wasn't released with the paper, but I'm OK leaving it as is if we really need a reference.