GPT extrapolatable position embedding (xpos/sandwich/alibi/kerple) and Flash Attention #6666
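For readers skimming the commit list below, here is a minimal sketch of how the new options might be toggled from a model config. The key names (`position_embedding_type`, `use_flash_attention`) and the value strings are assumptions based on the PR title, not copied from the diff.

```python
# Hypothetical sketch (not taken from this PR's diff): enabling one of the new
# position embeddings together with Flash Attention in a Megatron GPT config.
# Key names and value strings are assumptions based on the PR title.
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {
        "model": {
            # assumed values: "alibi", "kerple", "sandwich", or "xpos"
            "position_embedding_type": "alibi",
            # assumed switch for the Flash Attention path added in this PR
            "use_flash_attention": True,
        }
    }
)
print(OmegaConf.to_yaml(cfg))
```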
Merged
Changes from 85 commits
Commits
a8564d3
move to nvidia megatron repo (#6465) (#6475)
github-actions[bot] 7a17f73
Megatron KERPLE positional embeddings (#6478) (#6480)
github-actions[bot] a67b00f
Fix an invalid link in get_data.py of ljspeech (#6456)
pythinker 1e1fbbe
1. Added external index sample. (#6462) (#6483)
github-actions[bot] 4561e12
Update README to add core installation (#6488) (#6489)
github-actions[bot] 599f522
Fix cache aware hybrid bugs (#6466) (#6484)
github-actions[bot] ae4a4dd
Fix typos (#6494) (#6495)
github-actions[bot] df2b870
Add disclaimer about dataset for ASR (#6496)
titu1994 0c85e21
fix (#6502)
Jorjeous 24c77d0
fix broken links r1.18.0 (#6501) (#6504)
github-actions[bot] 07f6533
[TTS] Create functions for TTS preprocessing without dataloader (#6317)
rlangman 8bffc80
Cache aware streaming nfa (#6209)
Slyne 6b84a8a
[BugFix] Force _get_batch_preds() to keep logits in decoder timestamp…
tango4j 56ce2a6
[TTS] Fix FastPitch energy code (#6511)
rlangman b460716
fix custom forward_torch_softmax (#6512) (#6517)
github-actions[bot] 319b191
[TTS] fixed broken path. (#6514) (#6518)
github-actions[bot] 2dd91fa
Fix normalization of impulse response in ImpulsePerturbation (#6505)
anteju d0e2f5a
Add interleaved pp support (#6498)
titu1994 3cff6ce
Fix typos (#6523)
titu1994 c2a4264
New noise_norm perturbation based on Riva work (#6445)
trias702 669a8c2
[TTS] Add script for computing feature stats (#6508)
rlangman 798978d
Add Frame-VAD model and datasets (#6441)
stevehuang52 cb53ede
Support dynamic length batches with GPT SFT (#6510)
aklife97 1217668
added back the fast emit section to the configs. (#6540) (#6542)
github-actions[bot] 5090a94
removing unnessary avoid_bfloat16_autocast_context (#6481)
bmwshop b2f23bd
FC models in menu (#6473)
bmwshop 6c77583
[TTS] Add tutorials for FastPitch TTS speaker adaptation with adapter…
hsiehjackson ce84b1f
[TTS] Create initial TTS dataset feature processors (#6507)
rlangman 8bbc140
fix (#6529) (#6546)
github-actions[bot] dc0c332
Add FastConformer Hybrid ASR models for EN, ES, IT, DE, PL, HR, UA, B…
github-actions[bot] 42691c3
Add scores for FastConformer models (#6557) (#6558)
github-actions[bot] e7f2210
Fix fp16 (#6543) (#6544)
github-actions[bot] 69b2c34
Patch transcribe and support offline transcribe for hybrid model (#65…
github-actions[bot] 24076ca
Fix notebook bad json (#6561)
titu1994 b41a511
Change Megatron Enc Dec model to use persistent_workers (#6548) (#6552)
github-actions[bot] 77369ef
Make KenLM with PC for AggregateTokenizer and merge it (#6081)
karpnv fa62794
fix for running on 1 GPU.
khcs 3817d41
temp rtd fix (#6568) (#6569)
github-actions[bot] a57ec70
[TTS] Add script for mapping speaker names to indices (#6509)
rlangman 5fd9c7f
whitespace (#6574)
karpnv 04c1b72
Update manifest.py for speedup (#6565) (#6573)
github-actions[bot] c13ffb9
More streaming conformer export fixes (#6567) (#6578)
github-actions[bot] 846fc83
user selected max_seq_len should be less than model's max_seq_len (#6…
github-actions[bot] c19aac5
Framework for PEFT via mixins (#6391)
arendu fba50b8
cache and reuse inputs (#6422) (#6452)
github-actions[bot] d0785d5
Add patches for Virtual Parallel conversion (#6589)
titu1994 c7f58d8
Pass `.scale` instead of scaler object to core (#6551)
github-actions[bot] 58440fb
Documentation for ASR-TTS models (#6594) (#6595)
github-actions[bot] aa2b9b8
[TTS] Fix aligner nan loss in fp32 (#6435)
hsiehjackson cf60b6c
Update SDP docs (#6485) (#6596)
github-actions[bot] 3c1147f
Bug/typo fixes (#6599)
Kipok 08ab1a7
Manual garbage collection with an interval (#6469) (#6482)
github-actions[bot] 3ed0282
Make tensor split contiguous (#6580) (#6593)
github-actions[bot] a9d2910
[ASR] Fix for old models in change_attention_model (#6608)
sam1373 077b7f9
Update manifest.py to use os.path for get_full_path (#6598)
stevehuang52 9eed6d3
Cherry pick commits in #6601 to main (#6611)
fayejf 77b9a85
Create dummy iters to satisy len checks (#6600) (#6603)
github-actions[bot] 9f367f4
add GPT eval mode fix for interleaved to main (#6610)
aklife97 8592562
Fix batch size reconf for T5 FT for multi-validation (#6582) (#6588)
github-actions[bot] b3f5f39
Not doing CastToFloat by default (#6524) (#6563)
github-actions[bot] 09f2e37
Turn autocast off when precision is fp32 (#6576)
github-actions[bot] 2a446cb
update core commit hash in readme (#6622) (#6623)
github-actions[bot] 2cc0f62
add hat image to docs (#6619) (#6621)
github-actions[bot] 94e6e25
Allow indices exchange via distributed (#6618) (#6624)
github-actions[bot] 7f48130
Offline and streaming inference support for hybrid model (#6570)
fayejf c44e3b6
Patch decoding for PC models (#6630) (#6631)
github-actions[bot] ef49b0a
Fix wer.py where 'errors' variable was not set (#6633) (#6634)
github-actions[bot] 1b785e2
Restore GPT support for interleaved pipeline parallelism (#6528) (#6613)
timmoon10 44e890e
Add FA
hsiehjackson a5fcbee
Fix XPOS
hsiehjackson aedcc7c
Add warning
hsiehjackson 7fbf571
Fix bugs
hsiehjackson ddb067e
Fix attention
hsiehjackson 81a8c21
Fix comment
hsiehjackson 36d685b
Fix cast dtype
hsiehjackson a1d1e5a
Undo xpos
hsiehjackson 2eaa60a
bugfix (#6636)
fayejf 5eb3552
Disable interctc tests (#6638)
Kipok 4e94268
Add megatron_core to requirements (#6639) (#6640)
github-actions[bot] 56847f3
Remove from jenkins (#6642)
github-actions[bot] 986feed
sft model can use this script for eval (#6637)
arendu 6d2c969
[TTS] Fix TTS audio preprocessing bugs (#6628)
rlangman 954d43f
Move black parameters to pyproject.toml (#6647)
artbataev 11c58f3
ASR-TTS Models: Support hybrid RNNT-CTC, improve docs. (#6620)
artbataev db7d578
fix conversion and eval (#6648)
arendu acb2c56
Confidence ensembles implementation (#6614)
Kipok 1b28a7b
Patch memory used for NeMo Megatron models (#6615)
titu1994 6fb6e47
handle artifacts when path is dir (#6658)
arendu 4ccba61
remove upgrading setuptools in reinstall.sh (#6659)
XuesongYang 82d5d58
merge lora weights into base model (#6597)
arendu 89b428c
upgrade to 23.04 (#6660)
ericharper 9683d02
Merge r1.18.0 bugfixes and doc updates to main (#6655)
ericharper c648d99
Confidence ensembles: fix issues and add tuning functionality (#6657)
Kipok f736f60
[TTS] Implement new TextToSpeech dataset (#6575)
rlangman 4e7afbb
Dialogue dataset (#6654)
yidong72 7e62925
Add support for RNNT/hybrid models to partial transcribe (#6609)
stevehuang52 e009385
eval_beamsearch_ngram.py with hybrid ctc (#6656)
karpnv c5e229a
fix bucketing bug issue for picking new bucket (#6663)
nithinraok 9d7d0b1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] b739a5e
Add t5 flash-attention
hsiehjackson 473ff20
PE refactor (#6673)
hsiehjackson 4a0699d
Add singleton alibi
hsiehjackson 9cfea92
Fix FA mask
hsiehjackson 8c3bfbd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 9d01255
singleton PE
hsiehjackson 8a6e294
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8bd1466
Fix attn bias inference
hsiehjackson 0e02478
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8ed6a0a
fix eval
ekmb a6b856c
[TTS] Add callback for saving audio during FastPitch training (#6665)
rlangman 213b5a3
update batch size recommendation to min 32 for 43b (#6675)
Zhilin123 1b93141
Make Note usage consistent in adapter_mixins.py (#6678)
BrianMcBrayer d2938b9
Fix masking bug for TTS Aligner (#6677)
redoctopus 1564d94
[ASR] Adding ssl config for fast-conformer (#6672)
krishnacpuvvada 82f863b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8b55842
Fix xpos offset
hsiehjackson fbdd7fe
Fix sequence parallel
hsiehjackson 8535a6a
Fix parallel
hsiehjackson 873f2e1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 7847a54
Uncomment correct bias size
hsiehjackson 4aa46d7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 7514bf4
Remove unused module
hsiehjackson 9a133d0
Fix singleton tril
hsiehjackson 5ce3819
Fix kerple/sandwitch rename xpos
hsiehjackson bbee276
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] de61214
fix sandwich
hsiehjackson dcab11e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] cd3bb6d
Add unitest
hsiehjackson 4fac042
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 129e55d
Fix bug
hsiehjackson 3b5ec97
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] b2eb222
Add requirements
hsiehjackson c73f983
Remove requirements
hsiehjackson 06ce313
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8c969fe
Remove requirement flash-attn
hsiehjackson f70cc3f
Fix FA causal for inference
hsiehjackson a0cea83
Add experimental PE
hsiehjackson c7c6a1b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 6876703
Update all invalid tree references to blobs for NeMo samples (#6679)
BrianMcBrayer 6c65625
Update README.rst about container (#6686)
fayejf 456153a
Fix a bug, use _ceil_to_nearest instead as _round_to_nearest is not d…
github-actions[bot] 69992d6
Enable ONNX export of 5B GPT trained with TE FP8 modules (#6458)
asfiyab-nvidia 4ee6d8f
[TTS] Add script for text preprocessing (#6541)
rlangman c856936
[TTS] Fix adapter duration issue (#6697)
hsiehjackson b70dbf7
karpnv/issues6690 (#6705)
karpnv 1a66d30
Limit codeql scope (#6710)
titu1994 ff772f7
eval fix (#6685)
arendu 2231a57
Fix k2 installation in Docker with CUDA 12 (#6707) (#6709)
github-actions[bot] 8b3dce5
[TTS] Filter out silent audio files during preprocessing (#6716)
rlangman 963855b
not pinning version (#6680)
yidong72 b0f33f1
Tutorial fixes (#6717) (#6718)
github-actions[bot] a4ef711
preprocess squad in sft format (#6727)
arendu da5e6f8
Fix Codeql (#6731)
titu1994 2c35e0b
[TTS] fix inconsistent type hints for IpaG2p (#6733)
XuesongYang 2bac13d
VP Fixes for converter + Config management (#6698)
titu1994 5831405
Graph RNNT: Grid- and Compose-Transducer. W-Transducer loss (#6168)
artbataev 2e963da
Fix fastpitch test nightly (#6730)
hsiehjackson 7f83283
Fix for interctc test random failure (#6644)
Kipok 599c503
check for first or last stage (#6708) (#6743)
github-actions[bot] 0725b2d
sharded manifests docs (#6751)
bmwshop bdeab5b
[TTS] relax hardcoded prefix for phonemes and tones and infer phoneme…
XuesongYang 146371b
[TTS] corrected misleading deprecation warnings. (#6702)
XuesongYang 8f43ae3
Bug fix to restore act ckpt (#6753) (#6755)
github-actions[bot] 7daad62
Bug fix to reset sequence parallelism (#6756) (#6770)
github-actions[bot] 49e016e
Fix TTS adapter tutorial (#6741)
hsiehjackson 34f5452
Fix checkpointed forward and add test for full activation checkpointi…
github-actions[bot] c022acb
lora notebook (#6765)
arendu e98f425
Fix Links (#6777) (#6778)
github-actions[bot] bcb3fd3
Remove alibi tril
hsiehjackson 71bff2f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 2e6eba5
Add flash-attn requirement
hsiehjackson 424a15d
revert sft dataset changes
ekmb e79a35a
Move flash-attn requirement
hsiehjackson 4c953aa
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 0b18768
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8863360
Add install
hsiehjackson 2dc0418
peft eval directly from ckpt (#6785)
arendu 1353aca
Add Frame-VAD examples and utils (#6463)
stevehuang52 b8d19b2
[TTS][zh] refine hardcoded lowercase for ASCII letters. (#6781)
XuesongYang 7ad325d
Revert evaluation
hsiehjackson b875a78
Revert evaluation
hsiehjackson 1f229c0
Fix
hsiehjackson 26dbc9f
Fix gpu
hsiehjackson a3cf08e
Spellchecking ASR customization model (#6179)
bene-ges 90ef33a
Fix test
hsiehjackson 380a6f2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] b69cbf7
Fix device
hsiehjackson de52c2d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8dc863b
Fix conflict
hsiehjackson 29c3cd4
Merge branch 'main' into gpt-alibi-FA
hsiehjackson e782202
Revert
hsiehjackson 7f40a05
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson d814f47
clean
hsiehjackson 65118c4
Change device
hsiehjackson 89d4547
Change device
hsiehjackson 9c50e29
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 218ffa3
Merge branch 'main' into gpt-alibi-FA
hsiehjackson 84acce0
Add test FA
hsiehjackson 874f992
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 35ac850
Merge branch 'main' into gpt-alibi-FA
hsiehjackson 98783ce
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 08dbd86
Add CI
hsiehjackson bdfe61e
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 6df2df8
Fix yaml order
hsiehjackson 1f460d9
Test random attention mask
hsiehjackson 01f4391
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 23634bf
Add install FA for tests
hsiehjackson 4cfb2da
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 528c416
cherry pick 6788 (#6816)
ekmb a751928
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson ee692d4
Merge branch 'main' into gpt-alibi-FA
hsiehjackson 5178f6b
Support 2D mask
hsiehjackson 45876ad
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 1c15644
add missing comp_att_mask arg
ekmb 74da509
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
ekmb 5da1bc3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] fb895da
Merge branch 'main' into gpt-alibi-FA
hsiehjackson 81d2fb0
Fix code ql
hsiehjackson b578ff5
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 82120c3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] a9bb73e
Merge branch 'main' into gpt-alibi-FA
hsiehjackson 662733b
Megatron MPT-7B Support (#6804)
trias702 6b18be2
Fix test triton
hsiehjackson bdd91d6
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 92e7dba
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] bb89e61
Update FA in CI
hsiehjackson 672f262
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 2a526ad
Fix Jenkin error
hsiehjackson 0ac5374
Resume with FA
hsiehjackson 7acf5cf
Follow comments
hsiehjackson cdff779
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] dcab29d
Merge branch 'main' into gpt-alibi-FA
hsiehjackson aba44ae
Fix README
hsiehjackson 7c0a530
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 194b4bb
Fix README
hsiehjackson fe173c1
Remove torch.cuda
hsiehjackson 7c38447
Merge branch 'main' into gpt-alibi-FA
hsiehjackson 1104174
Remove unused import
hsiehjackson a3010bd
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 81b002e
Merge branch 'main' into gpt-alibi-FA
hsiehjackson a883aa2
kerple init
hsiehjackson 6a895f0
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 0504814
Merge branch 'main' into gpt-alibi-FA
hsiehjackson 889dec6
Add TE comment
hsiehjackson fd2899a
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 7255e31
Merge branch 'main' into gpt-alibi-FA
hsiehjackson c972553
Merge branch 'main' into gpt-alibi-FA
hsiehjackson b8b5611
Fix error when inference.compute_attention_mask=False
hsiehjackson 83ef08d
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson 498ec3d
Merge branch 'main' into gpt-alibi-FA
michalivne
Does this need to be pinned?
FA currently pins the triton version. I have tried other versions, and they raise errors.
https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/flash_attn_triton.py#L3
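For reference, a rough sketch of how that pin could be surfaced at runtime; the version string below is a placeholder assumption, not the actual pin from flash_attn_triton.py.

```python
# Rough sketch: warn when the installed triton does not match the version that
# flash-attn's Triton kernel is pinned to. REQUIRED_TRITON is a placeholder
# assumption; see flash_attn_triton.py for the real pin.
import importlib.metadata

REQUIRED_TRITON = "2.0.0.dev20221202"  # placeholder, not the actual pinned version

try:
    installed = importlib.metadata.version("triton")
except importlib.metadata.PackageNotFoundError:
    installed = None

if installed != REQUIRED_TRITON:
    print(
        f"triton=={installed} is installed, but the flash-attn Triton kernel "
        f"expects triton=={REQUIRED_TRITON}; other versions may raise errors."
    )
```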
@ericharper are you OK with moving forward with this setup?