Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SpeechLM to main #8741
Add SpeechLM to main #8741
Changes from 89 commits
d0fd89b
9f117ee
61ac56c
b5cb783
f08cb21
cdf354c
66a85c9
de52d95
7ff8b08
1647612
fac7103
49b1aea
6db4097
9af84fb
08b26b5
8f3957f
c90625e
41678cc
192aa06
9ba5277
6e9df3d
5807931
511bb75
d88c137
8961cdf
a5e02a8
31c5b3d
d5d600d
2baef81
215cb9d
eed65b7
2ce0592
8a48480
e0b88f2
209f752
01dd0d6
528d1bf
d94f9dd
dbad4ac
73736ad
94bd346
8afd277
78c1e8e
2e74cd1
5ff28a1
e1e825f
99fb448
446c6d9
3d78dd7
0916850
db542b4
fe7214b
916324e
98f86b5
555a007
8f524e3
619d75d
c3ca938
76db149
f7afea1
c99ad43
7c9ded7
4c4ac20
afbc212
6bce450
b3f6156
f63b8b8
9dd72b6
11facc7
3da8282
179fafd
14c1334
98a0143
7dbe84d
3a039f5
55c9e04
073212b
ba86fb9
fdfe7b5
18b2921
fef24dc
c532150
21d4261
c2f6b78
647e184
36df825
f6a90d1
d73a684
3dea3ce
aa4f85b
9e10694
360acd4
6449924
52617f9
7c78165
ed29843
5a4be92
c991e5b
79156fc
0268898
b6cac3d
8b19dc5
960f958
2e18366
8043262
55f8231
d9e2788
3cd12e9
30a583a
55b270b
1c4cbd7
3e88457
17ab55b
6cae145
4cfaa30
27e33ee
67ecaa1
89926fa
5c7f18e
cdcb258
1afc6e1
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor into more hierarchical structure inside conf/*/**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this naming scheme seems a bit off. Both are pretrained but one has pretrained name and other has restore_from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes...
restore_from_path
is a megatron default name and we shouldn't change that...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does
names
correspond to here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
names of the datasets, which will be used to display and log the results for each manifest. this is useful when the manifest name does not show the dataset's name. added clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a question file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's defined in a previous paragraph "The
question
field in the manifest is optional, and you can put a list of questions in a question file then set++model.data.train_ds.question_file=<path to to question file>
to ask the dataloader to randomly pick a question from the file for each audio sample."There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While script is fine, in future pr, lets look into minimizing the script as much as possible, and if possible reduce all the unnecessary stuff and have clean pythonic API to do inference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this published already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not yet, but will be soon, before the PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not keep this at the conf/* level - move it to conf/*/** level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@titu1994 do we need to add LICENSE for configs and notebooks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to configs usually. Notebooks - hmm not needed there either (at least we haven't done it before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some info on what this config is for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: why not
-1
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we still want to use epoch based training and want the wandb to log epoch info, but megatron doesn't not work properly on LR scheduling with max_steps=-1, so we set a large max_steps to make sure LR is correct and also use max_epoch to log the epoch info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying, with epoch=-1, there is an issue while logging the epoch number to wandb?