Conversation
Thanks for the PR! The general approach LGTM, but I do have one question. Have you verified that the attention mask is correct here? When I step through the forward method, it looks like we end up calling scaled dot product attention here, and from what I can see, …
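One way to check a question like this is against a small reference implementation of scaled dot product attention. The sketch below is illustrative only (it is not this PR's code, and the boolean-mask convention it assumes — `True` means "may attend" — should be checked against the attention backend actually in use):

```python
import numpy as np

def sdpa(q, k, v, mask):
    """Reference scaled dot product attention with a boolean mask.
    mask[i, j] == True means query i may attend to key j; masked
    positions are set to -inf before the softmax."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # (Lq, Lk) attention logits
    scores = np.where(mask, scores, -np.inf)    # block disallowed positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Sanity check with a causal (lower-triangular) mask.
L, d = 4, 8
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
causal = np.tril(np.ones((L, L), dtype=bool))
out = sdpa(q, k, v, causal)

# Query 0 may only attend to key 0, so its output must equal v[0] exactly.
assert np.allclose(out[0], v[0])
```

Comparing the production mask's output against a reference like this on a tiny input is a quick way to confirm which convention (True-means-attend vs. True-means-masked) the kernel expects.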
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
…{Dict, List, Tuple} to primitive dict, list, tuple
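For reference, this commit's change corresponds to PEP 585 (Python 3.9+), which lets the builtin containers be used as generic types directly, so `typing.Dict`, `typing.List`, and `typing.Tuple` can be dropped. A minimal illustration (function and variable names here are made up, not from this PR):

```python
# Before (typing module):   def count_tokens(batches: List[List[int]]) -> Dict[int, int]
# After (PEP 585 builtins):
def count_tokens(batches: list[list[int]]) -> dict[int, int]:
    """Count occurrences of each token id across all batches."""
    counts: dict[int, int] = {}
    for batch in batches:
        for tok in batch:
            counts[tok] = counts.get(tok, 0) + 1
    return counts

pair: tuple[int, int] = (1, 2)  # tuple[...] replaces typing.Tuple[...]
assert count_tokens([[1, 1, 2]]) == {1: 2, 2: 1}
```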
Signed-off-by: Terry Kong <terryk@nvidia.com>
force-pushed from 88cefb4 to e7e4038
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Convergence graphs and logs are pending some runs.
force-pushed from aeb36e9 to f2aa89b
@ahmadki
force-pushed from f2aa89b to e078be1
resolved
force-pushed from 363c0bc to 0d2f2c0
pad_packed_seq_to=pad_full_seq_to,
)
)
input_ids = input_ids
@SahilJain314 Do you mind taking another look at this code block?
`input_ids = input_ids` doesn't make sense.
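To make the reviewer's point concrete: rebinding a name to itself is a pure no-op, so the line either can be deleted or was probably meant to apply some transformation to `input_ids`. A small standalone demonstration (the token ids here are hypothetical):

```python
input_ids = [101, 2009, 2003, 102]  # hypothetical token ids

before = id(input_ids)
input_ids = input_ids  # the statement under review: rebinds the same object
after = id(input_ids)

# Nothing changed: same object identity, same contents. Linters such as
# pylint flag this pattern as self-assignment.
assert before == after
assert input_ids == [101, 2009, 2003, 102]
```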
force-pushed from 929c9bb to 344275c
Closing since the newer PR #704 makes this one obsolete.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information
Convergence looks good between Megatron with sequence packing (gray) and Megatron without it.
Additional runs can be found in this wandb project.
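For context, a minimal sketch of what sequence packing does: concatenate variable-length samples into one buffer and record cumulative sequence lengths, which attention kernels can use to keep samples from attending across boundaries. The function and parameter names below (`pack_sequences`, `cu_seqlens`, `pad_to`) are illustrative assumptions, not this PR's actual API:

```python
def pack_sequences(samples, pad_to=None, pad_id=0):
    """Concatenate token-id lists into one packed buffer.

    Returns the packed buffer and cu_seqlens, the cumulative sequence
    lengths marking where each original sample starts and ends.
    """
    packed = [tok for sample in samples for tok in sample]
    cu_seqlens = [0]
    for sample in samples:
        cu_seqlens.append(cu_seqlens[-1] + len(sample))
    if pad_to is not None:
        packed += [pad_id] * (pad_to - len(packed))  # pad to a fixed length
    return packed, cu_seqlens

packed, cu_seqlens = pack_sequences([[1, 2], [3]], pad_to=5, pad_id=0)
assert packed == [1, 2, 3, 0, 0]
assert cu_seqlens == [0, 2, 3]
```

The `cu_seqlens` boundaries are what a block-diagonal attention mask (or a varlen attention kernel) is built from, which is why getting the mask right matters for convergence with packing enabled.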