Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "adding OnlineSampleMapping" #8164

Merged
merged 1 commit into from
Jan 13, 2024

Conversation

pablo-garay
Copy link
Collaborator

@pablo-garay pablo-garay commented Jan 12, 2024

Reverts #8137

Nemo pipeline is broken (finetune tests failing) due to this. Hence revert.

More context/details:

  • two gpt3 fine-tunning tests failed on EOS (finetune.gpt3.126m_tp1_pp1_1node_100steps_squad and finetune.gpt3.400m_improved_tp1_pp1_1node_100steps_squad). Check out the error log, it’s not the fault of context parallelism. The error comes from dataset, which is introduced by this PR adding OnlineSampleMapping #8137

10:43

Above problematic PR was merged after my context parallelism PR. So for sanity check, I also ran a pipeline with the NeMo commit ID of context parallelism merging (i.e., without above problematic PR)
you can see those two fine-tuning test passed, so I think context parallelism code works fine

Copy link
Collaborator

@ericharper ericharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@ericharper
Copy link
Collaborator

jenkins

@pablo-garay pablo-garay force-pushed the revert-8137-adithyare/OnlineSampleMapping branch from 91d5685 to 301c326 Compare January 12, 2024 21:36
@arendu
Copy link
Collaborator

arendu commented Jan 13, 2024

Is there a CI test which uses context parallel? @pablo-garay @ericharper
cc @michalivne

@pablo-garay
Copy link
Collaborator Author

jenkins

@arendu
Copy link
Collaborator

arendu commented Jan 13, 2024

@xrennvidia yes but w/o CI its hard to debug/iterate on the PR.

@pablo-garay
Copy link
Collaborator Author

jenkins

@pablo-garay pablo-garay merged commit c30536d into main Jan 13, 2024
15 checks passed
@pablo-garay pablo-garay deleted the revert-8137-adithyare/OnlineSampleMapping branch January 13, 2024 23:41
@pablo-garay pablo-garay self-assigned this Jan 13, 2024
@NVIDIA NVIDIA deleted a comment from pablo-garay Jan 14, 2024
@NVIDIA NVIDIA deleted a comment from xrennvidia Jan 14, 2024
@NVIDIA NVIDIA deleted a comment from arendu Jan 14, 2024
@NVIDIA NVIDIA deleted a comment from xrennvidia Jan 14, 2024
@NVIDIA NVIDIA deleted a comment from arendu Jan 14, 2024
minitu pushed a commit to minitu/NeMo that referenced this pull request Jan 19, 2024
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024
Signed-off-by: Pablo Garay <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>
pablo-garay added a commit that referenced this pull request Mar 19, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants