Some fixes to bump mcore #11600

@ko3n1g ko3n1g commented Dec 15, 2024

🚀 PR to bump NVIDIA/Megatron-LM in Dockerfile.ci to MCORE_TAG=71c394b172ce4f1b9466e1d728bad5cc6314d15d.

📝 Please complete the following to-dos before merging:

  • Verify the presubmit CI

🙏 Please merge this PR only if the CI workflow completed successfully.

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
@github-actions github-actions bot added the CI label Dec 16, 2024
@ko3n1g ko3n1g removed the Run CICD label Dec 17, 2024
@github-actions github-actions bot removed the CI label Dec 17, 2024
Signed-off-by: Oliver Koenig <[email protected]>
@ko3n1g ko3n1g changed the title chore(beep boop 🤖): Bump MCORE_TAG=71c394b... (2024-12-15) Some fixes to bump mcore Dec 21, 2024
@ko3n1g ko3n1g changed the base branch from main to bump-ci-container--NVIDIA-Megatron-LM-2025-01-03 January 3, 2025 10:36
…bump-ci-container--NVIDIA-Megatron-LM-2024-12-15

Signed-off-by: oliver könig <[email protected]>
@ko3n1g ko3n1g merged commit 778ea30 into bump-ci-container--NVIDIA-Megatron-LM-2025-01-03 Jan 3, 2025
9 of 14 checks passed
@ko3n1g ko3n1g deleted the bump-ci-container--NVIDIA-Megatron-LM-2024-12-15 branch January 3, 2025 10:37
ko3n1g added a commit that referenced this pull request Jan 7, 2025
* chore(beep boop 🤖): Bump `MCORE_TAG=076972e...` (2025-01-03)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Support attention backend configuration changes (#11517)

* remove nvte attention flags from test_nemo_resume_from_ckpt

Signed-off-by: Ananth Subramaniam <[email protected]>

* cherry pick 3410df6

Signed-off-by: Ananth Subramaniam <[email protected]>

* set local attention in config

Signed-off-by: Ananth Subramaniam <[email protected]>

* retro config attention backend setting

Signed-off-by: Ananth Subramaniam <[email protected]>

* set both

Signed-off-by: Ananth Subramaniam <[email protected]>

* update unfused

Signed-off-by: Ananth Subramaniam <[email protected]>

* gemma2b changes too

Signed-off-by: Ananth Subramaniam <[email protected]>

* replace more usages

Signed-off-by: Ananth Subramaniam <[email protected]>

* more test updates

Signed-off-by: Ananth Subramaniam <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ananthsub <[email protected]>

* update unfused

Signed-off-by: Ananth Subramaniam <[email protected]>

* remove duplicate gemma setting

Signed-off-by: Ananth Subramaniam <[email protected]>

* remove gemma2b fused attn env vars

Signed-off-by: Ananth Subramaniam <[email protected]>

* local for testing

Signed-off-by: Ananth Subramaniam <[email protected]>

* update conftest to reset environment variables, use unfused for L2_Megatron_GPT_PEFT_Lora_TP2SP1

Signed-off-by: Ananth Subramaniam <[email protected]>

---------

Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: ananthsub <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: ananthsub <[email protected]>
Co-authored-by: oliver könig <[email protected]>

* Some fixes to bump mcore (#11600)

* chore(beep boop 🤖): Bump `MCORE_TAG=71c394b...` (2024-12-15)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* ci: Add `no-fail-fast` mode

Signed-off-by: Oliver Koenig <[email protected]>

* fix _get_layer_offset api for mllama

Signed-off-by: yaoyu-33 <[email protected]>

* bump

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>

* Use empty dict instead of none to load only metadata from dist ckpt

due to change in mcore commit NVIDIA/Megatron-LM@31e8bfa

Signed-off-by: Chen Cui <[email protected]>

* remove mcore-inserted env vars

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* Add raising=False for delenv

Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: ananthsub <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pablo-garay <[email protected]>
Co-authored-by: Ananth Subramaniam <[email protected]>
Co-authored-by: ananthsub <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Abhishree <[email protected]>
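
The commit message above mentions resetting environment variables in conftest and adding `raising=False` for `delenv`. That wording suggests pytest's `monkeypatch.delenv(name, raising=False)`, which skips the `KeyError` when the variable is not set. Below is a minimal standard-library sketch of the same "delete if present, never raise" behavior. The helper name `cleared_env` and the variable list are hypothetical illustrations; the PR does not show which variables it resets or how the conftest is written.

```python
import os
from contextlib import contextmanager

# Hypothetical list for illustration only; the PR text does not name
# the variables that the conftest resets.
ATTENTION_ENV_VARS = ("NVTE_FLASH_ATTN", "NVTE_FUSED_ATTN", "NVTE_UNFUSED_ATTN")


@contextmanager
def cleared_env(names=ATTENTION_ENV_VARS):
    """Temporarily remove the given variables, restoring prior values on exit."""
    # Remember the current value (or None if unset) for each variable.
    saved = {name: os.environ.get(name) for name in names}
    try:
        for name in names:
            # pop() with a default never raises KeyError when the variable
            # is absent, which is the behavior raising=False asks for.
            os.environ.pop(name, None)
        yield
    finally:
        # Put every variable back the way it was before entering.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
```

Wrapping a test body in `with cleared_env():` would keep a variable set by one test from leaking into the next, regardless of whether the variable was set to begin with.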
abhinavg4 pushed a commit that referenced this pull request Jan 30, 2025
Signed-off-by: Abhinav Garg <[email protected]>
youngeunkwon0405 pushed a commit to youngeunkwon0405/NeMo that referenced this pull request Feb 10, 2025
Signed-off-by: Youngeun Kwon <[email protected]>
3 participants