add alpha scaling to lora (#8483)
* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>
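The DreamBooth sampler fix is summarized only as "Turn on drop last". The sketch below is a rough, pure-Python illustration of the usual `drop_last` semantics (the behavior of the flag in PyTorch's `BatchSampler` and `DataLoader`). When the dataset size is not a multiple of the batch size, the final short batch is discarded, so every batch has the same size. `make_batches` is a hypothetical helper, not NeMo code, and the commit does not say why the DreamBooth sampler needed uniform batches.

```python
from typing import List


def make_batches(num_samples: int, batch_size: int, drop_last: bool) -> List[List[int]]:
    """Split sample indices 0..num_samples-1 into consecutive batches."""
    batches = [
        list(range(start, min(start + batch_size, num_samples)))
        for start in range(0, num_samples, batch_size)
    ]
    # With drop_last, discard a trailing batch that is smaller than batch_size.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


if __name__ == "__main__":
    print(make_batches(10, 4, drop_last=False))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    print(make_batches(10, 4, drop_last=True))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With `drop_last=False`, code that assumes a fixed batch shape has to handle the ragged final batch. Turning the flag on trades away a few samples per epoch so that case never occurs.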

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
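The Canary note says the tarred-manifest conversion now passes `ensure_ascii=False`. The snippet below only illustrates the Python standard-library behavior behind that flag. With the default `ensure_ascii=True`, `json.dumps` escapes every non-ASCII character as a `\uXXXX` sequence. With `ensure_ascii=False`, it writes the characters themselves. The sample record is made up, and the commit does not spell out exactly how the escaped form interfered with UTF-8 tokenizers.

```python
import json

# Hypothetical manifest entry containing non-ASCII text.
record = {"audio_filepath": "a.wav", "text": "Grüße, 你好"}

# Default behavior: non-ASCII characters become \uXXXX escape sequences.
escaped = json.dumps(record)

# ensure_ascii=False keeps the original characters in the output string.
raw = json.dumps(record, ensure_ascii=False)

print(escaped)  # {"audio_filepath": "a.wav", "text": "Gr\u00fc\u00dfe, \u4f60\u597d"}
print(raw)      # {"audio_filepath": "a.wav", "text": "Grüße, 你好"}
```

Both strings parse back to the same dictionary with `json.loads`, so the difference lies only in the serialized text. When writing the `ensure_ascii=False` form to a manifest file, open the file with `encoding="utf-8"`.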

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>
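The commit title describes adding alpha scaling to LoRA. As a hedged illustration of what that scaling usually means (the convention from the original LoRA paper, not necessarily the exact code in this commit), the low-rank update `B·A·x` is multiplied by `alpha / r` before being added to the frozen projection `W·x`, where `r` is the adapter rank. The function `lora_forward` and its argument names are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Hypothetical LoRA forward pass: h = W x + (alpha / r) * B (A x).

    W: frozen weight, shape (d_out, d_in)
    A: down-projection, shape (r, d_in)
    B: up-projection, shape (d_out, r)
    """
    rank = len(A)                    # r is the number of rows of A
    scale = alpha / rank             # assumed alpha / r convention from the LoRA paper
    base = matvec(W, x)              # output of the frozen layer
    delta = matvec(B, matvec(A, x))  # low-rank adapter path
    return [b + scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]  # rank r = 1
    B = [[1.0], [0.0]]
    x = [1.0, 2.0]
    print(lora_forward(W, A, B, x, alpha=1.0))  # alpha == r, scale 1.0 -> [4.0, 2.0]
    print(lora_forward(W, A, B, x, alpha=2.0))  # alpha == 2r, scale 2.0 -> [7.0, 2.0]
```

The LoRA paper motivates the `alpha / r` factor as a way to reduce hyperparameter retuning when the rank changes. Setting `alpha = r` makes the scale 1 and reproduces unscaled LoRA. Whether that is what the "default for alpha" item above sets is an assumption, not something this page states.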

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
1 parent 37d79d5 commit f655aaa
Showing 64 changed files with 3,355 additions and 725 deletions.
Dockerfile: 8 changes (4 additions, 4 deletions)
@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:23.12-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:24.01-py3

# build an image that includes only the nemo dependencies, ensures that dependencies
# are included first for optimal caching, and useful for building a development
@@ -66,19 +66,19 @@ WORKDIR /workspace/
# We leave it here in case we need to work off of a specific commit in main
RUN git clone https://github.com/NVIDIA/Megatron-LM.git && \
cd Megatron-LM && \
-git checkout 27cbe46714a50c43ed290f1b1472db8d2780c55c && \
+git checkout 240a8ef7a21df201e47b5b2ae33cc5f4c5486849 && \
pip install .

# Performance optimizations for distributed optimizer: https://github.com/NVIDIA/apex/pull/1771
RUN git clone https://github.com/NVIDIA/apex.git && \
cd apex && \
-git checkout b496d85fb88a801d8e680872a12822de310951fd && \
+git checkout f058162b215791b15507bb542f22ccfde49c872d && \
pip install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam" ./

# Transformer Engine 1.2.0
RUN git clone https://github.com/NVIDIA/TransformerEngine.git && \
cd TransformerEngine && \
-git fetch origin 4f9662fbe621671f5f905e772fc1138953af77f6 && \
+git fetch origin da30634a6c9ccdbb6c587b6c93b1860e4b038204 && \
git checkout FETCH_HEAD && \
git submodule init && git submodule update && \
NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .