@saforem2

Copilot Summary

This pull request updates the ALCF/README.md file to reflect changes in the main training script used for distributed training on ALCF systems. The previous script train_llama_alcf.sh has been replaced with train_aGPT_7B.sh throughout the documentation.

Key changes include:

  • Updated the main entry point for launching distributed training from train_llama_alcf.sh to train_aGPT_7B.sh.
  • Changed references to the script in the instructions for submitting jobs and setting up the environment. [1] [2] [3]
  • Modified example commands and code snippets to use train_aGPT_7B.sh instead of train_llama_alcf.sh. [1] [2] [3] [4] [5] [6]

saforem2 merged commit 3af7eb4 into main on Jan 27, 2025
1 check passed
saforem2 deleted the saforem2-patch-2 branch on January 27, 2025 at 00:33