Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Slurm Launcher #59

Merged
merged 4 commits into from
Jan 19, 2024
Merged

Update Slurm Launcher #59

merged 4 commits into from
Jan 19, 2024

Conversation

haileyschoelkopf
Copy link

This PR changes the the SLURM launcher in two ways:

  1. Due to what looks like a unintended consequence of a merge commit, self.args.slurm_comment was still referenced, and the launcher would include --comment even when self.args.comment was an empty string.
  2. We now want to be using --account rather than --comment for assigning accounting groups to a given job. (if this is too specific to the Stability cluster, happy to find some way to allow users to toggle between using --account and --comment

I've left the comment argument in GPT-NeoX with its same name, but can update it to something like slurm_account if that is preferred.

cc @Quentin-Anthony

@Quentin-Anthony
Copy link
Member

if this is too specific to the Stability cluster
This is a bit too specific and is misleading since we define a slurm comment arg, then pass it as a slurm account:

srun_cmd += ['--account', self.args.comment]

Slurm defines both an account and a comment arg, so let's allow for either/both? Let's add a new account argument, preserve the current comment arg, and expose both in gpt-neox

@haileyschoelkopf
Copy link
Author

Makes sense to me! Will adjust.

@haileyschoelkopf
Copy link
Author

Updated and added PR to NeoX here: EleutherAI/gpt-neox#1126

@Quentin-Anthony Quentin-Anthony merged commit a1af9e7 into main Jan 19, 2024
9 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants