GQA into Mitchich65 #443
…down what you fucking mean?
Updates to 70B config and checkpointing
I cannot approve this because I created the PR. I am OK with this, with two concerns in comments.
@@ -1047,6 +1049,11 @@ class TrainConfig(BaseConfig):
    The activation checkpointing strategy to use.
    """

    fused_loss: Optional[bool] = None
    """
    Whether to use the fused CE loss function from `flash-attn`.
What's the default?
Defaults to `False`.
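The field in the diff is declared as `Optional[bool] = None`, while the reply says it defaults to `False`. One way the two can be consistent is if `None` is treated as "not set" and resolved to `False` at the point of use. A minimal sketch of that reading follows; `TrainConfigSketch` and `resolve_fused_loss` are hypothetical names for illustration, not code from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainConfigSketch:
    # Hypothetical stand-in for the config class in the diff.
    fused_loss: Optional[bool] = None


def resolve_fused_loss(cfg: TrainConfigSketch) -> bool:
    # Assumption: an unset value (None) behaves like False.
    return bool(cfg.fused_loss)


default_value = resolve_fused_loss(TrainConfigSketch())
explicit_value = resolve_fused_loss(TrainConfigSketch(fused_loss=True))
```

If that is the intended behavior, stating it in the docstring would answer the question for future readers.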
olmo/eval/evaluator.py
Outdated
out[f"eval/ppl/{label}/CrossEntropyLoss"] = loss.item()
out[f"eval/ppl/{label}/Perplexity"] = torch.exp(loss).item()
This will make comparisons impossible?
Hmm yea could be difficult. It just annoys me these aren't grouped together in W&B. Reverted: 856860d
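For readers following the metric discussion: the two entries in the snippet are tied together, since perplexity is the exponential of the mean cross-entropy loss. A small sketch of that relationship, using plain floats and `math.exp` in place of the tensors in the snippet. The helper name `ppl_metrics` and the label `"c4"` are hypothetical, and the key layout is copied from the snippet purely for illustration.

```python
import math
from typing import Dict


def ppl_metrics(label: str, ce_loss: float) -> Dict[str, float]:
    # Perplexity is exp(cross-entropy), with the loss measured in nats.
    return {
        f"eval/ppl/{label}/CrossEntropyLoss": ce_loss,
        f"eval/ppl/{label}/Perplexity": math.exp(ce_loss),
    }


# "c4" is a placeholder label, not taken from the PR.
metrics = ppl_metrics("c4", 0.0)
```

Because dashboards and comparison scripts usually look runs up by these key strings, renaming them breaks comparisons against earlier runs, which is the concern raised above.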
try:
    mp.set_start_method("spawn", force=True)
except RuntimeError as e:
    print(f"failed to set multiprocessing start method: {e}")
I did not know this could fail?
Yea I saw this happen a couple times, not sure why exactly.
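The thread leaves the cause of the failure open. As a rough sketch of the same guard with one addition, the helper below reports which start method is actually active afterwards, which may help when debugging. It assumes `mp` is the standard library `multiprocessing` module (the diff does not show the import), and `ensure_start_method` is a hypothetical name.

```python
import multiprocessing as mp
from typing import Optional


def ensure_start_method(method: str = "spawn") -> Optional[str]:
    # Try to set the start method; log and continue if that fails.
    try:
        mp.set_start_method(method, force=True)
    except RuntimeError as e:
        print(f"failed to set multiprocessing start method: {e}")
    # Report the method in effect (None if nothing has been set).
    return mp.get_start_method(allow_none=True)


active = ensure_start_method("spawn")
```

Logging the returned value next to the error message would show whether the run fell back to a different start method.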