[sgl] disable piecewise cuda graph when a model doesn't have layers #21565
```diff
@@ -2379,6 +2379,14 @@ def init_piecewise_cuda_graphs(self):
     # Collect attention layers and moe layers from the model
     self.model.model = resolve_language_model(self.model)
     language_model = getattr(self.model, "language_model", self.model)
+
+    # Some draft models (e.g. eagle3) don't have a standard 'layers' attribute
+    if not hasattr(language_model.model, "layers"):
+        logger.warning(
+            "Disable piecewise CUDA graph because the model does not have a 'layers' attribute"
+        )
+        return
+
     self.attention_layers = []
     self.moe_layers = []
     self.moe_fusions = []
```

**Collaborator:** Could you move the change to the original line 2371?

**Contributor (Author):** You mean Gemini's suggestion is wrong?

**Collaborator:** Yes. The robustness problem will be fixed later.

**Contributor (Author):** @Oasis-Git, line 2371 doesn't have `language_model` yet, though, and we need it to test `language_model.model`. We could move `resolve_language_model` before the piecewise CUDA graph enable check instead, but I want to confirm that matches your motivation.

**Collaborator:** Oh, I see. Sorry, I made a mistake. What I want to keep is that the disable conditions are gathered together for readability. Keep your first version; the disable conditions should sit above the initialization of those layers.

**Contributor (Author):** Sounds good, updated. Could you please take a look again?
**Reviewer:** remove this line
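The guard added in this PR follows a simple pattern: check the disable conditions up front and return early before any layer collection begins. The sketch below is a hypothetical, self-contained illustration of that pattern; the class names `DraftModel` and `FullModel` and the boolean return value are assumptions for demonstration and are not part of the actual sglang code.

```python
# Minimal sketch (assumed names, not sglang's real classes) of the early-return
# guard: skip piecewise CUDA graph setup when a model has no 'layers' attribute.
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)


class DraftModel:
    """Stand-in for a draft model (e.g. eagle3) without a 'layers' attribute."""


class FullModel:
    """Stand-in for a standard model that exposes 'layers'."""

    def __init__(self):
        self.layers = ["layer0", "layer1"]


def init_piecewise_cuda_graphs(model):
    # Gather the disable conditions up front, before initializing any
    # layer collections (the readability point raised in the review).
    if not hasattr(model, "layers"):
        logger.warning(
            "Disable piecewise CUDA graph because the model does not have a "
            "'layers' attribute"
        )
        return False  # piecewise CUDA graph stays disabled

    # ... collect attention/MoE layers and capture graphs here ...
    return True


print(init_piecewise_cuda_graphs(DraftModel()))  # guard trips, prints False
print(init_piecewise_cuda_graphs(FullModel()))   # prints True
```

Keeping all disable conditions in one place, above the layer initialization, means a reader can see at a glance every case in which piecewise CUDA graphs are turned off.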