Merged
8 changes: 8 additions & 0 deletions python/sglang/srt/model_executor/model_runner.py
@@ -2379,6 +2379,14 @@ def init_piecewise_cuda_graphs(self):
        # Collect attention layers and moe layers from the model
        self.model.model = resolve_language_model(self.model)
        language_model = getattr(self.model, "language_model", self.model)

Collaborator

Remove this line.

        # Some draft models (e.g. eagle3) don't have a standard 'layers' attribute
Collaborator

Could you move this change up to the original line 2371?

Contributor Author

Do you mean Gemini's suggestion is wrong?

Collaborator

Yes. The robustness problem will be fixed later.

Contributor Author

@Oasis-Git, line 2371 doesn't have language_model yet, though, and we need it to test language_model.model. We could move the language_model resolution to before the enable piecewise_cuda_graph check, but I want to confirm whether this matches your motivation.

Collaborator

Oh, I see. Sorry, I made a mistake. What I want to keep is that the disable conditions are gathered in one place for readability. Keep your first version. The disable conditions should sit above the initialization of those layers.

Contributor Author

Sounds good. Updated. Could you please take a look again?

        if not hasattr(language_model.model, "layers"):
            logger.warning(
                "Disable piecewise CUDA graph because the model does not have a 'layers' attribute"
            )
            return

        self.attention_layers = []
        self.moe_layers = []
        self.moe_fusions = []