Conversation

@madamczyk-intel
No description provided.

@michalkuligowski michalkuligowski left a comment

We need to plan dividing execute_model into smaller specialized methods in the future (it has more than 300 lines :D)
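The suggested refactor of a 300-plus-line `execute_model` into smaller specialized methods could look roughly like the sketch below. All names here (`_prepare_inputs`, `_run_forward`, `_sample`, the stand-in bodies) are hypothetical illustrations, not vLLM's actual internals:

```python
# Hypothetical sketch of splitting a monolithic execute_model into
# focused helper methods; names and bodies are illustrative only.
from dataclasses import dataclass, field


@dataclass
class ModelRunner:
    trace: list = field(default_factory=list)

    def execute_model(self, batch):
        # The orchestrator stays short: each phase lives in its own method,
        # which keeps the top-level control flow readable and testable.
        inputs = self._prepare_inputs(batch)
        hidden = self._run_forward(inputs)
        return self._sample(hidden)

    def _prepare_inputs(self, batch):
        self.trace.append("prepare")
        return [x * 2 for x in batch]  # stand-in for tensor preparation

    def _run_forward(self, inputs):
        self.trace.append("forward")
        return [x + 1 for x in inputs]  # stand-in for the model forward pass

    def _sample(self, hidden):
        self.trace.append("sample")
        return max(hidden)  # stand-in for token sampling


runner = ModelRunner()
out = runner.execute_model([1, 2, 3])
```

Each helper can then be unit-tested and profiled in isolation, which is the usual payoff of this kind of extraction.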

@kdamaszk kdamaszk merged commit 03df014 into habana_main Mar 7, 2025
34 checks passed
@kdamaszk kdamaszk deleted the dev/madamczyk/delayed_sampling branch March 7, 2025 07:20
@kamil-kaczor kamil-kaczor mentioned this pull request Mar 10, 2025
yangulei pushed a commit to yangulei/vllm-fork that referenced this pull request Mar 11, 2025
ranzhejiang pushed a commit to ranzhejiang/vllm-fork that referenced this pull request Apr 8, 2025
michalkuligowski pushed a commit that referenced this pull request Apr 8, 2025
This restores the callback calling order from before #849 was merged. This
reduces host overhead between decodes when neither MSS nor delayed
sampling are enabled.
madamczyk-intel added a commit that referenced this pull request Apr 8, 2025
This restores the callback calling order from before #849 was merged. This
reduces host overhead between decodes when neither MSS nor delayed
sampling are enabled.
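The follow-up commits above concern callback ordering. As a generic illustration of why the order matters (everything here is hypothetical, not vLLM's actual code): firing the step-completion callback before synchronous host-side post-processing lets the scheduler start preparing the next decode while the current step finishes, whereas a delayed callback leaves the host idle between decodes.

```python
# Illustrative sketch only: callback ordering in a decode loop.
# Names (decode_step, on_step_done, post_process) are made up for this example.
def decode_step(on_step_done, callback_first):
    events = []

    def post_process():
        # Stand-in for host-side work after a device step completes.
        events.append("post_process")

    def fire_callback():
        # Notifies the scheduler that it may begin preparing the next step.
        events.append("callback")
        on_step_done()

    if callback_first:
        fire_callback()   # next step's host work can overlap post-processing
        post_process()
    else:
        post_process()    # callback delayed; next-step prep waits unnecessarily
        fire_callback()
    return events
```

With `callback_first=True`, the scheduler is notified before post-processing runs, shrinking the host-side gap between consecutive decodes.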