Conversation

@madamczyk-intel
No description provided.

@michalkuligowski michalkuligowski left a comment

We need to plan dividing execute_model into smaller specialized methods in the future (it has more than 300 lines :D)
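The suggested refactor of a 300-plus-line `execute_model` into smaller specialized methods could look roughly like the sketch below. All names here (`_prepare_inputs`, `_run_forward`, `_sample`, the stand-in bodies) are hypothetical illustrations, not vLLM's actual internals:

```python
# Hypothetical sketch of splitting a monolithic execute_model into
# focused helper methods; names and bodies are illustrative only.
from dataclasses import dataclass, field


@dataclass
class ModelRunner:
    trace: list = field(default_factory=list)

    def execute_model(self, batch):
        # The orchestrator stays short: each phase lives in its own method,
        # which keeps the top-level control flow readable and testable.
        inputs = self._prepare_inputs(batch)
        hidden = self._run_forward(inputs)
        return self._sample(hidden)

    def _prepare_inputs(self, batch):
        self.trace.append("prepare")
        return [x * 2 for x in batch]  # stand-in for tensor preparation

    def _run_forward(self, inputs):
        self.trace.append("forward")
        return [x + 1 for x in inputs]  # stand-in for the model forward pass

    def _sample(self, hidden):
        self.trace.append("sample")
        return max(hidden)  # stand-in for token sampling


runner = ModelRunner()
out = runner.execute_model([1, 2, 3])
```

Each helper can then be unit-tested and profiled in isolation, which is the usual payoff of this kind of extraction.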

@kdamaszk kdamaszk merged commit 03df014 into habana_main Mar 7, 2025
34 checks passed
@kdamaszk kdamaszk deleted the dev/madamczyk/delayed_sampling branch March 7, 2025 07:20
@kamil-kaczor kamil-kaczor mentioned this pull request Mar 10, 2025
yangulei pushed a commit to yangulei/vllm-fork that referenced this pull request Mar 11, 2025
ranzhejiang pushed a commit to ranzhejiang/vllm-fork that referenced this pull request Apr 8, 2025
michalkuligowski pushed a commit that referenced this pull request Apr 8, 2025
This restores the callback calling order from before #849 was merged. This
reduces host overhead between decodes when neither MSS nor delayed
sampling are enabled.
madamczyk-intel added a commit that referenced this pull request Apr 8, 2025
This restores the callback calling order from before #849 was merged. This
reduces host overhead between decodes when neither MSS nor delayed
sampling are enabled.
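The follow-up commits above concern callback ordering. As a generic illustration of why the order matters (everything here is hypothetical, not vLLM's actual code): firing the step-completion callback before synchronous host-side post-processing lets the scheduler start preparing the next decode while the current step finishes, whereas a delayed callback leaves the host idle between decodes.

```python
# Illustrative sketch only: callback ordering in a decode loop.
# Names (decode_step, on_step_done, post_process) are made up for this example.
def decode_step(on_step_done, callback_first):
    events = []

    def post_process():
        # Stand-in for host-side work after a device step completes.
        events.append("post_process")

    def fire_callback():
        # Notifies the scheduler that it may begin preparing the next step.
        events.append("callback")
        on_step_done()

    if callback_first:
        fire_callback()   # next step's host work can overlap post-processing
        post_process()
    else:
        post_process()    # callback delayed; next-step prep waits unnecessarily
        fire_callback()
    return events
```

With `callback_first=True`, the scheduler is notified before post-processing runs, shrinking the host-side gap between consecutive decodes.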