
Conversation


@attafosu attafosu commented Feb 20, 2025

  • Previously, LLM.generate() could not be called multiple times with delayed sampling enabled.
  • The same was true for repeated step() calls.
  • The issue occurs when, after the last (batch) request has finished and a new request is started, cached_step_inputs and cached_step_outputs still contain elements saved from the previously served (batch) request. They should be empty at that point.
  • The cleanest solution would be to skip appending to cached_step_inputs/cached_step_outputs when the most recently generated output is the final token for the current (batch) request, but there is no clean way to check for this in the model runner.
  • Instead, we check (in _patch_prev_output) whether the scheduler context's output_queue is empty, which means there are no pending outputs to patch; a sketch of this guard follows below.
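A minimal sketch of the guard described above. Only `_patch_prev_output`, `cached_step_inputs`/`cached_step_outputs`, and the scheduler context's `output_queue` are taken from this PR; the surrounding attribute names and structure are illustrative assumptions, not the actual HPU model runner code.

```python
def _patch_prev_output(self) -> None:
    # Hypothetical handle to the scheduler context; the real attribute name
    # in the HPU model runner may differ.
    ctx = self.scheduler_context

    # An empty output_queue means there are no pending outputs to patch,
    # i.e. the previously served (batch) request has fully finished. Bail out
    # so that stale entries left in cached_step_inputs/cached_step_outputs
    # from that request are never applied to a new generate()/step() call.
    if not ctx.output_queue:
        return

    # ... the existing delayed-sampling patching logic would follow here,
    # consuming the cached step inputs/outputs saved on the previous step.
```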

Tests here: https://github.com/habana-internal/mlperf_inference/pull/158

@attafosu attafosu requested a review from mswiniarsk February 20, 2025 16:31
@tianmu-li tianmu-li merged commit 6eeefdd into HabanaAI:mlperf_features Feb 20, 2025
3 of 22 checks passed
kamil-kaczor added a commit that referenced this pull request Mar 5, 2025
Cherry-pick of: #845, fixing an issue in e.g. static benchmarks