adjust schedule to improve TTFT in pytorch engine #2477
Conversation
```diff
@@ -636,7 +631,11 @@ async def __long_context_single_forward(inputs):
     ret['logits'] = ret['logits'][:, last_token_loc]
     return ret
 else:
-    return await __long_context_single_forward(inputs)
+    ret = await __long_context_single_forward(inputs)
+    if not return_logits and not inputs.is_decoding:
```
what benefits can we get from this change?
lmdeploy/lmdeploy/pytorch/engine/engine.py, lines 518 to 525 in 9cb2583:

```python
def __get_last_logits():
    """get last logits."""
    seq_length = inputs.seq_length
    if len(seq_length) == logits.size(0):
        return logits
    last_idx = seq_length.cumsum(-1) - 1
    return logits[last_idx, :]
```
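The indexing trick in the quoted helper can be sketched outside of torch. This NumPy re-implementation (function and variable names are illustrative, not lmdeploy's API) shows how `cumsum(seq_length) - 1` picks each sequence's final token out of a packed `(total_tokens, vocab)` logits array:

```python
import numpy as np

def last_token_logits(logits: np.ndarray, seq_length: np.ndarray) -> np.ndarray:
    """Pick each sequence's last-token logits from a packed (total_tokens, vocab) batch."""
    if len(seq_length) == logits.shape[0]:
        # decoding step: one token per sequence, logits are already last-token only
        return logits
    last_idx = np.cumsum(seq_length) - 1  # final-token index of every sequence
    return logits[last_idx, :]

# two packed sequences of lengths 3 and 2 over a vocab of 4
logits = np.arange(20, dtype=np.float32).reshape(5, 4)
out = last_token_logits(logits, np.array([3, 2]))
```

With lengths `[3, 2]`, the cumulative sums are `[3, 5]`, so rows 2 and 4 of the packed batch are selected.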
```diff
-    return await __long_context_single_forward(inputs)
+    ret = await __long_context_single_forward(inputs)
+    if not return_logits and not inputs.is_decoding:
+        last_token_loc = [-1]
```
Does it influence the following computation?
`__long_context_single_forward` is used for very long prefills; only the last output is required.
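As a rough illustration of why slicing to the last token helps (everything here is a hypothetical sketch, not lmdeploy's actual implementation): a chunked long-context prefill only needs the final token's logits to sample the first generated token, so returning a `(1, vocab)` slice avoids keeping full `(seq_len, vocab)` logits alive:

```python
import numpy as np

def long_context_forward(tokens, vocab=8, chunk=4):
    """Hypothetical chunked prefill: run the model chunk by chunk, then keep
    only the final token's logits, which is all prefill needs for sampling."""
    last_logits = None
    for start in range(0, len(tokens), chunk):
        piece = tokens[start:start + chunk]
        # stand-in for a model forward pass over this chunk
        last_logits = np.ones((len(piece), vocab), dtype=np.float32)
    # slice before returning so the full (seq_len, vocab) array is never retained
    return last_logits[-1:, :]

out = long_context_forward(list(range(10)))
```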
```diff
     loop_count = 1 if is_prefill else (prefill_interval - 1)
-    if len(running) == 0:
-        raise NoRunningSeqs()
+    assert len(running) > 0
```
how do we make sure it is True?
lmdeploy/lmdeploy/pytorch/engine/engine.py, lines 934 to 936 in 9cb2583:

```python
if not self.scheduler.has_unfinished():
    await asyncio.sleep(0.01)
    continue
```
Empty requests would be skipped by the loop above, and the scheduler prefills enough requests before decoding, so `running` cannot be empty when the assertion is reached.
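A minimal sketch of that alternation, using the `loop_count` rule shown in the diff (the function and its arguments are illustrative, not lmdeploy's API): one prefill step when new requests are waiting, otherwise up to `prefill_interval - 1` consecutive decode steps before rechecking, which is what lets new arrivals get prefilled sooner and improves TTFT:

```python
def step_plan(prefill_interval: int, has_waiting: bool):
    """Illustrative only: pick the next engine step kind and how many
    times to repeat it before re-checking for newly arrived requests."""
    is_prefill = has_waiting
    loop_count = 1 if is_prefill else (prefill_interval - 1)
    return ("prefill" if is_prefill else "decode", loop_count)
```

For example, with `prefill_interval=16`, a waiting request triggers a single prefill step, while an idle queue lets the engine run 15 decode steps back to back.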