Fix starvation with async server and interleaving optimization #13
Conversation
JoeZijunZhou commented Mar 15, 2024 (edited)
- Make the server async to handle high throughput from the client (the throughput of a sync server is limited by the number of server-side threads, since each thread handles one request in a blocking manner). See the channel sketch after this list.
- Add an AsyncMultiFuture data structure for the return_channel to stream response tokens asynchronously (thread safety ensures no tokens are dropped in the async server).
- Use blocking fixed-size queues to block and yield threads efficiently:
  - blocking get and put operations for the generate and detokenize queues
  - blocking get operation for the prefill queue
  - non-blocking put operation for the prefill queue (unbounded queue size, so all requests from the client can be enqueued)
- Optimize the interleaving of prefill, insert, and generate (see the interleaving sketch after this list):
  - When decode slots are empty, yield to prefill, then insert as many prefill results as possible into the decode slots.
  - When decode slots are full, block insert and prefill; keep decoding and detokenizing until some decode completes and frees a slot.
  - Set the generate queue size to 3x the decode batch size:
    - saturates the decode slots quickly (a larger generate queue ensures there are enough requests to insert into the slots)
    - avoids OOM when the decode slots and the generate queue are both saturated (the generate queue can't be too large, otherwise too many prefill results pile up in it)
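A minimal sketch of the AsyncMultiFuture idea (illustrative only; the class and method names below are placeholders, not the actual implementation): worker threads publish tokens from any thread via `call_soon_threadsafe`, and the async handler consumes them without dropping any.

```python
import asyncio


class TokenChannel:
    """Sketch of an AsyncMultiFuture-style return channel.

    Worker threads call add_result()/close() from any thread; the async
    server iterates the channel to stream tokens back to the client.
    """

    _DONE = object()  # sentinel marking end-of-stream

    def __init__(self) -> None:
        # Must be constructed on the event loop thread so we can capture it.
        self._loop = asyncio.get_running_loop()
        self._queue: asyncio.Queue = asyncio.Queue()

    def add_result(self, token: str) -> None:
        # Thread-safe: hop onto the event loop before touching asyncio
        # state, so no token is dropped even under heavy producer load.
        self._loop.call_soon_threadsafe(self._queue.put_nowait, token)

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._queue.put_nowait, self._DONE)

    def __aiter__(self):
        return self

    async def __anext__(self) -> str:
        item = await self._queue.get()
        if item is self._DONE:
            raise StopAsyncIteration
        return item
```

An async handler can then stream with `async for token in channel: yield token`, so the server thread count no longer bounds the number of in-flight requests.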
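And a hedged sketch of the queue wiring and interleaving policy described above. The queue names, the decode batch size, and the model-call stubs (`run_prefill`, `insert_into_slot`, `run_decode_step`, `completed_slots`) are all placeholders, not the actual JetStream identifiers.

```python
import queue

DECODE_BATCH = 32  # number of decode slots; illustrative value

# Unbounded: put() never blocks, so every client request can be enqueued.
prefill_queue: queue.Queue = queue.Queue()
# Bounded at 3x the decode batch: keeps decode slots saturated while
# capping how many prefill results can pile up (avoids OOM).
generate_queue: queue.Queue = queue.Queue(3 * DECODE_BATCH)
# Bounded at 8: decode never runs more than 8 steps ahead of detokenize.
detokenize_queue: queue.Queue = queue.Queue(8)


def run_prefill(request): ...  # placeholder model call
def insert_into_slot(slot, prefix): ...  # placeholder
def run_decode_step(): ...  # placeholder model call
def completed_slots(step_result): return []  # placeholder


def prefill_worker() -> None:
    while True:
        request = prefill_queue.get()  # blocks (and yields) until work arrives
        prefix = run_prefill(request)
        generate_queue.put(prefix)  # blocks when decode is saturated


def generate_worker() -> None:
    free_slots = list(range(DECODE_BATCH))
    while True:
        # Insert as many pending prefill results as there are free slots.
        while free_slots:
            try:
                prefix = generate_queue.get_nowait()
            except queue.Empty:
                break  # no prefill results ready; keep decoding
            insert_into_slot(free_slots.pop(), prefix)
        step_result = run_decode_step()
        detokenize_queue.put(step_result)  # blocks after 8 outstanding steps
        free_slots.extend(completed_slots(step_result))  # reclaim finished slots
```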
This CR cannot merge. It is driving contention (probably GIL), causing stalls.
We cannot merge these changes because they drive a duty cycle migration.
@@ -0,0 +1,89 @@
import asyncio
Where is this module used? I can't find any references.
Can you please add an xprof with the new changes so we can compare against the GIL-contention xprof that Rafi provided and rule out any contention issues?
Added an xprof in https://docs.google.com/document/d/1PWje3lup0ZHjtgbcWQbK8r1nNwgKE2rHBdVLbWU2aas/edit?tab=t.0#heading=h.f39bxdhl45e2 (since this thread is public, it's only shared in the internal doc).
Thanks for the stack trace, the fix looks good. I had a quick question about the detokenize queue:
# We don't let detokenization accumulate more than 8 steps to avoid
# synchronization issues.
queue.Queue(8)
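For context, a minimal standalone illustration (not project code) of the backpressure this bound provides: the decode thread blocks on `put()` once 8 items are outstanding, until detokenization drains the queue.

```python
import queue
import threading
import time

q: queue.Queue = queue.Queue(8)


def decode_thread() -> None:
    for step in range(12):
        q.put(step)  # blocks once 8 steps are outstanding
        print(f"decode step {step} enqueued")


threading.Thread(target=decode_thread, daemon=True).start()
time.sleep(0.5)  # decode races ahead until the queue fills at 8
for _ in range(12):
    print(f"detokenized step {q.get()}")  # each get() unblocks the producer
```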
How did you come up with 8? Is that hardware-dependent -- will this work for all configs (v5e-8, v5e-1, etc.)?
The JET has an investigation into the GIL contention issue and the detokenize queue size, and figured out that this size avoids the synchronization issue. It isn't hardware-dependent.
LGTM
Issue resolved and code reviewed.