Commit 0d003f8
[KVCache] Add implicit KVCache reuse, disable stateful option for chatCompletion (#359)
We introduced the field `stateful` in `chatCompletion()` earlier to
allow easier multi-round chatting in
#330.
However, this is not ideal since we would prefer APIs that are
functional in behavior, giving us various benefits (e.g. better fault
tolerance for future use cases).
Therefore, in this PR:
- We disable `chatCompletionRequest.stateful` and ask users to maintain
the chat history explicitly
- Instead, we introduce implicit KVCache reuse for multi-round chatting
- When we detect that the user is doing multi-round chatting, we do not reset
the KV cache, so only the new message is prefilled
- To detect multi-round chatting, we instantiate a `Conversation`
instance for each request and compare it with the current internal
`Conversation`. If they match, it is safe to keep the internal state
and prefill only the new input.
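A minimal sketch of the reuse-or-reset decision described above, assuming simplified message shapes and assuming the last message in a request is the new user input. `Message`, `ConversationState`, `ChatRequest`, `planPrefill` and `conversationsMatch` are hypothetical names, not web-llm's API; the commit also folds function-calling usage into the comparison, which this sketch leaves out.

```typescript
// Hypothetical, simplified shapes; not web-llm's actual types.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface ConversationState {
  systemMessage: string;
  messages: Message[]; // rounds already prefilled into the KV cache
}

interface ChatRequest {
  systemMessage: string;
  messages: Message[]; // full history, ending with the new user message
}

interface PrefillPlan {
  reuseKVCache: boolean; // true: keep the KV cache; false: reset internal state
  toPrefill: Message[]; // messages that still need to be prefilled
}

function sameMessages(a: Message[], b: Message[]): boolean {
  return (
    a.length === b.length &&
    a.every((m, i) => m.role === b[i].role && m.content === b[i].content)
  );
}

function conversationsMatch(a: ConversationState, b: ConversationState): boolean {
  return a.systemMessage === b.systemMessage && sameMessages(a.messages, b.messages);
}

// Decide whether the cached state can be kept for this request.
function planPrefill(
  current: ConversationState | null,
  request: ChatRequest,
): PrefillPlan {
  // The conversation this request implies, minus the new input.
  const incoming: ConversationState = {
    systemMessage: request.systemMessage,
    messages: request.messages.slice(0, -1),
  };
  if (current !== null && conversationsMatch(current, incoming)) {
    // Same history as the cache: only the newest message is prefilled.
    return { reuseKVCache: true, toPrefill: request.messages.slice(-1) };
  }
  // History diverged (or nothing is cached): reset and prefill everything.
  return { reuseKVCache: false, toPrefill: request.messages };
}
```

Comparing the whole conversation, not just the number of turns, is what makes the reuse safe: an edited earlier turn or a changed system prompt forces a reset.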
To see the behavior, check out `mainMultiroundChat()` in
`examples/openai-api/src/openai_api.ts`.
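With `stateful` disabled, the caller owns the history. The sketch below shows that pattern in isolation: each round sends the full message list and returns a new history that includes the reply. `ChatFn` and `sendTurn` are hypothetical; `ChatFn` stands in for the engine's `chatCompletion()`, whose real request and response types are richer than a bare string.

```typescript
// Hypothetical, simplified OpenAI-style message and request shapes.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionRequest {
  messages: ChatMessage[];
}

// Stand-in for the engine call; assumed to resolve to the assistant's reply text.
type ChatFn = (request: ChatCompletionRequest) => Promise<string>;

// One round: send the full history plus the new user message,
// then return a new history that also contains the reply.
async function sendTurn(
  chat: ChatFn,
  history: ChatMessage[],
  userInput: string,
): Promise<ChatMessage[]> {
  const request: ChatCompletionRequest = {
    messages: [...history, { role: "user", content: userInput }],
  };
  const reply = await chat(request);
  return [...request.messages, { role: "assistant", content: reply }];
}
```

Although every request resends the whole history, the implicit reuse introduced here means only the new user message is prefilled as long as the earlier turns are unchanged.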
Implementation details:
- Instantiate the `Conversation` object in `ChatModule.prefill()`, since
this is where the various workflows meet (streaming,
non-streaming, n > 1, etc.)
- The object's state is determined by the system prompt, the message
history, and function-calling usage
- Inside `prefill()`, we then compare the two objects with
`compareConversationObject()` and reset all internal state if they differ
- Another detail is that, instead of overriding
`conversation.config.system_message`, we add a field
`conversation.override_system_message`, making `conversation.config`
protected
- We further remove all methods in `ChatModule` that override
`this.getPipeline().conversation` by changing
`updateConversationWithChatCompletionMessages()` to
`getConversationFromChatCompletionRequest()`, keeping things more
functional internally

1 parent 0ca0c58
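The implementation details above can be sketched as follows. `Conversation`, `override_system_message`, `compareConversationObject()` and `getConversationFromChatCompletionRequest()` are names from this commit, but the bodies and signatures here are assumptions, as are the fields `use_function_calling` and `function_string` (hypothetical stand-ins for function-calling usage) and the rule that the final message is the new input and is left out of the object. The shared `config` is never written; a per-request system prompt goes into `override_system_message` instead.

```typescript
// Hypothetical template config shared across requests.
interface ConvConfig {
  readonly system_message: string;
}

interface RequestMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

class Conversation {
  // Never mutated per request; overrides live in their own fields.
  protected readonly config: ConvConfig;
  override_system_message: string | undefined = undefined;
  use_function_calling = false; // hypothetical function-calling state
  function_string = ""; // hypothetical function-calling state
  messages: Array<[string, string]> = []; // [role, content]

  constructor(config: ConvConfig) {
    this.config = config;
  }

  // Effective system message: the per-request override if set, else the template default.
  getSystemMessage(): string {
    return this.override_system_message ?? this.config.system_message;
  }
}

// True when two conversations imply the same prefilled state.
function compareConversationObject(a: Conversation, b: Conversation): boolean {
  return (
    a.getSystemMessage() === b.getSystemMessage() &&
    a.use_function_calling === b.use_function_calling &&
    a.function_string === b.function_string &&
    a.messages.length === b.messages.length &&
    a.messages.every(
      ([role, content], i) => role === b.messages[i][0] && content === b.messages[i][1],
    )
  );
}

// Builds a fresh Conversation from a request instead of mutating the pipeline's.
function getConversationFromChatCompletionRequest(
  messages: RequestMessage[],
  config: ConvConfig,
): Conversation {
  const conv = new Conversation(config);
  // Assumption: the final message is the new user input, prefilled separately.
  for (const msg of messages.slice(0, -1)) {
    if (msg.role === "system") {
      conv.override_system_message = msg.content;
    } else {
      conv.messages.push([msg.role, msg.content]);
    }
  }
  return conv;
}
```

Because the builder returns a new object rather than editing the pipeline's conversation, two requests can be compared without side effects.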
11 files changed, +436 -216 lines changed:
- examples
  - openai-api/src
  - web-worker/src
- src
  - openai_api_protocols
- tests