Commit 9a4f606

[https://nvbugs/5480289][fix] release slot manager in mtp MTPHiddenStatesManager (#7340)
Signed-off-by: Yue Weng <[email protected]>
1 parent 4223a9a

File tree

3 files changed: +9 −1 lines changed

tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py

Lines changed: 1 addition & 0 deletions

@@ -304,6 +304,7 @@ def clear(self):
         self.static_inputs.clear()
         self.graph_outputs.clear()
         self.graph_metadata.clear()
+        self.padding_dummy_request = None
         del self.memory_pool
         self.memory_pool = None
         torch.cuda.empty_cache()

tensorrt_llm/_torch/pyexecutor/resource_manager.py

Lines changed: 7 additions & 0 deletions

@@ -1042,6 +1042,13 @@ def remove_slot(self, request_id: int):
         slot = self.slot_mapping.pop(request_id)
         self.free_slots.add(slot)
 
+    def shutdown(self):
+        req_ids_list = list(self.slot_mapping.keys())
+        for rid in req_ids_list:
+            self.remove_slot(rid)
+        assert len(self.slot_mapping) == 0 and len(
+            self.free_slots) == self.max_num_requests
+
 
 class ResourceManager:

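The `shutdown` method added above drains every slot still held by an in-flight request and then checks that the free pool is fully restored. A minimal, self-contained sketch of that slot-manager pattern is below; the method names mirror the diff, but the internal representation (a plain `set` free list and a `dict` mapping) is an assumption for illustration, not the actual TensorRT-LLM implementation.

```python
class SlotManager:
    """Illustrative slot pool: hands out integer slots to request ids."""

    def __init__(self, max_num_requests: int):
        self.max_num_requests = max_num_requests
        self.free_slots = set(range(max_num_requests))
        self.slot_mapping = {}  # request_id -> slot

    def add_slot(self, request_id: int) -> int:
        # Take any free slot and bind it to this request.
        slot = self.free_slots.pop()
        self.slot_mapping[request_id] = slot
        return slot

    def remove_slot(self, request_id: int):
        # Unbind the request and return its slot to the pool.
        slot = self.slot_mapping.pop(request_id)
        self.free_slots.add(slot)

    def shutdown(self):
        # Release every slot still held, then verify the pool is whole
        # again (mirrors the assert added in the diff).
        for rid in list(self.slot_mapping.keys()):
            self.remove_slot(rid)
        assert len(self.slot_mapping) == 0 and \
            len(self.free_slots) == self.max_num_requests
```

Iterating over `list(self.slot_mapping.keys())` rather than the dict itself matters: `remove_slot` mutates the mapping, and mutating a dict while iterating it directly would raise a `RuntimeError`.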
tensorrt_llm/_torch/speculative/mtp.py

Lines changed: 1 addition & 1 deletion

@@ -85,7 +85,7 @@ def add_dummy_requests(self, request_ids: List[int]):
             self.slot_manager.add_slot(rid)
 
     def shutdown(self):
-        pass
+        self.slot_manager.shutdown()
 
     def get_max_resource_count(self) -> int:
         return self.max_num_requests
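The one-line mtp.py change replaces a no-op `shutdown` with delegation to the owned slot manager, so slots acquired via `add_dummy_requests` are actually released. A hedged sketch of that delegation pattern, using a stub in place of the real slot manager (`StubSlotManager` is purely illustrative; only `MTPHiddenStatesManager` and its two methods come from the diff):

```python
class StubSlotManager:
    """Stand-in for the real slot manager; records lifecycle calls."""

    def __init__(self):
        self.slots = set()
        self.shut_down = False

    def add_slot(self, request_id: int):
        self.slots.add(request_id)

    def shutdown(self):
        self.slots.clear()
        self.shut_down = True


class MTPHiddenStatesManager:
    def __init__(self):
        self.slot_manager = StubSlotManager()

    def add_dummy_requests(self, request_ids):
        for rid in request_ids:
            self.slot_manager.add_slot(rid)

    def shutdown(self):
        # Previously `pass`: dummy-request slots were never returned to
        # the pool. Forward shutdown to the owned resource instead.
        self.slot_manager.shutdown()
```

The design point is ownership: a manager that creates slots is responsible for releasing them, so its `shutdown` must cascade to every sub-resource it holds rather than silently doing nothing.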
