[Bugfix] Fix Whisper/encoder-decoder GPU memory leak #32789
Changes from all commits
```diff
@@ -357,7 +357,8 @@ class EncoderDecoderCacheManager(EncoderCacheManager):
     def __init__(self, cache_size: int):
         self.cache_size = cache_size
         self.num_free_slots = cache_size
-        self.freed: list[str] = []
+        self.allocated: list[str] = []
+        self.to_free: list[str] = []

     def check_and_update_cache(self, request: Request, input_id: int) -> bool:
         return False
```

Contributor comment on lines +360 to +361.
```diff
@@ -383,7 +384,7 @@ def allocate(self, request: Request, input_id: int) -> None:
         self.num_free_slots -= num_encoder_embeds

         mm_hash = request.mm_features[input_id].identifier
-        self.freed.append(mm_hash)
+        self.allocated.append(mm_hash)

     def free(self, request: Request) -> None:
         for input_id in range(len(request.mm_features)):
```
```diff
@@ -393,9 +394,14 @@ def get_cached_input_ids(self, request: Request) -> set[int]:
         return set(range(len(request.mm_features)))

     def get_freed_mm_hashes(self) -> list[str]:
-        freed = self.freed
-        self.freed = []
-        return freed
+        # As the encoder cache is not used for enc-dec models, we could free
+        # the entries here. However, the actual free happens in the runner,
+        # *before* the model is executed. Therefore, `to_free` acts as a buffer
+        # so that entries are freed only after the model is executed, mimicking
+        # the state transition of `EncoderCacheManager`.
+        to_free = self.to_free
+        self.to_free = self.allocated
+        self.allocated = []
+        return to_free

     def free_encoder_input(self, request: Request, input_id: int) -> None:
         num_encoder_embeds = request.get_num_encoder_embeds(input_id)
```

Contributor comment on lines +397 to +403: "This new logic for …"
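The new `get_freed_mm_hashes` logic defers freeing by exactly one scheduler step: hashes allocated during step N are only reported as freeable at step N+1, after the model has actually executed. The following is a minimal standalone sketch of that buffering behavior; `DeferredFreeCache` and its method signatures are simplified stand-ins for illustration, not vLLM's actual `EncoderDecoderCacheManager` API.

```python
class DeferredFreeCache:
    """Simplified stand-in illustrating the deferred-free buffering."""

    def __init__(self, cache_size: int):
        self.cache_size = cache_size
        self.num_free_slots = cache_size
        self.allocated: list[str] = []  # filled during the current step
        self.to_free: list[str] = []    # drained on the *next* step

    def allocate(self, mm_hash: str, num_embeds: int) -> None:
        self.num_free_slots -= num_embeds
        self.allocated.append(mm_hash)

    def get_freed_mm_hashes(self) -> list[str]:
        # Entries buffered on the previous step are released now; entries
        # allocated this step are queued for release on the next call.
        to_free = self.to_free
        self.to_free = self.allocated
        self.allocated = []
        return to_free


cache = DeferredFreeCache(cache_size=8)
cache.allocate("whisper-audio-0", num_embeds=4)
print(cache.get_freed_mm_hashes())  # [] -- nothing freeable yet
print(cache.get_freed_mm_hashes())  # ['whisper-audio-0'] -- freed one step later
```

The two-call handoff (`to_free = self.to_free; self.to_free = self.allocated`) is what prevents the runner from freeing an entry before the model step that uses it has run.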
Contributor comment:

The addition of `test_encoder_cache_cleanup` is an excellent and necessary regression test. It directly targets the memory-leak issue by verifying that the encoder cache is empty after multiple sequential requests, providing strong assurance that the fix is effective and robust.
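A cleanup check of the kind described above can be sketched as follows. This is a hedged illustration only: `SimpleCacheManager` is a hypothetical minimal manager with the `allocate`/`get_freed_mm_hashes` interface from the diff, not vLLM's real class, and the test body is an analogue of (not a copy of) the PR's `test_encoder_cache_cleanup`.

```python
class SimpleCacheManager:
    """Hypothetical minimal manager with the deferred-free interface."""

    def __init__(self) -> None:
        self.allocated: list[str] = []
        self.to_free: list[str] = []

    def allocate(self, mm_hash: str) -> None:
        self.allocated.append(mm_hash)

    def get_freed_mm_hashes(self) -> list[str]:
        to_free, self.to_free, self.allocated = self.to_free, self.allocated, []
        return to_free


def test_encoder_cache_cleanup() -> None:
    manager = SimpleCacheManager()
    freed: list[str] = []
    # Simulate several sequential requests, each followed by a model step.
    for i in range(5):
        manager.allocate(f"request-{i}")
        freed += manager.get_freed_mm_hashes()
    # One final drain (the step after the last request) releases the tail entry.
    freed += manager.get_freed_mm_hashes()
    # Every entry was eventually freed and the cache is empty afterwards.
    assert sorted(freed) == [f"request-{i}" for i in range(5)]
    assert not manager.allocated and not manager.to_free


test_encoder_cache_cleanup()
```

The final assertion pair is the essence of the regression test: without the buffered handoff being drained, entries would accumulate across requests and the cache would never return to empty.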