[AMD] Fix ZeroDivisionError in compute_normalized_prompt_logprobs#14719
sunxxuns wants to merge 1 commit into sgl-project:main
Conversation
Code Review
This pull request effectively addresses potential ZeroDivisionError issues in the runtime_endpoint.py file. The changes introduce robust checks to ensure that division operations are only performed when the denominator is not zero, preventing runtime crashes and improving the stability of the system. Additionally, an IndexError scenario is handled defensively. The added comments clearly explain the rationale behind the changes.
```python
# Skip if no logprobs available (can happen on some backends)
if not input_token_logprobs[i] or not input_token_logprobs[i][0]:
    continue

num_tokens = len(input_token_logprobs[i])
if num_tokens > 1:
    normalized_prompt_logprobs[i] = (
        normalized_prompt_logprobs[i] * num_tokens - healed_token_logprob
    ) / (num_tokens - 1)
```
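The adjustment above removes the healed token's contribution from the mean and renormalizes over the remaining tokens. A minimal sketch of the arithmetic with hypothetical per-token values (not the actual runtime_endpoint.py code):

```python
# Hypothetical per-token prompt logprobs; the first token is the "healed" one.
token_logprobs = [-1.2, -0.5, -2.0]
healed_token_logprob = token_logprobs[0]

num_tokens = len(token_logprobs)
normalized = sum(token_logprobs) / num_tokens  # mean over all tokens

# Undo the healed token's contribution and renormalize, guarding the
# divisor so num_tokens == 1 cannot cause a ZeroDivisionError.
if num_tokens > 1:
    adjusted = (normalized * num_tokens - healed_token_logprob) / (num_tokens - 1)
else:
    adjusted = normalized

# The result is exactly the mean over the remaining tokens.
expected = sum(token_logprobs[1:]) / (num_tokens - 1)
assert abs(adjusted - expected) < 1e-9
```

With these numbers, `adjusted` works out to `(-3.7/3 * 3 + 1.2) / 2 = -1.25`, the mean of the two surviving tokens; without the `num_tokens > 1` guard, a single-token (or empty) prompt would divide by zero.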
```python
if not values:
    # Return negative infinity if no valid logprobs - this choice should not be selected
    return float("-inf")
```
This is a crucial fix to prevent a ZeroDivisionError when values is an empty list. Returning float("-inf") is a sensible approach for log-probabilities in such cases, ensuring that this choice is not inadvertently selected in subsequent processing steps. This directly addresses the core problem of division by zero.
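Why `-inf` is safe here: a choice scored `float("-inf")` loses every comparison in an argmax-style selection, so a choice whose backend returned no logprobs can never win. A small illustration with a hypothetical `score` helper (mirroring the guard, not the actual select code):

```python
def score(values):
    # Mirror of the guard: an empty logprob list scores negative infinity.
    if not values:
        return float("-inf")
    return sum(values) / len(values)

choices = {
    "a": [-0.3, -0.7],  # hypothetical logprobs
    "b": [],            # backend returned no logprobs (seen on some ROCm paths)
}
scores = {name: score(vals) for name, vals in choices.items()}
best = max(scores, key=scores.get)

assert best == "a"                      # the empty-logprob choice is never selected
assert scores["b"] == float("-inf")
```

Before the fix, `score` on an empty list would have divided by `len(values) == 0` and raised `ZeroDivisionError` instead of returning a sentinel.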
Force-pushed from ca64638 to cc98433
On AMD/ROCm, `input_token_logprobs` can be empty in some cases, causing a `ZeroDivisionError` when computing normalized prompt logprobs for the select operation.

Changes:
1. Return `-inf` when there are no valid logprobs (the choice should not be selected)
2. Skip the token-healing adjustment if no logprobs are available
3. Guard against division by zero when `num_tokens <= 1`

This fixes the `test_hellaswag_select` and `test_select` failures on AMD.
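The three guards in the commit message can be sketched together as one defensive helper (a hypothetical function for illustration, not the actual runtime_endpoint.py code):

```python
def normalized_prompt_logprob(input_token_logprobs, healed_token_logprob=None):
    """Mean per-token prompt logprob, defensive against empty inputs
    (e.g. when a ROCm backend returns no logprobs)."""
    # Guard 1: no logprobs at all -> this choice should never be selected.
    if not input_token_logprobs:
        return float("-inf")

    num_tokens = len(input_token_logprobs)
    normalized = sum(input_token_logprobs) / num_tokens

    # Guards 2 and 3: only undo token healing when a healed logprob exists
    # and there is more than one token, so (num_tokens - 1) is nonzero.
    if healed_token_logprob is not None and num_tokens > 1:
        normalized = (normalized * num_tokens - healed_token_logprob) / (num_tokens - 1)
    return normalized
```

For example, `normalized_prompt_logprob([])` returns `-inf`, a single-token prompt skips the adjustment entirely, and `normalized_prompt_logprob([-1.2, -0.5, -2.0], healed_token_logprob=-1.2)` reduces to the mean of the remaining tokens, `-1.25`.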
Force-pushed from cc98433 to 8341fbe
Closing in favor of #14721, which has a cleaner diff (just the logprobs fix, without the diffusion CI changes).
Status Summary
- PR #14721 (ZeroDivisionError fix):
- PR #13760 (Diffusion AMD support):
5b4d2406: fix ci

I notice the attached file changes show the AMD tolerance and runai_model_streamer fixes being reverted. Did you push those changes? That would explain why the multimodal tests are now failing.
Would you like me to: