[AMD/NPU] Pin transformers to stable 4.57.1 by sunxxuns · Pull Request #14722 · sgl-project/sglang

sunxxuns · 2025-12-09T09:28:46Z

Summary

Fix test_hellaswag_select and test_select failures on AMD/ROCm in stage-a-test-1-amd.

Root Cause

transformers==5.0.0rc0 has a regression on AMD: input_token_logprobs contains only one aggregated entry with None values instead of one logprob per token. This breaks select, which needs per-token logprobs to score each choice.

Solution

Pin transformers in pyproject_other.toml (used by the AMD and NPU builds) to the stable 4.57.1 release instead of the release candidate 5.0.0rc0.
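A pin like this can also be double-checked in the environment that actually runs the tests. The sketch below is not part of the PR: the helper names (`public_version`, `is_prerelease`, `check_transformers_pin`) and the `PINNED_TRANSFORMERS` constant are hypothetical, and the pre-release detection is a simplified regex over PEP 440 suffixes, not a full version parser. It flags a release candidate such as 5.0.0rc0 and compares everything else with 4.57.1.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Hypothetical constant mirroring the pin described in this PR.
PINNED_TRANSFORMERS = "4.57.1"

# Simplified PEP 440 pre-release/dev detection (a, b, rc, dev suffixes),
# e.g. "5.0.0rc0", "1.0b2", "5.0.0.dev0". Not a full PEP 440 parser.
_PRERELEASE_SUFFIX = re.compile(r"(a|b|rc|dev)\d*$")


def public_version(v: str) -> str:
    """Drop a local version label such as '+rocm6.2'."""
    return v.split("+", 1)[0]


def is_prerelease(v: str) -> bool:
    return _PRERELEASE_SUFFIX.search(public_version(v)) is not None


def check_transformers_pin(installed: Optional[str] = None) -> str:
    """Describe how the installed (or given) transformers version relates to the pin."""
    if installed is None:
        try:
            installed = version("transformers")
        except PackageNotFoundError:
            return "transformers is not installed"
    if is_prerelease(installed):
        return f"pre-release {installed}; expected {PINNED_TRANSFORMERS}"
    if public_version(installed) != PINNED_TRANSFORMERS:
        return f"{installed} differs from pin {PINNED_TRANSFORMERS}"
    return f"{installed} matches pin"
```

For example, `check_transformers_pin("5.0.0rc0")` reports a pre-release, while `check_transformers_pin("4.57.1")` reports a match. Calling it with no argument reads the installed distribution via `importlib.metadata.version`.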

Testing

Tested locally on MI300X:

transformers 4.57.1: works correctly (3 logprob entries per choice)
transformers 5.0.0rc0: broken (1 entry with None values)
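The two outcomes above (three per-token entries per choice versus one entry with None values) can be told apart with a small shape check. This is a hypothetical diagnostic, not code from the PR or from sglang: it assumes each choice's input_token_logprobs is a list of tuples whose first element is the logprob (None when unavailable), and the default `min_entries=2` threshold is an assumption, since a very short choice could legitimately produce fewer entries.

```python
from typing import Optional, Sequence, Tuple

# Assumed entry layout: (logprob, token_id, token_text); logprob may be None.
Entry = Tuple[Optional[float], int, Optional[str]]


def has_per_token_logprobs(
    choices: Sequence[Sequence[Entry]], min_entries: int = 2
) -> bool:
    """Heuristic check that each choice carries per-token logprobs.

    Fails for the shape reported with transformers 5.0.0rc0 on AMD:
    a single entry whose logprob is None.
    """
    if not choices:
        return False
    for entries in choices:
        # Too few entries suggests the values were aggregated.
        if len(entries) < min_entries:
            return False
        # No usable logprob at all means nothing to score the choice with.
        if all(entry[0] is None for entry in entries):
            return False
    return True
```

Run against the logprobs returned for each choice, this returns True for the 4.57.1 shape and False for the single-None-entry shape.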

Test plan

stage-a-test-1-amd should pass
Existing tests on CUDA should still pass (CUDA uses the main pyproject.toml, which keeps 5.0.0rc0)

gemini-code-assist · 2025-12-09T09:28:57Z

Summary of Changes


This pull request improves the stability of logprob computation in the runtime_endpoint.py module. It fixes a ZeroDivisionError that could occur while normalizing prompt logprobs during token healing, when the number of remaining tokens makes the denominator zero. The changes add conditional checks so that calculations run only when valid data is present, which improves robustness and fixes a reported test failure.

Highlights

ZeroDivisionError Fix: Resolved a ZeroDivisionError that could occur in the compute_normalized_prompt_logprobs function when len(input_token_logprobs[i]) - 1 evaluated to zero.
Robustness Improvement: Added checks to ensure logprob calculations only proceed if input_token_logprobs is not empty and contains valid entries, preventing potential errors from missing data.
Test Failure Resolution: This fix resolves the stage-a-test-1-amd failure.


gemini-code-assist

Code Review

This pull request addresses a ZeroDivisionError that occurs when recalculating normalized prompt log probabilities after removing a healed token. The fix correctly introduces a check to ensure the number of tokens is greater than one before division. Additionally, a new check is added to gracefully handle cases where logprobs may be empty, preventing potential IndexError exceptions. While the changes are good, I've identified a logic issue where the normalized log probability is not updated when the last remaining token is removed, which could lead to incorrect selections. I've provided a suggestion to address this.
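One way to sidestep both issues raised in this review (dividing by len(input_token_logprobs[i]) - 1 when only one token remains, and leaving a stale normalized value after the removal) is to recompute the mean from the remaining entries instead of backing the removed term out arithmetically. The sketch below is hypothetical: the function names and the entry layout are assumptions, and it is not the code in runtime_endpoint.py. Returning None when no logprobs remain is also an illustrative choice; the caller must decide how to rank such a choice.

```python
from typing import List, Optional, Tuple

# Assumed entry layout: (logprob, token_id, token_text); logprob may be None.
Entry = Tuple[Optional[float], int, Optional[str]]


def normalized_logprob(entries: List[Entry]) -> Optional[float]:
    """Mean of the non-None logprobs, or None if there are none."""
    # `is not None` (rather than truthiness) keeps legitimate 0.0 logprobs.
    values = [e[0] for e in entries if e[0] is not None]
    if not values:
        return None  # nothing to normalize; the caller decides how to rank this choice
    return sum(values) / len(values)


def drop_healed_token(entries: List[Entry]) -> Tuple[List[Entry], Optional[float]]:
    """Remove the first (healed) entry and recompute the normalized logprob."""
    remaining = entries[1:]
    return remaining, normalized_logprob(remaining)
```

With entries whose logprobs are -1.0, -2.0 and -3.0, `drop_healed_token` returns the last two entries and a mean of -2.5; with a single entry it returns an empty list and None instead of raising. Recomputing is linear in the number of entries, but there is no n - 1 denominator to guard, and entries with None logprobs are excluded consistently.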

python/sglang/lang/backend/runtime_endpoint.py


HaiShaw · 2025-12-10T20:08:57Z

#14801 merged

sunxxuns force-pushed the fix-amd-select-logprobs branch from 8341fbe to cb6c96a, December 9, 2025 09:29

gemini-code-assist bot reviewed Dec 9, 2025

python/sglang/lang/backend/runtime_endpoint.py (outdated, resolved)

sunxxuns force-pushed the fix-amd-select-logprobs branch 6 times, most recently from b6556dc to 4cc4db3, December 9, 2025 09:40

sunxxuns changed the title from "[AMD] Fix ZeroDivisionError in compute_normalized_prompt_logprobs" to "Fix ZeroDivisionError in compute_normalized_prompt_logprobs" Dec 9, 2025

sunxxuns added the run-ci label Dec 9, 2025

sunxxuns force-pushed the fix-amd-select-logprobs branch 2 times, most recently from 50a2d82 to f23d2a8, December 9, 2025 09:48

sunxxuns changed the title from "Fix ZeroDivisionError in compute_normalized_prompt_logprobs" to "[AMD] Fix ZeroDivisionError in compute_normalized_prompt_logprobs" Dec 9, 2025

sunxxuns mentioned this pull request Dec 9, 2025: [AMD] Fix ZeroDivisionError in compute_normalized_prompt_logprobs #14719 (closed)

sunxxuns force-pushed the fix-amd-select-logprobs branch 5 times, most recently from c43d687 to 58a6e9a, December 9, 2025 10:23

sunxxuns changed the title from "[AMD] Fix ZeroDivisionError in compute_normalized_prompt_logprobs" to "[CI test] amd" Dec 9, 2025

sunxxuns force-pushed the fix-amd-select-logprobs branch 2 times, most recently from 9683762 to b612dc6, December 10, 2025 06:13

github-actions bot added the amd label Dec 10, 2025

sunxxuns force-pushed the fix-amd-select-logprobs branch from b612dc6 to b2dc76e, December 10, 2025 06:16

sunxxuns changed the title from "[CI test] amd" to "[AMD/NPU] Pin transformers to stable 4.57.1" Dec 10, 2025

github-actions bot added the dependencies label (Pull requests that update a dependency file) Dec 10, 2025

HaiShaw closed this Dec 10, 2025

sunxxuns wants to merge 1 commit into sgl-project:main from sunxxuns:fix-amd-select-logprobs

sunxxuns commented Dec 9, 2025 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Dec 9, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

HaiShaw commented Dec 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sunxxuns commented Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Solution

Testing

Test plan

Uh oh!

gemini-code-assist bot commented Dec 9, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

HaiShaw commented Dec 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sunxxuns commented Dec 9, 2025 •

edited

Loading