[wip] Support Transformers v5.x #14584
Conversation
/tag-and-rerun-ci
Summary of Changes

Hello @byjiang1996, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request updates the project's dependency stack to support the upcoming Transformers v5.x, which is required for integrating newer models such as GLM 4.6v.
Code Review
This pull request updates the project to support transformers v5.x by bumping the version to 5.0.0rc0 and adding flash_attn as a dependency. The changes also include necessary adaptations in the codebase to accommodate breaking changes in the new transformers version, such as the removal of rope_config_validation. Additionally, the build process for sgl-kernel is modified to ensure flash_attn is built from source to include support for fa4.
My feedback focuses on improving dependency management for better reproducibility. While using a release candidate for transformers is a conscious choice for this upgrade, it's important to be aware of potential stability risks. The changes to the build system are well-justified workarounds.
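One of the breaking changes mentioned above is the removal of `rope_config_validation` in Transformers v5.x. A minimal compatibility shim (a sketch, not the PR's actual adaptation) could guard the import so the same code runs against both v4.x and v5.x:

```python
# Sketch: tolerate the removal of rope_config_validation in transformers v5.x.
# The import path below is the v4.x location; in v5.x (or when transformers
# is not installed) the fallback is used instead.
try:
    from transformers.modeling_rope_utils import rope_config_validation
except ImportError:
    rope_config_validation = None  # removed in transformers v5.x


def validate_rope(config):
    """Validate RoPE settings when the helper is available; no-op otherwise."""
    if rope_config_validation is not None:
        rope_config_validation(config)
```

Callers can then invoke `validate_rope(config)` unconditionally, keeping version-specific logic in one place.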
python/pyproject.toml (Outdated)

```diff
   "fastapi",
   "flashinfer_python==0.5.3", # keep it aligned with jit-cache version in Dockerfile
   "flashinfer_cubin==0.5.3",
+  "flash_attn", # required by 5.x transformer
```
It's a good practice to pin dependency versions to ensure reproducible builds. The flash_attn dependency is added without a version specifier, which could lead to unexpected issues if a new, incompatible version is released. Please consider pinning it to a specific version (e.g., flash_attn=="x.y.z") or a minimum compatible version (e.g., flash_attn>="x.y.z").
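For illustration, a pinned entry in `pyproject.toml` could look like the following sketch (the version numbers here are placeholders, not a recommendation of specific releases):

```toml
# Hypothetical pinned alternatives for the unversioned "flash_attn" entry:
dependencies = [
    "flash_attn==2.7.3",   # exact pin: fully reproducible builds
    # or, more permissively:
    # "flash_attn>=2.7,<3", # allow patch/minor updates within a known-good range
]
```

An exact pin maximizes reproducibility, while a bounded range trades some of that for easier security and bug-fix updates.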
Motivation
As noted in #14356, this upgrade is required to support GLM 4.6v.
Co-authored-by: @yhyang201
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist