[Refactor] GLM-ASR Modeling by JaredforReal · Pull Request #31779 · vllm-project/vllm

JaredforReal · 2026-01-06T06:26:21Z

🚀 Key Improvements

1. Native vLLM Audio Encoder Implementation (glmasr.py)

Completely rewrote GlmAsrEncoder as a vLLM-native implementation with full optimization support:
- QKVParallelLinear: Fused Q/K/V projections for efficient attention computation
- ColumnParallelLinear/RowParallelLinear: Tensor parallelism support for distributed inference
- Quantization support: Compatible with vLLM's quantization framework
- Flash Attention (SDPA): Leverages PyTorch's scaled_dot_product_attention for optimized attention
- Implemented GlmAsrRotaryEmbedding with pre-computed cos/sin cache for faster RoPE computation
- Built GlmAsrAttention, GlmAsrMLP, and GlmAsrEncoderLayer with optimized layer norms and residual connections
- Proper grouped query attention (GQA) handling with k/v repetition
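To make the fused-projection and SDPA bullets concrete, here is a minimal NumPy sketch, not the code from this PR. It assumes that a fused Q/K/V projection amounts to one matmul against concatenated weights followed by a split, and it uses a plain reference formula in place of PyTorch's scaled_dot_product_attention. All names and sizes (`sdpa`, `to_heads`, `w_qkv`, `hidden=32`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sdpa(q, k, v):
    """Reference attention: softmax(Q K^T / sqrt(d)) V.

    q, k, v have shape (num_heads, seq_len, head_dim).
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ k.transpose(0, 2, 1)) * scale
    return softmax(scores) @ v


rng = np.random.default_rng(0)
seq_len, hidden, num_heads = 5, 32, 4  # illustrative sizes only
head_dim = hidden // num_heads

x = rng.standard_normal((seq_len, hidden))
w_q, w_k, w_v = (rng.standard_normal((hidden, hidden)) for _ in range(3))

# Fused projection: one matmul against the concatenated weights, then a split.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=1)  # (hidden, 3 * hidden)
q, k, v = np.split(x @ w_qkv, 3, axis=-1)


def to_heads(t):
    # (seq_len, hidden) -> (num_heads, seq_len, head_dim)
    return t.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)


out = sdpa(to_heads(q), to_heads(k), to_heads(v))
```

The fused result matches three separate projections exactly; the benefit is a single larger matmul instead of three smaller ones.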

2. Cleaner Architecture: Direct BaseMultiModalProcessor Inheritance (glmasr.py)

- Refactored GlmAsrMultiModalProcessor to inherit directly from BaseMultiModalProcessor
- Refactored GlmAsrProcessingInfo to inherit directly from BaseProcessingInfo
- Refactored GlmAsrMultiModalDataParser to inherit directly from MultiModalDataParser
- Removed dependency on AudioFlamingo3 for cleaner, more maintainable code
- Streamlined processing pipeline with better performance and reduced complexity
- Maintained full compatibility with existing API and functionality

Purpose

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

- The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- The test plan, such as providing test command.
- The test results, such as pasting the results comparison before and after, or e2e results.
- (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
- (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: JaredforReal <w13431838023@gmail.com>

gemini-code-assist

Code Review

This is a great refactoring that replaces the GlmAsrEncoder with a native vLLM implementation, improving performance and maintainability by removing dependencies on AudioFlamingo3. The new implementation is well-structured and leverages vLLM's optimizations. I found one critical issue related to handling input sequences longer than the configured maximum, which could lead to a runtime error. I've provided a suggestion to fix it. Overall, this is a solid contribution.

Copilot

Pull request overview

This pull request refactors the GLM-ASR model implementation to use a native vLLM encoder with comprehensive performance optimizations including tensor parallelism, quantization support, and Flash Attention. The refactoring also simplifies the architecture by removing dependencies on AudioFlamingo3 base classes and instead directly inheriting from vLLM's base multimodal processor classes.
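As a rough illustration of why a column-parallel layer followed by a row-parallel layer supports tensor parallelism, the NumPy sketch below shards a two-layer MLP across two simulated ranks and checks that the summed partial outputs match the unsharded result. It is a sketch under stated assumptions, not vLLM's ColumnParallelLinear/RowParallelLinear: the GELU activation, the sizes, and the names `w_up`/`w_down` are hypothetical, and the final `sum` stands in for an all-reduce.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU; the activation GLM-ASR uses is not stated here.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


rng = np.random.default_rng(0)
hidden, intermediate, tp_size = 16, 64, 2  # illustrative sizes only

x = rng.standard_normal((3, hidden))
w_up = rng.standard_normal((hidden, intermediate))
w_down = rng.standard_normal((intermediate, hidden))

# Unsharded reference computation.
reference = gelu(x @ w_up) @ w_down

# Column-parallel: each rank holds a slice of w_up's output columns.
up_shards = np.split(w_up, tp_size, axis=1)
# Row-parallel: each rank holds the matching slice of w_down's input rows.
down_shards = np.split(w_down, tp_size, axis=0)

# Each rank computes a partial output with no communication in between,
# because the activation is elementwise.
partials = [gelu(x @ up) @ down for up, down in zip(up_shards, down_shards)]

# Summing the partials plays the role of the all-reduce across ranks.
tp_output = sum(partials)
```

Only one reduction is needed per MLP because the elementwise activation commutes with the column split.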

Key Changes

Native vLLM encoder: Complete rewrite of GlmAsrEncoder with optimized components (GlmAsrRotaryEmbedding, GlmAsrAttention, GlmAsrMLP, GlmAsrEncoderLayer) using QKVParallelLinear for fused projections, tensor parallelism support, and Flash Attention
Direct base class inheritance: Refactored GlmAsrProcessingInfo, GlmAsrMultiModalProcessor, and GlmAsrMultiModalDataParser to inherit from BaseProcessingInfo, BaseMultiModalProcessor, and MultiModalDataParser respectively, removing AudioFlamingo3 dependencies
Enhanced utility functions: Added rotary embedding helpers (_rotate_half, _apply_rotary_pos_emb, _repeat_kv) and improved audio length calculation logic in glmasr_utils.py
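The helper names above suggest the usual rotate-half RoPE formulation and key/value head repetition for GQA. The NumPy sketch below shows that common formulation as an assumption about what such helpers do; it is not the code in glmasr_utils.py. The base of 10000, the shapes, and the un-prefixed function names are hypothetical.

```python
import numpy as np


def rotate_half(x):
    # Split the last dimension in two halves and map (x1, x2) -> (-x2, x1).
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([-x2, x1], axis=-1)


def apply_rotary_pos_emb(q, k, cos, sin):
    # q, k: (num_heads, seq_len, head_dim); cos, sin: (seq_len, head_dim).
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot


def repeat_kv(x, n_rep):
    # (num_kv_heads, seq_len, head_dim) -> (num_kv_heads * n_rep, seq_len, head_dim)
    # Each KV head is repeated n_rep times consecutively.
    return np.repeat(x, n_rep, axis=0)


rng = np.random.default_rng(0)
num_heads, num_kv_heads, seq_len, head_dim = 8, 2, 6, 16  # illustrative sizes

# cos/sin tables computed once up front, then indexed by position.
inv_freq = 1.0 / (10000.0 ** (np.arange(0, head_dim, 2) / head_dim))
angles = np.outer(np.arange(seq_len), inv_freq)      # (seq_len, head_dim // 2)
angles = np.concatenate([angles, angles], axis=-1)   # (seq_len, head_dim)
cos, sin = np.cos(angles), np.sin(angles)

q = rng.standard_normal((num_heads, seq_len, head_dim))
k = rng.standard_normal((num_kv_heads, seq_len, head_dim))

q_rot, k_rot = apply_rotary_pos_emb(q, k, cos, sin)
k_full = repeat_kv(k_rot, num_heads // num_kv_heads)  # now matches q's head count
```

Because RoPE is a pure rotation, it preserves each vector's norm, and position 0 is left unchanged.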

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| `vllm/model_executor/models/glmasr_utils.py` | Added RoPE and GQA utility functions; improved audio output length calculation with better documentation and refactored logic |
| `vllm/model_executor/models/glmasr.py` | Complete native encoder implementation with optimized attention/MLP layers; refactored processor classes to remove AudioFlamingo3 dependency; inlined and improved audio processing logic in `_call_hf_processor` |
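The audio output length calculation is not shown on this page. As a hedged illustration of how such a calculation typically works, the sketch below applies the standard 1-D convolution output-length formula to a Whisper-style front end with two convolutions (kernel 3, padding 1, strides 1 and 2). Those layer parameters and the function names are assumptions for illustration; the actual GLM-ASR logic in glmasr_utils.py may include further steps.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (the formula PyTorch documents for Conv1d)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def encoder_out_len(num_mel_frames):
    """Frames left after a hypothetical Whisper-style two-conv front end."""
    # First conv: kernel 3, stride 1, padding 1 -> length unchanged.
    after_conv1 = conv1d_out_len(num_mel_frames, kernel_size=3, stride=1, padding=1)
    # Second conv: kernel 3, stride 2, padding 1 -> length roughly halved.
    return conv1d_out_len(after_conv1, kernel_size=3, stride=2, padding=1)


# 3000 mel frames (30 s at a 10 ms hop) would map to 1500 encoder positions.
frames_30s = encoder_out_len(3000)
```

Computing the length in closed form lets the processor decide how many audio placeholder tokens to emit without running the encoder.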

JaredforReal · 2026-01-06T11:40:52Z

@DarkLight1337 PTAL, thanks!

DarkLight1337 · 2026-01-07T03:44:57Z

/gemini review

gemini-code-assist

Code Review

This pull request is a significant refactoring of the GLM-ASR modeling implementation. It successfully replaces the HuggingFace GlmAsrEncoder with a vLLM-native version, which should improve performance and maintainability. The code is well-structured and leverages vLLM's optimized components effectively. I've found one minor correctness issue in a helper function where the implementation doesn't match its documentation, which I've flagged for correction to prevent potential future bugs.

DarkLight1337

LGTM now. @Isotr0py can you check as well?

Isotr0py

LGTM now

JaredforReal · 2026-01-07T04:46:38Z

@DarkLight1337 @Isotr0py Thanks

JaredforReal · 2026-01-07T06:24:25Z

@DarkLight1337 @Isotr0py Sorry guys, I need one more change to pass the device to transformers.whisper.feature_extractor(), for better performance
PTAL

JaredforReal · 2026-01-07T09:57:13Z

@DarkLight1337 @Isotr0py Back to the version you guys approved. Let's land it for now, Thanks!

Signed-off-by: JaredforReal <w13431838023@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Signed-off-by: JaredforReal <w13431838023@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

JaredforReal added 8 commits January 6, 2026 13:28

perf glmasr

526b890

Signed-off-by: JaredforReal <w13431838023@gmail.com>

fix glmasr_utils

fb1048e

Signed-off-by: JaredforReal <w13431838023@gmail.com>

use hf_feature_extractor.mel_filter to ensure accuracy

d3fb8d4

Signed-off-by: JaredforReal <w13431838023@gmail.com>

fix shape problem

15509b8

Signed-off-by: JaredforReal <w13431838023@gmail.com>

perf

b3e5e51

Signed-off-by: JaredforReal <w13431838023@gmail.com>

move GlmAsrEncoder from utils to glmasr

58ad75d

Signed-off-by: JaredforReal <w13431838023@gmail.com>

get rid of audioflamingo3 dependency

c3a81de

Signed-off-by: JaredforReal <w13431838023@gmail.com>

get rid of self-implemented GPUWhisperFeatureExtractor

3dd5217

Signed-off-by: JaredforReal <w13431838023@gmail.com>

Copilot AI review requested due to automatic review settings January 6, 2026 06:26

gemini-code-assist bot reviewed Jan 6, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

DarkLight1337 reviewed Jan 6, 2026

vllm/model_executor/models/glmasr_utils.py (outdated, resolved)

DarkLight1337 reviewed Jan 6, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

DarkLight1337 reviewed Jan 6, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

DarkLight1337 reviewed Jan 6, 2026

vllm/model_executor/models/glmasr.py (resolved)

remove logger and GPU-related comment

e90a9fb

Signed-off-by: JaredforReal <w13431838023@gmail.com>

Copilot started reviewing on behalf of JaredforReal January 6, 2026 07:08

Copilot AI reviewed Jan 6, 2026

JaredforReal added 7 commits January 6, 2026 15:19

go back to original implmentation of GlmAsrMultiModalProcessor._call_hf_processor

063b4b5

Signed-off-by: JaredforReal <w13431838023@gmail.com>

handle sampling_rate

d1ea079

Signed-off-by: JaredforReal <w13431838023@gmail.com>

delete logger in utils

ae68aad

Signed-off-by: JaredforReal <w13431838023@gmail.com>

try use vllm.applyrotaryemb

a73cf85

Signed-off-by: JaredforReal <w13431838023@gmail.com>

fix ApplyRotaryEmb import error

4c2b5a6

Signed-off-by: JaredforReal <w13431838023@gmail.com>

fix

d5d930e

Signed-off-by: JaredforReal <w13431838023@gmail.com>

fix ci error

4eeb067

Signed-off-by: JaredforReal <w13431838023@gmail.com>

DarkLight1337 reviewed Jan 6, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

rewrite RotaryEmbedding and add some docstring for readability

2a6b538

Signed-off-by: JaredforReal <w13431838023@gmail.com>

DarkLight1337 reviewed Jan 6, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

DarkLight1337 reviewed Jan 6, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

remove unnecessary comment && rename var name for cos to rotary_pos_emb_cos

1e6355b

Signed-off-by: JaredforReal <w13431838023@gmail.com>

DarkLight1337 reviewed Jan 7, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

clean code

0f81e25

Signed-off-by: JaredforReal <w13431838023@gmail.com>

DarkLight1337 reviewed Jan 7, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

DarkLight1337 reviewed Jan 7, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

accept reivews

492080c

Signed-off-by: JaredforReal <w13431838023@gmail.com>

gemini-code-assist bot reviewed Jan 7, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

accept reivew

1944742

Signed-off-by: JaredforReal <w13431838023@gmail.com>

DarkLight1337 approved these changes Jan 7, 2026

Isotr0py approved these changes Jan 7, 2026

Isotr0py enabled auto-merge (squash) January 7, 2026 04:37

github-actions bot added the ready label Jan 7, 2026

Merge branch 'main' into refactor/glmasr

430b19c

auto-merge was automatically disabled January 7, 2026 06:25
Head branch was pushed to by a user without write access

DarkLight1337 reviewed Jan 7, 2026

vllm/model_executor/models/glmasr.py (outdated, resolved)

JaredforReal marked this pull request as draft January 7, 2026 09:08

JaredforReal force-pushed the refactor/glmasr branch from 56a9ed9 to 430b19c January 7, 2026 09:54

Isotr0py marked this pull request as ready for review January 7, 2026 10:06

Merge branch 'main' into refactor/glmasr

913fdaf

Isotr0py enabled auto-merge (squash) January 7, 2026 10:07

Isotr0py merged commit 9741387 into vllm-project:main Jan 7, 2026
53 checks passed

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026

[Refactor] GLM-ASR Modeling (vllm-project#31779)

ffc88ff

Signed-off-by: JaredforReal <w13431838023@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026

[Refactor] GLM-ASR Modeling (vllm-project#31779)

cbd85c5

Signed-off-by: JaredforReal <w13431838023@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[Refactor] GLM-ASR Modeling (vllm-project#31779)

431a20c

Signed-off-by: JaredforReal <w13431838023@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
