mtmd, llama, ggml : Update HunyuanVL support by ManaEstras · Pull Request #22029 · ggml-org/llama.cpp

ManaEstras · 2026-04-17T04:51:10Z

Overview

Update support for HunyuanVL vision-language model.

This PR includes:

ggml

New ggml_interpolate_sf() API for explicit scale factor interpolation (needed for HunyuanVL's (H+0.1)/n_grid position embedding scaling)
New GGML_SCALE_FLAG_CUSTOM_SF flag
Fix nearest interpolation out-of-bounds access in CPU/CUDA/Metal/SYCL/Vulkan/OpenCL backends
Add test_interpolate_sf test cases (38 tests covering various modes and edge cases)

llama

New HUNYUAN_VL architecture
Support for rope.scaling.alpha and rope_dimension_sections
M-RoPE support for HunyuanVL text model

mtmd

New PROJECTOR_TYPE_HUNYUANVL projector type
Special image token layout handling (BOI + rows with newlines + EOI)
New set_position_mrope_hunyuanvl() and mtmd_decode_use_mrope_hunyuanvl() APIs
Add HunyuanVL smoke test in tests.sh

convert

Add HunyuanVLVisionModel (mmproj) and HunyuanVLTextModel export support

Testing

ctest -L main passed
test-backend-ops passed (38 interpolate_sf tests on CPU and Metal)
tools/mtmd/tests.sh smoke test added

Additional information

HunyuanVL uses a special position embedding interpolation that differs from standard models - it requires explicit scale factors (H+0.1)/n_grid instead of the simple H/n_grid ratio. This necessitated the new ggml_interpolate_sf() API.
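
A minimal sketch of why the explicit scale factor matters, assuming the half-pixel (align_corners=False style) coordinate convention common to bilinear interpolation. The helper name `src_coord` and the example sizes are hypothetical, and this is not the ggml implementation. Both scale factors yield the same output length, but the source grid is sampled at slightly different positions.

```python
import math

def src_coord(dst, scale):
    # Half-pixel source coordinate for output index `dst` (assumed convention).
    return (dst + 0.5) / scale - 0.5

n_grid = 16  # side length of the stored position-embedding grid (example value)
h = 24       # target number of patches along one axis (example value)

sf_ratio = h / n_grid           # plain size ratio H/n_grid
sf_custom = (h + 0.1) / n_grid  # explicit scale factor from the PR description

# Flooring n_grid * sf_custom still gives h output positions ...
out_len = math.floor(n_grid * sf_custom)

# ... but each output position maps to a slightly different source coordinate.
coords_ratio = [src_coord(i, sf_ratio) for i in range(out_len)]
coords_custom = [src_coord(i, sf_custom) for i in range(out_len)]

for i in (0, out_len - 1):
    print(i, round(coords_ratio[i], 4), round(coords_custom[i], 4))
```

Because the sampling positions differ, passing only the output size to an interpolation op would presumably not reproduce the reference embeddings exactly, which is the motivation given above for an explicit scale factor.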

The image token layout is also non-standard: instead of a simple nx * ny grid, HunyuanVL uses BOI + (patch rows with newline separators) + EOI, which required special handling in mtmd.
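
The layout described above can be sketched as follows. This is an illustration only: it assumes exactly one newline token after every patch row (including the last), and the token strings are hypothetical placeholders rather than the model's real vocabulary entries.

```python
def build_image_tokens(nx, ny, boi="<boi>", eoi="<eoi>", newline="<nl>", patch="<patch>"):
    """Return the token sequence for an nx-by-ny patch grid.

    Assumed layout: BOI, then ny rows of nx patch tokens each followed
    by one newline token, then EOI.
    """
    tokens = [boi]
    for _ in range(ny):
        tokens.extend([patch] * nx)
        tokens.append(newline)
    tokens.append(eoi)
    return tokens

layout = build_image_tokens(nx=3, ny=2)
print(len(layout), layout)
```

Under this assumption the total is ny * (nx + 1) + 2 tokens rather than nx * ny, which is why code that derives the token count from the grid size alone needs special handling.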

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES - AI was used for code review, documentation drafting, and test case suggestions. All code was written and reviewed by human contributors.

Add support for HunyuanVL vision-language model.

ggml changes:
- add ggml_interpolate_sf() for explicit scale factor interpolation
- add GGML_SCALE_FLAG_CUSTOM_SF flag
- fix nearest interpolation out-of-bounds access in all backends
- add test_interpolate_sf test cases (38 tests)

llama changes:
- add HUNYUAN_VL architecture
- add rope.scaling.alpha and rope_dimension_sections support
- add M-RoPE support for HunyuanVL

mtmd changes:
- add PROJECTOR_TYPE_HUNYUANVL
- add special image token layout (BOI + rows with newlines + EOI)
- add set_position_mrope_hunyuanvl() for HunyuanVL M-RoPE
- add mtmd_decode_use_mrope_hunyuanvl() API

convert changes:
- add HunyuanVLVisionModel (mmproj) export
- add HunyuanVLTextModel export

ggml-gh-bot · 2026-04-17T04:55:25Z

Hi @ManaEstras, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

Multiple backend changes in one PR: When adding support for a new model or feature, focus on CPU support only in the initial PR. Add support for other backends like CUDA in follow-up PRs. If you have a good reason to modify multiple backends in one PR, please explain it.
Large PR: Large changes require prior discussion (e.g. an issue or RFC) and maintainers may not be able to review this PR as-is. Consider splitting it into smaller, focused PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

ngxson

A general note that if you want to accelerate merging this PR:

Do not add a new ggml op, follow the recommendation in my comment
If you still have a very good reason to add a new op, you should only add CPU support in this PR

The more changes you include here, the longer it takes for other reviewers to approve, and the slower it can be merged.

ngxson · 2026-04-17T08:37:43Z

+        if name.startswith("vit.perceive."):
+            suffix = name[len("vit.perceive."):]
+            if suffix.startswith("proj."):
+                # proj.0.weight -> mm.0.weight, proj.2.weight -> mm.2.weight
+                new_name = "mm." + suffix[len("proj."):]
+            elif suffix.startswith("mlp."):
+                # mlp.weight -> mm.model.fc.weight
+                new_name = "mm.model.fc." + suffix[len("mlp."):]
+            elif suffix.startswith("before_rms."):
+                # before_rms.weight -> mm.pre_norm.weight
+                new_name = "mm.pre_norm." + suffix[len("before_rms."):]
+            elif suffix.startswith("after_rms."):
+                # after_rms.weight -> mm.post_norm.weight
+                new_name = "mm.post_norm." + suffix[len("after_rms."):]
+            elif suffix == "image_newline":
+                new_name = "v.image_newline"
+            elif suffix == "image_sep":
+                new_name = "v.view_seperator"
+            else:
+                # image_begin, image_end -> mm.image_begin, mm.image_end
+                new_name = "mm." + suffix
+            yield (new_name, data_torch)
+            return


please use proper tensor mapping like all other models
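
For illustration, the if-elif chain above can be expressed as a lookup table. This is only a sketch of the table-driven idea with hypothetical names (`ROOT`, `PREFIX_MAP`, `EXACT_MAP`, `remap`); it does not use the project's actual tensor mapping module. The target names are copied from the diff as written.

```python
ROOT = "vit.perceive."

# Prefix rewrites: source prefix -> target prefix (taken from the diff above).
PREFIX_MAP = {
    "proj.": "mm.",
    "mlp.": "mm.model.fc.",
    "before_rms.": "mm.pre_norm.",
    "after_rms.": "mm.post_norm.",
}

# Whole-name rewrites (spelling kept exactly as in the diff).
EXACT_MAP = {
    "image_newline": "v.image_newline",
    "image_sep": "v.view_seperator",
}

def remap(name):
    """Map a checkpoint tensor name to its target name, or None if not handled."""
    if not name.startswith(ROOT):
        return None
    suffix = name[len(ROOT):]
    if suffix in EXACT_MAP:
        return EXACT_MAP[suffix]
    for src, dst in PREFIX_MAP.items():
        if suffix.startswith(src):
            return dst + suffix[len(src):]
    # Fallback from the diff: image_begin, image_end -> mm.image_begin, mm.image_end
    return "mm." + suffix

print(remap("vit.perceive.proj.0.weight"))
```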

ngxson · 2026-04-17T08:38:19Z

        }
    }

+    void set_position_mrope_hunyuanvl(llama_pos pos_0, int nx, int ny, llama_seq_id seq_id, int image_count = 0) {


remove any changes in mtmd-helper, implement it in mtmd_image_tokens_get_decoder_pos instead

ngxson · 2026-04-17T08:39:40Z

    uint32_t ny; // number of tokens in y direction
    bool use_mrope_pos = false; // use M-RoPE position counting (the whole image is 1 temporal position)
-    uint32_t n_tokens() const { return nx * ny; }
+    uint32_t n_tokens_total = 0;


it should be uint32_t n_boi, the number of BOI tokens

please cherry-pick the logic from ngxson#100

ngxson · 2026-04-17T08:40:08Z


+// whether the current model uses HunyuanVL-style M-RoPE
+// (token layout differs from standard 2D grid: BOI + rows-with-newlines + EOI)
+MTMD_API bool mtmd_decode_use_mrope_hunyuanvl(mtmd_context * ctx);


remove this API, use decoder_pos, see ngxson#100 (same idea)

ngxson · 2026-04-17T08:44:35Z

+            pos_patch = ggml_interpolate_sf(ctx0, pos_patch, pw, ph, n_embd, 1,
+                                            GGML_SCALE_MODE_BILINEAR,
+                                            (float)(pw + 0.1f) / n_grid,
+                                            (float)(ph + 0.1f) / n_grid);


IMO the new op is quite hacky (though, to be clear, I'm not the one who can approve a new op). It's better to simply resize the embedding on CPU and set the resized tensor as a graph input.

ManaEstras · 2026-04-17T10:34:58Z

Thx @ngxson, let me clean up some of the code and commit. I've opened two other PRs with the same modifications as this one, so please ignore those PRs for now.

ngxson · 2026-04-17T15:17:13Z

@ManaEstras To clarify a bit: what I mean is, let's not change anything in GGML at this time, to avoid putting too much stress on backend maintainers (I'm not a backend maintainer btw, so I can only help you on the multimodal part).

My idea is that you can either:

(Recommended) Call ggml_backend_tensor_get() to get the tensor data, resize it inside clip_image_batch_encode, and set the resized version as input data on the graph via set_input_f32()
Or use ggml_custom_4d, but it's not very well documented, so you may lose time debugging it

Alternatively, you could check whether the same functionality can be implemented with the existing ggml_interpolate, plus ggml_view to crop the output. But I may be wrong about how it works.
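
As a rough sketch of the recommended host-side option, the resize itself could look like the following. It is written in Python for illustration only (the real code would live in C++ inside clip), assumes half-pixel bilinear sampling with clamping at the borders, and uses a hypothetical function name; reading the tensor and feeding the result back as a graph input are not shown.

```python
import math

def resize_bilinear_sf(grid, out_h, out_w, sf_h, sf_w):
    """Bilinear resize of a 2D list `grid` using explicit scale factors.

    Assumes half-pixel centers; source coordinates are clamped to the grid.
    """
    in_h, in_w = len(grid), len(grid[0])

    def taps(dst, scale, size):
        # Source coordinate, the two neighbouring indices, and the blend weight.
        src = max((dst + 0.5) / scale - 0.5, 0.0)
        i0 = min(int(math.floor(src)), size - 1)
        i1 = min(i0 + 1, size - 1)
        w = min(src - i0, 1.0)
        return i0, i1, w

    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        y0, y1, wy = taps(y, sf_h, in_h)
        for x in range(out_w):
            x0, x1, wx = taps(x, sf_w, in_w)
            top = (1.0 - wx) * grid[y0][x0] + wx * grid[y0][x1]
            bot = (1.0 - wx) * grid[y1][x0] + wx * grid[y1][x1]
            out[y][x] = (1.0 - wy) * top + wy * bot
    return out

# Example: upscale a 2x2 grid to 4x4 with the (H + 0.1) / n_grid style factor.
small = [[0.0, 1.0], [2.0, 3.0]]
big = resize_bilinear_sf(small, 4, 4, (4 + 0.1) / 2, (4 + 0.1) / 2)
```

Doing this on the host keeps the compute graph unchanged, at the cost of a per-image CPU resize for each embedding channel.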

wendadawen · 2026-04-18T09:23:42Z

@ngxson Thanks for the review comments. Here's what I've addressed:

Tensor mapping — Replaced manual if-elif remapping with standard tensor_mapping.py.
XD-RoPE — Replaced the old M-RoPE approach entirely with a fresh decoder_pos based XD-RoPE implementation in mtmd-helper. No longer uses the previous M-RoPE logic.
ggml_interpolate_sf — Kept CPU backend only, reverted other backend changes.

ngxson · 2026-04-19T10:03:03Z

@wendadawen seems like you misunderstood my comments, please refer to #22037

Btw, there are 2 similar PRs and I don't know which one you are working on

ManaEstras requested review from a team, CISC and ggerganov as code owners April 17, 2026 04:51

ngxson reviewed Apr 17, 2026


wendadawen added 4 commits April 18, 2026 15:55

fix: use decoder_pos for mrope HunyuanVL implementation

c3e4d2d

fix: keep only CPU backend for ggml_interpolate_sf

fe078d2

fix: rename use_mrope to use_xdrope in Hunyuan model

87c7cb2

fix：use standard tensor_mapping

45ce055

ngxson mentioned this pull request Apr 19, 2026

mtmd, llama : Update HunyuanVL vision-language model support #22037

Merged

3 tasks

ManaEstras closed this Apr 20, 2026

ManaEstras wants to merge 5 commits into ggml-org:master from ManaEstras:hyvl
3 participants