[Bugfix] multimodal_gen(hunyuan3d): honor config precisions for delight/paint #22289

Open
jy-song-hub wants to merge 6 commits into sgl-project:main from bytedance-iaas:fix/hunyuan3d-precision-config

Conversation

@jy-song-hub
Contributor

Motivation

Hunyuan3D paint currently hardcodes fp16 for the delight pipeline, VAE, and UNet, ignoring Hunyuan3D2PipelineConfig.{dit_precision, vae_precision}. This breaks CPU/MPS (which lack reliable half/bfloat16 support) and makes precision behavior inconsistent across pipelines. The delight stage also ignores delight_negative_prompt from the config.

Modifications

  • Use PRECISION_TO_TYPE to honor config dtypes with a CPU/MPS-safe fallback:
    • Delight pipeline: load and move use dit_precision; fallback to fp32 on CPU/MPS.
    • VAE: .to(...) uses vae_precision; fallback to fp32 on CPU/MPS.
    • UNet: from_pretrained(..., torch_dtype=...) uses dit_precision; fallback to fp32 on CPU/MPS.
  • Plumb delight_negative_prompt into the delight call.
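The dtype-resolution rule described in the bullets above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the helper name resolve_dtype is hypothetical, and the inline mapping mirrors what sglang.multimodal_gen.utils.PRECISION_TO_TYPE is expected to contain.

```python
import torch

# Illustrative stand-in for sglang.multimodal_gen.utils.PRECISION_TO_TYPE
PRECISION_TO_TYPE = {
    "fp32": torch.float32,
    "fp16": torch.float16,
    "bf16": torch.bfloat16,
}

def resolve_dtype(precision: str, device: torch.device) -> torch.dtype:
    """Map a config precision string to a torch dtype, coercing half/bfloat16
    requests to fp32 on CPU/MPS, where half precision is unsupported or flaky."""
    dtype = PRECISION_TO_TYPE[precision]
    if device.type in ("cpu", "mps") and dtype in (torch.float16, torch.bfloat16):
        return torch.float32
    return dtype
```

Under this sketch, the same rule would apply at all three load sites: the delight pipeline and UNet resolve dit_precision, while the VAE .to(...) resolves vae_precision.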

Accuracy Tests

Tested via a local unit test. To avoid expanding the PR surface area, the test is not included in this PR; the full snippet is posted in a comment below. The local check validated that:

  • Delight load/move uses dit_precision.
  • VAE .to(...) uses vae_precision.
  • UNet from_pretrained uses dit_precision.
  • CPU/MPS fallback coerces to fp32 as expected.

Speed Tests and Profiling

  • No expected throughput change when configs remain fp16 / bf16 on CUDA/ROCm.
  • On CPU/MPS the fallback to fp32 is for safety and only applies when half/bfloat was requested.
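The second bullet can be checked directly: the fallback is a no-op unless a half/bfloat16 dtype was requested on CPU/MPS, so CUDA/ROCm configs keep their requested dtype. A minimal sketch (function name hypothetical, not the PR's code):

```python
import torch

def maybe_fallback(dtype: torch.dtype, device_type: str) -> torch.dtype:
    # Coerce only half/bfloat16 on CPU/MPS; everything else passes through,
    # so fp16/bf16 on CUDA/ROCm is untouched and throughput is unchanged.
    if device_type in ("cpu", "mps") and dtype in (torch.float16, torch.bfloat16):
        return torch.float32
    return dtype
```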

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the diffusion SGLang Diffusion label Apr 7, 2026
@jy-song-hub
Contributor Author

The following unit test was used to validate the correctness of this change. Since the code modification is small while the test is relatively lengthy, the test is included here rather than in the codebase to avoid unnecessary bloat.

import os
import sys
import json
import types
import tempfile

import importlib.util
import torch

from sglang.multimodal_gen.configs.pipeline_configs.hunyuan3d import (
    Hunyuan3D2PipelineConfig,
)
from sglang.multimodal_gen.utils import PRECISION_TO_TYPE
from sglang.multimodal_gen.runtime.server_args import (
    set_global_server_args,
    ServerArgs,
)


class _MinimalServerArgs:
    def __init__(self, model_path: str, enable_torch_compile: bool = False):
        self.model_path = model_path
        self.enable_torch_compile = enable_torch_compile


def _install_fake_diffusers_modules():
    # Fake diffusers module tree with minimal APIs used by hunyuan3d_paint
    diffusers = types.ModuleType("diffusers")

    class DummyPipeline:
        def __init__(self):
            self.scheduler = types.SimpleNamespace(config={})
            self.to_args = None
            self.set_progress_bar_called = False
            self.from_pretrained_dtype = None

        def set_progress_bar_config(self, **kwargs):
            self.set_progress_bar_called = True

        def to(self, device, dtype=None):
            self.to_args = (device, dtype)
            return self

        def __call__(self, **kwargs):
            # Mimic diffusers API: return object with .images
            class R:
                images = [None]

            return R()

    class SDIPP:
        @staticmethod
        def from_pretrained(local_path, torch_dtype=None, safety_checker=None):
            p = DummyPipeline()
            p.from_pretrained_dtype = torch_dtype
            return p

    class DummySched:
        def __init__(self):
            # match diffusers schedulers: config object with attributes
            self.config = types.SimpleNamespace(num_train_timesteps=1000)
            self.alphas_cumprod = torch.ones(1000)

        @staticmethod
        def from_config(cfg, **kwargs):
            return DummySched()

    class VaeImageProcessor:
        def __init__(self, vae_scale_factor):
            self.vae_scale_factor = vae_scale_factor

    class AutoencoderKL:
        def __init__(self, **kwargs):
            self.config = types.SimpleNamespace(
                block_out_channels=[1, 2, 3], scaling_factor=1.0
            )
            self._to_dtype = None

        def load_state_dict(self, state_dict):
            pass

        def to(self, device=None, dtype=None):
            self._to_dtype = dtype
            return self

        def eval(self):
            return self

    diffusers.StableDiffusionInstructPix2PixPipeline = SDIPP
    diffusers.EulerAncestralDiscreteScheduler = DummySched
    diffusers.AutoencoderKL = AutoencoderKL

    image_processor = types.ModuleType("diffusers.image_processor")
    image_processor.VaeImageProcessor = VaeImageProcessor
    sys.modules["diffusers.image_processor"] = image_processor
    sys.modules["diffusers"] = diffusers

    # safetensors stub (used when .safetensors exists)
    st_mod = types.ModuleType("safetensors")
    st_torch = types.ModuleType("safetensors.torch")

    def _load_file(path):
        return {}

    st_torch.load_file = _load_file
    sys.modules["safetensors"] = st_mod
    sys.modules["safetensors.torch"] = st_torch


def _patch_unet_from_pretrained(monkeypatch, recorded):
    # Patch UNet2p5DConditionModel.from_pretrained to avoid file IO and capture dtype
    import sglang.multimodal_gen.runtime.models.dits.hunyuan3d as hy3d

    def _fake_from_pretrained(path, **kwargs):
        class DummyUNet(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.called_to = False

            def to(self, device):
                self.called_to = True
                return self

        recorded["dit_dtype"] = kwargs.get("torch_dtype") or kwargs.get("dtype")
        return DummyUNet()

    monkeypatch.setattr(hy3d.UNet2p5DConditionModel, "from_pretrained", _fake_from_pretrained)


def _load_hunyuan3d_paint_module():
    here = os.path.dirname(__file__)
    target = os.path.abspath(
        os.path.join(
            here,
            "..",
            "runtime",
            "pipelines_core",
            "stages",
            "hunyuan3d_paint.py",
        )
    )
    spec = importlib.util.spec_from_file_location(
        "_hunyuan3d_paint_local", target
    )
    mod = importlib.util.module_from_spec(spec)
    assert spec and spec.loader
    spec.loader.exec_module(mod)  # type: ignore[attr-defined]
    return mod


def test_hunyuan3d_paint_uses_config_precisions(monkeypatch):
    _install_fake_diffusers_modules()

    with tempfile.TemporaryDirectory() as tmp:
        # Create expected subfolders
        delight_dir = os.path.join(tmp, "hunyuan3d-delight-v2-0")
        os.makedirs(delight_dir, exist_ok=True)

        paint_dir = os.path.join(tmp, "hunyuan3d-paint-v2-0")
        os.makedirs(os.path.join(paint_dir, "vae"), exist_ok=True)
        os.makedirs(os.path.join(paint_dir, "unet"), exist_ok=True)
        os.makedirs(os.path.join(paint_dir, "scheduler"), exist_ok=True)

        # Minimal files consumed by loader
        with open(os.path.join(paint_dir, "vae", "config.json"), "w") as f:
            json.dump({}, f)
        # Prefer .bin path; stub torch.load later
        open(os.path.join(paint_dir, "vae", "diffusion_pytorch_model.bin"), "wb").close()
        with open(os.path.join(paint_dir, "scheduler", "scheduler_config.json"), "w") as f:
            json.dump({"num_train_timesteps": 1000}, f)

        # Stub torch.load for VAE weights
        monkeypatch.setattr(torch, "load", lambda *a, **k: {})

        # Record dtype used for UNet
        recorded = {}
        _patch_unet_from_pretrained(monkeypatch, recorded)

        # Use stable CPU-friendly expectations: fp16 for DiT, fp32 for VAE
        cfg = Hunyuan3D2PipelineConfig()
        cfg.dit_precision = "fp16"
        cfg.vae_precision = "fp32"
        cfg.delight_prompt = "test"

        # Set minimal global ServerArgs to satisfy PipelineStage base class
        # Avoid registry/model discovery by constructing ServerArgs directly
        set_global_server_args(ServerArgs(model_path=tmp, pipeline_config=Hunyuan3D2PipelineConfig()))

        # Preprocess stage (delight)
        mod = _load_hunyuan3d_paint_module()
        Hunyuan3DPaintPreprocessStage = mod.Hunyuan3DPaintPreprocessStage
        Hunyuan3DPaintTexGenStage = mod.Hunyuan3DPaintTexGenStage

        pre = Hunyuan3DPaintPreprocessStage(cfg)
        pre._load_delight_model(_MinimalServerArgs(model_path=tmp))
        # From-pretrained dtype for delight equals dit precision (with CPU fallback to fp32)
        expected_delight_dtype = PRECISION_TO_TYPE[cfg.dit_precision]
        # On CPU/MPS, the stage falls back to fp32 for safety
        if pre.device.type in ("cpu", "mps"):
            expected_delight_dtype = torch.float32
        assert getattr(pre._delight_pipeline, "from_pretrained_dtype") == expected_delight_dtype
        assert pre._delight_pipeline.to_args[1] == expected_delight_dtype

        # TexGen stage (VAE + UNet)
        tex = Hunyuan3DPaintTexGenStage(cfg, paint_dir=paint_dir)
        tex._do_load_paint(_MinimalServerArgs(model_path=tmp))

        # VAE moved to dtype per config (with CPU/MPS fallback)
        expected_vae_dtype = PRECISION_TO_TYPE[cfg.vae_precision]
        if tex.device.type in ("cpu", "mps") and expected_vae_dtype in (torch.float16, torch.bfloat16):
            expected_vae_dtype = torch.float32
        assert tex.vae._to_dtype == expected_vae_dtype

        # UNet loaded with torch_dtype per dit_precision (with CPU/MPS fallback)
        expected_dit_dtype = PRECISION_TO_TYPE[cfg.dit_precision]
        if tex.device.type in ("cpu", "mps") and expected_dit_dtype in (torch.float16, torch.bfloat16):
            expected_dit_dtype = torch.float32
        assert recorded["dit_dtype"] == expected_dit_dtype

@jy-song-hub
Contributor Author

@mickqian Please take a look. Thanks!

Replace hardcoded fp16 casts with config-driven dtypes for delight, VAE and UNet, with CPU/MPS-safe fp32 fallback. Also pass delight_negative_prompt from config. No behavior change on CUDA when config is fp16/bf16.
@jy-song-hub jy-song-hub force-pushed the fix/hunyuan3d-precision-config branch from fbfb6d9 to 3716fb7 on April 8, 2026 at 21:52
@mickqian
Collaborator

/tag-and-rerun-ci

@jy-song-hub
Contributor Author

jy-song-hub commented May 5, 2026

/rerun-failed-ci


Labels

diffusion SGLang Diffusion run-ci
