[Bugfix] multimodal_gen(hunyuan3d): honor config precisions for delight/paint #22289

Open
jy-song-hub wants to merge 6 commits into sgl-project:main from bytedance-iaas:fix/hunyuan3d-precision-config

Conversation

@jy-song-hub
Contributor

Motivation

Hunyuan3D paint currently hardcodes fp16 for the delight pipeline, VAE, and UNet, ignoring Hunyuan3D2PipelineConfig.{dit_precision, vae_precision}. This breaks CPU/MPS (which lack reliable half/bfloat16 support) and makes precision behavior inconsistent across pipelines. The delight stage also ignores delight_negative_prompt from the config.

Modifications

  • Use PRECISION_TO_TYPE to honor config dtypes with a CPU/MPS-safe fallback:
    • Delight pipeline: load and move use dit_precision; fallback to fp32 on CPU/MPS.
    • VAE: .to(...) uses vae_precision; fallback to fp32 on CPU/MPS.
    • UNet: from_pretrained(..., torch_dtype=...) uses dit_precision; fallback to fp32 on CPU/MPS.
  • Plumb delight_negative_prompt into the delight call.
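The dtype-resolution rule described in the bullets above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the helper name resolve_dtype is hypothetical, and the inline mapping mirrors what sglang.multimodal_gen.utils.PRECISION_TO_TYPE is expected to contain.

```python
import torch

# Illustrative stand-in for sglang.multimodal_gen.utils.PRECISION_TO_TYPE
PRECISION_TO_TYPE = {
    "fp32": torch.float32,
    "fp16": torch.float16,
    "bf16": torch.bfloat16,
}

def resolve_dtype(precision: str, device: torch.device) -> torch.dtype:
    """Map a config precision string to a torch dtype, coercing half/bfloat16
    requests to fp32 on CPU/MPS, where half precision is unsupported or flaky."""
    dtype = PRECISION_TO_TYPE[precision]
    if device.type in ("cpu", "mps") and dtype in (torch.float16, torch.bfloat16):
        return torch.float32
    return dtype
```

Under this sketch, the same rule would apply at all three load sites: the delight pipeline and UNet resolve dit_precision, while the VAE .to(...) resolves vae_precision.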

Accuracy Tests

Tested via a local unit test. To avoid expanding the PR surface area, the test is not included in this PR; the full snippet is posted in a comment below. The local check validated that:

  • Delight load/move uses dit_precision.
  • VAE .to(...) uses vae_precision.
  • UNet from_pretrained uses dit_precision.
  • CPU/MPS fallback coerces to fp32 as expected.

Speed Tests and Profiling

  • No expected throughput change when configs remain fp16 / bf16 on CUDA/ROCm.
  • On CPU/MPS the fallback to fp32 is for safety and only applies when half/bfloat was requested.
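The second bullet can be checked directly: the fallback is a no-op unless a half/bfloat16 dtype was requested on CPU/MPS, so CUDA/ROCm configs keep their requested dtype. A minimal sketch (function name hypothetical, not the PR's code):

```python
import torch

def maybe_fallback(dtype: torch.dtype, device_type: str) -> torch.dtype:
    # Coerce only half/bfloat16 on CPU/MPS; everything else passes through,
    # so fp16/bf16 on CUDA/ROCm is untouched and throughput is unchanged.
    if device_type in ("cpu", "mps") and dtype in (torch.float16, torch.bfloat16):
        return torch.float32
    return dtype
```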

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the diffusion SGLang Diffusion label Apr 7, 2026
@jy-song-hub
Contributor Author

The following unit test was used to validate the correctness of this change. Since the code modification is small while the test is relatively lengthy, the test is included here rather than in the codebase to avoid unnecessary bloat.

import os
import sys
import json
import types
import tempfile

import importlib.util
import torch

from sglang.multimodal_gen.configs.pipeline_configs.hunyuan3d import (
    Hunyuan3D2PipelineConfig,
)
from sglang.multimodal_gen.utils import PRECISION_TO_TYPE
from sglang.multimodal_gen.runtime.server_args import (
    set_global_server_args,
    ServerArgs,
)


class _MinimalServerArgs:
    def __init__(self, model_path: str, enable_torch_compile: bool = False):
        self.model_path = model_path
        self.enable_torch_compile = enable_torch_compile


def _install_fake_diffusers_modules():
    # Fake diffusers module tree with minimal APIs used by hunyuan3d_paint
    diffusers = types.ModuleType("diffusers")

    class DummyPipeline:
        def __init__(self):
            self.scheduler = types.SimpleNamespace(config={})
            self.to_args = None
            self.set_progress_bar_called = False
            self.from_pretrained_dtype = None

        def set_progress_bar_config(self, **kwargs):
            self.set_progress_bar_called = True

        def to(self, device, dtype=None):
            self.to_args = (device, dtype)
            return self

        def __call__(self, **kwargs):
            # Mimic diffusers API: return object with .images
            class R:
                images = [None]

            return R()

    class SDIPP:
        @staticmethod
        def from_pretrained(local_path, torch_dtype=None, safety_checker=None):
            p = DummyPipeline()
            p.from_pretrained_dtype = torch_dtype
            return p

    class DummySched:
        def __init__(self):
            # match diffusers schedulers: config object with attributes
            self.config = types.SimpleNamespace(num_train_timesteps=1000)
            self.alphas_cumprod = torch.ones(1000)

        @staticmethod
        def from_config(cfg, **kwargs):
            return DummySched()

    class VaeImageProcessor:
        def __init__(self, vae_scale_factor):
            self.vae_scale_factor = vae_scale_factor

    class AutoencoderKL:
        def __init__(self, **kwargs):
            self.config = types.SimpleNamespace(
                block_out_channels=[1, 2, 3], scaling_factor=1.0
            )
            self._to_dtype = None

        def load_state_dict(self, state_dict):
            pass

        def to(self, device=None, dtype=None):
            self._to_dtype = dtype
            return self

        def eval(self):
            return self

    diffusers.StableDiffusionInstructPix2PixPipeline = SDIPP
    diffusers.EulerAncestralDiscreteScheduler = DummySched
    diffusers.AutoencoderKL = AutoencoderKL

    image_processor = types.ModuleType("diffusers.image_processor")
    image_processor.VaeImageProcessor = VaeImageProcessor
    sys.modules["diffusers.image_processor"] = image_processor
    sys.modules["diffusers"] = diffusers

    # safetensors stub (used when .safetensors exists)
    st_mod = types.ModuleType("safetensors")
    st_torch = types.ModuleType("safetensors.torch")

    def _load_file(path):
        return {}

    st_torch.load_file = _load_file
    sys.modules["safetensors"] = st_mod
    sys.modules["safetensors.torch"] = st_torch


def _patch_unet_from_pretrained(monkeypatch, recorded):
    # Patch UNet2p5DConditionModel.from_pretrained to avoid file IO and capture dtype
    import sglang.multimodal_gen.runtime.models.dits.hunyuan3d as hy3d

    def _fake_from_pretrained(path, **kwargs):
        class DummyUNet(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.called_to = False

            def to(self, device):
                self.called_to = True
                return self

        recorded["dit_dtype"] = kwargs.get("torch_dtype") or kwargs.get("dtype")
        return DummyUNet()

    monkeypatch.setattr(hy3d.UNet2p5DConditionModel, "from_pretrained", _fake_from_pretrained)


def _load_hunyuan3d_paint_module():
    here = os.path.dirname(__file__)
    target = os.path.abspath(
        os.path.join(
            here,
            "..",
            "runtime",
            "pipelines_core",
            "stages",
            "hunyuan3d_paint.py",
        )
    )
    spec = importlib.util.spec_from_file_location(
        "_hunyuan3d_paint_local", target
    )
    mod = importlib.util.module_from_spec(spec)
    assert spec and spec.loader
    spec.loader.exec_module(mod)  # type: ignore[attr-defined]
    return mod


def test_hunyuan3d_paint_uses_config_precisions(monkeypatch):
    _install_fake_diffusers_modules()

    with tempfile.TemporaryDirectory() as tmp:
        # Create expected subfolders
        delight_dir = os.path.join(tmp, "hunyuan3d-delight-v2-0")
        os.makedirs(delight_dir, exist_ok=True)

        paint_dir = os.path.join(tmp, "hunyuan3d-paint-v2-0")
        os.makedirs(os.path.join(paint_dir, "vae"), exist_ok=True)
        os.makedirs(os.path.join(paint_dir, "unet"), exist_ok=True)
        os.makedirs(os.path.join(paint_dir, "scheduler"), exist_ok=True)

        # Minimal files consumed by loader
        with open(os.path.join(paint_dir, "vae", "config.json"), "w") as f:
            json.dump({}, f)
        # Prefer .bin path; stub torch.load later
        open(os.path.join(paint_dir, "vae", "diffusion_pytorch_model.bin"), "wb").close()
        with open(os.path.join(paint_dir, "scheduler", "scheduler_config.json"), "w") as f:
            json.dump({"num_train_timesteps": 1000}, f)

        # Stub torch.load for VAE weights
        monkeypatch.setattr(torch, "load", lambda *a, **k: {})

        # Record dtype used for UNet
        recorded = {}
        _patch_unet_from_pretrained(monkeypatch, recorded)

        # Use stable CPU-friendly expectations: fp16 for DiT, fp32 for VAE
        cfg = Hunyuan3D2PipelineConfig()
        cfg.dit_precision = "fp16"
        cfg.vae_precision = "fp32"
        cfg.delight_prompt = "test"

        # Set minimal global ServerArgs to satisfy PipelineStage base class
        # Avoid registry/model discovery by constructing ServerArgs directly
        set_global_server_args(ServerArgs(model_path=tmp, pipeline_config=Hunyuan3D2PipelineConfig()))

        # Preprocess stage (delight)
        mod = _load_hunyuan3d_paint_module()
        Hunyuan3DPaintPreprocessStage = mod.Hunyuan3DPaintPreprocessStage
        Hunyuan3DPaintTexGenStage = mod.Hunyuan3DPaintTexGenStage

        pre = Hunyuan3DPaintPreprocessStage(cfg)
        pre._load_delight_model(_MinimalServerArgs(model_path=tmp))
        # From-pretrained dtype for delight equals dit precision (with CPU fallback to fp32)
        expected_delight_dtype = PRECISION_TO_TYPE[cfg.dit_precision]
        # On CPU/MPS, the stage falls back to fp32 for safety
        if pre.device.type in ("cpu", "mps"):
            expected_delight_dtype = torch.float32
        assert getattr(pre._delight_pipeline, "from_pretrained_dtype") == expected_delight_dtype
        assert pre._delight_pipeline.to_args[1] == expected_delight_dtype

        # TexGen stage (VAE + UNet)
        tex = Hunyuan3DPaintTexGenStage(cfg, paint_dir=paint_dir)
        tex._do_load_paint(_MinimalServerArgs(model_path=tmp))

        # VAE moved to dtype per config (with CPU/MPS fallback)
        expected_vae_dtype = PRECISION_TO_TYPE[cfg.vae_precision]
        if tex.device.type in ("cpu", "mps") and expected_vae_dtype in (torch.float16, torch.bfloat16):
            expected_vae_dtype = torch.float32
        assert tex.vae._to_dtype == expected_vae_dtype

        # UNet loaded with torch_dtype per dit_precision (with CPU/MPS fallback)
        expected_dit_dtype = PRECISION_TO_TYPE[cfg.dit_precision]
        if tex.device.type in ("cpu", "mps") and expected_dit_dtype in (torch.float16, torch.bfloat16):
            expected_dit_dtype = torch.float32
        assert recorded["dit_dtype"] == expected_dit_dtype

@jy-song-hub
Contributor Author

@mickqian Please take a look. Thanks!

Replace hardcoded fp16 casts with config-driven dtypes for delight, VAE and UNet, with CPU/MPS-safe fp32 fallback. Also pass delight_negative_prompt from config. No behavior change on CUDA when config is fp16/bf16.
@jy-song-hub jy-song-hub force-pushed the fix/hunyuan3d-precision-config branch from fbfb6d9 to 3716fb7 on April 8, 2026 at 21:52
@mickqian
Collaborator

/tag-and-rerun-ci

@jy-song-hub
Contributor Author

jy-song-hub commented May 5, 2026

/rerun-failed-ci


Labels

diffusion SGLang Diffusion run-ci
