CUDA Performance Regression on Jetson AGX Orin #16815

@TinyServal

Description

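I noticed a roughly 10% drop in text generation (tg) throughput on the Jetson AGX Orin this week. A bisect led me to f77c13b (#16715). The first run below is from the last good build, 3cfa9c3; the second is from f77c13b.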


ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes

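| model | size | params | backend | ngl | threads | n_ubatch | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg32 | 37.09 ± 0.58 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg64 | 37.31 ± 0.05 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg128 | 37.33 ± 0.02 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg512 | 37.20 ± 0.01 |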

build: 3cfa9c3 (6840)


ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes

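| model | size | params | backend | ngl | threads | n_ubatch | fa | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg32 | 33.21 ± 0.44 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg64 | 33.39 ± 0.04 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg128 | 33.40 ± 0.02 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg512 | 33.29 ± 0.01 |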

build: f77c13b (6841)
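
As a sanity check on the ~10% figure, the relative drop can be recomputed from the t/s values in the two runs above. This is only a minimal sketch: the dictionary names and the `relative_drop` helper are illustrative and not part of any tool used here, and it ignores the reported ± spread.

```python
# Mean t/s values copied from the two benchmark runs in this report.
before = {"tg32": 37.09, "tg64": 37.31, "tg128": 37.33, "tg512": 37.20}  # build 3cfa9c3 (6840)
after = {"tg32": 33.21, "tg64": 33.39, "tg128": 33.40, "tg512": 33.29}   # build f77c13b (6841)


def relative_drop(old: float, new: float) -> float:
    """Return the throughput loss as a percentage of the old value."""
    return (old - new) / old * 100.0


# Per-test regression, keyed by the benchmark test name.
drops = {test: relative_drop(before[test], after[test]) for test in before}

for test, pct in drops.items():
    print(f"{test}: {before[test]:.2f} -> {after[test]:.2f} t/s ({pct:.1f}% lower)")
```

All four tests come out at roughly 10.5% lower, which is far larger than the reported run-to-run variation, so the slowdown is unlikely to be measurement noise.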

Labels: Nvidia GPU (issues specific to Nvidia GPUs)
