Closed
Labels
Nvidia GPU: Issues specific to Nvidia GPUs
Description
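Noticed a ~10% performance loss in token generation (tg) on the AGX Orin this week; a bisect led me to f77c13b (#16715). The first run below is build 3cfa9c3 (6840), the second is build f77c13b (6841).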
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes
| model | size | params | backend | ngl | threads | n_ubatch | fa | mmap | test | t/s |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg32 | 37.09 ± 0.58 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg64 | 37.31 ± 0.05 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg128 | 37.33 ± 0.02 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg512 | 37.20 ± 0.01 |
build: 3cfa9c3 (6840)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes
| model | size | params | backend | ngl | threads | n_ubatch | fa | mmap | test | t/s |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg32 | 33.21 ± 0.44 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg64 | 33.39 ± 0.04 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg128 | 33.40 ± 0.02 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 99 | 1 | 2048 | 1 | 0 | tg512 | 33.29 ± 0.01 |
build: f77c13b (6841)
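For reference, the size of the regression can be checked directly from the two tables. The snippet below is a minimal sketch that copies the mean t/s values from the runs above by hand; the `relative_drop` helper and the dictionary names are illustrative only and are not part of llama-bench.

```python
# Mean t/s values copied by hand from the two llama-bench tables above.
before = {"tg32": 37.09, "tg64": 37.31, "tg128": 37.33, "tg512": 37.20}  # build 3cfa9c3 (6840)
after = {"tg32": 33.21, "tg64": 33.39, "tg128": 33.40, "tg512": 33.29}   # build f77c13b (6841)


def relative_drop(old: float, new: float) -> float:
    """Return the throughput loss from old to new as a percentage of old."""
    return (old - new) / old * 100.0


# Per-test slowdown between the two builds.
drops = {test: relative_drop(before[test], after[test]) for test in before}

for test, pct in drops.items():
    print(f"{test}: {before[test]:.2f} -> {after[test]:.2f} t/s ({pct:.1f}% slower)")
```

Every test length comes out at about 10.5% slower, so the loss does not appear to depend on the number of generated tokens.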