Name and Version
$ ./build/bin/llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
version: 6793 (38355c6)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu
Built with:
cmake -S . -B build \
    -DGGML_HIP=ON \
    -DAMDGPU_TARGETS=gfx1100 \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_NATIVE=ON \
    -DGGML_HIP_ROCWMMA_FATTN=ON \
    -DGGML_HIP_GRAPHS=ON
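The configure step is then followed by the standard CMake build invocation; the -j value below is an arbitrary example:

cmake --build build --config Release -j 16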
This also occurs with the HIP build on Windows using the same hardware.
Operating systems
Linux (and Windows)
GGML backends
HIP
Hardware
Radeon RX 7900 XTX
Models
Qwen3-30b-a3b-thinking-2507 Q4_K_XL (Unsloth)
Problem description & steps to reproduce
When I run
llama-server \
        --threads 12 \
        --gpu-layers 99 \
        --flash-attn auto \
        --jinja \
        --hf-repo unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF:Q4_K_XL \
        --ctx-size 40960 \
        --temp 0.6 \
        --top-k 20 \
        --top-p 0.95 \
        --min-p 0.0 \
        --ubatch-size 2048
on b6792, the output is as expected: responses are well thought out, detailed, and somewhat lengthy.
When I run the same command on b6793, I get shorter answers with less accurate and less detailed information, and the model is also less inclined to format its output with Markdown.
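For comparison, the same request can be sent to both builds; a minimal sketch (the prompt is only an example, and it assumes llama-server's OpenAI-compatible endpoint on the default port 8080):

curl http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Explain how speculative decoding works."}]}'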
First Bad Commit
b6793 (38355c6); b6792 is the last known good build.
Relevant log output
N/A, logs look normal.