step - Model failed with error: WithBacktrace { inner: Cuda(MatMulNonContiguous #1497

@sempervictus

Description

Describe the bug

Running tcals/code-llama-7b-query0809-200w-completion-2048-400step as a plain (safetensors) model with in-situ quantization to Q4K.

```
2025-06-20T21:13:40.720909Z  INFO mistralrs_core::pipeline::isq: Applied in-situ quantization into Some(Q4K) to 225 tensors out of 225 total tensors. Took 15.47s
mistralrs-svc2  | 2025-06-20T21:13:40.721021Z  INFO mistralrs_core::paged_attention: Allocating 16384 MB for PagedAttention KV cache per GPU
mistralrs-svc2  | 2025-06-20T21:13:40.721029Z  INFO mistralrs_core::paged_attention: Using PagedAttention with block size 32 and 1024 GPU blocks: available context length is 32768 tokens
mistralrs-svc2  | 2025-06-20T21:13:40.879084Z  INFO mistralrs_core::pipeline::chat_template: bos_toks = "<s>", eos_toks = "</s>", unk_tok = <unk>
mistralrs-svc2  | 2025-06-20T21:13:40.881196Z  INFO mistralrs_server_core::mistralrs_for_server_builder: Model loaded.
mistralrs-svc2  | 2025-06-20T21:13:40.881405Z  INFO mistralrs_core: Pipeline input modalities are [📝 Text]
mistralrs-svc2  | 2025-06-20T21:13:40.881411Z  INFO mistralrs_core: Pipeline output modalities are [📝 Text]
mistralrs-svc2  | 2025-06-20T21:13:40.881471Z  INFO mistralrs_core: Beginning dummy run.
mistralrs-svc2  | 2025-06-20T21:13:40.883572Z  INFO mistralrs_core::prefix_cacher: PrefixCacherV2 is enabled. Expect higher multi-turn throughput for both text and multimodal.
mistralrs-svc2  | 2025-06-20T21:13:52.204346Z ERROR mistralrs_core::engine: step - Model failed with error: WithBacktrace { inner: Cuda(MatMulNonContiguous { lhs_stride: Layout { shape: [1, 32, 2, 2], stride: [128, 4, 2, 1], start_offset: 0 }, rhs_stride: Layout { shape: [1, 32, 2, 128], stride: [8192, 128, 4096, 1], start_offset: 0 }, mnk: (2, 128, 2) }), backtrace: Backtrace [{ fn: "candle_core::error::Error::bt" }, { fn: "candle_core::cuda_backend::gemm_config" }, { fn: "<candle_core::cuda_backend::CudaStorage as candle_core::backend::BackendStorage>::matmul_with_alpha" }, { fn: "candle_core::tensor::Tensor::matmul" }, { fn: "mistralrs_core::attention::backends::naive::naive_sdpa" }, { fn: "mistralrs_core::attention::Sdpa::run_attention" }, { fn: "mistralrs_core::paged_attention::layers::paged_attention::PagedAttention::forward" }, { fn: "mistralrs_core::models::llama::Llama::forward_embeds" }, { fn: "<mistralrs_core::models::llama::Llama as mistralrs_core::pipeline::loaders::normal_loaders::NormalModel>::forward" }, { fn: "<mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs" }, { fn: "mistralrs_core::pipeline::Pipeline::step::{{closure}}" }, { fn: "mistralrs_core::engine::Engine::run::{{closure}}.61251" }, { fn: "tokio::runtime::runtime::Runtime::block_on" }, { fn: "std::sys::backtrace::__rust_begin_short_backtrace" }, { fn: "core::ops::function::FnOnce::call_once{{vtable.shim}}" }, { fn: "std::sys::pal::unix::thread::Thread::new::thread_start" }, { fn: "clone" }] }
mistralrs-svc2  | 2025-06-20T21:13:52.207236Z  INFO mistralrs_core: Dummy run completed in 11.325758642s.
mistralrs-svc2  | 2025-06-20T21:13:52.207262Z  INFO mistralrs_server: MCP server listening on http://0.0.0.0:6652/mcp.
mistralrs-svc2  | 2025-06-20T21:13:52.207266Z  INFO mistralrs_server: MCP protocol version is 2025-03-26.
mistralrs-svc2  | 2025-06-20T21:13:52.208001Z  INFO mistralrs_server: OpenAI-compatible server listening on http://0.0.0.0:7652.
mistralrs-svc2  | 2025-06-20T21:14:00.883935Z  INFO mistralrs_core::engine::logger: Throughput (T/s) 0.40, Prefix cache hitrate 0.00%, 0 running, 0 waiting
```

The model presents as an API-callable target, but calls to it return nothing.
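For what it's worth, the layouts in the `MatMulNonContiguous` error can be decoded by hand: the `lhs_stride` `[128, 4, 2, 1]` is exactly the row-major contiguous layout for shape `[1, 32, 2, 2]`, but the `rhs_stride` `[8192, 128, 4096, 1]` is what you get by transposing dims 1 and 2 of a contiguous `[1, 2, 32, 128]` tensor. In other words, the rhs reaching `candle_core::cuda_backend::gemm_config` is a transposed view rather than contiguous data, which candle's CUDA gemm rejects. A minimal sketch of that arithmetic (the helper below is illustrative, not mistral.rs/candle API):

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

# Contiguous layout of a [1, 2, 32, 128] tensor:
strides = contiguous_strides([1, 2, 32, 128])   # [8192, 4096, 128, 1]

# Swapping dims 1 and 2 yields shape [1, 32, 2, 128] with strides
# [8192, 128, 4096, 1] -- exactly the rhs_stride in the error log,
# i.e. a transposed view, not contiguous storage.
transposed = [strides[0], strides[2], strides[1], strides[3]]
print(transposed)  # → [8192, 128, 4096, 1]
```

If that reading is right, a `.contiguous()` on that operand before the matmul in the naive SDPA path would presumably sidestep the error, at the cost of a copy.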

Latest commit or version

Which commit or version you ran with.

Metadata

Labels: bug (Something isn't working)