@tdakhran (Contributor) commented Dec 2, 2025

LFM2-Audio-1.5B supports audio input and audio output.

This PR adds ASR (automatic speech recognition) support only. To perform ASR, invoke the CLI with:

bin/llama-mtmd-cli -m LFM2-Audio-1.5B-F32.gguf --mmproj mmproj-LFM2-Audio-1.5b-F32.gguf -n 30 --audio input.wav -sys "Perform ASR." -p "<__media__>"
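For scripting the call above, a minimal Python sketch that assembles the same arguments and runs the binary could look like this. It uses only the flags from the command above; the helper names `build_asr_command` and `run_asr` are hypothetical, and it assumes the binary lives at `bin/llama-mtmd-cli` relative to the working directory.

```python
import subprocess
from typing import List


def build_asr_command(
    model: str,
    mmproj: str,
    audio: str,
    n_predict: int = 30,
    binary: str = "bin/llama-mtmd-cli",  # assumed location of the CLI binary
) -> List[str]:
    """Assemble the llama-mtmd-cli argument list for ASR (hypothetical helper)."""
    return [
        binary,
        "-m", model,             # language model GGUF
        "--mmproj", mmproj,      # multimodal projector (audio encoder) GGUF
        "-n", str(n_predict),    # maximum number of tokens to generate
        "--audio", audio,        # input audio file
        "-sys", "Perform ASR.",  # system prompt the model requires
        "-p", "<__media__>",     # user prompt: media marker where the audio goes
    ]


def run_asr(model: str, mmproj: str, audio: str, n_predict: int = 30) -> str:
    """Run the CLI and return its raw stdout (hypothetical helper)."""
    result = subprocess.run(
        build_asr_command(model, mmproj, audio, n_predict),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Usage: `run_asr("LFM2-Audio-1.5B-F32.gguf", "mmproj-LFM2-Audio-1.5b-F32.gguf", "input.wav")` returns whatever the CLI prints on stdout; separating the transcript from any other output is left to the caller.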

Changes to existing code:

  • the model requires a system prompt, so the -sys option is now enabled for llama-mtmd-cli
  • mel bin (mel filter bank) generation is reworked: the filters are now generated dynamically and support different n_fft values
  • OP_SSM_CONV in the CUDA backend is extended to support kernel size 9
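To illustrate why the mel filters have to be generated per n_fft rather than hardcoded, here is a minimal pure-Python sketch. Both the FFT bin center frequencies (k · sample_rate / n_fft) and the number of bins (n_fft // 2 + 1) depend on n_fft. The PR does not state which mel formula or normalization it uses; this sketch assumes the HTK mel scale with unnormalized triangular filters, and all function names are hypothetical.

```python
import math
from typing import List, Optional


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale (an assumption; Slaney-style scales are also common)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int,
                   f_min: float = 0.0,
                   f_max: Optional[float] = None) -> List[List[float]]:
    """Build an n_mels x (n_fft // 2 + 1) matrix of triangular mel filters."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # Center frequency (Hz) of each FFT bin; it depends on n_fft, so the
    # filters must be recomputed whenever n_fft changes.
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    # n_mels + 2 points evenly spaced on the mel scale give, for filter m,
    # its left edge, center and right edge.
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_mels + 1)
    edges = [mel_to_hz(mel_lo + i * step) for i in range(n_mels + 2)]
    filters = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)                             # outside the triangle
        filters.append(row)
    return filters
```

For example, `mel_filterbank(80, 400, 16000)` (Whisper's n_fft of 400) yields 80 rows of 201 weights, while `mel_filterbank(80, 512, 16000)` yields rows of 257 weights. The value 512 is chosen here only for illustration.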

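As a reference for what OP_SSM_CONV computes, here is a plain-Python depthwise causal convolution. It reflects our understanding of ggml's ssm_conv semantics and is not the CUDA kernel. Each channel has its own kernel of width d_conv, and each channel's input is assumed to already carry d_conv - 1 samples of left context (the conv state). The kernel size 9 mentioned above is simply d_conv = 9. The function name `ssm_conv` and the list-of-lists layout are illustrative assumptions, not ggml's tensor layout.

```python
from typing import List


def ssm_conv(x: List[List[float]], w: List[List[float]]) -> List[List[float]]:
    """Depthwise 1D convolution over time with one kernel per channel.

    x: [channels][d_conv - 1 + n_t] input, left-padded with the previous
       d_conv - 1 samples (the conv state).
    w: [channels][d_conv] per-channel kernel weights.
    returns: [channels][n_t] output, one value per new time step.
    """
    d_conv = len(w[0])
    out = []
    for xc, wc in zip(x, w):
        n_t = len(xc) - (d_conv - 1)
        # Output at step t only sees inputs t .. t + d_conv - 1 of the padded
        # sequence, i.e. the current sample and the d_conv - 1 before it.
        out.append([sum(xc[t + k] * wc[k] for k in range(d_conv)) for t in range(n_t)])
    return out
```

With d_conv = 9, a single channel `x = [[0.0] * 8 + [1.0, 2.0, 3.0]]`, and an "identity" kernel `[[0.0] * 8 + [1.0]]` that only weights the newest sample, `ssm_conv` returns `[[1.0, 2.0, 3.0]]`.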
cc: @ngxson

@tdakhran (Contributor, Author) commented Dec 2, 2025

Tested that llama-server works as intended with the following input (Python):

import base64
import pathlib

# The audio is sent inline as a base64-encoded WAV file.
audio_b64 = base64.b64encode(
    pathlib.Path("/data/playground/issue_400/10.wav").read_bytes()
).decode("utf-8")

messages = [
    {"role": "system", "content": "Perform ASR."},
    {
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "input_audio": {"format": "wav", "data": audio_b64},
            },
        ],
    },
]
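To send the message list above to llama-server, a minimal standard-library sketch could wrap it in an OpenAI-style chat completion request. It assumes the server runs locally on its default port 8080 with the same model and --mmproj, exposes the OpenAI-compatible `/v1/chat/completions` endpoint, and returns an OpenAI-shaped response. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumed server address: llama-server on its default port
DEFAULT_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(messages: List[Dict[str, Any]],
                       url: str = DEFAULT_URL) -> urllib.request.Request:
    """Wrap the messages in a JSON POST request (hypothetical helper)."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(messages: List[Dict[str, Any]], url: str = DEFAULT_URL) -> str:
    """POST the request and return the assistant text (assumes OpenAI response shape)."""
    with urllib.request.urlopen(build_chat_request(messages, url)) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

Building the request separately from sending it keeps the payload easy to inspect without a running server; `transcribe(messages)` performs the actual network call.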

@tdakhran changed the title "model : add LFM2-Audio-1.5B support" to "model : add ASR support for LFM2-Audio-1.5B" Dec 2, 2025
@github-actions bot added the labels testing, Nvidia GPU, examples, python, and ggml Dec 2, 2025