Conversation

@infil00p (Contributor) commented Dec 6, 2025

No description provided.

infil00p and others added 7 commits December 4, 2025 20:00
Integrate ONNX Runtime as an alternative inference backend alongside llama.cpp,
enabling GPU-accelerated inference for vision-language models with platform-specific
execution providers (DirectML, CUDA, CoreML).

Changes:
- Add ONNX Runtime dependencies with platform-specific features (DirectML/CUDA/CoreML)
- Create vlm_onnx.rs module for ONNX inference engine supporting SmolVLM models
- Extend ModelManager with ONNX model download functionality from HuggingFace
- Add Tauri commands for ONNX model operations (download, load, generate)
- Update UI with ONNX Models tab in model selection modal
- Add quantization selector (Q4, Q8, FP16) for ONNX models
- Configure downloads for SmolVLM2-256M-Video-Instruct with correct HF repo structure
  (ONNX files in onnx/ subdirectory, config/tokenizer at root)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
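
For reference, a minimal sketch of what the platform-specific execution-provider setup in vlm_onnx.rs could look like, assuming the `ort` crate's 2.x API; the function name and model path are placeholders, and which providers actually register depends on the cargo features enabled for the platform:

```rust
use ort::execution_providers::{
    CUDAExecutionProvider, CoreMLExecutionProvider, DirectMLExecutionProvider,
};
use ort::session::Session;

/// Load one ONNX component of the model (e.g. the vision encoder), preferring
/// GPU execution providers; providers unavailable at runtime are skipped.
fn load_session(model_path: &str) -> ort::Result<Session> {
    Session::builder()?
        .with_execution_providers([
            DirectMLExecutionProvider::default().build(), // Windows
            CUDAExecutionProvider::default().build(),     // NVIDIA GPUs
            CoreMLExecutionProvider::default().build(),   // macOS
        ])?
        .commit_from_file(model_path)
}
```
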
Implement logic to detect the model's backend type and route to the appropriate inference engine,
enabling seamless switching between the llama.cpp and ONNX Runtime backends.

Changes:
- Update model loading useEffect to check backend type
- Route to load_onnx_model for ONNX models, load_model for llama.cpp
- Disable audio capability check for ONNX models (not yet supported)
- Add backend detection in handleSendMessage for inference routing
- Convert image data appropriately for each backend:
  - RGB array for llama.cpp (existing)
  - JPEG bytes for ONNX Runtime (new)
- Call generate_onnx_response for ONNX, generate_response for llama.cpp

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
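
A rough sketch of the Rust-side command pair the frontend routes between; `generate_response` and `generate_onnx_response` are named in the commit above, but the parameter shapes and the `AppState` type are assumptions, used here only to illustrate the image-format difference (raw RGB pixels for llama.cpp vs. encoded JPEG bytes for ONNX Runtime):

```rust
use tauri::State;

/// Holds the loaded llama.cpp and ONNX Runtime engines (contents omitted).
struct AppState { /* ... */ }

#[tauri::command]
async fn generate_response(
    prompt: String,
    image_rgb: Option<Vec<u8>>, // raw RGB pixels, as the existing llama.cpp path expects
    _state: State<'_, AppState>,
) -> Result<String, String> {
    todo!("forward prompt + RGB frame to the llama.cpp engine")
}

#[tauri::command]
async fn generate_onnx_response(
    prompt: String,
    image_jpeg: Option<Vec<u8>>, // encoded JPEG bytes, decoded inside the ONNX engine
    _state: State<'_, AppState>,
) -> Result<String, String> {
    todo!("forward prompt + JPEG bytes to the ONNX Runtime engine")
}
```
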
Update list_downloaded_models to include ONNX models by checking for the
_onnx_ pattern in directory names and verifying that .onnx files exist.
Add frontend model ID normalization so ONNX models from HuggingFace repos
match their normalized directory names (handling slashes, dashes, and case).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
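
Something along these lines, with hypothetical helper names; the exact normalization scheme (lowercase, slashes and dashes to underscores) is an assumption based on the description in the commit above:

```rust
use std::fs;
use std::path::Path;

/// e.g. "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"
///   -> "huggingfacetb_smolvlm2_256m_video_instruct"
fn normalize_model_id(repo_id: &str) -> String {
    repo_id.to_lowercase().replace('/', "_").replace('-', "_")
}

/// A directory counts as a downloaded ONNX model if its name contains "_onnx_"
/// and it actually contains at least one .onnx file.
fn is_onnx_model_dir(dir: &Path) -> bool {
    let name = dir.file_name().and_then(|n| n.to_str()).unwrap_or("");
    name.contains("_onnx_")
        && fs::read_dir(dir)
            .map(|entries| {
                entries
                    .flatten()
                    .any(|e| e.path().extension().map_or(false, |ext| ext == "onnx"))
            })
            .unwrap_or(false)
}
```
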
Updated llama.cpp libraries from b7063 to b7272 to support the Ministral 3
architecture, resolving the "unknown model architecture: 'mistral3'" error.

Major changes:
- Add F16 mmproj preference over Q8 for better vision encoder quality
- Fix quantization string parsing to handle "(F16 mmproj)" suffix
- Remove model verification that was causing app crashes
- Streamline the model list in the UI (remove unsupported models)
- Update tests for new model manager signature

Fixes #7

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
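
For the quantization-string fix mentioned above, a small sketch of what stripping the "(F16 mmproj)" suffix before matching the base quantization could look like; the function name is hypothetical:

```rust
/// Strip any parenthesized suffix the UI appends to the quantization label.
/// "Q4_K_M (F16 mmproj)" -> "Q4_K_M"
fn parse_quantization(label: &str) -> &str {
    label
        .split_once('(')
        .map(|(base, _)| base)
        .unwrap_or(label)
        .trim()
}

fn main() {
    assert_eq!(parse_quantization("Q4_K_M (F16 mmproj)"), "Q4_K_M");
    assert_eq!(parse_quantization("Q8_0"), "Q8_0");
}
```
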
Implemented mtmd-based detection to verify vision/audio support by
loading GGUF files and checking their metadata, following llama.cpp
best practices.

Changes:
- Add check_multimodal_support method to LlamaInference
- Expose mtmd_support_vision/audio FFI functions
- Fix FFI signatures (*mut → *const for read-only operations)
- Add comprehensive test suite for vision, audio, and unified models

This provides accurate detection of model capabilities without relying
on HuggingFace API metadata, which can be incomplete or inconsistent.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
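
A rough sketch of the FFI surface and wrapper described in the commit above; `mtmd_support_vision`/`mtmd_support_audio` and `check_multimodal_support` are named there, while the opaque context type and the field holding it are assumptions:

```rust
/// Opaque handle to llama.cpp's mtmd context, owned by the C++ side.
#[repr(C)]
pub struct MtmdContext {
    _private: [u8; 0],
}

extern "C" {
    // *const: these calls only read the context (per the signature fix above).
    fn mtmd_support_vision(ctx: *const MtmdContext) -> bool;
    fn mtmd_support_audio(ctx: *const MtmdContext) -> bool;
}

pub struct LlamaInference {
    mtmd_ctx: *const MtmdContext, // set when the mmproj/GGUF is loaded (field name assumed)
}

impl LlamaInference {
    /// Ask the loaded model's metadata whether it supports vision and/or audio.
    pub fn check_multimodal_support(&self) -> (bool, bool) {
        unsafe {
            (
                mtmd_support_vision(self.mtmd_ctx),
                mtmd_support_audio(self.mtmd_ctx),
            )
        }
    }
}
```
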