ONNX Engine #10
Draft: infil00p wants to merge 7 commits into main from onnx_engine.

No description provided.
Conversation
Integrate ONNX Runtime as an alternative inference backend alongside llama.cpp, enabling GPU-accelerated inference for vision-language models with platform-specific execution providers (DirectML, CUDA, CoreML).

Changes:
- Add ONNX Runtime dependencies with platform-specific features (DirectML/CUDA/CoreML)
- Create vlm_onnx.rs module for ONNX inference engine supporting SmolVLM models
- Extend ModelManager with ONNX model download functionality from HuggingFace
- Add Tauri commands for ONNX model operations (download, load, generate)
- Update UI with ONNX Models tab in model selection modal
- Add quantization selector (Q4, Q8, FP16) for ONNX models
- Configure downloads for SmolVLM2-256M-Video-Instruct with correct HF repo structure (ONNX files in onnx/ subdirectory, config/tokenizer at root)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
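For reference, a minimal sketch of what the platform-specific execution provider selection in vlm_onnx.rs could look like, assuming the `ort` crate's 2.x API (the commit's DirectML/CUDA/CoreML Cargo features map onto these providers). The helper name and fallback behaviour are illustrative, not the PR's actual code:

```rust
use std::path::Path;

use ort::execution_providers::{
    CUDAExecutionProvider, CoreMLExecutionProvider, DirectMLExecutionProvider,
};
use ort::session::Session;

/// Hypothetical helper: build an ONNX Runtime session for a SmolVLM model
/// file, preferring a GPU execution provider per platform. If a provider
/// cannot be registered, ONNX Runtime falls back to the CPU provider.
fn build_session(model_path: &Path) -> ort::Result<Session> {
    let builder = Session::builder()?;

    // Platform-specific execution providers, gated the same way the
    // platform-specific crate features are.
    #[cfg(target_os = "windows")]
    let builder =
        builder.with_execution_providers([DirectMLExecutionProvider::default().build()])?;
    #[cfg(target_os = "linux")]
    let builder =
        builder.with_execution_providers([CUDAExecutionProvider::default().build()])?;
    #[cfg(target_os = "macos")]
    let builder =
        builder.with_execution_providers([CoreMLExecutionProvider::default().build()])?;

    builder.commit_from_file(model_path)
}
```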
Implement logic to detect the model backend type and route to the appropriate inference engine, enabling seamless switching between llama.cpp and ONNX Runtime backends.

Changes:
- Update model loading useEffect to check backend type
- Route to load_onnx_model for ONNX models, load_model for llama.cpp
- Disable audio capability check for ONNX models (not yet supported)
- Add backend detection in handleSendMessage for inference routing
- Convert image data appropriately for each backend: RGB array for llama.cpp (existing), JPEG bytes for ONNX Runtime (new)
- Call generate_onnx_response for ONNX, generate_response for llama.cpp

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
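The routing itself lives in the frontend (the model-loading useEffect and handleSendMessage), so the sketch below only restates the dispatch decision in Rust to match the other examples; the enum, helper, and payload type are hypothetical, while the command names are the ones used in this PR:

```rust
/// Which inference backend a selected model targets (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    LlamaCpp,
    Onnx,
}

/// Pick the Tauri commands to invoke for a given backend:
/// (load command, generate command).
fn commands_for(backend: Backend) -> (&'static str, &'static str) {
    match backend {
        Backend::LlamaCpp => ("load_model", "generate_response"),
        Backend::Onnx => ("load_onnx_model", "generate_onnx_response"),
    }
}

/// The image payload also differs per backend: llama.cpp receives a raw
/// RGB array, ONNX Runtime receives encoded JPEG bytes.
#[allow(dead_code)]
enum ImagePayload {
    Rgb { width: u32, height: u32, pixels: Vec<u8> },
    Jpeg(Vec<u8>),
}
```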
Update list_downloaded_models to include ONNX models by checking for the _onnx_ pattern in directory names and verifying that .onnx files exist. Add frontend model ID normalization to properly match ONNX models from HuggingFace repos against normalized directory names (handling slashes, dashes, and case).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
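A rough sketch of the directory check and the normalization described above; the function names are hypothetical, and in the PR the normalization is implemented on the frontend rather than in Rust:

```rust
use std::fs;
use std::path::Path;

/// A directory counts as a downloaded ONNX model if its name contains the
/// "_onnx_" marker and it actually holds at least one .onnx file
/// (the SmolVLM repos keep ONNX files in an onnx/ subdirectory).
fn is_downloaded_onnx_model(dir: &Path) -> bool {
    let Some(name) = dir.file_name().and_then(|n| n.to_str()) else {
        return false;
    };
    name.contains("_onnx_") && has_onnx_file(dir)
}

/// Recursively look for a file with the .onnx extension.
fn has_onnx_file(dir: &Path) -> bool {
    let Ok(entries) = fs::read_dir(dir) else {
        return false;
    };
    entries.flatten().any(|entry| {
        let path = entry.path();
        if path.is_dir() {
            has_onnx_file(&path)
        } else {
            path.extension().and_then(|e| e.to_str()) == Some("onnx")
        }
    })
}

/// Illustrative equivalent of the frontend normalization: lowercase and fold
/// '/' and '-' to '_' so a HuggingFace repo ID matches the local directory name.
fn normalize_model_id(id: &str) -> String {
    id.to_lowercase().replace('/', "_").replace('-', "_")
}
```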
Updated llama.cpp libraries from b7063 to b7272 to support the Ministral 3 architecture, resolving the "unknown model architecture: 'mistral3'" error.

Major changes:
- Add F16 mmproj preference over Q8 for better vision encoder quality
- Fix quantization string parsing to handle the "(F16 mmproj)" suffix
- Remove model verification that was causing app crashes
- Streamline model list in UI (removed unsupported models)
- Update tests for new model manager signature

Fixes #7

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
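A possible shape for the quantization-string fix: values with the "(F16 mmproj)" suffix are stripped back to the base quantization before matching, and the mmproj preference is recorded separately. The base string "Q4_K_M" is only an example value, and the function is a sketch rather than the PR's actual parser:

```rust
/// Parse a quantization string coming from the UI, e.g. "Q4_K_M (F16 mmproj)".
/// Returns the base quantization and whether the F16 mmproj is preferred.
fn parse_quantization(raw: &str) -> (String, bool) {
    const SUFFIX: &str = "(F16 mmproj)";
    let trimmed = raw.trim();
    if let Some(base) = trimmed.strip_suffix(SUFFIX) {
        // Prefer the F16 mmproj file over the Q8 one for vision quality.
        (base.trim().to_string(), true)
    } else {
        (trimmed.to_string(), false)
    }
}
```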
Implemented mtmd-based detection to verify vision/audio support by loading GGUF files and checking their metadata, following llama.cpp best practices.

Changes:
- Add check_multimodal_support method to LlamaInference
- Expose mtmd_support_vision/audio FFI functions
- Fix FFI signatures (*mut → *const for read-only operations)
- Add comprehensive test suite for vision, audio, and unified models

This provides accurate detection of model capabilities without relying on HuggingFace API metadata, which can be incomplete or inconsistent.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
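A sketch of the FFI surface this commit describes, assuming llama.cpp's mtmd_support_vision/mtmd_support_audio symbols; the opaque context alias and wrapper shape are illustrative rather than the PR's exact bindings:

```rust
use std::os::raw::c_void;

/// Opaque handle to llama.cpp's mtmd context (illustrative; real bindings
/// would use a dedicated opaque type).
type MtmdContext = c_void;

extern "C" {
    // Read-only queries, hence *const rather than *mut (the signature fix
    // mentioned in this commit).
    fn mtmd_support_vision(ctx: *const MtmdContext) -> bool;
    fn mtmd_support_audio(ctx: *const MtmdContext) -> bool;
}

/// Result of querying a loaded mmproj GGUF for its capabilities.
pub struct MultimodalSupport {
    pub vision: bool,
    pub audio: bool,
}

/// Hypothetical shape of check_multimodal_support: given an mtmd context
/// created from the GGUF, query its metadata directly instead of trusting
/// HuggingFace API metadata.
///
/// # Safety
/// `ctx` must be a valid mtmd context created by the accompanying FFI.
pub unsafe fn check_multimodal_support(ctx: *const MtmdContext) -> MultimodalSupport {
    MultimodalSupport {
        vision: mtmd_support_vision(ctx),
        audio: mtmd_support_audio(ctx),
    }
}
```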