ONNX Engine #10
Draft: infil00p wants to merge 7 commits into main from onnx_engine.

No description provided.
Conversation
Integrate ONNX Runtime as an alternative inference backend alongside llama.cpp, enabling GPU-accelerated inference for vision-language models with platform-specific execution providers (DirectML, CUDA, CoreML).

Changes:
- Add ONNX Runtime dependencies with platform-specific features (DirectML/CUDA/CoreML)
- Create vlm_onnx.rs module for ONNX inference engine supporting SmolVLM models
- Extend ModelManager with ONNX model download functionality from HuggingFace
- Add Tauri commands for ONNX model operations (download, load, generate)
- Update UI with ONNX Models tab in model selection modal
- Add quantization selector (Q4, Q8, FP16) for ONNX models
- Configure downloads for SmolVLM2-256M-Video-Instruct with correct HF repo structure (ONNX files in onnx/ subdirectory, config/tokenizer at root)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
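For reference, a minimal sketch of what the platform-specific execution provider selection in vlm_onnx.rs could look like, assuming the `ort` crate's 2.x API (the commit's DirectML/CUDA/CoreML Cargo features map onto these providers). The helper name and fallback behaviour are illustrative, not the PR's actual code:

```rust
use std::path::Path;

use ort::execution_providers::{
    CUDAExecutionProvider, CoreMLExecutionProvider, DirectMLExecutionProvider,
};
use ort::session::Session;

/// Hypothetical helper: build an ONNX Runtime session for a SmolVLM model
/// file, preferring a GPU execution provider per platform. If a provider
/// cannot be registered, ONNX Runtime falls back to the CPU provider.
fn build_session(model_path: &Path) -> ort::Result<Session> {
    let builder = Session::builder()?;

    // Platform-specific execution providers, gated the same way the
    // platform-specific crate features are.
    #[cfg(target_os = "windows")]
    let builder =
        builder.with_execution_providers([DirectMLExecutionProvider::default().build()])?;
    #[cfg(target_os = "linux")]
    let builder =
        builder.with_execution_providers([CUDAExecutionProvider::default().build()])?;
    #[cfg(target_os = "macos")]
    let builder =
        builder.with_execution_providers([CoreMLExecutionProvider::default().build()])?;

    builder.commit_from_file(model_path)
}
```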
Implement logic to detect the model backend type and route to the appropriate inference engine, enabling seamless switching between llama.cpp and ONNX Runtime backends.

Changes:
- Update model loading useEffect to check backend type
- Route to load_onnx_model for ONNX models, load_model for llama.cpp
- Disable audio capability check for ONNX models (not yet supported)
- Add backend detection in handleSendMessage for inference routing
- Convert image data appropriately for each backend: RGB array for llama.cpp (existing), JPEG bytes for ONNX Runtime (new)
- Call generate_onnx_response for ONNX, generate_response for llama.cpp

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
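The routing itself lives in the frontend (the model-loading useEffect and handleSendMessage), so the sketch below only restates the dispatch decision in Rust to match the other examples; the enum, helper, and payload type are hypothetical, while the command names are the ones used in this PR:

```rust
/// Which inference backend a selected model targets (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    LlamaCpp,
    Onnx,
}

/// Pick the Tauri commands to invoke for a given backend:
/// (load command, generate command).
fn commands_for(backend: Backend) -> (&'static str, &'static str) {
    match backend {
        Backend::LlamaCpp => ("load_model", "generate_response"),
        Backend::Onnx => ("load_onnx_model", "generate_onnx_response"),
    }
}

/// The image payload also differs per backend: llama.cpp receives a raw
/// RGB array, ONNX Runtime receives encoded JPEG bytes.
#[allow(dead_code)]
enum ImagePayload {
    Rgb { width: u32, height: u32, pixels: Vec<u8> },
    Jpeg(Vec<u8>),
}
```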
Update list_downloaded_models to include ONNX models by checking for the _onnx_ pattern in directory names and verifying that .onnx files exist. Add frontend model ID normalization to properly match ONNX models from HuggingFace repos against normalized directory names (handling slashes, dashes, and case).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
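A rough sketch of the directory check and the normalization described above; the function names are hypothetical, and in the PR the normalization is implemented on the frontend rather than in Rust:

```rust
use std::fs;
use std::path::Path;

/// A directory counts as a downloaded ONNX model if its name contains the
/// "_onnx_" marker and it actually holds at least one .onnx file
/// (the SmolVLM repos keep ONNX files in an onnx/ subdirectory).
fn is_downloaded_onnx_model(dir: &Path) -> bool {
    let Some(name) = dir.file_name().and_then(|n| n.to_str()) else {
        return false;
    };
    name.contains("_onnx_") && has_onnx_file(dir)
}

/// Recursively look for a file with the .onnx extension.
fn has_onnx_file(dir: &Path) -> bool {
    let Ok(entries) = fs::read_dir(dir) else {
        return false;
    };
    entries.flatten().any(|entry| {
        let path = entry.path();
        if path.is_dir() {
            has_onnx_file(&path)
        } else {
            path.extension().and_then(|e| e.to_str()) == Some("onnx")
        }
    })
}

/// Illustrative equivalent of the frontend normalization: lowercase and fold
/// '/' and '-' to '_' so a HuggingFace repo ID matches the local directory name.
fn normalize_model_id(id: &str) -> String {
    id.to_lowercase().replace('/', "_").replace('-', "_")
}
```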
Updated llama.cpp libraries from b7063 to b7272 to support the Ministral 3 architecture, resolving the "unknown model architecture: 'mistral3'" error.

Major changes:
- Add F16 mmproj preference over Q8 for better vision encoder quality
- Fix quantization string parsing to handle the "(F16 mmproj)" suffix
- Remove model verification that was causing app crashes
- Streamline model list in UI (removed unsupported models)
- Update tests for new model manager signature

Fixes #7

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
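A possible shape for the quantization-string fix: values with the "(F16 mmproj)" suffix are stripped back to the base quantization before matching, and the mmproj preference is recorded separately. The base string "Q4_K_M" is only an example value, and the function is a sketch rather than the PR's actual parser:

```rust
/// Parse a quantization string coming from the UI, e.g. "Q4_K_M (F16 mmproj)".
/// Returns the base quantization and whether the F16 mmproj is preferred.
fn parse_quantization(raw: &str) -> (String, bool) {
    const SUFFIX: &str = "(F16 mmproj)";
    let trimmed = raw.trim();
    if let Some(base) = trimmed.strip_suffix(SUFFIX) {
        // Prefer the F16 mmproj file over the Q8 one for vision quality.
        (base.trim().to_string(), true)
    } else {
        (trimmed.to_string(), false)
    }
}
```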
Implemented mtmd-based detection to verify vision/audio support by loading GGUF files and checking their metadata, following llama.cpp best practices.

Changes:
- Add check_multimodal_support method to LlamaInference
- Expose mtmd_support_vision/audio FFI functions
- Fix FFI signatures (*mut → *const for read-only operations)
- Add comprehensive test suite for vision, audio, and unified models

This provides accurate detection of model capabilities without relying on HuggingFace API metadata, which can be incomplete or inconsistent.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
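A sketch of the FFI surface this commit describes, assuming llama.cpp's mtmd_support_vision/mtmd_support_audio symbols; the opaque context alias and wrapper shape are illustrative rather than the PR's exact bindings:

```rust
use std::os::raw::c_void;

/// Opaque handle to llama.cpp's mtmd context (illustrative; real bindings
/// would use a dedicated opaque type).
type MtmdContext = c_void;

extern "C" {
    // Read-only queries, hence *const rather than *mut (the signature fix
    // mentioned in this commit).
    fn mtmd_support_vision(ctx: *const MtmdContext) -> bool;
    fn mtmd_support_audio(ctx: *const MtmdContext) -> bool;
}

/// Result of querying a loaded mmproj GGUF for its capabilities.
pub struct MultimodalSupport {
    pub vision: bool,
    pub audio: bool,
}

/// Hypothetical shape of check_multimodal_support: given an mtmd context
/// created from the GGUF, query its metadata directly instead of trusting
/// HuggingFace API metadata.
///
/// # Safety
/// `ctx` must be a valid mtmd context created by the accompanying FFI.
pub unsafe fn check_multimodal_support(ctx: *const MtmdContext) -> MultimodalSupport {
    MultimodalSupport {
        vision: mtmd_support_vision(ctx),
        audio: mtmd_support_audio(ctx),
    }
}
```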