RKLLAMA 0.0.4

@NotPunchnox released this 24 Mar 18:46
· 120 commits to main since this release
0e35e79

Version 0.0.4 (Current)

Major Features

  • Ollama API Compatibility: Added support for the Ollama API interface, allowing RKLLAMA to work with Ollama clients and tools (see the client sketch after this list).
  • Enhanced Streaming Responses: More reliable streaming output, with better handling of completion signals.
  • Optional Debug Mode: Added detailed debugging tools that can be enabled with the --debug flag.
  • CPU Model Auto-detection: Automatic detection of the RK3588 or RK3576 platform, with fallback to interactive selection.
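
If you already use the Ollama Python client, it should be possible to point it at an RKLLAMA server directly. The sketch below is illustrative only: the port (8080) and the model name are placeholder assumptions, not values documented by this release.

```python
# Minimal sketch: drive RKLLAMA through the standard Ollama Python client.
# Assumptions: the server listens on localhost:8080 and a model named
# "qwen2.5:3b" is installed; replace both with your own configuration.
from ollama import Client

client = Client(host="http://localhost:8080")

response = client.chat(
    model="qwen2.5:3b",
    messages=[{"role": "user", "content": "Hello from an Ollama client!"}],
)
print(response["message"]["content"])
```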

New API Endpoints

  • /api/tags - List all available models (Ollama-compatible)
  • /api/show - Show model information
  • /api/create - Create a new model from a Modelfile
  • /api/pull - Pull a model from Hugging Face
  • /api/delete - Delete a model
  • /api/generate - Generate a completion for a prompt (a request sketch follows this list)
  • /api/chat - Generate a chat completion
  • /api/embeddings - (Placeholder) Generate embeddings
  • /api/debug - Diagnostic endpoint (available only in debug mode)
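
The sketch below shows what calling two of these endpoints might look like over plain HTTP, following the Ollama request/response conventions this release mirrors. The host, port, and model name are placeholders, not values taken from the release.

```python
# List installed models via /api/tags, then request a one-shot completion
# via /api/generate. Request and response shapes follow the Ollama API.
import requests

BASE_URL = "http://localhost:8080"  # adjust to wherever RKLLAMA is listening

# Ollama-style /api/tags returns {"models": [{"name": ...}, ...]}.
models = requests.get(f"{BASE_URL}/api/tags").json()
print([m["name"] for m in models.get("models", [])])

# Non-streaming completion: with "stream": False the server returns a
# single JSON object whose "response" field holds the generated text.
resp = requests.post(
    f"{BASE_URL}/api/generate",
    json={"model": "qwen2.5:3b", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json().get("response", ""))
```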

Improvements

  • More reliable "done" signaling for streaming responses (see the streaming sketch after this list)
  • Auto-detection of CPU model (RK3588 or RK3576) with fallback to user selection
  • Better error handling and error messages
  • Fixed threading issues in request processing
  • Automatic content formatting for various response types
  • Improved stream handling with token tracking
  • Optional debugging mode with detailed logs
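
As a rough illustration of the improved completion signaling, the sketch below consumes a streamed chat response and stops when a chunk reports "done". It assumes the server streams newline-delimited JSON objects in the Ollama style; the host, port, and model name are placeholders.

```python
# Consume a streamed chat completion and stop on the "done" signal.
import json
import requests

with requests.post(
    "http://localhost:8080/api/chat",
    json={
        "model": "qwen2.5:3b",
        "messages": [{"role": "user", "content": "Tell me a short joke."}],
        "stream": True,
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk carries a partial message; the final one sets "done": true.
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```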

Technical Changes

  • Added new utility modules for debugging and API handling
  • Improved thread management for streaming responses
  • Added CPU model detection and selection (an illustrative detection sketch follows this list)
  • Updated server configuration options
  • Made debugging tools optional via an environment variable and a command-line flag
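
The release notes do not describe how the CPU model is detected, so the sketch below is only one plausible approach: read the device-tree compatible string and fall back to an interactive prompt when no supported SoC is found. Treat it as an assumption, not RKLLAMA's actual implementation.

```python
# Illustrative platform auto-detection with an interactive fallback.
# Reading /proc/device-tree/compatible is an assumption made for this
# example; it is not documented by the release.
from pathlib import Path

SUPPORTED = ("rk3588", "rk3576")

def detect_cpu_model() -> str:
    compatible = Path("/proc/device-tree/compatible")
    if compatible.exists():
        text = compatible.read_bytes().decode(errors="ignore").lower()
        for soc in SUPPORTED:
            if soc in text:
                return soc
    # Fall back to asking the user, mirroring the interactive selection
    # described above.
    while True:
        choice = input(f"Select CPU model {SUPPORTED}: ").strip().lower()
        if choice in SUPPORTED:
            return choice

if __name__ == "__main__":
    print(f"Detected platform: {detect_cpu_model()}")
```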