RKLLAMA 0.0.4

@NotPunchnox released this 24 Mar 18:46
· 120 commits to main since this release
0e35e79

Version 0.0.4 (Current)

Major Features

  • Ollama API Compatibility: Added support for the Ollama API interface, allowing RKLLAMA to work with Ollama clients and tools (see the client sketch after this list).
  • Enhanced Streaming Responses: More reliable streaming output, with better handling of completion signals.
  • Optional Debug Mode: Added detailed debugging tools that can be enabled with the --debug flag.
  • CPU Model Auto-detection: Automatic detection of the RK3588 or RK3576 platform, with fallback to interactive selection.
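
If you already use the Ollama Python client, it should be possible to point it at an RKLLAMA server directly. The sketch below is illustrative only: the port (8080) and the model name are placeholder assumptions, not values documented by this release.

```python
# Minimal sketch: drive RKLLAMA through the standard Ollama Python client.
# Assumptions: the server listens on localhost:8080 and a model named
# "qwen2.5:3b" is installed; replace both with your own configuration.
from ollama import Client

client = Client(host="http://localhost:8080")

response = client.chat(
    model="qwen2.5:3b",
    messages=[{"role": "user", "content": "Hello from an Ollama client!"}],
)
print(response["message"]["content"])
```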

New API Endpoints

  • /api/tags - List all available models (Ollama-compatible)
  • /api/show - Show model information
  • /api/create - Create a new model from a Modelfile
  • /api/pull - Pull a model from Hugging Face
  • /api/delete - Delete a model
  • /api/generate - Generate a completion for a prompt (a request sketch follows this list)
  • /api/chat - Generate a chat completion
  • /api/embeddings - (Placeholder) Generate embeddings
  • /api/debug - Diagnostic endpoint (available only in debug mode)
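
The sketch below shows what calling two of these endpoints might look like over plain HTTP, following the Ollama request/response conventions this release mirrors. The host, port, and model name are placeholders, not values taken from the release.

```python
# List installed models via /api/tags, then request a one-shot completion
# via /api/generate. Request and response shapes follow the Ollama API.
import requests

BASE_URL = "http://localhost:8080"  # adjust to wherever RKLLAMA is listening

# Ollama-style /api/tags returns {"models": [{"name": ...}, ...]}.
models = requests.get(f"{BASE_URL}/api/tags").json()
print([m["name"] for m in models.get("models", [])])

# Non-streaming completion: with "stream": False the server returns a
# single JSON object whose "response" field holds the generated text.
resp = requests.post(
    f"{BASE_URL}/api/generate",
    json={"model": "qwen2.5:3b", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json().get("response", ""))
```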

Improvements

  • More reliable "done" signaling for streaming responses (see the streaming sketch after this list)
  • Auto-detection of CPU model (RK3588 or RK3576) with fallback to user selection
  • Better error handling and error messages
  • Fixed threading issues in request processing
  • Automatic content formatting for various response types
  • Improved stream handling with token tracking
  • Optional debugging mode with detailed logs
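
As a rough illustration of the improved completion signaling, the sketch below consumes a streamed chat response and stops when a chunk reports "done". It assumes the server streams newline-delimited JSON objects in the Ollama style; the host, port, and model name are placeholders.

```python
# Consume a streamed chat completion and stop on the "done" signal.
import json
import requests

with requests.post(
    "http://localhost:8080/api/chat",
    json={
        "model": "qwen2.5:3b",
        "messages": [{"role": "user", "content": "Tell me a short joke."}],
        "stream": True,
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk carries a partial message; the final one sets "done": true.
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```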

Technical Changes

  • Added new utility modules for debugging and API handling
  • Improved thread management for streaming responses
  • Added CPU model detection and selection (an illustrative detection sketch follows this list)
  • Updated server configuration options
  • Made debugging tools optional via an environment variable and a command-line flag
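
The release notes do not describe how the CPU model is detected, so the sketch below is only one plausible approach: read the device-tree compatible string and fall back to an interactive prompt when no supported SoC is found. Treat it as an assumption, not RKLLAMA's actual implementation.

```python
# Illustrative platform auto-detection with an interactive fallback.
# Reading /proc/device-tree/compatible is an assumption made for this
# example; it is not documented by the release.
from pathlib import Path

SUPPORTED = ("rk3588", "rk3576")

def detect_cpu_model() -> str:
    compatible = Path("/proc/device-tree/compatible")
    if compatible.exists():
        text = compatible.read_bytes().decode(errors="ignore").lower()
        for soc in SUPPORTED:
            if soc in text:
                return soc
    # Fall back to asking the user, mirroring the interactive selection
    # described above.
    while True:
        choice = input(f"Select CPU model {SUPPORTED}: ").strip().lower()
        if choice in SUPPORTED:
            return choice

if __name__ == "__main__":
    print(f"Detected platform: {detect_cpu_model()}")
```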