## Description

Adds initial support for a `Llama-cpp` LLM for running local models. This enables the model to be used, but streaming and some other things don't work exactly right yet.

## Setup
- Download models: tested with GGUF-based models from huggingface.
- Save the model to `<ix root>/llama/<model>`.
- Create the LLM + chain (see the sketch after this list).
- Set `model_path` to `/var/app/ix/llama/<model>`.
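For reference, a minimal sketch of the LLM + chain wiring outside of IX, assuming the component delegates to langchain's `LlamaCpp` wrapper; the GGUF filename below is a placeholder for whatever model was saved under `/var/app/ix/llama/`:

```python
# Hedged sketch, not part of this PR: create a llama-cpp backed LLM and a simple chain.
from langchain.chains import LLMChain
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate

llm = LlamaCpp(
    model_path="/var/app/ix/llama/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder filename
    n_ctx=2048,
    temperature=0.7,
)

chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template("Answer briefly: {question}"),
)

print(chain.run(question="What is llama.cpp?"))
```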
## Changes

- Adds `LLAMA_CPP_LLM`.

## How Tested

## TODOs
- Streaming isn't working with `LLAMA_CPP_LLM`: `IxHandler` isn't receiving all of the kwargs the model is initialized with, so it can't tell whether streaming was enabled. This is a potential blocker. LLAMA_CPP appears to be intentionally filtering these out of its invocation params (see the first sketch after this list).
- `IxHandler` needs to be updated to start streaming for LLMs via `chain.astream()` (streaming is only supported for chat models right now) if a workaround can't be found.
- The Docker image isn't set up for GPU acceleration. I made a short attempt at adding libraries to compile GPU support with `ENV LLAMA_CUBLAS=1`, but the required libraries weren't installed in the `python:3.11` docker image. It wasn't readily apparent how to install the library; this may require switching to a different base image with better support (see the second sketch below).
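For the streaming TODO, a rough diagnostic sketch, assuming the component wraps langchain's `LlamaCpp` and that the chain exposes the async streaming interface; both are assumptions and the model path is a placeholder:

```python
# Hedged sketch for the streaming TODOs above; not part of this PR.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/var/app/ix/llama/model.gguf",  # placeholder path
    streaming=True,
)

# 1) Inspect what the wrapper reports about itself. If `streaming` is filtered
#    out of these params, a callback handler has no way to tell it was enabled.
print(llm._identifying_params)

# 2) Possible fallback: stream at the chain level instead of relying on
#    callbacks, if the chain supports astream().
async def stream_response(chain, question: str) -> None:
    async for chunk in chain.astream({"question": question}):
        print(chunk, end="", flush=True)
```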
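For the GPU TODO, once a cuBLAS-enabled build of `llama-cpp-python` is available in the image, offload would presumably be controlled with `n_gpu_layers`; a sketch under that assumption, not verified in this image:

```python
# Hedged sketch: GPU offload with a cuBLAS-enabled llama-cpp-python build.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/var/app/ix/llama/model.gguf",  # placeholder path
    n_gpu_layers=35,  # layers to offload to the GPU; 0 keeps everything on CPU
)
```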