
LLM: llama-cpp #186

Merged
merged 2 commits into master from llama_cpp
Aug 27, 2023

Conversation

kreneskyp
Owner

Description

Adds initial support for the llama-cpp LLM for running local models. The model can be used, but streaming and some other features don't work correctly yet.


Setup
  1. Download a model:
    Tested with GGUF-based models from Hugging Face.

  2. Save the model to <ix root>/llama/<model>

  3. Create the LLM + chain:

    Set model_path to /var/app/ix/llama/<model>
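The path convention in steps 2–3 can be sketched as below. This is a minimal sketch, not code from this PR: the GGUF filename is a hypothetical placeholder, and the commented-out lines assume LangChain's LlamaCpp wrapper (backed by llama-cpp-python), which requires the model file to exist.

```python
from pathlib import Path

IX_ROOT = Path("/var/app/ix")          # ix root inside the container
MODEL_NAME = "llama-2-7b.Q4_K_M.gguf"  # hypothetical GGUF model filename
model_path = IX_ROOT / "llama" / MODEL_NAME

# With a real model file in place, the LLM could then be constructed as:
# from langchain.llms import LlamaCpp
# llm = LlamaCpp(model_path=str(model_path))
print(model_path)  # -> /var/app/ix/llama/llama-2-7b.Q4_K_M.gguf
```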

Changes

Adds LLAMA_CPP_LLM

How Tested

  • manual testing

TODOs

  • Streaming isn't working with LLAMA_CPP_LLM:

    • IxHandler isn't receiving all of the kwargs the model is initialized with, so it can't tell whether streaming was enabled. This is a potential blocker: LLAMA_CPP appears to intentionally filter these out of the invocation params.
    • IxHandler needs to be updated to start streaming for LLMs (currently only supported for chat models).
    • If no workaround can be found, it may be necessary to implement streaming using the officially blessed method of iterating over chain.astream().
  • The Docker image isn't set up for GPU acceleration. I made a short attempt at adding the libraries needed to compile GPU support with ENV LLAMA_CUBLAS=1, but they weren't installed in the python:3.11 Docker image and it wasn't readily apparent how to install them. This may require switching to a base image with better GPU support.
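The astream() workaround mentioned above amounts to consuming the chain's token stream with an async for loop. The sketch below shows only that consumption pattern: DummyChain is a stand-in for a real LangChain chain, with an astream() method that yields hard-coded tokens instead of model output.

```python
import asyncio

class DummyChain:
    """Stand-in for a real chain; astream() yields tokens as they arrive."""
    async def astream(self, inputs):
        for token in ["Hello", ", ", "world"]:
            yield token

async def collect(chain, inputs):
    # Iterate over the async stream, accumulating tokens into one string.
    chunks = []
    async for chunk in chain.astream(inputs):
        chunks.append(chunk)
    return "".join(chunks)

print(asyncio.run(collect(DummyChain(), {"input": "hi"})))  # -> Hello, world
```

In a handler like IxHandler, each chunk would be forwarded to the client as it arrives rather than accumulated.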

@kreneskyp kreneskyp merged commit 5575d95 into master Aug 27, 2023
5 checks passed
@kreneskyp kreneskyp deleted the llama_cpp branch August 27, 2023 16:37