
Add support for Ollama, Palm, Claude-2, Cohere, Replicate Llama2 CodeLlama (100+LLMs) - using LiteLLM #53

Open · wants to merge 1 commit into base: main

Conversation

ishaan-jaff

This PR adds support for the above-mentioned LLMs using LiteLLM: https://github.com/BerriAI/litellm/
LiteLLM is a lightweight package that simplifies LLM API calls; use any LLM as a drop-in replacement for gpt-3.5-turbo.

Example

import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)

# anthropic call
response = completion(model="claude-instant-1", messages=messages)

@@ -66,7 +67,7 @@ def run(self, *args, **kwargs) -> Dict[str, Any]:
num_max_token = num_max_token_map[self.model_type.value]
num_max_completion_tokens = num_max_token - num_prompt_tokens
self.model_config_dict['max_tokens'] = num_max_completion_tokens

We also expose a util called get_max_tokens(); happy to expose that in this PR too.
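For context, the hunk above budgets the completion by subtracting the prompt length from the model's context window. A minimal stdlib sketch of that idea, with a hardcoded map standing in for a get_max_tokens() lookup (the map values and helper name here are illustrative, not litellm's API):

```python
# Illustrative context-window sizes; a real lookup would come from
# the library rather than a hardcoded map.
num_max_token_map = {
    "gpt-3.5-turbo": 4096,
    "gpt-4": 8192,
}

def budget_completion_tokens(model: str, num_prompt_tokens: int) -> int:
    """Subtract the prompt length from the model's context window."""
    num_max_token = num_max_token_map[model]
    return num_max_token - num_prompt_tokens

print(budget_completion_tokens("gpt-3.5-turbo", 1000))  # 3096
```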

@ishaan-jaff

@thinkwee @qianc62 Can I get a review on this PR?

Happy to add docs/testing for this too if the initial commit looks good.

@qianc62

qianc62 commented Sep 19, 2023

Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?

@abbott

abbott commented Sep 19, 2023

Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?

Yes. The following example uses temperature to shift the token probability distribution; the request body spec includes top_p as well.

import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = ""
os.environ["OPENAI_API_BASE"] = "https://api.openai.com/v1"
os.environ["MODEL"] = "gpt-3.5-turbo"

response = completion(
    model=os.getenv("MODEL"),
    messages=[{"content": "The sky is", "role": "user"}],
    temperature=0.8,
    max_tokens=80,
    api_base=os.getenv("OPENAI_API_BASE"),
    request_timeout=300,
)

@@ -15,6 +15,7 @@
from typing import Any, Dict

import openai

This import isn't used anymore, it seems.

@arch1v1st

Stoked to see this PR get merged!

@krrishdholakia

bump @ishaan-jaff

@ishaan-jaff

Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?

@qianc62 Yes, we support all the params OpenAI supports, and we allow you to pass provider-specific params if necessary. More info here: https://docs.litellm.ai/docs/completion/input

@ishaan-jaff

@qianc62 Any blockers to merging? Anything you need from me?

@venim1103

venim1103 commented Oct 10, 2023

A couple of things to update here.
I got my Mistral 7B models to work with LiteLLM (+ Ollama).

First problem: I needed to ignore OPENAI_API_KEY by setting it to some arbitrary value.

Second problem: ChatDev was sending too many arguments to Ollama, which I handled with:
import litellm
litellm.drop_params = True
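As I understand it, drop_params makes litellm silently discard call parameters the target provider doesn't accept instead of erroring. A rough stdlib sketch of that idea (the supported-param set and function name below are made up for illustration, not litellm internals):

```python
# Hypothetical set of params a provider accepts; litellm keeps real
# per-provider mappings internally.
OLLAMA_SUPPORTED = {"model", "messages", "temperature", "max_tokens"}

def drop_unsupported(params: dict, supported: set) -> dict:
    """Keep only the keys the target provider understands."""
    return {k: v for k, v in params.items() if k in supported}

call = {
    "model": "ollama/my_local_model",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.7,
    "frequency_penalty": 0.2,  # not in the supported set; dropped
}
print(sorted(drop_unsupported(call, OLLAMA_SUPPORTED)))
# ['messages', 'model', 'temperature']
```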

Third problem: As I don't know how to create a real model class for the LiteLLM models with all the required information, I just used GPT_3_5_TURBO as my model, but then in model_backend.py I replaced the response with:
response = litellm.completion(*args, **kwargs, model="ollama/my_local_model", api_base="http://localhost:11434", **self.model_config_dict)

Fourth (bigger) problem I encountered: LiteLLM's OpenAI API seems to be a newer version than ChatDev's, which causes the response (completion) to return "logprobs" inside the "choices" list back to ChatDev, which then causes multiple errors, as ChatDev doesn't support logprobs. With a crude hack (removing "logprobs" from the response) I managed to get past this error.
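The "crude hack" can be sketched as stripping the logprobs key from each choice's message before ChatDev consumes the response (the response shape below is a mock for illustration, not a real litellm payload):

```python
# Mock OpenAI-style response whose choices carry a "logprobs" field
# that older ChatDev code does not expect.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!",
                     "logprobs": None}},
    ]
}

def strip_logprobs(response: dict) -> dict:
    """Drop "logprobs" from every choice's message, in place."""
    for choice in response["choices"]:
        choice["message"].pop("logprobs", None)
    return response

cleaned = strip_logprobs(response)
print("logprobs" in cleaned["choices"][0]["message"])  # False
```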

Anyway here is the early chat with my Mistral 7B (Chief Product Officer) writing some crude code for my request.


@krrishdholakia

Hey @venim1103 did the proxy not work for you?

@milorddev

I am extremely interested in this PR

@krrishdholakia

Hey @venim1103, I've filed your issue re: logprobs. I'll make sure we have a fix for this on our (litellm) end.

Extremely sorry for the frustration that must've caused.

@Alphamasterliu added the "good first issue" (Good for newcomers) label Oct 11, 2023
@yhyu13

yhyu13 commented Oct 11, 2023

@ishaan-jaff

#53 (comment)

Wait, we don't need to change openai_api_base to a local URL?

@venim1103

@krrishdholakia Thank you!
As I only tried to get things running as fast as possible (hacking things together), I didn't test any proxy; I just hard-coded my local model name (that I made using Ollama) into the "response request".
When I was using AutoGen with LiteLLM, I just had to put all the model info into OAI_CONFIG_LIST (like the "model", "api_base" and "api_type"), but in ChatDev I didn't know how or where to put all this info, so I just did that hack for now...

Anyway, my initial testing with the Mistral 7B model has some issues (the model itself doesn't really understand the "<INFO" context and is mostly too chatty, or starts changing the subject too early, thus not moving through the process).

@cielonet

cielonet commented Oct 12, 2023

Hey guys, here is a list of changes I made to get it up and running with a self-hosted LLM (i.e. hf text-generation-inference).

litellm-changes.diff

However, I need help if someone could replicate my issues. I built ChatDev inside a Docker container, file provided:
Dockerfile.txt

When I run everything with networking turned on in the Docker container, everything works fine as it should. However, when I isolate the self-hosted LLM and the Docker container into their own isolated Docker network, things start to break.
I don't know if the issue is with litellm or ChatDev. I think I narrowed it down to the usage of tiktoken, but because the code has a lot of try/except, it's hard to find out where the failure is happening; it's a 'silent failure', so it's hard to spot. Any help would be appreciated.

the .log error only says this:
[2023-12-10 16:39:09 WARNING] expected string or buffer, retrying in 0 seconds...

[UPDATE]
I think the issue could be in my changes to this line, currently troubleshooting :-/
#output_messages = [ChatMessage(role_name=self.role_name, role_type=self.role_type, meta_dict=dict(), **dict(choice["message"])) for choice in response["choices"]]

output_messages = [ChatMessage(role_name=self.role_name, role_type=self.role_type, meta_dict=dict(), **{k: v for k, v in choice["message"].items() if k != "logprobs"}) for choice in response["choices"]]

But I don't understand how going 'offline' changes this?

[SECOND UPDATE]
Since I am running in a container, when I run the app it looks like there is a library trying to reach the internet, and that is where things are tripping up. Something called zeet-berri.zeet.app? The IPs resolved to amazonaws:
ec2-52-37-239-96.us-west-2.compute.amazonaws.com
ec2-35-86-16-11.us-west-2.compute.amazonaws.com

Running on a connected container!
ss -atp|grep -i slirp4netns
ESTAB 0 0 192.168.1.169:37824 52.37.239.96:https users:(("slirp4netns",pid=949054,fd=10))
ESTAB 0 0 10.10.10.1:33630 10.10.10.2:webcache users:(("slirp4netns",pid=949054,fd=13))
ESTAB 0 0 192.168.1.169:51482 35.86.16.11:https users:(("slirp4netns",pid=949054,fd=12))

Running on an 'isolated' container!
ss -anput |grep slirp4netns
udp UNCONN 0 0 0.0.0.0:37887 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=7))
udp UNCONN 0 0 0.0.0.0:46187 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=4))
udp UNCONN 0 0 0.0.0.0:46274 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=10))
udp UNCONN 0 0 0.0.0.0:50229 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=3))
udp UNCONN 0 0 0.0.0.0:54771 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=9))
udp UNCONN 0 0 0.0.0.0:59316 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=8))

Can anyone tell me what is going on here, and whether there is an environment variable I can set to avoid this issue? This might be a metrics-collection thing from litellm? Don't know!

[SOLVED]
It looks like there is a bug in litellm. I updated to the latest version and added these to model_backend.py:
import litellm
litellm.set_verbose = False
litellm.drop_params = False
litellm.telemetry = False
litellm.utils.logging.raiseExceptions = False

Also modified mistral prompt with litellm and things started working perfectly.

@krrishdholakia

@cielonet I'm the maintainer of litellm. I can't see the exact issue you faced. Is this because we raise errors for unmapped params?

@krrishdholakia

Some context would be helpful; I'd like to get this fixed on our end ASAP.

@cielonet

@krrishdholakia No prob. I'm currently out of town and will be back on Monday. I'll repost the error msg I was getting.

It looked to me like the msg "expected string or buffer" was generated by litellm because a value in the API call (I think it was part of the logging key) was not correctly formatted. When I ran it with raiseExceptions=False, the API calls never sent that particular field and the system started working again. I did use the logging HTTP copy/paste, so if you have access to the logs/feedback people submit, you should see mine from Thursday when I was working on this (e.g. focus on looking for "expected string or buffer"). Anyway, like I said, I will be back Monday and will provide more feedback.

I also suggest adding a timeout to your telemetry for when the internet is not available, because otherwise it freezes the system, and it was a pain to figure out that the telemetry was causing everything to pause until it found an internet connection. :-/ Thanks again.

@noahnoahk

how about the PR?

@OhNotWilliam

I've tried those changes locally and trying to run the code with azure openai service doesn't seem to work. I'll let you know if I get it to function.

@krrishdholakia

@OhNotWilliam we don't log any of the responses - it's all client-side (even the web url you saw was just an encoded url string). If you have the traceback, please let me know - happy to help debug.

@krrishdholakia

krrishdholakia commented Oct 16, 2023

We've also had people running this via the local OpenAI-proxy - https://docs.litellm.ai/docs/proxy_server
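For anyone trying that route: the proxy fronts a local model with an OpenAI-compatible endpoint, so ChatDev can keep speaking the OpenAI API unchanged. A hypothetical config.yaml along the lines the litellm docs describe (the model names and URL are placeholders, not values from this thread):

```yaml
model_list:
  - model_name: gpt-3.5-turbo        # name the client application asks for
    litellm_params:
      model: ollama/mistral          # local model actually served
      api_base: http://localhost:11434
```

Started with something like `litellm --config config.yaml`, requests for gpt-3.5-turbo would then be routed to the local model.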

@dnhkng

dnhkng commented Oct 18, 2023

I've tried those changes locally and trying to run the code with azure openai service doesn't seem to work. I'll let you know if I get it to function.

@OhNotWilliam: Check my PR #192, which gets Azure working


@ImBIOS left a comment


Love it, with several little fixes to go.

@sammcj

sammcj commented Nov 26, 2023

Any movement on getting this PR merged?

@dsnid3r

dsnid3r commented Dec 12, 2023

Where do we stand on this? What is still outstanding/how can I help?

@dsnid3r

dsnid3r commented Dec 15, 2023

@ishaan-jaff

@nobodykr

Hi, is this still open? Very confused.

@ChieF-TroN

Any update on when this will be implemented?

@TGM

TGM commented Feb 11, 2024

Ollama announced OpenAI compatibility, making LiteLLM irrelevant here:
https://ollama.com/blog/openai-compatibility
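With that compatibility layer, any OpenAI-style client can point at Ollama's /v1 endpoint. A stdlib sketch that just builds such a request (the endpoint path follows Ollama's announcement; the model name is a placeholder, and the send step is deliberately left unexecuted):

```python
import json
from urllib import request

def build_chat_request(model: str, content: str,
                       base: str = "http://localhost:11434/v1"):
    """Build an OpenAI-style chat completion request for a local Ollama."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return request.Request(
        f"{base}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("mistral", "Hello, how are you?")
print(req.full_url)  # http://localhost:11434/v1/chat/completions
# request.urlopen(req) would send it, given a running Ollama server
```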

@nobodykr

@TGM Thank you for the heads-up. I think this is great.

@hemangjoshi37a

If anyone has documentation clearly explaining how to implement what is described in the title of this issue/PR, please share it. Thanks a lot. Happy coding.


@BreeanaBellflower left a comment


litellm is great. I'm looking forward to seeing this get introduced to ChatDev. Not sure if you're looking to add support directly right now, but if so you may want to either add or generalize entries in ModelTypes, num_max_token_map, etc.

@@ -66,7 +67,7 @@ def run(self, *args, **kwargs) -> Dict[str, Any]:
 num_max_token = num_max_token_map[self.model_type.value]
 num_max_completion_tokens = num_max_token - num_prompt_tokens
 self.model_config_dict['max_tokens'] = num_max_completion_tokens
-response = openai.ChatCompletion.create(*args, **kwargs,
+response = litellm.completion(*args, **kwargs,


When testing this for Claude, line 78 below ("if not isinstance(response, Dict)") fails because the response is an instance of ModelResponse. A similar thing happens in chat_agent.py line 192.

@mororo250

mororo250 commented Jun 4, 2024

If this feature is still desired, I'd be happy to help facilitate the merge.

Labels: enhancement (New feature or request), good first issue (Good for newcomers)