
Add support for Ollama, Palm, Claude-2, Cohere, Replicate Llama2 CodeLlama (100+LLMs) - using LiteLLM #53

Open · wants to merge 1 commit into base: main

Conversation

ishaan-jaff

This PR adds support for the above-mentioned LLMs using LiteLLM: https://github.com/BerriAI/litellm/
LiteLLM is a lightweight package that simplifies LLM API calls; use any LLM as a drop-in replacement for gpt-3.5-turbo.

Example

import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)

# anthropic call
response = completion(model="claude-instant-1", messages=messages)

@@ -66,7 +67,7 @@ def run(self, *args, **kwargs) -> Dict[str, Any]:
num_max_token = num_max_token_map[self.model_type.value]
num_max_completion_tokens = num_max_token - num_prompt_tokens
self.model_config_dict['max_tokens'] = num_max_completion_tokens

We also expose a util called get_max_tokens(); happy to expose that in this PR too.
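For context, the hunk above budgets the completion by subtracting the prompt length from the model's context window. A minimal stdlib sketch of that idea, with a hardcoded map standing in for a get_max_tokens() lookup (the map values and helper name here are illustrative, not litellm's API):

```python
# Illustrative context-window sizes; a real lookup would come from
# the library rather than a hardcoded map.
num_max_token_map = {
    "gpt-3.5-turbo": 4096,
    "gpt-4": 8192,
}

def budget_completion_tokens(model: str, num_prompt_tokens: int) -> int:
    """Subtract the prompt length from the model's context window."""
    num_max_token = num_max_token_map[model]
    return num_max_token - num_prompt_tokens

print(budget_completion_tokens("gpt-3.5-turbo", 1000))  # 3096
```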

@ishaan-jaff

@thinkwee @qianc62 Can I get a review on this PR?

Happy to add docs/testing for this too if the initial commit looks good.

@qianc62

qianc62 commented Sep 19, 2023

Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?

@abbott

abbott commented Sep 19, 2023

Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?

Yes. The following example uses temperature to shift the token probability distribution; the request body spec includes top_p as well.

import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = ""
os.environ["OPENAI_API_BASE"] = "https://api.openai.com/v1"
os.environ["MODEL"] = "gpt-3.5-turbo"

response = completion(
    model=os.getenv("MODEL"),
    messages=[{"content": "The sky is", "role": "user"}],
    temperature=0.8,
    max_tokens=80,
    api_base=os.getenv("OPENAI_API_BASE"),
    request_timeout=300,
)

@@ -15,6 +15,7 @@
from typing import Any, Dict

import openai

This import isn't used anymore, it seems.

@arch1v1st

Stoked to see this PR get merged!

@krrishdholakia

bump @ishaan-jaff

@ishaan-jaff

Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?

@qianc62 Yes, we support all the params OpenAI supports, and we allow you to pass provider-specific params if necessary. More info here: https://docs.litellm.ai/docs/completion/input

@ishaan-jaff

@qianc62 Any blockers to merging? Anything you need from me?

@venim1103

venim1103 commented Oct 10, 2023

A couple of things to update here.
I got my Mistral 7B models to work with LiteLLM (+ Ollama).

First problem: I needed to ignore OPENAI_API_KEY by setting it to some arbitrary value.

Second problem: ChatDev was sending too many arguments to Ollama, which I handled with:
import litellm
litellm.drop_params = True
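As I understand it, drop_params makes litellm silently discard call parameters the target provider doesn't accept instead of erroring. A rough stdlib sketch of that idea (the supported-param set and function name below are made up for illustration, not litellm internals):

```python
# Hypothetical set of params a provider accepts; litellm keeps real
# per-provider mappings internally.
OLLAMA_SUPPORTED = {"model", "messages", "temperature", "max_tokens"}

def drop_unsupported(params: dict, supported: set) -> dict:
    """Keep only the keys the target provider understands."""
    return {k: v for k, v in params.items() if k in supported}

call = {
    "model": "ollama/my_local_model",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.7,
    "frequency_penalty": 0.2,  # not in the supported set; dropped
}
print(sorted(drop_unsupported(call, OLLAMA_SUPPORTED)))
# ['messages', 'model', 'temperature']
```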

Third problem: As I don't know how to create a real model class for the LiteLLM models with all the required information, I just used GPT_3_5_TURBO as my model, but then in model_backend.py I replaced the response with:
response = litellm.completion(*args, **kwargs, model="ollama/my_local_model", api_base="http://localhost:11434", **self.model_config_dict)

Fourth (bigger) problem I encountered: LiteLLM's OpenAI API seems to be a newer version than ChatDev's, which causes the response (completion) to return "logprobs" inside the "choices" list back to ChatDev, which then causes multiple errors, as ChatDev doesn't support logprobs. With a crude hack (removing "logprobs" from the response) I managed to get past this error.
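The "crude hack" can be sketched as stripping the logprobs key from each choice's message before ChatDev consumes the response (the response shape below is a mock for illustration, not a real litellm payload):

```python
# Mock OpenAI-style response whose choices carry a "logprobs" field
# that older ChatDev code does not expect.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!",
                     "logprobs": None}},
    ]
}

def strip_logprobs(response: dict) -> dict:
    """Drop "logprobs" from every choice's message, in place."""
    for choice in response["choices"]:
        choice["message"].pop("logprobs", None)
    return response

cleaned = strip_logprobs(response)
print("logprobs" in cleaned["choices"][0]["message"])  # False
```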

Anyway here is the early chat with my Mistral 7B (Chief Product Officer) writing some crude code for my request.


@krrishdholakia

Hey @venim1103 did the proxy not work for you?

@milorddev

I am extremely interested in this PR

@krrishdholakia

Hey @venim1103, I've filed your issue re: logprobs. I'll make sure we have a fix for this on our (litellm) end.

Extremely sorry for the frustration that must've caused.

@Alphamasterliu added the "good first issue" (Good for newcomers) label Oct 11, 2023
@yhyu13

yhyu13 commented Oct 11, 2023

@ishaan-jaff

#53 (comment)

Wait, we don't need to change openai_api_base to a local URL?

@venim1103

@krrishdholakia Thank you!
As I only tried to get things running as fast as possible (hacking things together), I didn't test any proxy; I just hard-coded my local model name (that I made using Ollama) into the "response request".
When I was using AutoGen with LiteLLM, I just had to put all the model info into OAI_CONFIG_LIST (like the "model", "api_base" and "api_type"), but in ChatDev I didn't know how or where to put all this info, so I just did that hack for now...

Anyway, my initial testing with the Mistral 7B model has some issues (the model itself doesn't really understand the "<INFO" context and is mostly too chatty, or starts changing the subject too early, thus not moving through the process).

@cielonet

cielonet commented Oct 12, 2023

Hey guys, here is a list of changes I made to get it up and running with a self-hosted LLM (i.e. hf text-generation-inference).

litellm-changes.diff

However, I need help if someone could replicate my issues. I built ChatDev inside a Docker container, file provided:
Dockerfile.txt

When I run everything with networking turned on in the Docker container, everything works fine as it should. However, when I isolate the self-hosted LLM and the Docker container into their own isolated Docker network, things start to break.
I don't know if the issue is with litellm or ChatDev. I think I narrowed it down to the usage of tiktoken, but because the code has a lot of try/except, it's hard to find out where the failure is happening; it's a 'silent failure', so it's hard to spot. Any help would be appreciated.

the .log error only says this:
[2023-12-10 16:39:09 WARNING] expected string or buffer, retrying in 0 seconds...

[UPDATE]
I think the issue could be in my changes to this line, currently troubleshooting :-/
#output_messages = [ChatMessage(role_name=self.role_name, role_type=self.role_type, meta_dict=dict(), **dict(choice["message"])) for choice in response["choices"]]

output_messages = [ChatMessage(role_name=self.role_name, role_type=self.role_type, meta_dict=dict(), **{k: v for k, v in choice["message"].items() if k != "logprobs"}) for choice in response["choices"]]

But I don't understand how going 'offline' changes this?

[SECOND UPDATE]
Since I am running in a container, when I run the app it looks like there is a library trying to reach the internet, and that is where things are tripping up. Something called zeet-berri.zeet.app? The IPs resolved to amazonaws:
ec2-52-37-239-96.us-west-2.compute.amazonaws.com
ec2-35-86-16-11.us-west-2.compute.amazonaws.com

Running on a connected container!
ss -atp|grep -i slirp4netns
ESTAB 0 0 192.168.1.169:37824 52.37.239.96:https users:(("slirp4netns",pid=949054,fd=10))
ESTAB 0 0 10.10.10.1:33630 10.10.10.2:webcache users:(("slirp4netns",pid=949054,fd=13))
ESTAB 0 0 192.168.1.169:51482 35.86.16.11:https users:(("slirp4netns",pid=949054,fd=12))

Running on an 'isolated' container!
ss -anput |grep slirp4netns
udp UNCONN 0 0 0.0.0.0:37887 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=7))
udp UNCONN 0 0 0.0.0.0:46187 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=4))
udp UNCONN 0 0 0.0.0.0:46274 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=10))
udp UNCONN 0 0 0.0.0.0:50229 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=3))
udp UNCONN 0 0 0.0.0.0:54771 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=9))
udp UNCONN 0 0 0.0.0.0:59316 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=8))

Can anyone tell me what is going on here, and whether there is an environment variable I can set to avoid this issue? This might be a metrics-collection thing from litellm? Don't know!

[SOLVED]
It looks like there is a bug in litellm. I updated to the latest version and added these to model_backend.py:
import litellm
litellm.set_verbose = False
litellm.drop_params = False
litellm.telemetry = False
litellm.utils.logging.raiseExceptions = False

Also modified mistral prompt with litellm and things started working perfectly.

@krrishdholakia

@cielonet I'm the maintainer of litellm. I can't see the exact issue you faced. Is this because we raise errors for unmapped params?

@krrishdholakia

Some context would be helpful; I'd like to get this fixed on our end ASAP.

@cielonet

@krrishdholakia No prob. I'm currently out of town and will be back on Monday. I'll repost the error msg I was getting.

It looked to me like the msg "expected string or buffer" was generated by litellm because a value in the API call (I think it was part of the logging key) was not correctly formatted. When I ran it with raiseExceptions=False, the API calls never sent that particular field and the system started working again. I did use the logging HTTP copy/paste, so if you have access to the logs/feedback people submit, you should see mine from Thursday when I was working on this (e.g. focus on looking for "expected string or buffer"). Anyway, like I said, I will be back Monday and will provide more feedback.

I also suggest adding a timeout to your telemetry for when the internet is not available, because otherwise it freezes the system, and it was a pain to figure out that the telemetry was causing everything to pause until it found an internet connection. :-/ Thanks again.

@noahnoahk

how about the PR?

@OhNotWilliam

I've tried those changes locally and trying to run the code with azure openai service doesn't seem to work. I'll let you know if I get it to function.

@krrishdholakia

@OhNotWilliam we don't log any of the responses - it's all client-side (even the web url you saw was just an encoded url string). If you have the traceback, please let me know - happy to help debug.

@krrishdholakia

krrishdholakia commented Oct 16, 2023

We've also had people running this via the local OpenAI-proxy - https://docs.litellm.ai/docs/proxy_server
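For anyone trying that route: the proxy fronts a local model with an OpenAI-compatible endpoint, so ChatDev can keep speaking the OpenAI API unchanged. A hypothetical config.yaml along the lines the litellm docs describe (the model names and URL are placeholders, not values from this thread):

```yaml
model_list:
  - model_name: gpt-3.5-turbo        # name the client application asks for
    litellm_params:
      model: ollama/mistral          # local model actually served
      api_base: http://localhost:11434
```

Started with something like `litellm --config config.yaml`, requests for gpt-3.5-turbo would then be routed to the local model.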

@dnhkng

dnhkng commented Oct 18, 2023

I've tried those changes locally and trying to run the code with azure openai service doesn't seem to work. I'll let you know if I get it to function.

@OhNotWilliam: Check my PR #192, which gets Azure working


@ImBIOS left a comment


Love it, with several little fixes to go.

@sammcj

sammcj commented Nov 26, 2023

Any movement on getting this PR merged?

@dsnid3r

dsnid3r commented Dec 12, 2023

Where do we stand on this? What is still outstanding/how can I help?

@dsnid3r

dsnid3r commented Dec 15, 2023

@ishaan-jaff

@nobodykr

Hi, is this still open? Very confused.

@ChieF-TroN

Any update on when this will be implemented?

@TGM

TGM commented Feb 11, 2024

Ollama announced OpenAI compatibility, making LiteLLM irrelevant here:
https://ollama.com/blog/openai-compatibility
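With that compatibility layer, any OpenAI-style client can point at Ollama's /v1 endpoint. A stdlib sketch that just builds such a request (the endpoint path follows Ollama's announcement; the model name is a placeholder, and the send step is deliberately left unexecuted):

```python
import json
from urllib import request

def build_chat_request(model: str, content: str,
                       base: str = "http://localhost:11434/v1"):
    """Build an OpenAI-style chat completion request for a local Ollama."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return request.Request(
        f"{base}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("mistral", "Hello, how are you?")
print(req.full_url)  # http://localhost:11434/v1/chat/completions
# request.urlopen(req) would send it, given a running Ollama server
```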

@nobodykr

@TGM Thank you for the heads-up. I think this is great.

@hemangjoshi37a

If anyone has documentation clearly explaining how to implement what is described in the title of this issue/PR, please share it. Thanks a lot. Happy coding.


@BreeanaBellflower left a comment


litellm is great. I'm looking forward to seeing this get introduced to ChatDev. Not sure if you're looking to add support directly right now, but if so you may want to either add or generalize entries in ModelTypes, num_max_token_map, etc.

@@ -66,7 +67,7 @@ def run(self, *args, **kwargs) -> Dict[str, Any]:
 num_max_token = num_max_token_map[self.model_type.value]
 num_max_completion_tokens = num_max_token - num_prompt_tokens
 self.model_config_dict['max_tokens'] = num_max_completion_tokens
-response = openai.ChatCompletion.create(*args, **kwargs,
+response = litellm.completion(*args, **kwargs,


When testing this for Claude, line 78 below ("if not isinstance(response, Dict)") fails because the response is an instance of ModelResponse. A similar thing happens in chat_agent.py line 192.

@mororo250

mororo250 commented Jun 4, 2024

If this feature is still desired, I'd be happy to help facilitate the merge.

Labels: enhancement (New feature or request), good first issue (Good for newcomers)