Custom openai compatible endpoint #1290

Open
riyajatar37003 opened this issue Oct 1, 2024 · 6 comments

@riyajatar37003

Hi,
I have a custom LLM and embedding deployment served with Triton Inference Server, plus an OpenAI-compatible wrapper around it. How can I use this in the .toml config file?
I have tested it with the LiteLLM proxy server and it's working.

@emrgnt-cmplxty
Contributor

@riyajatar37003 - You could, for example, use the openai provider and then point OPENAI_BASE_URL at your custom deployment. Make sure the model name matches the deployment, e.g. openai/your-custom-model.
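
For illustration, a rough sketch of what that could look like, assuming a placeholder wrapper URL and the openai provider (the exact environment-variable name is debated further down in this thread):

# shell / .env - URL and key are placeholders; most OpenAI-compatible clients only need a non-empty key
export OPENAI_API_KEY="anything-non-empty"
export OPENAI_BASE_URL="http://your-triton-wrapper:8000/v1"

# r2r.toml
[completion]
provider = "openai"

  [completion.generation_config]
  model = "openai/your-custom-model" # must match the name your deployment serves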

@riyajatar37003
Author

Thanks. Could you share a doc link showing where in the .toml I need to set this?

@underlines

underlines commented Oct 4, 2024

myconfig.toml

[completion]
provider = "litellm"
concurrent_request_limit = 16

  [completion.generation_config]
  model = "openai/llama3.2" #add your model name here
  temperature = 0.1
  top_p = 1
  max_tokens_to_sample = 1_024
  stream = true
  add_generation_kwargs = { }

then you do
r2r serve --docker --config-path=/home/riyajatar/myconfig.toml

@ArturTanona

Let's say I have an OpenAI-like endpoint served locally at "http://localhost:8004" and the model is called "custom-model". It conforms to the OpenAI v1 API. How do I connect it to r2r?

@qdrddr

qdrddr commented Dec 27, 2024

I believe the correct environment variable is OPENAI_API_BASE, not OPENAI_BASE_URL.
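
For example, applied to the local endpoint mentioned above (a sketch only; whether you need a trailing /v1 depends on where the server mounts its OpenAI-compatible routes):

export OPENAI_API_BASE="http://localhost:8004"   # or http://localhost:8004/v1, depending on the server
export OPENAI_API_KEY="dummy-key"                # a non-empty placeholder is usually enough

# r2r.toml
[completion]
provider = "openai"

  [completion.generation_config]
  model = "openai/custom-model"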

@qdrddr

qdrddr commented Dec 27, 2024

Also, if you are using LiteLLM Proxy with R2R: since R2R internally uses the LiteLLM SDK, the model name in the r2r.toml config file should be openai/ plus the name under which the model is registered in LiteLLM Proxy.

So if, for instance, the Proxy has a model named openai/llama3.3, then in r2r.toml the model name would be openai/openai/llama3.3 @riyajatar37003

Assuming the model in your LiteLLM Proxy is named openai/llama3.3 and you want to use provider = "litellm", r2r.toml would look like this:

[completion]
provider = "litellm"
concurrent_request_limit = 64

  [completion.generation_config]
  model = "openai/openai/llama3.3"

Assuming your LiteLLM Proxy config looks like this:

proxy_config:
  litellm_settings:
    drop_params: True
  model_list:
    # At least one model must exist for the proxy to start.
    - model_name: "openai/llama3.3"
      litellm_params:
        model: "openai/llama3.3"
        api_key: fake-key
        api_base: "http://ollama.mywebsite.com:11434"

This assumes you have an Ollama instance running on port 11434, accessed via its OpenAI-compatible API, and that you have pulled a llama3.3 model into Ollama (you can check with ollama list).
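
For reference, pulling the model and checking that it is available would look like:

ollama pull llama3.3   # download the model if it is not there yet
ollama list            # llama3.3 should appear in the output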

It might be confusing, but in r2r.toml, provider = "litellm" refers to the LiteLLM SDK, not the LiteLLM Proxy.
These are two separate things. By default, the LiteLLM SDK automatically uses the native base URL of the model's provider.
The LiteLLM SDK in R2R will override that base URL when you set OPENAI_API_BASE explicitly for R2R.

The openai/ prefix tells the LiteLLM SDK which provider to use, and everything after the prefix is the actual model name that will be requested.
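
To make the distinction concrete, here is a rough side-by-side of the two setups discussed in this thread (hostnames and ports are placeholders):

# A) Direct: LiteLLM SDK -> your OpenAI-compatible backend (no proxy)
export OPENAI_API_BASE="http://your-backend:8004/v1"
# r2r.toml:  model = "openai/custom-model"        # one prefix: provider/model

# B) Via LiteLLM Proxy: LiteLLM SDK -> proxy -> backend
export OPENAI_API_BASE="http://your-litellm-proxy:4000"
# r2r.toml:  model = "openai/openai/llama3.3"     # provider prefix + model name as registered in the proxy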
