Custom openai compatible endpoint #1290

Open
riyajatar37003 opened this issue Oct 1, 2024 · 6 comments

@riyajatar37003

Hi,
I have a custom LLM and embedding deployment served with Triton Inference Server, plus an OpenAI-compatible wrapper around it. How can I use this in the .toml config file?
I have tested it with the LiteLLM proxy server and it's working.

@emrgnt-cmplxty
Contributor

@riyajatar37003 - You could, for example, use the openai provider and then point OPENAI_BASE_URL at your custom deployment. Make sure the model name matches the deployment, e.g. openai/your-custom-model.
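
For illustration, a rough sketch of what that could look like, assuming a placeholder wrapper URL and the openai provider (the exact environment-variable name is debated further down in this thread):

# shell / .env - URL and key are placeholders; most OpenAI-compatible clients only need a non-empty key
export OPENAI_API_KEY="anything-non-empty"
export OPENAI_BASE_URL="http://your-triton-wrapper:8000/v1"

# r2r.toml
[completion]
provider = "openai"

  [completion.generation_config]
  model = "openai/your-custom-model" # must match the name your deployment serves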

@riyajatar37003
Author

Thanks. Could you share a doc link showing where in the .toml I need to set this?

@underlines

underlines commented Oct 4, 2024

myconfig.toml

[completion]
provider = "litellm"
concurrent_request_limit = 16

  [completion.generation_config]
  model = "openai/llama3.2" #add your model name here
  temperature = 0.1
  top_p = 1
  max_tokens_to_sample = 1_024
  stream = true
  add_generation_kwargs = { }

then you do
r2r serve --docker --config-path=/home/riyajatar/myconfig.toml

@ArturTanona

Let's say I have an OpenAI-like endpoint served locally at "http://localhost:8004" and the model is called "custom-model". It conforms to the OpenAI v1 API. How do I connect it to r2r?

@qdrddr

qdrddr commented Dec 27, 2024

I believe the correct environment variable is OPENAI_API_BASE, not OPENAI_BASE_URL.
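
For example, applied to the local endpoint mentioned above (a sketch only; whether you need a trailing /v1 depends on where the server mounts its OpenAI-compatible routes):

export OPENAI_API_BASE="http://localhost:8004"   # or http://localhost:8004/v1, depending on the server
export OPENAI_API_KEY="dummy-key"                # a non-empty placeholder is usually enough

# r2r.toml
[completion]
provider = "openai"

  [completion.generation_config]
  model = "openai/custom-model"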

@qdrddr

qdrddr commented Dec 27, 2024

Also, if you are using LiteLLM Proxy with R2R: since R2R internally uses the LiteLLM SDK, the model name in the r2r.toml config file should be openai/ plus the name under which the model is registered in LiteLLM Proxy.

So if, for instance, the Proxy has a model named openai/llama3.3, then in r2r.toml the model name would be openai/openai/llama3.3 @riyajatar37003

Assuming the model in your LiteLLM Proxy is named openai/llama3.3 and you want to use provider = "litellm", r2r.toml would look like this:

[completion]
provider = "litellm"
concurrent_request_limit = 64

  [completion.generation_config]
  model = "openai/openai/llama3.3"

Assuming your LiteLLM Proxy config looks like this:

proxy_config:
  litellm_settings:
    drop_params: True
  model_list:
    # At least one model must exist for the proxy to start.
    - model_name: "openai/llama3.3"
      litellm_params:
        model: "openai/llama3.3"
        api_key: fake-key
        api_base: "http://ollama.mywebsite.com:11434"

This assumes you have an Ollama instance running on port 11434, accessed via its OpenAI-compatible API, and that you have pulled a llama3.3 model into Ollama (you can check with ollama list).
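
For reference, pulling the model and checking that it is available would look like:

ollama pull llama3.3   # download the model if it is not there yet
ollama list            # llama3.3 should appear in the output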

It might be confusing, but in r2r.toml, provider = "litellm" refers to the LiteLLM SDK, not the LiteLLM Proxy.
These are two separate things. By default, the LiteLLM SDK automatically uses the native base URL of the model's provider.
The LiteLLM SDK in R2R will override that base URL when you set OPENAI_API_BASE explicitly for R2R.

The openai/ prefix tells the LiteLLM SDK which provider to use, and everything after the prefix is the actual model name that will be requested.
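
To make the distinction concrete, here is a rough side-by-side of the two setups discussed in this thread (hostnames and ports are placeholders):

# A) Direct: LiteLLM SDK -> your OpenAI-compatible backend (no proxy)
export OPENAI_API_BASE="http://your-backend:8004/v1"
# r2r.toml:  model = "openai/custom-model"        # one prefix: provider/model

# B) Via LiteLLM Proxy: LiteLLM SDK -> proxy -> backend
export OPENAI_API_BASE="http://your-litellm-proxy:4000"
# r2r.toml:  model = "openai/openai/llama3.3"     # provider prefix + model name as registered in the proxy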
