Description
Hi LlamaStack team,
I'd like to request native support in the stack configuration for using custom OpenAI-compatible HTTP endpoints for both inference and embeddings.
Use Case
I'm integrating with internal APIs that expose OpenAI-style /v1/embeddings and /v1/completions endpoints (e.g., IBM Granite, MiniLM via APIcast). These endpoints require custom model IDs, auth tokens, and base URLs.
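For reference, this is roughly how I reach these endpoints today with ad-hoc client code (a minimal sketch using the openai Python package; the environment variable names mirror the config below, and the prompt/input values are just placeholders):

```python
import os
from openai import OpenAI

# Chat/completions endpoint (e.g. Granite behind APIcast).
# The client sends the key as "Authorization: Bearer <key>" by default.
llm = OpenAI(
    base_url=os.environ["LLM_MODEL_URL"],
    api_key=os.environ["LLM_ACCESS_KEY"],
)
resp = llm.completions.create(
    model=os.environ["LLM_MODEL_ID"],
    prompt="Hello",
)

# Embeddings endpoint (e.g. MiniLM).
emb_client = OpenAI(
    base_url=os.environ["MINILLM_MODEL_URL"],
    api_key=os.environ["MINILLM_ACCESS_KEY"],
)
emb = emb_client.embeddings.create(
    model=os.environ["MINILLM_MODEL_ID"],
    input=["some text to embed"],
)
```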
I'd like to declare this setup in YAML, something like:
```yaml
runtime:
  template: remote-httpx
  params:
    model_url: ${LLM_MODEL_URL}
    model_id: ${LLM_MODEL_ID}
    api_key: ${LLM_ACCESS_KEY}
    api_key_header: Authorization
    api_key_prefix: Bearer
embedding:
  template: remote-httpx
  params:
    model_url: ${MINILLM_MODEL_URL}
    model_id: ${MINILLM_MODEL_ID}
    api_key: ${MINILLM_ACCESS_KEY}
    api_key_header: Authorization
    api_key_prefix: Bearer
```
Why it matters
This would allow declarative use of enterprise models behind auth layers, without needing to override logic in Python. It would also make it easier to use hosted or private models (Granite, DeepSeek, Ollama, etc.).
Question
Is there a recommended workaround today for configuring these remote providers, or is this currently only possible by overriding behavior in code?