Support custom OpenAI-compatible HTTP endpoints in LlamaStack config templates #2390

@jangel97

Description

Hi LlamaStack team,

I'd like to request native support in the stack configuration for using custom OpenAI-compatible HTTP endpoints for both inference and embeddings.

Use Case

I'm integrating with internal APIs that expose OpenAI-style /v1/embeddings and /v1/completions endpoints (e.g., IBM Granite, MiniLM via APIcast). These endpoints require custom model IDs, auth tokens, and base URLs.
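
For context, here is a minimal sketch of what these calls look like with the standard OpenAI Python client; the base URLs, model IDs, and access keys are placeholders for the internal services:

import os

from openai import OpenAI  # any OpenAI-compatible client works here

# Placeholder values standing in for the APIcast-fronted internal endpoints.
llm_client = OpenAI(
    base_url=os.environ["LLM_MODEL_URL"],    # e.g. https://granite.internal.example/v1
    api_key=os.environ["LLM_ACCESS_KEY"],    # sent as "Authorization: Bearer <key>"
)
embedding_client = OpenAI(
    base_url=os.environ["MINILLM_MODEL_URL"],
    api_key=os.environ["MINILLM_ACCESS_KEY"],
)

# /v1/completions on the inference endpoint
completion = llm_client.completions.create(
    model=os.environ["LLM_MODEL_ID"],
    prompt="Summarize the quarterly report in one sentence.",
)
print(completion.choices[0].text)

# /v1/embeddings on the embedding endpoint
embedding = embedding_client.embeddings.create(
    model=os.environ["MINILLM_MODEL_ID"],
    input="Summarize the quarterly report in one sentence.",
)
print(len(embedding.data[0].embedding))

The api_key_header and api_key_prefix fields in the proposal below are meant to cover gateways that expect something other than the standard Authorization: Bearer scheme shown here.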

I'd like to declare this setup in YAML, something like:

runtime:
  template: remote-httpx
  params:
    model_url: ${LLM_MODEL_URL}
    model_id: ${LLM_MODEL_ID}
    api_key: ${LLM_ACCESS_KEY}
    api_key_header: Authorization
    api_key_prefix: Bearer

embedding:
  template: remote-httpx
  params:
    model_url: ${MINILLM_MODEL_URL}
    model_id: ${MINILLM_MODEL_ID}
    api_key: ${MINILLM_ACCESS_KEY}
    api_key_header: Authorization
    api_key_prefix: Bearer

Why it matters

This would allow declarative use of enterprise models behind auth layers, without needing to override logic in Python. It would also unlock easier use of hosted or private models (Granite, DeepSeek, Ollama, etc.).

Question

Is there a recommended workaround today for supporting these remote providers via config? Or is this currently only possible by overriding behavior in code?
