diff --git a/README.md b/README.md
index 942213c7..1b0f9ecd 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,7 @@ The service includes comprehensive user data collection capabilities for various
 * [K8s based authentication](#k8s-based-authentication)
 * [JSON Web Keyset based authentication](#json-web-keyset-based-authentication)
 * [No-op authentication](#no-op-authentication)
+* [RAG Configuration](#rag-configuration)
 * [Usage](#usage)
 * [Make targets](#make-targets)
 * [Running Linux container image](#running-linux-container-image)
@@ -451,7 +452,21 @@ service:
 Credentials are not allowed with wildcard origins per CORS/Fetch spec.
 See https://fastapi.tiangolo.com/tutorial/cors/
 
+# RAG Configuration
+The [RAG setup guide](docs/rag_guide.md) explains how to set up RAG and includes tested examples for both inference and vector store integration.
+
+## Example configurations for inference
+
+The following llama-stack configurations are examples taken from production deployments:
+
+- [Granite on vLLM example](examples/vllm-granite-run.yaml)
+- [Qwen3 on vLLM example](examples/vllm-qwen3-run.yaml)
+- [Gemini example](examples/gemini-run.yaml)
+- [VertexAI example](examples/vertexai-run.yaml)
+
+> [!NOTE]
+> RAG functionality is **not tested** for these configurations.
 
 # Usage
 
diff --git a/docs/rag_guide.md b/docs/rag_guide.md
index 810cd2f3..ec62ad1e 100644
--- a/docs/rag_guide.md
+++ b/docs/rag_guide.md
@@ -61,7 +61,7 @@ Update the `run.yaml` file used by Llama Stack to point to:
 * Your downloaded **embedding model**
 * Your generated **vector database**
 
-Example:
+### FAISS example
 
 ```yaml
 models:
@@ -100,10 +100,113 @@ Where:
 - `db_path` is the path to the vector index (.db file in this case)
 - `vector_db_id` is the index ID used to generate the db
 
+See the full working [config example](examples/openai-faiss-run.yaml) for more details.
+
+### pgvector example
+
+This example shows how to configure a remote PostgreSQL database with the [pgvector](https://github.com/pgvector/pgvector) extension for storing embeddings.
+
+> You will need to install PostgreSQL and a pgvector build that matches your PostgreSQL version, then log in with `psql` and enable the extension with:
+> ```sql
+> CREATE EXTENSION IF NOT EXISTS vector;
+> ```
+
+Update the connection details (`host`, `port`, `db`, `user`, `password`) to match your PostgreSQL setup.
+
+Each pgvector-backed table follows this schema:
+
+- `id` (`text`): UUID identifier of the chunk
+- `document` (`jsonb`): JSON containing the content and metadata associated with the embedding
+- `embedding` (`vector(n)`): the embedding vector, where `n` is the embedding dimension and must match the model's output size (e.g. 768 for `all-mpnet-base-v2`)
+
+> [!NOTE]
+> The `vector_db_id` (e.g. `rhdocs`) points to the table named `vector_store_rhdocs` in the specified database, which stores the vector embeddings.
+
+```yaml
+[...]
+providers:
+  [...]
+  vector_io:
+  - provider_id: pgvector-example
+    provider_type: remote::pgvector
+    config:
+      host: localhost
+      port: 5432
+      db: pgvector_example # PostgreSQL database (psql -d pgvector_example)
+      user: lightspeed # PostgreSQL user
+      password: password123
+      kvstore:
+        type: sqlite
+        db_path: .llama/distributions/pgvector/pgvector_registry.db
+
+vector_dbs:
+- embedding_dimension: 768
+  embedding_model: sentence-transformers/all-mpnet-base-v2
+  provider_id: pgvector-example
+  # A unique ID that becomes the PostgreSQL table name, prefixed with 'vector_store_'.
+  # e.g., 'rhdocs' will create the table 'vector_store_rhdocs'.
+  # If the table was already created, this value must match the ID used at creation.
+  vector_db_id: rhdocs
+```
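+
+Once documents have been ingested, you can sanity-check the table directly from `psql`. This is only an illustrative check, assuming the `rhdocs` ID and the database/user from the example above:
+
+```bash
+# Describe the table created for vector_db_id 'rhdocs'
+psql -d pgvector_example -U lightspeed -c '\d vector_store_rhdocs'
+
+# Count the stored chunks
+psql -d pgvector_example -U lightspeed -c 'SELECT count(*) FROM vector_store_rhdocs;'
+```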
+
+See the full working [config example](examples/openai-pgvector-run.yaml) for more details.
+
 ---
 
 ## Add an Inference Model (LLM)
 
+### vLLM on RHEL AI (Llama 3.1) example
+
+> [!NOTE]
+> The following example assumes that podman's CDI has been properly configured to [enable GPU support](https://podman-desktop.io/docs/podman/gpu).
+
+The [`vllm-openai`](https://hub.docker.com/r/vllm/vllm-openai) Docker image is used to serve the Llama-3.1-8B-Instruct model.
+The following example shows how to run it on **RHEL AI** with `podman`:
+
+```bash
+podman run \
+  --device "${CONTAINER_DEVICE}" \
+  --gpus ${GPUS} \
+  -v ~/.cache/huggingface:/root/.cache/huggingface \
+  --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
+  -p ${EXPORTED_PORT}:8000 \
+  --ipc=host \
+  docker.io/vllm/vllm-openai:latest \
+  --model meta-llama/Llama-3.1-8B-Instruct \
+  --enable-auto-tool-choice \
+  --tool-call-parser llama3_json --chat-template examples/tool_chat_template_llama3.1_json.jinja
+```
+
+> The example command above enables tool calling for Llama 3.1 models.
+> For other supported models and configuration options, see the vLLM documentation:
+> [vLLM: Tool Calling](https://docs.vllm.ai/en/stable/features/tool_calling.html)
+
+After starting the container, edit your `run.yaml` file so that `model_id` matches the model passed to the `podman run` command.
+
+```yaml
+[...]
+models:
+[...]
+- model_id: meta-llama/Llama-3.1-8B-Instruct # Same as the model name in the 'podman run' command
+  provider_id: vllm
+  model_type: llm
+  provider_model_id: null
+
+providers:
+  [...]
+  inference:
+  - provider_id: vllm
+    provider_type: remote::vllm
+    config:
+      url: http://localhost:${env.EXPORTED_PORT:=8000}/v1/ # Replace localhost with the URL of the vLLM instance
+      api_token: # if any
+```
+
+See the full working [config example](examples/vllm-llama-faiss-run.yaml) for more details.
+
+### OpenAI example
+
 Add a provider for your language model (e.g., OpenAI):
 
 ```yaml
@@ -133,6 +236,24 @@ export OPENAI_API_KEY=
 
 > When experimenting with different `models`, `providers` and `vector_dbs`, you might need to manually unregister the old ones with the Llama Stack client CLI (e.g. `llama-stack-client vector_dbs list`)
 
+See the full working [config example](examples/openai-faiss-run.yaml) for more details.
+
+### Azure OpenAI
+
+Not yet supported.
+
+### Ollama
+
+The `remote::ollama` provider can be used for inference. However, it does not support tool calling, which RAG requires.
+While Ollama also exposes an OpenAI-compatible endpoint that supports tool calling, it cannot be used with `llama-stack` due to current limitations in the `remote::openai` provider.
+
+There is an [ongoing discussion](https://github.com/meta-llama/llama-stack/discussions/3034) about enabling tool calling with Ollama.
+Currently, tool calling is not supported out of the box. Some experimental patches exist (including internal workarounds), but these are not officially released.
+
+### vLLM Mistral
+
+RAG tool calls were not working properly when experimenting with `mistralai/Mistral-7B-Instruct-v0.3` on vLLM.
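+
+Whichever inference provider you use, a quick way to confirm that the LLM, the embedding model and the vector database are all registered is to query the running Llama Stack server with the client CLI. A minimal sketch, assuming the client is already configured to point at your Llama Stack instance:
+
+```bash
+# The LLM and the embedding model should both appear here
+llama-stack-client models list
+
+# The vector_db_id from run.yaml should appear here
+llama-stack-client vector_dbs list
+```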
+
 ---
 
 # Complete Configuration Reference
diff --git a/examples/gemini-run.yaml b/examples/gemini-run.yaml
new file mode 100644
index 00000000..91edfb5d
--- /dev/null
+++ b/examples/gemini-run.yaml
@@ -0,0 +1,112 @@
+# Example llama-stack configuration for Google Gemini inference
+#
+# Contributed by @eranco74 (2025-08). See https://github.com/rh-ecosystem-edge/assisted-chat/blob/main/template.yaml#L282-L386
+# This file shows how to integrate Gemini with LCS.
+#
+# Notes:
+# - You will need valid Gemini API credentials to run this.
+# - You will need a postgres instance to run this config.
+#
+version: 2
+image_name: gemini-config
+apis:
+- agents
+- datasetio
+- eval
+- files
+- inference
+- safety
+- scoring
+- telemetry
+- tool_runtime
+- vector_io
+providers:
+  inference:
+  - provider_id: ${LLAMA_STACK_INFERENCE_PROVIDER}
+    provider_type: remote::gemini
+    config:
+      api_key: ${env.GEMINI_API_KEY}
+  vector_io: []
+  files: []
+  safety: []
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: postgres
+        host: ${env.LLAMA_STACK_POSTGRES_HOST}
+        port: ${env.LLAMA_STACK_POSTGRES_PORT}
+        db: ${env.LLAMA_STACK_POSTGRES_NAME}
+        user: ${env.LLAMA_STACK_POSTGRES_USER}
+        password: ${env.LLAMA_STACK_POSTGRES_PASSWORD}
+      responses_store:
+        type: postgres
+        host: ${env.LLAMA_STACK_POSTGRES_HOST}
+        port: ${env.LLAMA_STACK_POSTGRES_PORT}
+        db: ${env.LLAMA_STACK_POSTGRES_NAME}
+        user: ${env.LLAMA_STACK_POSTGRES_USER}
+        password: ${env.LLAMA_STACK_POSTGRES_PASSWORD}
+  telemetry:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      service_name: "${LLAMA_STACK_OTEL_SERVICE_NAME}"
+      sinks: ${LLAMA_STACK_TELEMETRY_SINKS}
+      sqlite_db_path: ${STORAGE_MOUNT_PATH}/sqlite/trace_store.db
+  eval: []
+  datasetio: []
+  scoring:
+  - provider_id: basic
+    provider_type: inline::basic
+    config: {}
+  - provider_id: llm-as-judge
+    provider_type: inline::llm-as-judge
+    config: {}
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+  - provider_id: model-context-protocol
+    provider_type: remote::model-context-protocol
+    config: {}
+metadata_store:
+  type: sqlite
+  db_path: ${STORAGE_MOUNT_PATH}/sqlite/registry.db
+inference_store:
+  type: postgres
+  host: ${env.LLAMA_STACK_POSTGRES_HOST}
+  port: ${env.LLAMA_STACK_POSTGRES_PORT}
+  db: ${env.LLAMA_STACK_POSTGRES_NAME}
+  user: ${env.LLAMA_STACK_POSTGRES_USER}
+  password: ${env.LLAMA_STACK_POSTGRES_PASSWORD}
+models:
+- metadata: {}
+  model_id: ${LLAMA_STACK_2_0_FLASH_MODEL}
+  provider_id: ${LLAMA_STACK_INFERENCE_PROVIDER}
+  provider_model_id: ${LLAMA_STACK_2_0_FLASH_MODEL}
+  model_type: llm
+- metadata: {}
+  model_id: ${LLAMA_STACK_2_5_PRO_MODEL}
+  provider_id: ${LLAMA_STACK_INFERENCE_PROVIDER}
+  provider_model_id: ${LLAMA_STACK_2_5_PRO_MODEL}
+  model_type: llm
+- metadata: {}
+  model_id: ${LLAMA_STACK_2_5_FLASH_MODEL}
+  provider_id: ${LLAMA_STACK_INFERENCE_PROVIDER}
+  provider_model_id: ${LLAMA_STACK_2_5_FLASH_MODEL}
+  model_type: llm
+shields: []
+vector_dbs: []
+datasets: []
+scoring_fns: []
+benchmarks: []
+tool_groups:
+- toolgroup_id: builtin::rag
+  provider_id: rag-runtime
+- toolgroup_id: mcp::assisted
+  provider_id: model-context-protocol
+  mcp_endpoint:
+    uri: "${MCP_SERVER_URL}"
+server:
+  port: ${LLAMA_STACK_SERVER_PORT}
diff --git a/examples/openai-faiss-run.yaml b/examples/openai-faiss-run.yaml
new file mode 100644
index 00000000..4068dea8
--- /dev/null
+++ b/examples/openai-faiss-run.yaml
@@ -0,0 +1,83 @@
+# Example llama-stack configuration for OpenAI inference + FAISS (RAG)
+#
+# Notes:
+# - You will need an OpenAI API key
+# - You can generate the vector index with the rag-content tool (https://github.com/lightspeed-core/rag-content)
+#
+version: 2
+image_name: openai-faiss-config
+
+apis:
+- agents
+- inference
+- vector_io
+- tool_runtime
+- safety
+
+models:
+- model_id: gpt-test
+  provider_id: openai # This ID is a reference to 'providers.inference'
+  model_type: llm
+  provider_model_id: gpt-4o-mini
+
+- model_id: sentence-transformers/all-mpnet-base-v2
+  metadata:
+    embedding_dimension: 768
+  model_type: embedding
+  provider_id: sentence-transformers # This ID is a reference to 'providers.inference'
+  provider_model_id: /home/USER/lightspeed-stack/embedding_models/all-mpnet-base-v2
+
+providers:
+  inference:
+  - provider_id: sentence-transformers
+    provider_type: inline::sentence-transformers
+    config: {}
+
+  - provider_id: openai
+    provider_type: remote::openai
+    config:
+      api_key: ${env.OPENAI_API_KEY}
+
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/agents_store.db
+      responses_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+  - provider_id: llama-guard
+    provider_type: inline::llama-guard
+    config:
+      excluded_categories: []
+
+  vector_io:
+  - provider_id: ocp-docs
+    provider_type: inline::faiss
+    config:
+      kvstore:
+        type: sqlite
+        db_path: /home/USER/lightspeed-stack/vector_dbs/ocp_docs/faiss_store.db
+        namespace: null
+
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+
+# Enable the RAG tool
+tool_groups:
+- provider_id: rag-runtime
+  toolgroup_id: builtin::rag
+  args: null
+  mcp_endpoint: null
+
+vector_dbs:
+- embedding_dimension: 768
+  embedding_model: sentence-transformers/all-mpnet-base-v2
+  provider_id: ocp-docs # This ID is a reference to 'providers.vector_io'
+  vector_db_id: openshift-index # This ID was defined during index generation
\ No newline at end of file
diff --git a/examples/openai-pgvector-run.yaml b/examples/openai-pgvector-run.yaml
new file mode 100644
index 00000000..a8e1da34
--- /dev/null
+++ b/examples/openai-pgvector-run.yaml
@@ -0,0 +1,87 @@
+# Example llama-stack configuration for OpenAI inference + PSQL (pgvector) vector index (RAG)
+#
+# Notes:
+# - You will need an OpenAI API key
+# - You will need to set up PSQL with pgvector
+# - The table schema must follow the expected schema in llama-stack (see rag_guide.md)
+#
+version: 2
+image_name: openai-pgvector-config
+
+apis:
+- agents
+- inference
+- vector_io
+- tool_runtime
+- safety
+
+models:
+- model_id: gpt-test
+  provider_id: openai
+  model_type: llm
+  provider_model_id: gpt-4o-mini
+- model_id: sentence-transformers/all-mpnet-base-v2
+  metadata:
+    embedding_dimension: 768
+  model_type: embedding
+  provider_id: sentence-transformers
+  provider_model_id: /home/USER/lightspeed-stack/embedding_models/all-mpnet-base-v2
+
+providers:
+  inference:
+  - provider_id: sentence-transformers
+    provider_type: inline::sentence-transformers
+    config: {}
+  - provider_id: openai
+    provider_type: remote::openai
+    config:
+      api_key: ${env.OPENAI_API_KEY}
+
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/agents_store.db
+      responses_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+  - provider_id: llama-guard
+    provider_type: inline::llama-guard
+    config:
+      excluded_categories: []
+
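+  # The connection details below must match the PostgreSQL/pgvector instance
+  # described in docs/rag_guide.md (pgvector example); the values here are placeholders.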
+  vector_io:
+  - provider_id: pgvector-example
+    provider_type: remote::pgvector
+    config:
+      host: localhost
+      port: 5432
+      db: pgvector_example # PostgreSQL database (psql -d pgvector_example)
+      user: lightspeed # PostgreSQL user
+      password: empty
+      kvstore:
+        type: sqlite
+        db_path: .llama/distributions/pgvector/pgvector_registry.db
+
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+
+tool_groups:
+- provider_id: rag-runtime
+  toolgroup_id: builtin::rag
+  args: null
+  mcp_endpoint: null
+
+vector_dbs:
+- embedding_dimension: 768
+  embedding_model: sentence-transformers/all-mpnet-base-v2
+  provider_id: pgvector-example
+  # A unique ID that becomes the PostgreSQL table name, prefixed with 'vector_store_'.
+  # e.g., 'rhdocs' will create the table 'vector_store_rhdocs'.
+  vector_db_id: rhdocs
\ No newline at end of file
diff --git a/examples/vertexai-run.yaml b/examples/vertexai-run.yaml
new file mode 100644
index 00000000..41056048
--- /dev/null
+++ b/examples/vertexai-run.yaml
@@ -0,0 +1,91 @@
+# Example llama-stack configuration for VertexAI inference
+#
+# Contributed by @eloycoto (2025-08). See https://github.com/rhdhorchestrator/LS-core-test/blob/master/run-llama-stack.yaml
+# This file shows how to integrate VertexAI with LCS.
+#
+# Notes:
+# - You will need to configure Gemini inference on VertexAI.
+#
+version: '3'
+image_name: ollama-llama-stack-config
+apis:
+  - agents
+  - inference
+  - safety
+  - telemetry
+  - tool_runtime
+  - vector_io
+logging:
+  level: DEBUG # Set root logger to DEBUG
+  category_levels:
+    llama_stack: DEBUG # Enable DEBUG for all llama_stack modules
+    llama_stack.providers.remote.inference.vllm: DEBUG
+    llama_stack.providers.inline.agents.meta_reference: DEBUG
+    llama_stack.providers.inline.agents.meta_reference.agent_instance: DEBUG
+    llama_stack.providers.inline.vector_io.faiss: DEBUG
+    llama_stack.providers.inline.telemetry.meta_reference: DEBUG
+    llama_stack.core: DEBUG
+    llama_stack.apis: DEBUG
+    uvicorn: DEBUG
+    uvicorn.access: INFO # Keep HTTP requests at INFO to reduce noise
+    fastapi: DEBUG
+
+providers:
+  vector_io:
+    - config:
+        kvstore:
+          db_path: /tmp/faiss_store.db
+          type: sqlite
+      provider_id: faiss
+      provider_type: inline::faiss
+
+  agents:
+    - config:
+        persistence_store:
+          db_path: /tmp/agents_store.db
+          namespace: null
+          type: sqlite
+        responses_store:
+          db_path: /tmp/responses_store.db
+          type: sqlite
+      provider_id: meta-reference
+      provider_type: inline::meta-reference
+
+
+  inference:
+    - provider_id: vllm-inference
+      provider_type: remote::vllm
+      config:
+        url: ${env.VLLM_URL:=http://localhost:8000/v1}
+        max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
+        api_token: ${env.VLLM_API_TOKEN:=fake}
+        tls_verify: ${env.VLLM_TLS_VERIFY:=false}
+
+    - provider_id: google-vertex
+      provider_type: remote::vertexai
+      config:
+        project: ${env.VERTEXAI_PROJECT}
+        region: ${env.VERTEXAI_REGION:=us-east5}
+
+  tool_runtime:
+    - provider_id: model-context-protocol
+      provider_type: remote::model-context-protocol
+      config: {}
+      module: null
+
+  telemetry:
+    - config:
+        service_name: 'llama-stack'
+        sinks: console,sqlite
+        sqlite_db_path: /tmp/trace_store.db
+      provider_id: meta-reference
+      provider_type: inline::meta-reference
+
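+# NOTE: the registry, trace and inference stores in this example live under /tmp;
+# point these paths at persistent storage for anything longer-lived.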
+metadata_store:
+  type: sqlite
+  db_path: /tmp/registry.db
+  namespace: null
+
+inference_store:
+  type: sqlite
+  db_path: /tmp/inference_store.db
\ No newline at end of file
diff --git a/examples/vllm-granite-run.yaml b/examples/vllm-granite-run.yaml
new file mode 100644
index 00000000..198095ad
--- /dev/null
+++ b/examples/vllm-granite-run.yaml
@@ -0,0 +1,148 @@
+# Example llama-stack configuration for IBM Granite using vLLM (no RAG)
+
+#
+# Contributed by @eranco74 (2025-08).
+#
+# Notes:
+# - You will need to serve Granite on a vLLM instance
+#
+version: '2'
+image_name: vllm-granite-config
+apis:
+- agents
+- datasetio
+- eval
+- files
+- inference
+- post_training
+- safety
+- scoring
+- telemetry
+- tool_runtime
+- vector_io
+providers:
+  inference:
+  - provider_id: granite
+    provider_type: remote::vllm
+    config:
+      url: ${env.VLLM_URL}
+      api_token: ${env.VLLM_API_TOKEN:fake}
+      max_tokens: 10000
+  safety:
+  - provider_id: llama-guard
+    provider_type: inline::llama-guard
+    config:
+      excluded_categories: []
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: sqlite
+        namespace: null
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/agents_store.db
+      responses_store:
+        type: sqlite
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/responses_store.db
+  telemetry:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      service_name: "${env.OTEL_SERVICE_NAME:\u200B}"
+      sinks: ${env.TELEMETRY_SINKS:console,sqlite}
+      sqlite_db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/trace_store.db
+  eval:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      kvstore:
+        type: sqlite
+        namespace: null
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/meta_reference_eval.db
+  datasetio:
+  - provider_id: huggingface
+    provider_type: remote::huggingface
+    config:
+      kvstore:
+        type: sqlite
+        namespace: null
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/huggingface_datasetio.db
+  - provider_id: localfs
+    provider_type: inline::localfs
+    config:
+      kvstore:
+        type: sqlite
+        namespace: null
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/localfs_datasetio.db
+  scoring:
+  - provider_id: basic
+    provider_type: inline::basic
+    config: {}
+  - provider_id: llm-as-judge
+    provider_type: inline::llm-as-judge
+    config: {}
+  - provider_id: braintrust
+    provider_type: inline::braintrust
+    config:
+      openai_api_key: ${env.OPENAI_API_KEY:}
+  files:
+  - provider_id: meta-reference-files
+    provider_type: inline::localfs
+    config:
+      storage_dir: ${env.FILES_STORAGE_DIR:~/.llama/distributions/ollama/files}
+      metadata_store:
+        type: sqlite
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/files_metadata.db
+  post_training:
+  - provider_id: huggingface
+    provider_type: inline::huggingface
+    config:
+      checkpoint_format: huggingface
+      distributed_backend: null
+      device: cpu
+  tool_runtime:
+  - provider_id: brave-search
+    provider_type: remote::brave-search
+    config:
+      api_key: ${env.BRAVE_SEARCH_API_KEY:}
+      max_results: 3
+  - provider_id: tavily-search
+    provider_type: remote::tavily-search
+    config:
+      api_key: ${env.TAVILY_SEARCH_API_KEY:}
+      max_results: 3
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+  - provider_id: model-context-protocol
+    provider_type: remote::model-context-protocol
+    config: {}
+  - provider_id: wolfram-alpha
+    provider_type: remote::wolfram-alpha
+    config:
+      api_key: ${env.WOLFRAM_ALPHA_API_KEY:}
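+# Registry and inference history are persisted as SQLite files under
+# SQLITE_STORE_DIR (which defaults to ~/.llama/distributions/ollama).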
+metadata_store:
+  type: sqlite
+  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/registry.db
+inference_store:
+  type: sqlite
+  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/inference_store.db
+models:
+- metadata: {}
+  model_id: ${env.INFERENCE_MODEL}
+  provider_id: granite
+  provider_model_id: null
+shields: []
+vector_dbs: []
+datasets: []
+scoring_fns: []
+benchmarks: []
+tool_groups:
+- toolgroup_id: builtin::websearch
+  provider_id: tavily-search
+- toolgroup_id: builtin::rag
+  provider_id: rag-runtime
+- toolgroup_id: builtin::wolfram_alpha
+  provider_id: wolfram-alpha
+server:
+  port: 8321
\ No newline at end of file
diff --git a/examples/vllm-llama-faiss-run.yaml b/examples/vllm-llama-faiss-run.yaml
new file mode 100644
index 00000000..92457747
--- /dev/null
+++ b/examples/vllm-llama-faiss-run.yaml
@@ -0,0 +1,80 @@
+# Example llama-stack configuration for vLLM on RHEL, Meta Llama 3.1 Instruct + FAISS (RAG)
+#
+# Notes:
+# - You will need to serve Llama 3.1 Instruct on a vLLM instance
+#
+version: 2
+image_name: vllm-llama-faiss-config
+
+apis:
+- agents
+- inference
+- vector_io
+- tool_runtime
+- safety
+
+models:
+- model_id: meta-llama/Llama-3.1-8B-Instruct
+  provider_id: vllm
+  model_type: llm
+  provider_model_id: null
+- model_id: sentence-transformers/all-mpnet-base-v2
+  metadata:
+    embedding_dimension: 768
+  model_type: embedding
+  provider_id: sentence-transformers
+  provider_model_id: /home/USER/embedding_models/all-mpnet-base-v2
+
+providers:
+  inference:
+  - provider_id: sentence-transformers
+    provider_type: inline::sentence-transformers
+    config: {}
+  - provider_id: vllm
+    provider_type: remote::vllm
+    config:
+      url: http://localhost:8000/v1/
+      api_token: key
+
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/agents_store.db
+      responses_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+  - provider_id: llama-guard
+    provider_type: inline::llama-guard
+    config:
+      excluded_categories: []
+
+  vector_io:
+  - provider_id: rhel-db
+    provider_type: inline::faiss
+    config:
+      kvstore:
+        type: sqlite
+        db_path: /home/USER/vector_dbs/rhel_index/faiss_store.db
+        namespace: null
+
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+
+tool_groups:
+- provider_id: rag-runtime
+  toolgroup_id: builtin::rag
+  args: null
+  mcp_endpoint: null
+
+vector_dbs:
+- embedding_dimension: 768
+  embedding_model: sentence-transformers/all-mpnet-base-v2
+  provider_id: rhel-db
+  vector_db_id: rhel-docs
\ No newline at end of file
diff --git a/examples/vllm-qwen3-run.yaml b/examples/vllm-qwen3-run.yaml
new file mode 100644
index 00000000..9de77f2e
--- /dev/null
+++ b/examples/vllm-qwen3-run.yaml
@@ -0,0 +1,108 @@
+# Example llama-stack configuration for Alibaba Qwen3 using vLLM (no RAG)
+
+#
+# Contributed by @eranco74 (2025-08).
+#
+# Notes:
+# - You will need to serve Qwen3 on a vLLM instance
+#
+version: 2
+image_name: vllm-qwen3-config
+apis:
+- agents
+- datasetio
+- eval
+- files
+- inference
+- safety
+- scoring
+- telemetry
+- tool_runtime
+- vector_io
+providers:
+  inference:
+  - provider_id: qwen
+    provider_type: remote::vllm
+    config:
+      url: https://qwen3.rosa.openshiftapps.com/v1
+      max_tokens: 32768
+      api_token:
+      tls_verify: true
+  vector_io: []
+  files: []
+  safety: []
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: postgres
+        host: ${env.POSTGRES_HOST:=localhost}
+        port: ${env.POSTGRES_PORT:=5432}
+        db: ${env.POSTGRES_DB:=llamastack}
+        user: ${env.POSTGRES_USER:=user}
+        password: ${env.POSTGRES_PASSWORD:=password}
+      responses_store:
+        type: postgres
+        host: ${env.POSTGRES_HOST:=localhost}
+        port: ${env.POSTGRES_PORT:=5432}
+        db: ${env.POSTGRES_DB:=llamastack}
+        user: ${env.POSTGRES_USER:=user}
+        password: ${env.POSTGRES_PASSWORD:=password}
+  telemetry:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
+      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
+      sqlite_db_path: ${env.SQLITE_STORE_DIR:=/tmp/.llama/distributions/starter}/trace_store.db
+  eval: []
+  datasetio: []
+  scoring:
+  - provider_id: basic
+    provider_type: inline::basic
+    config: {}
+  - provider_id: llm-as-judge
+    provider_type: inline::llm-as-judge
+    config: {}
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+  - provider_id: model-context-protocol
+    provider_type: remote::model-context-protocol
+    config: {}
+metadata_store:
+  type: postgres
+  host: ${env.POSTGRES_HOST:=localhost}
+  port: ${env.POSTGRES_PORT:=5432}
+  db: ${env.POSTGRES_DB:=llamastack}
+  user: ${env.POSTGRES_USER:=user}
+  password: ${env.POSTGRES_PASSWORD:=password}
+  table_name: llamastack_kvstore
+inference_store:
+  type: postgres
+  host: ${env.POSTGRES_HOST:=localhost}
+  port: ${env.POSTGRES_PORT:=5432}
+  db: ${env.POSTGRES_DB:=llamastack}
+  user: ${env.POSTGRES_USER:=user}
+  password: ${env.POSTGRES_PASSWORD:=password}
+models:
+- metadata: {}
+  model_id: qwen3-32b-maas
+  provider_id: qwen
+  provider_model_id: null
+shields: []
+vector_dbs: []
+datasets: []
+scoring_fns: []
+benchmarks: []
+tool_groups:
+- toolgroup_id: builtin::rag
+  provider_id: rag-runtime
+- toolgroup_id: mcp::assisted
+  provider_id: model-context-protocol
+  mcp_endpoint:
+    uri: "http://assisted-service-mcp:8000/sse"
+server:
+  port: 8321
\ No newline at end of file