diff --git a/README.md b/README.md
index 942213c7..1b0f9ecd 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,7 @@ The service includes comprehensive user data collection capabilities for various
 * [K8s based authentication](#k8s-based-authentication)
 * [JSON Web Keyset based authentication](#json-web-keyset-based-authentication)
 * [No-op authentication](#no-op-authentication)
+* [RAG Configuration](#rag-configuration)
 * [Usage](#usage)
 * [Make targets](#make-targets)
 * [Running Linux container image](#running-linux-container-image)
@@ -451,7 +452,21 @@ service:
 Credentials are not allowed with wildcard origins per CORS/Fetch spec.
 See https://fastapi.tiangolo.com/tutorial/cors/
 
+# RAG Configuration
+The [RAG setup guide](docs/rag_guide.md) explains how to set up RAG and includes tested examples for both inference and vector store integration.
+
+## Example configurations for inference
+
+The following llama-stack configurations are examples taken from production deployments:
+
+- [Granite on vLLM example](examples/vllm-granite-run.yaml)
+- [Qwen3 on vLLM example](examples/vllm-qwen3-run.yaml)
+- [Gemini example](examples/gemini-run.yaml)
+- [VertexAI example](examples/vertexai-run.yaml)
+
+> [!NOTE]
+> RAG functionality is **not tested** for these configurations.
 
 # Usage
 
diff --git a/docs/rag_guide.md b/docs/rag_guide.md
index 810cd2f3..ec62ad1e 100644
--- a/docs/rag_guide.md
+++ b/docs/rag_guide.md
@@ -61,7 +61,7 @@ Update the `run.yaml` file used by Llama Stack to point to:
 * Your downloaded **embedding model**
 * Your generated **vector database**
 
-Example:
+### FAISS example
 
 ```yaml
 models:
@@ -100,10 +100,113 @@ Where:
 - `db_path` is the path to the vector index (.db file in this case)
 - `vector_db_id` is the index ID used to generate the db
 
+See the full working [config example](examples/openai-faiss-run.yaml) for more details.
+
+### pgvector example
+
+This example shows how to configure a remote PostgreSQL database with the [pgvector](https://github.com/pgvector/pgvector) extension for storing embeddings.
+
+> You will need to install PostgreSQL and a pgvector build that matches your PostgreSQL version, then log in with `psql` and enable the extension with:
+> ```sql
+> CREATE EXTENSION IF NOT EXISTS vector;
+> ```
+
+Update the connection details (`host`, `port`, `db`, `user`, `password`) to match your PostgreSQL setup.
+
+Each pgvector-backed table follows this schema:
+
+- `id` (`text`): UUID identifier of the chunk
+- `document` (`jsonb`): JSON containing the content and metadata associated with the embedding
+- `embedding` (`vector(n)`): the embedding vector, where `n` is the embedding dimension and must match the model's output size (e.g. 768 for `all-mpnet-base-v2`)
+
+> [!NOTE]
+> The `vector_db_id` (e.g. `rhdocs`) points to the table named `vector_store_rhdocs` in the specified database, which stores the vector embeddings.
+
+```yaml
+[...]
+providers:
+  [...]
+  vector_io:
+  - provider_id: pgvector-example
+    provider_type: remote::pgvector
+    config:
+      host: localhost
+      port: 5432
+      db: pgvector_example # PostgreSQL database (psql -d pgvector_example)
+      user: lightspeed # PostgreSQL user
+      password: password123
+      kvstore:
+        type: sqlite
+        db_path: .llama/distributions/pgvector/pgvector_registry.db
+
+vector_dbs:
+- embedding_dimension: 768
+  embedding_model: sentence-transformers/all-mpnet-base-v2
+  provider_id: pgvector-example
+  # A unique ID that becomes the PostgreSQL table name, prefixed with 'vector_store_'.
+  # e.g., 'rhdocs' will create the table 'vector_store_rhdocs'.
+  # If the table was already created, this value must match the ID used at creation.
+  vector_db_id: rhdocs
+```
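+
+Once documents have been ingested, you can sanity-check the table directly from `psql`. This is only an illustrative check, assuming the `rhdocs` ID and the database/user from the example above:
+
+```bash
+# Describe the table created for vector_db_id 'rhdocs'
+psql -d pgvector_example -U lightspeed -c '\d vector_store_rhdocs'
+
+# Count the stored chunks
+psql -d pgvector_example -U lightspeed -c 'SELECT count(*) FROM vector_store_rhdocs;'
+```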
+
+See the full working [config example](examples/openai-pgvector-run.yaml) for more details.
+
 ---
 
 ## Add an Inference Model (LLM)
 
+### vLLM on RHEL AI (Llama 3.1) example
+
+> [!NOTE]
+> The following example assumes that podman's CDI has been properly configured to [enable GPU support](https://podman-desktop.io/docs/podman/gpu).
+
+The [`vllm-openai`](https://hub.docker.com/r/vllm/vllm-openai) Docker image is used to serve the Llama-3.1-8B-Instruct model.
+The following example shows how to run it on **RHEL AI** with `podman`:
+
+```bash
+podman run \
+  --device "${CONTAINER_DEVICE}" \
+  --gpus ${GPUS} \
+  -v ~/.cache/huggingface:/root/.cache/huggingface \
+  --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
+  -p ${EXPORTED_PORT}:8000 \
+  --ipc=host \
+  docker.io/vllm/vllm-openai:latest \
+  --model meta-llama/Llama-3.1-8B-Instruct \
+  --enable-auto-tool-choice \
+  --tool-call-parser llama3_json --chat-template examples/tool_chat_template_llama3.1_json.jinja
+```
+
+> The example command above enables tool calling for Llama 3.1 models.
+> For other supported models and configuration options, see the vLLM documentation:
+> [vLLM: Tool Calling](https://docs.vllm.ai/en/stable/features/tool_calling.html)
+
+After starting the container, edit your `run.yaml` file so that `model_id` matches the model passed to the `podman run` command.
+
+```yaml
+[...]
+models:
+[...]
+- model_id: meta-llama/Llama-3.1-8B-Instruct # Same as the model name in the 'podman run' command
+  provider_id: vllm
+  model_type: llm
+  provider_model_id: null
+
+providers:
+  [...]
+  inference:
+  - provider_id: vllm
+    provider_type: remote::vllm
+    config:
+      url: http://localhost:${env.EXPORTED_PORT:=8000}/v1/ # Replace localhost with the URL of the vLLM instance
+      api_token: # if any
+```
+
+See the full working [config example](examples/vllm-llama-faiss-run.yaml) for more details.
+
+### OpenAI example
+
 Add a provider for your language model (e.g., OpenAI):
 
 ```yaml
@@ -133,6 +236,24 @@ export OPENAI_API_KEY=
 
 > When experimenting with different `models`, `providers` and `vector_dbs`, you might need to manually unregister the old ones with the Llama Stack client CLI (e.g. `llama-stack-client vector_dbs list`)
 
+See the full working [config example](examples/openai-faiss-run.yaml) for more details.
+
+### Azure OpenAI
+
+Not yet supported.
+
+### Ollama
+
+The `remote::ollama` provider can be used for inference. However, it does not support tool calling, which RAG requires.
+While Ollama also exposes an OpenAI-compatible endpoint that supports tool calling, it cannot be used with `llama-stack` due to current limitations in the `remote::openai` provider.
+
+There is an [ongoing discussion](https://github.com/meta-llama/llama-stack/discussions/3034) about enabling tool calling with Ollama.
+Currently, tool calling is not supported out of the box. Some experimental patches exist (including internal workarounds), but these are not officially released.
+
+### vLLM Mistral
+
+RAG tool calls were not working properly when experimenting with `mistralai/Mistral-7B-Instruct-v0.3` on vLLM.
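+
+Whichever inference provider you use, a quick way to confirm that the LLM, the embedding model and the vector database are all registered is to query the running Llama Stack server with the client CLI. A minimal sketch, assuming the client is already configured to point at your Llama Stack instance:
+
+```bash
+# The LLM and the embedding model should both appear here
+llama-stack-client models list
+
+# The vector_db_id from run.yaml should appear here
+llama-stack-client vector_dbs list
+```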
+
 ---
 
 # Complete Configuration Reference
diff --git a/examples/gemini-run.yaml b/examples/gemini-run.yaml
new file mode 100644
index 00000000..91edfb5d
--- /dev/null
+++ b/examples/gemini-run.yaml
@@ -0,0 +1,112 @@
+# Example llama-stack configuration for Google Gemini inference
+#
+# Contributed by @eranco74 (2025-08). See https://github.com/rh-ecosystem-edge/assisted-chat/blob/main/template.yaml#L282-L386
+# This file shows how to integrate Gemini with LCS.
+#
+# Notes:
+# - You will need valid Gemini API credentials to run this.
+# - You will need a postgres instance to run this config.
+#
+version: 2
+image_name: gemini-config
+apis:
+- agents
+- datasetio
+- eval
+- files
+- inference
+- safety
+- scoring
+- telemetry
+- tool_runtime
+- vector_io
+providers:
+  inference:
+  - provider_id: ${LLAMA_STACK_INFERENCE_PROVIDER}
+    provider_type: remote::gemini
+    config:
+      api_key: ${env.GEMINI_API_KEY}
+  vector_io: []
+  files: []
+  safety: []
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: postgres
+        host: ${env.LLAMA_STACK_POSTGRES_HOST}
+        port: ${env.LLAMA_STACK_POSTGRES_PORT}
+        db: ${env.LLAMA_STACK_POSTGRES_NAME}
+        user: ${env.LLAMA_STACK_POSTGRES_USER}
+        password: ${env.LLAMA_STACK_POSTGRES_PASSWORD}
+      responses_store:
+        type: postgres
+        host: ${env.LLAMA_STACK_POSTGRES_HOST}
+        port: ${env.LLAMA_STACK_POSTGRES_PORT}
+        db: ${env.LLAMA_STACK_POSTGRES_NAME}
+        user: ${env.LLAMA_STACK_POSTGRES_USER}
+        password: ${env.LLAMA_STACK_POSTGRES_PASSWORD}
+  telemetry:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      service_name: "${LLAMA_STACK_OTEL_SERVICE_NAME}"
+      sinks: ${LLAMA_STACK_TELEMETRY_SINKS}
+      sqlite_db_path: ${STORAGE_MOUNT_PATH}/sqlite/trace_store.db
+  eval: []
+  datasetio: []
+  scoring:
+  - provider_id: basic
+    provider_type: inline::basic
+    config: {}
+  - provider_id: llm-as-judge
+    provider_type: inline::llm-as-judge
+    config: {}
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+  - provider_id: model-context-protocol
+    provider_type: remote::model-context-protocol
+    config: {}
+metadata_store:
+  type: sqlite
+  db_path: ${STORAGE_MOUNT_PATH}/sqlite/registry.db
+inference_store:
+  type: postgres
+  host: ${env.LLAMA_STACK_POSTGRES_HOST}
+  port: ${env.LLAMA_STACK_POSTGRES_PORT}
+  db: ${env.LLAMA_STACK_POSTGRES_NAME}
+  user: ${env.LLAMA_STACK_POSTGRES_USER}
+  password: ${env.LLAMA_STACK_POSTGRES_PASSWORD}
+models:
+- metadata: {}
+  model_id: ${LLAMA_STACK_2_0_FLASH_MODEL}
+  provider_id: ${LLAMA_STACK_INFERENCE_PROVIDER}
+  provider_model_id: ${LLAMA_STACK_2_0_FLASH_MODEL}
+  model_type: llm
+- metadata: {}
+  model_id: ${LLAMA_STACK_2_5_PRO_MODEL}
+  provider_id: ${LLAMA_STACK_INFERENCE_PROVIDER}
+  provider_model_id: ${LLAMA_STACK_2_5_PRO_MODEL}
+  model_type: llm
+- metadata: {}
+  model_id: ${LLAMA_STACK_2_5_FLASH_MODEL}
+  provider_id: ${LLAMA_STACK_INFERENCE_PROVIDER}
+  provider_model_id: ${LLAMA_STACK_2_5_FLASH_MODEL}
+  model_type: llm
+shields: []
+vector_dbs: []
+datasets: []
+scoring_fns: []
+benchmarks: []
+tool_groups:
+- toolgroup_id: builtin::rag
+  provider_id: rag-runtime
+- toolgroup_id: mcp::assisted
+  provider_id: model-context-protocol
+  mcp_endpoint:
+    uri: "${MCP_SERVER_URL}"
+server:
+  port: ${LLAMA_STACK_SERVER_PORT}
diff --git a/examples/openai-faiss-run.yaml b/examples/openai-faiss-run.yaml
new file mode 100644
index 00000000..4068dea8
--- /dev/null
+++ b/examples/openai-faiss-run.yaml
@@ -0,0 +1,83 @@
+# Example llama-stack configuration for OpenAI inference + FAISS (RAG)
+#
+# Notes:
+# - You will need an OpenAI API key
+# - You can generate the vector index with the rag-content tool (https://github.com/lightspeed-core/rag-content)
+#
+version: 2
+image_name: openai-faiss-config
+
+apis:
+- agents
+- inference
+- vector_io
+- tool_runtime
+- safety
+
+models:
+- model_id: gpt-test
+  provider_id: openai # This ID is a reference to 'providers.inference'
+  model_type: llm
+  provider_model_id: gpt-4o-mini
+
+- model_id: sentence-transformers/all-mpnet-base-v2
+  metadata:
+    embedding_dimension: 768
+  model_type: embedding
+  provider_id: sentence-transformers # This ID is a reference to 'providers.inference'
+  provider_model_id: /home/USER/lightspeed-stack/embedding_models/all-mpnet-base-v2
+
+providers:
+  inference:
+  - provider_id: sentence-transformers
+    provider_type: inline::sentence-transformers
+    config: {}
+
+  - provider_id: openai
+    provider_type: remote::openai
+    config:
+      api_key: ${env.OPENAI_API_KEY}
+
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/agents_store.db
+      responses_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+  - provider_id: llama-guard
+    provider_type: inline::llama-guard
+    config:
+      excluded_categories: []
+
+  vector_io:
+  - provider_id: ocp-docs
+    provider_type: inline::faiss
+    config:
+      kvstore:
+        type: sqlite
+        db_path: /home/USER/lightspeed-stack/vector_dbs/ocp_docs/faiss_store.db
+        namespace: null
+
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+
+# Enable the RAG tool
+tool_groups:
+- provider_id: rag-runtime
+  toolgroup_id: builtin::rag
+  args: null
+  mcp_endpoint: null
+
+vector_dbs:
+- embedding_dimension: 768
+  embedding_model: sentence-transformers/all-mpnet-base-v2
+  provider_id: ocp-docs # This ID is a reference to 'providers.vector_io'
+  vector_db_id: openshift-index # This ID was defined during index generation
\ No newline at end of file
diff --git a/examples/openai-pgvector-run.yaml b/examples/openai-pgvector-run.yaml
new file mode 100644
index 00000000..a8e1da34
--- /dev/null
+++ b/examples/openai-pgvector-run.yaml
@@ -0,0 +1,87 @@
+# Example llama-stack configuration for OpenAI inference + PSQL (pgvector) vector index (RAG)
+#
+# Notes:
+# - You will need an OpenAI API key
+# - You will need to set up PSQL with pgvector
+# - The table schema must follow the expected schema in llama-stack (see rag_guide.md)
+#
+version: 2
+image_name: openai-pgvector-config
+
+apis:
+- agents
+- inference
+- vector_io
+- tool_runtime
+- safety
+
+models:
+- model_id: gpt-test
+  provider_id: openai
+  model_type: llm
+  provider_model_id: gpt-4o-mini
+- model_id: sentence-transformers/all-mpnet-base-v2
+  metadata:
+    embedding_dimension: 768
+  model_type: embedding
+  provider_id: sentence-transformers
+  provider_model_id: /home/USER/lightspeed-stack/embedding_models/all-mpnet-base-v2
+
+providers:
+  inference:
+  - provider_id: sentence-transformers
+    provider_type: inline::sentence-transformers
+    config: {}
+  - provider_id: openai
+    provider_type: remote::openai
+    config:
+      api_key: ${env.OPENAI_API_KEY}
+
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/agents_store.db
+      responses_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+  - provider_id: llama-guard
+    provider_type: inline::llama-guard
+    config:
+      excluded_categories: []
+
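+  # The connection details below must match the PostgreSQL/pgvector instance
+  # described in docs/rag_guide.md (pgvector example); the values here are placeholders.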
+  vector_io:
+  - provider_id: pgvector-example
+    provider_type: remote::pgvector
+    config:
+      host: localhost
+      port: 5432
+      db: pgvector_example # PostgreSQL database (psql -d pgvector_example)
+      user: lightspeed # PostgreSQL user
+      password: empty
+      kvstore:
+        type: sqlite
+        db_path: .llama/distributions/pgvector/pgvector_registry.db
+
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+
+tool_groups:
+- provider_id: rag-runtime
+  toolgroup_id: builtin::rag
+  args: null
+  mcp_endpoint: null
+
+vector_dbs:
+- embedding_dimension: 768
+  embedding_model: sentence-transformers/all-mpnet-base-v2
+  provider_id: pgvector-example
+  # A unique ID that becomes the PostgreSQL table name, prefixed with 'vector_store_'.
+  # e.g., 'rhdocs' will create the table 'vector_store_rhdocs'.
+  vector_db_id: rhdocs
\ No newline at end of file
diff --git a/examples/vertexai-run.yaml b/examples/vertexai-run.yaml
new file mode 100644
index 00000000..41056048
--- /dev/null
+++ b/examples/vertexai-run.yaml
@@ -0,0 +1,91 @@
+# Example llama-stack configuration for VertexAI inference
+#
+# Contributed by @eloycoto (2025-08). See https://github.com/rhdhorchestrator/LS-core-test/blob/master/run-llama-stack.yaml
+# This file shows how to integrate VertexAI with LCS.
+#
+# Notes:
+# - You will need to configure Gemini inference on VertexAI.
+#
+version: '3'
+image_name: ollama-llama-stack-config
+apis:
+  - agents
+  - inference
+  - safety
+  - telemetry
+  - tool_runtime
+  - vector_io
+logging:
+  level: DEBUG # Set root logger to DEBUG
+  category_levels:
+    llama_stack: DEBUG # Enable DEBUG for all llama_stack modules
+    llama_stack.providers.remote.inference.vllm: DEBUG
+    llama_stack.providers.inline.agents.meta_reference: DEBUG
+    llama_stack.providers.inline.agents.meta_reference.agent_instance: DEBUG
+    llama_stack.providers.inline.vector_io.faiss: DEBUG
+    llama_stack.providers.inline.telemetry.meta_reference: DEBUG
+    llama_stack.core: DEBUG
+    llama_stack.apis: DEBUG
+    uvicorn: DEBUG
+    uvicorn.access: INFO # Keep HTTP requests at INFO to reduce noise
+    fastapi: DEBUG
+
+providers:
+  vector_io:
+    - config:
+        kvstore:
+          db_path: /tmp/faiss_store.db
+          type: sqlite
+      provider_id: faiss
+      provider_type: inline::faiss
+
+  agents:
+    - config:
+        persistence_store:
+          db_path: /tmp/agents_store.db
+          namespace: null
+          type: sqlite
+        responses_store:
+          db_path: /tmp/responses_store.db
+          type: sqlite
+      provider_id: meta-reference
+      provider_type: inline::meta-reference
+
+
+  inference:
+    - provider_id: vllm-inference
+      provider_type: remote::vllm
+      config:
+        url: ${env.VLLM_URL:=http://localhost:8000/v1}
+        max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
+        api_token: ${env.VLLM_API_TOKEN:=fake}
+        tls_verify: ${env.VLLM_TLS_VERIFY:=false}
+
+    - provider_id: google-vertex
+      provider_type: remote::vertexai
+      config:
+        project: ${env.VERTEXAI_PROJECT}
+        region: ${env.VERTEXAI_REGION:=us-east5}
+
+  tool_runtime:
+    - provider_id: model-context-protocol
+      provider_type: remote::model-context-protocol
+      config: {}
+      module: null
+
+  telemetry:
+    - config:
+        service_name: 'llama-stack'
+        sinks: console,sqlite
+        sqlite_db_path: /tmp/trace_store.db
+      provider_id: meta-reference
+      provider_type: inline::meta-reference
+
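+# NOTE: the registry, trace and inference stores in this example live under /tmp;
+# point these paths at persistent storage for anything longer-lived.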
+metadata_store:
+  type: sqlite
+  db_path: /tmp/registry.db
+  namespace: null
+
+inference_store:
+  type: sqlite
+  db_path: /tmp/inference_store.db
\ No newline at end of file
diff --git a/examples/vllm-granite-run.yaml b/examples/vllm-granite-run.yaml
new file mode 100644
index 00000000..198095ad
--- /dev/null
+++ b/examples/vllm-granite-run.yaml
@@ -0,0 +1,148 @@
+# Example llama-stack configuration for IBM Granite using vLLM (no RAG)
+
+#
+# Contributed by @eranco74 (2025-08).
+#
+# Notes:
+# - You will need to serve Granite on a vLLM instance
+#
+version: '2'
+image_name: vllm-granite-config
+apis:
+- agents
+- datasetio
+- eval
+- files
+- inference
+- post_training
+- safety
+- scoring
+- telemetry
+- tool_runtime
+- vector_io
+providers:
+  inference:
+  - provider_id: granite
+    provider_type: remote::vllm
+    config:
+      url: ${env.VLLM_URL}
+      api_token: ${env.VLLM_API_TOKEN:fake}
+      max_tokens: 10000
+  safety:
+  - provider_id: llama-guard
+    provider_type: inline::llama-guard
+    config:
+      excluded_categories: []
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: sqlite
+        namespace: null
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/agents_store.db
+      responses_store:
+        type: sqlite
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/responses_store.db
+  telemetry:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      service_name: "${env.OTEL_SERVICE_NAME:\u200B}"
+      sinks: ${env.TELEMETRY_SINKS:console,sqlite}
+      sqlite_db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/trace_store.db
+  eval:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      kvstore:
+        type: sqlite
+        namespace: null
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/meta_reference_eval.db
+  datasetio:
+  - provider_id: huggingface
+    provider_type: remote::huggingface
+    config:
+      kvstore:
+        type: sqlite
+        namespace: null
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/huggingface_datasetio.db
+  - provider_id: localfs
+    provider_type: inline::localfs
+    config:
+      kvstore:
+        type: sqlite
+        namespace: null
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/localfs_datasetio.db
+  scoring:
+  - provider_id: basic
+    provider_type: inline::basic
+    config: {}
+  - provider_id: llm-as-judge
+    provider_type: inline::llm-as-judge
+    config: {}
+  - provider_id: braintrust
+    provider_type: inline::braintrust
+    config:
+      openai_api_key: ${env.OPENAI_API_KEY:}
+  files:
+  - provider_id: meta-reference-files
+    provider_type: inline::localfs
+    config:
+      storage_dir: ${env.FILES_STORAGE_DIR:~/.llama/distributions/ollama/files}
+      metadata_store:
+        type: sqlite
+        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/files_metadata.db
+  post_training:
+  - provider_id: huggingface
+    provider_type: inline::huggingface
+    config:
+      checkpoint_format: huggingface
+      distributed_backend: null
+      device: cpu
+  tool_runtime:
+  - provider_id: brave-search
+    provider_type: remote::brave-search
+    config:
+      api_key: ${env.BRAVE_SEARCH_API_KEY:}
+      max_results: 3
+  - provider_id: tavily-search
+    provider_type: remote::tavily-search
+    config:
+      api_key: ${env.TAVILY_SEARCH_API_KEY:}
+      max_results: 3
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+  - provider_id: model-context-protocol
+    provider_type: remote::model-context-protocol
+    config: {}
+  - provider_id: wolfram-alpha
+    provider_type: remote::wolfram-alpha
+    config:
+      api_key: ${env.WOLFRAM_ALPHA_API_KEY:}
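+# Registry and inference history are persisted as SQLite files under
+# SQLITE_STORE_DIR (which defaults to ~/.llama/distributions/ollama).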
+metadata_store:
+  type: sqlite
+  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/registry.db
+inference_store:
+  type: sqlite
+  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/inference_store.db
+models:
+- metadata: {}
+  model_id: ${env.INFERENCE_MODEL}
+  provider_id: granite
+  provider_model_id: null
+shields: []
+vector_dbs: []
+datasets: []
+scoring_fns: []
+benchmarks: []
+tool_groups:
+- toolgroup_id: builtin::websearch
+  provider_id: tavily-search
+- toolgroup_id: builtin::rag
+  provider_id: rag-runtime
+- toolgroup_id: builtin::wolfram_alpha
+  provider_id: wolfram-alpha
+server:
+  port: 8321
\ No newline at end of file
diff --git a/examples/vllm-llama-faiss-run.yaml b/examples/vllm-llama-faiss-run.yaml
new file mode 100644
index 00000000..92457747
--- /dev/null
+++ b/examples/vllm-llama-faiss-run.yaml
@@ -0,0 +1,80 @@
+# Example llama-stack configuration for vLLM on RHEL, Meta Llama 3.1 Instruct + FAISS (RAG)
+#
+# Notes:
+# - You will need to serve Llama 3.1 Instruct on a vLLM instance
+#
+version: 2
+image_name: vllm-llama-faiss-config
+
+apis:
+- agents
+- inference
+- vector_io
+- tool_runtime
+- safety
+
+models:
+- model_id: meta-llama/Llama-3.1-8B-Instruct
+  provider_id: vllm
+  model_type: llm
+  provider_model_id: null
+- model_id: sentence-transformers/all-mpnet-base-v2
+  metadata:
+    embedding_dimension: 768
+  model_type: embedding
+  provider_id: sentence-transformers
+  provider_model_id: /home/USER/embedding_models/all-mpnet-base-v2
+
+providers:
+  inference:
+  - provider_id: sentence-transformers
+    provider_type: inline::sentence-transformers
+    config: {}
+  - provider_id: vllm
+    provider_type: remote::vllm
+    config:
+      url: http://localhost:8000/v1/
+      api_token: key
+
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/agents_store.db
+      responses_store:
+        type: sqlite
+        db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+  - provider_id: llama-guard
+    provider_type: inline::llama-guard
+    config:
+      excluded_categories: []
+
+  vector_io:
+  - provider_id: rhel-db
+    provider_type: inline::faiss
+    config:
+      kvstore:
+        type: sqlite
+        db_path: /home/USER/vector_dbs/rhel_index/faiss_store.db
+        namespace: null
+
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+
+tool_groups:
+- provider_id: rag-runtime
+  toolgroup_id: builtin::rag
+  args: null
+  mcp_endpoint: null
+
+vector_dbs:
+- embedding_dimension: 768
+  embedding_model: sentence-transformers/all-mpnet-base-v2
+  provider_id: rhel-db
+  vector_db_id: rhel-docs
\ No newline at end of file
diff --git a/examples/vllm-qwen3-run.yaml b/examples/vllm-qwen3-run.yaml
new file mode 100644
index 00000000..9de77f2e
--- /dev/null
+++ b/examples/vllm-qwen3-run.yaml
@@ -0,0 +1,108 @@
+# Example llama-stack configuration for Alibaba Qwen3 using vLLM (no RAG)
+
+#
+# Contributed by @eranco74 (2025-08).
+#
+# Notes:
+# - You will need to serve Qwen3 on a vLLM instance
+#
+version: 2
+image_name: vllm-qwen3-config
+apis:
+- agents
+- datasetio
+- eval
+- files
+- inference
+- safety
+- scoring
+- telemetry
+- tool_runtime
+- vector_io
+providers:
+  inference:
+  - provider_id: qwen
+    provider_type: remote::vllm
+    config:
+      url: https://qwen3.rosa.openshiftapps.com/v1
+      max_tokens: 32768
+      api_token:
+      tls_verify: true
+  vector_io: []
+  files: []
+  safety: []
+  agents:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      persistence_store:
+        type: postgres
+        host: ${env.POSTGRES_HOST:=localhost}
+        port: ${env.POSTGRES_PORT:=5432}
+        db: ${env.POSTGRES_DB:=llamastack}
+        user: ${env.POSTGRES_USER:=user}
+        password: ${env.POSTGRES_PASSWORD:=password}
+      responses_store:
+        type: postgres
+        host: ${env.POSTGRES_HOST:=localhost}
+        port: ${env.POSTGRES_PORT:=5432}
+        db: ${env.POSTGRES_DB:=llamastack}
+        user: ${env.POSTGRES_USER:=user}
+        password: ${env.POSTGRES_PASSWORD:=password}
+  telemetry:
+  - provider_id: meta-reference
+    provider_type: inline::meta-reference
+    config:
+      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
+      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
+      sqlite_db_path: ${env.SQLITE_STORE_DIR:=/tmp/.llama/distributions/starter}/trace_store.db
+  eval: []
+  datasetio: []
+  scoring:
+  - provider_id: basic
+    provider_type: inline::basic
+    config: {}
+  - provider_id: llm-as-judge
+    provider_type: inline::llm-as-judge
+    config: {}
+  tool_runtime:
+  - provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+    config: {}
+  - provider_id: model-context-protocol
+    provider_type: remote::model-context-protocol
+    config: {}
+metadata_store:
+  type: postgres
+  host: ${env.POSTGRES_HOST:=localhost}
+  port: ${env.POSTGRES_PORT:=5432}
+  db: ${env.POSTGRES_DB:=llamastack}
+  user: ${env.POSTGRES_USER:=user}
+  password: ${env.POSTGRES_PASSWORD:=password}
+  table_name: llamastack_kvstore
+inference_store:
+  type: postgres
+  host: ${env.POSTGRES_HOST:=localhost}
+  port: ${env.POSTGRES_PORT:=5432}
+  db: ${env.POSTGRES_DB:=llamastack}
+  user: ${env.POSTGRES_USER:=user}
+  password: ${env.POSTGRES_PASSWORD:=password}
+models:
+- metadata: {}
+  model_id: qwen3-32b-maas
+  provider_id: qwen
+  provider_model_id: null
+shields: []
+vector_dbs: []
+datasets: []
+scoring_fns: []
+benchmarks: []
+tool_groups:
+- toolgroup_id: builtin::rag
+  provider_id: rag-runtime
+- toolgroup_id: mcp::assisted
+  provider_id: model-context-protocol
+  mcp_endpoint:
+    uri: "http://assisted-service-mcp:8000/sse"
+server:
+  port: 8321
\ No newline at end of file