
Commit 6771d6e

Adding files
1 parent 1d84fcf commit 6771d6e


3 files changed: +107, -17 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -463,6 +463,7 @@ The following configurations are llama-stack config examples from production dep
 - [Granite on vLLM example](examples/vllm-granite-run.yaml)
 - [Qwen3 on vLLM example](examples/vllm-qwen3-run.yaml)
 - [Gemini example](examples/gemini-run.yaml)
+- [VertexAI example](examples/vertexai-run.yaml)
 
 > [!NOTE]
 > RAG functionality is **not tested** for these configurations.

docs/rag_guide.md

Lines changed: 15 additions & 17 deletions
@@ -61,7 +61,7 @@ Update the `run.yaml` file used by Llama Stack to point to:
 * Your downloaded **embedding model**
 * Your generated **vector database**
 
-### FAISS Example
+### FAISS example
 
 ```yaml
 models:
@@ -102,11 +102,11 @@ Where:
 
 See the full working [config example](examples/openai-faiss-run.yaml) for more details.
 
-### pgvector Example
+### pgvector example
 
 This example shows how to configure a remote PostgreSQL database with the [pgvector](https://github.com/pgvector/pgvector) extension for storing embeddings.
 
-> You will need to install PostgreSQL, the matching version of pgvector, then log in with `psql` and enable the extension with:
+> You will need to install PostgreSQL with a matching version to pgvector, then log in with `psql` and enable the extension with:
 > ```sql
 > CREATE EXTENSION IF NOT EXISTS vector;
 > ```
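
As an aside to the hunk above: a minimal sketch of preparing such a database from the shell, assuming PostgreSQL and a matching pgvector build are already installed and that the database is named `rag` (the database name and local authentication are illustrative assumptions, not part of the guide):

```bash
# Sketch only: database name and local authentication are assumptions.
createdb rag                                                # database that will hold the embeddings
psql -d rag -c "CREATE EXTENSION IF NOT EXISTS vector;"     # enable pgvector in that database
psql -d rag -c "SELECT extversion FROM pg_extension WHERE extname = 'vector';"  # confirm it is active
```
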
@@ -117,10 +117,10 @@ Each pgvector-backed table follows this schema:
 
 - `id` (`text`): UUID identifier of the chunk
 - `document` (`jsonb`): json containing content and metadata associated with the embedding
-- `embedding` (`vector(n)`): the embedding vector, where `n` is the embedding dimension and must match the model's output size (e.g. 768 for `all-mpnet-base-v2`)
+- `embedding` (`vector(n)`): the embedding vector, where `n` is the embedding dimension and will match the model's output size (e.g. 768 for `all-mpnet-base-v2`)
 
 > [!NOTE]
-> The vector_db_id (e.g. rhdocs) is used to point to the table named vector_store_rhdocs in the specified database, which stores the vector embeddings.
+> The `vector_db_id` (e.g. `rhdocs`) is used to point to the table named `vector_store_rhdocs` in the specified database, which stores the vector embeddings.
 
 
 ```yaml
@@ -146,6 +146,7 @@ vector_dbs:
   provider_id: pgvector-example
   # A unique ID that becomes the PostgreSQL table name, prefixed with 'vector_store_'.
   # e.g., 'rhdocs' will create the table 'vector_store_rhdocs'.
+  # If the table was already created, this value must match the ID used at creation.
   vector_db_id: rhdocs
 ```
 
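Once documents have been ingested, the table naming described in the comments above can be checked directly from `psql`; a small sketch, with `rag` as an assumed database name:

```bash
# Sketch: confirm that the pgvector-backed table exists under the expected name.
# 'rag' is an assumed database name; 'rhdocs' matches the vector_db_id above.
psql -d rag -c "\d vector_store_rhdocs"
# Expected columns, per the schema described earlier in the guide:
#   id         text
#   document   jsonb
#   embedding  vector(768)    # 768 = output dimension of all-mpnet-base-v2
```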

@@ -155,7 +156,10 @@ See the full working [config example](examples/openai-pgvector-run.yaml) for mor
 
 ## Add an Inference Model (LLM)
 
-### vLLM on RHEL AI (Llama 3.1) Example
+### vLLM on RHEL AI (Llama 3.1) example
+
+> [!NOTE]
+> The following example assumes that podman's CDI has been properly configured to [enable GPU support](https://podman-desktop.io/docs/podman/gpu).
 
 The [`vllm-openai`](https://hub.docker.com/r/vllm/vllm-openai) Docker image is used to serve the Llama-3.1-8B-Instruct model.
 The following example shows how to run it on **RHEL AI** with `podman`:
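
The `podman run` command itself is unchanged by this commit and appears only as context in the next hunk's header. Purely as an illustration, a sketch of such an invocation is shown below; the image tag, cache mount, port, and vLLM flags are assumptions, not the guide's exact command:

```bash
# Illustrative sketch only: image tag, volume mount, and port are assumptions.
# Assumes podman's CDI exposes the NVIDIA GPU (see the note above) and that
# HF_TOKEN holds a Hugging Face token with access to the gated Llama model.
podman run \
  --device nvidia.com/gpu=all \
  -e HUGGING_FACE_HUB_TOKEN="$HF_TOKEN" \
  -v ~/.cache/huggingface:/root/.cache/huggingface:Z \
  -p 8000:8000 \
  docker.io/vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct
```
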
@@ -178,19 +182,13 @@ podman run \
 > For other supported models and configuration options, see the vLLM documentation:
 > [vLLM: Tool Calling](https://docs.vllm.ai/en/stable/features/tool_calling.html)
 
-After starting the container, you can check which model is being served by running:
-
-```bash
-curl http://localhost:8000/v1/models # Replace localhost with the url of the vLLM instance
-```
-
-The response will include the `model_id`, which you can then use in your `run.yaml` configuration.
+After starting the container edit your `run.yaml` file, matching `model_id` with the model provided in the `podman run` command.
 
 ```yaml
 [...]
 models:
 [...]
-- model_id: meta-llama/Llama-3.1-8B-Instruct
+- model_id: meta-llama/Llama-3.1-8B-Instruct # Same as the model name in the 'podman run' command
   provider_id: vllm
   model_type: llm
   provider_model_id: null
@@ -201,13 +199,13 @@ providers:
   - provider_id: vllm
     provider_type: remote::vllm
     config:
-      url: http://localhost:8000/v1/ # Replace localhost with the url of the vLLM instance
-      api_token: <your-key-here>
+      url: http://localhost:${env.EXPORTED_PORT:=8000}/v1/ # Replace localhost with the url of the vLLM instance
+      api_token: <your-key-here> # if any
 ```
 
 See the full working [config example](examples/vllm-llama-faiss-run.yaml) for more details.
 
-### OpenAI Example
+### OpenAI example
 
 Add a provider for your language model (e.g., OpenAI):
 
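The `${env.EXPORTED_PORT:=8000}` substitution added above falls back to port 8000 when the variable is unset. A small sketch of overriding it at launch time; the `llama stack run` invocation is an assumption about how the stack is started in a given deployment:

```bash
# Sketch: override the vLLM port consumed by the env substitution in run.yaml.
# Adjust the launch command to however the stack is actually run
# (llama stack CLI, container entrypoint, systemd unit, ...).
export EXPORTED_PORT=8080      # omit to fall back to the default of 8000
llama stack run ./run.yaml
```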

examples/vertexai-run.yaml

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# Example llama-stack configuration for VertexAI inference
+#
+# Contributed by @eloycoto (2025-08). See https://github.com/rhdhorchestrator/LS-core-test/blob/master/run-llama-stack.yaml
+# This file shows how to integrate VertexAI with LCS.
+#
+# Notes:
+# - You will need to configure Gemini inference on VertexAI.
+#
+version: '3'
+image_name: ollama-llama-stack-config
+apis:
+- agents
+- inference
+- safety
+- telemetry
+- tool_runtime
+- vector_io
+logging:
+  level: DEBUG # Set root logger to DEBUG
+  category_levels:
+    llama_stack: DEBUG # Enable DEBUG for all llama_stack modules
+    llama_stack.providers.remote.inference.vllm: DEBUG
+    llama_stack.providers.inline.agents.meta_reference: DEBUG
+    llama_stack.providers.inline.agents.meta_reference.agent_instance: DEBUG
+    llama_stack.providers.inline.vector_io.faiss: DEBUG
+    llama_stack.providers.inline.telemetry.meta_reference: DEBUG
+    llama_stack.core: DEBUG
+    llama_stack.apis: DEBUG
+    uvicorn: DEBUG
+    uvicorn.access: INFO # Keep HTTP requests at INFO to reduce noise
+    fastapi: DEBUG
+
+providers:
+  vector_io:
+  - config:
+      kvstore:
+        db_path: /tmp/faiss_store.db
+        type: sqlite
+    provider_id: faiss
+    provider_type: inline::faiss
+
+  agents:
+  - config:
+      persistence_store:
+        db_path: /tmp/agents_store.db
+        namespace: null
+        type: sqlite
+      responses_store:
+        db_path: /tmp/responses_store.db
+        type: sqlite
+    provider_id: meta-reference
+    provider_type: inline::meta-reference
+
+
+  inference:
+  - provider_id: vllm-inference
+    provider_type: remote::vllm
+    config:
+      url: ${env.VLLM_URL:=http://localhost:8000/v1}
+      max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
+      api_token: ${env.VLLM_API_TOKEN:=fake}
+      tls_verify: ${env.VLLM_TLS_VERIFY:=false}
+
+  - provider_id: google-vertex
+    provider_type: remote::vertexai
+    config:
+      project: ${env.VERTEXAI_PROJECT}
+      region: ${env.VERTEXAI_REGION:=us-east5}
+
+  tool_runtime:
+  - provider_id: model-context-protocol
+    provider_type: remote::model-context-protocol
+    config: {}
+    module: null
+
+  telemetry:
+  - config:
+      service_name: 'llama-stack'
+      sinks: console,sqlite
+      sqlite_db_path: /tmp/trace_store.db
+    provider_id: meta-reference
+    provider_type: inline::meta-reference
+
+metadata_store:
+  type: sqlite
+  db_path: /tmp/registry.db
+  namespace: null
+
+inference_store:
+  type: sqlite
+  db_path: /tmp/inference_store.db
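
The configuration above reads `VERTEXAI_PROJECT` and `VERTEXAI_REGION` (plus the vLLM variables) from the environment. A minimal launch sketch follows; the use of Application Default Credentials for Google authentication and the `llama stack run` command are assumptions about the deployment, not something this commit specifies:

```bash
# Sketch: environment expected by examples/vertexai-run.yaml.
# Google credentials are assumed to come from Application Default Credentials,
# e.g. 'gcloud auth application-default login' or a service-account key file.
export VERTEXAI_PROJECT=my-gcp-project      # required; no default in the config
export VERTEXAI_REGION=us-east5             # optional; defaults to us-east5
export VLLM_URL=http://localhost:8000/v1    # optional; default shown in the config
llama stack run examples/vertexai-run.yaml
```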
