README.md (15 additions, 0 deletions)
@@ -35,6 +35,7 @@ The service includes comprehensive user data collection capabilities for various
 * [K8s based authentication](#k8s-based-authentication)
 * [JSON Web Keyset based authentication](#json-web-keyset-based-authentication)
 * [No-op authentication](#no-op-authentication)
+* [RAG Configuration](#rag-configuration)
 * [Usage](#usage)
 * [Make targets](#make-targets)
 * [Running Linux container image](#running-linux-container-image)
@@ -451,7 +452,21 @@ service:
 Credentials are not allowed with wildcard origins per CORS/Fetch spec.
 See https://fastapi.tiangolo.com/tutorial/cors/
 
+# RAG Configuration
+
+The [guide to RAG setup](docs/rag_guide.md) provides guidance on setting up RAG and includes tested examples for both inference and vector store integration.
+
+## Example configurations for inference
+
+The following configurations are llama-stack config examples from production deployments:
+
+- [Granite on vLLM example](examples/vllm-granite-run.yaml)
+- [Qwen3 on vLLM example](examples/vllm-qwen3-run.yaml)
+- [Gemini example](examples/gemini-run.yaml)
+
+> [!NOTE]
+> RAG functionality is **not tested** for these configurations.
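For orientation, here is a minimal sketch of the inference wiring such `run.yaml` files typically contain, assuming llama-stack's `remote::vllm` provider. The model alias, URL, and environment variable names are illustrative assumptions; the linked example files remain the source of truth.

```yaml
# Hypothetical run.yaml excerpt; see examples/vllm-granite-run.yaml for a tested config.
providers:
  inference:
    - provider_id: vllm
      provider_type: remote::vllm
      config:
        url: ${env.VLLM_URL}              # e.g. http://localhost:8000/v1 (assumed)
        api_token: ${env.VLLM_API_TOKEN}  # omit if the server is unauthenticated

models:
  - model_id: granite                     # illustrative alias used by clients
    provider_id: vllm
    provider_model_id: ibm-granite/granite-3.1-8b-instruct  # assumed served model name
    model_type: llm
```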
docs/rag_guide.md (124 additions, 1 deletion)
@@ -61,7 +61,7 @@ Update the `run.yaml` file used by Llama Stack to point to:
 * Your downloaded **embedding model**
 * Your generated **vector database**
 
-Example:
+### FAISS Example
 
 ```yaml
 models:
@@ -100,10 +100,115 @@ Where:
 - `db_path` is the path to the vector index (the `.db` file in this case)
 - `vector_db_id` is the index ID used to generate the db
 
+See the full working [config example](examples/openai-faiss-run.yaml) for more details.
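As an illustration of where these two keys live, here is a minimal sketch of the FAISS `vector_io` wiring. The path, index ID, and embedding settings are placeholder assumptions; the linked config example is authoritative.

```yaml
# Hypothetical excerpt; see examples/openai-faiss-run.yaml for the tested config.
providers:
  vector_io:
    - provider_id: faiss
      provider_type: inline::faiss
      config:
        kvstore:
          type: sqlite
          db_path: /path/to/faiss_store.db  # the generated vector index (placeholder path)

vector_dbs:
  - vector_db_id: my-docs-index             # must match the ID used when generating the db (placeholder)
    provider_id: faiss
    embedding_model: all-mpnet-base-v2      # assumed; must match your downloaded embedding model
    embedding_dimension: 768
```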
+
+### pgvector Example
+
+This example shows how to configure a remote PostgreSQL database with the [pgvector](https://github.com/pgvector/pgvector) extension for storing embeddings.
+
+> You will need to install PostgreSQL and the matching version of pgvector, then log in with `psql` and enable the extension with:
+> ```sql
+> CREATE EXTENSION IF NOT EXISTS vector;
+> ```
+
+Update the connection details (`host`, `port`, `db`, `user`, `password`) to match your PostgreSQL setup, as in the sketch below.
+
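A minimal sketch of what that provider block can look like, assuming llama-stack's `remote::pgvector` provider; all connection values are placeholders.

```yaml
# Hypothetical excerpt; replace the connection details with your own.
providers:
  vector_io:
    - provider_id: pgvector
      provider_type: remote::pgvector
      config:
        host: localhost     # placeholder host
        port: 5432
        db: rag             # placeholder database name
        user: postgres      # placeholder credentials
        password: postgres
```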
+Each pgvector-backed table follows this schema:
+
+- `id` (`text`): UUID identifier of the chunk
+- `document` (`jsonb`): JSON containing the content and metadata associated with the embedding
+- `embedding` (`vector(n)`): the embedding vector, where `n` is the embedding dimension and must match the model's output size (e.g. 768 for `all-mpnet-base-v2`)
+
+> [!NOTE]
+> The `vector_db_id` (e.g. `rhdocs`) points to the table named `vector_store_rhdocs` in the specified database, which stores the vector embeddings.
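For illustration, here is the registration that would produce that table name, reusing the `rhdocs` ID from the note above; the embedding settings are assumptions and must match your embedding model.

```yaml
# Hypothetical excerpt: registering `rhdocs` makes llama-stack read and write
# the `vector_store_rhdocs` table via the pgvector provider sketched earlier.
vector_dbs:
  - vector_db_id: rhdocs
    provider_id: pgvector
    embedding_model: all-mpnet-base-v2  # assumed embedding model
    embedding_dimension: 768            # must match the model's output size
```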
@@ … @@
 > When experimenting with different `models`, `providers` and `vector_dbs`, you might need to manually unregister the old ones with the Llama Stack client CLI (e.g. `llama-stack-client vector_dbs list`)
 
+See the full working [config example](examples/openai-faiss-run.yaml) for more details.
+
+### Azure OpenAI
+
+Not yet supported.
+
+### Ollama
+
+The `remote::ollama` provider can be used for inference. However, it does not support tool calling, including RAG.
+While Ollama also exposes an OpenAI-compatible endpoint that supports tool calling, it cannot be used with `llama-stack` due to current limitations in the `remote::openai` provider.
+
+There is an [ongoing discussion](https://github.com/meta-llama/llama-stack/discussions/3034) about enabling tool calling with Ollama.
+Currently, tool calling is not supported out of the box. Some experimental patches exist (including internal workarounds), but these are not officially released.
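For completeness, a minimal sketch of an inference-only Ollama setup, assuming the `remote::ollama` provider; the URL is a placeholder, and RAG tool calls will not work with this configuration.

```yaml
# Hypothetical inference-only excerpt; tool calling (and therefore RAG) is unsupported here.
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434  # default local Ollama endpoint (assumed)
```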
+
+### vLLM Mistral
+
+RAG tool calls were not working properly when experimenting with `mistralai/Mistral-7B-Instruct-v0.3` on vLLM.