docs/rag_guide.md (15 additions, 17 deletions)
````diff
@@ -61,7 +61,7 @@ Update the `run.yaml` file used by Llama Stack to point to:
 * Your downloaded **embedding model**
 * Your generated **vector database**
 
-### FAISS Example
+### FAISS example
 
 ```yaml
 models:
````
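The hunk above cuts off at the start of the `models:` block. For orientation, a minimal FAISS-backed `run.yaml` fragment might look like the sketch below; this is an illustration only, assuming the inline FAISS provider type, with placeholder IDs and paths. The linked `examples/openai-faiss-run.yaml` remains the authoritative reference.

```yaml
# Illustrative sketch only; provider IDs, paths, and kvstore settings are placeholders.
models:
  - model_id: all-mpnet-base-v2          # the downloaded embedding model
    provider_id: sentence-transformers
    model_type: embedding
    metadata:
      embedding_dimension: 768
providers:
  vector_io:
    - provider_id: faiss-example
      provider_type: inline::faiss
      config:
        kvstore:
          type: sqlite
          db_path: /path/to/faiss_store.db   # the generated vector database
vector_dbs:
  - vector_db_id: rhdocs
    provider_id: faiss-example
    embedding_model: all-mpnet-base-v2
    embedding_dimension: 768
```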
````diff
@@ -102,11 +102,11 @@ Where:
 
 See the full working [config example](examples/openai-faiss-run.yaml) for more details.
 
-### pgvector Example
+### pgvector example
 
 This example shows how to configure a remote PostgreSQL database with the [pgvector](https://github.com/pgvector/pgvector) extension for storing embeddings.
 
-> You will need to install PostgreSQL, the matching version of pgvector, then log in with `psql` and enable the extension with:
+> You will need to install PostgreSQL with a matching version to pgvector, then log in with `psql` and enable the extension with:
 > ```sql
 > CREATE EXTENSION IF NOT EXISTS vector;
 > ```
````
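Once the extension is enabled, the database connection is described under the `vector_io` providers in `run.yaml`. A minimal sketch follows, assuming the `remote::vllm`-style remote provider naming (`remote::pgvector`) and placeholder host, database, and credentials; the linked pgvector config example is authoritative.

```yaml
# Sketch only; host, database name, and credentials are placeholders.
providers:
  vector_io:
    - provider_id: pgvector-example
      provider_type: remote::pgvector
      config:
        host: localhost
        port: 5432
        db: ragdb
        user: postgres
        password: example-password
```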
````diff
@@ -117,10 +117,10 @@ Each pgvector-backed table follows this schema:
 
 - `id` (`text`): UUID identifier of the chunk
 - `document` (`jsonb`): json containing content and metadata associated with the embedding
-- `embedding` (`vector(n)`): the embedding vector, where `n` is the embedding dimension and must match the model's output size (e.g. 768 for `all-mpnet-base-v2`)
+- `embedding` (`vector(n)`): the embedding vector, where `n` is the embedding dimension and will match the model's output size (e.g. 768 for `all-mpnet-base-v2`)
 
 > [!NOTE]
-> The vector_db_id (e.g. rhdocs) is used to point to the table named vector_store_rhdocs in the specified database, which stores the vector embeddings.
+> The `vector_db_id` (e.g. `rhdocs`) is used to point to the table named `vector_store_rhdocs` in the specified database, which stores the vector embeddings.
 
 
 ```yaml
````
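The `n` in `vector(n)` is fixed by the embedding model, and the same pairing is declared on the `vector_dbs` entry in `run.yaml`. A sketch, assuming the `all-mpnet-base-v2` model from the schema note above (IDs are placeholders):

```yaml
vector_dbs:
  - vector_db_id: rhdocs                 # table becomes vector_store_rhdocs
    provider_id: pgvector-example
    embedding_model: all-mpnet-base-v2
    embedding_dimension: 768             # must equal n in vector(n)
```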
````diff
@@ -146,6 +146,7 @@ vector_dbs:
     provider_id: pgvector-example
     # A unique ID that becomes the PostgreSQL table name, prefixed with 'vector_store_'.
     # e.g., 'rhdocs' will create the table 'vector_store_rhdocs'.
+    # If the table was already created, this value must match the ID used at creation.
     vector_db_id: rhdocs
 ```
 
````
````diff
@@ -155,7 +156,10 @@ See the full working [config example](examples/openai-pgvector-run.yaml) for more details.
 
 ## Add an Inference Model (LLM)
 
-### vLLM on RHEL AI (Llama 3.1) Example
+### vLLM on RHEL AI (Llama 3.1) example
+
+> [!NOTE]
+> The following example assumes that podman's CDI has been properly configured to [enable GPU support](https://podman-desktop.io/docs/podman/gpu).
 
 The [`vllm-openai`](https://hub.docker.com/r/vllm/vllm-openai) Docker image is used to serve the Llama-3.1-8B-Instruct model.
 The following example shows how to run it on **RHEL AI** with `podman`:
````
````diff
@@ -178,19 +182,13 @@ podman run \
 > For other supported models and configuration options, see the vLLM documentation:
````
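After the container is serving the model, the endpoint still has to be registered as an inference provider in `run.yaml`. A minimal sketch, assuming the `remote::vllm` provider type and a local endpoint URL (both placeholders to adapt):

```yaml
# Sketch only; the URL and provider ID are placeholders.
providers:
  inference:
    - provider_id: vllm-example
      provider_type: remote::vllm
      config:
        url: http://localhost:8000/v1
models:
  - model_id: meta-llama/Llama-3.1-8B-Instruct
    provider_id: vllm-example
    model_type: llm
```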