Commit b71068e

juanpardo authored and amotl committed
Data modelling: Fix page about "vector data"
1 parent 536eb70 commit b71068e

1 file changed: +10 -35 lines changed

docs/start/modelling/vector.md

Lines changed: 10 additions & 35 deletions
@@ -5,7 +5,7 @@ CrateDB natively supports **vector embeddings** for efficient **similarity searc
 Whether you’re working with text, images, sensor data, or any domain represented as high-dimensional embeddings, CrateDB enables **real-time vector search at scale**, in combination with other data types like full-text, geospatial, and time-series.\


-## 1. Data Type: VECTOR
+## Data Type: VECTOR

 CrateDB introduces a native `VECTOR` type with the following key characteristics:

@@ -27,32 +27,7 @@ CREATE TABLE documents (
 * `VECTOR(FLOAT[768])` declares a fixed-size vector column.
 * You can ingest vectors directly or compute them externally and store them via SQL

-## 2. Indexing: Enabling Vector Search
-
-To use fast similarity search, define an **HNSW index** on the vector column:
-
-```sql
-CREATE INDEX embedding_hnsw
-ON documents (embedding)
-USING HNSW
-WITH (
-m = 16,
-ef_construction = 128,
-ef_search = 64,
-similarity = 'cosine'
-);
-```
-
-**Parameters:**
-
-* `m`: controls the number of bi-directional links per node (default: 16)
-* `ef_construction`: affects index build accuracy/speed (default: 128)
-* `ef_search`: controls recall/latency trade-off at query time
-* `similarity`: choose from `'cosine'`, `'l2'` (Euclidean), `'dot_product'`
-
-> CrateDB automatically builds the ANN index in the background, allowing for real-time updates.
-
-## 3. Querying Vectors with SQL
+## Querying Vectors with SQL

 Use the `nearest_neighbors` predicate to perform similarity search:

@@ -79,7 +54,7 @@ LIMIT 10;
 Combine vector similarity with full-text, metadata, or geospatial filters!
 :::

-## 4. Ingestion: Working with Embeddings
+## Ingestion: Working with Embeddings

 You can ingest vectors in several ways:

@@ -92,7 +67,7 @@ You can ingest vectors in several ways:
 * **Batched imports** via `COPY FROM` using JSON or CSV
 * CrateDB doesn't currently compute embeddings internally—you bring your own model or use pipelines that call CrateDB.

-## 5. Use Cases
+## Use Cases

 | Use Case | Description |
 | ----------------------- | ------------------------------------------------------------------ |
@@ -112,17 +87,17 @@ ORDER BY features <-> [vector] ASC
 LIMIT 10;
 ```

-## 6. Performance & Scaling
+## Performance & Scaling

 * Vector search uses **HNSW**: state-of-the-art ANN algorithm with logarithmic search complexity.
 * CrateDB parallelizes ANN search across shards/nodes.
 * Ideal for 100K to tens of millions of vectors; supports real-time ingestion and queries.

 :::{note}
-Note: vector dimensionality must be consistent for each column.
+vector dimensionality must be consistent for each column.
 :::

-## 7. Best Practices
+## Best Practices

 | Area | Recommendation |
 | -------------- | ----------------------------------------------------------------------- |
@@ -133,19 +108,19 @@ Note: vector dimensionality must be consistent for each column.
 | Updates | Re-inserting or updating vectors is fully supported |
 | Data pipelines | Use external tools for vector generation; push to CrateDB via REST/SQL |

-## 8. Integrations
+## Integrations

 * **Python / pandas / LangChain**: CrateDB has native drivers and REST interface
 * **Embedding models**: Use OpenAI, HuggingFace, Cohere, or in-house models
 * **RAG architecture**: CrateDB stores vector + metadata + raw text in a unified store

-## 9. Further Learning & Resources
+## Further Learning & Resources

 * CrateDB Docs – Vector Search
 * Blog: Using CrateDB for Hybrid Search (Vector + Full-Text)
 * CrateDB Academy – Vector Data
 * [Sample notebooks on GitHub](https://github.com/crate/cratedb-examples)

-## 10. Summary
+## Summary

 CrateDB gives you the power of **vector similarity search** with the **flexibility of SQL** and the **scalability of a distributed database**. It lets you unify structured, unstructured, and semantic data—enabling modern applications in AI, search, and recommendation without additional vector databases or pipelines.
