Data modelling: Fix page about "vector data"

juanpardo · amotl · commit b71068e1a5dc · 2025-08-23T13:17:02.000+02:00
diff --git a/docs/start/modelling/vector.md b/docs/start/modelling/vector.md
@@ -5,7 +5,7 @@ CrateDB natively supports **vector embeddings** for efficient **similarity searc
 Whether you’re working with text, images, sensor data, or any domain represented as high-dimensional embeddings, CrateDB enables **real-time vector search at scale**, in combination with other data types like full-text, geospatial, and time-series.\
 
 
-## 1. Data Type: VECTOR
+## Data Type: VECTOR
 
 CrateDB introduces a native `VECTOR` type with the following key characteristics:
 
@@ -27,32 +27,7 @@ CREATE TABLE documents (
 * `VECTOR(FLOAT[768])` declares a fixed-size vector column.
 * You can ingest vectors directly or compute them externally and store them via SQL
 
-## 2. Indexing: Enabling Vector Search
-
-To use fast similarity search, define an **HNSW index** on the vector column:
-
-```sql
-CREATE INDEX embedding_hnsw
-ON documents (embedding)
-USING HNSW
-WITH (
-  m = 16,
-  ef_construction = 128,
-  ef_search = 64,
-  similarity = 'cosine'
-);
-```
-
-**Parameters:**
-
-* `m`: controls the number of bi-directional links per node (default: 16)
-* `ef_construction`: affects index build accuracy/speed (default: 128)
-* `ef_search`: controls recall/latency trade-off at query time
-* `similarity`: choose from `'cosine'`, `'l2'` (Euclidean), `'dot_product'`
-
-> CrateDB automatically builds the ANN index in the background, allowing for real-time updates.
-
-## 3. Querying Vectors with SQL
+## Querying Vectors with SQL
 
 Use the `nearest_neighbors` predicate to perform similarity search:
 
@@ -79,7 +54,7 @@ LIMIT 10;
 Combine vector similarity with full-text, metadata, or geospatial filters!
 :::
 
-## 4. Ingestion: Working with Embeddings
+## Ingestion: Working with Embeddings
 
 You can ingest vectors in several ways:
 
@@ -92,7 +67,7 @@ You can ingest vectors in several ways:
 * **Batched imports** via `COPY FROM` using JSON or CSV
 * CrateDB doesn't currently compute embeddings internally—you bring your own model or use pipelines that call CrateDB.
 
-## 5. Use Cases
+## Use Cases
 
 | Use Case                | Description                                                        |
 | ----------------------- | ------------------------------------------------------------------ |
@@ -112,17 +87,17 @@ ORDER BY features <-> [vector] ASC
 LIMIT 10;
 ```
 
-## 6. Performance & Scaling
+## Performance & Scaling
 
 * Vector search uses **HNSW**: state-of-the-art ANN algorithm with logarithmic search complexity.
 * CrateDB parallelizes ANN search across shards/nodes.
 * Ideal for 100K to tens of millions of vectors; supports real-time ingestion and queries.
 
 :::{note}
-Note: vector dimensionality must be consistent for each column.
+Vector dimensionality must be consistent for each column.
 :::
 
-## 7. Best Practices
+## Best Practices
 
 | Area           | Recommendation                                                          |
 | -------------- | ----------------------------------------------------------------------- |
@@ -133,19 +108,19 @@ Note: vector dimensionality must be consistent for each column.
 | Updates        | Re-inserting or updating vectors is fully supported                     |
 | Data pipelines | Use external tools for vector generation; push to CrateDB via REST/SQL  |
 
-## 8. Integrations
+## Integrations
 
 * **Python / pandas / LangChain**: CrateDB has native drivers and REST interface
 * **Embedding models**: Use OpenAI, HuggingFace, Cohere, or in-house models
 * **RAG architecture**: CrateDB stores vector + metadata + raw text in a unified store
 
-## 9. Further Learning & Resources
+## Further Learning & Resources
 
 * CrateDB Docs – Vector Search
 * Blog: Using CrateDB for Hybrid Search (Vector + Full-Text)
 * CrateDB Academy – Vector Data
 * [Sample notebooks on GitHub](https://github.com/crate/cratedb-examples)
 
-## 10. Summary
+## Summary
 
 CrateDB gives you the power of **vector similarity search** with the **flexibility of SQL** and the **scalability of a distributed database**. It lets you unify structured, unstructured, and semantic data—enabling modern applications in AI, search, and recommendation without additional vector databases or pipelines.
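
For readers reviewing this change, a minimal Python sketch of what exact (brute-force) similarity search computes may help put the page's terms in context. It is not CrateDB code and calls no CrateDB API. The names `cosine_similarity`, `top_k`, and the sample `documents` are hypothetical, and cosine similarity is assumed as the metric. An ANN algorithm such as the HNSW mentioned on the page approximates this ranking without scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        # Mirrors the page's note: dimensionality must be consistent.
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query, rows, k):
    """Score every (id, vector) row against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in rows]
    # Highest similarity first; this full scan is what an ANN index avoids.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical three-dimensional embeddings, for illustration only.
documents = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]

result = top_k([1.0, 0.0, 0.0], documents, k=2)
for doc_id, score in result:
    print(doc_id, round(score, 4))
```

The exact scan costs one similarity computation per stored vector, which is why it only serves as a reference point for small collections or for checking the recall of an approximate index.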