Whether you’re working with text, images, sensor data, or any domain represented as high-dimensional embeddings, CrateDB enables **real-time vector search at scale**, in combination with other data types like full-text, geospatial, and time-series.\

-## 1. Data Type: VECTOR
+## Data Type: VECTOR

CrateDB introduces a native `VECTOR` type with the following key characteristics:
@@ -27,32 +27,7 @@ CREATE TABLE documents (
* `VECTOR(FLOAT[768])` declares a fixed-size vector column.
* You can ingest vectors directly or compute them externally and store them via SQL

-## 2. Indexing: Enabling Vector Search
-
-To use fast similarity search, define an **HNSW index** on the vector column:
-
-```sql
-CREATE INDEX embedding_hnsw
-ON documents (embedding)
-USING HNSW
-WITH (
-  m = 16,
-  ef_construction = 128,
-  ef_search = 64,
-  similarity = 'cosine'
-);
-```
-
-**Parameters:**
-
-* `m`: controls the number of bi-directional links per node (default: 16)
-* `ef_construction`: affects index build accuracy/speed (default: 128)
-* `ef_search`: controls recall/latency trade-off at query time
-* `similarity`: choose from `'cosine'`, `'l2'` (Euclidean), `'dot_product'`
-
-> CrateDB automatically builds the ANN index in the background, allowing for real-time updates.
-
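The `similarity` values named in the removed section (`'cosine'`, `'l2'`, `'dot_product'`) correspond to standard vector measures. A minimal Python sketch of the textbook definitions follows; the function names are illustrative and not part of CrateDB, and how the database scales or normalises scores internally is not shown here.

```python
import math


def _check_same_length(a, b):
    # All three measures are only defined for vectors of equal dimensionality.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance; smaller means more similar."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product of the two vectors after normalising each to unit length."""
    _check_same_length(a, b)
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)
```

Cosine similarity ignores vector length, so `[1, 2]` and `[2, 4]` count as identical in direction, whereas L2 distance and dot product are both sensitive to magnitude.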
-## 3. Querying Vectors with SQL
+## Querying Vectors with SQL

Use the `nearest_neighbors` predicate to perform similarity search:

@@ -79,7 +54,7 @@ LIMIT 10;
Combine vector similarity with full-text, metadata, or geospatial filters!
:::

-## 4. Ingestion: Working with Embeddings
+## Ingestion: Working with Embeddings

You can ingest vectors in several ways:
@@ -92,7 +67,7 @@ You can ingest vectors in several ways:
* **Batched imports** via `COPY FROM` using JSON or CSV
* CrateDB doesn't currently compute embeddings internally—you bring your own model or use pipelines that call CrateDB.
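
Because embeddings are computed outside the database, a batched import usually starts with a file prepared by your own code. A minimal Python sketch, assuming a JSON-lines file (one object per line) as the `COPY FROM` input; the field names `id`, `content`, and `embedding`, and the helper `write_jsonl`, are hypothetical and should be matched to your table definition. The sketch also rejects vectors whose length differs from the declared dimension, since dimensionality must be consistent for each column.

```python
import io
import json


def write_jsonl(records, out, dim):
    """Write one JSON object per line to `out`; return the number of rows written.

    Raises ValueError if any embedding does not have exactly `dim` elements.
    """
    count = 0
    for rec in records:
        vec = rec["embedding"]
        if len(vec) != dim:
            raise ValueError(
                f"record {rec['id']!r}: expected {dim} dimensions, got {len(vec)}"
            )
        row = {
            "id": rec["id"],                          # hypothetical column name
            "content": rec["content"],                # hypothetical column name
            "embedding": [float(x) for x in vec],     # plain JSON array of floats
        }
        out.write(json.dumps(row) + "\n")
        count += 1
    return count


# Usage: an in-memory buffer stands in for a real file opened with open(path, "w").
buf = io.StringIO()
rows = [
    {"id": 1, "content": "first doc", "embedding": [0.1, 0.2, 0.3]},
    {"id": 2, "content": "second doc", "embedding": [0.4, 0.5, 0.6]},
]
written = write_jsonl(rows, buf, dim=3)
```

Validating before the import keeps a single malformed vector from surfacing as a load error partway through a large batch.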
@@ -133,19 +108,19 @@ Note: vector dimensionality must be consistent for each column.
| Updates | Re-inserting or updating vectors is fully supported |
| Data pipelines | Use external tools for vector generation; push to CrateDB via REST/SQL |

-## 8. Integrations
+## Integrations

* **Python / pandas / LangChain**: CrateDB has native drivers and REST interface
* **Embedding models**: Use OpenAI, HuggingFace, Cohere, or in-house models
* **RAG architecture**: CrateDB stores vector + metadata + raw text in a unified store

-## 9. Further Learning & Resources
+## Further Learning & Resources

* CrateDB Docs – Vector Search
* Blog: Using CrateDB for Hybrid Search (Vector + Full-Text)
* CrateDB Academy – Vector Data
* [Sample notebooks on GitHub](https://github.com/crate/cratedb-examples)

-## 10. Summary
+## Summary

CrateDB gives you the power of **vector similarity search** with the **flexibility of SQL** and the **scalability of a distributed database**. It lets you unify structured, unstructured, and semantic data—enabling modern applications in AI, search, and recommendation without additional vector databases or pipelines.