diff --git a/docs/start/connect.md b/docs/connect/connect.md similarity index 100% rename from docs/start/connect.md rename to docs/connect/connect.md diff --git a/docs/connect/index.md b/docs/connect/index.md index 58e4e24b..130004d0 100644 --- a/docs/connect/index.md +++ b/docs/connect/index.md @@ -109,6 +109,7 @@ Database driver connection examples. :hidden: configure +connect CLI programs ide Drivers @@ -137,7 +138,7 @@ ruby [CrateDB PostgreSQL interface]: inv:crate-reference:*:label#interface-postgresql [HTTP interface]: inv:crate-reference:*:label#interface-http [HTTP protocol]: https://en.wikipedia.org/wiki/HTTP -[JDBC]: https://en.wikipedia.org/wiki/Java_Database_Connectivity +[JDBC]: https://en.wikipedia.org/wiki/Java_Database_Connectivity [ODBC]: https://en.wikipedia.org/wiki/Open_Database_Connectivity [PostgreSQL interface]: inv:crate-reference:*:label#interface-postgresql [PostgreSQL wire protocol]: https://www.postgresql.org/docs/current/protocol.html diff --git a/docs/index.md b/docs/index.md index 596306ce..8aa577a0 100644 --- a/docs/index.md +++ b/docs/index.md @@ -9,7 +9,7 @@ Guides and tutorials about how to use CrateDB and CrateDB Cloud in practice. -::::{grid} 1 2 2 2 +::::{grid} 4 :padding: 0 @@ -17,7 +17,7 @@ Guides and tutorials about how to use CrateDB and CrateDB Cloud in practice. :link: getting-started :link-type: ref :link-alt: Getting started with CrateDB -:padding: 3 +:padding: 1 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 @@ -31,7 +31,7 @@ Guides and tutorials about how to use CrateDB and CrateDB Cloud in practice. :link: install :link-type: ref :link-alt: Installing CrateDB -:padding: 3 +:padding: 1 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 @@ -45,7 +45,7 @@ Guides and tutorials about how to use CrateDB and CrateDB Cloud in practice. 
:link: administration :link-type: ref :link-alt: CrateDB Administration -:padding: 3 +:padding: 1 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 @@ -59,7 +59,7 @@ Guides and tutorials about how to use CrateDB and CrateDB Cloud in practice. :link: performance :link-type: ref :link-alt: CrateDB Performance Guides -:padding: 3 +:padding: 1 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 diff --git a/docs/performance/inserts/index.rst b/docs/performance/inserts/index.rst index 6934363e..e11462ac 100644 --- a/docs/performance/inserts/index.rst +++ b/docs/performance/inserts/index.rst @@ -30,6 +30,5 @@ This section of the guide will show you how. parallel tuning testing - sequences .. _Abstract Syntax Tree: https://en.wikipedia.org/wiki/Abstract_syntax_tree diff --git a/docs/performance/inserts/sequences.rst b/docs/performance/inserts/sequences.rst deleted file mode 100644 index d381d931..00000000 --- a/docs/performance/inserts/sequences.rst +++ /dev/null @@ -1,205 +0,0 @@ -.. _autogenerated_sequences_performance: - -########################################################### - Autogenerated sequences and PRIMARY KEY values in CrateDB -########################################################### - -As you begin working with CrateDB, you might be puzzled why CrateDB does not -have a built-in, auto-incrementing "serial" data type as PostgreSQL or MySQL. - -As a distributed database, designed to scale horizontally, CrateDB needs as many -operations as possible to complete independently on each node without any -coordination between nodes. - -Maintaining a global auto-increment value requires that a node checks with other -nodes before allocating a new value. This bottleneck would be hindering our -ability to achieve `extremely fast ingestion speeds`_. - -That said, there are many alternatives available and we can also implement true -consistent/synchronized sequences if we want to. 
- -************************************ - Using a timestamp as a primary key -************************************ - -This option involves declaring a column as follows: - -.. code:: psql - - BIGINT DEFAULT now() PRIMARY KEY - -:Pros: - Always increasing number - ideal if we need to timestamp records creation - anyway - -:Cons: - gaps between the numbers, not suitable if we may have more than one record on - the same millisecond - -************* - Using UUIDs -************* - -This option involves declaring a column as follows: - -.. code:: psql - - TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY - -:Pros: - Globally unique, no risk of conflicts if merging things from different - tables/environments - -:Cons: - No order guarantee. Not as human-friendly as numbers. String format may not - be applicable to cover all scenarios. Range queries are not possible. - -************************ - Use UUIDv7 identifiers -************************ - -`Version 7 UUIDs`_ are a relatively new kind of UUIDs which feature a -time-ordered value. We can use these in CrateDB with an UDF_ with the code from -`UUIDv7 in N languages`_. - -:Pros: - Same as `gen_random_text_uuid` above but almost sequential, which enables - range queries. - -:Cons: - not as human-friendly as numbers and slight performance impact from UDF use - -********************************* - Use IDs from an external system -********************************* - -In cases where data is imported into CrateDB from external systems that employ -identifier governance, CrateDB does not need to generate any identifier values -and primary key values can be inserted as-is from the source system. - -See `Replicating data from other databases to CrateDB with Debezium and Kafka`_ -for an example. 
- -********************* - Implement sequences -********************* - -This approach involves a table to keep the latest values that have been consumed -and client side code to keep it up-to-date in a way that guarantees unique -values even when many ingestion processes run in parallel. - -:Pros: - Can have any arbitrary type of sequences, (we may for instance want to - increment values by 10 instead of 1 - prefix values with a year number - - combine numbers and letters - etc) - -:Cons: - Need logic for the optimistic update implemented client-side, the sequences - table becomes a bottleneck so not suitable for high-velocity ingestion - scenarios - -We will first create a table to keep the latest values for our sequences: - -.. code:: psql - - CREATE TABLE sequences ( - name TEXT PRIMARY KEY, - last_value BIGINT - ) CLUSTERED INTO 1 SHARDS; - -We will then initialize it with one new sequence at 0: - -.. code:: psql - - INSERT INTO sequences (name,last_value) - VALUES ('mysequence',0); - -And we are going to do an example with a new table defined as follows: - -.. code:: psql - - CREATE TABLE mytable ( - id BIGINT PRIMARY KEY, - field1 TEXT - ); - -The Python code below reads the last value used from the sequences table, and -then attempts an `optimistic UPDATE`_ with a ``RETURNING`` clause, if a -contending process already consumed the identity nothing will be returned so our -process will retry until a value is returned, then it uses that value as the new -ID for the record we are inserting into the ``mytable`` table. - -.. 
code:: python - - # /// script - # requires-python = ">=3.8" - # dependencies = [ - # "records", - # "sqlalchemy-cratedb", - # ] - # /// - - import time - - import records - - db = records.Database("crate://") - sequence_name = "mysequence" - - max_retries = 5 - base_delay = 0.1 # 100 milliseconds - - for attempt in range(max_retries): - select_query = """ - SELECT last_value, - _seq_no, - _primary_term - FROM sequences - WHERE name = :sequence_name; - """ - row = db.query(select_query, sequence_name=sequence_name).first() - new_value = row.last_value + 1 - - update_query = """ - UPDATE sequences - SET last_value = :new_value - WHERE name = :sequence_name - AND _seq_no = :seq_no - AND _primary_term = :primary_term - RETURNING last_value; - """ - if ( - str( - db.query( - update_query, - new_value=new_value, - sequence_name=sequence_name, - seq_no=row._seq_no, - primary_term=row._primary_term, - ).all() - ) - != "[]" - ): - break - - delay = base_delay * (2**attempt) - print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f} seconds...") - time.sleep(delay) - else: - raise Exception(f"Failed after {max_retries} retries with exponential backoff") - - insert_query = "INSERT INTO mytable (id, field1) VALUES (:id, :field1)" - db.query(insert_query, id=new_value, field1="abc") - db.close() - -.. _extremely fast ingestion speeds: https://cratedb.com/blog/how-we-scaled-ingestion-to-one-million-rows-per-second - -.. _optimistic update: https://cratedb.com/docs/crate/reference/en/latest/general/occ.html#optimistic-update - -.. _replicating data from other databases to cratedb with debezium and kafka: https://cratedb.com/blog/replicating-data-from-other-databases-to-cratedb-with-debezium-and-kafka - -.. _udf: https://cratedb.com/docs/crate/reference/en/latest/general/user-defined-functions.html - -.. _uuidv7 in n languages: https://github.com/nalgeon/uuidv7/blob/main/src/uuidv7.cratedb - -.. 
_version 7 uuids: https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-7 diff --git a/docs/start/going-further.md b/docs/start/going-further.md index 87436921..3c0e3b0d 100644 --- a/docs/start/going-further.md +++ b/docs/start/going-further.md @@ -18,8 +18,9 @@ of the documentation portal. ::: :::{sd-row} -```{sd-item} Data modelling +```{sd-item} :class: sd-font-weight-bolder +{ref}`Data modelling ` ``` ```{sd-item} Learn the different types of structured, semi-structured, and unstructured data. diff --git a/docs/start/index.md b/docs/start/index.md index 896d9fbf..c71a511f 100644 --- a/docs/start/index.md +++ b/docs/start/index.md @@ -18,6 +18,7 @@ and explore key features. :link: first-steps :link-type: ref :link-alt: First steps with CrateDB +:columns: 6 3 3 3 :padding: 3 :text-align: center :class-card: sd-pt-3 @@ -31,6 +32,7 @@ and explore key features. :link: connect :link-type: ref :link-alt: Connect to CrateDB +:columns: 6 3 3 3 :padding: 3 :text-align: center :class-card: sd-pt-3 @@ -44,6 +46,7 @@ and explore key features. :link: query-capabilities :link-type: ref :link-alt: Query Capabilities +:columns: 6 3 3 3 :padding: 3 :text-align: center :class-card: sd-pt-3 @@ -57,6 +60,7 @@ and explore key features. :link: ingest :link-type: ref :link-alt: Ingesting Data +:columns: 6 3 3 3 :padding: 3 :text-align: center :class-card: sd-pt-3 @@ -78,6 +82,7 @@ and explore key features. :link: example-applications :link-type: ref :link-alt: Sample Applications +:columns: 6 3 3 3 :padding: 3 :text-align: center :class-card: sd-pt-3 @@ -91,6 +96,7 @@ and explore key features. :link: start-going-further :link-type: ref :link-alt: Going Further +:columns: 6 3 3 3 :padding: 3 :text-align: center :class-card: sd-pt-3 @@ -108,11 +114,11 @@ and explore key features. 
:hidden: first-steps -connect +going-further +modelling/index query/index Ingesting data <../ingest/index> application/index -going-further ``` diff --git a/docs/start/modelling/fulltext.md b/docs/start/modelling/fulltext.md new file mode 100644 index 00000000..2e9ab927 --- /dev/null +++ b/docs/start/modelling/fulltext.md @@ -0,0 +1,160 @@ +(model-fulltext)= +# Full-text data + +CrateDB offers **native full-text search** powered by **Apache Lucene** and Okapi +BM25 ranking, accessible via SQL for easy modelling and querying of large-scale +textual data. It supports fuzzy matching, multi-language analysis, and composite +indexing, while fully integrating with data types such as JSON, time-series, +geospatial, vectors, and more for comprehensive multi-model queries. Whether you +need document search, catalog lookup, or content analytics, CrateDB is an ideal +solution. + +## Data Types & Indexing + +By default, all text columns are indexed as `plain` (raw, unanalyzed)—efficient +for equality search but not suitable for full-text queries. + +To use full-text search, add a FULLTEXT index with an optional analyzer to the +text columns you want to search: + +```sql +CREATE TABLE documents ( + title TEXT, + body TEXT, + INDEX ft_title USING FULLTEXT(title) WITH (analyzer = 'english'), + INDEX ft_body USING FULLTEXT(body) WITH (analyzer = 'english') +); +``` + +You can also index multiple columns with **composite full-text indices**: + +```sql +INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english'); +``` + +For detailed options, check out the [Reference Manual](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/fulltext-indices.html). + +## Analyzers + +An analyzer splits text into searchable terms and consists of the following components: + +* **Tokenizer -** splits on whitespace/characters +* **Token Filters -** e.g. lowercase, stemming, stop‑word removal +* **Char Filters -** pre-processing (e.g. stripping HTML). 
+ +CrateDB offers about 50 [**built-in analyzers**](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/analyzers.html#built-in-analyzers) supporting more than 30 [languages](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/analyzers.html#language). + +You can **extend** a built-in analyzer: + +```sql +CREATE ANALYZER german_snowball + EXTENDS snowball + WITH (language = 'german'); +``` + +or create your own **custom** analyzer: + +```sql +CREATE ANALYZER myanalyzer ( + TOKENIZER whitespace, + TOKEN_FILTERS (lowercase, kstem), + CHAR_FILTERS (html_strip) +); +``` + +Learn more about the [built-in analyzers](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/analyzers.html#built-in-analyzers) and how to [define your own](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/fulltext-indices.html#creating-a-custom-analyzer) with custom [tokenizers](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/analyzers.html#built-in-tokenizers) and [token filters](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/analyzers.html#built-in-token-filters). + + +## Querying: MATCH Predicate & Scoring + +CrateDB uses the SQL `MATCH` predicate to run full‑text queries against +full‑text indices. It optionally returns a relevance score `_score`, ranked via +BM25. + +**Basic usage:** + +```sql +SELECT title, _score FROM documents +WHERE MATCH(ft_body, 'search term') +ORDER BY _score DESC; +``` + +**Searching multiple indices with weighted ranking:** + +```sql +SELECT title, _score FROM documents +WHERE MATCH((ft_body, ft_title 2.0), 'search term') +ORDER BY _score DESC; +``` +Here `ft_title` is weighted twice as much as `ft_body`.
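To build intuition for how such a boost changes ranking, here is a toy sketch with made-up per-field scores (the real BM25 values are computed inside Lucene; the `best_fields` method, CrateDB's default, ranks by the best single field after applying the boost):

```python
# Hypothetical per-field relevance scores for two documents.
# These numbers are illustrative, not produced by CrateDB.
docs = {
    "doc-1": {"ft_title": 0.2, "ft_body": 1.5},
    "doc-2": {"ft_title": 1.0, "ft_body": 0.3},
}

# Boosts from MATCH((ft_body, ft_title 2.0), ...): ft_title counts double.
weights = {"ft_title": 2.0, "ft_body": 1.0}

def best_fields_score(field_scores):
    # best_fields ranks by the best single (boosted) field score
    return max(weights[f] * s for f, s in field_scores.items())

ranking = sorted(docs, key=lambda d: best_fields_score(docs[d]), reverse=True)
print(ranking)  # ['doc-2', 'doc-1'] -- the boosted title match wins
```

Without the boost, `doc-1` would rank first on its strong body match; the title boost flips the order.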
+ +**You can configure match options like:** + +* `using best_fields` (default) +* `fuzziness = 1` (tolerate minor typos) +* `operator = 'AND'` or `OR` +* `slop = N` for phrase proximity + +**Example: Fuzzy Search** + +```sql +SELECT title, _score +FROM documents +WHERE MATCH(ft_body, 'Jamse') USING best_fields WITH (fuzziness = 2) +ORDER BY _score DESC; +``` + +This matches similar words like ‘James’. + +**Example: Multi‑language Composite Search** + +```sql +CREATE TABLE documents ( + title TEXT, + body TEXT, + INDEX ft_en USING FULLTEXT(body) WITH (analyzer = 'english'), + INDEX ft_de USING FULLTEXT(body) WITH (analyzer = 'german') +); + +SELECT title, _score +FROM documents +WHERE MATCH((ft_en, ft_de), 'jupm OR verwrlost') USING best_fields WITH (fuzziness = 1) +ORDER BY _score DESC; +``` + +## Use Cases & Integration + +CrateDB is ideal for searching **semi-structured large text data**—product +catalogs, article archives, user-generated content, descriptions and logs. + +Because full-text indices are updated in real-time, search results reflect newly +ingested data almost instantly. This tight integration avoids the complexity of +maintaining separate search infrastructure. + +You can **combine full-text search with other data domains**, for example: + +```sql +SELECT * +FROM listings +WHERE + MATCH(ft_desc, 'garden deck') AND + price < 500000 AND + within(location, :polygon); +``` + +This blend lets you query by text relevance, numeric filters, and spatial +constraints, all in one. + +## Further Learning & Resources + +* [**Full-text Search**](../../feature/search/fts/index.md): In-depth + walkthrough of full-text search capabilities. +* Reference Manual: + * {ref}`Full-text indices `: Defining + indices, extending builtin analyzers, custom analyzers. + * {ref}`Full-text analyzers `: Builtin + analyzers, tokenizers, token and char filters. + * {ref}`SQL MATCH predicate `: + Details about MATCH predicate arguments and options. 
+* [**Hands‑On Academy Course**](https://learn.cratedb.com/cratedb-fundamentals?lesson=fulltext-search): + explore FTS on real datasets (e.g. Chicago neighborhoods). diff --git a/docs/start/modelling/geospatial.md b/docs/start/modelling/geospatial.md new file mode 100644 index 00000000..0ff74f8e --- /dev/null +++ b/docs/start/modelling/geospatial.md @@ -0,0 +1,97 @@ +(model-geospatial)= +# Geospatial data + +CrateDB supports **real-time geospatial analytics at scale**, enabling you to +store, query, and analyze 2D location-based data using standard SQL over two +dedicated types: **GEO\_POINT** and **GEO\_SHAPE**. You can seamlessly combine +spatial data with full-text, vector, JSON, or time-series in the same SQL +queries. + +The strength of CrateDB's support for geospatial data includes: + +* Designed for **real-time geospatial tracking and analytics** (e.g., fleet + tracking, mapping, location-layered apps) +* **Unified SQL platform**: spatial data can be combined with full-text search, + JSON, vectors, time-series — in the same table or query +* **High ingest and query throughput**, suitable for large-scale location-based + workloads + +## Geospatial Data Types + +CrateDB has two geospatial data types: + +### GEO_POINT + +* Stores a single location via latitude/longitude. +* Insert using + * coordinate array `[lon, lat]` + * [Well-Known Text](https://libgeos.org/specifications/wkt/) (WKT) string + `'POINT (lon lat)'`. +* Must be declared explicitly; dynamic schema inference will not detect + `geo_point` type. + +### GEO_SHAPE + +* Represents more complex 2D shapes defined via GeoJSON or WKT formats. +* Supported geometry types: + * `Point`, `MultiPoint` + * `LineString`, `MultiLineString` + * `Polygon`, `MultiPolygon` + * `GeometryCollection` +* Indexed using geohash, quadtree, or BKD-tree, with configurable precision + (e.g. `50m`) and error threshold. 
The indexes are described in the [reference + manual](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/data-types.html#type-geo-shape-index). + You can choose and configure the indexing method when defining your table + schema. + +## Defining a Geospatial Column + +Here’s an example of how to define a `GEO_SHAPE` column with a specific index: + +```sql +CREATE TABLE parks ( + name TEXT, + area GEO_SHAPE INDEX USING quadtree WITH (precision = '50m') +); +``` + +## Inserting Geospatial Data + +You can insert geospatial values using either **GeoJSON** or **WKT** formats. + +```sql +-- Insert a shape (WKT format) +INSERT INTO parks (name, area) +VALUES ('My Park', 'POLYGON ((5 5, 30 5, 30 30, 5 30, 5 5))'); +``` + +## Querying with spatial operations + +For example, check whether a point lies within a park: + +```sql +SELECT name FROM parks +WHERE within('POINT(10 10)'::geo_shape, area); +``` + +CrateDB provides key scalar functions for spatial operations such as `distance(...)`, +`within(...)`, `intersects(...)`, `area(...)`, `geohash(...)`, `latitude(...)` and `longitude(...)`. + +Furthermore, it is possible to use the **match** predicate with geospatial data +in queries. + +See {ref}`Geo Search ` for details. + +## Further Learning & Resources + +* Reference manual: + * {ref}`Geo Search ` + * {ref}`Geo functions `: distance, within, + intersects, latitude, longitude, geohash, area +* CrateDB Academy [**Hands-on: Geospatial + Data**](https://cratedb.com/academy/fundamentals/data-modelling-with-cratedb/hands-on-geospatial-data) + modules, with sample datasets (Chicago 311 calls, taxi rides, community zones) + and example queries. +* CrateDB Blog: [**Geospatial Queries with + CrateDB**](https://cratedb.com/blog/geospatial-queries-with-crate-data) – + outlines capabilities, limitations, and practical use cases. 
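For intuition about what the `within(...)` predicate shown above evaluates, here is a minimal client-side point-in-polygon check (ray casting) against the same park ring. This is illustrative only; in CrateDB the predicate runs inside the database against the spatial index.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Same ring as 'POLYGON ((5 5, 30 5, 30 30, 5 30, 5 5))' above
park = [(5, 5), (30, 5), (30, 30), (5, 30)]

print(point_in_polygon(10, 10, park))  # True, like within('POINT(10 10)'::geo_shape, area)
print(point_in_polygon(50, 50, park))  # False
```

An odd number of crossings means the point lies inside the ring, which is the same answer the database returns for the query above.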
diff --git a/docs/start/modelling/index.md b/docs/start/modelling/index.md new file mode 100644 index 00000000..ae468744 --- /dev/null +++ b/docs/start/modelling/index.md @@ -0,0 +1,120 @@ +(modelling)= +(data-modelling)= +# Data modelling + +:::{div} sd-text-muted +CrateDB provides a unified storage engine that supports different data types. +::: + +:::::{grid} 2 3 3 3 +:padding: 0 +:class-container: installation-grid + +::::{grid-item-card} Relational data +:link: model-relational +:link-type: ref +:link-alt: Relational data +:padding: 3 +:text-align: center +:class-card: sd-pt-3 +:class-body: sd-fs-1 +:class-title: sd-fs-6 + +{fas}`table-list` +:::: + +::::{grid-item-card} JSON data +:link: model-json +:link-type: ref +:link-alt: JSON data +:padding: 3 +:text-align: center +:class-card: sd-pt-3 +:class-body: sd-fs-1 +:class-title: sd-fs-6 + +{fas}`file-lines` +:::: + +::::{grid-item-card} Timeseries data +:link: model-timeseries +:link-type: ref +:link-alt: Timeseries data +:padding: 3 +:text-align: center +:class-card: sd-pt-3 +:class-body: sd-fs-1 +:class-title: sd-fs-6 + +{fas}`timeline` +:::: + +::::{grid-item-card} Geospatial data +:link: model-geospatial +:link-type: ref +:link-alt: Geospatial data +:padding: 3 +:text-align: center +:class-card: sd-pt-3 +:class-body: sd-fs-1 +:class-title: sd-fs-6 + +{fas}`globe` +:::: + +::::{grid-item-card} Fulltext data +:link: model-fulltext +:link-type: ref +:link-alt: Fulltext data +:padding: 3 +:text-align: center +:class-card: sd-pt-3 +:class-body: sd-fs-1 +:class-title: sd-fs-6 + +{fas}`font` +:::: + +::::{grid-item-card} Vector data +:link: model-vector +:link-type: ref +:link-alt: Vector data +:padding: 3 +:text-align: center +:class-card: sd-pt-3 +:class-body: sd-fs-1 +:class-title: sd-fs-6 + +{fas}`lightbulb` +:::: + +::::: + + +```{toctree} +:maxdepth: 1 +:hidden: + +relational +json +timeseries +geospatial +fulltext +vector +``` + + +:::{card} Primary key strategies +:link: model-primary-key +:link-type: ref 
+CrateDB is built for horizontal scalability and high ingestion throughput. +To achieve this, auto-incrementing primary keys are not supported, and other +solutions are required instead. +::: + +```{toctree} +:maxdepth: 1 +:hidden: + +Primary key strategies +``` diff --git a/docs/start/modelling/json.md b/docs/start/modelling/json.md new file mode 100644 index 00000000..ea4b65b5 --- /dev/null +++ b/docs/start/modelling/json.md @@ -0,0 +1,177 @@ +(model-json)= +# JSON data + +CrateDB combines the flexibility of NoSQL document stores with the power of SQL. +It enables you to store, query, and index **semi-structured JSON data** using +**standard SQL**, making it an excellent choice for applications that handle +diverse or evolving schemas. + +CrateDB’s support for dynamic objects, nested structures, and bracket notation +querying brings the best of both relational and document-based data +modelling — without leaving the SQL world. + +## A Simple Table with JSON + +CrateDB allows you to define **object columns** that can store JSON-style data +structures. 
+ +```sql +CREATE TABLE events ( + id TEXT PRIMARY KEY, + timestamp TIMESTAMP, + payload OBJECT +); +``` + +This allows inserting flexible, nested JSON data into `payload`: + +```json +{ + "user": { + "id": 42, + "name": "Alice" + }, + "action": "login", + "device": { + "type": "mobile", + "os": "iOS" + } +} +``` + +## Column Policy — Strict vs Dynamic + +You can control how CrateDB handles unexpected fields in an object column: + +| Column Policy | Behavior | +| ------------- |-----------------------------------------------------------------------| +| `DYNAMIC` | (Default) New fields are automatically added to the schema at runtime | +| `STRICT` | Only explicitly defined fields are allowed | +| `IGNORED` | Extra fields are stored and can be selected, but are not indexed | + +Let’s evolve our table to restrict the structure of `payload`: + +```sql +CREATE TABLE events2 ( + id TEXT PRIMARY KEY, + timestamp TIMESTAMP, + payload OBJECT(STRICT) AS ( + temperature DOUBLE, + humidity DOUBLE + ) +); +``` + +You can no longer use fields other than `temperature` and `humidity` in the payload +object. + +## Querying JSON Fields + +Use **bracket notation** to access nested fields: + +```sql +SELECT payload['temperature'], payload['humidity'] +FROM events2 +WHERE payload['temperature'] >= 20.0; +``` + +CrateDB also supports **filtering, sorting, and aggregations** on nested values: + +```sql +-- count events with high humidity +SELECT COUNT(*) AS high_humidity_events +FROM events2 +WHERE payload['humidity'] > 70; +``` + +```{note} +Bracket notation works for both explicitly and dynamically added fields. +``` + +## Querying DYNAMIC OBJECTs Safely + +When working with dynamic objects, some keys may not exist. CrateDB provides the +[error_on_unknown_object_key](inv:crate-reference:*:label#conf-session-error_on_unknown_object_key) +session setting to control behavior in such cases. + +By default, CrateDB will raise an error if any of the queried object keys are
When adjusting this setting to `false`, it will return `NULL` as the +value of the corresponding key. + +```sql +cr> CREATE TABLE events (payload OBJECT(DYNAMIC)); +CREATE OK, 1 row affected (0.563 sec) + +cr> SELECT payload['unknown'] FROM events; +ColumnUnknownException[Column payload['unknown'] unknown] + +cr> SET error_on_unknown_object_key = false; +SET OK, 0 rows affected (0.001 sec) + +cr> SELECT payload['unknown'] FROM events; ++-------------------+ +| payload['unknown']| ++-------------------+ +SELECT 0 rows in set (0.051 sec) +``` + +## Aggregating JSON Fields + +CrateDB allows full SQL-style aggregations on nested fields: + +```sql +SELECT AVG(payload['temperature']) AS avg_temp +FROM events +WHERE payload['humidity'] > 20.0; +``` + +## Combining Structured & Semi-Structured Data + +As you can see in the events table, CrateDB supports **hybrid schemas**, mixing +standard columns with JSON fields. + +This allows you to: + +* Query by fixed attributes (`temperature`) +* Flexibly store structured or unstructured metadata in `payload` +* Add new fields on the fly without altering a table, skipping migrations + +## Indexing Behavior + +CrateDB **automatically indexes** object fields if: + +* Column policy is `DYNAMIC` +* Field type can be inferred at insert time + +You can also explicitly define and index object fields. Let’s extend the payload +with a message field with full-text index, and also disable index for `humidity`: + +```sql +CREATE TABLE events3 ( + id TEXT PRIMARY KEY, + timestamp TIMESTAMP, + tags ARRAY(TEXT), + payload OBJECT(DYNAMIC) AS ( + temperature DOUBLE, + humidity DOUBLE INDEX OFF, + message TEXT INDEX USING FULLTEXT + ) +); +``` + +```{note} +When using dynamic objects too many columns could be created, the default per +table is 1000, more could impact performance. + Use `STRICT` or `IGNORED`if needed. +``` + +Object fields are treated as any other column, therefore **`GROUP BY`**, +**`HAVING`**, and **window functions** are supported. 
+ +## Further Learning & Resources + +* Reference Manual: + * {ref}`Objects ` + * {ref}`Object Column policy ` + * {ref}`json data type ` + * {ref}`Inserting objects as JSON ` diff --git a/docs/start/modelling/primary-key.md b/docs/start/modelling/primary-key.md new file mode 100644 index 00000000..ca36e203 --- /dev/null +++ b/docs/start/modelling/primary-key.md @@ -0,0 +1,233 @@ +(model-primary-key)= +(autogenerated-sequences)= +# Primary key strategies and autogenerated sequences + +:::{rubric} Introduction +::: + +As you begin working with CrateDB, you might be puzzled why CrateDB does not +have a built-in, auto-incrementing "serial" data type, like PostgreSQL or MySQL. + +This page explains why that is and walks you through **five common alternatives** +to generate unique primary key values in CrateDB, including a recipe to implement +your own auto-incrementing sequence mechanism when needed. + +:::{rubric} Why auto-increment sequences don't exist in CrateDB +::: +In traditional RDBMS systems, auto-increment fields rely on a central counter. +In a distributed system like CrateDB, maintaining a global auto-increment value +would require that a node checks with other nodes before allocating a new value. +This would create a **global coordination bottleneck**, limit insert throughput, +and reduce scalability. + +CrateDB is designed for horizontal scalability and [high ingestion throughput]. +To achieve this, operations must complete independently on each node—without +central coordination. This design choice means CrateDB does **not** support +traditional auto-incrementing primary key types like `SERIAL` in PostgreSQL +or MySQL. + +:::{rubric} Solutions +::: +CrateDB provides flexibility: You can choose a primary key strategy +tailored to your use case, whether for strict uniqueness, time ordering, or +external system integration. You can also implement true consistent/synchronized +sequences if you want to. 
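As a concrete taste of the time-ordered strategy covered below, a UUIDv7-style identifier can also be generated in the application layer. This is a simplified sketch following the RFC 9562 layout, not the UDF referenced later:

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Simplified UUIDv7: 48-bit Unix-ms timestamp, then random bits (RFC 9562)."""
    ts_ms = time.time_ns() // 1_000_000           # Unix time in milliseconds
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (ts_ms << 80) | rand                  # timestamp in the top 48 bits
    value &= ~(0xF << 76)                         # clear version bits
    value |= 0x7 << 76                            # set version = 7
    value &= ~(0x3 << 62)                         # clear variant bits
    value |= 0x2 << 62                            # set variant per RFC 9562
    return uuid.UUID(int=value)

a, b = uuid7(), uuid7()
print(a, b)
# The timestamp prefix is non-decreasing across calls, which keeps inserts
# and primary-key range queries efficient.
```

Within the same millisecond the random tail decides ordering, so the sequence is time-ordered rather than strictly monotonic.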
+ +## Using a timestamp as a primary key + +This option involves declaring a column using `DEFAULT now()`. +```psql +CREATE TABLE example ( + id BIGINT DEFAULT now() PRIMARY KEY +); +``` + +:Pros: + - Auto-generated, always-increasing value + - Useful when records are timestamped anyway + +:Cons: + - Can result in gaps + - Collisions possible if multiple records are created in the same millisecond + +## Using elasticflake identifiers + +This option involves declaring a column using `DEFAULT gen_random_text_uuid()`. +```psql +CREATE TABLE example2 ( + id TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY +); +``` + +:Pros: + - Universally unique + - No conflicts when merging from multiple environments or sources + +:Cons: + - Not ordered + - Harder to read/debug + - No efficient range queries + +## Using UUIDv7 identifiers + +[UUIDv7] is a new format that preserves **temporal ordering**, making UUIDs +better suited for inserts and range queries in distributed databases. + +You can use [UUIDv7 for CrateDB] via a {ref}`User-Defined Function (UDF) ` +in JavaScript, or use a [UUIDv7 library] in your application layer. + +:Pros: + - Globally unique and **almost sequential** + - Efficient range queries possible + +:Cons: + - Not as human-friendly as integer numbers + - Slight overhead due to UDF use + +## Using IDs from external systems + +If you are importing data from a source system that **already generates unique +IDs**, you can reuse those by inserting primary key values as-is from the +source system. + +In this case, CrateDB does not need to generate any identifier values, +and consistency is ensured across systems. + +:::{seealso} +An example for that is [Replicating data from other databases to CrateDB with Debezium and Kafka]. +::: + +## Implementing a custom sequence table + +If you **must** have an auto-incrementing numeric ID (e.g., for compatibility +or legacy reasons), you can implement a simple sequence generator using a +dedicated table and client-side logic. 
+ +This approach involves a table to keep the latest values that have been consumed +and client-side code to keep it up-to-date in a way that guarantees unique +values even when many ingestion processes run in parallel. + +:Pros: + - Fully customizable (you can add prefixes, adjust increment size, etc.) + - Sequential IDs possible + +:Cons: + - Requires client-side retry logic with optimistic updates for every write + - The sequence table may become a bottleneck at very high ingestion rates + +### Step 1: Create a sequence tracking table +Create a table to keep the latest values for the sequences. +```psql +CREATE TABLE sequences ( + name TEXT PRIMARY KEY, + last_value BIGINT +) CLUSTERED INTO 1 SHARDS; +``` + +### Step 2: Initialize your sequence +Initialize the table with one new sequence at 0. +```psql +INSERT INTO sequences (name, last_value) +VALUES ('mysequence', 0); +``` + +### Step 3: Create a target table +Create the example table that will receive the generated IDs. +```psql +CREATE TABLE mytable ( + id BIGINT PRIMARY KEY, + field1 TEXT +); +``` + +### Step 4: Generate and use sequence values in Python + +Use optimistic concurrency control to generate unique, incrementing values +even in parallel ingestion scenarios. + +The Python code below reads the last value used from the sequences table, and +then attempts an [optimistic UPDATE] with a `RETURNING` clause. If a +contending process already consumed the identity, nothing is returned, and the +process retries until a value is returned. Then it uses that value as the new +ID for the record we are inserting into the `mytable` table.
+ +```python +# Requires: records, sqlalchemy-cratedb +# +# /// script +# requires-python = ">=3.8" +# dependencies = [ +# "records", +# "sqlalchemy-cratedb", +# ] +# /// + +import time +import records + +db = records.Database("crate://") +sequence_name = "mysequence" + +max_retries = 5 +base_delay = 0.1 # 100 milliseconds + +for attempt in range(max_retries): + select_query = """ + SELECT last_value, _seq_no, _primary_term + FROM sequences + WHERE name = :sequence_name; + """ + row = db.query(select_query, sequence_name=sequence_name).first() + new_value = row.last_value + 1 + + update_query = """ + UPDATE sequences + SET last_value = :new_value + WHERE name = :sequence_name + AND _seq_no = :seq_no + AND _primary_term = :primary_term + RETURNING last_value; + """ + if ( + str( + db.query( + update_query, + new_value=new_value, + sequence_name=sequence_name, + seq_no=row._seq_no, + primary_term=row._primary_term, + ).all() + ) + != "[]" + ): + break + + delay = base_delay * (2**attempt) + print(f"Attempt {attempt + 1} failed. 
Retrying in {delay:.1f} seconds...") + time.sleep(delay) +else: + raise Exception(f"Failed after {max_retries} retries with exponential backoff") + +insert_query = "INSERT INTO mytable (id, field1) VALUES (:id, :field1)" +db.query(insert_query, id=new_value, field1="abc") +db.close() +``` + +## Summary + +| Strategy | Ordered | Unique | Scalable | Human-friendly | Range queries | Notes | +|---------------------|----------| ------ | -------- |----------------|---------------| -------------------- | +| Timestamp | ✅ | ⚠️ | ✅ | ✅ | ✅ | Potential collisions | +| Elasticflake | ❌ | ✅ | ✅ | ❌ | ❌ | Default UUIDs | +| UUIDv7 | ✅ | ✅ | ✅ | ❌ | ✅ | Requires UDF | +| External system IDs | ✅/❌ | ✅ | ✅ | ✅ | ✅ | Depends on source | +| Sequence table | ✅ | ✅ | ⚠️ | ✅ | ✅ | Manual retry logic | + + +[high ingestion throughput]: https://cratedb.com/blog/how-we-scaled-ingestion-to-one-million-rows-per-second +[optimistic update]: https://cratedb.com/docs/crate/reference/en/latest/general/occ.html#optimistic-update +[replicating data from other databases to cratedb with debezium and kafka]: https://cratedb.com/blog/replicating-data-from-other-databases-to-cratedb-with-debezium-and-kafka +[udf]: https://cratedb.com/docs/crate/reference/en/latest/general/user-defined-functions.html +[UUIDv7]: https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-7 +[UUIDv7 for CrateDB]: https://github.com/nalgeon/uuidv7/blob/main/src/uuidv7.cratedb +[UUIDv7 library]: https://github.com/nalgeon/uuidv7 diff --git a/docs/start/modelling/relational.md b/docs/start/modelling/relational.md new file mode 100644 index 00000000..314b5fdb --- /dev/null +++ b/docs/start/modelling/relational.md @@ -0,0 +1,182 @@ +(model-relational)= +# Relational data + +CrateDB is a **distributed SQL database** that offers rich **relational data +modelling** with the flexibility of dynamic schemas and the scalability of NoSQL +systems. 
It supports **primary keys**, **joins**, **aggregations**, and
+**subqueries**, just like traditional RDBMSs, while also enabling hybrid
+use cases with time-series, geospatial, full-text, vector search, and
+semi-structured data.
+
+Use CrateDB when you need to scale relational workloads horizontally while
+keeping the simplicity of **SQL**.
+
+## Table Definitions
+
+CrateDB supports strongly typed relational schemas using familiar SQL syntax:
+
+```sql
+CREATE TABLE customers (
+    id TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY,
+    name TEXT,
+    email TEXT,
+    created_at TIMESTAMP DEFAULT now()
+);
+```
+
+**Key Features:**
+
+* Supports scalar types (`TEXT`, `INTEGER`, `DOUBLE`, `BOOLEAN`, `TIMESTAMP`,
+  etc.)
+* `gen_random_text_uuid()`, `now()`, or `current_timestamp()` are recommended
+  for generating primary keys in distributed environments
+* Default **replication**, **sharding**, and **partitioning** options are
+  built-in for scale
+
+## Normalization vs. Embedding
+
+CrateDB supports both **normalized** (relational) and **denormalized** (embedded
+JSON) approaches with {ref}`column_policy = 'dynamic' `.
+
+* For strict referential integrity and modularity: use normalized tables with
+  joins.
+* For performance in high-ingest or read-optimized workloads: embed reference
+  data as nested JSON.
+
+Example: Embedded products inside an `orders` table:
+
+```sql
+CREATE TABLE orders (
+    order_id TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY,
+    customer_id TEXT,
+    total_amount DOUBLE,
+    items ARRAY(
+        OBJECT(DYNAMIC) AS (
+            name TEXT,
+            quantity INTEGER,
+            price DOUBLE
+        )
+    ),
+    created_at TIMESTAMP DEFAULT now()
+);
+```
+
+:::{note}
+CrateDB lets you **query nested fields** directly using bracket
+notation: `items['name']`, `items['price']`, etc.
+:::
+
+## Joins & Relationships
+
+CrateDB supports **inner joins**, **left/right joins**, **cross joins**, **outer
+joins**, and even **self joins**.
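+
+**Example: Self Join**
+
+A self join relates rows of a table to other rows of the same table. This
+sketch assumes a hypothetical `employees` table in which `manager_id`
+references `id` of the same table:
+
+```sql
+-- Hypothetical table: employees (id TEXT PRIMARY KEY, name TEXT, manager_id TEXT)
+SELECT e.name AS employee, m.name AS manager
+FROM employees e
+LEFT JOIN employees m ON e.manager_id = m.id;
+```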
+
+**Example: Join Customers and Orders**
+
+```sql
+SELECT c.name, o.order_id, o.total_amount
+FROM customers c
+JOIN orders o ON c.id = o.customer_id
+WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days';
+```
+
+Joins are executed efficiently across shards by a **distributed query planner**
+that parallelizes execution.
+
+## Aggregations & Grouping
+
+Use familiar SQL aggregation functions (`SUM`, `AVG`, `COUNT`, `MIN`, `MAX`)
+with `GROUP BY`, `HAVING`, window functions, and more.
+
+```sql
+SELECT customer_id, COUNT(*) AS num_orders, SUM(total_amount) AS revenue
+FROM orders
+GROUP BY customer_id
+HAVING SUM(total_amount) > 1000;
+```
+
+:::{note}
+CrateDB's **columnar storage** optimizes performance for
+aggregations — even on large datasets.
+:::
+
+## Constraints & Indexing
+
+CrateDB supports:
+
+* **Primary Keys** – enforced for uniqueness and data distribution
+* **Check constraints** – enforce custom value validation
+* **Indexes** – automatic indexing of all columns
+* **Full-text indexes** – manually defined; support many tokenizers, analyzers,
+  and filters
+
+In CrateDB, every column is indexed by default, using an index type that
+depends on the data type. Indexing is controlled and maintained by the
+database; there is no need to `vacuum` or `re-index` as in other systems.
+Indexing can be manually turned off with `INDEX OFF`.
+
+```sql
+CREATE TABLE products (
+    id TEXT PRIMARY KEY,
+    name TEXT,
+    price DOUBLE CHECK (price >= 0),
+    tag TEXT INDEX OFF, -- no index will be created
+    description TEXT INDEX USING FULLTEXT
+);
+```
+
+## Views & Subqueries
+
+CrateDB supports **views**, **CTEs**, and **nested subqueries**.
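+
+**Example: Nested Subquery**
+
+A nested subquery (a derived table in the `FROM` clause) can pre-aggregate
+data before filtering; this sketch reuses the `orders` table from above:
+
+```sql
+SELECT customer_id, revenue
+FROM (
+    SELECT customer_id, SUM(total_amount) AS revenue
+    FROM orders
+    GROUP BY customer_id
+) totals
+WHERE revenue > 1000;
+```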
+
+**Example: Reusable View**
+
+```sql
+CREATE VIEW recent_orders AS
+SELECT * FROM orders
+WHERE created_at >= CAST(CURRENT_DATE AS TIMESTAMP) - INTERVAL '7 days';
+```
+
+**Example: Correlated Subquery**
+
+```sql
+SELECT name,
+       (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count
+FROM customers c;
+```
+
+**Example: Common Table Expression**
+
+```sql
+WITH order_counts AS (
+    SELECT
+        o.customer_id,
+        COUNT(*) AS order_count
+    FROM orders o
+    GROUP BY o.customer_id
+)
+SELECT
+    c.name,
+    COALESCE(oc.order_count, 0) AS order_count
+FROM customers c
+LEFT JOIN order_counts oc
+    ON c.id = oc.customer_id;
+```
+
+## Further Learning & Resources
+
+* Reference Manual:
+  * How to {ref}`query with joins `
+  * {ref}`SQL join statements `
+  * {ref}`Join types and their implementation `
+* Blog posts:
+  * [How to fine-tune the query
+    optimizer](https://cratedb.com/blog/join-performance-to-the-rescue)
+  * [Adding support for joins on virtual tables and multi-row
+    subselects](https://cratedb.com/blog/joins-multi-row-subselects)
+  * How we made joins 23,000 times faster – part
+    [#1](https://cratedb.com/blog/joins-faster-part-one),
+    [#2](https://cratedb.com/blog/lab-notes-how-we-made-joins-23-thousand-times-faster-part-two),
+    [#3](https://cratedb.com/blog/lab-notes-how-we-made-joins-23-thousand-times-faster-part-three),
+    [Video](https://cratedb.com/resources/videos/distributed-join-algorithms)
diff --git a/docs/start/modelling/timeseries.md b/docs/start/modelling/timeseries.md
new file mode 100644
index 00000000..b2d66f62
--- /dev/null
+++ b/docs/start/modelling/timeseries.md
@@ -0,0 +1,170 @@
+(model-timeseries)=
+# Time series data
+
+## Why CrateDB for Time Series?
+
+CrateDB employs a relational representation for time‑series data, enabling you
+to work with timestamped data using standard SQL, while also seamlessly
+combining it with document and context data.
+
+* While maintaining a high ingest rate, the **columnar storage** and **automatic
+  indexing** let you access and analyze the data immediately with **fast
+  aggregations** and **near-real-time queries**.
+* Handles **high cardinality** and **a variety of data types**, including
+  nested JSON, geospatial and vector data — all queryable via the same SQL
+  statements.
+
+## Data Model Template
+
+A typical time‑series schema looks like this:
+
+```sql
+CREATE TABLE devices_readings (
+    ts TIMESTAMP WITH TIME ZONE,
+    device_id TEXT,
+    battery OBJECT AS (
+        level BIGINT,
+        status TEXT,
+        temperature DOUBLE PRECISION
+    ),
+    cpu OBJECT AS (
+        avg_1min DOUBLE PRECISION,
+        avg_5min DOUBLE PRECISION,
+        avg_15min DOUBLE PRECISION
+    ),
+    memory OBJECT AS (
+        free BIGINT,
+        used BIGINT
+    ),
+    month TIMESTAMP GENERATED ALWAYS AS date_trunc('month', ts)
+) PARTITIONED BY (month);
+
+CREATE TABLE devices_info (
+    "device_id" TEXT,
+    "api_version" TEXT,
+    "manufacturer" TEXT,
+    "model" TEXT,
+    "os_name" TEXT
+);
+```
+
+Key points:
+
+* `month` is the partitioning key, optimizing data storage and retrieval.
+* Every column is stored in the column store by default for fast aggregations.
+* Using **OBJECT columns** provides a structured and efficient way to organize
+  complex nested data in CrateDB, enhancing both data integrity and flexibility.
+
+## Ingesting and Querying
+
+### Data Ingestion
+
+* Use SQL `INSERT` or bulk import techniques like `COPY FROM` with JSON or CSV
+  files.
+* Schema inference can often happen automatically during import.
+
+### Aggregation and Transformations
+
+CrateDB offers built‑in SQL functions tailor‑made for time‑series analyses:
+
+* **`DATE_BIN(interval, timestamp, origin)`** for bucketed aggregations
+  (down‑sampling).
+* **Window functions** like `LAG()` and `LEAD()` to detect trends or gaps.
+* **`MAX_BY(returnField, searchField)`** / **`MIN_BY(returnField, searchField)`**
+  return the value from one column matching the max/min value of another column
+  in a group.
+
+**Example**: compute hourly average battery levels and join with metadata:
+
+```sql
+WITH avg_metrics AS (
+    SELECT device_id,
+           DATE_BIN('1 hour'::interval, ts, 0) AS period,
+           AVG(battery['level']) AS avg_battery
+    FROM devices_readings
+    GROUP BY device_id, period
+)
+SELECT period, t.device_id, i.manufacturer, avg_battery
+FROM avg_metrics t
+JOIN devices_info i USING (device_id)
+WHERE i.model = 'mustang';
+```
+
+**Example**: detect gaps by joining readings against a generated series of
+expected timestamps:
+
+```sql
+WITH expected_times AS (
+    SELECT
+        generate_series(
+            '2025-01-01',
+            '2025-01-02',
+            INTERVAL '30 second'
+        ) AS expected_time
+),
+raw AS (
+    SELECT
+        ts,
+        battery['level']
+    FROM
+        devices_readings
+)
+SELECT
+    expected_time,
+    r.battery['level']
+FROM
+    expected_times
+    LEFT JOIN raw r ON expected_time = r.ts
+ORDER BY
+    expected_time;
+```
+
+### Typical time-series functions
+
+* **Time extraction:** `date_trunc(...)`, `extract(...)`, `date_part(...)`, `now()`, `current_timestamp`
+* **Time bucketing:** `date_bin(...)`, `interval`, `age(...)`
+* **Window functions:** `avg(...)`, `over(...)`, `lag(...)`, `lead(...)`,
+  `first_value(...)`, `last_value(...)`, `row_number()`, `rank()`, `WINDOW ... AS (...)`
+* **Null handling:** `coalesce(...)`, `nullif(...)`
+* **Statistical aggregates:** `percentile(...)`, `stddev(...)`, `variance(...)`, `min(...)`,
+  `max(...)`, `sum(...)`, `topk(...)`
+* **Advanced filtering & logic:** `greatest(...)`, `least(...)`, `case when ... then ... end`
+
+## Downsampling & Interpolation
+
+To reduce volume while preserving trends, use `DATE_BIN`. Missing data can be
+handled using `LAG()`/`LEAD()` or other interpolation logic within SQL.
+
+## Schema Evolution & Contextual Data
+
+With `column_policy = 'dynamic'`, ingest JSON payloads containing extra
+attributes—new columns are auto‑created and indexed.
This is perfect for
+capturing evolving sensor metadata. For column-level control, use
+`OBJECT(DYNAMIC)` to auto-create (and, by default, index) subcolumns, or
+`OBJECT(IGNORED)` to accept unknown keys without creating or indexing
+subcolumns.
+
+You can also store:
+
+* **Geospatial** (`GEO_POINT`, `GEO_SHAPE`)
+* **Vectors** (up to 2048 dims via HNSW indexing)
+* **BLOBs** for binary data (e.g. images, logs)
+
+All types are supported within the same table or joined together.
+
+## Storage Optimization
+
+* **Partitioning and sharding**: data can be partitioned by time (e.g.
+  daily/monthly) and sharded across a cluster.
+* Supports long‑term retention with performant historic storage.
+* Columnar layout reduces storage footprint and accelerates aggregation queries.
+
+## Further Learning & Resources
+
+* **Documentation:** {ref}`Advanced Time Series Analysis `,
+  {ref}`Time Series Long Term Storage `
+* **Video:** [Time Series Data
+  Modelling](https://cratedb.com/resources/videos/time-series-data-modeling) –
+  covers relational & time series, document, geospatial, vector, and full-text
+  in one tutorial.
+* **CrateDB Academy:** [Advanced Time Series Modelling
+  course](https://cratedb.com/academy/time-series/getting-started/introduction-to-time-series-data).
+* **Tutorial:** [Downsampling with LTTB
+  algorithm](https://community.cratedb.com/t/advanced-downsampling-with-the-lttb-algorithm/1287)
diff --git a/docs/start/modelling/vector.md b/docs/start/modelling/vector.md
new file mode 100644
index 00000000..4efc41ba
--- /dev/null
+++ b/docs/start/modelling/vector.md
@@ -0,0 +1,79 @@
+(model-vector)=
+# Vector data
+
+CrateDB natively supports **vector embeddings** for efficient **similarity
+search** using **k-nearest neighbour (kNN)** algorithms. This makes it a
+powerful engine for building AI-powered applications involving semantic search,
+recommendations, anomaly detection, and multimodal analytics, all with the
+simplicity of SQL.
+
+Whether you’re working with text, images, sensor data, or any other domain
+represented as high-dimensional embeddings, CrateDB enables **real-time vector
+search at scale**, in combination with other data types like full-text,
+geospatial, and time-series.
+
+## Data Type: FLOAT_VECTOR
+
+CrateDB has a native {ref}`FLOAT_VECTOR type `
+with the following key characteristics:
+
+* Fixed-length float arrays (1-2048 dimensions)
+* Backed by Lucene’s HNSW approximate nearest neighbor (ANN) search
+* Similarity and scoring exposed via {ref}`KNN_MATCH `
+  and {ref}`VECTOR_SIMILARITY `.
+
+**Example: Define a Table with Vector Embeddings**
+
+```sql
+CREATE TABLE documents (
+    title TEXT,
+    content TEXT,
+    embedding FLOAT_VECTOR(3)
+);
+```
+
+* `FLOAT_VECTOR(3)` declares a vector column with 3 floats.
+
+## Ingestion: Working with Embeddings
+
+You can ingest vectors in several ways:
+
+* **Precomputed embeddings** from models:
+  ```sql
+  INSERT INTO documents (title, embedding)
+  VALUES ('AI and Databases', [0.12, 0.34, 0.01]);
+  ```
+  You must insert exactly the number of floats declared for the column, or an
+  error will be thrown.
+
+* **Batched imports** via {ref}`COPY FROM `
+  using JSON or CSV.
+* CrateDB doesn't currently compute embeddings internally; you bring your own
+  model or use pipelines that call CrateDB.
+
+## Querying Vectors with SQL
+
+Use {ref}`KNN_MATCH ` to perform similarity
+search:
+
+```sql
+SELECT title, content, _score
+FROM documents
+WHERE knn_match(embedding, [3.14, 5.1, 8.2], 2)
+ORDER BY _score DESC;
+```
+
+This searches for the two nearest neighbours of the supplied query vector and
+ranks the results by **vector similarity** score.
+
+## Further Learning & Resources
+
+* {ref}`Vector Search `: more details about searching with
+  vectors
+* Reference manual:
+  * {ref}`FLOAT_VECTOR type `
+  * {ref}`KNN_MATCH `
+  * {ref}`VECTOR_SIMILARITY `
+* Blog: [Vector support and KNN search](https://cratedb.com/blog/unlocking-the-power-of-vector-support-and-knn-search-in-cratedb)
+* CrateDB Academy: [Vector similarity
+  search](https://learn.cratedb.com/cratedb-fundamentals?lesson=vector-similarity-search)