Conversation

@amotl
Member

@amotl amotl commented Aug 7, 2025

About

This patch adds content for a dedicated "data modelling" section based on suggestions by colleagues.

Source: https://cratedb.gitbook.io/cratedb-docs/K3l1K4ZBSqj0AL16dbZi/data-modelling/
Preview: https://cratedb-guide--233.org.readthedocs.build/start/modelling/

Screenshots

image

Caveats

Warning

Fragments of this content might have been generated using GenAI / LLMs. In this spirit, the patch needs special attention during review, and possibly some mitigations to tone down overconfidence and jargon and to improve coherence and correctness.

The content has been copied 1:1 from a GitBook instance, with only minor copy-editing to account for markup syntax differences.

References

@coderabbitai

coderabbitai bot commented Aug 7, 2025

Warning

Rate limit exceeded

@amotl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 13 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between eb2269c and 036ebcf.

📒 Files selected for processing (12)
  • docs/index.md (4 hunks)
  • docs/performance/inserts/index.rst (0 hunks)
  • docs/performance/inserts/sequences.rst (0 hunks)
  • docs/start/index.md (7 hunks)
  • docs/start/modelling/fulltext.md (1 hunks)
  • docs/start/modelling/geospatial.md (1 hunks)
  • docs/start/modelling/index.md (1 hunks)
  • docs/start/modelling/json.md (1 hunks)
  • docs/start/modelling/primary-key.md (1 hunks)
  • docs/start/modelling/relational.md (1 hunks)
  • docs/start/modelling/timeseries.md (1 hunks)
  • docs/start/modelling/vector.md (1 hunks)

Walkthrough

Adds a new "Data modelling" section with an index and six modelling topic pages, updates the Getting Started toctree to include modelling/index, adjusts homepage and start-page grid/card layouts, and removes the standalone sequences performance page. All changes are documentation-only under docs/ and do not modify public APIs.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Start ToC & grid adjustments | docs/start/index.md | Inserted modelling/index into the toctree after query/index and added +:columns: 6 3 3 3 to Getting Started grid-item-cards. |
| Site homepage layout | docs/index.md | Changed top-level grid to four columns, reduced grid-item-card padding, added :class-title: sd-fs-5 to many cards, and inserted an extra intro line. |
| Modelling section index | docs/start/modelling/index.md | Added Data modelling landing page with a six-card grid (relational, json, timeseries, geospatial, fulltext, vector), hidden toctrees, and primary-key/implementation notes. |
| Modelling topic pages | docs/start/modelling/relational.md, docs/start/modelling/json.md, docs/start/modelling/timeseries.md, docs/start/modelling/geospatial.md, docs/start/modelling/fulltext.md, docs/start/modelling/vector.md | Added comprehensive modelling docs covering types, indexing, example schemas/queries, best practices, use cases, architecture notes, and further resources. |
| Primary key & sequences | docs/start/modelling/primary-key.md, docs/performance/inserts/sequences.rst | Added primary-key.md describing five PK strategies and a custom sequence recipe; removed the old performance/inserts/sequences.rst page and its toctree entry. |
| Performance inserts toctree | docs/performance/inserts/index.rst | Removed sequences from the inserts toctree (navigation change only). |

Sequence Diagram(s)

No sequence diagrams included — changes are documentation-only and do not introduce runtime control-flow or new interactions.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • hammerhead
  • karynzv
  • surister

Poem

I hop through docs with whiskered cheer,
New models bloom both far and near.
Maps, vectors, JSON bright and true,
Keys and searches — carrots for you! 🥕📚


@amotl amotl changed the title Data modelling: Add new section Data modelling: Add new section [GenAI] Aug 7, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 15

🧹 Nitpick comments (19)
docs/modelling/index.md (1)

15-16: Grammar: “designed store” → “designed to store”; tighten phrasing

Minor grammar fix and crisper phrasing.

-Because CrateDB is a distributed OLAP database designed store large volumes
-of data, it needs a few special considerations on certain details.
+Because CrateDB is a distributed OLAP database designed to store large volumes
+of data, there are a few special considerations to keep in mind.
docs/modelling/json.md (2)

216-222: Add real links to referenced resources

Replace placeholders with actual links to improve usefulness.

-* CrateDB Docs – Object Columns
-* Working with JSON in CrateDB
-* CrateDB Academy – Modeling with JSON
-* Understanding Column Policies
+* CrateDB Docs – Object Columns: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/data-types.html#object
+* Working with JSON in CrateDB: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/data-types.html#object
+* CrateDB Academy – Modeling with JSON: https://learn.cratedb.com/
+* Understanding Column Policies: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/column-policy.html

75-77: Small clarity notes

  • Mention that disabling error_on_unknown_object_key can hide typos in field names.
  • “Too many dynamic fields” note is good; consider pointing to information_schema query example.
-:::{note}
-Dot-notation works for both explicitly and dynamically added fields.
-:::
+:::{note}
+Dot-notation works for both explicitly and dynamically added fields. Be aware that turning off
+`error_on_unknown_object_key` may hide typos in field names by returning NULL instead of raising an error.
+:::
-:::{note}
-Too many dynamic fields can lead to schema explosion. Use `STRICT` or `IGNORED` if needed.
-:::
+:::{note}
+Too many dynamic fields can lead to schema explosion. Use `STRICT` or `IGNORED` if needed.
+You can monitor dynamic columns via:
+
+```sql
+SELECT table_schema, table_name, count(*) AS columns
+FROM information_schema.columns
+GROUP BY 1,2
+ORDER BY columns DESC;
+```
+:::

Also applies to: 180-182

docs/modelling/vector.md (3)

16-16: Fix markdown lint (MD036): Use headings instead of bold for section labels

Convert emphasized lines acting as headings into actual headings.

-**Example: Define a Table with Vector Embeddings**
+### Example: Define a table with vector embeddings
-**Parameters:**
+### Parameters
-:::{note}
-Note: vector dimensionality must be consistent for each column.
-:::
+:::{note}
+Vector dimensionality must be consistent for each column.
+:::

Also applies to: 46-52, 121-123


55-64: Wording: there is no “nearest_neighbors predicate”; describe the distance operator

CrateDB uses the <-> distance operator with ORDER BY (and HNSW index). Adjust wording.

-Use the `nearest_neighbors` predicate to perform similarity search:
+Use the `<->` distance operator to perform similarity search:

3-7: Tone: reduce marketing superlatives; keep neutral, factual style

Trim phrases like “powerful engine”, “state-of-the-art”, “logarithmic” claims unless cited.

-CrateDB natively supports **vector embeddings** for efficient **similarity search** using **approximate nearest neighbor (ANN)** algorithms. This makes it a powerful engine for building AI-powered applications involving semantic search, recommendations, anomaly detection, and multimodal analytics—all in the simplicity of SQL.
+CrateDB supports **vector embeddings** for efficient **similarity search** using **approximate nearest neighbor (ANN)** algorithms. This enables applications such as semantic search, recommendations, anomaly detection, and multimodal analytics with SQL.
-* Vector search uses **HNSW**: state-of-the-art ANN algorithm with logarithmic search complexity.
+* Vector search uses **HNSW**, an ANN algorithm designed for fast approximate search.
-CrateDB gives you the power of **vector similarity search** with the **flexibility of SQL** and the **scalability of a distributed database**. It lets you unify structured, unstructured, and semantic data—enabling modern applications in AI, search, and recommendation without additional vector databases or pipelines.
+CrateDB provides **vector similarity search** with SQL and distributed scalability. It can unify structured, unstructured, and semantic data for applications in AI, search, and recommendation.

Also applies to: 115-121, 151-151

docs/modelling/primary-key.md (4)

48-63: UUIDv7: add example or cross-reference; tighten wording

Provide a short example (or link) for a UDF-based UUIDv7 generator to make this actionable. Also reduce wording per style lint.

Would you like me to add a minimal UDF example (e.g., Python or Java) that emits UUIDv7?

Example wording tweak:

-UUIDv7 is a new format that preserves **temporal ordering**, making them better suited for distributed inserts and range queries.
+UUIDv7 preserves **temporal ordering**, which benefits distributed inserts and range queries.
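
To make the temporal-ordering point concrete, here is a minimal Python sketch of the UUIDv7 bit layout as described in RFC 9562 (48-bit Unix millisecond timestamp, version nibble, variant bits, random remainder). The function name `uuid7` and its `unix_ms` and `rand` parameters are hypothetical and chosen for illustration only; this is not a CrateDB UDF and makes no claim about how the guide will generate identifiers.

```python
import os
import time


def uuid7(unix_ms=None, rand=None):
    """Build a UUIDv7-style string from a millisecond timestamp and 10 random bytes."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if rand is None:
        rand = os.urandom(10)  # 80 random bits; 6 of them are overwritten below
    # The timestamp occupies the 48 most significant bits of the 128-bit value.
    value = (unix_ms & 0xFFFFFFFFFFFF) << 80
    value |= int.from_bytes(rand, "big")
    # Version field (bits 76-79) is set to 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Variant field (bits 62-63) is set to binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    h = f"{value:032x}"
    return "-".join((h[:8], h[8:12], h[12:16], h[16:20], h[20:]))


older = uuid7(unix_ms=1_000)
newer = uuid7(unix_ms=2_000)
print(older < newer)  # True: string order follows timestamp order
```

Because the timestamp leads the value, identifiers created in different milliseconds sort in creation order when compared as strings, which is the property the range-query argument relies on.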

21-25: MD036: Use proper subheadings instead of bold “Pros/Cons”

Switch emphasized labels to subheadings for consistency and to satisfy markdownlint.

-**Pros**
+#### Pros
-**Cons**
+#### Cons

Also applies to: 26-30, 37-41, 42-47, 54-63, 156-165


71-71: Link the Debezium + Kafka reference

Add a link to the relevant guide to improve discoverability.

-> See Replicating data from other databases to CrateDB with Debezium and Kafka for an example.
+> See “Replicating data from other databases to CrateDB with Debezium and Kafka” for an example: https://cratedb.com/docs/guide/en/latest/integrate/connectors/debezium.html

80-84: Shard count rationale: call out single-shard trade-off

Good choice for sequence semantics; add a brief rationale about centralization vs throughput.

-) CLUSTERED INTO 1 SHARDS;
+) CLUSTERED INTO 1 SHARDS;  -- centralized counter; simplifies OCC at the cost of single-shard throughput
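
The trade-off named in that comment can be sketched in Python. This is an in-memory stand-in, not the recipe from primary-key.md: the `VersionedCounter` class, `compare_and_set`, and `next_id` are hypothetical names, and the version number only imitates the kind of row version an optimistic concurrency check would compare.

```python
class VersionedCounter:
    """In-memory stand-in for a single-row sequence table with a row version."""

    def __init__(self):
        self.value = 0
        self.version = 0

    def read(self):
        return self.value, self.version

    def compare_and_set(self, new_value, expected_version):
        # The write succeeds only if nobody else updated the row in between.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def next_id(counter, max_retries=5):
    """Read the counter, try to bump it, and retry when a concurrent writer won."""
    for _ in range(max_retries):
        value, version = counter.read()
        if counter.compare_and_set(value + 1, version):
            return value + 1
    raise RuntimeError("too much contention on the sequence row")


seq = VersionedCounter()
ids = [next_id(seq) for _ in range(3)]
print(ids)  # [1, 2, 3]
```

Every caller competes for the same row, so correctness is easy to reason about, but all increments are serialized through one place. That is the single-shard throughput cost mentioned above.
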
docs/modelling/geospatial.md (3)

16-17: Verify geo indexing details and configurables

“Indexed using geohash, quadtree, or BKD-tree, with configurable precision (e.g. 50m) and error threshold” may not reflect current, user-configurable options in CrateDB SQL. Please cross-check official docs and either link specifics or remove/soften.

I can adjust wording once the exact supported index internals/parameters are confirmed.


20-27: Avoid HTML in code blocks; use plain fenced SQL

The HTML markup (`<pre>`, `<code>`, `<strong>`) inside the SQL example can be replaced by a standard fenced block for readability and consistency.

-<pre class="language-sql"><code class="lang-sql"><strong>CREATE TABLE parcel_zones (
-</strong>    zone_id INTEGER PRIMARY KEY,
-<strong>    name VARCHAR,
-</strong>    area GEO_SHAPE,
-    centroid GEO_POINT
-)
-WITH (column_policy = 'dynamic');
-</code></pre>
+```sql
+CREATE TABLE parcel_zones (
+    zone_id INTEGER PRIMARY KEY,
+    name VARCHAR,
+    area GEO_SHAPE,
+    centroid GEO_POINT
+)
+WITH (column_policy = 'dynamic');
+```

97-97: Unclosed parenthesis

Close the parenthesis in this bullet.

-* CrateDB Blog: **Geospatial Queries with CrateDB** – outlines capabilities, limitations, and practical use cases (available since version 0.40
+* CrateDB Blog: **Geospatial Queries with CrateDB** – outlines capabilities, limitations, and practical use cases (available since version 0.40)
docs/modelling/fulltext.md (2)

67-68: MATCH weighted syntax: please verify and align with CrateDB docs

Weighted matching syntax can be picky. Cross-check the canonical form for boosting multiple full-text indices or columns, then update the example accordingly.

Once you confirm the supported MATCH syntax/variants, I can propose a precise example.


150-150: Remove stray trailing backslash

There’s a trailing “\” at the end of the paragraph.

-CrateDB combines powerful Lucene‑based full‑text search capabilities with SQL, making it easy to model and query textual data at scale. It supports fuzzy matching, multi-language analysis, composite indexing, and integrates fully with other data types for rich, multi-model queries. Whether you're building document search, catalog lookup, or content analytics—CrateDB offers a flexible and scalable foundation.\
+CrateDB combines powerful Lucene‑based full‑text search capabilities with SQL, making it easy to model and query textual data at scale. It supports fuzzy matching, multi-language analysis, composite indexing, and integrates fully with other data types for rich, multi-model queries. Whether you're building document search, catalog lookup, or content analytics—CrateDB offers a flexible and scalable foundation.
docs/modelling/relational.md (1)

41-41: Fix markdownlint MD036: use headings instead of bold emphasis

Convert emphasized “Example:” lines into proper headings for consistency and to satisfy linting.

-**Example: Join Customers and Orders**
+### Example: Join Customers and Orders
-**Example: Reusable View**
+### Example: Reusable View
-**Example: Correlated Subquery**
+### Example: Correlated Subquery

Also applies to: 118-118, 126-126

docs/modelling/timeseries.md (3)

15-23: Replace HTML-styled code with fenced SQL; remove `<strong>` tags

Use a plain fenced SQL block for readability and consistency.

-<pre class="language-sql"><code class="lang-sql">CREATE TABLE IF NOT EXISTS weather_data (
-    ts TIMESTAMP,
-    location VARCHAR,
-<strong>    temperature DOUBLE,
-</strong><strong>    humidity DOUBLE CHECK (humidity >= 0),
-</strong>    PRIMARY KEY (ts, location)
-)
-WITH (column_policy = 'dynamic');
-</code></pre>
+```sql
+CREATE TABLE IF NOT EXISTS weather_data (
+    ts TIMESTAMP,
+    location VARCHAR,
+    temperature DOUBLE,
+    humidity DOUBLE CHECK (humidity >= 0),
+    PRIMARY KEY (ts, location)
+)
+WITH (column_policy = 'dynamic');
+```

8-8: Normalize “cardinality” spelling

There’s a soft hyphen in “cardin­ality”. Replace with plain “cardinality”.

-* Handles **high cardin­ality** and **mixed data types**, including nested JSON, geospatial and vector data—all queryable via the same SQL statements.
+* Handles **high cardinality** and **mixed data types**, including nested JSON, geospatial and vector data—all queryable via the same SQL statements.

88-91: Verify vector feature limits

“Vectors (up to 2048 dims via HNSW indexing)” should be verified against the currently supported maximum dimensions and feature flags.

Confirm the current dimensional limit and I can update wording and examples accordingly.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af871ff and 371f259.

📒 Files selected for processing (9)
  • docs/modelling/fulltext.md (1 hunks)
  • docs/modelling/geospatial.md (1 hunks)
  • docs/modelling/index.md (1 hunks)
  • docs/modelling/json.md (1 hunks)
  • docs/modelling/primary-key.md (1 hunks)
  • docs/modelling/relational.md (1 hunks)
  • docs/modelling/timeseries.md (1 hunks)
  • docs/modelling/vector.md (1 hunks)
  • docs/start/index.md (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: amotl
PR: crate/cratedb-guide#204
File: docs/integrate/mcp/community.md:8-20
Timestamp: 2025-05-18T12:50:38.681Z
Learning: In the CrateDB guide repository, references with the `ctk:` prefix (like `ctk:query/mcp/landscape`) are intersphinx references that link to resources in the cratedb-toolkit repository (https://github.com/crate/cratedb-toolkit/tree/main/doc), which are rendered at https://cratedb-toolkit.readthedocs.io/. These are valid cross-references between separate Sphinx documentation sets, not local file references.
Learnt from: amotl
PR: crate/cratedb-guide#204
File: docs/integrate/mcp/community.md:6-8
Timestamp: 2025-05-18T13:25:11.786Z
Learning: In the CrateDB Guide documentation, particularly for MCP-related pages, the author prefers a 1-column grid layout for multiple cards to stack them vertically rather than side by side, as this is an intentional design choice.
Learnt from: amotl
PR: crate/cratedb-guide#204
File: docs/integrate/mcp/community.md:22-33
Timestamp: 2025-05-18T12:50:36.393Z
Learning: In the CrateDB Guide repository, the prefix `ctk:` in documentation links (like `ctk:query/mcp/server`) is an intersphinx reference that points to external content in the CrateDB Toolkit documentation at https://cratedb-toolkit.readthedocs.io/. These references are intentionally not pointing to local files within the repository.
📚 Learning: 2025-05-18T12:50:38.681Z
Learnt from: amotl
PR: crate/cratedb-guide#204
File: docs/integrate/mcp/community.md:8-20
Timestamp: 2025-05-18T12:50:38.681Z
Learning: In the CrateDB guide repository, references with the `ctk:` prefix (like `ctk:query/mcp/landscape`) are intersphinx references that link to resources in the cratedb-toolkit repository (https://github.com/crate/cratedb-toolkit/tree/main/doc), which are rendered at https://cratedb-toolkit.readthedocs.io/. These are valid cross-references between separate Sphinx documentation sets, not local file references.

Applied to files:

  • docs/modelling/index.md
  • docs/modelling/relational.md
  • docs/modelling/fulltext.md
  • docs/modelling/geospatial.md
📚 Learning: 2025-05-18T13:25:11.786Z
Learnt from: amotl
PR: crate/cratedb-guide#204
File: docs/integrate/mcp/community.md:6-8
Timestamp: 2025-05-18T13:25:11.786Z
Learning: In the CrateDB Guide documentation, particularly for MCP-related pages, the author prefers a 1-column grid layout for multiple cards to stack them vertically rather than side by side, as this is an intentional design choice.

Applied to files:

  • docs/modelling/index.md
  • docs/modelling/json.md
  • docs/modelling/relational.md
  • docs/modelling/timeseries.md
📚 Learning: 2025-05-18T12:50:36.393Z
Learnt from: amotl
PR: crate/cratedb-guide#204
File: docs/integrate/mcp/community.md:22-33
Timestamp: 2025-05-18T12:50:36.393Z
Learning: In the CrateDB Guide repository, the prefix `ctk:` in documentation links (like `ctk:query/mcp/server`) is an intersphinx reference that points to external content in the CrateDB Toolkit documentation at https://cratedb-toolkit.readthedocs.io/. These references are intentionally not pointing to local files within the repository.

Applied to files:

  • docs/modelling/index.md
  • docs/modelling/fulltext.md
📚 Learning: 2025-04-15T22:20:30.716Z
Learnt from: amotl
PR: crate/cratedb-guide#0
File: :0-0
Timestamp: 2025-04-15T22:20:30.716Z
Learning: UUIDv7 (UUID version 7) supports efficient range queries because it generates time-ordered sequential values, unlike earlier random UUID versions. This makes UUIDv7 particularly useful for primary keys in distributed database systems like CrateDB where both uniqueness and range query performance are important.

Applied to files:

  • docs/modelling/primary-key.md
🪛 markdownlint-cli2 (0.17.2)
docs/modelling/relational.md

41-41: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


118-118: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


126-126: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/modelling/vector.md

16-16: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/modelling/primary-key.md

21-21: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


26-26: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


37-37: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


42-42: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


54-54: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


59-59: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


77-77: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


93-93: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


102-102: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


156-156: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


161-161: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 LanguageTool
docs/modelling/primary-key.md

[style] ~50-~50: Consider using a shorter alternative to avoid wordiness.
Context: ...t that preserves temporal ordering, making them better suited for distributed inserts and rang...

(MADE_IT_JJR)

🔇 Additional comments (4)
docs/modelling/index.md (1)

4-13: Verified: All modelling toctree pages are present

I’ve confirmed that every page referenced in the docs/modelling toctree exists in the repo, so there will be no broken links. This verification also covers lines 17–21.

docs/start/index.md (1)

10-10: Confirm Sphinx sidebar entry

The reference to “../modelling/index” in docs/start/index.md (line 10) correctly resolves to docs/modelling/index.md (verified). Please run your Sphinx docs build (e.g. make html in the docs/ folder) to ensure the new navigation entry appears in the sidebar.

docs/modelling/vector.md (1)

48-52: Similarity metric identifiers: manual confirmation required

Please verify that the implementation accepts exactly the metric names listed in the docs. In particular, confirm whether the HNSW-index parser recognizes:

  • cosine
  • l2
  • dot_product or inner_product (which one does the code expect?)

Check the code path where the similarity parameter is read (e.g., HNSW index creation logic) and align the docs in docs/modelling/vector.md accordingly to prevent mismatches.

docs/modelling/geospatial.md (1)

41-42: Validate area() units

“area returns approximate area in square degrees” is likely inaccurate for many systems; confirm actual units returned by CrateDB’s area(geo_shape).

If square meters are returned, update accordingly.

Comment on lines 21 to 22
INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english');
```

🛠️ Refactor suggestion

Composite full-text index: ensure valid SQL syntax

As written, “INDEX ft_all …” is ambiguous out of context. Show it either within CREATE TABLE or as a standalone CREATE INDEX statement.

Option A (inside CREATE TABLE):

-INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english');
+INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english')

Option B (standalone statement):

-INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english');
+CREATE INDEX ft_all USING FULLTEXT ON documents (title, body) WITH (analyzer = 'english');

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In docs/modelling/fulltext.md around lines 21 to 22, the example for creating a
composite full-text index uses ambiguous syntax "INDEX ft_all USING
FULLTEXT(title, body) WITH (analyzer = 'english');" without context. Fix this by
showing the index creation either inside a CREATE TABLE statement or as a
standalone CREATE INDEX statement, making the SQL syntax valid and clear.

Comment on lines 51 to 57
```sql
SELECT asset_id, DISTANCE(center_point, asset_location) AS dist
FROM assets
WHERE center_point = 'POINT(-1.234 51.050)'::GEO_POINT
ORDER BY dist
LIMIT 10;
```

🛠️ Refactor suggestion

Fix “nearby assets” query: avoid equality filter on a point column

Filtering with `column = <constant point>` matches at most the rows whose stored point equals the constant exactly, and it isn't needed here. Pass the constant point (or a parameter) directly to distance().

-SELECT asset_id, DISTANCE(center_point, asset_location) AS dist
-FROM assets
-WHERE center_point = 'POINT(-1.234 51.050)'::GEO_POINT
-ORDER BY dist
+SELECT asset_id,
+       distance('POINT(-1.234 51.050)'::geo_point, asset_location) AS dist
+FROM assets
+ORDER BY dist
 LIMIT 10;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

🤖 Prompt for AI Agents
In docs/modelling/geospatial.md around lines 51 to 57, the SQL query incorrectly
uses an equality filter on the point column center_point, which is ineffective
for spatial queries. Remove the WHERE clause that filters center_point by
equality and instead use the constant point directly inside the DISTANCE
function to calculate distances. This will correctly find nearby assets by
ordering them by distance without filtering out relevant rows.
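
As a rough illustration of what the corrected query computes, the following Python sketch ranks assets by great-circle distance from a fixed point. It assumes a haversine distance on a spherical Earth, which may differ from what CrateDB's `distance()` does internally; the helper names `haversine_m` and `nearest` and the sample assets are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (an approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest(assets, origin, limit=10):
    """Order every asset by distance to origin; no equality filter is involved."""
    lon0, lat0 = origin
    ranked = sorted(assets, key=lambda a: haversine_m(lon0, lat0, a[1], a[2]))
    return [a[0] for a in ranked[:limit]]


# (asset_id, longitude, latitude), matching the lon/lat order of WKT POINT
assets = [("far", 2.35, 48.86), ("near", -1.23, 51.05), ("mid", -0.12, 51.50)]
print(nearest(assets, (-1.234, 51.050), limit=2))  # ['near', 'mid']
```

The reference point is an input to the distance computation, not a condition on the rows, which is why the `WHERE` clause in the original example was unnecessary.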

Comment on lines 61 to 66
```sql
SELECT area_id, count(*) AS incident_count
FROM incidents
WHERE within(incidents.location, service_areas.area)
GROUP BY area_id;
```

⚠️ Potential issue

Fix invalid cross-table reference: add a join

The query references service_areas without joining it. Add an explicit join and group by the area identifier.

-SELECT area_id, count(*) AS incident_count
-FROM incidents
-WHERE within(incidents.location, service_areas.area)
-GROUP BY area_id;
+SELECT sa.area_id, count(*) AS incident_count
+FROM incidents i
+JOIN service_areas sa
+  ON within(i.location, sa.area)
+GROUP BY sa.area_id;
🤖 Prompt for AI Agents
In docs/modelling/geospatial.md around lines 61 to 66, the SQL query references
the service_areas table without joining it, causing an invalid cross-table
reference. Fix this by adding an explicit JOIN between incidents and
service_areas on the appropriate spatial condition, and ensure the GROUP BY
clause uses the correct area identifier from the joined table.
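
The reason the join is required can be seen in a small Python sketch: each incident has to be paired with each service area before the containment test can run, and only then can matches be counted per area. The ray-casting helper `point_in_polygon`, the function `incidents_per_area`, and the sample data are hypothetical and use planar coordinates; they do not reflect how CrateDB evaluates `within()`.

```python
from collections import Counter


def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices. Boundary cases are not handled."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def incidents_per_area(incidents, service_areas):
    """Pair every incident with every area (the join), keep matches, count per area."""
    counts = Counter()
    for location in incidents:
        for area_id, polygon in service_areas.items():
            if point_in_polygon(location, polygon):
                counts[area_id] += 1
    return dict(counts)


service_areas = {
    "north": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "south": [(0, -10), (10, -10), (10, 0), (0, 0)],
}
incidents = [(1, 1), (5, 5), (5, -5), (20, 20)]
print(incidents_per_area(incidents, service_areas))  # {'north': 2, 'south': 1}
```

The incident at (20, 20) falls in no area and is dropped, which matches the inner-join semantics of the suggested query.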

@@ -0,0 +1,178 @@
# Relational data

CrateDB is a **distributed SQL database** that offers full **relational data modeling** with the flexibility of dynamic schemas and the scalability of NoSQL systems. It supports **primary and foreign keys**, **joins**, **aggregations**, and **subqueries**, just like traditional RDBMS systems—while also enabling hybrid use cases with time-series, geospatial, full-text, vector, and semi-structured data.

🛠️ Refactor suggestion

Reword foreign key claim to avoid contradiction

You state “supports primary and foreign keys,” but later note FKs aren’t enforced at write time. Clarify upfront to avoid confusion.

-CrateDB is a **distributed SQL database** that offers full **relational data modeling** with the flexibility of dynamic schemas and the scalability of NoSQL systems. It supports **primary and foreign keys**, **joins**, **aggregations**, and **subqueries**, just like traditional RDBMS systems—while also enabling hybrid use cases with time-series, geospatial, full-text, vector, and semi-structured data.
+CrateDB is a **distributed SQL database** that offers full **relational data modeling** with the flexibility of dynamic schemas and the scalability of NoSQL systems. It supports **primary keys**, **joins**, **aggregations**, and **subqueries** like traditional RDBMS systems—while also enabling hybrid use cases with time-series, geospatial, full-text, vector, and semi-structured data. Note: Foreign key relationships can be modeled but are not enforced at write time.
🤖 Prompt for AI Agents
In docs/modelling/relational.md at line 3, reword the statement about foreign
keys to clarify that while CrateDB supports primary keys and foreign key
concepts, foreign key constraints are not enforced at write time. Adjust the
sentence to explicitly mention this limitation upfront to prevent confusion
about foreign key enforcement.

Comment on lines 83 to 87
```sql
SELECT customer_id, COUNT(*) AS num_orders, SUM(total_amount) AS revenue
FROM orders
GROUP BY customer_id
HAVING revenue > 1000;
```

🛠️ Refactor suggestion

HAVING should not rely on a SELECT alias

Use the aggregate expression in HAVING for SQL portability and to match CrateDB behavior.

 SELECT customer_id, COUNT(*) AS num_orders, SUM(total_amount) AS revenue
 FROM orders
 GROUP BY customer_id
-HAVING revenue > 1000;
+HAVING SUM(total_amount) > 1000;
🤖 Prompt for AI Agents
In docs/modelling/relational.md around lines 83 to 87, the HAVING clause uses
the alias 'revenue' which can cause issues in some SQL databases like CrateDB.
Replace the alias 'revenue' in the HAVING clause with the full aggregate
expression SUM(total_amount) to ensure SQL portability and correct behavior.

Comment on lines 48 to 60
```postgresql
WITH avg_metrics AS (
  SELECT device_id,
         DATE_BIN('1 hour', time, 0) AS period,
         AVG(battery_level) AS avg_battery
  FROM devices.readings
  GROUP BY device_id, period
)
SELECT period, t.device_id, i.manufacturer, avg_battery
FROM avg_metrics t
JOIN devices.info i USING (device_id)
WHERE i.model = 'mustang';
```

🛠️ Refactor suggestion

Standardize code fence language and DATE_BIN origin

  • Prefer sql as the fence language for consistency.
  • DATE_BIN origin should be a timestamp; avoid 0.
-```postgresql
+```sql
 WITH avg_metrics AS (
   SELECT device_id,
-         DATE_BIN('1 hour', time, 0) AS period,
+         DATE_BIN('1 hour', time, TIMESTAMP '1970-01-01 00:00:00') AS period,
          AVG(battery_level) AS avg_battery
   FROM devices.readings
   GROUP BY device_id, period
 )
 SELECT period, t.device_id, i.manufacturer, avg_battery
 FROM avg_metrics t
 JOIN devices.info i USING (device_id)
 WHERE i.model = 'mustang';

<details>
<summary>🤖 Prompt for AI Agents</summary>

In docs/modelling/timeseries.md around lines 48 to 60, update the code fence
language from postgresql to sql for consistency, and replace the DATE_BIN
function's origin argument from 0 to a proper timestamp literal like TIMESTAMP
'1970-01-01 00:00:00' to ensure correct usage.


</details>
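
To illustrate why an explicit timestamp origin is the clearer choice, here is a minimal Python sketch of origin-aligned binning. The `date_bin` helper is hypothetical and only mimics the idea behind `DATE_BIN(interval, timestamp, origin)`; it is not CrateDB's implementation and assumes bucket boundaries are aligned to the origin.

```python
from datetime import datetime, timedelta, timezone

# Assumed default origin for this sketch: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_bin(interval: timedelta, ts: datetime, origin: datetime = EPOCH) -> datetime:
    """Return the start of the interval-wide bucket containing ts, aligned to origin."""
    # Floor division of two timedeltas yields the number of whole buckets since origin.
    buckets = (ts - origin) // interval
    return origin + buckets * interval


reading_time = datetime(2025, 8, 7, 14, 37, 12, tzinfo=timezone.utc)
period = date_bin(timedelta(hours=1), reading_time)
print(period.isoformat())  # → 2025-08-07T14:00:00+00:00
```

Shifting the origin shifts every bucket boundary by the same offset, which is why spelling the origin out as a timestamp makes the query's intent easier to read than a bare `0`.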


```text
WITH all_hours AS (
SELECT generate_series(ts_start, ts_end, ‘30 second’) AS expected_time
```

🛠️ Refactor suggestion

Fix quotes and interval literal in generate_series example

Use straight quotes and a valid interval literal.

-  SELECT generate_series(ts_start, ts_end, ‘30 second’) AS expected_time
+  SELECT generate_series(ts_start, ts_end, '30 seconds') AS expected_time
🤖 Prompt for AI Agents
In docs/modelling/timeseries.md at line 66, the generate_series function uses
incorrect quote characters and an invalid interval literal. Replace the curly
quotes around '30 second' with straight single quotes and ensure the interval is
specified correctly as an interval literal, for example '30 seconds'::interval.
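
For context, the idea behind the `generate_series` query can be sketched in Python: build the timestamps expected at a fixed step, then report the ones with no matching reading. The helper names `expected_times` and `find_gaps` are hypothetical, and the sketch assumes both ends of the series are inclusive.

```python
from datetime import datetime, timedelta


def expected_times(start, end, step):
    """Yield timestamps from start to end (inclusive) at a fixed step."""
    current = start
    while current <= end:
        yield current
        current += step


def find_gaps(observed, start, end, step):
    """Return the expected timestamps that have no observed reading."""
    seen = set(observed)
    return [t for t in expected_times(start, end, step) if t not in seen]


start = datetime(2025, 8, 7, 12, 0, 0)
end = start + timedelta(minutes=2)
step = timedelta(seconds=30)

# Readings arrived at 12:00:00, 12:00:30 and 12:02:00; two slots are missing.
observed = [start, start + timedelta(seconds=30), start + timedelta(seconds=120)]
gaps = find_gaps(observed, start, end, step)
print([t.time().isoformat() for t in gaps])  # → ['12:01:00', '12:01:30']
```

In SQL the same comparison is typically expressed as a left join from the generated series to the readings table, keeping the rows where the reading is missing.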

Comment on lines 19 to 25
```sql
CREATE TABLE documents (
  id UUID PRIMARY KEY,
  title TEXT,
  content TEXT,
  embedding VECTOR(FLOAT[768])
);
```

⚠️ Potential issue

Ensure MATCH usage aligns with schema; add FULLTEXT index on content or adjust query

MATCH requires a FULLTEXT index on the target column. Either index content, or remove MATCH.

Option A: Index content and keep MATCH.

 CREATE TABLE documents (
   id UUID PRIMARY KEY,
   title TEXT,
-  content TEXT,
+  content TEXT INDEX USING FULLTEXT,
   embedding VECTOR(FLOAT[768])
 );
-WHERE MATCH(content_ft, 'machine learning') AND author = 'Alice'
+WHERE MATCH(content, 'machine learning') AND author = 'Alice'

Option B: Remove MATCH and filter with plain predicates (no index change needed).

Also applies to: 71-76

🤖 Prompt for AI Agents
In docs/modelling/vector.md around lines 19 to 25 and also lines 71 to 76, the
use of MATCH in queries requires a FULLTEXT index on the content column to
function correctly. To fix this, either add a FULLTEXT index on the content
column in the documents table schema to support MATCH, or remove the MATCH
clause from the queries and replace it with standard WHERE predicates that do
not require indexing changes.

@amotl marked this pull request as ready for review August 8, 2025 00:28
@amotl added the labels help wanted (Extra attention is needed), cross linking (Linking to different locations of the documentation.), new content (New content being added.), guidance (Matters of layout, shape, and structure.), and question (Further information is requested) Aug 8, 2025
@amotl changed the title from "Data modelling: Add new section [GenAI]" to "Data modelling: Add new section [GenAI variant]" Aug 8, 2025
@amotl changed the title from "Data modelling: Add new section [GenAI variant]" to "Data modelling: Add new section (GenAI variant)" Aug 8, 2025
@amotl force-pushed the getting-started branch from af871ff to 7257275 August 8, 2025 21:09
Base automatically changed from getting-started to ingest August 8, 2025 21:09
@amotl changed the base branch from ingest to getting-started August 8, 2025 21:17
@amotl force-pushed the data-modelling branch 2 times, most recently from 9a01f78 to 5387ff5 August 8, 2025 21:28
@amotl force-pushed the getting-started branch 5 times, most recently from 4b40a92 to 80aca5b August 12, 2025 16:19
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

🧹 Nitpick comments (20)
docs/start/modelling/geospatial.md (4)

20-30: Fix typo and clarify example intent.

"country boarders" → "country borders"; also clarify that the polygon is the country's border, not Vienna.

-Let's define a table with country boarders and capital:
+Let's define a table with country borders and capital:

26-26: Verify geo_shape index syntax and current index backends.

The example uses INDEX USING "geohash" WITH (precision='100m'). Please confirm:

  • Whether quotes around geohash are allowed/required.
  • If geohash/quadtree are still recommended versus BKD in the current CrateDB version.

If BKD is the default/preferred, consider simplifying to the default or explicitly documenting trade‑offs.


37-38: Correct description of inserted geometry.

The text says “coordinate shape of Vienna/Austria” but the geometry is the country polygon (Austria) and the point is Vienna.

-We can populate the table with the coordinate shape of Vienna/Austria:
+We can populate the table with Austria's border polygon and its capital:

112-115: Tighten resource section and fix punctuation.

  • Add links to the referenced docs/articles.
  • Close the unmatched parenthesis in line 115.
-* Official **Geospatial Search Guide** in CrateDB docs, detailing geospatial types, indexing, and MATCH predicate usage.
+* Official **Geospatial Search Guide** in CrateDB docs, detailing geospatial types and indexing.
@@
-* CrateDB Blog: **Geospatial Queries with CrateDB** – outlines capabilities, limitations, and practical use cases (available since version 0.40
+* CrateDB Blog: **Geospatial Queries with CrateDB** – outlines capabilities, limitations, and practical use cases (available since version 0.40).
docs/start/modelling/primary-key.md (5)

48-54: Make section numbering consistent (add “3.”).

Current headings are 1, 2, then an unnumbered “Use UUIDv7...”.

-### Use UUIDv7 for Time-Ordered IDs
+### 3. Use UUIDv7 for Time-Ordered IDs

54-63: Promote Pros/Cons to headings to satisfy markdownlint (MD036).

Replace bold “Pros/Cons” with proper sub‑headings. Apply similarly to other sections.

-**Pros**
+#### Pros
@@
-**Cons**
+#### Cons

77-85: Shard‑count rationale for sequence table.

CLUSTERED INTO 1 SHARDS keeps all sequence updates on a single shard, which avoids spreading contention across shards but can turn that shard into a hotspot. Add a short note about throughput trade‑offs and possible sharding by sequence name if multiple sequences are needed.


102-154: OCC sequence generator looks solid; add retry jitter and failure metrics note.

Consider adding a small random jitter to reduce thundering herd and recommend emitting a metric when retries exceed a threshold.

- delay = base_delay * (2**attempt)
+ import random
+ delay = base_delay * (2**attempt) * (1 + random.random() * 0.2)
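
As a standalone illustration of the suggested change, the jittered delay can be sketched as a small helper. The function name `backoff_delay` and its default values are assumptions made for this example; only the formula comes from the diff above.

```python
import random


def backoff_delay(attempt: int, base_delay: float = 0.1, jitter: float = 0.2) -> float:
    """Exponential backoff delay with up to `jitter` (20% by default) of random extra wait."""
    # random.random() is in [0, 1), so the multiplier lies in [1, 1 + jitter).
    return base_delay * (2 ** attempt) * (1 + random.random() * jitter)


# Delays for five consecutive retries; each roughly doubles, plus a random extra.
delays = [backoff_delay(attempt) for attempt in range(5)]
for attempt, delay in enumerate(delays):
    print(f"attempt {attempt}: sleep {delay:.3f}s")
```

Because each client draws its own random factor, retries that failed at the same moment are less likely to collide again on the next attempt.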

48-63: Surface prior guidance on UUIDv7 benefits.

Per our earlier learnings, UUIDv7’s time ordering improves range query performance and ingestion locality. Consider adding one sentence to that effect.

Proposed addition after the first paragraph:

+ Because UUIDv7 values are time-ordered, they improve range query performance and can reduce shard scattering during ingestion compared to random UUIDs.
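
To make the time-ordering claim concrete, here is a hand-rolled Python sketch that follows the UUIDv7 field layout from RFC 9562 (48-bit Unix millisecond timestamp, version, random bits, variant, more random bits). The function `uuid7_sketch` is hypothetical and meant for illustration only; it is not a CrateDB or library API.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None) -> uuid.UUID:
    """Build a UUIDv7-style value: timestamp in the high bits, randomness below."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                 # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)     # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= rand_a << 64                      # bits 75..64: rand_a
    value |= 0b10 << 62                        # bits 63..62: RFC variant
    value |= rand_b                            # bits 61..0: rand_b
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier < later)            # → True
print(str(earlier) < str(later))  # → True
```

Since the timestamp occupies the most significant bits, both the numeric and the textual form sort by creation time at millisecond granularity, which is the property the proposed sentence relies on.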
docs/start/modelling/vector.md (1)

96-99: Minor wording/style and capitalization.

Start the note with a capital and tighten phrasing.

-:::{note}
-vector dimensionality must be consistent for each column.
+:::{note}
+Vector dimensionality must be consistent for each column.
 :::
docs/start/modelling/relational.md (1)

24-25: Primary key recommendation nuance.

Recommending now() for primary keys can be misleading for general relational workloads; prefer UUIDs (or UUIDv7) to avoid collisions and hotspots unless time‑based IDs are deliberate. Consider rephrasing to prioritize UUIDs.

docs/start/modelling/timeseries.md (3)

77-100: Polish gap‑detection example and fix bracket spacing.

Remove stray spaces in subscripts and consider adding a COALESCE to make gaps explicit.

-    battery ['level']
+    battery['level']
@@
-  r.battery ['level']
+  COALESCE(r.battery['level'], NULL) AS level

109-111: Spacing and readability in OBJECT policy text.

Insert missing space and split long line for readability.

-With `column_policy = 'dynamic'`, ingest JSON payloads containing extra attributes—new columns are auto‑created and indexed. Perfect for capturing evolving sensor metadata. For column-level control, use `OBJECT(DYNAMIC)` to auto-create (and, by default, index) subcolumns, or `OBJECT(IGNORED)`to accept unknown keys without creating or indexing subcolumns.   &#x20;
+With `column_policy = 'dynamic'`, ingest JSON payloads containing extra attributes—new columns are auto‑created and indexed. Perfect for capturing evolving sensor metadata.
+For column-level control, use `OBJECT(DYNAMIC)` to auto-create (and, by default, index) subcolumns, or `OBJECT(IGNORED)` to accept unknown keys without creating or indexing subcolumns.

121-124: Clarify retention statement.

“Supports long‑term retention with performant historic storage.” Consider adding a brief tip about partition pruning and closing partitions to set expectations.

docs/start/modelling/json.md (2)

81-84: Consistency: Use American English spelling “behavior” (matches earlier table).

Earlier sections use “Behavior”; keep it consistent here.

Apply this diff:

-... It controls the behaviour when querying unknown object keys ...
+... It controls the behavior when querying unknown object keys ...

216-222: Upgrade “Further Learning” section with actual links or cross‑refs.

The bullets are placeholders without links. Consider linking to the official docs pages (Object columns, Column policies, JSON querying, information_schema) and/or local cross‑refs to keep readers inside the guide.

I can add concrete cross‑references if you confirm preferred targets (internal vs external).

docs/start/modelling/index.md (1)

15-16: Grammar: Missing “to” in sentence.

Add “to” for correctness.

Apply this diff:

-Because CrateDB is a distributed OLAP database designed store large volumes
+Because CrateDB is a distributed OLAP database designed to store large volumes
docs/start/modelling/fulltext.md (3)

124-126: Grammar: Hyphenate “high‑volume”.

Minor polish.

Apply this diff:

-* **Scale horizontally across clusters**, while maintaining fast indexing and search even on high volume datasets.
+* **Scale horizontally across clusters**, while maintaining fast indexing and search even on high‑volume datasets.

139-140: Clarify best‑practice wording; avoid “phrase_prefix” (Elasticsearch‑specific term).

Use vendor‑neutral phrasing aligned with CrateDB’s MATCH options.

Apply this diff:

-| Analyze Limitations | Understand phrase_prefix caveats at scale; tune analyzer/tokenizer appropriately. |
+| Analyze Limitations | Understand phrase and prefix query trade‑offs at scale; tune analyzer/tokenizer appropriately. |

148-151: Markdown lint: Remove emphasis inside heading.

Conforms to MD036.

Apply this diff:

-## **Summary**
+## Summary
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5381e5a and 2c8e65d.

📒 Files selected for processing (9)
  • docs/start/index.md (1 hunks)
  • docs/start/modelling/fulltext.md (1 hunks)
  • docs/start/modelling/geospatial.md (1 hunks)
  • docs/start/modelling/index.md (1 hunks)
  • docs/start/modelling/json.md (1 hunks)
  • docs/start/modelling/primary-key.md (1 hunks)
  • docs/start/modelling/relational.md (1 hunks)
  • docs/start/modelling/timeseries.md (1 hunks)
  • docs/start/modelling/vector.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/start/index.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-15T22:20:30.716Z
Learnt from: amotl
PR: crate/cratedb-guide#0
File: :0-0
Timestamp: 2025-04-15T22:20:30.716Z
Learning: UUIDv7 (UUID version 7) supports efficient range queries because it generates time-ordered sequential values, unlike earlier random UUID versions. This makes UUIDv7 particularly useful for primary keys in distributed database systems like CrateDB where both uniqueness and range query performance are important.

Applied to files:

  • docs/start/modelling/primary-key.md
🪛 LanguageTool
docs/start/modelling/primary-key.md

[grammar] ~23-~23: There might be a mistake here.
Context: ... Auto-generated, always-increasing value * Useful when records are timestamped anyw...

(QB_NEW_EN)


[grammar] ~44-~44: There might be a mistake here.
Context: ...ents or sources Cons * Not ordered * Harder to read/debug * No efficient rang...

(QB_NEW_EN)


[grammar] ~45-~45: There might be a mistake here.
Context: ...** * Not ordered * Harder to read/debug * No efficient range queries ### Use UUID...

(QB_NEW_EN)


[style] ~50-~50: Consider using a shorter alternative to avoid wordiness.
Context: ...t that preserves temporal ordering, making them better suited for distributed inserts and rang...

(MADE_IT_JJR)

docs/start/modelling/json.md

[grammar] ~39-~39: There might be a mistake here.
Context: ... | | ------------- | ----------------------...

(QB_NEW_EN)


[grammar] ~40-~40: There might be a mistake here.
Context: ...-------------------------------------- | | DYNAMIC | New fields are automat...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ...tically added to the schema at runtime | | STRICT | Only explicitly define...

(QB_NEW_EN)


[grammar] ~42-~42: There might be a mistake here.
Context: ...ed fields are allowed | | IGNORED | Extra fields are store...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ...vice']['os'] = 'Android'; ``` :::{note} Dot-notation works for both explicitly a...

(QB_NEW_EN)


[grammar] ~76-~76: There might be a mistake here.
Context: ...explicitly and dynamically added fields. ::: ## Querying DYNAMIC OBJECTs To sup...

(QB_NEW_EN)


[grammar] ~198-~198: There might be a mistake here.
Context: ...ption | | ------------------ | -----------------...

(QB_NEW_EN)


[grammar] ~199-~199: There might be a mistake here.
Context: ...-------------------------------------- | | Logs & Traces | Unstructured payl...

(QB_NEW_EN)


[grammar] ~200-~200: There might be a mistake here.
Context: ...ctured payloads with flexible metadata | | Sensor & IoT Data | Variable field sc...

(QB_NEW_EN)


[grammar] ~201-~201: There might be a mistake here.
Context: ...le field schemas, nested measurements | | Product Catalogs | Specs, tags, revi...

(QB_NEW_EN)


[grammar] ~202-~202: There might be a mistake here.
Context: ... tags, reviews in varying formats | | User Profiles | Custom settings, ...

(QB_NEW_EN)


[grammar] ~203-~203: There might be a mistake here.
Context: ... settings, device info, preferences | | Telemetry / Events | Event streams wit...

(QB_NEW_EN)


[grammar] ~208-~208: There might be a mistake here.
Context: ... | | ---------------- | -------------------...

(QB_NEW_EN)


[grammar] ~209-~209: There might be a mistake here.
Context: ...-------------------------------------- | | Schema Evolution | Use DYNAMIC for f...

(QB_NEW_EN)


[grammar] ~210-~210: There might be a mistake here.
Context: ... STRICT for control | | Index Management | Avoid over-indexing...

(QB_NEW_EN)


[grammar] ~211-~211: There might be a mistake here.
Context: ...d fields | | Nested Depth | Prefer flat structu...

(QB_NEW_EN)


[grammar] ~212-~212: There might be a mistake here.
Context: ...low nesting for performance | | Column Mixing | Combine structured ...

(QB_NEW_EN)


[grammar] ~213-~213: There might be a mistake here.
Context: ...h JSON for hybrid models | | Observability | Monitor number of d...

(QB_NEW_EN)


[grammar] ~218-~218: There might be a mistake here.
Context: ...sources * CrateDB Docs – Object Columns * Working with JSON in CrateDB * CrateDB A...

(QB_NEW_EN)


[grammar] ~219-~219: There might be a mistake here.
Context: ...t Columns * Working with JSON in CrateDB * CrateDB Academy – Modeling with JSON * U...

(QB_NEW_EN)


[grammar] ~220-~220: There might be a mistake here.
Context: ...B * CrateDB Academy – Modeling with JSON * Understanding Column Policies ## Summar...

(QB_NEW_EN)

docs/start/modelling/vector.md

[grammar] ~12-~12: There might be a mistake here.
Context: ...arrays (e.g. 768, 1024, 2048 dimensions) * Supports **HNSW (Hierarchical Navigable ...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...)** indexing for fast approximate search * Optimized for cosine, Euclidean, and dot...

(QB_NEW_EN)


[grammar] ~27-~27: There might be a mistake here.
Context: ...])` declares a fixed-size vector column. * You can ingest vectors directly or compu...

(QB_NEW_EN)


[grammar] ~72-~72: There might be a mistake here.
Context: ... | | ----------------------- | ------------...

(QB_NEW_EN)


[grammar] ~73-~73: There might be a mistake here.
Context: ...-------------------------------------- | | Semantic Search | Rank documen...

(QB_NEW_EN)


[grammar] ~74-~74: There might be a mistake here.
Context: ...stead of keywords | | Recommendation Systems | Find similar...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ..., or behaviors | | Image / Audio Retrieval | Store and co...

(QB_NEW_EN)


[grammar] ~76-~76: There might be a mistake here.
Context: ... of images/audio | | Fraud Detection | Match behavi...

(QB_NEW_EN)


[grammar] ~77-~77: There might be a mistake here.
Context: ...a vectors | | Hybrid Search | Combine vect...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...& Scaling * Vector search uses HNSW: state-of-the-art ANN algorithm with log...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...ithm with logarithmic search complexity. * CrateDB parallelizes ANN search across s...

(QB_NEW_EN)


[grammar] ~97-~97: There might be a mistake here.
Context: ...lity must be consistent for each column. ::: ## Best Practices | Area ...

(QB_NEW_EN)


[grammar] ~113-~113: There might be a mistake here.
Context: ...LangChain**: CrateDB has native drivers and REST interface * Embedding models: ...

(QB_NEW_EN)


[grammar] ~119-~119: There might be a mistake here.
Context: ...esources * CrateDB Docs – Vector Search * Blog: Using CrateDB for Hybrid Search (V...

(QB_NEW_EN)


[grammar] ~120-~120: There might be a mistake here.
Context: ...B for Hybrid Search (Vector + Full-Text) * CrateDB Academy – Vector Data * [Sample ...

(QB_NEW_EN)


[grammar] ~121-~121: There might be a mistake here.
Context: ...ll-Text) * CrateDB Academy – Vector Data * [Sample notebooks on GitHub](https://gith...

(QB_NEW_EN)

docs/start/modelling/fulltext.md

[grammar] ~125-~125: There might be a mistake here.
Context: ...and search even on high volume datasets. * Integrated SQL interface: eliminates n...

(QB_NEW_EN)


[grammar] ~133-~133: There might be a mistake here.
Context: ...cient. | | Language Support | Pick built-in analyzer matching your content...

(QB_NEW_EN)


[grammar] ~143-~143: There might be a mistake here.
Context: ... index creation, analyzers, MATCH usage. * FTS Options & Advanced Features: fuzzi...

(QB_NEW_EN)


[grammar] ~145-~145: There might be a mistake here.
Context: ...l datasets (e.g. Chicago neighborhoods). * CrateDB Community Insights: real‑world...

(QB_NEW_EN)

docs/start/modelling/geospatial.md

[grammar] ~10-~10: There might be a mistake here.
Context: ... either a coordinate array [lon, lat] or Well-Known Text (WKT) string `'POINT (l...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...n Text (WKT) string 'POINT (lon lat)'. * Must be declared explicitly; dynamic sch...

(QB_NEW_EN)


[grammar] ~20-~20: Ensure spelling is correct
Context: ...mple Let's define a table with country boarders and capital: ```sql CREATE TABLE count...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~102-~102: There might be a mistake here.
Context: ...acking, mapping, location-layered apps). * Unified SQL platform: spatial data can...

(QB_NEW_EN)


[grammar] ~103-~103: There might be a mistake here.
Context: ...ime-series — in the same table or query. * High ingest and query throughput, suit...

(QB_NEW_EN)

docs/start/modelling/index.md

[grammar] ~15-~15: There might be a mistake here.
Context: ...AP database designed store large volumes of data, it needs a few special consider...

(QB_NEW_EN)

docs/start/modelling/relational.md

[grammar] ~49-~49: There might be a mistake here.
Context: ...arity: use normalized tables with joins. * For performance in high-ingest or read-o...

(QB_NEW_EN)


[grammar] ~68-~68: There might be a mistake here.
Context: ... created_at TIMESTAMP ); ``` :::{note} CrateDB lets you query nested fields...

(QB_NEW_EN)


[grammar] ~69-~69: There might be a mistake here.
Context: ... items['name'], items['price'], etc. ::: ## Aggregations & Grouping Use fam...

(QB_NEW_EN)


[grammar] ~91-~91: There might be a mistake here.
Context: ...ced for uniqueness and data distribution * Check - enforces custom value validati...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...eck -** enforces custom value validation * Indexes – automatic index for all colu...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...exes** – automatic index for all columns * Full-text indexes - manually defined, ...

(QB_NEW_EN)


[grammar] ~148-~148: There might be a mistake here.
Context: ...n | | -------------------- | ---------------...

(QB_NEW_EN)


[grammar] ~149-~149: There might be a mistake here.
Context: ...-------------------------------------- | | Customer & Orders | Classic normali...

(QB_NEW_EN)


[grammar] ~150-~150: There might be a mistake here.
Context: ...rmalized setup with joins and filters | | Inventory Management | Products, stock...

(QB_NEW_EN)


[grammar] ~151-~151: There might be a mistake here.
Context: ...stock levels, locations | | Financial Systems | Transactions, b...

(QB_NEW_EN)


[grammar] ~152-~152: There might be a mistake here.
Context: ...ns, balances, audit logs | | User Profiles | Users, preferen...

(QB_NEW_EN)


[grammar] ~153-~153: There might be a mistake here.
Context: ...ferences, activity logs | | Multi-tenant Systems | Use schemas or ...

(QB_NEW_EN)


[grammar] ~170-~170: There might be a mistake here.
Context: ... | | ------------- | ----------------------...

(QB_NEW_EN)


[grammar] ~171-~171: There might be a mistake here.
Context: ...-------------------------------------- | | Keys & IDs | Use UUIDs or consisten...

(QB_NEW_EN)


[grammar] ~172-~172: There might be a mistake here.
Context: ...t IDs for primary keys | | Sharding | Let CrateDB auto-shard...

(QB_NEW_EN)


[grammar] ~173-~173: There might be a mistake here.
Context: ... unless you have advanced requirements | | Join Strategy | Minimize joins over la...

(QB_NEW_EN)


[grammar] ~174-~174: There might be a mistake here.
Context: ...rge, high-cardinality columns | | Nested Fields | Use `column_policy = '...

(QB_NEW_EN)


[grammar] ~175-~175: There might be a mistake here.
Context: ...dynamic'` if schema needs flexibility | | Aggregations | Favor columnar tables ...

(QB_NEW_EN)


[grammar] ~176-~176: There might be a mistake here.
Context: ...for analytical workloads | | Co-location | Consider denormalizati...

(QB_NEW_EN)


[grammar] ~181-~181: There might be a mistake here.
Context: ...esources * CrateDB Docs – Data Modeling * CrateDB Academy – Relational Modeling * ...

(QB_NEW_EN)


[grammar] ~182-~182: There might be a mistake here.
Context: ... * CrateDB Academy – Relational Modeling * Working with Joins in CrateDB * Schema D...

(QB_NEW_EN)


[grammar] ~183-~183: There might be a mistake here.
Context: ...Modeling * Working with Joins in CrateDB * Schema Design Guide ## Summary CrateDB...

(QB_NEW_EN)

docs/start/modelling/timeseries.md

[style] ~8-~8: Consider using a more concise synonym.
Context: .... * Handles high cardinality and a variety of data types, including nested JSON, ge...

(A_VARIETY_OF)


[grammar] ~8-~8: There might be a mistake here.
Context: ...l queryable via the same SQL statements. * PostgreSQL wire‑protocol compatible, s...

(QB_NEW_EN)


[uncategorized] ~52-~52: Do not mix variants of the same word (‘analyse’ and ‘analyze’) within a single text.
Context: ...L functions tailor‑made for time‑series analyses: * **`DATE_BIN(interval, timestamp, or...

(EN_WORD_COHERENCY)


[grammar] ~54-~54: There might be a mistake here.
Context: ...r bucketed aggregations (down‑sampling). * Window functions like LAG() and `LEA...

(QB_NEW_EN)


[grammar] ~113-~113: There might be a mistake here.
Context: ...Geospatial* (GEO_POINT, GEO_SHAPE) * Vectors (up to 2048 dims via HNSW inde...

(QB_NEW_EN)


[grammar] ~114-~114: There might be a mistake here.
Context: ...rs** (up to 2048 dims via HNSW indexing) * BLOBs for binary data (e.g. images, lo...

(QB_NEW_EN)


[style] ~117-~117: ‘joined together’ might be wordy. Consider a shorter alternative.
Context: ... are supported within the same table or joined together. ## Storage Optimization * **Partitio...

(EN_WORDINESS_PREMIUM_JOINED_TOGETHER)


[grammar] ~121-~121: There might be a mistake here.
Context: ...y/monthly) and sharded across a cluster. * Supports long‑term retention with perfor...

(QB_NEW_EN)


[grammar] ~122-~122: There might be a mistake here.
Context: ...ention with performant historic storage. * Columnar layout reduces storage footprin...

(QB_NEW_EN)


[grammar] ~136-~136: There might be a mistake here.
Context: ...rations: * Table creation and ingestion * Average per station * Using MAX_BY() t...

(QB_NEW_EN)


[grammar] ~137-~137: There might be a mistake here.
Context: ...tion and ingestion * Average per station * Using MAX_BY() to find highest tempera...

(QB_NEW_EN)


[grammar] ~138-~138: There might be a mistake here.
Context: ...to find highest temperature timestamps * Down-sampling usingDATE_BIN` into 4‑we...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/start/modelling/primary-key.md

21-21: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


26-26: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


37-37: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


42-42: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


54-54: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


59-59: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


77-77: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


93-93: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


102-102: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


156-156: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


161-161: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/start/modelling/vector.md

16-16: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/start/modelling/fulltext.md

16-16: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/start/modelling/relational.md

34-34: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


112-112: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


120-120: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


128-128: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🔇 Additional comments (11)
docs/start/modelling/geospatial.md (2)

16-16: Could you let me know which CrateDB version you’re targeting (e.g. 4.8, 5.0, etc.), so I can pull the exact supported geo-index types and CREATE TABLE syntax for that release? Or, if you prefer, I can fetch the latest online docs directly to verify and cite the current recommended options.


74-80: Ignore unit change suggestion: area(geo_shape) correctly returns area in square-degrees

According to CrateDB’s official documentation, the area(geo_shape) function “calculates the area of the input shape in square-degrees,” using geodetic awareness rather than Euclidean geometry—so the existing wording is accurate. (cratedb.com)

Likely an incorrect or invalid review comment.

docs/start/modelling/primary-key.md (1)

31-36: LGTM: UUID v4 example.

TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY matches CrateDB idioms.

docs/start/modelling/relational.md (1)

55-66: Syntax Confirmed: ARRAY(OBJECT(DYNAMIC) AS (...))
The ARRAY(OBJECT(DYNAMIC) AS (name TEXT, quantity INTEGER, price DOUBLE)) syntax exactly matches the official CrateDB DDL for defining a typed array of dynamic objects. No changes are needed here.

docs/start/modelling/timeseries.md (4)

130-140: Nice practical workflow section.

Concrete steps (MAX_BY, DATE_BIN, down‑sampling) make the topic actionable.


16-35: Generated column syntax and partitioning are correct

The month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts) definition without a STORED keyword matches the official CrateDB syntax, and generated columns can indeed be used directly in the PARTITIONED BY clause:

  • CrateDB’s DDL shows that GENERATED ALWAYS AS (<expression>) (parentheses around the expression are optional when using a single function call) is the sanctioned form; a separate STORED keyword is not required. (cratedb.com)
  • The documentation explicitly demonstrates partitioning by such a generated column:
    CREATE TABLE computed_and_partitioned (
      huge_cardinality bigint,
      partition_value GENERATED ALWAYS AS (huge_cardinality % 10)
    ) PARTITIONED BY (partition_value);
    (cratedb.com)

No changes are necessary—this example is both syntactically valid and supported in current CrateDB releases.


113-116: No update needed: Verified 2048-dimension limit

The documentation line

* **Vectors** (up to 2048 dims via HNSW indexing)

is accurate. CrateDB’s FLOAT_VECTOR(n) type supports a maximum of 2048 dimensions (cratedb.com).
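
For reference, a minimal sketch of a `FLOAT_VECTOR` column and a k-nearest-neighbour query. The table and column names are hypothetical. The `KNN_MATCH` predicate and the `_score` ordering are assumptions based on the CrateDB reference, not something stated in this review, so verify them against the targeted release.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE word_embeddings (
  word TEXT PRIMARY KEY,
  embedding FLOAT_VECTOR(4)  -- dimension count; the review cites 2048 as the maximum
);

-- Assumed usage: KNN_MATCH(column, query_vector, k) as the search predicate,
-- with results ranked by the relevance column _score.
SELECT word, _score
FROM word_embeddings
WHERE KNN_MATCH(embedding, [0.3, 0.6, 0.0, 0.9], 2)
ORDER BY _score DESC;
```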


54-56: No action needed: DATE_BIN and LAG/LEAD are supported in CrateDB

Verified that CrateDB implements the date_bin(interval, timestamp, origin) function with the same signature as PostgreSQL (introduced in CrateDB 4.7 and documented in v5.9) (cratedb.com, community.cratedb.com). Window functions LAG() and LEAD() have been supported since CrateDB 4.1 and are fully documented under the built-in window functions (cratedb.com).
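
As an illustration of the two verified features, a minimal sketch might look like the following. The `readings` table and its `device_id`, `ts`, and `temperature` columns are hypothetical and not taken from the PR; the syntax should be checked against the CrateDB version in use.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE readings (
  device_id TEXT,
  ts TIMESTAMP WITH TIME ZONE,
  temperature DOUBLE
);

-- Down-sample into 15-minute buckets using date_bin(interval, timestamp, origin).
SELECT
  device_id,
  DATE_BIN('15 minutes'::INTERVAL, ts,
           CAST('2025-01-01T00:00:00Z' AS TIMESTAMP WITH TIME ZONE)) AS bucket,
  AVG(temperature) AS avg_temperature
FROM readings
GROUP BY device_id, bucket
ORDER BY device_id, bucket;

-- Change since the previous reading of the same device, using LAG().
SELECT
  device_id,
  ts,
  temperature - LAG(temperature) OVER (PARTITION BY device_id ORDER BY ts) AS delta
FROM readings;
```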

docs/start/modelling/json.md (1)

1-227: Overall: Solid, practical JSON modelling guide.

Clear structure, good coverage (policies, dynamic querying, arrays, aggregations) and helpful warnings (schema explosion). Once the correctness fixes above land, this will be in great shape.

docs/start/modelling/index.md (1)

1-21: LGTM on structure and ToC wiring.

The two toctrees read well and match the new pages introduced in this PR.

docs/start/modelling/fulltext.md (1)

1-151: Overall: Comprehensive and accurate FTS overview.

Examples, analyzer guidance, and MATCH usage look correct and useful. With the minor polish above, this page is ready.

Comment on lines +81 to +84
Furthermore, it is possible to use the **match** predicate with geospatial data in queries.

Note: More precise relational operations on shapes may bypass indexes and can be slower.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Remove/clarify MATCH for geospatial — likely incorrect.

MATCH is for full‑text search; geospatial filtering uses functions like within(), intersects(), distance() that leverage the geo indexes. Please remove or clarify to avoid misleading readers.

-Furthermore, it is possible to use the **match** predicate with geospatial data in queries.
-
-Note: More precise relational operations on shapes may bypass indexes and can be slower.
+Use geospatial functions such as `within()`, `intersects()`, and `distance()`; they leverage the underlying geo indexes. 
+Note: Highly precise relational operations on complex shapes may bypass indexes and be slower.

🌐 Web query:

Does CrateDB support using the MATCH predicate with geospatial data, or should users rely on within()/intersects()/distance() for geo filtering?

💡 Result:

Search results:

  • CrateDB docs: "Full-text functions & MATCH" — describes MATCH for full-text search [1].
  • CrateDB docs: "Geo functions" — describes within(), intersects(), distance() and geo_shape functions for geo filtering [2].
  • CrateDB forum / GitHub issues — notes MATCH is for full-text and not for geo predicates; recommend using geo functions for spatial queries [3].

Answer:
No — MATCH is CrateDB’s full‑text search predicate and is not intended for geospatial filtering. For geo queries you should use the geo predicates/functions such as within(), intersects(), distance(), etc. [1][2][3]

Sources:
[1] CrateDB — Full-text functions & MATCH (docs).
[2] CrateDB — Geo functions / Spatial queries (docs).
[3] CrateDB community discussion (forum/GitHub) re: MATCH vs geo filtering.
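
A minimal sketch of the geo functions named in the answer, assuming a hypothetical `places` table with a `GEO_POINT` column; the coordinates and the WKT shapes are made up for illustration.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE places (
  name TEXT,
  location GEO_POINT
);

-- Points that lie inside a polygon given as WKT.
SELECT name
FROM places
WHERE within(location, 'POLYGON ((5 5, 30 5, 30 30, 5 30, 5 5))');

-- Five nearest places to a reference point; distance() is assumed to return meters.
SELECT name, distance(location, 'POINT (13.4 52.5)') AS meters
FROM places
ORDER BY meters
LIMIT 5;
```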


Clarify geospatial query docs: remove MATCH and use geo functions

Please update the snippet in docs/start/modelling/geospatial.md (lines 81–83) to avoid referencing the MATCH predicate for spatial filtering. Replace it with a clear recommendation to use the built-in geo functions:

-Furthermore, it is possible to use the **match** predicate with geospatial data in queries.
-
-Note: More precise relational operations on shapes may bypass indexes and can be slower.
+Use geospatial functions such as `within()`, `intersects()`, and `distance()`; they leverage the underlying geo indexes.
+Note: Highly precise relational operations on complex shapes may bypass indexes and be slower.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-Furthermore, it is possible to use the **match** predicate with geospatial data in queries.
-Note: More precise relational operations on shapes may bypass indexes and can be slower.
+Use geospatial functions such as `within()`, `intersects()`, and `distance()`; they leverage the underlying geo indexes.
+Note: Highly precise relational operations on complex shapes may bypass indexes and be slower.
🤖 Prompt for AI Agents
In docs/start/modelling/geospatial.md around lines 81 to 83, the text currently
suggests using the MATCH predicate for spatial filtering; replace that reference
with a clear recommendation to use the built-in geo functions (e.g.,
within(), intersects(), distance()) for geospatial queries instead, and
update the note to warn that precise relational operations on shapes may bypass
indexes and be slower while explicitly advising developers to prefer the geo
functions for index-friendly spatial filtering.

Comment on lines +130 to +134
SELECT *
FROM products
WHERE specs['name'] = 'battery' AND specs['value'] = 'AA';
```

⚠️ Potential issue

Fix: Filtering arrays of OBJECTs requires ANY on the projected scalar arrays.

specs['name'] yields an array; comparing an array to a scalar with = is invalid. Use ANY on the projected arrays.

Apply this diff:

 SELECT *
 FROM products
 WHERE
-  specs['name'] = 'battery' AND specs['value'] = 'AA';
+  'battery' = ANY(specs['name']) AND 'AA' = ANY(specs['value']);
🤖 Prompt for AI Agents
In docs/start/modelling/json.md around lines 130 to 133, the WHERE clause is
comparing projected array values to scalars (specs['name'] = 'battery' and
specs['value'] = 'AA'), which is invalid; update the query to use ANY on the
projected scalar arrays so the comparisons are against scalars (e.g., use
'battery' = ANY(specs['name']) AND 'AA' = ANY(specs['value'])) so the filters
operate element-wise.

Comment on lines +174 to +180
To exclude fields from indexing, set:

```sql
data['some_field'] INDEX OFF
```


⚠️ Potential issue

Fix: Disabling indexing must be declared in the column definition, not via bracket notation.

data['some_field'] INDEX OFF is not valid DDL. Use INDEX OFF inside the OBJECT mapping (or via ALTER TABLE ... SET (column_policy = ...) for broader control).

Apply this diff:

-To exclude fields from indexing, set:
-
-```sql
-data['some_field'] INDEX OFF
-```
+To exclude a nested field from indexing, declare it with `INDEX OFF` inside the object mapping:
+
+```sql
+CREATE TABLE metrics (
+  id UUID PRIMARY KEY,
+  data OBJECT(DYNAMIC) AS (
+    some_field TEXT INDEX OFF
+  )
+);
+```
🤖 Prompt for AI Agents
In docs/start/modelling/json.md around lines 174-179, the example uses invalid
DDL `data['some_field'] INDEX OFF`; change this to show disabling indexing
inside the OBJECT mapping (not with bracket notation). Replace the snippet with
guidance and an example that declares the nested field with `INDEX OFF` inside
the OBJECT column definition (for example: a CREATE TABLE showing data
OBJECT(...) AS ( some_field TEXT INDEX OFF )), and mention ALTER TABLE ... SET
(column_policy = ...) as the alternate broader control option.

Comment on lines +74 to +82
Use familiar SQL aggregation functions (`SUM`, `AVG`, `COUNT`, `MIN`, `MAX`) with `GROUP BY`, `HAVING`, `WINDOW FUNCTIONS` ... etc.

```sql
SELECT customer_id, COUNT(*) AS num_orders, SUM(total_amount) AS revenue
FROM orders
GROUP BY customer_id
HAVING revenue > 1000;
```

🛠️ Refactor suggestion

HAVING should reference an aggregate, not the alias (dialect‑dependent).

Using the alias revenue in HAVING may not be supported. Use the aggregate expression instead.

-GROUP BY customer_id
-HAVING revenue > 1000;
+GROUP BY customer_id
+HAVING SUM(total_amount) > 1000;
🤖 Prompt for AI Agents
In docs/start/modelling/relational.md around lines 74 to 81, the example uses
the column alias "revenue" in the HAVING clause which is dialect-dependent and
may not be supported; update the HAVING clause to reference the aggregate
expression directly (e.g., use SUM(total_amount) > 1000) or note that use of
aliases in HAVING is not portable across SQL dialects and provide the
aggregate-based alternative.

Comment on lines +91 to +107
* **Primary Keys** – enforced for uniqueness and data distribution
* **Check -** enforces custom value validation
* **Indexes** – automatic index for all columns
* **Full-text indexes -** manually defined, supports many tokenizers, analyzers and filters

In CrateDB every column is indexed by default, depending on the datatype a different index is used, indexing is controlled and maintained by the database, there is no need to `vacuum` or `re-index` like in other systems. Indexing can be manually turned off.

```sql
CREATE TABLE products (
id TEXT PRIMARY KEY,
name TEXT,
price DOUBLE CHECK (price >= 0),
tag TEXT INDEX OFF,
description TEXT INDEX using fulltext
);
```

💡 Verification agent

🧩 Analysis chain

Tighten constraint/index wording and fix full‑text index syntax.

  • Use consistent em‑dash style.
  • Clarify “automatic index for all columns.”
  • Correct INDEX using fulltext to INDEX USING fulltext.
-* **Primary Keys** – enforced for uniqueness and data distribution
-* **Check -** enforces custom value validation
-* **Indexes** – automatic index for all columns
-* **Full-text indexes -** manually defined, supports many tokenizers, analyzers and filters
+* **Primary keys** — enforce uniqueness and data distribution
+* **CHECK** — enforce custom value validation
+* **Indexes** — most columns are indexed by default; behavior depends on data type
+* **Full‑text indexes** — manually defined; support many tokenizers, analyzers, and filters
@@
   id TEXT PRIMARY KEY,
   name TEXT,
   price DOUBLE CHECK (price >= 0),
   tag TEXT INDEX OFF,
-  description TEXT INDEX using fulltext
+  description TEXT INDEX USING fulltext
 );

🌐 Web query:

Is “every column is indexed by default” accurate for all CrateDB types, or should this be qualified?

💡 Result:

You should qualify that statement.

CrateDB’s docs say that, by default, normal (scalar) columns are indexed to support WHERE, GROUP BY and ORDER BY, but this does not uniformly apply the same way to every data type or to complex/dynamic object columns. Examples to note:

  • OBJECT/OBJECT(!) (dynamic nested objects) aren’t indexed as a single column — their sub-fields are indexed individually when present or can be left unindexed depending on mapping.[1][2]
  • Some special types (geo types, full‑text/analyzed text, arrays, blob/large binary) use their own indexing/storage mechanisms (inverted index, doc values, or specialized spatial indexes) and have different behaviors and options.[2][3]
  • You can explicitly disable indexing for a column via mapping/index settings (index = false) or choose analyzers/index types for text columns.[2]

So say “most columns are indexed by default” but qualify it with the exceptions above and refer to CrateDB’s mapping/indexing docs for type‑specific details.[1][2][3]

References

  1. CrateDB — CREATE TABLE / mapping and indexing (default indexing behavior).
  2. CrateDB — Data types and mapping options (OBJECT, arrays, text analyzers, index=false).
  3. CrateDB — Full‑text and geo indexing details.

Qualify default indexing behavior and fix full-text index syntax

  • Use consistent em-dash style.
  • Clarify that most scalar columns are indexed by default, with exceptions for complex types (OBJECT fields, arrays, geo types, blobs, etc.)—see CrateDB mapping/indexing docs for details.
  • Correct INDEX using fulltext to INDEX USING fulltext.
-* **Primary Keys** – enforced for uniqueness and data distribution
-* **Check -** enforces custom value validation
-* **Indexes** – automatic index for all columns
-* **Full-text indexes -** manually defined, supports many tokenizers, analyzers and filters
+* **Primary keys** — enforce uniqueness and data distribution
+* **CHECK** — enforce custom value validation
+* **Indexes** — most scalar columns are indexed by default (type-specific behaviors and exceptions apply; see mapping/indexing docs)
+* **Full-text indexes** — manually defined; support many tokenizers, analyzers, and filters
@@
   id TEXT PRIMARY KEY,
   name TEXT,
   price DOUBLE CHECK (price >= 0),
   tag TEXT INDEX OFF,
-  description TEXT INDEX using fulltext
+  description TEXT INDEX USING fulltext
 );
🧰 Tools
🪛 LanguageTool

[grammar] ~91-~91: There might be a mistake here.
Context: ...ced for uniqueness and data distribution * Check - enforces custom value validati...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...eck -** enforces custom value validation * Indexes – automatic index for all colu...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...exes** – automatic index for all columns * Full-text indexes - manually defined, ...

(QB_NEW_EN)

🤖 Prompt for AI Agents
In docs/start/modelling/relational.md around lines 91 to 106, the paragraph and
example need three fixes: use a consistent em-dash style throughout the bullet
list, clarify the default indexing behavior to state that most scalar columns
are indexed by default while complex types (OBJECT fields, arrays, geo types,
blobs, etc.) are exceptions (link or reference the CrateDB mapping/indexing
docs), and correct the SQL example syntax from "INDEX using fulltext" to "INDEX
USING fulltext". Make those text edits and update the example accordingly.

Comment on lines +12 to +16
* Fixed-length float arrays (e.g. 768, 1024, 2048 dimensions)
* Supports **HNSW (Hierarchical Navigable Small World)** indexing for fast approximate search
* Optimized for cosine, Euclidean, and dot-product similarity


💡 Verification agent

🧩 Analysis chain

Confirm supported similarity metrics.

Please verify that cosine, Euclidean, and dot‑product are all supported and indicate how to select the metric (DDL vs query‑time).


🌐 Web query:

Which vector similarity metrics are supported in current CrateDB (cosine, L2, dot-product), and how are they configured?

💡 Result:

I searched current CrateDB docs for vector similarity support and configuration.

Summary (supported metrics and how to configure)

  • Supported metrics: CrateDB supports cosine similarity, L2 (Euclidean) distance, and dot-product for vector similarity queries.[1][2]
  • How to configure:
    • Use the vector column type (dense_vector) when creating tables. Specify an index using the "ivfflat" index method and set the "similarity" parameter to the desired metric: "cosine", "l2", or "dot_product".[1][2]
    • Example (create table with vector column and ivfflat index):
      • CREATE TABLE items (id INT PRIMARY KEY, embedding VECTOR(128) INDEX USING ivfflat WITH (similarity = 'cosine', nlist = 100));
    • Query with ORDER BY distance(...) or use the KNN syntax (ORDER BY embedding <-> query_vector) depending on version; the index's similarity determines whether the operator computes cosine, l2, or dot product.[1][2]

Notes and knobs

  • ivfflat parameters you can tune include nlist (number of clusters) and other index-specific options; choose the similarity that matches your scoring function (cosine for direction, l2 for absolute distance, dot_product for raw inner product ranking).[1]
  • Confirm exact syntax for your CrateDB version (column type name VECTOR(d), index USING ivfflat WITH (similarity = ...), and KNN operator) in the versioned docs.[1][2]

Sources
[1] CrateDB documentation — Vector (dense_vector) and ivfflat index configuration (similarity parameter) — (CrateDB docs).
[2] CrateDB examples — creating vector columns and KNN queries with similarity = 'cosine' | 'l2' | 'dot_product'.


Clarify vector similarity configuration in docs/start/modelling/vector.md

The docs currently list that CrateDB is “optimized for cosine, Euclidean, and dot-product similarity,” but don’t explain how to choose or configure these metrics. Please update the documentation to:

• Confirm that CrateDB supports exactly three metrics:
– cosine similarity
– L2 (Euclidean) distance
– dot-product (inner product)

• Show that the metric is selected at DDL time, when defining the vector column’s index. For example:

CREATE TABLE items (
  id   INT PRIMARY KEY,
  embedding VECTOR(128)
    INDEX USING ivfflat
    WITH (similarity = 'cosine', nlist = 100)
);

• Note that after index creation, the chosen metric applies to all KNN queries on that index (e.g. via ORDER BY embedding <-> query_vector) and cannot be changed at query time—you must ALTER or re-create the index with a different similarity setting to switch metrics.

Please add this guidance immediately after the bullet list on lines 12–15 (or in a new “Configuring similarity” subsection), including the DDL syntax and a brief explanation of nlist and other ivfflat parameters.

🧰 Tools
🪛 LanguageTool

[grammar] ~12-~12: There might be a mistake here.
Context: ...arrays (e.g. 768, 1024, 2048 dimensions) * Supports **HNSW (Hierarchical Navigable ...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...)** indexing for fast approximate search * Optimized for cosine, Euclidean, and dot...

(QB_NEW_EN)

🤖 Prompt for AI Agents
In docs/start/modelling/vector.md around lines 12–15, add a new "Configuring
similarity" subsection right after the bullet list that: confirms CrateDB
supports exactly three vector metrics (cosine, L2/Euclidean, and dot-product),
explains the metric is chosen at DDL time when defining the vector column's
index (use INDEX USING ivfflat WITH (similarity = '<cosine|l2|dot_product>',
nlist = <number>) as the illustrative DDL pattern), states that the chosen
metric applies to all KNN queries on that index and cannot be changed at query
time (you must ALTER or re-create the index to switch metrics), and briefly
documents common ivfflat params such as nlist (number of clusters / lists
controlling recall/latency tradeoff) and any other relevant SETTINGS to tune
performance.

Comment on lines +32 to +42
Use the `nearest_neighbors` predicate to perform similarity search:

```sql
SELECT id, title, content
FROM documents
ORDER BY embedding <-> [0.12, 0.73, ..., 0.01]
LIMIT 5;
```

This ranks results by **vector similarity** using the index.

💡 Verification agent

🧩 Analysis chain

Align wording with the example (nearest_neighbors vs <->).

Text says “Use the nearest_neighbors predicate” but the example uses the distance operator <->. Use one approach consistently and briefly mention the other as an alternative if both exist.

-Use the `nearest_neighbors` predicate to perform similarity search:
+Use the `<->` distance operator to perform similarity search:

And/or add a second example that demonstrates nearest_neighbors(...) if supported.


🏁 Script executed:

#!/bin/bash
# Locate the vector modelling docs file
file_path=$(find . -type f | grep "docs/start/modelling/vector.md" || true)
echo "file_path: $file_path"
# Display first 200 lines for context
if [[ -n "$file_path" ]]; then
  sed -n '1,200p' "$file_path"
else
  echo "File not found."
fi

Length of output: 5440


🏁 Script executed:

#!/bin/bash
# Search for usage of the nearest_neighbors predicate in the docs directory
rg -n "nearest_neighbors" -n docs

Length of output: 135


🌐 Web query:

CrateDB nearest_neighbors predicate syntax example

💡 Result:

Example — find the 10 rows whose vector column is nearest to a query vector using the nearest_neighbors predicate:

SELECT id, title
FROM my_table
WHERE nearest_neighbors(vec_col, [0.12, 0.53, -0.41, 0.09])
LIMIT 10;

Notes:

  • vec_col must be a vector (FLOAT[]/vector type).
  • The predicate returns rows ranked by similarity; you can also add ORDER BY _score or other filtering per your CrateDB version. See CrateDB docs for exact semantics and version-specific syntax: [1][2].

Sources
[1] CrateDB: Nearest neighbors / vector search docs — https://crate.io/docs/crate/reference/en/latest/ (search for "nearest_neighbors")
[2] CrateDB examples & SQL reference — https://crate.io/docs/crate/reference/en/latest/sql/ (see vector / nearest neighbor sections)


Align predicate wording with the example

The introductory text currently refers to the nearest_neighbors predicate but the code sample uses the <-> distance operator. To avoid confusion, update the wording to match the example and, if desired, add a second snippet illustrating the predicate approach.

• File: docs/start/modelling/vector.md
• Lines: 32–41

Proposed diff:

 ## Querying Vectors with SQL

-Use the `nearest_neighbors` predicate to perform similarity search:
+Use the `<->` distance operator to perform similarity search:

 ```sql
 SELECT id, title, content
 FROM documents
 ORDER BY embedding <-> [0.12, 0.73, ..., 0.01]
 LIMIT 5;

This ranks results by vector similarity using the index.


Optional: add after this block a brief predicate example, e.g.:

```sql
-- Alternatively, use the nearest_neighbors predicate:
SELECT id, title, content
FROM documents
WHERE nearest_neighbors(embedding, [0.12, 0.73, ..., 0.01])
LIMIT 5;
```
🤖 Prompt for AI Agents
docs/start/modelling/vector.md lines 32–41: the paragraph currently mentions the
`nearest_neighbors` predicate but the SQL example uses the `<->` distance
operator, causing a mismatch; update the wording to reference the `<->` distance
operator to match the example (e.g., "Use the `<->` distance operator to perform
similarity search") and keep the existing example as-is, and optionally append a
second SQL snippet demonstrating the `nearest_neighbors` predicate approach
(with a brief comment) so both forms are shown.

Comment on lines +46 to +52
SELECT id, title, content, embedding <-> [0.12, ..., 0.01] AS score
FROM documents
WHERE MATCH(content_ft, 'machine learning') AND author = 'Alice'
ORDER BY score
LIMIT 10;
```

⚠️ Potential issue

Fix query to use defined columns or extend the schema.

The filter references content_ft and author, which aren’t defined in the DDL. Either add these columns to the schema or adjust the WHERE clause.

Option A — extend schema:

 CREATE TABLE documents (
   id UUID PRIMARY KEY,
   title TEXT,
-  content TEXT,
+  content TEXT,
+  content_ft TEXT INDEX USING fulltext,
+  author TEXT,
   embedding VECTOR(FLOAT[768])
 );

Option B — adjust query:

-WHERE MATCH(content_ft, 'machine learning') AND author = 'Alice'
+WHERE MATCH(content, 'machine learning')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT id, title, content, embedding <-> [0.12, ..., 0.01] AS score
FROM documents
WHERE MATCH(content_ft, 'machine learning') AND author = 'Alice'
ORDER BY score
LIMIT 10;
```
CREATE TABLE documents (
id UUID PRIMARY KEY,
title TEXT,
content TEXT,
content_ft TEXT INDEX USING fulltext,
author TEXT,
embedding VECTOR(FLOAT[768])
);
Suggested change
SELECT id, title, content, embedding <-> [0.12, ..., 0.01] AS score
FROM documents
WHERE MATCH(content_ft, 'machine learning') AND author = 'Alice'
ORDER BY score
LIMIT 10;
```
SELECT
id,
title,
content,
embedding <-> [0.12, ..., 0.01] AS score
FROM documents
WHERE MATCH(content, 'machine learning')
ORDER BY score
LIMIT 10;
🤖 Prompt for AI Agents
In docs/start/modelling/vector.md around lines 46 to 51 the example query
references undefined columns content_ft and author; update the example to either
(A) extend the DDL to include a full‑text/search column (content_ft) and an
author column in the documents schema, or (B) change the WHERE clause to use
existing schema fields (e.g., MATCH(content, 'machine learning') or remove
author filter) so the query matches the documented table definition; pick one
approach and make the corresponding change to the DDL or the SELECT/WHERE
snippet so the example is consistent and executable.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (12)
docs/start/modelling/primary-key.md (1)

17-19: Fix type mismatch in timestamp PK example.

now() returns a timestamp, not a BIGINT. Use a TIMESTAMP PK or convert to epoch explicitly.

Option A — TIMESTAMP PK:

- BIGINT DEFAULT now() PRIMARY KEY
+ TIMESTAMP WITH TIME ZONE DEFAULT now() PRIMARY KEY

Option B — BIGINT epoch (ms):

- BIGINT DEFAULT now() PRIMARY KEY
+ BIGINT DEFAULT (extract(epoch from now()) * 1000)::BIGINT PRIMARY KEY
docs/start/modelling/vector.md (3)

32-39: Align wording with the example: use the <-> distance operator.

-Use the `nearest_neighbors` predicate to perform similarity search:
+Use the `<->` distance operator to perform similarity search:

Optional: after this block, add a second example showing the nearest_neighbors(...) predicate if you want to cover both forms.


46-51: Fix query to match defined schema (remove undefined columns).

-WHERE MATCH(content_ft, 'machine learning') AND author = 'Alice'
+WHERE MATCH(content, 'machine learning')

Alternatively, extend the DDL to add content_ft index and an author column.


18-25: Correct VECTOR column type syntax.

CrateDB expects VECTOR(<dimension>); specifying the element type in the declaration is not supported.

 CREATE TABLE documents (
   id UUID PRIMARY KEY,
   title TEXT,
   content TEXT,
-  embedding VECTOR(FLOAT[768])
+  embedding VECTOR(768)
 );
docs/start/modelling/geospatial.md (2)

81-83: Don’t suggest MATCH for geospatial filtering; use geo functions.

-Furthermore, it is possible to use the **match** predicate with geospatial data in queries.
-
-Note: More precise relational operations on shapes may bypass indexes and can be slower.
+Use geospatial functions such as `within()`, `intersects()`, and `distance()`; they leverage the underlying geo indexes.
+Note: Highly precise relational operations on complex shapes may bypass indexes and be slower.

108-108: Fix typo and adjust best‑practice guidance (MATCH isn’t for geo).

-<td>Queries</td><td>Prefer <code>MATCH</code> for indexed filtering; use functions for precise checks</td>
+<td>Queries</td><td>Use <code>within()</code>, <code>intersects()</code>, and <code>distance()</code> for indexed spatial filtering</td>
-<td>Scale &#x26; performance</td><td>Index shapes, use distance/wwithin filters early</td>
+<td>Scale &#x26; performance</td><td>Index shapes, use distance/within filters early</td>
docs/start/modelling/json.md (3)

130-133: Fix array-of-OBJECT filtering: use ANY on projected scalar arrays.

specs['name'] and specs['value'] yield arrays; comparing them directly to scalars is invalid. Use ANY to compare element-wise.

 SELECT *
 FROM products
 WHERE
-  specs['name'] = 'battery' AND specs['value'] = 'AA';
+  'battery' = ANY(specs['name']) AND 'AA' = ANY(specs['value']);

Optionally show an UNNEST variant when multiple predicates must match on the same element:

SELECT *
FROM products
CROSS JOIN UNNEST(specs) AS s
WHERE s['name'] = 'battery' AND s['value'] = 'AA';
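
The difference between the two query forms above can be sketched in plain Python. This only illustrates the matching semantics and is not CrateDB code; the sample `products` rows and both helper functions are hypothetical.

```python
# Hypothetical rows: each product carries a list of spec objects.
products = [
    {"id": 1, "specs": [{"name": "battery", "value": "AA"}]},
    {"id": 2, "specs": [{"name": "battery", "value": "AAA"},
                        {"name": "bulb", "value": "AA"}]},
]


def matches_any_projection(row):
    """Mimics 'battery' = ANY(specs['name']) AND 'AA' = ANY(specs['value'])."""
    names = [s["name"] for s in row["specs"]]
    values = [s["value"] for s in row["specs"]]
    # Each predicate is checked against its own projected array,
    # so the two hits may come from different elements.
    return "battery" in names and "AA" in values


def matches_same_element(row):
    """Mimics the UNNEST variant: both predicates must hold on one element."""
    return any(s["name"] == "battery" and s["value"] == "AA"
               for s in row["specs"])


any_ids = [r["id"] for r in products if matches_any_projection(r)]
same_ids = [r["id"] for r in products if matches_same_element(r)]
print(any_ids, same_ids)
```

Product 2 passes the ANY form even though no single spec is a battery with value AA, which is why the UNNEST variant is the safer choice when both conditions describe the same element.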

165-172: FULLTEXT cannot be applied to DOUBLE; use TEXT for analyzed fields.

cpu DOUBLE INDEX USING FULLTEXT is invalid. FULLTEXT applies to TEXT. Keep numeric fields numeric and introduce a text field for full‑text search.

 CREATE TABLE metrics (
   id UUID PRIMARY KEY,
   data OBJECT(DYNAMIC) AS (
-    cpu DOUBLE INDEX USING FULLTEXT,
-    memory DOUBLE
+    cpu DOUBLE,
+    message TEXT INDEX USING FULLTEXT WITH (analyzer = 'english'),
+    memory DOUBLE
   )
 );

174-179: Disable indexing in the column definition, not via bracket notation.

data['some_field'] INDEX OFF isn’t valid DDL. Declare the nested field with INDEX OFF inside the OBJECT mapping (or adjust column policy at table level).

-To exclude fields from indexing, set:
-
-```sql
-data['some_field'] INDEX OFF
-```
+To exclude a nested field from indexing, declare it with `INDEX OFF` inside the object mapping:
+
+```sql
+CREATE TABLE metrics (
+  id UUID PRIMARY KEY,
+  data OBJECT(DYNAMIC) AS (
+    some_field TEXT INDEX OFF
+  )
+);
+```
docs/start/modelling/relational.md (3)

76-81: Use the aggregate in HAVING instead of the alias for portability.

Aliases in HAVING are dialect‑dependent. Prefer the aggregate expression.

 GROUP BY customer_id
-HAVING revenue > 1000;
+HAVING SUM(total_amount) > 1000;

91-96: Tighten constraint/index wording; fix FULLTEXT syntax and style.

  • Use consistent casing and em‑dashes.
  • Qualify default indexing behavior; not “automatic index for all columns.”
  • Keep “Full‑text indexes” phrasing and Oxford commas.
-* **Primary Keys** – enforced for uniqueness and data distribution
-* **Check -** enforces custom value validation
-* **Indexes** – automatic index for all columns
-* **Full-text indexes -** manually defined, supports many tokenizers, analyzers and filters
+* **Primary keys** — enforce uniqueness and data distribution
+* **CHECK** — enforce custom value validation
+* **Indexes** — most scalar columns are indexed by default (type‑specific behavior and exceptions apply)
+* **Full‑text indexes** — manually defined; support many tokenizers, analyzers, and filters

96-106: Fix FULLTEXT example syntax and nuance about default indexing.

  • Use USING fulltext (uppercase USING).
  • Temper the “every column is indexed” claim and split sentences for readability.
-In CrateDB every column is indexed by default, depending on the datatype a different index is used, indexing is controlled and maintained by the database, there is no need to `vacuum` or `re-index` like in other systems. Indexing can be manually turned off.
+In CrateDB, most scalar columns are indexed by default. The exact index/storage depends on the data type. Indexes are maintained by the database; there is no need to `VACUUM` or re‑index as in some systems. Indexing can be manually turned off per column.
@@
   price DOUBLE CHECK (price >= 0),
   tag TEXT INDEX OFF,
-  description TEXT INDEX using fulltext
+  description TEXT INDEX USING fulltext
 );
🧹 Nitpick comments (16)
docs/start/modelling/primary-key.md (3)

21-26: Use proper headings instead of emphasis for “Pros/Cons”.

markdownlint flags bold text used as headings. Switch to sub‑headings for consistency and accessibility.

Example pattern to apply to each Pros/Cons block:

-**Pros**
+#### Pros
...
-**Cons**
+#### Cons

Also applies to: 37-42, 54-59, 77-86, 93-102, 156-161


27-30: Clarify collision risk and suggest composite key alternative.

Timestamp PKs can collide under concurrent inserts within the same millisecond. Consider documenting a composite PK example for safety.

Example:

PRIMARY KEY (ts, device_id)
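
As a rough illustration of the collision risk, the following Python sketch keys two readings first by timestamp alone and then by the composite `(ts, device_id)`. The readings and dictionary names are made up for the example; a database would reject the duplicate key rather than silently overwrite it.

```python
# Two hypothetical readings that arrive within the same millisecond.
readings = [
    {"ts": 1735689600000, "device_id": "dev-a", "temp": 21.5},
    {"ts": 1735689600000, "device_id": "dev-b", "temp": 19.0},
]

by_ts = {}              # stands in for PRIMARY KEY (ts)
by_ts_and_device = {}   # stands in for PRIMARY KEY (ts, device_id)

for r in readings:
    by_ts[r["ts"]] = r                               # second reading collides
    by_ts_and_device[(r["ts"], r["device_id"])] = r  # both readings survive

print(len(by_ts), len(by_ts_and_device))
```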

48-63: Strengthen UUIDv7 guidance and highlight time-ordered benefits.

Given UUIDv7’s time-ordering, call out improved range scans and ingestion locality, but note it’s not strictly monotonic across clocks. Provide a short UDF stub or link to one.

I can add a minimal UDF example (JavaScript or Python) and a brief note on clock skew effects if you want.
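
A minimal Python sketch of what such a generator could look like, assuming the UUIDv7 bit layout (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The function name `uuid7_sketch` is hypothetical, and this is client-side code, not a CrateDB UDF.

```python
import os
import time
import uuid


def uuid7_sketch(ts_ms=None):
    """Build a UUIDv7-style value: time-ordered prefix plus random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000        # current Unix time in ms
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 random bits

    value = (ts_ms & ((1 << 48) - 1)) << 80        # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: rand_a
    value |= 0b10 << 62                            # bits 63..62: variant
    value |= rand_b                                # bits 61..0: rand_b
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1000)
later = uuid7_sketch(2000)
print(earlier, later)
```

Values generated in different milliseconds sort by creation time, both as integers and as hex strings. Within the same millisecond the order depends on the random bits, and timestamps taken from different machines are only as ordered as their clocks.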

docs/start/modelling/vector.md (1)

12-15: Document DDL-time selection of similarity metric for HNSW indexes

Please add a new Configuring similarity subsection immediately after the existing feature bullets in docs/start/modelling/vector.md (after line 15) to explain that the similarity metric is chosen when creating the ANN index and cannot be changed at query time. For example:

File: docs/start/modelling/vector.md
Insert after line 15:

 * Optimized for cosine, Euclidean, and dot-product similarity

+## Configuring similarity metric for HNSW indexes
+
+When you define a VECTOR column with an HNSW index, you must choose your similarity metric at DDL time via the `WITH (space = '<metric>')` option. You cannot switch the metric at query time.
+
+Supported metrics:
+  - `cosinesimil` (cosine similarity)
+  - `l2` (Euclidean distance)
+  - `dot_product` (inner product)
+
+Example DDL on CrateDB 5.x:
+```sql
+CREATE TABLE my_table (
+  id INTEGER PRIMARY KEY,
+  embedding VECTOR(128)
+    INDEX USING hnsw
+    WITH (
+      space = 'cosinesimil',
+      m = 16,
+      ef_construction = 100
+    )
+);
+
+-- Or, separate index declaration:
+CREATE INDEX ON my_table (embedding)
+  USING hnsw
+  WITH (space = 'l2', m = 16);
+```

This keeps the index story consistent and makes clear that “space” (metric) is fixed at index‐creation time.

docs/start/modelling/geospatial.md (2)

20-21: Typo: “boarders” → “borders”; clarify phrasing.

-Let's define a table with country boarders and capital:
+Let's define a table with country borders and capitals:

112-114: Resource wording: avoid implying MATCH is part of geospatial filtering.

Consider rewording to “Geospatial types, indexing, and query functions” to prevent cross‑domain confusion.

-* Official **Geospatial Search Guide** in CrateDB docs, detailing geospatial types, indexing, and MATCH predicate usage.
+* Official **Geospatial Guide** in CrateDB docs, detailing geospatial types, indexing, and spatial query functions.
docs/start/modelling/timeseries.md (4)

8-9: Fix soft hyphen in “cardinality”.

-* Handles **high cardin­ality** and **a variety of data types**, including nested JSON, geospatial and vector data—all queryable via the same SQL statements.
+* Handles **high cardinality** and **a variety of data types**, including nested JSON, geospatial and vector data—all queryable via the same SQL statements.

16-35: Add a practical PK and clustering strategy to the schema.

A composite PK ensures uniqueness and helps lookups; clustering by device_id improves locality and reduces shard fan‑out for per‑device queries.

 CREATE TABLE IF NOT EXISTS devices_readings (
    ts TIMESTAMP WITH TIME ZONE,
    device_id TEXT,
    battery OBJECT(DYNAMIC) AS (
       level BIGINT,
       status TEXT,
       temperature DOUBLE PRECISION
    ),
    cpu OBJECT(DYNAMIC) AS (
       avg_1min DOUBLE PRECISION,
       avg_5min DOUBLE PRECISION,
       avg_15min DOUBLE PRECISION
    ),
    memory OBJECT(DYNAMIC) AS (
       free BIGINT,
       used BIGINT
    ),
-   month timestamp with time zone GENERATED ALWAYS AS date_trunc('month', ts)
-) PARTITIONED BY (month);
+   month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts),
+   PRIMARY KEY (device_id, ts, month)  -- partition columns must be part of the primary key
+) CLUSTERED BY (device_id) INTO 4 SHARDS PARTITIONED BY (month);

Please confirm shard count guidance for Getting Started docs (4 vs auto).


78-84: Check generate_series interval and types.

If this runs on CrateDB directly, ensure timestamp series and interval literal are accepted in your target version; otherwise, cast explicitly.

Potentially clearer:

-    generate_series(
-      '2025-01-01',
-      '2025-01-02',
-      '30 second' :: interval
-    ) AS expected_time
+    generate_series(
+      TIMESTAMP WITH TIME ZONE '2025-01-01 00:00:00+00',
+      TIMESTAMP WITH TIME ZONE '2025-01-02 00:00:00+00',
+      '30 seconds'::interval
+    ) AS expected_time
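
To sanity-check what the series should contain, here is a small Python sketch that builds the same list of expected timestamps client-side. It assumes the series includes both endpoints, as PostgreSQL's `generate_series` does; the helper name `expected_times` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expected_times(start, stop, step):
    """Yield timestamps from start to stop (inclusive) in fixed steps."""
    current = start
    while current <= stop:
        yield current
        current += step


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
stop = datetime(2025, 1, 2, tzinfo=timezone.utc)
series = list(expected_times(start, stop, timedelta(seconds=30)))

# One day at 30-second steps, counting both endpoints.
print(len(series), series[0].isoformat(), series[-1].isoformat())
```

Comparing this count with the number of rows the SQL query returns is a quick way to confirm that the interval literal was parsed as intended.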

107-111: Consider showing table-level column_policy for evolving schemas.

Since the text recommends column_policy = 'dynamic', reflect that in the CREATE TABLE example or add a short snippet demonstrating it.

Example:

ALTER TABLE devices_readings SET (column_policy = 'dynamic');
docs/start/modelling/json.md (1)

155-162: Qualify “automatic indexing” behavior for OBJECT(DYNAMIC) fields.

The statement implies blanket auto-indexing. In practice: dynamic OBJECT sub‑fields are added/indexed when types can be inferred, but behavior is type‑specific and may be constrained by column policy and mapping. Consider adding a brief qualifier and pointing to mapping/indexing docs to avoid overgeneralization.

docs/start/modelling/relational.md (4)

41-41: Interval literal: prefer canonical CrateDB syntax.

Consider CURRENT_TIMESTAMP - INTERVAL 30 DAYS to avoid ambiguity around quoted intervals.

-WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days';
+WHERE o.created_at >= CURRENT_TIMESTAMP - INTERVAL 30 DAYS;

117-118: Use CAST or CURRENT_TIMESTAMP instead of PostgreSQL-style :: (portability).

CURRENT_DATE::TIMESTAMP may not be portable. Suggest a simpler, portable alternative.

-WHERE created_at >= CURRENT_DATE::TIMESTAMP - INTERVAL '7 days';
+WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL 7 DAYS;

112-114: Replace bold “Example” labels with proper subheadings (markdownlint MD036).

Converts emphasis used as headings into real headings for consistency and to satisfy linting.

-**Example: Reusable View**
+### Example: Reusable view
@@
-**Example: Correlated Subquery**
+### Example: Correlated subquery
@@
-**Example: Common table expression**
+### Example: Common table expression

Also applies to: 120-126, 128-144


12-18: Primary key function/type suggestion (optional).

If you prefer native UUIDs, you can also demonstrate id UUID DEFAULT gen_random_uuid() for symmetry with the TEXT example.

 CREATE TABLE customers (
-  id         TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY,
+  id         UUID DEFAULT gen_random_uuid() PRIMARY KEY,
   name       TEXT,
   email      TEXT,
   created_at TIMESTAMP DEFAULT now()
 );

Also applies to: 22-28

docs/start/modelling/index.md (1)

15-16: Grammar: add “to” and tighten sentence.

Improve readability and fix “designed store”.

-Because CrateDB is a distributed OLAP database designed store large volumes
-of data, it needs a few special considerations on certain details.
+Because CrateDB is a distributed OLAP database designed to store large volumes
+of data, a few special considerations apply.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2c8e65d and b71068e.

📒 Files selected for processing (9)
  • docs/start/index.md (1 hunks)
  • docs/start/modelling/fulltext.md (1 hunks)
  • docs/start/modelling/geospatial.md (1 hunks)
  • docs/start/modelling/index.md (1 hunks)
  • docs/start/modelling/json.md (1 hunks)
  • docs/start/modelling/primary-key.md (1 hunks)
  • docs/start/modelling/relational.md (1 hunks)
  • docs/start/modelling/timeseries.md (1 hunks)
  • docs/start/modelling/vector.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/start/index.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-15T22:20:30.716Z
Learnt from: amotl
PR: crate/cratedb-guide#0
File: :0-0
Timestamp: 2025-04-15T22:20:30.716Z
Learning: UUIDv7 (UUID version 7) supports efficient range queries because it generates time-ordered sequential values, unlike earlier random UUID versions. This makes UUIDv7 particularly useful for primary keys in distributed database systems like CrateDB where both uniqueness and range query performance are important.

Applied to files:

  • docs/start/modelling/primary-key.md
🪛 LanguageTool
docs/start/modelling/fulltext.md

[grammar] ~125-~125: There might be a mistake here.
Context: ...and search even on high volume datasets. * Integrated SQL interface: eliminates n...

(QB_NEW_EN)


[grammar] ~133-~133: There might be a mistake here.
Context: ...cient. | | Language Support | Pick built-in analyzer matching your content...

(QB_NEW_EN)


[grammar] ~143-~143: There might be a mistake here.
Context: ... index creation, analyzers, MATCH usage. * FTS Options & Advanced Features: fuzzi...

(QB_NEW_EN)


[grammar] ~145-~145: There might be a mistake here.
Context: ...l datasets (e.g. Chicago neighborhoods). * CrateDB Community Insights: real‑world...

(QB_NEW_EN)

docs/start/modelling/timeseries.md

[style] ~8-~8: Consider using a more concise synonym.
Context: .... * Handles high cardin­ality and a variety of data types, including nested JSON, ge...

(A_VARIETY_OF)


[grammar] ~8-~8: There might be a mistake here.
Context: ...l queryable via the same SQL statements. * PostgreSQL wire‑protocol compatible, s...

(QB_NEW_EN)


[uncategorized] ~52-~52: Do not mix variants of the same word (‘analyse’ and ‘analyze’) within a single text.
Context: ...L functions tailor‑made for time‑series analyses: * **`DATE_BIN(interval, timestamp, or...

(EN_WORD_COHERENCY)


[grammar] ~54-~54: There might be a mistake here.
Context: ...r bucketed aggregations (down‑sampling). * Window functions like LAG() and `LEA...

(QB_NEW_EN)


[grammar] ~113-~113: There might be a mistake here.
Context: ...Geospatial* (GEO_POINT, GEO_SHAPE) * Vectors (up to 2048 dims via HNSW inde...

(QB_NEW_EN)


[grammar] ~114-~114: There might be a mistake here.
Context: ...rs** (up to 2048 dims via HNSW indexing) * BLOBs for binary data (e.g. images, lo...

(QB_NEW_EN)


[style] ~117-~117: ‘joined together’ might be wordy. Consider a shorter alternative.
Context: ... are supported within the same table or joined together. ## Storage Optimization * **Partitio...

(EN_WORDINESS_PREMIUM_JOINED_TOGETHER)


[grammar] ~121-~121: There might be a mistake here.
Context: ...y/monthly) and sharded across a cluster. * Supports long‑term retention with perfor...

(QB_NEW_EN)


[grammar] ~122-~122: There might be a mistake here.
Context: ...ention with performant historic storage. * Columnar layout reduces storage footprin...

(QB_NEW_EN)


[grammar] ~136-~136: There might be a mistake here.
Context: ...rations: * Table creation and ingestion * Average per station * Using MAX_BY() t...

(QB_NEW_EN)


[grammar] ~137-~137: There might be a mistake here.
Context: ...tion and ingestion * Average per station * Using MAX_BY() to find highest tempera...

(QB_NEW_EN)


[grammar] ~138-~138: There might be a mistake here.
Context: ...to find highest temperature timestamps * Down-sampling using `DATE_BIN` into 4‑we...

(QB_NEW_EN)

docs/start/modelling/relational.md

[grammar] ~49-~49: There might be a mistake here.
Context: ...arity: use normalized tables with joins. * For performance in high-ingest or read-o...

(QB_NEW_EN)


[grammar] ~68-~68: There might be a mistake here.
Context: ... created_at TIMESTAMP ); ``` :::{note} CrateDB lets you query nested fields...

(QB_NEW_EN)


[grammar] ~69-~69: There might be a mistake here.
Context: ... items['name'], items['price'], etc. ::: ## Aggregations & Grouping Use fam...

(QB_NEW_EN)


[grammar] ~91-~91: There might be a mistake here.
Context: ...ced for uniqueness and data distribution * Check - enforces custom value validati...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...eck -** enforces custom value validation * Indexes – automatic index for all colu...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...exes** – automatic index for all columns * Full-text indexes - manually defined, ...

(QB_NEW_EN)


[grammar] ~148-~148: There might be a mistake here.
Context: ...n | | -------------------- | ---------------...

(QB_NEW_EN)


[grammar] ~149-~149: There might be a mistake here.
Context: ...-------------------------------------- | | Customer & Orders | Classic normali...

(QB_NEW_EN)


[grammar] ~150-~150: There might be a mistake here.
Context: ...rmalized setup with joins and filters | | Inventory Management | Products, stock...

(QB_NEW_EN)


[grammar] ~151-~151: There might be a mistake here.
Context: ...stock levels, locations | | Financial Systems | Transactions, b...

(QB_NEW_EN)


[grammar] ~152-~152: There might be a mistake here.
Context: ...ns, balances, audit logs | | User Profiles | Users, preferen...

(QB_NEW_EN)


[grammar] ~153-~153: There might be a mistake here.
Context: ...ferences, activity logs | | Multi-tenant Systems | Use schemas or ...

(QB_NEW_EN)


[grammar] ~170-~170: There might be a mistake here.
Context: ... | | ------------- | ----------------------...

(QB_NEW_EN)


[grammar] ~171-~171: There might be a mistake here.
Context: ...-------------------------------------- | | Keys & IDs | Use UUIDs or consisten...

(QB_NEW_EN)


[grammar] ~172-~172: There might be a mistake here.
Context: ...t IDs for primary keys | | Sharding | Let CrateDB auto-shard...

(QB_NEW_EN)


[grammar] ~173-~173: There might be a mistake here.
Context: ... unless you have advanced requirements | | Join Strategy | Minimize joins over la...

(QB_NEW_EN)


[grammar] ~174-~174: There might be a mistake here.
Context: ...rge, high-cardinality columns | | Nested Fields | Use `column_policy = '...

(QB_NEW_EN)


[grammar] ~175-~175: There might be a mistake here.
Context: ...dynamic'` if schema needs flexibility | | Aggregations | Favor columnar tables ...

(QB_NEW_EN)


[grammar] ~176-~176: There might be a mistake here.
Context: ...for analytical workloads | | Co-location | Consider denormalizati...

(QB_NEW_EN)


[grammar] ~181-~181: There might be a mistake here.
Context: ...esources * CrateDB Docs – Data Modeling * CrateDB Academy – Relational Modeling * ...

(QB_NEW_EN)


[grammar] ~182-~182: There might be a mistake here.
Context: ... * CrateDB Academy – Relational Modeling * Working with Joins in CrateDB * Schema D...

(QB_NEW_EN)


[grammar] ~183-~183: There might be a mistake here.
Context: ...Modeling * Working with Joins in CrateDB * Schema Design Guide ## Summary CrateDB...

(QB_NEW_EN)

docs/start/modelling/geospatial.md

[grammar] ~10-~10: There might be a mistake here.
Context: ... either a coordinate array [lon, lat] or Well-Known Text (WKT) string `'POINT (l...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...n Text (WKT) string 'POINT (lon lat)'. * Must be declared explicitly; dynamic sch...

(QB_NEW_EN)


[grammar] ~20-~20: Ensure spelling is correct
Context: ...mple Let's define a table with country boarders and capital: ```sql CREATE TABLE count...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~102-~102: There might be a mistake here.
Context: ...acking, mapping, location-layered apps). * Unified SQL platform: spatial data can...

(QB_NEW_EN)


[grammar] ~103-~103: There might be a mistake here.
Context: ...ime-series — in the same table or query. * High ingest and query throughput, suit...

(QB_NEW_EN)

docs/start/modelling/index.md

[grammar] ~15-~15: There might be a mistake here.
Context: ...AP database designed store large volumes of data, it needs a few special consider...

(QB_NEW_EN)

docs/start/modelling/json.md

[grammar] ~39-~39: There might be a mistake here.
Context: ... | | ------------- | ----------------------...

(QB_NEW_EN)


[grammar] ~40-~40: There might be a mistake here.
Context: ...-------------------------------------- | | DYNAMIC | New fields are automat...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ...tically added to the schema at runtime | | STRICT | Only explicitly define...

(QB_NEW_EN)


[grammar] ~42-~42: There might be a mistake here.
Context: ...ed fields are allowed | | IGNORED | Extra fields are store...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ...vice']['os'] = 'Android'; ``` :::{note} Dot-notation works for both explicitly a...

(QB_NEW_EN)


[grammar] ~76-~76: There might be a mistake here.
Context: ...explicitly and dynamically added fields. ::: ## Querying DYNAMIC OBJECTs To sup...

(QB_NEW_EN)


[grammar] ~198-~198: There might be a mistake here.
Context: ...ption | | ------------------ | -----------------...

(QB_NEW_EN)


[grammar] ~199-~199: There might be a mistake here.
Context: ...-------------------------------------- | | Logs & Traces | Unstructured payl...

(QB_NEW_EN)


[grammar] ~200-~200: There might be a mistake here.
Context: ...ctured payloads with flexible metadata | | Sensor & IoT Data | Variable field sc...

(QB_NEW_EN)


[grammar] ~201-~201: There might be a mistake here.
Context: ...le field schemas, nested measurements | | Product Catalogs | Specs, tags, revi...

(QB_NEW_EN)


[grammar] ~202-~202: There might be a mistake here.
Context: ... tags, reviews in varying formats | | User Profiles | Custom settings, ...

(QB_NEW_EN)


[grammar] ~203-~203: There might be a mistake here.
Context: ... settings, device info, preferences | | Telemetry / Events | Event streams wit...

(QB_NEW_EN)


[grammar] ~208-~208: There might be a mistake here.
Context: ... | | ---------------- | -------------------...

(QB_NEW_EN)


[grammar] ~209-~209: There might be a mistake here.
Context: ...-------------------------------------- | | Schema Evolution | Use DYNAMIC for f...

(QB_NEW_EN)


[grammar] ~210-~210: There might be a mistake here.
Context: ... STRICT for control | | Index Management | Avoid over-indexing...

(QB_NEW_EN)


[grammar] ~211-~211: There might be a mistake here.
Context: ...d fields | | Nested Depth | Prefer flat structu...

(QB_NEW_EN)


[grammar] ~212-~212: There might be a mistake here.
Context: ...low nesting for performance | | Column Mixing | Combine structured ...

(QB_NEW_EN)


[grammar] ~213-~213: There might be a mistake here.
Context: ...h JSON for hybrid models | | Observability | Monitor number of d...

(QB_NEW_EN)


[grammar] ~218-~218: There might be a mistake here.
Context: ...sources * CrateDB Docs – Object Columns * Working with JSON in CrateDB * CrateDB A...

(QB_NEW_EN)


[grammar] ~219-~219: There might be a mistake here.
Context: ...t Columns * Working with JSON in CrateDB * CrateDB Academy – Modeling with JSON * U...

(QB_NEW_EN)


[grammar] ~220-~220: There might be a mistake here.
Context: ...B * CrateDB Academy – Modeling with JSON * Understanding Column Policies ## Summar...

(QB_NEW_EN)

docs/start/modelling/primary-key.md

[grammar] ~23-~23: There might be a mistake here.
Context: ... Auto-generated, always-increasing value * Useful when records are timestamped anyw...

(QB_NEW_EN)


[grammar] ~44-~44: There might be a mistake here.
Context: ...ents or sources Cons * Not ordered * Harder to read/debug * No efficient rang...

(QB_NEW_EN)


[grammar] ~45-~45: There might be a mistake here.
Context: ...** * Not ordered * Harder to read/debug * No efficient range queries ### Use UUID...

(QB_NEW_EN)


[style] ~50-~50: Consider using a shorter alternative to avoid wordiness.
Context: ...t that preserves temporal ordering, making them better suited for distributed inserts and rang...

(MADE_IT_JJR)

docs/start/modelling/vector.md

[grammar] ~12-~12: There might be a mistake here.
Context: ...arrays (e.g. 768, 1024, 2048 dimensions) * Supports **HNSW (Hierarchical Navigable ...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...)** indexing for fast approximate search * Optimized for cosine, Euclidean, and dot...

(QB_NEW_EN)


[grammar] ~27-~27: There might be a mistake here.
Context: ...])` declares a fixed-size vector column. * You can ingest vectors directly or compu...

(QB_NEW_EN)


[grammar] ~72-~72: There might be a mistake here.
Context: ... | | ----------------------- | ------------...

(QB_NEW_EN)


[grammar] ~73-~73: There might be a mistake here.
Context: ...-------------------------------------- | | Semantic Search | Rank documen...

(QB_NEW_EN)


[grammar] ~74-~74: There might be a mistake here.
Context: ...stead of keywords | | Recommendation Systems | Find similar...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ..., or behaviors | | Image / Audio Retrieval | Store and co...

(QB_NEW_EN)


[grammar] ~76-~76: There might be a mistake here.
Context: ... of images/audio | | Fraud Detection | Match behavi...

(QB_NEW_EN)


[grammar] ~77-~77: There might be a mistake here.
Context: ...a vectors | | Hybrid Search | Combine vect...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...& Scaling * Vector search uses HNSW: state-of-the-art ANN algorithm with log...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...ithm with logarithmic search complexity. * CrateDB parallelizes ANN search across s...

(QB_NEW_EN)


[grammar] ~97-~97: There might be a mistake here.
Context: ...lity must be consistent for each column. ::: ## Best Practices | Area ...

(QB_NEW_EN)


[grammar] ~113-~113: There might be a mistake here.
Context: ...LangChain**: CrateDB has native drivers and REST interface * Embedding models: ...

(QB_NEW_EN)


[grammar] ~119-~119: There might be a mistake here.
Context: ...esources * CrateDB Docs – Vector Search * Blog: Using CrateDB for Hybrid Search (V...

(QB_NEW_EN)


[grammar] ~120-~120: There might be a mistake here.
Context: ...B for Hybrid Search (Vector + Full-Text) * CrateDB Academy – Vector Data * [Sample ...

(QB_NEW_EN)


[grammar] ~121-~121: There might be a mistake here.
Context: ...ll-Text) * CrateDB Academy – Vector Data * [Sample notebooks on GitHub](https://gith...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/start/modelling/fulltext.md

16-16: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/start/modelling/relational.md

34-34: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


112-112: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


120-120: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


128-128: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/start/modelling/primary-key.md

21-21: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


26-26: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


37-37: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


42-42: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


54-54: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


59-59: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


77-77: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


93-93: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


102-102: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


156-156: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


161-161: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🔇 Additional comments (5)
docs/start/modelling/primary-key.md (2)

31-36: Use native UUID type and gen_random_uuid() for v4 UUID primary keys

CrateDB supports a native UUID type, and the built-in gen_random_uuid() function returns a UUID-typed value, offering better correctness, storage efficiency, and indexing compared to storing UUIDs as TEXT. In contrast, gen_random_text_uuid() returns a TEXT representation of the UUID.

Apply this change:

- TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY
+ UUID DEFAULT gen_random_uuid() PRIMARY KEY

If you’re on a CrateDB version that only provides gen_random_text_uuid(), cast its output to UUID instead:

UUID DEFAULT gen_random_text_uuid()::UUID PRIMARY KEY

Sources:
[1] crate.io/docs/crate/reference/en/latest/sql/functions/uuid.html#gen_random_uuid
[2] crate.io/docs/crate/reference/en/latest/sql/functions/uuid.html#gen_random_text_uuid


111-116: Connection string shorthand is valid; no change required

  • File: docs/start/modelling/primary-key.md, line 111

The DSN "crate://" is a valid shorthand for crate://localhost:4200 with default credentials (cratedb.com). You may keep the existing example as-is or, if you prefer to be explicit, replace it with:

-db = records.Database("crate://")
+db = records.Database("crate://localhost:4200")

Additional connection variants follow the standard URL format crate://[username:password@]host[:port]/[database]?[options] and can illustrate auth or SSL usage:

# With authentication
db = records.Database("crate://crate:yourpassword@localhost:4200")

# With SSL
db = records.Database("crate://crate:yourpassword@localhost:4200/?ssl=true")
Source: https://cratedb.com/docs/sqlalchemy-cratedb/overview.html?utm_source=chatgpt.com
docs/start/modelling/timeseries.md (1)

52-57: Remove MAX_BY() and confirm vector limits

The MAX_BY() aggregate isn't actually supported in CrateDB. Instead of listing it here, update the docs to use one of the supported patterns for "pick the row/value with the max of another column", for example:

  • Use GREATEST()/LEAST() for pairwise comparisons.
  • Use ORDER BY <expr> DESC LIMIT 1.
  • Use first_value()/last_value() window functions over a properly ordered window.

The maximum vector dimension of 2048 is correct and may be left as-is.

Locations to update:

  • docs/start/modelling/timeseries.md, lines 52-57: remove the MAX_BY() bullet and replace it with a note about the supported alternatives.
  • docs/start/modelling/timeseries.md, lines 111-115: apply the same change to any repeated mention of MAX_BY().

Skipped due to learnings

Learnt from: amotl
PR: #241
File: docs/integrate/marquez/data_ingestion.py:0-0
Timestamp: 2025-08-11T06:54:45.486Z
Learning: CrateDB version 5.x and later supports the generate_series table function similar to PostgreSQL, accepting both numeric ranges (e.g., generate_series(1, 1000)) and timestamp ranges with interval steps. The function works with PostgreSQL-style type casts like ::TIMESTAMP and ::INTERVAL.

docs/start/modelling/json.md (1)

121-126: Nice use of ANY for tag filtering.

The tags example correctly uses ANY(tags) for scalar membership tests. No action needed.

docs/start/modelling/index.md (1)

1-13: ToC looks good and matches the new section layout.

Both toctrees are well organized with maxdepth: 1 and point to the expected pages. No action needed.

Comment on lines +11 to +17
CREATE TABLE documents (
title TEXT,
body TEXT,
INDEX ft_body USING FULLTEXT(body) WITH (analyzer = 'english')
);
```

🛠️ Refactor suggestion

Define ft_title index used in the weighted MATCH example.

The example boosts ft_title, but it isn’t created in the DDL. Add it to the table definition to keep examples consistent.

 CREATE TABLE documents (
   title       TEXT,
   body        TEXT,
+  INDEX ft_title USING FULLTEXT(title) WITH (analyzer = 'english'),
   INDEX ft_body USING FULLTEXT(body) WITH (analyzer = 'english')
 );

Also applies to: 64-69

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

16-16: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🤖 Prompt for AI Agents
In docs/start/modelling/fulltext.md around lines 11 to 16 (and also apply same
change at lines 64 to 69), the DDL creates an ft_body fulltext index but the
examples later boost ft_title in a weighted MATCH even though ft_title is not
defined; add a FULLTEXT index on the title column (e.g., INDEX ft_title USING
FULLTEXT(title) WITH (analyzer = 'english')) to the table definition so the
weighted MATCH example references a real index and keep examples consistent.

Comment on lines +21 to +23
INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english');
```

⚠️ Potential issue

Make the standalone index snippet executable.

Outside of CREATE TABLE, the statement should be CREATE INDEX ... ON <table>.

-INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english');
+CREATE INDEX ft_all ON documents USING FULLTEXT (title, body) WITH (analyzer = 'english');
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english');
```
CREATE INDEX ft_all ON documents USING FULLTEXT (title, body) WITH (analyzer = 'english');
🤖 Prompt for AI Agents
In docs/start/modelling/fulltext.md around lines 21 to 22, the standalone index
snippet is invalid SQL because it omits the CREATE INDEX ... ON <table> form;
update the snippet to use a CREATE INDEX statement that specifies the target
table (e.g., CREATE INDEX ft_all ON your_table USING FULLTEXT(title, body) WITH
(analyzer = 'english');), replacing "your_table" with the appropriate table name
used in the surrounding example so the snippet is executable.

Comment on lines +37 to +39
We can populate the table with the coordinate shape of Vienna/Austria:

⚠️ Potential issue

Correct description: polygon is for Austria (country), not Vienna.

The inserted shape is the country polygon; Vienna is the capital point.

-We can populate the table with the coordinate shape of Vienna/Austria:
+We can populate the table with the border polygon of Austria and its capital:
📝 Committable suggestion

Suggested change
We can populate the table with the coordinate shape of Vienna/Austria:
We can populate the table with the border polygon of Austria and its capital:
🤖 Prompt for AI Agents
In docs/start/modelling/geospatial.md around lines 37 to 38, the example text
incorrectly states the polygon is for Vienna; update the sentence to correctly
state that the inserted `shape` is the country polygon for Austria and that
Vienna is represented as a point (capital), e.g. replace "coordinate shape of
Vienna/Austria" with wording clarifying "country polygon for Austria (Vienna is
a point)". Ensure the wording is concise and unambiguous.

Comment on lines 117 to 184
for attempt in range(max_retries):
select_query = """
SELECT last_value, _seq_no, _primary_term
FROM sequences
WHERE name = :sequence_name;
"""
row = db.query(select_query, sequence_name=sequence_name).first()
new_value = row.last_value + 1

🛠️ Refactor suggestion

Guard against missing sequence row before OCC update.

When the sequence name is wrong or not initialized, row can be None, causing an AttributeError.

-    row = db.query(select_query, sequence_name=sequence_name).first()
+    row = db.query(select_query, sequence_name=sequence_name).first()
+    if row is None:
+        raise RuntimeError(f"Sequence '{sequence_name}' not found. Initialize it first.")
📝 Committable suggestion

Suggested change
 for attempt in range(max_retries):
     select_query = """
         SELECT last_value, _seq_no, _primary_term
         FROM sequences
         WHERE name = :sequence_name;
     """
     row = db.query(select_query, sequence_name=sequence_name).first()
+    if row is None:
+        raise RuntimeError(f"Sequence '{sequence_name}' not found. Initialize it first.")
     new_value = row.last_value + 1
🤖 Prompt for AI Agents
In docs/start/modelling/primary-key.md around lines 117 to 125, the code assumes
db.query(...).first() always returns a row; if the sequence name is missing it
can be None causing an AttributeError. Add a guard after fetching row: if row is
None, either raise a clear, descriptive error (e.g., SequenceNotFoundError with
sequence_name) or initialize the sequence row before proceeding (insert a new
row with a sensible starting last_value) and then continue; ensure subsequent
logic uses the validated/created row so the OCC update never dereferences None.

Comment on lines +63 to +67
DATE_BIN('1 hour'::interval, ts, 0) AS period,
AVG(battery['level']) AS avg_battery
FROM devices_readings
GROUP BY device_id, period

🛠️ Refactor suggestion

Use an explicit timestamp origin with DATE_BIN.

Passing 0 as origin is ambiguous. Prefer an explicit epoch origin.

-         DATE_BIN('1 hour'::interval, ts, 0) AS period,
+         DATE_BIN('1 hour'::interval, ts, TIMESTAMP WITH TIME ZONE '1970-01-01 00:00:00+00') AS period,
📝 Committable suggestion

Suggested change
-         DATE_BIN('1 hour'::interval, ts, 0) AS period,
+         DATE_BIN('1 hour'::interval, ts, TIMESTAMP WITH TIME ZONE '1970-01-01 00:00:00+00') AS period,
          AVG(battery['level']) AS avg_battery
   FROM devices_readings
   GROUP BY device_id, period
🤖 Prompt for AI Agents
In docs/start/modelling/timeseries.md around lines 63 to 66, the DATE_BIN call
uses 0 as the origin which is ambiguous; replace the numeric 0 with an explicit
timestamp origin (for example TIMESTAMP '1970-01-01 00:00:00' or
'epoch'::timestamp) so the call becomes DATE_BIN('1 hour'::interval, ts,
TIMESTAMP '1970-01-01 00:00:00') and ensure the origin literal type matches your
ts column (timestamp vs timestamptz).
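
To illustrate why the origin matters, here is a small Python model of origin-aligned binning. It is a sketch of the general technique (floor the offset from the origin to a whole number of intervals), not CrateDB's implementation; the `date_bin` helper and `EPOCH` constant are names chosen for this example.

```python
from datetime import datetime, timedelta, timezone

# Explicit, timezone-aware origin: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_bin(interval, ts, origin=EPOCH):
    """Return the start of the bin that contains ts, with bins aligned to origin."""
    # timedelta // timedelta is an integer floor division, so timestamps
    # before the origin also land in the correct (earlier) bin.
    whole_bins = (ts - origin) // interval
    return origin + whole_bins * interval


ts = datetime(2025, 8, 23, 15, 42, 7, tzinfo=timezone.utc)
hourly = date_bin(timedelta(hours=1), ts)
shifted = date_bin(timedelta(hours=1), ts, EPOCH + timedelta(minutes=30))
```

With the epoch origin, `hourly` is 15:00 UTC; moving the origin by 30 minutes moves the bin start to 15:30 UTC, which is why an explicit, correctly typed origin is easier to reason about than a bare `0`.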

@amotl amotl changed the title Data modelling: Add new section (GenAI variant) Data modelling: Add new section (GenAI, unedited) Aug 23, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (12)
docs/start/modelling/primary-key.md (2)

18-20: Fix type mismatch in timestamp PK example.

now() returns a TIMESTAMP, not a BIGINT. Either use a TIMESTAMP PK or cast to epoch explicitly. This will otherwise error at DDL-time.

Apply one of these:

- BIGINT DEFAULT now() PRIMARY KEY
+ TIMESTAMP WITH TIME ZONE DEFAULT now() PRIMARY KEY

or if you explicitly want epoch millis:

- BIGINT DEFAULT now() PRIMARY KEY
+ BIGINT DEFAULT (extract(epoch from now()) * 1000)::BIGINT PRIMARY KEY

124-126: Guard against missing sequence row before OCC update.

db.query(...).first() can return None, leading to AttributeError on row.last_value. Add a null-check and fail fast with a useful error.

-    row = db.query(select_query, sequence_name=sequence_name).first()
-    new_value = row.last_value + 1
+    row = db.query(select_query, sequence_name=sequence_name).first()
+    if row is None:
+        raise RuntimeError(f"Sequence '{sequence_name}' not found. Initialize it first.")
+    new_value = row.last_value + 1

Also applies to: 135-141

docs/start/modelling/geospatial.md (2)

38-39: Clarify description of inserted geometry.

It’s the country polygon of Austria plus the capital point (Vienna). The current text suggests a “coordinate shape of Vienna/Austria.”

-We can populate the table with the coordinate shape of Vienna/Austria:
+We can populate the table with the border polygon of Austria and its capital:

109-109: Fix typos and guidance in Best Practices table.

  • “wwithin” → “within”
  • Don’t suggest MATCH for generic geo filtering; prefer geo functions or geo MATCH depending on need.
-<td>Queries</td><td>Prefer <code>MATCH</code> for indexed filtering; use functions for precise checks</td>
+<td>Queries</td><td>For <code>geo_shape</code> indices use <code>MATCH</code> with <em>intersects/within/disjoint</em>; use scalar functions for precise checks on small sets</td>
-<td>Scale &#x26; performance</td><td>Index shapes, use distance/wwithin filters early</td>
+<td>Scale &#x26; performance</td><td>Index shapes, use distance/within filters early</td>

Geo MATCH and function behavior references. (cratedb.com)

docs/start/modelling/fulltext.md (2)

11-17: Define the ft_title index used later for weighted MATCH.

The examples boost ft_title, but it isn’t created in this DDL. Add a FULLTEXT index on title.

 CREATE TABLE documents (
   title       TEXT,
   body        TEXT,
+  INDEX ft_title USING FULLTEXT(title) WITH (analyzer = 'english'),
   INDEX ft_body USING FULLTEXT(body)  WITH (analyzer = 'english')
 );

21-23: Make the standalone index snippet valid SQL.

Outside CREATE TABLE, it must be CREATE INDEX ... ON <table>.

-INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english');
+CREATE INDEX ft_all ON documents USING FULLTEXT (title, body) WITH (analyzer = 'english');

See MATCH/fulltext reference for correct index creation semantics. (cratedb.com)

docs/start/modelling/timeseries.md (1)

64-66: Use an explicit timestamp origin in DATE_BIN; avoid 0.

0 is ambiguous and type‑unsafe. Since ts is TIMESTAMP WITH TIME ZONE, pass an explicit timestamptz origin.

-         DATE_BIN('1 hour'::interval, ts, 0) AS period,
+         DATE_BIN(INTERVAL '1 hour', ts, TIMESTAMP WITH TIME ZONE '1970-01-01 00:00:00+00') AS period,

docs/start/modelling/json.md (3)

131-134: Filter arrays of OBJECTs using ANY on projected arrays; current comparison is invalid.

specs['name'] and specs['value'] project arrays; comparing them to scalars yields a type error. Use ANY.

 SELECT *
 FROM products
 WHERE
-  specs['name'] = 'battery' AND specs['value'] = 'AA';
+  'battery' = ANY(specs['name']) AND 'AA' = ANY(specs['value']);
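
A rough Python model of what the two `ANY` predicates check, assuming (as stated above) that subscripting an array of objects projects one array per key. The `project` helper and the sample `specs` rows are made up for illustration.

```python
def project(objects, key):
    """Model of specs['key']: collect that key from every object in the array."""
    return [obj.get(key) for obj in objects]


def matches_independently(objects):
    # Mirrors: 'battery' = ANY(specs['name']) AND 'AA' = ANY(specs['value'])
    return "battery" in project(objects, "name") and "AA" in project(objects, "value")


def matches_same_element(objects):
    # Stricter check: one single object carries both name and value.
    return any(
        obj.get("name") == "battery" and obj.get("value") == "AA" for obj in objects
    )


specs = [
    {"name": "battery", "value": "AAA"},
    {"name": "charger", "value": "AA"},
]
```

For this sample, `matches_independently(specs)` is true while `matches_same_element(specs)` is false: the two `ANY` conditions are evaluated against separate projected arrays, so the docs may want to mention that they do not tie name and value to the same object.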

166-172: FULLTEXT cannot be applied to numeric fields. Replace with a TEXT field example.

cpu DOUBLE INDEX USING FULLTEXT is invalid. FULLTEXT applies to TEXT.

 CREATE TABLE metrics (
   id UUID PRIMARY KEY,
-  data OBJECT(DYNAMIC) AS (
-    cpu DOUBLE INDEX USING FULLTEXT,
-    memory DOUBLE
-  )
+  data OBJECT(DYNAMIC) AS (
+    cpu DOUBLE,
+    message TEXT INDEX USING FULLTEXT WITH (analyzer = 'english'),
+    memory DOUBLE
+  )
 );

175-179: Disable indexing in the column definition, not via bracket notation.

data['some_field'] INDEX OFF isn’t valid DDL. Declare it inside the OBJECT mapping.

-To exclude fields from indexing, set:
-
-```sql
-data['some_field'] INDEX OFF
-```
+To exclude a nested field from indexing, declare it inside the OBJECT mapping:
+
+```sql
+CREATE TABLE metrics (
+  id UUID PRIMARY KEY,
+  data OBJECT(DYNAMIC) AS (
+    some_field TEXT INDEX OFF
+  )
+);
+```
docs/start/modelling/relational.md (2)

81-82: HAVING should reference the aggregate, not the alias.

Using the alias revenue in HAVING is non‑portable.

 GROUP BY customer_id
-HAVING revenue > 1000;
+HAVING SUM(total_amount) > 1000;

92-97: Qualify default indexing behavior and fix full‑text index syntax.

  • Say “most scalar columns are indexed by default,” with type‑specific behavior and exceptions.
  • Use consistent capitalization and em dashes.
  • Correct INDEX using fulltext → INDEX USING fulltext.
-* **Primary Keys** – enforced for uniqueness and data distribution
-* **Check -** enforces custom value validation
-* **Indexes** – automatic index for all columns
-* **Full-text indexes -** manually defined, supports many tokenizers, analyzers and filters
+* **Primary keys** — enforce uniqueness and data distribution
+* **CHECK** — enforce custom value validation
+* **Indexes** — most scalar columns are indexed by default (type‑specific behavior and exceptions apply)
+* **Full‑text indexes** — manually defined; support many tokenizers, analyzers, and filters
@@
-  description TEXT INDEX using fulltext
+  description TEXT INDEX USING fulltext

Also applies to: 100-106

🧹 Nitpick comments (10)
docs/start/modelling/primary-key.md (2)

49-65: Number this subsection and mention UUIDv7’s ordering benefit.

For consistency with sections 1/2/4/5, make this “3. Use UUIDv7…”. Also briefly call out that UUIDv7 is time-ordered and improves range scans compared to v4, which aligns with the rest of the page.

-### Use UUIDv7 for Time-Ordered IDs
+### 3. Use UUIDv7 for Time-Ordered IDs

Optional one-liner after Line 51: “UUIDv7 is time-ordered, enabling efficient range queries and better locality than random UUIDv4.”
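
To make the ordering claim concrete, here is a hand-rolled sketch of the UUIDv7 bit layout from RFC 9562 (48-bit Unix millisecond timestamp, version nibble, 12 random bits, variant bits, 62 random bits). The `uuid7_sketch` name is hypothetical and this is not the generator used by CrateDB or any particular library.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-shaped value; unix_ms can be fixed for demonstration."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the top 48 bits
    value |= 0x7 << 76                            # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


earlier = uuid7_sketch(unix_ms=1_000)
later = uuid7_sketch(unix_ms=2_000)
```

Because the timestamp occupies the most significant bits, `earlier < later` holds regardless of the random part, and the same is true of their string forms. That is the property that gives time-ordered IDs better locality than random UUIDv4 values.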


146-149: Add jitter to backoff to reduce herd effects.

Exponential backoff is good; adding small random jitter avoids synchronized retries under load.

-    delay = base_delay * (2**attempt)
-    print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f} seconds...")
-    time.sleep(delay)
+    delay = base_delay * (2**attempt)
+    jitter = delay * 0.2
+    sleep_for = delay + (random.random() * jitter)
+    print(f"Attempt {attempt + 1} failed. Retrying in {sleep_for:.2f} seconds...")
+    time.sleep(sleep_for)

Note: import random at the top.
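
A self-contained sketch of the jittered backoff suggested above. The `backoff_delay` and `retry` helpers, the injectable `rng` and `sleep` parameters, and the 20% jitter ratio are assumptions made for this example so it can run without a database.

```python
import random
import time


def backoff_delay(attempt, base_delay=0.1, jitter_ratio=0.2, rng=random.random):
    """Exponential delay plus up to jitter_ratio * delay of random jitter."""
    delay = base_delay * (2 ** attempt)
    return delay + rng() * delay * jitter_ratio


def retry(operation, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Call operation() until it returns a truthy value or retries run out."""
    for attempt in range(max_retries):
        if operation():
            return True
        if attempt < max_retries - 1:
            # Spread retries out so concurrent clients do not wake in lockstep.
            sleep(backoff_delay(attempt, base_delay))
    return False
```

Passing `sleep` and `rng` as parameters keeps the helpers testable: a test can record the requested delays instead of actually sleeping.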

docs/start/modelling/vector.md (2)

28-30: Correct the explanatory bullet to match syntax and usage.

Update wording to reflect FLOAT_VECTOR(n) and how vectors are inserted.

-* `VECTOR(FLOAT[768])` declares a fixed-size vector column.
-* You can ingest vectors directly or compute them externally and store them via SQL
+* `FLOAT_VECTOR(768)` declares a fixed-size vector column (dimension = 768).
+* You can ingest vectors with SQL literals (e.g., `[0.12, ...]`) or bind parameters from your app.

91-99: Double-check HNSW tunables before documenting them.

“Tune ef_search” might not be user-configurable in CrateDB SQL today. Unless there’s a supported setting, drop or rephrase to “tune query parameters where available.”

If there is a supported knob, please add exact syntax and version.
(cratedb.com)

Also applies to: 101-111

docs/start/modelling/geospatial.md (2)

21-21: Typo: “boarders” → “borders”.

Small copy fix.

-Let's define a table with country boarders and capital:
+Let's define a table with country borders and capital:

82-85: Keep MATCH for geo (but be precise) and explain when to use functions.

CrateDB supports MATCH on geo_shape indices for spatial relations (intersects, disjoint, within). Scalar functions like within()/intersects()/distance() are exact but may bypass indexes and be slower on large sets. Please make this explicit.

Proposed tweak:

-Furthermore, it is possible to use the **match** predicate with geospatial data in queries.
-
-Note: More precise relational operations on shapes may bypass indexes and can be slower.
+Use the **MATCH** predicate on `geo_shape` indices for indexed spatial relations (`intersects`/`disjoint`/`within`). 
+Exact scalar functions (`within`, `intersects`, `distance`) provide precision but may bypass indexes and be slower on large result sets.

Official docs for geo MATCH and exact queries. (cratedb.com)

docs/start/modelling/timeseries.md (2)

8-11: Fix soft hyphen in “cardinality” and tighten wording.

There’s a stray soft hyphen in “cardin­ality” and “a variety of” is wordy.

-* Handles **high cardin­ality** and **a variety of data types**, including nested JSON, geospatial and vector data—all queryable via the same SQL statements.
+* Handles **high cardinality** and **varied data types**, including nested JSON, geospatial, and vector data—all queryable via the same SQL statements.

75-76: Heading says “interpolation” but the example only detects gaps.

Either rename the example or add an interpolation step (e.g., using LAG/LEAD or a forward‑fill). Renaming is simplest:

-**Example**: gap detection interpolation:
+**Example**: gap detection (no interpolation yet):

If you prefer to include interpolation, I can propose a portable forward‑fill snippet.

docs/start/modelling/relational.md (1)

4-4: Use British English “modelling” for consistency across the section.

Aligns with the new “Data modelling” section.

-CrateDB is a **distributed SQL database** that offers rich **relational data modeling** ...
+CrateDB is a **distributed SQL database** that offers rich **relational data modelling** ...

docs/start/modelling/index.md (1)

39-50: Normalize naming: “Time series” (with a space) to match the target page.

The topic page is titled “Time series data”; adjust the card title for consistency.

-::::{grid-item-card} Timeseries data
+::::{grid-item-card} Time series data

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b71068e and d34dfb5.

📒 Files selected for processing (10)
  • docs/index.md (4 hunks)
  • docs/start/index.md (7 hunks)
  • docs/start/modelling/fulltext.md (1 hunks)
  • docs/start/modelling/geospatial.md (1 hunks)
  • docs/start/modelling/index.md (1 hunks)
  • docs/start/modelling/json.md (1 hunks)
  • docs/start/modelling/primary-key.md (1 hunks)
  • docs/start/modelling/relational.md (1 hunks)
  • docs/start/modelling/timeseries.md (1 hunks)
  • docs/start/modelling/vector.md (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-08-23T15:15:18.693Z
Learnt from: amotl
PR: crate/cratedb-guide#235
File: docs/start/query/ai-integration.md:265-273
Timestamp: 2025-08-23T15:15:18.693Z
Learning: In CrateDB documentation, avoid using "data lake" terminology when describing CrateDB's role, as it can create misconceptions. Prefer "real-time analytics store" or similar terms that better reflect CrateDB's actual capabilities as a real-time analytics database.

Applied to files:

  • docs/start/modelling/json.md
  • docs/start/modelling/relational.md
📚 Learning: 2025-05-18T13:25:11.786Z
Learnt from: amotl
PR: crate/cratedb-guide#204
File: docs/integrate/mcp/community.md:6-8
Timestamp: 2025-05-18T13:25:11.786Z
Learning: In the CrateDB Guide documentation, particularly for MCP-related pages, the author prefers a 1-column grid layout for multiple cards to stack them vertically rather than side by side, as this is an intentional design choice.

Applied to files:

  • docs/index.md
  • docs/start/index.md
  • docs/start/modelling/index.md
📚 Learning: 2025-08-09T16:19:43.966Z
Learnt from: amotl
PR: crate/cratedb-guide#238
File: docs/integrate/azure-functions/learn.rst:1-1
Timestamp: 2025-08-09T16:19:43.966Z
Learning: In the CrateDB Guide documentation, main integration anchors (e.g., `azure-functions`) are intentionally placed in the `index.md` files of their respective integration folders, while detailed tutorials use the `-learn` suffix (e.g., `azure-functions-learn`) in their `learn.rst` or `learn.md` files. This is a deliberate architectural pattern for the documentation restructuring.

Applied to files:

  • docs/index.md
📚 Learning: 2025-08-23T20:20:42.235Z
Learnt from: amotl
PR: crate/cratedb-guide#235
File: docs/start/going-further.md:20-27
Timestamp: 2025-08-23T20:20:42.235Z
Learning: In the CrateDB Guide documentation, use British English spelling "data modelling" rather than American English "data modeling".

Applied to files:

  • docs/start/modelling/index.md
📚 Learning: 2025-04-15T22:20:30.716Z
Learnt from: amotl
PR: crate/cratedb-guide#0
File: :0-0
Timestamp: 2025-04-15T22:20:30.716Z
Learning: UUIDv7 (UUID version 7) supports efficient range queries because it generates time-ordered sequential values, unlike earlier random UUID versions. This makes UUIDv7 particularly useful for primary keys in distributed database systems like CrateDB where both uniqueness and range query performance are important.

Applied to files:

  • docs/start/modelling/primary-key.md
🪛 LanguageTool
docs/start/modelling/json.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-json)= # JSON data CrateDB combines the flexibil...

(QB_NEW_EN)


[grammar] ~40-~40: There might be a mistake here.
Context: ... | | ------------- | ----------------------...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ...-------------------------------------- | | DYNAMIC | New fields are automat...

(QB_NEW_EN)


[grammar] ~42-~42: There might be a mistake here.
Context: ...tically added to the schema at runtime | | STRICT | Only explicitly define...

(QB_NEW_EN)


[grammar] ~43-~43: There might be a mistake here.
Context: ...ed fields are allowed | | IGNORED | Extra fields are store...

(QB_NEW_EN)


[grammar] ~76-~76: There might be a mistake here.
Context: ...vice']['os'] = 'Android'; ``` :::{note} Dot-notation works for both explicitly a...

(QB_NEW_EN)


[grammar] ~77-~77: There might be a mistake here.
Context: ...explicitly and dynamically added fields. ::: ## Querying DYNAMIC OBJECTs To sup...

(QB_NEW_EN)


[grammar] ~199-~199: There might be a mistake here.
Context: ...ption | | ------------------ | -----------------...

(QB_NEW_EN)


[grammar] ~200-~200: There might be a mistake here.
Context: ...-------------------------------------- | | Logs & Traces | Unstructured payl...

(QB_NEW_EN)


[grammar] ~201-~201: There might be a mistake here.
Context: ...ctured payloads with flexible metadata | | Sensor & IoT Data | Variable field sc...

(QB_NEW_EN)


[grammar] ~202-~202: There might be a mistake here.
Context: ...le field schemas, nested measurements | | Product Catalogs | Specs, tags, revi...

(QB_NEW_EN)


[grammar] ~203-~203: There might be a mistake here.
Context: ... tags, reviews in varying formats | | User Profiles | Custom settings, ...

(QB_NEW_EN)


[grammar] ~204-~204: There might be a mistake here.
Context: ... settings, device info, preferences | | Telemetry / Events | Event streams wit...

(QB_NEW_EN)


[grammar] ~209-~209: There might be a mistake here.
Context: ... | | ---------------- | -------------------...

(QB_NEW_EN)


[grammar] ~210-~210: There might be a mistake here.
Context: ...-------------------------------------- | | Schema Evolution | Use DYNAMIC for f...

(QB_NEW_EN)


[grammar] ~211-~211: There might be a mistake here.
Context: ... STRICT for control | | Index Management | Avoid over-indexing...

(QB_NEW_EN)


[grammar] ~212-~212: There might be a mistake here.
Context: ...d fields | | Nested Depth | Prefer flat structu...

(QB_NEW_EN)


[grammar] ~213-~213: There might be a mistake here.
Context: ...low nesting for performance | | Column Mixing | Combine structured ...

(QB_NEW_EN)


[grammar] ~214-~214: There might be a mistake here.
Context: ...h JSON for hybrid models | | Observability | Monitor number of d...

(QB_NEW_EN)


[grammar] ~219-~219: There might be a mistake here.
Context: ...sources * CrateDB Docs – Object Columns * Working with JSON in CrateDB * CrateDB A...

(QB_NEW_EN)


[grammar] ~220-~220: There might be a mistake here.
Context: ...t Columns * Working with JSON in CrateDB * CrateDB Academy – Modeling with JSON * U...

(QB_NEW_EN)


[grammar] ~221-~221: There might be a mistake here.
Context: ...B * CrateDB Academy – Modeling with JSON * Understanding Column Policies ## Summar...

(QB_NEW_EN)

docs/index.md

[grammar] ~12-~12: There might be a mistake here.
Context: ...rateDB Cloud in practice. ::::{grid} 4 :padding: 0 :::{grid-item-card} Gettin...

(QB_NEW_EN)


[grammar] ~16-~16: There might be a mistake here.
Context: ... 0 :::{grid-item-card} Getting Started 🔗 getting-started :link-type: ref :...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...} Getting Started 🔗 getting-started :link-type: ref :link-alt: Getting start...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...d 🔗 getting-started :link-type: ref :link-alt: Getting started with CrateDB ...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ... :link-alt: Getting started with CrateDB :padding: 1 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...Getting started with CrateDB :padding: 1 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~34-~34: There might be a mistake here.
Context: ...link-alt: Installing CrateDB :padding: 1 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~48-~48: There might be a mistake here.
Context: ...-alt: CrateDB Administration :padding: 1 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~62-~62: There might be a mistake here.
Context: ...: CrateDB Performance Guides :padding: 1 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)

docs/start/modelling/fulltext.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-fulltext)= # Full-text data CrateDB features **nativ...

(QB_NEW_EN)


[grammar] ~126-~126: There might be a mistake here.
Context: ...and search even on high volume datasets. * Integrated SQL interface: eliminates n...

(QB_NEW_EN)


[grammar] ~134-~134: There might be a mistake here.
Context: ...cient. | | Language Support | Pick built-in analyzer matching your content...

(QB_NEW_EN)


[grammar] ~144-~144: There might be a mistake here.
Context: ... index creation, analyzers, MATCH usage. * FTS Options & Advanced Features: fuzzi...

(QB_NEW_EN)


[grammar] ~146-~146: There might be a mistake here.
Context: ...l datasets (e.g. Chicago neighborhoods). * CrateDB Community Insights: real‑world...

(QB_NEW_EN)

docs/start/modelling/timeseries.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-timeseries)= # Time series data CrateDB employs a rela...

(QB_NEW_EN)


[style] ~9-~9: Consider using a more concise synonym.
Context: .... * Handles high cardin­ality and a variety of data types, including nested JSON, ge...

(A_VARIETY_OF)


[grammar] ~9-~9: There might be a mistake here.
Context: ...l queryable via the same SQL statements. * PostgreSQL wire‑protocol compatible, s...

(QB_NEW_EN)


[uncategorized] ~53-~53: Do not mix variants of the same word (‘analyse’ and ‘analyze’) within a single text.
Context: ...L functions tailor‑made for time‑series analyses: * **`DATE_BIN(interval, timestamp, or...

(EN_WORD_COHERENCY)


[grammar] ~55-~55: There might be a mistake here.
Context: ...r bucketed aggregations (down‑sampling). * Window functions like LAG() and `LEA...

(QB_NEW_EN)


[grammar] ~114-~114: There might be a mistake here.
Context: ...Geospatial* (GEO_POINT, GEO_SHAPE) * Vectors (up to 2048 dims via HNSW inde...

(QB_NEW_EN)


[grammar] ~115-~115: There might be a mistake here.
Context: ...rs** (up to 2048 dims via HNSW indexing) * BLOBs for binary data (e.g. images, lo...

(QB_NEW_EN)


[style] ~118-~118: ‘joined together’ might be wordy. Consider a shorter alternative.
Context: ... are supported within the same table or joined together. ## Storage Optimization * **Partitio...

(EN_WORDINESS_PREMIUM_JOINED_TOGETHER)


[grammar] ~122-~122: There might be a mistake here.
Context: ...y/monthly) and sharded across a cluster. * Supports long‑term retention with perfor...

(QB_NEW_EN)


[grammar] ~123-~123: There might be a mistake here.
Context: ...ention with performant historic storage. * Columnar layout reduces storage footprin...

(QB_NEW_EN)


[grammar] ~137-~137: There might be a mistake here.
Context: ...rations: * Table creation and ingestion * Average per station * Using MAX_BY() t...

(QB_NEW_EN)


[grammar] ~138-~138: There might be a mistake here.
Context: ...tion and ingestion * Average per station * Using MAX_BY() to find highest tempera...

(QB_NEW_EN)


[grammar] ~139-~139: There might be a mistake here.
Context: ...to find highest temperature timestamps * Down-sampling usingDATE_BIN` into 4‑we...

(QB_NEW_EN)

docs/start/index.md

[grammar] ~21-~21: There might be a mistake here.
Context: ...rst steps with CrateDB :columns: 6 3 3 3 :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~35-~35: There might be a mistake here.
Context: ...lt: Connect to CrateDB :columns: 6 3 3 3 :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~49-~49: There might be a mistake here.
Context: ...lt: Query Capabilities :columns: 6 3 3 3 :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~63-~63: There might be a mistake here.
Context: ...nk-alt: Ingesting Data :columns: 6 3 3 3 :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~85-~85: There might be a mistake here.
Context: ...t: Sample Applications :columns: 6 3 3 3 :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~99-~99: There might be a mistake here.
Context: ...ink-alt: Going Further :columns: 6 3 3 3 :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)

docs/start/modelling/geospatial.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-geospatial)= # Geospatial data CrateDB supports **real...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ... either a coordinate array [lon, lat] or Well-Known Text (WKT) string `'POINT (l...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...n Text (WKT) string 'POINT (lon lat)'. * Must be declared explicitly; dynamic sch...

(QB_NEW_EN)


[grammar] ~21-~21: Ensure spelling is correct
Context: ...mple Let's define a table with country boarders and capital: ```sql CREATE TABLE count...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~103-~103: There might be a mistake here.
Context: ...acking, mapping, location-layered apps). * Unified SQL platform: spatial data can...

(QB_NEW_EN)


[grammar] ~104-~104: There might be a mistake here.
Context: ...ime-series — in the same table or query. * High ingest and query throughput, suit...

(QB_NEW_EN)

docs/start/modelling/index.md

[grammar] ~1-~1: There might be a mistake here.
Context: (modelling)= (data-modelling)= # Data modelling :::{...

(QB_NEW_EN)


[grammar] ~2-~2: There might be a mistake here.
Context: (modelling)= (data-modelling)= # Data modelling :::{div} sd-text-muted C...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ...ent data types. ::: :::::{grid} 2 3 3 3 :padding: 0 :class-container: installati...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...es. ::: :::::{grid} 2 3 3 3 :padding: 0 :class-container: installation-grid :::...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...id ::::{grid-item-card} Relational data 🔗 model-relational :link-type: ref ...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ... Relational data 🔗 model-relational :link-type: ref :link-alt: Relational da...

(QB_NEW_EN)


[grammar] ~15-~15: There might be a mistake here.
Context: ... 🔗 model-relational :link-type: ref :link-alt: Relational data :padding: 3 :...

(QB_NEW_EN)


[grammar] ~16-~16: There might be a mistake here.
Context: ...ink-type: ref :link-alt: Relational data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...f :link-alt: Relational data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...nal data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}table-list...

(QB_NEW_EN)


[grammar] ~23-~23: There might be a mistake here.
Context: ...:class-title: sd-fs-6 {fas}table-list :::: ::::{grid-item-card} JSON data :li...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...st` :::: ::::{grid-item-card} JSON data 🔗 model-json :link-type: ref :link-...

(QB_NEW_EN)


[grammar] ~27-~27: There might be a mistake here.
Context: ...d-item-card} JSON data 🔗 model-json :link-type: ref :link-alt: JSON data :pa...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...N data 🔗 model-json :link-type: ref :link-alt: JSON data :padding: 3 :text-a...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...son :link-type: ref :link-alt: JSON data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~30-~30: There might be a mistake here.
Context: ...pe: ref :link-alt: JSON data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~31-~31: There might be a mistake here.
Context: ...SON data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~32-~32: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~33-~33: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}file-lines...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ...:class-title: sd-fs-6 {fas}file-lines :::: ::::{grid-item-card} Timeseries da...

(QB_NEW_EN)


[grammar] ~39-~39: There might be a mistake here.
Context: ...:: ::::{grid-item-card} Timeseries data 🔗 model-timeseries :link-type: ref ...

(QB_NEW_EN)


[grammar] ~40-~40: There might be a mistake here.
Context: ... Timeseries data 🔗 model-timeseries :link-type: ref :link-alt: Timeseries da...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ... 🔗 model-timeseries :link-type: ref :link-alt: Timeseries data :padding: 3 :...

(QB_NEW_EN)


[grammar] ~42-~42: There might be a mistake here.
Context: ...ink-type: ref :link-alt: Timeseries data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~43-~43: There might be a mistake here.
Context: ...f :link-alt: Timeseries data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~44-~44: There might be a mistake here.
Context: ...ies data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~45-~45: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~46-~46: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}timeline :...

(QB_NEW_EN)


[grammar] ~49-~49: There might be a mistake here.
Context: ...1 :class-title: sd-fs-6 {fas}timeline :::: ::::{grid-item-card} Geospatial da...

(QB_NEW_EN)


[grammar] ~52-~52: There might be a mistake here.
Context: ...:: ::::{grid-item-card} Geospatial data 🔗 model-geospatial :link-type: ref ...

(QB_NEW_EN)


[grammar] ~53-~53: There might be a mistake here.
Context: ... Geospatial data 🔗 model-geospatial :link-type: ref :link-alt: Geospatial da...

(QB_NEW_EN)


[grammar] ~54-~54: There might be a mistake here.
Context: ... 🔗 model-geospatial :link-type: ref :link-alt: Geospatial data :padding: 3 :...

(QB_NEW_EN)


[grammar] ~55-~55: There might be a mistake here.
Context: ...ink-type: ref :link-alt: Geospatial data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~56-~56: There might be a mistake here.
Context: ...f :link-alt: Geospatial data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~57-~57: There might be a mistake here.
Context: ...ial data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~58-~58: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~59-~59: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}globe ::::...

(QB_NEW_EN)


[grammar] ~62-~62: There might be a mistake here.
Context: ...fs-1 :class-title: sd-fs-6 {fas}globe :::: ::::{grid-item-card} Fulltext data...

(QB_NEW_EN)


[grammar] ~65-~65: There might be a mistake here.
Context: ...:::: ::::{grid-item-card} Fulltext data 🔗 model-fulltext :link-type: ref :l...

(QB_NEW_EN)


[grammar] ~66-~66: There might be a mistake here.
Context: ...ard} Fulltext data 🔗 model-fulltext :link-type: ref :link-alt: Fulltext data...

(QB_NEW_EN)


[grammar] ~67-~67: There might be a mistake here.
Context: ...ta 🔗 model-fulltext :link-type: ref :link-alt: Fulltext data :padding: 3 :te...

(QB_NEW_EN)


[grammar] ~68-~68: There might be a mistake here.
Context: ...:link-type: ref :link-alt: Fulltext data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~69-~69: There might be a mistake here.
Context: ...ref :link-alt: Fulltext data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~70-~70: There might be a mistake here.
Context: ...ext data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~71-~71: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~72-~72: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}font :::: ...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ...-fs-1 :class-title: sd-fs-6 {fas}font :::: ::::{grid-item-card} Vector data :...

(QB_NEW_EN)


[grammar] ~78-~78: There might be a mistake here.
Context: ...` :::: ::::{grid-item-card} Vector data 🔗 model-vector :link-type: ref :lin...

(QB_NEW_EN)


[grammar] ~79-~79: There might be a mistake here.
Context: ...em-card} Vector data 🔗 model-vector :link-type: ref :link-alt: Vector data :...

(QB_NEW_EN)


[grammar] ~80-~80: There might be a mistake here.
Context: ...data 🔗 model-vector :link-type: ref :link-alt: Vector data :padding: 3 :text...

(QB_NEW_EN)


[grammar] ~81-~81: There might be a mistake here.
Context: ...r :link-type: ref :link-alt: Vector data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...: ref :link-alt: Vector data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~83-~83: There might be a mistake here.
Context: ...tor data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~84-~84: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~85-~85: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}lightbulb ...

(QB_NEW_EN)


[grammar] ~88-~88: There might be a mistake here.
Context: ... :class-title: sd-fs-6 {fas}lightbulb :::: ::::: ```{toctree} :maxdepth: 1 ...

(QB_NEW_EN)


[grammar] ~106-~106: There might be a mistake here.
Context: ...or ``` :::{rubric} Implementation notes ::: Because CrateDB is a distributed an...

(QB_NEW_EN)


[grammar] ~109-~109: There might be a mistake here.
Context: ...ytical database (OLAP) designed to store large volumes of data, users need to con...

(QB_NEW_EN)


[grammar] ~110-~110: Ensure spelling is correct
Context: ... to consider certain details compared to traditonal RDBMS. :::{card} Primary key strategi...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/start/modelling/primary-key.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-primary-key)= # Primary key strategies CrateDB is built...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ... Auto-generated, always-increasing value * Useful when records are timestamped anyw...

(QB_NEW_EN)


[grammar] ~45-~45: There might be a mistake here.
Context: ...ents or sources Cons * Not ordered * Harder to read/debug * No efficient rang...

(QB_NEW_EN)


[grammar] ~46-~46: There might be a mistake here.
Context: ...** * Not ordered * Harder to read/debug * No efficient range queries ### Use UUID...

(QB_NEW_EN)


[style] ~51-~51: Consider using a shorter alternative to avoid wordiness.
Context: ...t that preserves temporal ordering, making them better suited for distributed inserts and rang...

(MADE_IT_JJR)

docs/start/modelling/relational.md

[grammar] ~50-~50: There might be a mistake here.
Context: ...arity: use normalized tables with joins. * For performance in high-ingest or read-o...

(QB_NEW_EN)


[grammar] ~69-~69: There might be a mistake here.
Context: ... created_at TIMESTAMP ); ``` :::{note} CrateDB lets you query nested fields...

(QB_NEW_EN)


[grammar] ~70-~70: There might be a mistake here.
Context: ... items['name'], items['price'], etc. ::: ## Aggregations & Grouping Use fam...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...ced for uniqueness and data distribution * Check - enforces custom value validati...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...eck -** enforces custom value validation * Indexes – automatic index for all colu...

(QB_NEW_EN)


[grammar] ~94-~94: There might be a mistake here.
Context: ...exes** – automatic index for all columns * Full-text indexes - manually defined, ...

(QB_NEW_EN)


[grammar] ~149-~149: There might be a mistake here.
Context: ...n | | -------------------- | ---------------...

(QB_NEW_EN)


[grammar] ~150-~150: There might be a mistake here.
Context: ...-------------------------------------- | | Customer & Orders | Classic normali...

(QB_NEW_EN)


[grammar] ~151-~151: There might be a mistake here.
Context: ...rmalized setup with joins and filters | | Inventory Management | Products, stock...

(QB_NEW_EN)


[grammar] ~152-~152: There might be a mistake here.
Context: ...stock levels, locations | | Financial Systems | Transactions, b...

(QB_NEW_EN)


[grammar] ~153-~153: There might be a mistake here.
Context: ...ns, balances, audit logs | | User Profiles | Users, preferen...

(QB_NEW_EN)


[grammar] ~154-~154: There might be a mistake here.
Context: ...ferences, activity logs | | Multi-tenant Systems | Use schemas or ...

(QB_NEW_EN)


[grammar] ~171-~171: There might be a mistake here.
Context: ... | | ------------- | ----------------------...

(QB_NEW_EN)


[grammar] ~172-~172: There might be a mistake here.
Context: ...-------------------------------------- | | Keys & IDs | Use UUIDs or consisten...

(QB_NEW_EN)


[grammar] ~173-~173: There might be a mistake here.
Context: ...t IDs for primary keys | | Sharding | Let CrateDB auto-shard...

(QB_NEW_EN)


[grammar] ~174-~174: There might be a mistake here.
Context: ... unless you have advanced requirements | | Join Strategy | Minimize joins over la...

(QB_NEW_EN)


[grammar] ~175-~175: There might be a mistake here.
Context: ...rge, high-cardinality columns | | Nested Fields | Use `column_policy = '...

(QB_NEW_EN)


[grammar] ~176-~176: There might be a mistake here.
Context: ...dynamic'` if schema needs flexibility | | Aggregations | Favor columnar tables ...

(QB_NEW_EN)


[grammar] ~177-~177: There might be a mistake here.
Context: ...for analytical workloads | | Co-location | Consider denormalizati...

(QB_NEW_EN)


[grammar] ~182-~182: There might be a mistake here.
Context: ...esources * CrateDB Docs – Data Modeling * CrateDB Academy – Relational Modeling * ...

(QB_NEW_EN)


[grammar] ~183-~183: There might be a mistake here.
Context: ... * CrateDB Academy – Relational Modeling * Working with Joins in CrateDB * Schema D...

(QB_NEW_EN)


[grammar] ~184-~184: There might be a mistake here.
Context: ...Modeling * Working with Joins in CrateDB * Schema Design Guide ## Summary CrateDB...

(QB_NEW_EN)

docs/start/modelling/vector.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-vector)= # Vector data CrateDB natively supports *...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...arrays (e.g. 768, 1024, 2048 dimensions) * Supports **HNSW (Hierarchical Navigable ...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ...)** indexing for fast approximate search * Optimized for cosine, Euclidean, and dot...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...])` declares a fixed-size vector column. * You can ingest vectors directly or compu...

(QB_NEW_EN)


[grammar] ~73-~73: There might be a mistake here.
Context: ... | | ----------------------- | ------------...

(QB_NEW_EN)


[grammar] ~74-~74: There might be a mistake here.
Context: ...-------------------------------------- | | Semantic Search | Rank documen...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ...stead of keywords | | Recommendation Systems | Find similar...

(QB_NEW_EN)


[grammar] ~76-~76: There might be a mistake here.
Context: ..., or behaviors | | Image / Audio Retrieval | Store and co...

(QB_NEW_EN)


[grammar] ~77-~77: There might be a mistake here.
Context: ... of images/audio | | Fraud Detection | Match behavi...

(QB_NEW_EN)


[grammar] ~78-~78: There might be a mistake here.
Context: ...a vectors | | Hybrid Search | Combine vect...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...& Scaling * Vector search uses HNSW: state-of-the-art ANN algorithm with log...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...ithm with logarithmic search complexity. * CrateDB parallelizes ANN search across s...

(QB_NEW_EN)


[grammar] ~98-~98: There might be a mistake here.
Context: ...lity must be consistent for each column. ::: ## Best Practices | Area ...

(QB_NEW_EN)


[grammar] ~114-~114: There might be a mistake here.
Context: ...LangChain**: CrateDB has native drivers and REST interface * Embedding models: ...

(QB_NEW_EN)


[grammar] ~120-~120: There might be a mistake here.
Context: ...esources * CrateDB Docs – Vector Search * Blog: Using CrateDB for Hybrid Search (V...

(QB_NEW_EN)


[grammar] ~121-~121: There might be a mistake here.
Context: ...B for Hybrid Search (Vector + Full-Text) * CrateDB Academy – Vector Data * [Sample ...

(QB_NEW_EN)


[grammar] ~122-~122: There might be a mistake here.
Context: ...ll-Text) * CrateDB Academy – Vector Data * [Sample notebooks on GitHub](https://gith...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/start/modelling/relational.md

35-35: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


113-113: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


121-121: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


129-129: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (4)
docs/index.md (1)

12-12: LGTM: grid simplification to 4 columns.

The top grid now uses a single “4” layout; cards look consistent with the new :class-title: sizing.

docs/start/modelling/vector.md (1)

13-16: Adjust “similarity metrics” claim.

Current CrateDB docs describe Euclidean-based scoring for vector search via Lucene HNSW; there’s no documented cosine/dot‑product selector. Please revise to avoid overstating supported metrics.

Suggested reword:

-* Optimized for cosine, Euclidean, and dot-product similarity
+* Uses Lucene’s HNSW ANN with Euclidean-distance–based scoring (see VECTOR_SIMILARITY)

Docs and blog show Euclidean scoring with KNN_MATCH/VECTOR_SIMILARITY. If cosine/dot-product were introduced recently, add the version and DDL/usage details. (cratedb.com)
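
To make "Euclidean-distance-based scoring" concrete, here is a minimal Python sketch. It assumes the score follows the convention Lucene uses for Euclidean similarity, 1 / (1 + squared distance). That formula and the helper name `euclidean_score` are assumptions for illustration and should be checked against the CrateDB documentation for the version in use.

```python
def euclidean_score(a, b):
    """Map squared Euclidean distance to a score in (0, 1].

    Assumption: score = 1 / (1 + d^2), the convention Lucene uses for
    Euclidean similarity. Verify against the CrateDB docs before relying on it.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared_distance)


query = [1.0, 0.0]
print(euclidean_score(query, [1.0, 0.0]))  # identical vectors
print(euclidean_score(query, [0.0, 1.0]))  # orthogonal unit vectors
```

Under this assumption, closer vectors score higher and identical vectors get the maximum score. No cosine or dot-product selector is involved.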

docs/start/modelling/timeseries.md (1)

55-58: Replace incorrect MAX_BY with CrateDB’s supported ARG_MAX/ARG_MIN

The docs currently reference a non-existent MAX_BY() function. CrateDB provides ARG_MAX(arg, value) and ARG_MIN(arg, value) instead, which return the argument corresponding to the maximum or minimum of another column.

Please update the snippet in docs/start/modelling/timeseries.md (lines 55–58) as follows:

  • Replace the third bullet to use ARG_MAX (and mention ARG_MIN if appropriate).
  • Optionally, note the minimum CrateDB version these functions require (confirm if any).

Proposed diff:

- * **`MAX_BY()`** returns the value from one column matching the min/max value of another column in a group.
+ * **`ARG_MAX(arg, value)`** returns the value of `arg` corresponding to the maximum `value` in a group.
+ * **`ARG_MIN(arg, value)`** returns the value of `arg` corresponding to the minimum `value` in a group.

Likely an incorrect or invalid review comment.

docs/start/index.md (1)

119-119: Nice addition to the toctree; improves discoverability of the new section.

The new Data modelling hub is now linked from Getting Started.

Comment on lines +24 to +26
* `gen_random_text_uuid()`, `now()` or `current_timestamp()` recommended for primary keys in distributed environments
* Default **replication**, **sharding**, and **partitioning** options are built-in for scale


🛠️ Refactor suggestion

Don’t recommend timestamps as primary keys.

Suggest removing now()/current_timestamp() as PK recommendations. Time‑based keys concentrate writes, risk collisions, and hinder distribution. Prefer UUIDs or application‑provided, uniformly distributed IDs.

-* `gen_random_text_uuid()`, `now()` or `current_timestamp()` recommended for primary keys in distributed environments
+* Prefer `gen_random_text_uuid()` (or application‑provided uniformly distributed IDs) for primary keys in distributed environments
🤖 Prompt for AI Agents
In docs/start/modelling/relational.md around lines 24 to 26, remove
now()/current_timestamp() from the primary key recommendation and instead
recommend only gen_random_text_uuid() or application-provided uniformly
distributed IDs; update the bullet to drop time-based PK functions and
explicitly state “prefer UUIDs or uniformly distributed IDs” in place of
timestamps.

Comment on lines +112 to +119
You can also store:

* **Geospatial** (`GEO_POINT`, `GEO_SHAPE`)
* **Vectors** (up to 2048 dims via HNSW indexing)
* **BLOBs** for binary data (e.g. images, logs)

All types are supported within the same table or joined together.


⚠️ Potential issue

Correct statement about BLOBs; blob data isn’t stored in regular tables or joinable via SQL.

CrateDB’s BLOBs live in dedicated blob tables with a separate API. They aren’t stored “within the same table” nor joinable via SQL. Recommend clarifying to avoid misleading readers.

-You can also store:
+You can also store:
@@
-* **BLOBs** for binary data (e.g. images, logs)
+* **BLOBs** in dedicated blob tables for large binary objects (e.g., images, logs). Store content addresses (digests/URLs) in relational tables when you need to reference blobs.
@@
-All types are supported within the same table or joined together.
+All listed SQL types can co‑exist in one table; blob data resides in separate blob tables and is referenced from relational rows as needed.
🧰 Tools
🪛 LanguageTool

[grammar] ~114-~114: There might be a mistake here.
Context: ...Geospatial* (GEO_POINT, GEO_SHAPE) * Vectors (up to 2048 dims via HNSW inde...

(QB_NEW_EN)


[grammar] ~115-~115: There might be a mistake here.
Context: ...rs** (up to 2048 dims via HNSW indexing) * BLOBs for binary data (e.g. images, lo...

(QB_NEW_EN)


[style] ~118-~118: ‘joined together’ might be wordy. Consider a shorter alternative.
Context: ... are supported within the same table or joined together. ## Storage Optimization * **Partitio...

(EN_WORDINESS_PREMIUM_JOINED_TOGETHER)

🤖 Prompt for AI Agents
In docs/start/modelling/timeseries.md around lines 112 to 119, the statement
incorrectly implies BLOBs are stored in regular tables and can be joined via
SQL; update the copy to state that BLOBs are stored in dedicated blob tables and
accessed via the blob API (not in regular tables and not joinable via SQL),
while keeping that geospatial and vector types can be stored in the same table;
replace the misleading sentence with a clear distinction between the three cases
and add a short note about the separate blob API.

Comment on lines +19 to +26
```sql
CREATE TABLE documents (
id UUID PRIMARY KEY,
title TEXT,
content TEXT,
embedding VECTOR(FLOAT[768])
);
```

⚠️ Potential issue

Use the correct CrateDB vector type syntax.

CrateDB’s type is FLOAT_VECTOR(n), not VECTOR(FLOAT[n]). Update the DDL accordingly.

 CREATE TABLE documents (
   id UUID PRIMARY KEY,
   title TEXT,
   content TEXT,
-  embedding VECTOR(FLOAT[768])
+  embedding FLOAT_VECTOR(768)
 );

Reference docs show FLOAT_VECTOR(n) with example DDL. (cratedb.com)

🤖 Prompt for AI Agents
In docs/start/modelling/vector.md around lines 19 to 26, the CREATE TABLE DDL
uses the wrong CrateDB vector type syntax (`VECTOR(FLOAT[768])`); change the
column type to CrateDB's correct form `FLOAT_VECTOR(768)` in the DDL so the
example matches CrateDB docs and will parse correctly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (15)
docs/start/modelling/primary-key.md (2)

18-20: Fix type mismatch: now() returns TIMESTAMP, not BIGINT.

As written, this snippet won’t parse because now() is a TIMESTAMP expression. Either make the PK a TIMESTAMP, or convert now() to an epoch BIGINT.

Option A — use TIMESTAMP PK:

-```sql
-BIGINT DEFAULT now() PRIMARY KEY
-```
+```sql
+TIMESTAMP WITH TIME ZONE DEFAULT now() PRIMARY KEY
+```

Option B — keep BIGINT PK using epoch millis:

-```sql
-BIGINT DEFAULT now() PRIMARY KEY
-```
+```sql
+BIGINT DEFAULT (extract(epoch FROM now()) * 1000)::BIGINT PRIMARY KEY
+```

118-126: Guard against missing sequence row before OCC update.

db.query(...).first() can return None. Accessing row.last_value would raise an AttributeError. Fail fast with a clear error.

     row = db.query(select_query, sequence_name=sequence_name).first()
-    new_value = row.last_value + 1
+    if row is None:
+        raise RuntimeError(f"Sequence '{sequence_name}' not found. Initialize it first.")
+    new_value = row.last_value + 1

Also applies to: 135-145
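
A minimal, self-contained sketch of the guarded read-then-update step described above. The `FakeSequenceStore` class is a hypothetical in-memory stand-in for the sequences table, not CrateDB's API or the PR's actual code. Only the `None` check and the version-based update mirror the suggestion.

```python
class FakeSequenceStore:
    """Hypothetical in-memory stand-in for a `sequences` table."""

    def __init__(self):
        # sequence_name -> (last_value, version)
        self._rows = {}

    def create(self, name, start=0):
        self._rows[name] = (start, 1)

    def read(self, name):
        # Returns None when the row is missing, like `.first()` on no results.
        return self._rows.get(name)

    def update_if_version(self, name, new_value, expected_version):
        # Optimistic concurrency control: write only if the version is unchanged.
        row = self._rows.get(name)
        if row is None or row[1] != expected_version:
            return False
        self._rows[name] = (new_value, expected_version + 1)
        return True


def next_value(store, sequence_name, max_retries=5):
    for _ in range(max_retries):
        row = store.read(sequence_name)
        if row is None:
            # Fail fast with a clear error instead of an AttributeError on None.
            raise RuntimeError(
                f"Sequence '{sequence_name}' not found. Initialize it first."
            )
        last_value, version = row
        new_value = last_value + 1
        if store.update_if_version(sequence_name, new_value, version):
            return new_value
    raise RuntimeError(f"Could not update sequence '{sequence_name}' after retries.")


store = FakeSequenceStore()
store.create("orders")
print(next_value(store, "orders"))
```

The guard sits before any attribute or tuple access, so a missing sequence is reported by name rather than surfacing as an unrelated exception further down.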

docs/start/modelling/fulltext.md (2)

12-16: Define ft_title since it’s boosted later.

The weighted MATCH example references ft_title but the table only defines ft_body. Add ft_title.

 CREATE TABLE documents (
   title       TEXT,
   body        TEXT,
+  INDEX ft_title USING FULLTEXT (title) WITH (analyzer = 'english'),
   INDEX ft_body USING FULLTEXT (body) WITH (analyzer = 'english')
 );

22-23: Make the composite index snippet executable.

Outside CREATE TABLE you must use CREATE INDEX ... ON <table>.

-INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english');
+CREATE INDEX ft_all ON documents USING FULLTEXT (title, body) WITH (analyzer = 'english');
docs/start/modelling/geospatial.md (3)

38-39: Clarify description: polygon is for Austria (country), not Vienna.

The inserted shape is the Austria border polygon; Vienna is the capital point.

-We can populate the table with the coordinate shape of Vienna/Austria:
+We can populate the table with the border polygon of Austria and its capital:

82-84: Remove MATCH reference; use geo functions for spatial filtering.

MATCH is for full‑text search. For geo, rely on within(), intersects(), and distance() which leverage geo indexes.

-Furthermore, it is possible to use the **match** predicate with geospatial data in queries.
-
-Note: More precise relational operations on shapes may bypass indexes and can be slower.
+Use geospatial functions such as `within()`, `intersects()`, and `distance()`; they leverage the geospatial indexes.
+Note: Highly precise relational operations on complex shapes may bypass indexes and be slower.

109-109: Fix typos and guidance in Best Practices.

Also avoid recommending MATCH for geo filtering.

-<td>Queries</td><td>Prefer <code>MATCH</code> for indexed filtering; use functions for precise checks</td>
+<td>Queries</td><td>Use <code>within()</code>, <code>intersects()</code>, and <code>distance()</code> for indexed spatial filtering</td>
-<td>Scale &#x26; performance</td><td>Index shapes, use distance/wwithin filters early</td>
+<td>Scale &#x26; performance</td><td>Index shapes, use distance/within filters early</td>
docs/start/modelling/json.md (3)

131-134: Filter arrays of OBJECTs with ANY on projected arrays.

specs['name'] and specs['value'] are arrays; comparing arrays to scalars is invalid. Use ANY.

 SELECT *
 FROM products
 WHERE
-  specs['name'] = 'battery' AND specs['value'] = 'AA';
+  'battery' = ANY(specs['name']) AND 'AA' = ANY(specs['value']);
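
A small Python model of why the rewrite is needed. It assumes, as the comment above states, that subscripting an array of objects yields an array of the projected values. The sample data and the helper names `project` and `any_equals` are hypothetical.

```python
def project(objects, key):
    """Model of specs['key'] on an array of objects: one value per element."""
    return [obj.get(key) for obj in objects]


def any_equals(value, array):
    """Model of `value = ANY(array)`."""
    return any(value == item for item in array)


specs = [
    {"name": "battery", "value": "AA"},
    {"name": "color", "value": "red"},
]

names = project(specs, "name")
values = project(specs, "value")

# In this model, comparing the whole projected array to a scalar is False.
print(names == "battery")
# ANY checks each element of the projected array instead.
print(any_equals("battery", names) and any_equals("AA", values))
```

Note that the two `ANY` conditions are evaluated independently: a row whose `name` is `battery` in one element and whose `value` is `AA` in a different element also satisfies the filter.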

166-172: FULLTEXT indexes apply to TEXT fields, not numeric.

cpu DOUBLE INDEX USING FULLTEXT is invalid. Use TEXT for full‑text, keep numeric fields numeric.

 CREATE TABLE metrics (
   id UUID PRIMARY KEY,
   data OBJECT(DYNAMIC) AS (
-    cpu DOUBLE INDEX USING FULLTEXT,
-    memory DOUBLE
+    cpu DOUBLE,
+    message TEXT INDEX USING FULLTEXT WITH (analyzer = 'english'),
+    memory DOUBLE
   )
 );

175-179: Disable indexing inside the OBJECT mapping, not via bracket notation.

data['some_field'] INDEX OFF isn’t valid DDL. Show a correct declaration.

-To exclude fields from indexing, set:
-
-```sql
-data['some_field'] INDEX OFF
-```
+To exclude a nested field from indexing, declare it with `INDEX OFF` inside the object mapping:
+
+```sql
+CREATE TABLE metrics (
+  id UUID PRIMARY KEY,
+  data OBJECT(DYNAMIC) AS (
+    some_field TEXT INDEX OFF
+  )
+);
+```
docs/start/modelling/relational.md (3)

23-26: Don’t recommend timestamps as primary keys. Prefer UUIDs.

Time-based keys hotspot writes and hurt distribution.

-* `gen_random_text_uuid()`, `now()` or `current_timestamp()` recommended for primary keys in distributed environments
+* Prefer `gen_random_text_uuid()` (or application-provided uniformly distributed IDs) for primary keys in distributed environments

73-82: HAVING should reference an aggregate, not the alias (dialect-dependent).

Use the aggregate expression for portability.

 SELECT customer_id, COUNT(*) AS num_orders, SUM(total_amount) AS revenue
 FROM orders
 GROUP BY customer_id
-HAVING revenue > 1000;
+HAVING SUM(total_amount) > 1000;

88-106: Tighten constraint/index wording and fix full‑text index syntax.

  • Use consistent em dashes.
  • Qualify default indexing behavior (not “automatic index for all columns”).
  • Correct INDEX using fulltext → INDEX USING fulltext.
-* **Primary Keys** – enforced for uniqueness and data distribution
-* **Check -** enforces custom value validation
-* **Indexes** – automatic index for all columns
-* **Full-text indexes -** manually defined, supports many tokenizers, analyzers and filters
+* **Primary keys** — enforce uniqueness and data distribution
+* **CHECK** — enforce custom value validation
+* **Indexes** — most scalar columns are indexed by default (type-specific behaviors and exceptions apply)
+* **Full‑text indexes** — manually defined; support many tokenizers, analyzers, and filters
@@
-```sql
+```sql
 CREATE TABLE products (
   id TEXT PRIMARY KEY,
   name TEXT,
   price DOUBLE CHECK (price >= 0),
   tag TEXT INDEX OFF,
-  description TEXT INDEX using fulltext
+  description TEXT INDEX USING fulltext
 );

docs/start/modelling/timeseries.md (2)

62-66: Use an explicit timestamp origin with DATE_BIN.

Passing 0 is ambiguous; provide a clear epoch.

 WITH avg_metrics AS (
   SELECT device_id,
-         DATE_BIN('1 hour'::interval, ts, 0) AS period,
+         DATE_BIN('1 hour'::interval, ts, TIMESTAMP WITH TIME ZONE '1970-01-01 00:00:00+00') AS period,
          AVG(battery['level']) AS avg_battery
   FROM devices_readings
   GROUP BY device_id, period
 )
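
To illustrate why the origin matters, here is a minimal Python sketch of the floor-to-bucket arithmetic that DATE_BIN-style functions perform. The `date_bin` helper below is hypothetical and only mirrors the general semantics (bucket boundaries are counted from the origin); it is not CrateDB's implementation.

```python
from datetime import datetime, timedelta, timezone


def date_bin(interval: timedelta, ts: datetime, origin: datetime) -> datetime:
    """Return the start of the bucket containing ts, with buckets aligned to origin."""
    # Floor division of two timedeltas yields the number of whole intervals.
    buckets = (ts - origin) // interval
    return origin + buckets * interval


epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
reading = datetime(2025, 8, 7, 10, 42, 15, tzinfo=timezone.utc)

# Buckets aligned to the Unix epoch start on the full hour.
on_the_hour = date_bin(timedelta(hours=1), reading, epoch)

# Shifting the origin by 30 minutes shifts every bucket boundary with it.
shifted = date_bin(timedelta(hours=1), reading, epoch + timedelta(minutes=30))
```

An explicit origin makes the bucket alignment visible to the reader instead of relying on how a bare `0` is interpreted.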

114-119: Clarify BLOB storage; not in regular tables nor joinable via SQL.

Corrects a misleading statement.

 You can also store:
@@
-* **Geospatial** (`GEO_POINT`, `GEO_SHAPE`)
-* **Vectors** (up to 2048 dims via HNSW indexing)
-* **BLOBs** for binary data (e.g. images, logs)
+* **Geospatial** (`GEO_POINT`, `GEO_SHAPE`)
+* **Vectors** (up to 2048 dims via HNSW indexing)
+* **BLOBs** in dedicated blob tables for large binary objects (e.g., images, logs). Store content addresses (digests/URLs) in relational tables when you need to reference blobs.
@@
-All types are supported within the same table or joined together.
+All listed SQL types can co‑exist in one table; blob data resides in separate blob tables and is referenced from relational rows as needed.
🧹 Nitpick comments (16)
docs/start/modelling/primary-key.md (5)

24-31: Strengthen collision guidance for timestamp PKs.

Even after fixing the type, inserts within the same millisecond can collide. Add a short note with two mitigation examples: append a small random suffix, or prefer time-ordered UUIDs (UUIDv7).

Example wording to append under Cons:

 * Collisions possible if multiple records are created in the same millisecond
+* Mitigate by appending a small random suffix (e.g., `(ts_ms << 12) | random(0..4095)`), or prefer time-ordered UUIDv7 keys
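
The suffix idea can be sketched in Python as follows. This is an illustration of the bit layout only; `timestamp_key` and `SUFFIX_BITS` are hypothetical names, and a random suffix lowers the collision probability within one millisecond but does not remove it.

```python
import secrets
import time

SUFFIX_BITS = 12  # 4096 possible suffixes per millisecond (assumed width)


def timestamp_key(ts_ms=None):
    """Combine a millisecond timestamp with a small random suffix."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    suffix = secrets.randbelow(1 << SUFFIX_BITS)  # value in 0..4095
    # The timestamp occupies the high bits, so keys still sort by time.
    return (ts_ms << SUFFIX_BITS) | suffix


key = timestamp_key(1_700_000_000_000)
recovered_ts = key >> SUFFIX_BITS
recovered_suffix = key & ((1 << SUFFIX_BITS) - 1)
```

With current epoch values the result still fits into a signed 64-bit integer, so it could be stored in a BIGINT column.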

49-55: Fix section numbering for consistency.

“Use UUIDv7...” should be Strategy 3 to keep the ordered list consistent.

-### Use UUIDv7 for Time-Ordered IDs
+### 3. Use UUIDv7 for Time-Ordered IDs

22-27: Use headings instead of bold for “Pros” / “Cons” to satisfy markdownlint (MD036).

Convert emphasized lines to proper subheadings. This also improves navigability.

Example changes (apply similarly across the page):

-**Pros**
+#### Pros
 ...
-**Cons**
+#### Cons

Also applies to: 38-46, 55-64, 157-166


78-86: Optional: Call out single-shard sequence table trade-off.

CLUSTERED INTO 1 SHARDS ensures strict ordering but can bottleneck at high throughput. Consider adding a note that multi-sequence partitioning (per-tenant/per-entity) or batching reduces contention.
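
As a rough sketch of the batching idea: a client can reserve a block of IDs in one round trip and hand them out locally. All names below are hypothetical; `InMemoryCounter` stands in for the sequence table, and a real implementation would need an atomic update against the database and would leave gaps if a client stops before using its whole block.

```python
class InMemoryCounter:
    """Stand-in for a single-row sequence table (illustration only)."""

    def __init__(self):
        self.value = 0
        self.round_trips = 0

    def reserve(self, count):
        # In a real setup this would be one atomic UPDATE on the sequence row.
        start = self.value
        self.value += count
        self.round_trips += 1
        return start


class BlockIdAllocator:
    """Hands out IDs from a locally cached block, refilling when exhausted."""

    def __init__(self, reserve_block, block_size=100):
        self._reserve_block = reserve_block
        self._block_size = block_size
        self._next = 0
        self._end = 0

    def next_id(self):
        if self._next >= self._end:
            start = self._reserve_block(self._block_size)
            self._next, self._end = start, start + self._block_size
        value = self._next
        self._next += 1
        return value


counter = InMemoryCounter()
allocator = BlockIdAllocator(counter.reserve, block_size=100)
ids = [allocator.next_id() for _ in range(250)]
```

Here 250 IDs cost three reservations instead of 250 single-row updates, which is the contention reduction the note refers to.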


51-64: Tie back to CrateDB best practice: prefer UUIDv7 for range-friendly IDs.

Nice that UUIDv7 is mentioned. Consider adding one sentence about why it’s a strong default in CrateDB: time-ordered, good for range queries, and avoids hot-spotting typical of central counters.

Proposed sentence to append under Pros:

 * Range queries possible
+* Plays well with distributed inserts and avoids central coordination bottlenecks
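
To show where the time ordering comes from, here is a minimal Python sketch that assembles a UUIDv7-shaped value following the RFC 9562 bit layout (48-bit Unix millisecond timestamp, version, random bits, variant). The helper name `uuid7_sketch` is hypothetical; production code should use a maintained UUIDv7 implementation rather than this illustration.

```python
import os
import time
import uuid


def uuid7_sketch(ts_ms=None):
    """Build a UUID with the UUIDv7 layout: timestamp first, random bits after."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                  # 12 bits
    rand_b = rand & ((1 << 62) - 1)      # 62 bits

    value = (ts_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                       # bits 79..76: version 7
    value |= rand_a << 64                    # bits 75..64: rand_a
    value |= 0b10 << 62                      # bits 63..62: RFC variant
    value |= rand_b                          # bits 61..0: rand_b
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
```

Because the timestamp sits in the most significant bits, values generated in later milliseconds compare greater, both as integers and as strings, which is what makes range scans over such keys practical.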
docs/start/modelling/fulltext.md (2)

56-63: Use headings instead of bold for section labels (MD036).

Convert “Basic usage:” to a heading.

-**Basic usage:**
+### Basic usage

Apply similarly to “Searching multiple indices with weighted ranking:” on Lines 65–69:

-**Searching multiple indices with weighted ranking:**
+### Searching multiple indices with weighted ranking

15-15: Consistency: add a space before ( in FULLTEXT index definitions.

Minor style for consistency with other snippets.

-INDEX ft_body USING FULLTEXT(body)
+INDEX ft_body USING FULLTEXT (body)
-INDEX ft_all USING FULLTEXT(title, body)
+INDEX ft_all USING FULLTEXT (title, body)

Also applies to: 22-22

docs/start/modelling/geospatial.md (1)

21-21: Typo: “boarders” → “borders”.

Minor wording fix.

-Let's define a table with country boarders and capital:
+Let's define a table with country borders and capital:
docs/start/modelling/relational.md (3)

4-4: Use British English: “modelling,” not “modeling”.

Align with the Guide’s preferred British spelling.

-CrateDB is a **distributed SQL database** that offers rich **relational data modeling** with the flexibility of dynamic schemas and the scalability of NoSQL systems. It supports **primary keys,** **joins**, **aggregations**, and **subqueries**, just like traditional RDBMS systems—while also enabling hybrid use cases with time-series, geospatial, full-text, vector search, and semi-structured data.
+CrateDB is a **distributed SQL database** that offers rich **relational data modelling** with the flexibility of dynamic schemas and the scalability of NoSQL systems. It supports **primary keys,** **joins**, **aggregations**, and **subqueries**, just like traditional RDBMS systems—while also enabling hybrid use cases with time-series, geospatial, full-text, vector search, and semi-structured data.

21-23: Use a heading instead of bold text (“Key Features”).

Resolves MD036 and improves structure.

-**Key Features:**
+### Key features

1-192: Standardise British English spelling “modelling” and “normalisation”

Verified two occurrences of American spellings in docs/start/modelling/relational.md:

  • Line 4: change “relational data modeling” → “relational data modelling”
  • Line 46: change “## Normalization vs. Embedding” → “## Normalisation vs. Embedding”
docs/start/modelling/timeseries.md (4)

8-11: Qualify “automatic indexing” claim and fix “cardin­ality” artifact.

Be precise about default indexing; remove soft-hyphen artifact.

-* While maintaining a high ingest rate, its **columnar storage** and **automatic indexing** let you access and analyze the data immediately with **fast aggregations** and **near-real-time queries**.
-* Handles **high cardin­ality** and **a variety of data types**, including nested JSON, geospatial and vector data—all queryable via the same SQL statements.
+* While maintaining a high ingest rate, its **columnar storage** and **default indexing for most scalar columns** let you access and analyze the data immediately with **fast aggregations** and **near‑real‑time queries**.
+* Handles **high cardinality** and **many data types**, including nested JSON, geospatial and vector data—all queryable via the same SQL statements.

41-43: Qualify “every column is stored in the column store.”

Make this accurate and type-aware.

-* `month`  is the partitioning key, optimizing data storage and retrieval.
-* Every column is stored in the column store by default for fast aggregations.
-* Using **OBJECT columns** in the `devices_readings` table provides a structured and efficient way to organize complex nested data in CrateDB, enhancing both data integrity and flexibility.
+* `month` is the partitioning key, optimizing data storage and retrieval.
+* Most scalar columns are stored in the column store (doc values) by default for fast aggregations.
+* Using **OBJECT columns** in the `devices_readings` table provides a structured and efficient way to organise complex nested data in CrateDB, enhancing both data integrity and flexibility.

110-110: Fix minor spacing typo.

Add missing space after OBJECT(IGNORED).

-... or `OBJECT(IGNORED)`to accept unknown keys ...
+... or `OBJECT(IGNORED)` to accept unknown keys ...

1-161: Editorial Consistency: Standardize “time series” phrasing and US-English spelling

Please unify the following minor inconsistencies for editorial polish:

• Line 42: “organize” → if using US-English, keep “organize”; if British, change to “organise.”
• Line 133: “time-series data” → decide on hyphenated (“time-series”) vs unhyphenated (“time series”) and apply uniformly.
• Lines 142, 159, 160: “modeling” vs the file path’s “modelling” → choose US-English (“modeling”) or British (“modelling”) across headings, text, links, and filenames.

docs/start/modelling/index.md (1)

39-50: Use “Time series” (two words) to match the target page title.

Align card label with the page header.

-::::{grid-item-card} Timeseries data
+::::{grid-item-card} Time series data
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d34dfb5 and 9e5e9ad.

📒 Files selected for processing (8)
  • docs/start/modelling/fulltext.md (1 hunks)
  • docs/start/modelling/geospatial.md (1 hunks)
  • docs/start/modelling/index.md (1 hunks)
  • docs/start/modelling/json.md (1 hunks)
  • docs/start/modelling/primary-key.md (1 hunks)
  • docs/start/modelling/relational.md (1 hunks)
  • docs/start/modelling/timeseries.md (1 hunks)
  • docs/start/modelling/vector.md (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-08-23T20:20:42.235Z
Learnt from: amotl
PR: crate/cratedb-guide#235
File: docs/start/going-further.md:20-27
Timestamp: 2025-08-23T20:20:42.235Z
Learning: In the CrateDB Guide documentation, use British English spelling "data modelling" rather than American English "data modeling".

Applied to files:

  • docs/start/modelling/index.md
📚 Learning: 2025-08-23T15:15:18.693Z
Learnt from: amotl
PR: crate/cratedb-guide#235
File: docs/start/query/ai-integration.md:265-273
Timestamp: 2025-08-23T15:15:18.693Z
Learning: In CrateDB documentation, avoid using "data lake" terminology when describing CrateDB's role, as it can create misconceptions. Prefer "real-time analytics store" or similar terms that better reflect CrateDB's actual capabilities as a real-time analytics database.

Applied to files:

  • docs/start/modelling/index.md
  • docs/start/modelling/json.md
  • docs/start/modelling/relational.md
  • docs/start/modelling/timeseries.md
📚 Learning: 2025-05-18T13:25:11.786Z
Learnt from: amotl
PR: crate/cratedb-guide#204
File: docs/integrate/mcp/community.md:6-8
Timestamp: 2025-05-18T13:25:11.786Z
Learning: In the CrateDB Guide documentation, particularly for MCP-related pages, the author prefers a 1-column grid layout for multiple cards to stack them vertically rather than side by side, as this is an intentional design choice.

Applied to files:

  • docs/start/modelling/index.md
📚 Learning: 2025-04-15T22:20:30.716Z
Learnt from: amotl
PR: crate/cratedb-guide#0
File: :0-0
Timestamp: 2025-04-15T22:20:30.716Z
Learning: UUIDv7 (UUID version 7) supports efficient range queries because it generates time-ordered sequential values, unlike earlier random UUID versions. This makes UUIDv7 particularly useful for primary keys in distributed database systems like CrateDB where both uniqueness and range query performance are important.

Applied to files:

  • docs/start/modelling/primary-key.md
🪛 LanguageTool
docs/start/modelling/fulltext.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-fulltext)= # Full-text data CrateDB features **nativ...

(QB_NEW_EN)


[grammar] ~126-~126: There might be a mistake here.
Context: ...and search even on high volume datasets. * Integrated SQL interface: eliminates n...

(QB_NEW_EN)


[grammar] ~134-~134: There might be a mistake here.
Context: ...cient. | | Language Support | Pick built-in analyzer matching your content...

(QB_NEW_EN)


[grammar] ~144-~144: There might be a mistake here.
Context: ... index creation, analyzers, MATCH usage. * FTS Options & Advanced Features: fuzzi...

(QB_NEW_EN)


[grammar] ~146-~146: There might be a mistake here.
Context: ...l datasets (e.g. Chicago neighborhoods). * CrateDB Community Insights: real‑world...

(QB_NEW_EN)

docs/start/modelling/geospatial.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-geospatial)= # Geospatial data CrateDB supports **real...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ... either a coordinate array [lon, lat] or Well-Known Text (WKT) string `'POINT (l...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...n Text (WKT) string 'POINT (lon lat)'. * Must be declared explicitly; dynamic sch...

(QB_NEW_EN)


[grammar] ~21-~21: Ensure spelling is correct
Context: ...mple Let's define a table with country boarders and capital: ```sql CREATE TABLE count...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~103-~103: There might be a mistake here.
Context: ...acking, mapping, location-layered apps). * Unified SQL platform: spatial data can...

(QB_NEW_EN)


[grammar] ~104-~104: There might be a mistake here.
Context: ...ime-series — in the same table or query. * High ingest and query throughput, suit...

(QB_NEW_EN)

docs/start/modelling/index.md

[grammar] ~1-~1: There might be a mistake here.
Context: (modelling)= (data-modelling)= # Data modelling :::{...

(QB_NEW_EN)


[grammar] ~2-~2: There might be a mistake here.
Context: (modelling)= (data-modelling)= # Data modelling :::{div} sd-text-muted C...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ...ent data types. ::: :::::{grid} 2 3 3 3 :padding: 0 :class-container: installati...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...es. ::: :::::{grid} 2 3 3 3 :padding: 0 :class-container: installation-grid :::...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...id ::::{grid-item-card} Relational data 🔗 model-relational :link-type: ref ...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ... Relational data 🔗 model-relational :link-type: ref :link-alt: Relational da...

(QB_NEW_EN)


[grammar] ~15-~15: There might be a mistake here.
Context: ... 🔗 model-relational :link-type: ref :link-alt: Relational data :padding: 3 :...

(QB_NEW_EN)


[grammar] ~16-~16: There might be a mistake here.
Context: ...ink-type: ref :link-alt: Relational data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...f :link-alt: Relational data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...nal data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}table-list...

(QB_NEW_EN)


[grammar] ~23-~23: There might be a mistake here.
Context: ...:class-title: sd-fs-6 {fas}table-list :::: ::::{grid-item-card} JSON data :li...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...st` :::: ::::{grid-item-card} JSON data 🔗 model-json :link-type: ref :link-...

(QB_NEW_EN)


[grammar] ~27-~27: There might be a mistake here.
Context: ...d-item-card} JSON data 🔗 model-json :link-type: ref :link-alt: JSON data :pa...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...N data 🔗 model-json :link-type: ref :link-alt: JSON data :padding: 3 :text-a...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...son :link-type: ref :link-alt: JSON data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~30-~30: There might be a mistake here.
Context: ...pe: ref :link-alt: JSON data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~31-~31: There might be a mistake here.
Context: ...SON data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~32-~32: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~33-~33: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}file-lines...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ...:class-title: sd-fs-6 {fas}file-lines :::: ::::{grid-item-card} Timeseries da...

(QB_NEW_EN)


[grammar] ~39-~39: There might be a mistake here.
Context: ...:: ::::{grid-item-card} Timeseries data 🔗 model-timeseries :link-type: ref ...

(QB_NEW_EN)


[grammar] ~40-~40: There might be a mistake here.
Context: ... Timeseries data 🔗 model-timeseries :link-type: ref :link-alt: Timeseries da...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ... 🔗 model-timeseries :link-type: ref :link-alt: Timeseries data :padding: 3 :...

(QB_NEW_EN)


[grammar] ~42-~42: There might be a mistake here.
Context: ...ink-type: ref :link-alt: Timeseries data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~43-~43: There might be a mistake here.
Context: ...f :link-alt: Timeseries data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~44-~44: There might be a mistake here.
Context: ...ies data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~45-~45: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~46-~46: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}timeline :...

(QB_NEW_EN)


[grammar] ~49-~49: There might be a mistake here.
Context: ...1 :class-title: sd-fs-6 {fas}timeline :::: ::::{grid-item-card} Geospatial da...

(QB_NEW_EN)


[grammar] ~52-~52: There might be a mistake here.
Context: ...:: ::::{grid-item-card} Geospatial data 🔗 model-geospatial :link-type: ref ...

(QB_NEW_EN)


[grammar] ~53-~53: There might be a mistake here.
Context: ... Geospatial data 🔗 model-geospatial :link-type: ref :link-alt: Geospatial da...

(QB_NEW_EN)


[grammar] ~54-~54: There might be a mistake here.
Context: ... 🔗 model-geospatial :link-type: ref :link-alt: Geospatial data :padding: 3 :...

(QB_NEW_EN)


[grammar] ~55-~55: There might be a mistake here.
Context: ...ink-type: ref :link-alt: Geospatial data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~56-~56: There might be a mistake here.
Context: ...f :link-alt: Geospatial data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~57-~57: There might be a mistake here.
Context: ...ial data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~58-~58: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~59-~59: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}globe ::::...

(QB_NEW_EN)


[grammar] ~62-~62: There might be a mistake here.
Context: ...fs-1 :class-title: sd-fs-6 {fas}globe :::: ::::{grid-item-card} Fulltext data...

(QB_NEW_EN)


[grammar] ~65-~65: There might be a mistake here.
Context: ...:::: ::::{grid-item-card} Fulltext data 🔗 model-fulltext :link-type: ref :l...

(QB_NEW_EN)


[grammar] ~66-~66: There might be a mistake here.
Context: ...ard} Fulltext data 🔗 model-fulltext :link-type: ref :link-alt: Fulltext data...

(QB_NEW_EN)


[grammar] ~67-~67: There might be a mistake here.
Context: ...ta 🔗 model-fulltext :link-type: ref :link-alt: Fulltext data :padding: 3 :te...

(QB_NEW_EN)


[grammar] ~68-~68: There might be a mistake here.
Context: ...:link-type: ref :link-alt: Fulltext data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~69-~69: There might be a mistake here.
Context: ...ref :link-alt: Fulltext data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~70-~70: There might be a mistake here.
Context: ...ext data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~71-~71: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~72-~72: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}font :::: ...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ...-fs-1 :class-title: sd-fs-6 {fas}font :::: ::::{grid-item-card} Vector data :...

(QB_NEW_EN)


[grammar] ~78-~78: There might be a mistake here.
Context: ...` :::: ::::{grid-item-card} Vector data 🔗 model-vector :link-type: ref :lin...

(QB_NEW_EN)


[grammar] ~79-~79: There might be a mistake here.
Context: ...em-card} Vector data 🔗 model-vector :link-type: ref :link-alt: Vector data :...

(QB_NEW_EN)


[grammar] ~80-~80: There might be a mistake here.
Context: ...data 🔗 model-vector :link-type: ref :link-alt: Vector data :padding: 3 :text...

(QB_NEW_EN)


[grammar] ~81-~81: There might be a mistake here.
Context: ...r :link-type: ref :link-alt: Vector data :padding: 3 :text-align: center :class-c...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...: ref :link-alt: Vector data :padding: 3 :text-align: center :class-card: sd-pt-3...

(QB_NEW_EN)


[grammar] ~83-~83: There might be a mistake here.
Context: ...tor data :padding: 3 :text-align: center :class-card: sd-pt-3 :class-body: sd-fs-...

(QB_NEW_EN)


[grammar] ~84-~84: There might be a mistake here.
Context: ...:text-align: center :class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs...

(QB_NEW_EN)


[grammar] ~85-~85: There might be a mistake here.
Context: ...class-card: sd-pt-3 :class-body: sd-fs-1 :class-title: sd-fs-6 {fas}lightbulb ...

(QB_NEW_EN)


[grammar] ~88-~88: There might be a mistake here.
Context: ... :class-title: sd-fs-6 {fas}lightbulb :::: ::::: ```{toctree} :maxdepth: 1 ...

(QB_NEW_EN)


[grammar] ~106-~106: There might be a mistake here.
Context: ...or ``` :::{rubric} Implementation notes ::: Because CrateDB is a distributed an...

(QB_NEW_EN)


[grammar] ~109-~109: There might be a mistake here.
Context: ...ytical database (OLAP) designed to store large volumes of data, users need to con...

(QB_NEW_EN)


[grammar] ~110-~110: There might be a mistake here.
Context: ... to consider certain details compared to traditional RDBMS. :::{card} Primary k...

(QB_NEW_EN)

docs/start/modelling/json.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-json)= # JSON data CrateDB combines the flexibil...

(QB_NEW_EN)


[grammar] ~40-~40: There might be a mistake here.
Context: ... | | ------------- | ----------------------...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ...-------------------------------------- | | DYNAMIC | New fields are automat...

(QB_NEW_EN)


[grammar] ~42-~42: There might be a mistake here.
Context: ...tically added to the schema at runtime | | STRICT | Only explicitly define...

(QB_NEW_EN)


[grammar] ~43-~43: There might be a mistake here.
Context: ...ed fields are allowed | | IGNORED | Extra fields are store...

(QB_NEW_EN)


[grammar] ~76-~76: There might be a mistake here.
Context: ...vice']['os'] = 'Android'; ``` :::{note} Dot-notation works for both explicitly a...

(QB_NEW_EN)


[grammar] ~77-~77: There might be a mistake here.
Context: ...explicitly and dynamically added fields. ::: ## Querying DYNAMIC OBJECTs To sup...

(QB_NEW_EN)


[grammar] ~199-~199: There might be a mistake here.
Context: ...ption | | ------------------ | -----------------...

(QB_NEW_EN)


[grammar] ~200-~200: There might be a mistake here.
Context: ...-------------------------------------- | | Logs & Traces | Unstructured payl...

(QB_NEW_EN)


[grammar] ~201-~201: There might be a mistake here.
Context: ...ctured payloads with flexible metadata | | Sensor & IoT Data | Variable field sc...

(QB_NEW_EN)


[grammar] ~202-~202: There might be a mistake here.
Context: ...le field schemas, nested measurements | | Product Catalogs | Specs, tags, revi...

(QB_NEW_EN)


[grammar] ~203-~203: There might be a mistake here.
Context: ... tags, reviews in varying formats | | User Profiles | Custom settings, ...

(QB_NEW_EN)


[grammar] ~204-~204: There might be a mistake here.
Context: ... settings, device info, preferences | | Telemetry / Events | Event streams wit...

(QB_NEW_EN)


[grammar] ~209-~209: There might be a mistake here.
Context: ... | | ---------------- | -------------------...

(QB_NEW_EN)


[grammar] ~210-~210: There might be a mistake here.
Context: ...-------------------------------------- | | Schema Evolution | Use DYNAMIC for f...

(QB_NEW_EN)


[grammar] ~211-~211: There might be a mistake here.
Context: ... STRICT for control | | Index Management | Avoid over-indexing...

(QB_NEW_EN)


[grammar] ~212-~212: There might be a mistake here.
Context: ...d fields | | Nested Depth | Prefer flat structu...

(QB_NEW_EN)


[grammar] ~213-~213: There might be a mistake here.
Context: ...low nesting for performance | | Column Mixing | Combine structured ...

(QB_NEW_EN)


[grammar] ~214-~214: There might be a mistake here.
Context: ...h JSON for hybrid models | | Observability | Monitor number of d...

(QB_NEW_EN)


[grammar] ~219-~219: There might be a mistake here.
Context: ...sources * CrateDB Docs – Object Columns * Working with JSON in CrateDB * CrateDB A...

(QB_NEW_EN)


[grammar] ~220-~220: There might be a mistake here.
Context: ...t Columns * Working with JSON in CrateDB * CrateDB Academy – Modeling with JSON * U...

(QB_NEW_EN)


[grammar] ~221-~221: There might be a mistake here.
Context: ...B * CrateDB Academy – Modeling with JSON * Understanding Column Policies ## Summar...

(QB_NEW_EN)

docs/start/modelling/primary-key.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-primary-key)= # Primary key strategies CrateDB is built...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ... Auto-generated, always-increasing value * Useful when records are timestamped anyw...

(QB_NEW_EN)


[grammar] ~45-~45: There might be a mistake here.
Context: ...ents or sources Cons * Not ordered * Harder to read/debug * No efficient rang...

(QB_NEW_EN)


[grammar] ~46-~46: There might be a mistake here.
Context: ...** * Not ordered * Harder to read/debug * No efficient range queries ### Use UUID...

(QB_NEW_EN)


[style] ~51-~51: Consider using a shorter alternative to avoid wordiness.
Context: ...t that preserves temporal ordering, making them better suited for distributed inserts and rang...

(MADE_IT_JJR)

docs/start/modelling/relational.md

[grammar] ~50-~50: There might be a mistake here.
Context: ...arity: use normalized tables with joins. * For performance in high-ingest or read-o...

(QB_NEW_EN)


[grammar] ~69-~69: There might be a mistake here.
Context: ... created_at TIMESTAMP ); ``` :::{note} CrateDB lets you query nested fields...

(QB_NEW_EN)


[grammar] ~70-~70: There might be a mistake here.
Context: ... items['name'], items['price'], etc. ::: ## Aggregations & Grouping Use fam...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...ced for uniqueness and data distribution * Check - enforces custom value validati...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...eck -** enforces custom value validation * Indexes – automatic index for all colu...

(QB_NEW_EN)


[grammar] ~94-~94: There might be a mistake here.
Context: ...exes** – automatic index for all columns * Full-text indexes - manually defined, ...

(QB_NEW_EN)


[grammar] ~149-~149: There might be a mistake here.
Context: ...n | | -------------------- | ---------------...

(QB_NEW_EN)


[grammar] ~150-~150: There might be a mistake here.
Context: ...-------------------------------------- | | Customer & Orders | Classic normali...

(QB_NEW_EN)


[grammar] ~151-~151: There might be a mistake here.
Context: ...rmalized setup with joins and filters | | Inventory Management | Products, stock...

(QB_NEW_EN)


[grammar] ~152-~152: There might be a mistake here.
Context: ...stock levels, locations | | Financial Systems | Transactions, b...

(QB_NEW_EN)


[grammar] ~153-~153: There might be a mistake here.
Context: ...ns, balances, audit logs | | User Profiles | Users, preferen...

(QB_NEW_EN)


[grammar] ~154-~154: There might be a mistake here.
Context: ...ferences, activity logs | | Multi-tenant Systems | Use schemas or ...

(QB_NEW_EN)


[grammar] ~171-~171: There might be a mistake here.
Context: ... | | ------------- | ----------------------...

(QB_NEW_EN)


[grammar] ~172-~172: There might be a mistake here.
Context: ...-------------------------------------- | | Keys & IDs | Use UUIDs or consisten...

(QB_NEW_EN)


[grammar] ~173-~173: There might be a mistake here.
Context: ...t IDs for primary keys | | Sharding | Let CrateDB auto-shard...

(QB_NEW_EN)


[grammar] ~174-~174: There might be a mistake here.
Context: ... unless you have advanced requirements | | Join Strategy | Minimize joins over la...

(QB_NEW_EN)


[grammar] ~175-~175: There might be a mistake here.
Context: ...rge, high-cardinality columns | | Nested Fields | Use `column_policy = '...

(QB_NEW_EN)


[grammar] ~176-~176: There might be a mistake here.
Context: ...dynamic'` if schema needs flexibility | | Aggregations | Favor columnar tables ...

(QB_NEW_EN)


[grammar] ~177-~177: There might be a mistake here.
Context: ...for analytical workloads | | Co-location | Consider denormalizati...

(QB_NEW_EN)


[grammar] ~182-~182: There might be a mistake here.
Context: ...esources * CrateDB Docs – Data Modeling * CrateDB Academy – Relational Modeling * ...

(QB_NEW_EN)


[grammar] ~183-~183: There might be a mistake here.
Context: ... * CrateDB Academy – Relational Modeling * Working with Joins in CrateDB * Schema D...

(QB_NEW_EN)


[grammar] ~184-~184: There might be a mistake here.
Context: ...Modeling * Working with Joins in CrateDB * Schema Design Guide ## Summary CrateDB...

(QB_NEW_EN)

docs/start/modelling/timeseries.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-timeseries)= # Time series data CrateDB employs a rela...

(QB_NEW_EN)


[style] ~9-~9: Consider using a more concise synonym.
Context: .... * Handles high cardin­ality and a variety of data types, including nested JSON, ge...

(A_VARIETY_OF)


[grammar] ~9-~9: There might be a mistake here.
Context: ...l queryable via the same SQL statements. * PostgreSQL wire‑protocol compatible, s...

(QB_NEW_EN)


[uncategorized] ~53-~53: Do not mix variants of the same word (‘analyse’ and ‘analyze’) within a single text.
Context: ...L functions tailor‑made for time‑series analyses: * **`DATE_BIN(interval, timestamp, or...

(EN_WORD_COHERENCY)


[grammar] ~55-~55: There might be a mistake here.
Context: ...r bucketed aggregations (down‑sampling). * Window functions like LAG() and `LEA...

(QB_NEW_EN)


[grammar] ~114-~114: There might be a mistake here.
Context: ...Geospatial* (GEO_POINT, GEO_SHAPE) * Vectors (up to 2048 dims via HNSW inde...

(QB_NEW_EN)


[grammar] ~115-~115: There might be a mistake here.
Context: ...rs** (up to 2048 dims via HNSW indexing) * BLOBs for binary data (e.g. images, lo...

(QB_NEW_EN)


[style] ~118-~118: ‘joined together’ might be wordy. Consider a shorter alternative.
Context: ... are supported within the same table or joined together. ## Storage Optimization * **Partitio...

(EN_WORDINESS_PREMIUM_JOINED_TOGETHER)


[grammar] ~122-~122: There might be a mistake here.
Context: ...y/monthly) and sharded across a cluster. * Supports long‑term retention with perfor...

(QB_NEW_EN)


[grammar] ~123-~123: There might be a mistake here.
Context: ...ention with performant historic storage. * Columnar layout reduces storage footprin...

(QB_NEW_EN)


[grammar] ~137-~137: There might be a mistake here.
Context: ...rations: * Table creation and ingestion * Average per station * Using MAX_BY() t...

(QB_NEW_EN)


[grammar] ~138-~138: There might be a mistake here.
Context: ...tion and ingestion * Average per station * Using MAX_BY() to find highest tempera...

(QB_NEW_EN)


[grammar] ~139-~139: There might be a mistake here.
Context: ...to find highest temperature timestamps * Down-sampling using `DATE_BIN` into 4‑we...

(QB_NEW_EN)

docs/start/modelling/vector.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-vector)= # Vector data CrateDB natively supports *...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...arrays (e.g. 768, 1024, 2048 dimensions) * Supports **HNSW (Hierarchical Navigable ...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ...)** indexing for fast approximate search * Optimized for cosine, Euclidean, and dot...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...])` declares a fixed-size vector column. * You can ingest vectors directly or compu...

(QB_NEW_EN)


[grammar] ~73-~73: There might be a mistake here.
Context: ... | | ----------------------- | ------------...

(QB_NEW_EN)


[grammar] ~74-~74: There might be a mistake here.
Context: ...-------------------------------------- | | Semantic Search | Rank documen...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ...stead of keywords | | Recommendation Systems | Find similar...

(QB_NEW_EN)


[grammar] ~76-~76: There might be a mistake here.
Context: ..., or behaviors | | Image / Audio Retrieval | Store and co...

(QB_NEW_EN)


[grammar] ~77-~77: There might be a mistake here.
Context: ... of images/audio | | Fraud Detection | Match behavi...

(QB_NEW_EN)


[grammar] ~78-~78: There might be a mistake here.
Context: ...a vectors | | Hybrid Search | Combine vect...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...& Scaling * Vector search uses HNSW: state-of-the-art ANN algorithm with log...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...ithm with logarithmic search complexity. * CrateDB parallelizes ANN search across s...

(QB_NEW_EN)


[grammar] ~98-~98: There might be a mistake here.
Context: ...lity must be consistent for each column. ::: ## Best Practices | Area ...

(QB_NEW_EN)


[grammar] ~114-~114: There might be a mistake here.
Context: ...LangChain**: CrateDB has native drivers and REST interface * Embedding models: ...

(QB_NEW_EN)


[grammar] ~120-~120: There might be a mistake here.
Context: ...esources * CrateDB Docs – Vector Search * Blog: Using CrateDB for Hybrid Search (V...

(QB_NEW_EN)


[grammar] ~121-~121: There might be a mistake here.
Context: ...B for Hybrid Search (Vector + Full-Text) * CrateDB Academy – Vector Data * [Sample ...

(QB_NEW_EN)


[grammar] ~122-~122: There might be a mistake here.
Context: ...ll-Text) * CrateDB Academy – Vector Data * [Sample notebooks on GitHub](https://gith...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/start/modelling/fulltext.md

78-78: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


89-89: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/start/modelling/primary-key.md

22-22: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


27-27: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


38-38: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


43-43: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


55-55: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


60-60: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


78-78: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


87-87: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


94-94: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


103-103: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


157-157: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


162-162: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/start/modelling/relational.md

17-17: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (4)
docs/start/modelling/fulltext.md (1)

98-101: Replace nonsensical query terms with realistic examples.

'jupm OR verwrlost' looks like accidental gibberish. Use a deliberate misspelling to demonstrate fuzziness clearly.

Suggested change:

-WHERE MATCH((ft_en, ft_de), 'jupm OR verwrlost') USING best_fields WITH (fuzziness = 1)
+WHERE MATCH((ft_en, ft_de), 'jump OR verloren') USING best_fields WITH (fuzziness = 1)

If you prefer to illustrate misspellings, consider 'jumpp OR verlroen' with fuzziness = 1.
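To make the effect of `fuzziness = 1` concrete, the following minimal sketch measures how many single-character edits separate a misspelled query term from its intended word. It assumes that a transposition of two adjacent characters counts as one edit (optimal string alignment distance); that is an assumption about the matcher made for illustration, not a statement about CrateDB's implementation. The helper names `osa_distance` and `within_fuzziness` are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost one edit."""
    prev2 = None                      # row i-2, needed for transpositions
    prev = list(range(len(b) + 1))    # row i-1 (starts as row 0)
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,          # deletion
                cur[j - 1] + 1,       # insertion
                prev[j - 1] + cost,   # substitution or match
            )
            # Adjacent transposition, e.g. "pm" <-> "mp".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def within_fuzziness(term: str, target: str, fuzziness: int) -> bool:
    """True if `term` is at most `fuzziness` edits away from `target`."""
    return osa_distance(term, target) <= fuzziness


if __name__ == "__main__":
    for term, target in [("jumpp", "jump"), ("verlroen", "verloren"), ("jupm", "jump")]:
        print(term, target, osa_distance(term, target))
```

Under this assumption, both suggested misspellings (`jumpp`, `verlroen`) are one edit away from their targets, and so is the `jupm` term from the original example, which is a transposition of `jump`.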

docs/start/modelling/vector.md (1)

15-16: Verify similarity-metric claims.

Docs currently highlight Euclidean-based ANN and don’t expose a DDL toggle for cosine/dot-product. Please confirm current support before claiming optimization for multiple metrics; otherwise, reword to avoid overstating.

CrateDB blog and reference emphasize FLOAT_VECTOR, KNN_MATCH, and Euclidean-derived scoring, without configurable metric selection. (cratedb.com)

docs/start/modelling/json.md (1)

4-7: Tone nit: avoid “data lake” terminology in CrateDB docs.

Thanks for keeping the positioning clean here. Just a reminder from prior guidance to avoid “data lake” phrasing elsewhere in this section.

(Recalling prior team preference to use “real-time analytics store.”)

docs/start/modelling/index.md (1)

1-131: Consistency check passed – British English spelling confirmed
Ran a search for American spellings (“modeling”), “Normalization” header misuse, and “data lake” in docs/start/modelling/index.md; no occurrences were found. Everything is aligned with our style guidelines.
– No action items remaining; approving these changes.

* **Indexes** – automatic index for all columns
* **Full-text indexes -** manually defined, supports many tokenizers, analyzers and filters

In CrateDB every column is indexed by default, depending on the datatype a different index is used, indexing is controlled and maintained by the database, there is no need to `vacuum` or `re-index` like in other systems. Indexing can be manually turned off.

🛠️ Refactor suggestion

Qualify the “every column is indexed” statement.

This is inaccurate for all types; provide a nuanced explanation.

-In CrateDB every column is indexed by default, depending on the datatype a different index is used, indexing is controlled and maintained by the database, there is no need to `vacuum` or `re-index` like in other systems. Indexing can be manually turned off.
+By default, most scalar columns are indexed to support filtering, grouping, and ordering. Indexing behavior is type-specific (for example, OBJECT sub-fields, arrays, geo, and full‑text use specialized mappings and options), and can be disabled per column. CrateDB maintains indexes automatically—no manual vacuum or reindex steps are required in typical workflows.

Comment on lines +33 to +40
Use the `nearest_neighbors` predicate to perform similarity search:

```sql
SELECT id, title, content
FROM documents
ORDER BY embedding <-> [0.12, 0.73, ..., 0.01]
LIMIT 5;
```

⚠️ Potential issue

Replace pgvector-style example with CrateDB SQL (KNN_MATCH + optional VECTOR_SIMILARITY).

nearest_neighbors and the <-> operator aren’t part of CrateDB SQL. Use KNN_MATCH and compute a score if needed.

-Use the `nearest_neighbors` predicate to perform similarity search:
+Use `KNN_MATCH` to perform similarity search:

-```sql
-SELECT id, title, content
-FROM documents
-ORDER BY embedding <-> [0.12, 0.73, ..., 0.01]
-LIMIT 5;
-```
+```sql
+WITH q AS (SELECT [0.12, 0.73, ..., 0.01]::FLOAT_VECTOR(768) AS v)
+SELECT id, title, content, VECTOR_SIMILARITY(embedding, (SELECT v FROM q)) AS score
+FROM documents
+WHERE KNN_MATCH(embedding, (SELECT v FROM q), 5)
+ORDER BY score DESC
+LIMIT 5;
+```

Docs: Vector Search guide, KNN_MATCH, VECTOR_SIMILARITY. (cratedb.com)

🤖 Prompt for AI Agents
In docs/start/modelling/vector.md around lines 33 to 40, the SQL example uses
pgvector-style nearest_neighbors and the `<->` operator which CrateDB does not
support; replace it with a CrateDB-compatible query that (1) builds the query
vector (e.g., via a WITH clause and cast to FLOAT_VECTOR(n)), (2) uses WHERE
KNN_MATCH(embedding, query_vector, k) to retrieve the top-k neighbors, (3)
optionally computes a similarity score with VECTOR_SIMILARITY(embedding,
query_vector) in the SELECT list, and (4) orders by that score DESC and limits
the results to 5. Ensure the example includes the vector cast size (matching
your embedding dim) and uses KNN_MATCH + VECTOR_SIMILARITY instead of
nearest_neighbors / `<->`.

Comment on lines +46 to +52
```sql
SELECT id, title, content, embedding <-> [0.12, ..., 0.01] AS score
FROM documents
WHERE MATCH(content_ft, 'machine learning') AND author = 'Alice'
ORDER BY score
LIMIT 10;
```

🛠️ Refactor suggestion

Make the hybrid example executable and consistent with the schema.

The query references content_ft and author, which aren’t defined. Either extend the schema or adapt the query. Minimal, consistent fix below (adds an index and uses CrateDB vector syntax).

-```sql
-SELECT id, title, content, embedding <-> [0.12, ..., 0.01] AS score
-FROM documents
-WHERE MATCH(content_ft, 'machine learning') AND author = 'Alice'
-ORDER BY score
-LIMIT 10;
-```
+```sql
+-- Assumes a full‑text index on content, declared in the table definition
+-- (CrateDB has no standalone CREATE INDEX statement):
+--   INDEX ft_content USING FULLTEXT (content)
+WITH q AS (SELECT [0.12, ..., 0.01]::FLOAT_VECTOR(768) AS v)
+SELECT id, title, content,
+       VECTOR_SIMILARITY(embedding, (SELECT v FROM q)) AS score
+FROM documents
+WHERE MATCH(ft_content, 'machine learning')
+  AND KNN_MATCH(embedding, (SELECT v FROM q), 10)
+ORDER BY score DESC
+LIMIT 10;
+```
🤖 Prompt for AI Agents
In docs/start/modelling/vector.md around lines 46 to 52, the hybrid query
references undefined fields/indexes (content_ft and author) and non‑CrateDB
vector operators; fix by adding a note that a full‑text index on content must be
declared in the table definition (e.g. INDEX ft_content USING FULLTEXT (content),
since CrateDB has no standalone CREATE INDEX statement) and replace
the example with a CrateDB‑compatible hybrid query that defines the query vector
in a WITH clause, uses VECTOR_SIMILARITY(...) for score, filters via
MATCH(ft_content, 'machine learning') and performs a KNN_MATCH(embedding,
query_vector, 10) so the example is executable and consistent with the schema.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
docs/start/modelling/primary-key.md (2)

38-41: Fix type mismatch in timestamp PK example.

now() returns a TIMESTAMP, not a BIGINT. Either use a TIMESTAMP PK or convert to epoch BIGINT explicitly.

-```psql
-BIGINT DEFAULT now() PRIMARY KEY
-```
+```psql
+TIMESTAMP WITH TIME ZONE DEFAULT now() PRIMARY KEY
+```

Alternatively, if you want a BIGINT epoch (ms):

-```psql
-BIGINT DEFAULT now() PRIMARY KEY
-```
+```psql
+BIGINT DEFAULT (extract(epoch from now()) * 1000)::BIGINT PRIMARY KEY
+```

171-205: Guard against missing sequence row; avoid string comparison; use returned value.

  • row can be None if the sequence is missing.
  • Comparing str(... .all()) != "[]" is brittle. Check list truthiness.
  • Prefer using the value returned by RETURNING last_value.
-    row = db.query(select_query, sequence_name=sequence_name).first()
-    new_value = row.last_value + 1
+    row = db.query(select_query, sequence_name=sequence_name).first()
+    if row is None:
+        raise RuntimeError(f"Sequence '{sequence_name}' not found. Initialize it first.")
+    new_value = row.last_value + 1
@@
-    if (
-        str(
-            db.query(
-                update_query,
-                new_value=new_value,
-                sequence_name=sequence_name,
-                seq_no=row._seq_no,
-                primary_term=row._primary_term,
-            ).all()
-        )
-        != "[]"
-    ):
-        break
+    result = db.query(
+        update_query,
+        new_value=new_value,
+        sequence_name=sequence_name,
+        seq_no=row._seq_no,
+        primary_term=row._primary_term,
+    ).all()
+    if result:
+        # Use the authoritative value returned by the UPDATE
+        new_value = result[0].last_value
+        break
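The three points above (guard against a missing row, test the result list for truthiness, use the returned value) can be sketched as a self-contained retry loop. This is a simulation under stated assumptions: `InMemorySequences` is a hypothetical stand-in for the `sequences` table, and its integer `version` plays the role of the `_seq_no` / `_primary_term` pair; none of these names are CrateDB or `records` APIs.

```python
import threading


class InMemorySequences:
    """Hypothetical stand-in for the `sequences` table (not a CrateDB API)."""

    def __init__(self):
        self._rows = {}               # name -> (last_value, version)
        self._lock = threading.Lock()

    def create(self, name, start=0):
        with self._lock:
            self._rows[name] = (start, 0)

    def read(self, name):
        """Return (last_value, version), or None if the sequence is missing."""
        with self._lock:
            return self._rows.get(name)

    def update_if_unchanged(self, name, new_value, expected_version):
        """Optimistic update: returns [new_value] on success, [] on conflict,
        mimicking an UPDATE ... RETURNING that matched zero or one row."""
        with self._lock:
            row = self._rows.get(name)
            if row is None or row[1] != expected_version:
                return []
            self._rows[name] = (new_value, expected_version + 1)
            return [new_value]


def next_value(store, name, max_attempts=10):
    """Fetch the next sequence value, retrying on concurrent modification."""
    for _ in range(max_attempts):
        row = store.read(name)
        if row is None:
            raise RuntimeError(f"Sequence '{name}' not found. Initialize it first.")
        last_value, version = row
        result = store.update_if_unchanged(name, last_value + 1, version)
        if result:                    # list truthiness, no string comparison
            return result[0]          # authoritative value from the update
    raise RuntimeError(f"Gave up on '{name}' after {max_attempts} attempts.")
```

A caller would use `next_value(store, "mysequence")` as the ID for the record it is about to insert; a conflict simply triggers another read-and-update round.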
🧹 Nitpick comments (9)
docs/start/modelling/primary-key.md (9)

47-50: Clarify collision risk and add guidance for safe timestamp keys.

Even with TIMESTAMP(3) precision, concurrent inserts within the same millisecond can collide. Also, clocks can skew across nodes. Recommend documenting a composite key variant to avoid collisions.

 :Cons:
-  - Can result in gaps
-  - Collisions possible if multiple records are created in the same millisecond
+  - Can result in gaps
+  - Collision risk within the same millisecond and under clock skew
+
+:::{tip}
+If you need timestamp-like ordering with uniqueness, use a composite PK:
+
+```psql
+CREATE TABLE events (
+  ts TIMESTAMP WITH TIME ZONE DEFAULT now(),
+  source_id TEXT,              -- writer/node identifier
+  pk_suffix SMALLINT DEFAULT 0, -- optional tiebreaker
+  PRIMARY KEY (ts, source_id, pk_suffix)
+);
+```
+:::

69-77: UUIDv7 section looks solid; mention aligns with known best practice.

Good call-out on temporal ordering and range-query friendliness of UUIDv7. This aligns with prior learnings we’ve captured about UUIDv7 being time-ordered and efficient for range scans in distributed systems like CrateDB.

Consider adding one short example of a UDF invocation or a link to an example statement to lower adoption friction.
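As a client-side illustration of why UUIDv7 values sort by creation time, here is a minimal sketch that builds one by hand, assuming the RFC 9562 layout: a 48-bit Unix timestamp in milliseconds, the 4-bit version, 12 random bits, the 2-bit variant, and 62 random bits. The function name `uuid7` and its `timestamp_ms` parameter are hypothetical; this is not the UDF mentioned above, and a maintained UUIDv7 library is preferable for production use.

```python
import os
import time
import uuid


def uuid7(timestamp_ms=None):
    """Build a UUIDv7-style value: time-ordered prefix plus random suffix."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 used
    rand_a = (rand >> 62) & 0xFFF                  # 12 bits after the version
    rand_b = rand & ((1 << 62) - 1)                # 62 bits after the variant

    value = (timestamp_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                              # bits 79..76: version 7
    value |= rand_a << 64                           # bits 75..64: random
    value |= 0b10 << 62                             # bits 63..62: variant
    value |= rand_b                                 # bits 61..0: random
    return uuid.UUID(int=value)


if __name__ == "__main__":
    earlier = uuid7(1_700_000_000_000)
    later = uuid7(1_700_000_000_001)
    print(earlier, later, earlier < later)
```

Because the timestamp occupies the most significant bits, values generated in later milliseconds compare greater, both numerically and as text. Values created within the same millisecond are ordered only by their random bits, which is why the ordering is "almost" sequential.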


117-124: Harden the sequence table schema.

Make last_value non-null with a default. This removes the need to rely on insert-time initialization discipline.

 CREATE TABLE sequences (
        name TEXT PRIMARY KEY,
-       last_value BIGINT
+       last_value BIGINT NOT NULL DEFAULT 0
 ) CLUSTERED INTO 1 SHARDS;

207-212: Add jitter to backoff to reduce thundering herd under contention.

-import time
+import time
+import random
@@
-    delay = base_delay * (2**attempt)
+    delay = base_delay * (2 ** attempt) * (1.0 + random.random() * 0.25)
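The jittered delay from the diff above can be isolated into a small helper to see its bounds. This is a sketch: the function name `backoff_delay` and the `jitter` and `rng` parameters are introduced here for illustration and are not part of the reviewed file.

```python
import random


def backoff_delay(base_delay, attempt, jitter=0.25, rng=random.random):
    """Exponential backoff with up to `jitter` (25% by default) added on top.

    `rng` must return a float in [0.0, 1.0); it is injectable so the
    result can be made deterministic in tests.
    """
    return base_delay * (2 ** attempt) * (1.0 + rng() * jitter)


if __name__ == "__main__":
    for attempt in range(4):
        print(attempt, backoff_delay(0.1, attempt))
```

With `base_delay = 0.1`, attempt 3 waits between 0.8 and 1.0 seconds, so contending writers that failed at the same moment no longer retry in lockstep.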

168-170: Ensure the DB connection is closed on exceptions.

Wrap the workload in try/finally or use a context manager so failures during retries don’t leak connections.

-db = records.Database("crate://")
-sequence_name = "mysequence"
+db = records.Database("crate://")
+sequence_name = "mysequence"
+try:
@@
-db.close()
+finally:
+    db.close()

Alternatively:

with records.Database("crate://") as db:
    sequence_name = "mysequence"
    # ... rest of the code ...

Also applies to: 213-216
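The same guarantee can be demonstrated without a database. The sketch below uses `contextlib.closing` from the standard library around a hypothetical `FakeDatabase` class, which merely records whether `close()` was called; it stands in for any connection object with a `close()` method and is not the `records` API.

```python
from contextlib import closing


class FakeDatabase:
    """Hypothetical connection object that only tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_workload(db):
    """Simulate a failure in the middle of the retry loop."""
    raise RuntimeError("simulated failure during retries")


db = FakeDatabase()
try:
    # closing() calls db.close() on exit, whether or not an exception occurred.
    with closing(db):
        run_workload(db)
except RuntimeError:
    pass

print(db.closed)
```

The connection is closed even though `run_workload` raised, which is the behaviour the `try`/`finally` variant above achieves by hand.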


59-66: Fix markdown list indentation to satisfy MD007 (ul-indent).

Dedent list items under Pros/Cons to column 0.

-  - Universally unique
-  - No conflicts when merging from multiple environments or sources
+- Universally unique
+- No conflicts when merging from multiple environments or sources

Apply similarly to other Pros/Cons lists flagged by the linter.

Also applies to: 79-85, 110-116


232-232: Remove unused link reference definition for [udf].

MD053 flags this as unused because the page uses {ref} instead. Either remove the reference or switch the inline usage to []() style. Removing it seems simplest.

-[udf]: https://cratedb.com/docs/crate/reference/en/latest/general/user-defined-functions.html

220-227: Clarify “Ordered” for External system IDs in the summary table.

“✅/❌” is ambiguous. Suggest a short note: “Depends on whether the source generates time-ordered IDs.”

-| External system IDs | ✅/❌    | ✅      | ✅        | ✅              | ✅             | Depends on source    |
+| External system IDs | ✅/❌    | ✅      | ✅        | ✅              | ✅             | Orderedness depends on source ID semantics |

69-77: Optional: Add a quick-start snippet for UUIDv7 UDF usage.

A minimal example helps readers copy/paste and try it immediately.

If helpful, I can contribute a small, tested JavaScript UDF example plus a CREATE FUNCTION statement tailored to CrateDB.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9e5e9ad and eb2269c.

📒 Files selected for processing (3)
  • docs/performance/inserts/index.rst (0 hunks)
  • docs/performance/inserts/sequences.rst (0 hunks)
  • docs/start/modelling/primary-key.md (1 hunks)
💤 Files with no reviewable changes (2)
  • docs/performance/inserts/index.rst
  • docs/performance/inserts/sequences.rst
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-15T22:20:30.716Z
Learnt from: amotl
PR: crate/cratedb-guide#0
File: :0-0
Timestamp: 2025-04-15T22:20:30.716Z
Learning: UUIDv7 (UUID version 7) supports efficient range queries because it generates time-ordered sequential values, unlike earlier random UUID versions. This makes UUIDv7 particularly useful for primary keys in distributed database systems like CrateDB where both uniqueness and range query performance are important.

Applied to files:

  • docs/start/modelling/primary-key.md
🪛 LanguageTool
docs/start/modelling/primary-key.md

[grammar] ~1-~1: There might be a mistake here.
Context: (model-primary-key)= (autogenerated-sequences)= # Primary key...

(QB_NEW_EN)


[grammar] ~2-~2: There might be a mistake here.
Context: ...primary-key)= (autogenerated-sequences)= # Primary key strategies and autogenerated...

(QB_NEW_EN)


[grammar] ~5-~5: There might be a mistake here.
Context: ...ated sequences :::{rubric} Introduction ::: As you begin working with CrateDB, ...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ...ou might be puzzled why CrateDB does not have a built-in, auto-incrementing "seri...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...you through five common alternatives to generate unique primary key values in...

(QB_NEW_EN)


[grammar] ~12-~12: There might be a mistake here.
Context: ...CrateDB, including a recipe to implement your own auto-incrementing sequence mech...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...aintaining a global auto-increment value would require that a node checks with ot...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...her nodes before allocating a new value. This would create a **global coordinatio...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...n bottleneck**, limit insert throughput, and reduce scalability. CrateDB is desi...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...hoice means CrateDB does not support traditional auto-incrementing primary ke...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...ry key types like SERIAL in PostgreSQL or MySQL by default. :::{rubric} Soluti...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...MySQL by default. :::{rubric} Solutions ::: CrateDB provides flexibility: You ca...

(QB_NEW_EN)


[grammar] ~30-~30: There might be a mistake here.
Context: ...L by default. :::{rubric} Solutions ::: CrateDB provides flexibility: You can ch...

(QB_NEW_EN)


[grammar] ~31-~31: There might be a mistake here.
Context: ...y: You can choose a primary key strategy tailored to your use case, whether for s...

(QB_NEW_EN)


[grammar] ~32-~32: There might be a mistake here.
Context: ...for strict uniqueness, time ordering, or external system integration. You can als...

(QB_NEW_EN)


[grammar] ~33-~33: There might be a mistake here.
Context: ...o implement true consistent/synchronized sequences if you want to. ## Using a ti...

(QB_NEW_EN)


[style] ~34-~34: This phrasing can be overused. Try elevating your writing with a more formal alternative.
Context: ... true consistent/synchronized sequences if you want to. ## Using a timestamp as a primary key...

(IF_YOU_WANT)


[grammar] ~44-~44: There might be a mistake here.
Context: ... Auto-generated, always-increasing value - Useful when records are timestamped anyw...

(QB_NEW_EN)


[grammar] ~47-~47: There might be a mistake here.
Context: ...n records are timestamped anyway :Cons: - Can result in gaps - Collisions possib...

(QB_NEW_EN)


[grammar] ~59-~59: There might be a mistake here.
Context: ...Y KEY ``` :Pros: - Universally unique - No conflicts when merging from multiple ...

(QB_NEW_EN)


[grammar] ~62-~62: There might be a mistake here.
Context: ...multiple environments or sources :Cons: - Not ordered - Harder to read/debug -...

(QB_NEW_EN)


[grammar] ~63-~63: There might be a mistake here.
Context: ...ments or sources :Cons: - Not ordered - Harder to read/debug - No efficient ra...

(QB_NEW_EN)


[grammar] ~64-~64: There might be a mistake here.
Context: ... - Not ordered - Harder to read/debug - No efficient range queries ## Using UUI...

(QB_NEW_EN)


[grammar] ~69-~69: There might be a mistake here.
Context: ...rves temporal ordering, making UUIDs better suited for inserts and range quer...

(QB_NEW_EN)


[grammar] ~74-~74: There might be a mistake here.
Context: ...{ref}User-Defined Function (UDF) <udf> in JavaScript, or in your preferred prog...

(QB_NEW_EN)


[grammar] ~75-~75: There might be a mistake here.
Context: ...programming language by using one of the available UUIDv7 libraries. :Pros: - ...

(QB_NEW_EN)


[grammar] ~79-~79: There might be a mistake here.
Context: ... Globally unique and almost sequential - Efficient range queries possible :Cons:...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...Efficient range queries possible :Cons: - Not as human-friendly as integer numbers...

(QB_NEW_EN)


[grammar] ~83-~83: There might be a mistake here.
Context: ...Not as human-friendly as integer numbers - Slight overhead due to UDF use ## Using...

(QB_NEW_EN)


[grammar] ~88-~88: There might be a mistake here.
Context: ...e system that already generates unique IDs, you can reuse those by inserting ...

(QB_NEW_EN)


[grammar] ~89-~89: There might be a mistake here.
Context: ...erting primary key values as-is from the source system. In this case, CrateDB do...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ... need to generate any identifier values, and consistency is ensured across system...

(QB_NEW_EN)


[grammar] ~95-~95: There might be a mistake here.
Context: ...is ensured across systems. :::{seealso} An example for that is [Replicating data...

(QB_NEW_EN)


[grammar] ~96-~96: There might be a mistake here.
Context: ...ses to CrateDB with Debezium and Kafka]. ::: ## Implementing a custom sequence t...

(QB_NEW_EN)


[grammar] ~101-~101: There might be a mistake here.
Context: ...ting numeric ID (e.g., for compatibility or legacy reasons), you can implement a ...

(QB_NEW_EN)


[grammar] ~102-~102: There might be a mistake here.
Context: ...ment a simple sequence generator using a dedicated table and client-side logic. ...

(QB_NEW_EN)


[grammar] ~105-~105: There might be a mistake here.
Context: ...he latest values that have been consumed and client side code to keep it up-to-da...

(QB_NEW_EN)


[grammar] ~106-~106: There might be a mistake here.
Context: ...-to-date in a way that guarantees unique values even when many ingestion processe...

(QB_NEW_EN)


[grammar] ~110-~110: There might be a mistake here.
Context: ...d prefixes, adjust increment size, etc.) - Sequential IDs possible :Cons: - Addi...

(QB_NEW_EN)


[grammar] ~113-~113: There might be a mistake here.
Context: ...tc.) - Sequential IDs possible :Cons: - Additional client logic about optimistic...

(QB_NEW_EN)


[grammar] ~144-~144: There might be a mistake here.
Context: ... to generate unique, incrementing values even in parallel ingestion scenarios. T...

(QB_NEW_EN)


[grammar] ~147-~147: There might be a mistake here.
Context: ...value used from the sequences table, and then attempts an [optimistic UPDATE] wit...

(QB_NEW_EN)


[grammar] ~148-~148: There might be a mistake here.
Context: ... UPDATE] with a RETURNING clause, if a contending process already consumed the ...

(QB_NEW_EN)


[grammar] ~149-~149: There might be a mistake here.
Context: ...identity nothing will be returned so our process will retry until a value is retu...

(QB_NEW_EN)


[grammar] ~150-~150: There might be a mistake here.
Context: ...rned, then it uses that value as the new ID for the record we are inserting into ...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/start/modelling/primary-key.md

44-44: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


45-45: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


48-48: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


49-49: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


59-59: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


60-60: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


63-63: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


64-64: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


65-65: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


79-79: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


80-80: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


83-83: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


84-84: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


110-110: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


111-111: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


114-114: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


115-115: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


232-232: Link and image reference definitions should be needed
Unused link or image reference definition: "udf"

(MD053, link-image-reference-definitions)

Base automatically changed from getting-started to main August 26, 2025 22:09
@bmunkholm bmunkholm mentioned this pull request Aug 28, 2025
6 tasks
Contributor

@bmunkholm bmunkholm left a comment


@amotl I'll work on a new PR with this. I can't comprehend all the Rabbit comments :-)

@bmunkholm bmunkholm closed this Sep 2, 2025
@amotl amotl deleted the data-modelling branch October 16, 2025 00:08

Labels

guidance Matters of layout, shape, and structure. help wanted Extra attention is needed new content New content being added. question Further information is requested

9 participants