Storage internals: Refurbish overview page #445
Walkthrough

Reorganized storage docs: replaced a Lucene-centric layout with a "Data structures" section (Inverted index, BKD tree, Doc values, Column store) and added a "Storage process" section covering append-only segments, segment merges (TieredMergePolicy), refresh behavior, and manual OPTIMIZE/REFRESH guidance. Also updated anchors and cross-references.
Sequence diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client (ingest / query)
    participant Coord as Coordinator
    participant Shard as Shard (segments)
    participant BG as Background merges

    rect #F0F9FF
        Note over Client,Coord: Write path
        Client->>Coord: INSERT / bulk ingest
        Coord->>Shard: append document to new segment (append-only)
        Shard-->>Client: ack (visible after refresh)
    end

    rect #FFF7F0
        Note over BG,Shard: Maintenance
        BG->>Shard: TieredMergePolicy triggers merges
        Shard-->>Shard: create merged segment
        Shard-->>Coord: merged segment becomes active
    end

    rect #F0FFF4
        Note over Client,Shard: Read path
        Client->>Coord: SELECT / search
        Coord->>Shard: search (inverted index, BKD tree, doc values / column store)
        Shard-->>Client: results (as of latest refresh)
    end
```
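The write and read paths in the diagram can be illustrated with a toy in-memory shard. This is a minimal sketch under simplifying assumptions, not CrateDB or Lucene code: writes land in a buffer, a refresh seals the buffer into an immutable segment, searches see only sealed segments, and a merge rewrites all segments into one. An update is modeled as a newer appended copy that overrides older ones, standing in for Lucene marking the old copy as deleted. The names `Segment`, `ToyShard`, `insert`, `refresh`, `merge`, and `search` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """Immutable batch of (doc_id, document) pairs; never edited after creation."""
    docs: tuple


@dataclass
class ToyShard:
    """Hypothetical model of a shard's append-only segment lifecycle."""
    buffer: list = field(default_factory=list)    # written, not yet searchable
    segments: list = field(default_factory=list)  # sealed, searchable segments

    def insert(self, doc_id, document):
        # Writes only append; existing segments are never modified in place,
        # so an update is simply a newer copy of the same doc_id.
        self.buffer.append((doc_id, document))

    def refresh(self):
        # Seal buffered writes into a new immutable segment, making them visible.
        if self.buffer:
            self.segments.append(Segment(tuple(self.buffer)))
            self.buffer = []

    def _latest(self):
        # Later segments win, standing in for deletion of older document copies.
        latest = {}
        for segment in self.segments:
            for doc_id, document in segment.docs:
                latest[doc_id] = document
        return latest

    def merge(self):
        # Rewrite all segments into one, dropping superseded document copies.
        latest = self._latest()
        self.segments = [Segment(tuple(latest.items()))] if latest else []

    def search(self, predicate):
        # Reads see only refreshed segments, never the unrefreshed buffer.
        return sorted(i for i, doc in self._latest().items() if predicate(doc))


def is_hot(doc):
    return doc["temp"] > 20


shard = ToyShard()
shard.insert(1, {"temp": 21})
shard.insert(2, {"temp": 25})
print(shard.search(is_hot))  # [] (not refreshed yet)
shard.refresh()
print(shard.search(is_hot))  # [1, 2]
shard.insert(1, {"temp": 19})  # update = append a newer copy
shard.refresh()
print(len(shard.segments), shard.search(is_hot))  # 2 [2]
shard.merge()
print(len(shard.segments), shard.search(is_hot))  # 1 [2]
```

The sketch keeps segments immutable on purpose: visibility changes only at refresh time, and reclaiming space from superseded copies happens only when segments are merged.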
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
386a5e2 to 12c4363
cd19761 to b7b569d
thx, left also a few comments/suggestions.
b7b569d to ffce69e
Thanks, spotted something more.
ffce69e to 622cd2a
98c9ad7 to c5467bb
Just another little bunch of suggestions I am not sure about.
8b2f9bd to 23c257c
docs/feature/storage/index.md (outdated)
`## Storage process`
`How CrateDB stores data using Lucene.`
Since this topic already opens with a quick introduction, it makes sense to link to the CrateDB Reference Manual, either here or at the bottom of the section or page. We've identified two candidates. wdyt?
both is fine, thx!
The first link is now inlined through a separate :Sharded storage: term definition at the top of the "Storage process" section, which improves the overall narrative (442d39d). The second one did not really fit inside the article, so it went into the "Related sections" section at the bottom of the document (69d0b90).
Maybe someone has suggestions for wording improvements, as this is just a v0.1. While those "teaser texts" mostly reuse ingredients of the sections they refer to, they are still "new" in the sense that they try to compress the whole gist into a few concise words. Any support in that curation process is very welcome.
@coderabbitai: Can you analyze the content of the two commits I was referring to and present suggestions how to improve them, when applicable? As always, please use concise language, active voice, and avoid yapping.
Two comments were marked as resolved.
@coderabbitai: Your suggestion saves a few words. Applied per e82d746, thank you.
@matriv: I found the statements are still correct, and I hope you don't have any objections.
Two comments were marked as spam.
23c257c to 69d0b90
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/feature/storage/index.md (4 hunks)
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: amotl
Repo: crate/cratedb-guide PR: 0
File: :0-0
Timestamp: 2025-10-06T16:40:16.322Z
Learning: In the crate/cratedb-guide repository, documentation is organized by content type: files in `docs/feature/`, `docs/integrate/`, and `docs/ingest/` are reference material designed for quick access, while `docs/start/` contains learning path content. Reference material sections use top-right reference layouts for efficient navigation.
📚 Learning: 2025-10-06T16:40:16.322Z (same as the common learning above)
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-08-09T11:00:03.578Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 236
File: docs/index.md:312-316
Timestamp: 2025-08-09T11:00:03.578Z
Learning: In the CrateDB Guide documentation, locally stale references (such as `(reference-architectures)=`, `(domain)=`, and similar anchor labels) should be retained even when they appear unused within the repository, as external projects might still reference them. This ensures backward compatibility for external documentation and projects.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-10-10T17:58:26.545Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 395
File: docs/install/cloud/azure/terraform.md:108-127
Timestamp: 2025-10-10T17:58:26.545Z
Learning: In the crate/cratedb-guide repository, during documentation format conversions (e.g., from reStructuredText to MyST/Markdown), markdownlint warnings such as MD053 (unused reference-style links) should be ignored and not flagged as issues.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-08-23T15:15:18.717Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 235
File: docs/start/query/ai-integration.md:265-273
Timestamp: 2025-08-23T15:15:18.717Z
Learning: In CrateDB documentation, avoid using "data lake" terminology when describing CrateDB's role, as it can create misconceptions. Prefer "real-time analytics store" or similar terms that better reflect CrateDB's actual capabilities as a real-time analytics database.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-08-23T15:09:38.537Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 235
File: docs/start/index.md:1-3
Timestamp: 2025-08-23T15:09:38.537Z
Learning: In the CrateDB Guide documentation, the `(use)=` label in `docs/start/index.md` is intentionally placed alongside `(getting-started)=` as part of the documentation architecture, even though it may appear to conflict with top-level "Use" section labeling. This is a deliberate design decision by the maintainers.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-08-23T04:22:52.597Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 235
File: docs/start/going-further.md:20-38
Timestamp: 2025-08-23T04:22:52.597Z
Learning: In the CrateDB Guide repository, the prefix `crate-reference:` in documentation links (like `crate-reference:index`) is an intersphinx reference that points to external content in the CrateDB Reference Manual documentation. These references are intentionally not pointing to local files within the repository.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-10-30T23:12:30.165Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 465
File: docs/admin/troubleshooting/system-tables.md:6-6
Timestamp: 2025-10-30T23:12:30.165Z
Learning: In the cratedb-guide repository, documentation headers and titles should use sentence case (e.g., "Diagnostics with system tables") rather than title case (e.g., "Diagnostics with System Tables"). This style choice enables headers to be referenced more fluently within sentences.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-09-25T19:31:54.320Z
Learnt from: bmunkholm
Repo: crate/cratedb-guide PR: 340
File: docs/home/index.md:84-97
Timestamp: 2025-09-25T19:31:54.320Z
Learning: In the CrateDB Guide docs (MyST), the CrateDB Cloud card on the homepage should link to `getting-started` using `:link-type: ref` instead of the previous `cloud:index` intersphinx target. This change was implemented in PR #340 to direct users to the getting started section rather than directly to the Cloud documentation.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-08-09T21:40:46.154Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 238
File: docs/integrate/streamsets/index.md:18-21
Timestamp: 2025-08-09T21:40:46.154Z
Learning: In the CrateDB Guide documentation, reStructuredText files (`.rst`) use the RST label syntax `.. _label:` while MyST Markdown files (`.md`) use the syntax `(label)=`. Both formats are used throughout the repository and labels can be cross-referenced between them.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-09-22T22:05:05.734Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 279
File: docs/integrate/airflow/data-retention-hot-cold.md:14-14
Timestamp: 2025-09-22T22:05:05.734Z
Learning: When searching for labels in the CrateDB Guide documentation, remember that reStructuredText files (.rst) use the syntax `.. _label:` while MyST Markdown files (.md) use the syntax `(label)=`. Cross-references work between formats, so {ref}`label` in MyST can reference an RST label defined with `.. _label:`.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-10-19T19:21:49.864Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 420
File: docs/connect/erlang.md:100-113
Timestamp: 2025-10-19T19:21:49.864Z
Learning: In the cratedb-guide repository, when documenting cloud or alternative connection methods, show only the differences from the canonical example (e.g., SSL configuration) rather than repeating the entire workflow (query execution, cleanup, etc.). This pattern is used consistently across pages to avoid repetition.
Applied to files:
docs/feature/storage/index.md
📚 Learning: 2025-09-30T14:50:33.725Z
Learnt from: amotl
Repo: crate/cratedb-guide PR: 296
File: docs/feature/query/recurrent.md:36-36
Timestamp: 2025-09-30T14:50:33.725Z
Learning: In CrateDB, `_id` is an internal system column available on each indexed document in doc schema tables and can be retrieved by SELECT queries or used in RETURNING clauses. Reference: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/system-columns.html
Applied to files:
docs/feature/storage/index.md
🪛 markdownlint-cli2 (0.18.1)
docs/feature/storage/index.md
130-130: Unordered list indentation (MD007, ul-indent). Expected: 0; Actual: 2
132-132: Unordered list indentation (MD007, ul-indent). Expected: 0; Actual: 2
133-133: Unordered list indentation (MD007, ul-indent). Expected: 0; Actual: 2
🔇 Additional comments (3)
docs/feature/storage/index.md (3)
17-84: Data structures section is well-organized and technically sound. The reorganization from a Lucene-centric layout to CrateDB-specific data structures is clear. The progression (document → inverted index → BKD tree → doc values → column store) flows logically, and the explanations address past review feedback (e.g., clarifying doc values as the basis for column store, scope of column store support). Cross-references to external documentation are appropriately placed.
85-164: Storage process section adds valuable context on lifecycle and maintenance. The new section effectively covers Lucene's storage mechanics (sharded, append-only, segment merges) and ties them to operational practicalities (TieredMergePolicy, manual OPTIMIZE, refresh intervals, idle shard behavior). The "tldr" framing works well for reference material. Past feedback (e.g., about reindexing on major versions, refresh behavior, OPTIMIZE/REFRESH commands) has been integrated. The footnote about table recreation on major upgrades is a helpful cross-reference.
165-182: Related sections link structure is clear. Renaming "See also" to "Related sections" and updating the introductory text to reference `indexing-and-storage` and high-availability/consistency topics maintains good discoverability. External links (footnote to recreate tables, TieredMergePolicy) are well-positioned.
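The data-structures comment contrasts an inverted index with doc values. A toy sketch of the difference, assuming whitespace tokenization and lowercasing; the helpers `build_inverted_index` and `build_doc_values` are hypothetical and do not mirror Lucene's on-disk formats, and BKD trees and the column store are not modeled. The inverted index answers "which documents contain this term?", while doc values answer "what is this column's value for document N?" without parsing whole documents.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of doc ids containing it."""
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in doc["text"].lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_doc_values(docs, column):
    """Store one column's values in doc-id order (column-oriented layout)."""
    return [doc[column] for doc in docs]


docs = [
    {"text": "Disk full on node", "severity": 3},
    {"text": "Node restarted", "severity": 1},
    {"text": "Disk replaced", "severity": 2},
]

index = build_inverted_index(docs)
severity = build_doc_values(docs, "severity")

# Term lookup via the inverted index: which documents mention "disk"?
print(index["disk"])  # [0, 2]

# Aggregation via doc values: read one dense column, no document parsing.
print(max(severity[doc_id] for doc_id in index["disk"]))  # 3
```

Filtering and aggregating typically combine both structures: the index narrows the candidate documents, and the per-column values feed the aggregation.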
Suggestions by CodeRabbit.
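The tiering idea behind TieredMergePolicy, mentioned in the review of the "Storage process" section, can be illustrated with a deliberately simplified simulation. This is not Lucene's algorithm: the real policy scores candidate merges (for example by size skew and reclaimed deletes) and caps the merged segment size. Here, as a hypothetical rule, segments are bucketed into integer size tiers and any tier holding `segments_per_tier` or more segments is merged into one. The functions `tier_of`, `merge_round`, and `simulate` and all parameter values are illustrative assumptions.

```python
def tier_of(size, tier_base):
    """Integer size tier: 0 for size < tier_base, 1 below tier_base**2, and so on."""
    tier = 0
    while size >= tier_base:
        size //= tier_base
        tier += 1
    return tier


def merge_round(segments, segments_per_tier=4, tier_base=4):
    """One simplified merge pass: collapse any tier holding too many segments."""
    tiers = {}
    for size in segments:
        tiers.setdefault(tier_of(size, tier_base), []).append(size)
    result = []
    for tier in sorted(tiers):
        sizes = tiers[tier]
        if len(sizes) >= segments_per_tier:
            result.append(sum(sizes))  # many similar-sized segments -> one larger one
        else:
            result.extend(sizes)       # tier is within budget; leave it alone
    return sorted(result)


def simulate(refreshes, segment_size=1):
    """Append one small segment per refresh, then run a merge pass."""
    segments = []
    for _ in range(refreshes):
        segments.append(segment_size)
        segments = merge_round(segments)
    return segments


print(simulate(20))  # [4, 16]: 2 segments instead of 20 unmerged ones
```

In this model, merged segments move up a tier, so the segment count grows only logarithmically with the number of refreshes while the total data size is preserved.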
f171c23 to 0077a6e