-
Notifications
You must be signed in to change notification settings - Fork 2
Storage internals: Indexing and storage #434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Warning Rate limit exceeded@amotl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 28 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (5)
WalkthroughAdded a new indexing-and-storage guide and updated multiple docs to use card-based blocks, standardized Lucene terminology, and replace inline/external links with internal Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
matriv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, left 2 comments.
32f7040 to
274f983
Compare
cfced51 to
a53d709
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matriv: Thanks for the review. I just discovered your comment on the commit itself by accident, see below.
b510f54 to
a78c2e7
Compare
|
@coderabbitai summarized the procedure:
Thanks a stack for your contributions and reviews. /cc @marijaselakovic |
|
Tip For best results, initiate chat on the files or code changes. @amotl, you're very welcome! I'm glad the review was helpful. The storage internals article is a valuable addition to the guide—bringing that content into the documentation makes it more accessible and permanent. Great work on this PR! 🚀 |
933eb5b to
c0b9a89
Compare
c0b9a89 to
4d05a2c
Compare
amotl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By conducting another review cycle, I was able to come up with a few more suggestions.
4d05a2c to
e96ced2
Compare
386a5e2 to
12c4363
Compare
| data structure supports range queries on numeric values in CrateDB. | ||
|
|
||
| :Doc values: This data structure supports more efficient querying document | ||
| fields by id, performs column-oriented retrieval of data, and improves the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
querying document fields by id? is that correct @seut?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, maybe using name instead of id would improve this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, thanks. I've included the update into a13dfb2.
| within CrateDB: Inverted indexes for text values, BKD trees for numeric | ||
| values, and doc values. | ||
|
|
||
| :Inverted index: You will learn how inverted indexes are implemented in Lucene |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like here the You will learn phrasing, as this is more appropriate for a live presentation agenda slide.
Although this comes from Marija's context, I'd suggest to rephrase it:
Explain how inverted indexes are implemented in Lucene and how CrateDB uses them to index text values and enable fast text searches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I agree. Even if it's an article pulled in from another stage, we shouldn't stop curating it going forward.
I've used your suggestion 1:1, but after some little back and forth, just started using "understand" to begin the sentence, picked up from the definition item of the :BKD tree: term, as you didn't have any complaints about that, see a13dfb2.
Well, it's a repetition of words, but on the other hand, it's symmetric right now. ;]
matriv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, left 1 suggestion and a question.
This article effectively explains CrateDB's storage internals, covering inverted indexes, BKD trees, and doc values with clear explanations, practical examples, and helpful diagrams. Source: https://cratedb.com/blog/indexing-and-storage-in-cratedb
12c4363 to
a13dfb2
Compare
|
It looks like @coderabbitai had some final evaluations about this patch. Please object any of those items if you think different, including suggestions what can be improved. Thanks! |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 22070 🏁 Script executed: Length of output: 860 🏁 Script executed: Length of output: 720 @amotl, I've thoroughly validated all 10 evaluation points against the actual content, and I'm pleased to confirm they are all accurate. Here's my assessment: ✅ All evaluation points validated:
Optional minor enhancements (not objections):
Overall:
🧠 Learnings used |
This comment was marked as duplicate.
This comment was marked as duplicate.
- Mention the fast sorting and aggregations that the doc values mechanism provides. - Improve "doc values" section. - Trim "introduction" section. - Remove "summary" section. - Fix tables. - Various copy editing. Wording. Naming things. - Wrap images into white background for accompanying dark mode. - Implement suggestions from code review, also by CodeRabbit Co-authored-by: Marios Trivyzas <[email protected]>
fb0aaea to
059d7ec
Compare
|
Thanks again for your reviews. 🙇 |
About
The article Indexing and Storage in CrateDB should not be left behind on a platform that took a different direction about content types and style.
Preview
https://cratedb-guide--434.org.readthedocs.build/feature/storage/indexing-and-storage.html
/cc @hammerhead, @surister