fix: #3722 - Optimize LSM Vector Index fallback scan to target specific bucket instead of entire type #3775
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -1136,21 +1136,21 @@ private void buildGraphFromScratchWithRetry(final GraphBuildCallback graphCallba | |||||||||||||
| // If pages have corrupted entries (e.g., old-format tombstones), the parser may miss many vectors. | ||||||||||||||
| // In that case, fall back to scanning documents directly to rebuild the vector list. | ||||||||||||||
| boolean documentScanPerformed = false; | ||||||||||||||
| - final String typeName = getTypeName(); | ||||||||||||||
| - if (typeName != null && !ridToLatestVector.isEmpty()) { | ||||||||||||||
| + if (metadata.associatedBucketId != -1 && !ridToLatestVector.isEmpty()) { | ||||||||||||||
| try { | ||||||||||||||
| - final long docCount = database.countType(typeName, false); | ||||||||||||||
| + final com.arcadedb.engine.Bucket bucket = database.getSchema().getBucketById(metadata.associatedBucketId); | ||||||||||||||
| + final long docCount = database.countBucket(bucket.getName()); | ||||||||||||||
| if (ridToLatestVector.size() < docCount * 8 / 10) { | ||||||||||||||
|
Comment on lines
+1141
to
1143
Contributor
Using
Suggested change
Comment on lines
+1139
to
1143
|
||||||||||||||
| LogManager.instance().log(this, Level.WARNING, | ||||||||||||||
| "Page-parsed vectors (%d) significantly less than document count (%d) for index %s. " | ||||||||||||||
| + "Falling back to document scan to recover missing vectors.", | ||||||||||||||
| ridToLatestVector.size(), docCount, indexName); | ||||||||||||||
|
|
||||||||||||||
| - // Scan all documents to find vectors missing from the page-parsed set | ||||||||||||||
| + // Scan all documents in the bucket to find vectors missing from the page-parsed set | ||||||||||||||
| final String vectorProp = | ||||||||||||||
| metadata.propertyNames != null && !metadata.propertyNames.isEmpty() ? metadata.propertyNames.getFirst() : | ||||||||||||||
| "vector"; | ||||||||||||||
| - database.scanType(typeName, false, record -> { | ||||||||||||||
| + database.scanBucket(bucket.getName(), record -> { | ||||||||||||||
| final Document doc = (Document) record; | ||||||||||||||
| final RID rid = doc.getIdentity(); | ||||||||||||||
| if (!ridToLatestVector.containsKey(rid)) { | ||||||||||||||
|
|
||||||||||||||
The condition `!ridToLatestVector.isEmpty()` prevents the fallback mechanism from triggering if the page parser fails to recover any vectors at all (e.g., due to severe corruption). Since the `docCount` check already handles the case where the bucket is empty, this extra check is unnecessary and prevents recovery in cases of total page corruption.
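
To make the reviewer's point concrete, here is a minimal, standalone sketch of the 80% threshold from the diff, evaluated without the extra `isEmpty()` guard. The class `FallbackCheck` and the method `shouldFallBack` are hypothetical names introduced only for illustration; they are not part of ArcadeDB, and the real code operates on `ridToLatestVector.size()` and the bucket's document count.

```java
// Hypothetical illustration of the threshold check; not ArcadeDB code.
final class FallbackCheck {
    private FallbackCheck() {
    }

    // Mirrors the condition `ridToLatestVector.size() < docCount * 8 / 10` from the diff:
    // true when page parsing recovered fewer than 80% of the documents in the bucket.
    static boolean shouldFallBack(final int parsedCount, final long docCount) {
        return parsedCount < docCount * 8 / 10;
    }

    public static void main(final String[] args) {
        System.out.println(shouldFallBack(0, 0));    // false: empty bucket, nothing to recover
        System.out.println(shouldFallBack(0, 100));  // true: total page corruption still triggers the scan
        System.out.println(shouldFallBack(79, 100)); // true: below the 80% threshold
        System.out.println(shouldFallBack(80, 100)); // false: exactly at the threshold
    }
}
```

With zero parsed vectors and an empty bucket the comparison is `0 < 0`, so the threshold alone already skips the scan, which is the case the reviewer says the `isEmpty()` guard duplicates. One side effect of the integer division is that a bucket holding a single document yields a threshold of 0, so this check never triggers the fallback for it.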