LUCENE-9450: Use BinaryDocValue fields in the taxonomy index based on the existing index version #220
the last index commit. If the Lucene version was < 9, use a StringField; otherwise, if the index is fresh or was built using a version >= 9, use a BDV field.
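The version switch described in this commit message can be sketched in plain Java. This is only an illustration of the decision rule, not the code in DirectoryTaxonomyWriter: the class, the enum, and the parameters `hasCommit` and `createdMajorVersion` are hypothetical stand-ins for whatever the writer reads from the last index commit.

```java
// Sketch only: plain-Java stand-ins, not the Lucene API.
final class FullPathFieldChooser {

    /** Hypothetical labels for the two ways the category full path can be stored. */
    enum FullPathStorage { STRING_FIELD, BINARY_DOC_VALUES }

    /**
     * Picks the storage for the full path of new category documents.
     *
     * @param hasCommit           whether the taxonomy index already has at least one commit
     * @param createdMajorVersion major Lucene version that created the index
     *                            (ignored when hasCommit is false)
     */
    static FullPathStorage choose(boolean hasCommit, int createdMajorVersion) {
        if (!hasCommit) {
            // Fresh index: nothing older to stay consistent with.
            return FullPathStorage.BINARY_DOC_VALUES;
        }
        // Existing index: keep the format it was created with.
        return createdMajorVersion >= 9
                ? FullPathStorage.BINARY_DOC_VALUES
                : FullPathStorage.STRING_FIELD;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, 0)); // fresh index
        System.out.println(choose(true, 8));  // index created before 9
        System.out.println(choose(true, 9));  // index created with 9 or later
    }
}
```

Deciding once per index, rather than per segment, means every category document in a given taxonomy index uses the same representation.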
Changes in the new b9cbc4c commit:
I think the new commit might be slower than the previous
Finally, I think there should be a cleaner way of knowing whether the index has at least one commit or not. I use the
Side questions that need more thought:
I love how simple this is! I left a couple comments.
Also, to be clear, even though the opening comment says the PR implemented option 1, it has now iterated onto option 2 (switching based on the index created version metadata).
I left a few small comments! I think this is close!
It's very expert. It's necessary if you have multiple workers creating indices that you then want to merge together using
OK, this looks great @gautamworah96 -- thanks! I'll review and push soon.
I like this approach to back-compat (using the index created version) -- it gives a more consistent index than trying to blend in, segment by segment, the new changes.
OK, I just merged this via the git command line, but apparently GitHub hasn't noticed. Thanks @gautamworah96!
Category documents added in the Lucene 9.0 taxonomy index use a BDV field with a different name
Using BDV fields with a different "$full_path_binary$" name
ensures that the earlier "$full_path$" StringField does not have the same name as the
BDV field and hence they don't violate the field type consistency check
(LUCENE-9334).
This commit also enables the back-compat check that was disabled
earlier.
https://issues.apache.org/jira/browse/LUCENE-9450
Solution
There were two proposed solutions in the JIRA ticket:
When we added the BDV field with the same Consts.FULL name, it caused a java.lang.IllegalArgumentException: cannot change field "$full_path$" from doc values type=NONE to inconsistent doc values type=BINARY error, because the current logic checks all fields with the same name across segments and ensures that they use the same doc values type. Adding the BDV field with a different name ensures that the check does not trip. We are careful here to use the same new name when retrieving values in the DirectoryTaxonomyReader.
This PR implements the approach described in step 1.
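To make the failure mode concrete, here is a minimal plain-Java model of a per-field-name consistency check. It is a sketch under stated assumptions and does not reproduce Lucene's implementation: the class `FieldTypeConsistencySketch`, the enum `DocValuesKind`, and the methods `addField` and `tripsCheck` are all hypothetical. Only the two field names and the wording of the exception message come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: models "one doc values type per field name", not Lucene's real check.
final class FieldTypeConsistencySketch {

    /** Hypothetical stand-in for a field's doc values type. */
    enum DocValuesKind { NONE, BINARY }

    // First doc values type seen for each field name, across all "segments".
    private final Map<String, DocValuesKind> seen = new HashMap<>();

    /** Registers a field and rejects a doc values type that differs from the first one seen. */
    void addField(String name, DocValuesKind kind) {
        DocValuesKind previous = seen.putIfAbsent(name, kind);
        if (previous != null && previous != kind) {
            throw new IllegalArgumentException(
                "cannot change field \"" + name + "\" from doc values type=" + previous
                    + " to inconsistent doc values type=" + kind);
        }
    }

    /** Returns true if adding a BINARY field named bdvName after a NONE field named legacyName fails. */
    static boolean tripsCheck(String legacyName, String bdvName) {
        FieldTypeConsistencySketch check = new FieldTypeConsistencySketch();
        check.addField(legacyName, DocValuesKind.NONE); // older StringField, no doc values
        try {
            check.addField(bdvName, DocValuesKind.BINARY); // new BDV field
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(tripsCheck("$full_path$", "$full_path$"));        // true: same name, different type
        System.out.println(tripsCheck("$full_path$", "$full_path_binary$")); // false: distinct names
    }
}
```

Because the model keys the check on the field name alone, giving the BDV field its own name sidesteps the conflict, which is the reasoning the description gives for "$full_path_binary$".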
Tests
Enabled the back-compat test in TestBackwardsCompatibility.testCreateNewTaxonomy.