-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Add stopword support to IntervalBuilder #39637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-search |
jimczi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @romseygeek , I left some questions/comments
| if (spaces >= 0) { | ||
| if (synonyms.size() == 1) { | ||
| terms.add(synonyms.get(0)); | ||
| terms.add(extend(synonyms.get(0), spaces)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The synonyms contains the word at the position before the gap so the extend should be applied forward ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It applies backwards because we're using PositionIncrement to detect when there's a gap preceding us in the TokenStream. If you've got a posInc of 2, that means theres a gap before you in the stream, so you need to extend backwards 1 to cover it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, actually no, it's a bug - I added the test you suggested and it fails, have updated.
| } | ||
| else if (synonyms.size() > 1) { | ||
| terms.add(Intervals.or(synonyms.toArray(new IntervalsSource[0]))); | ||
| terms.add(extend(Intervals.or(synonyms.toArray(new IntervalsSource[0])), spaces)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, you should use extend(0, spaces) and not extend(spaces, 0) ?
| Intervals.extend(Intervals.term("term5"), 1, 0) | ||
| ); | ||
| assertEquals(expected, source); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a test with a simple word synonym that starts after a gap ?
jimczi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, should it be considered as a bug fix and merged to 7.0 ?
+1, we'll generate incorrect queries otherwise |
The match interval builder analyses input text and converts it to an IntervalSource, and as such may generate token streams with stopwords. This commit deals with these by using the extend factory to cover the gaps produced by these stopwords so that phrase and ordered queries work correctly.
The match interval builder analyses input text and converts it to an IntervalSource, and as such may generate token streams with stopwords. This commit deals with these by using the extend factory to cover the gaps produced by these stopwords so that phrase and ordered queries work correctly.
The
matchinterval builder analyses input text and converts it to anIntervalSource, and as suchmay generate token streams with stopwords. This commit deals with these by using the
extendfactory to cover the gaps produced by these stopwords so that phrase and ordered queries work
correctly.