@romseygeek
Contributor

The match interval builder analyses input text and converts it to an IntervalsSource, and as such
may receive token streams with gaps left by removed stopwords. This commit deals with these by
using the extend factory to cover the gaps so that phrase and ordered queries work
correctly.

@romseygeek added the >enhancement, :Search/Search (Search-related issues that do not fall into other categories), and v7.2.0 labels Mar 4, 2019
@romseygeek self-assigned this Mar 4, 2019
@romseygeek requested a review from jimczi March 4, 2019 12:09
@elasticmachine
Collaborator

Pinging @elastic/es-search

Contributor

@jimczi left a comment

Thanks @romseygeek, I left some questions/comments.

 if (spaces >= 0) {
     if (synonyms.size() == 1) {
-        terms.add(synonyms.get(0));
+        terms.add(extend(synonyms.get(0), spaces));
Contributor

The synonyms list contains the word at the position before the gap, so shouldn't the extend be applied forward?

Contributor Author

It applies backwards because we're using PositionIncrement to detect when there's a gap preceding us in the TokenStream. If you've got a posInc of 2, that means there's a gap of one position before you in the stream, so you need to extend backwards by 1 to cover it.

Contributor Author

Ah, actually no, it's a bug - I added the test you suggested and it fails, have updated.
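
For readers following the thread, here is a minimal sketch of the bookkeeping being discussed, in plain Java with no Lucene dependency. `Token`, `Slot`, and `buildSlots` are hypothetical names introduced only for illustration; the real builder works on a TokenStream and wraps sources with Intervals.extend. The sketch assumes the gap is recorded against the group of terms that follows it, which matches the test expectation `Intervals.extend(Intervals.term("term5"), 1, 0)` quoted in this review.

```java
import java.util.ArrayList;
import java.util.List;

public class GapExtendSketch {

    /** Hypothetical stand-in for an analysed token: its text and its position increment. */
    static final class Token {
        final String text;
        final int posInc;

        Token(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    /** Hypothetical stand-in for all terms at one position, plus the gap that precedes them. */
    static final class Slot {
        final List<String> synonyms;
        final int precedingGap;

        Slot(List<String> synonyms, int precedingGap) {
            this.synonyms = synonyms;
            this.precedingGap = precedingGap;
        }

        /** Renders the slot in a notation that mimics term / or / extend(source, before, after). */
        String describe() {
            List<String> parts = new ArrayList<>();
            for (String s : synonyms) {
                parts.add("term(" + s + ")");
            }
            String inner = parts.size() == 1 ? parts.get(0) : "or(" + String.join(", ", parts) + ")";
            return precedingGap == 0 ? inner : "extend(" + inner + ", " + precedingGap + ", 0)";
        }
    }

    /** Groups tokens by position and remembers how many positions were skipped before each group. */
    static List<Slot> buildSlots(List<Token> tokens) {
        List<Slot> slots = new ArrayList<>();
        List<String> synonyms = new ArrayList<>();
        int gap = 0;
        for (Token t : tokens) {
            if (t.posInc > 0) {
                // A new position starts: flush the previous group with ITS preceding gap
                // before recording the gap that precedes the new group.
                if (!synonyms.isEmpty()) {
                    slots.add(new Slot(new ArrayList<>(synonyms), gap));
                    synonyms.clear();
                }
                gap = t.posInc - 1;
            }
            // posInc == 0 means the token shares a position with the previous one (a synonym).
            synonyms.add(t.text);
        }
        if (!synonyms.isEmpty()) {
            slots.add(new Slot(new ArrayList<>(synonyms), gap));
        }
        return slots;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("term1", 1));
        tokens.add(new Token("term2", 2)); // one position was removed before term2
        tokens.add(new Token("term3", 0)); // synonym at the same position as term2
        for (Slot slot : buildSlots(tokens)) {
            System.out.println(slot.describe());
        }
    }
}
```

The ordering inside the loop is the point of the sketch: if the gap were updated before the previous group was flushed, the extension would be attached to the group before the gap instead of the group after it.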

     }
     else if (synonyms.size() > 1) {
-        terms.add(Intervals.or(synonyms.toArray(new IntervalsSource[0])));
+        terms.add(extend(Intervals.or(synonyms.toArray(new IntervalsSource[0])), spaces));
Contributor

Same here: shouldn't you use extend(0, spaces) rather than extend(spaces, 0)?

Intervals.extend(Intervals.term("term5"), 1, 0)
);
assertEquals(expected, source);
}
Contributor

Can you also add a test with a simple word synonym that starts after a gap?

Contributor

@jimczi left a comment

LGTM. Should it be considered a bug fix and merged to 7.0?

@romseygeek
Contributor Author

should it be considered as a bug fix and merged to 7.0

+1, we'll generate incorrect queries otherwise

@romseygeek merged commit 317a80f into elastic:master Mar 5, 2019
@romseygeek deleted the interval-spaces branch March 5, 2019 10:31
romseygeek added a commit that referenced this pull request Mar 5, 2019
The match interval builder analyses input text and converts it to an IntervalSource, and as such
may generate token streams with stopwords. This commit deals with these by using the extend
factory to cover the gaps produced by these stopwords so that phrase and ordered queries work
correctly.
romseygeek added a commit that referenced this pull request Mar 5, 2019