change implementation so that Read overrides behave as expected#29
Merged
Conversation
Contributor
|
Looks good, let us know when its ready to merge! |
Contributor
Author
|
If it looks good to you, I say go ahead and merge. Most of the times TestGraphTokenizer tests now pass. In certain conditions we still encounter failures but they are not related to the MockReaderWrapper anymore. |
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Discovered this issue while running Lucene.Net.Analysis\TestGraphTokenizers tests. From time to time an assert would fail at this location:
https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.TestFramework/Analysis/BaseTokenStreamTestCase.cs#L964
The idea here is to randomly simulate an exception being thrown when token stream is being iterated. Instead of "evil exception" being thrown, the following assert would fail:
https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.TestFramework/Analysis/MockReaderWrapper.cs#L92
The underlying problem is the implementation of MockReaderWrapper in C#. It was not exactly a 1-to-1 map to the Java version due to framework differences and bug was introduced in MockReaderWrapper constructor:
https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.TestFramework/Analysis/MockReaderWrapper.cs#L43
StringReader's ReadToEnd() gets called advancing the iterator to the end. When the actual token stream iteration tries to read the stream, 0 is returned as the underlying text stream has been already iterated by the "ReadToEnd" call in the constructor. Furthermore, MockReaderWrapper should override Read() method as to make sure that all the possible ways that the reader is being used are covered (see https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.TestFramework/Analysis/MockTokenizer.cs#L242 how it can call Read with no params).
Also changed the way MockReader is initialized, instead of new StringReader, text string is passed to it directly. That seemed to make the most sense as there does not appear to be the need to create new stream reader on the same text just to extract that text again.
TestGraphTokenizers still occassionally fail but that appears to be due to another bug that gets invoked radomly and is yet to be determined what is the cause of it. Taking care of this "evil exception" branch gets us moving forward in that investigation.