-
-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[.NET] Avoid allocation and improve parsing time #344
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an impressive improvement, thx.
I made a few smaller comments.
On the HotfixLine
in VS extension: I did some review and as far as I see, the only reason why that was done (stupid that I did not documented it) was that the original implementation handled the cell escape char (\
) badly (see here): If there was a cell escape char it took the next char, but did not verify if the next char exists at all (not checking the result of MoveNext
). This resulted in strange exceptions when users were in the process of typing in a Gherkin document and they were at the end of the file an d typed a \
. But as you have noticed, we have a unit test for that case in the VS extension. In one of the recent changes though (probably in your prev refactoring), this "bug" has been fixed in the Gherkin parser anyway (see here), so indeed we will not need the hotfix line in the VS extension.
Maybe what we could do still is to take-over the 4 related unit tests from the VS extension to the Gherkin parser code. As it is now handled here, it is better to test it here. What do you think?
dotnet/Gherkin/AstNode.cs
Outdated
} | ||
|
||
public IEnumerable<T> GetItems<T>(RuleType ruleType) | ||
public readonly struct Enumerable<T> : IEnumerable<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is a generic type, but could we find a little bit more specific name, like ItemsEnumerable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the name 🙂
dotnet/Gherkin/AstNode.cs
Outdated
} | ||
|
||
public void Add<T>(RuleType ruleType, T obj) | ||
public struct Enumerator<T> : IEnumerator<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to make the enumerator also public? (The Enumerable
visibility I understand, as that is needed to avoid boxing.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to make it public so that the C# compiler can duck-typing and use the struct Enumerable directly (use the type in the IL and don't cast to IEnumerable
). This avoids boxing and any allocation of the enumerable.
If we return an IEnumerator, it's not possible (at least for the .NET Framework) to change this at JIT time.
The same optimisation also happens in the BCL for types like List<T>
.
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); | ||
} | ||
|
||
public struct TagsEnumerator : IEnumerator<GherkinLineSpan> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, do we need to make this public? (Also TableCellsEnumerator
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
I think this is a good idea. I tried to add the tests but other implementations (c and perl) handle the errors differently then the other implementations (including .NET). Edit: I tried to add the test data and fix the other implementations, see #356. Note: The additional tests also pass with this PR. |
🤔 What's changed?
This PR optimises allocations and speed by
GherkinLine
IEnumerable
so thatList
can be used, this avoids re-creating and coping with explicitly sized arraysResults before:
Results after:
⚡️ What's your motivation?
This reduces parsing time by ~45% and allocations by ~45%.
🏷️ What kind of change is this?
♻️ Anything particular you want feedback on?
The removed
IGherkinLine
interface is used in the Reqnroll VisualStudio plugin (I haven't found other implementations on github). But the custom implementation HotfixLine is probably not needed anymore. At least the added regression tests will pass with the originalIGherkinLine
implementation from this repository.📋 Checklist: