[.NET] Avoid allocation and improve parsing time #344

Open: wants to merge 6 commits into base: main

obligaron
Contributor

🤔 What's changed?

This PR reduces allocations and parsing time by:

  • changing some types from class to struct
  • removing or deferring some string trimming in GherkinLine
  • changing some yield return methods to explicit struct enumerators (no allocations, and they can be inlined)
  • replacing arrays with IEnumerable so that List can be used, which avoids re-creating and copying explicitly sized arrays

Results before:

| Method | Runtime | FeatureFile | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Parser | .NET 8.0 | tags.feature | 23.01 μs | 0.440 μs | 0.556 μs | 2.9297 | 0.1221 | 49.04 KB |
| ParserReuse | .NET 8.0 | tags.feature | 15.49 μs | 0.215 μs | 0.201 μs | 2.5024 | 0.0916 | 41.18 KB |
| Parser | .NET Framework 4.8.1 | tags.feature | 31.99 μs | 0.613 μs | 0.656 μs | 9.2773 | 0.4272 | 57.32 KB |
| ParserReuse | .NET Framework 4.8.1 | tags.feature | 23.45 μs | 0.105 μs | 0.088 μs | 7.3853 | 0.2747 | 45.45 KB |
| Parser | .NET 8.0 | very_long.feature | 565.50 μs | 5.492 μs | 5.138 μs | 95.7031 | 54.6875 | 1568.74 KB |
| ParserReuse | .NET 8.0 | very_long.feature | 546.43 μs | 10.785 μs | 10.088 μs | 94.7266 | 47.8516 | 1558.77 KB |
| Parser | .NET Framework 4.8.1 | very_long.feature | 931.86 μs | 5.545 μs | 5.187 μs | 261.7188 | 96.6797 | 1610.54 KB |
| ParserReuse | .NET Framework 4.8.1 | very_long.feature | 896.85 μs | 2.761 μs | 2.447 μs | 259.7656 | 100.5859 | 1596.55 KB |

Results after:

| Method | Runtime | FeatureFile | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Parser | .NET 8.0 | tags.feature | 13.443 μs | 0.0661 μs | 0.0618 μs | 1.5869 | - | 26.76 KB |
| ParserReuse | .NET 8.0 | tags.feature | 7.682 μs | 0.0559 μs | 0.0466 μs | 1.1520 | 0.0305 | 18.86 KB |
| Parser | .NET Framework 4.8.1 | tags.feature | 20.678 μs | 0.0940 μs | 0.0879 μs | 5.2185 | 0.2136 | 32.13 KB |
| ParserReuse | .NET Framework 4.8.1 | tags.feature | 13.165 μs | 0.0836 μs | 0.0782 μs | 3.2806 | 0.1068 | 20.21 KB |
| Parser | .NET 8.0 | very_long.feature | 295.813 μs | 3.1228 μs | 2.6077 μs | 49.8047 | 30.2734 | 816.38 KB |
| ParserReuse | .NET 8.0 | very_long.feature | 288.504 μs | 0.9828 μs | 0.8713 μs | 49.3164 | 25.3906 | 806.37 KB |
| Parser | .NET Framework 4.8.1 | very_long.feature | 564.142 μs | 3.3668 μs | 3.1493 μs | 138.6719 | 54.6875 | 853.5 KB |
| ParserReuse | .NET Framework 4.8.1 | very_long.feature | 540.853 μs | 3.3997 μs | 3.1801 μs | 135.7422 | 45.8984 | 839.47 KB |

⚡️ What's your motivation?

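This reduces parsing time by ~45% (between 35% and 50% in the benchmarks above, depending on runtime and feature file) and allocations by ~45% (between 44% and 56%).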

🏷️ What kind of change is this?

  • 🏦 Refactoring/debt/DX (improvement to code design, tooling, etc. without changing behaviour)
  • 💥 Breaking change (incompatible changes to the API)

♻️ Anything particular you want feedback on?

The removed IGherkinLine interface is used in the Reqnroll Visual Studio plugin (I haven't found other implementations on GitHub). However, the custom implementation HotfixLine is probably not needed anymore: at least the added regression tests also pass with the original IGherkinLine implementation from this repository.

📋 Checklist:

Member

gasparnagy left a comment

This is an impressive improvement, thx.
I made a few smaller comments.

On the HotfixLine in the VS extension: I reviewed it, and as far as I can see, the only reason it was introduced (stupid that I did not document it) was that the original implementation handled the cell escape char (\) badly (see here): when it found a cell escape char, it took the next char without verifying that a next char exists at all (it did not check the result of MoveNext). This resulted in strange exceptions when users were in the middle of typing a Gherkin document, were at the end of the file, and typed a \. As you have noticed, we have a unit test for that case in the VS extension. However, one of the recent changes (probably your previous refactoring) fixed this "bug" in the Gherkin parser anyway (see here), so we indeed no longer need the hotfix line in the VS extension.

Maybe what we could do still is to take-over the 4 related unit tests from the VS extension to the Gherkin parser code. As it is now handled here, it is better to test it here. What do you think?
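
For readers who have not seen the escape-char problem described above, here is a minimal sketch of the guard. It is not the parser's actual code: `EscapeDemo`, `SplitCells`, and the choice to keep a trailing backslash as a literal character are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EscapeDemo
{
    // Hypothetical helper (not the Gherkin parser's code): splits a row on
    // unescaped '|' and tolerates a backslash at the very end of the input.
    public static List<string> SplitCells(string row)
    {
        var cells = new List<string>();
        var cell = new StringBuilder();
        using (var chars = row.GetEnumerator())
        {
            while (chars.MoveNext())
            {
                var c = chars.Current;
                if (c == '|')
                {
                    // Unescaped separator: close the current cell.
                    cells.Add(cell.ToString());
                    cell.Clear();
                }
                else if (c == '\\')
                {
                    // The guard: read the escaped char only if one exists.
                    if (!chars.MoveNext())
                    {
                        cell.Append('\\'); // assumption: keep it as a literal
                        break;
                    }
                    var next = chars.Current;
                    if (next == '|') cell.Append('|');
                    else if (next == '\\') cell.Append('\\');
                    else if (next == 'n') cell.Append('\n');
                    else cell.Append('\\').Append(next); // unknown escape: keep as is
                }
                else
                {
                    cell.Append(c);
                }
            }
        }
        cells.Add(cell.ToString());
        return cells;
    }
}
```

With this guard, an input that ends in a single backslash (the "user is still typing" case) yields a cell ending in `\` instead of reading past the end of the input.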

}

public IEnumerable<T> GetItems<T>(RuleType ruleType)
public readonly struct Enumerable<T> : IEnumerable<T>
Member

I know it is a generic type, but could we find a little bit more specific name, like ItemsEnumerable?

Contributor Author

I changed the name 🙂

}

public void Add<T>(RuleType ruleType, T obj)
public struct Enumerator<T> : IEnumerator<T>
Member

Do we have to make the enumerator also public? (The Enumerable visibility I understand, as that is needed to avoid boxing.)

Contributor Author

We need to make it public so that the C# compiler can use duck typing and work with the struct Enumerable directly (it uses the concrete type in the IL and does not cast to IEnumerable). This avoids boxing and any allocation for the enumerable.
If we returned an IEnumerator, it would not be possible (at least on .NET Framework) to change this at JIT time.

The same optimisation also happens in the BCL for types like List<T>.
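
A minimal sketch of the pattern, with hypothetical names (`ItemsEnumerable`, `Demo`) and `int` items standing in for the real types in this PR. The point it illustrates: `foreach` binds to a public `GetEnumerator()` method by pattern, and because that public method returns the struct enumerator type, C# requires the enumerator type to be at least as accessible as the method.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical types for illustration; not the ones introduced by this PR.
public readonly struct ItemsEnumerable : IEnumerable<int>
{
    private readonly int[] _items;

    public ItemsEnumerable(int[] items) => _items = items;

    // Pattern-based foreach binds to this public method. It returns the
    // concrete struct type, so the enumerator is not boxed.
    public Enumerator GetEnumerator() => new Enumerator(_items);

    // The interface implementations box the struct enumerator. They are only
    // used when the caller goes through IEnumerable<int> (for example LINQ).
    IEnumerator<int> IEnumerable<int>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Must be public because the public GetEnumerator() above returns it.
    public struct Enumerator : IEnumerator<int>
    {
        private readonly int[] _items;
        private int _index;

        internal Enumerator(int[] items)
        {
            _items = items;
            _index = -1;
        }

        public int Current => _items[_index];
        object IEnumerator.Current => Current;

        public bool MoveNext() => ++_index < _items.Length;
        public void Reset() => _index = -1;
        public void Dispose() { }
    }
}

public static class Demo
{
    public static int Sum(ItemsEnumerable items)
    {
        var sum = 0;
        // Calls MoveNext/Current on ItemsEnumerable.Enumerator directly.
        foreach (var item in items)
            sum += item;
        return sum;
    }
}
```

Passing the same value as `IEnumerable<int>` still works, but then the explicit interface implementations are used and both the enumerable and the enumerator are boxed.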

IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public struct TagsEnumerator : IEnumerator<GherkinLineSpan>
Member

Like above, do we need to make this public? (Also TableCellsEnumerator)

Contributor Author

See above.

Contributor Author

obligaron commented Jan 14, 2025

> Maybe what we could do still is to take-over the 4 related unit tests from the VS extension to the Gherkin parser code. As it is now handled here, it is better to test it here. What do you think?

I think this is a good idea. I tried to add the tests, but some implementations (C and Perl) handle the errors differently than the others (including .NET).

Edit: I tried to add the test data and fix the other implementations, see #356. Note: The additional tests also pass with this PR.
