Skip to content

Conversation

@IDisposable
Copy link
Contributor

@IDisposable IDisposable commented Mar 28, 2023

As noted in #84003, there are four places the same mask is computed in this project,. We can't eliminate the one in RegexGenerator.Emitter or its matching unit test because that's outputting the actual C# source string.

In the actual RegexCharClass.cs file changed the name WordCategoriesMask to avoid a conflict with existing member and added as a private constant. I also changed the Emitter and matching unit test to match for neatness.

Removed the detritus in the comments from the copy-paste from Emitter source (trailing " and ,) and fixed the indentation error in the IsWordChar method.

There are four places the same mask is computed in this project, can't eliminate the one in RegexGenerator.Emitter or its matching unit test because that's outputting the actual C# source string.
Changed the name `WordCategoriesMask` to avoid a conflict with  existing member.
Removed the detritus in the comments from the copy-paste from Emitter source (trailing `"` and `,`)
@ghost ghost added area-System.Text.RegularExpressions community-contribution Indicates that the PR has been added by a community member labels Mar 28, 2023
@ghost
Copy link

ghost commented Mar 28, 2023

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

There are four places the same mask is computed in this project, can't eliminate the one in RegexGenerator.Emitter or its matching unit test because that's outputting the actual C# source string.

Changed the name WordCategoriesMask to avoid a conflict with existing member and added as a private constant. I changed the Emitter and matching unit test to match for neatness.

Removed the detritus in the comments from the copy-paste from Emitter source (trailing " and ,) and fixed the indentation error in the IsWordChar method.

Author: IDisposable
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: -

…egularExpressions/RegexCharClass.cs

Co-authored-by: Stephen Toub <[email protected]>
@IDisposable IDisposable marked this pull request as ready for review March 28, 2023 19:59
@IDisposable
Copy link
Contributor Author

IDisposable commented Mar 28, 2023

A follow-up couple of questions:

  • Should we update the emitted code version of IsBoundaryWordChar to inline the IsWordChar body like is done in RegexCharClass? It currently emits IsWordChar(ch) || (ch == '\\u200C' | ch == '\\u200D');
  • In the emitted version, we're asking for $"[MethodImpl(MethodImplOptions.AggressiveInlining)]"... would we eventually want to do the same on the RegexCharClass versions of either or both methods?

@stephentoub
Copy link
Member

Should we update the emitted code version of IsBoundaryWordChar to inline the IsWordChar body like is done in RegexCharClass? It currently emits IsWordChar(ch) || (ch == '\u200C' | ch == '\u200D');

We did it this way to avoid code bloat with the code emitted into every assembly that uses it. If we hear of an issue with this, we can revisit, but I'm not aware of any. ECMA word boundaries are rare.

In the emitted version, we're asking for $"[MethodImpl(MethodImplOptions.AggressiveInlining)]"... would we eventually want to do the same on the RegexCharClass versions of either or both methods?

Only if it proves impactful.

@stephentoub stephentoub merged commit 3e59d14 into dotnet:main Mar 29, 2023
@IDisposable IDisposable deleted the coalesce-constant branch April 6, 2023 01:39
@ghost ghost locked as resolved and limited conversation to collaborators May 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Text.RegularExpressions community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants