C#: Cleanup and sync StringExtensions with core #67031

raulsntos · 2022-10-07T14:36:47Z

I started out intending to create separate commits, and maybe separate PRs, but gave up halfway: all of these changes end up being related to each other in one way or another, so I don't think keeping them separated was useful. I can still extract some of the simpler, less controversial changes into smaller PRs if that's preferred.

Changes

Moved GetBaseName to keep methods alphabetically sorted.
Removed Length, users should just use the Length property.
Removed Insert, string already has a method with the same signature that takes precedence.
Removed ToLower and ToUpper, string already has methods with the same signature that take precedence.
Deprecated BeginsWith in favor of string.StartsWith.
Removed EndsWith, string already has a method with the same signature that takes precedence.
Removed FindLast in favor of RFind (Remove String::find_last (same as rfind) #40092).
Replaced RFind and RFindN implementation with a call to string.LastIndexOf to avoid marshaling (this fixes a bug caused by using the wrong parameter in the NativeFuncs call).
~~Added LPad and RPad ([Complex Text Layouts] Refactor String to use UTF-32 encoding. #40999)~~.
Added StripEscapes (Fix and expose String::strip_escapes(), use it in LineEdit paste #29347).
Replaced LStrip and RStrip implementation with a call to string.TrimStart and string.TrimEnd.
Deprecated LStrip and RStrip in favor of string.TrimStart and string.TrimEnd.
Added TrimPrefix and TrimSuffix (Add string trim_prefix, trim_suffix, lstrip and rstrip methods #18176).
Removed Erase (String: Remove erase method, bindings can't mutate String #54869 and Remove String::erase method declaration #64714).
~~Renamed OrdAt to UnicodeAt (Renamed String.ord_at to unicode_at #43790)~~.
Removed OrdAt/UnicodeAt in favor of the string indexer.
Added IsValidFileName (a20235a).
Added IsValidHexNumber (Bind is_valid_hex_number string method to GDScript #24586).
Renamed IsValidInteger to IsValidInt (Rename is_valid_integer() to is_valid_int() #49659).
Added support for IPv6 to IsValidIPAddress (Adding IPv6 support #6925).
Added ValidateNodeName (Relaxes node name sanitization in gltf documents. #45545).
Updated the documentation of the IsValid* methods.
Replaced MD5Buffer, MD5Text, SHA256Buffer and SHA256Text implementation to use the System.Security.Cryptography classes and avoid marshaling.
Added SHA1Buffer and SHA1Text (Use wslay as a WebSocket library #30263).
Renamed ToUTF8 to ToUTF8Buffer (Refactored binding system for core types #42780).
Renamed ToAscii to ToASCIIBuffer (Refactored binding system for core types #42780).
Added ToUTF16Buffer and ToUTF32Buffer ([Complex Text Layouts] Refactor String to use UTF-32 encoding. #40999 and Refactored binding system for core types #42780).
Added GetStringFromUTF16 and GetStringFromUTF32 ([Complex Text Layouts] Refactor String to use UTF-32 encoding. #40999).
Added Dedent (Added String::dedent() to remove text indentation #12025).
Added Indent (Make --doctool locale aware #55930).
Added CountN (also moved the caseSensitive parameter in Count to follow the same pattern used by other methods that have an alt N method where the caseSensitive parameter is at the end) (Added String.count method #25090).
- All methods that end with N could be removed since their "overloads" that don't end with N already have an optional caseSensitive parameter (this seems to be @neikeq's preference as well, see Added String.count method #25090 (comment)) but since we already have many methods with an N "overload" I decided to follow the established pattern.
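To make the TrimPrefix and TrimSuffix entries in the list above concrete, here is a minimal sketch of how such methods can be built on the BCL. The class name `TrimSketch` and the use of `StringComparison.Ordinal` are assumptions for illustration; the PR's actual implementation may differ.

```csharp
using System;

public static class TrimSketch
{
    // Drops `prefix` only when the string actually starts with it.
    public static string TrimPrefix(this string instance, string prefix)
    {
        return instance.StartsWith(prefix, StringComparison.Ordinal)
            ? instance.Substring(prefix.Length)
            : instance;
    }

    // Drops `suffix` only when the string actually ends with it.
    public static string TrimSuffix(this string instance, string suffix)
    {
        return instance.EndsWith(suffix, StringComparison.Ordinal)
            ? instance.Substring(0, instance.Length - suffix.Length)
            : instance;
    }
}
```

Unlike `TrimStart` and `TrimEnd`, which strip any run of the given characters, these remove one exact occurrence of the whole prefix or suffix.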

Methods to consider removing

Adding extension methods can pollute the type, so we should consider whether the methods we add are really useful or necessary. Many of the existing methods don't add much, and their behavior can be achieved with a similar one-liner using the methods provided by the BCL. Often we add methods that already exist in the BCL under a different name, which means IntelliSense shows multiple methods with very similar names, and that can confuse users. I think it would be better to avoid providing those methods and instead recommend the existing APIs. Those also benefit from the analyzers provided by Microsoft or third-party libraries, which understand the BCL APIs and can warn users about potentially incorrect or non-optimal usage.

UnicodeAt since it's just a wrapper over the indexer string[int index].
BeginsWith and EndsWith since they're just a wrapper over string.StartsWith and string.EndsWith.
- It can be confusing for users when there are multiple methods with a very similar name.
- Also, EndsWith has the same signature as string.EndsWith and the instance method takes precedence so it's already unused.
Split since it's just a wrapper over string.Split.
Substr since it's just a wrapper over string.Substring.
Hash since users should probably use string.GetHashCode instead.
Find, FindN, RFind and RFindN since they're just wrappers over string.IndexOf and string.LastIndexOf.
CasecmpTo, NocasecmpTo and CompareTo, users should probably use string.Equals and string.Compare both of which take a StringComparison parameter that allows specifying the culture and case sensitivity.
LPad and RPad since they are just wrappers over string.PadLeft and string.PadRight.
LStrip and RStrip since they are just wrappers over string.TrimStart and string.TrimEnd.
TrimPrefix and TrimSuffix in a future .NET version where RemovePrefix and RemoveSuffix may be added (Add overloads to string trimming dotnet/runtime#14386).
IsValidFloat and IsValidInt since they're just wrappers over float.TryParse and int.TryParse.
ToUTF8Buffer, ToUTF16Buffer, ToUTF32Buffer, ToASCIIBuffer, GetStringFromUTF8, GetStringFromUTF16, GetStringFromUTF32 and GetStringFromASCII since they are just wrappers over System.Text.Encoding.
SHA1Buffer, SHA1Text, SHA256Buffer, SHA256Text, MD5Buffer and MD5Text since they are just wrappers over System.Security.Cryptography classes (and using our methods hides CA5350 and CA5351).
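As a rough illustration of the "just a wrapper" argument in the list above, the sketch below pairs a few of the extension names with the BCL call that could stand in for each. The class name `BclEquivalentsSketch`, the parameter names, and the choice of `StringComparison.Ordinal` are assumptions, not the signatures used in StringExtensions.

```csharp
using System;

public static class BclEquivalentsSketch
{
    // UnicodeAt -> the string indexer.
    public static char UnicodeAt(string s, int at) => s[at];

    // BeginsWith -> string.StartsWith (ordinal comparison assumed here).
    public static bool BeginsWith(string s, string text) =>
        s.StartsWith(text, StringComparison.Ordinal);

    // LPad / RPad -> string.PadLeft / string.PadRight.
    public static string LPad(string s, int minLength, char c = ' ') => s.PadLeft(minLength, c);
    public static string RPad(string s, int minLength, char c = ' ') => s.PadRight(minLength, c);

    // LStrip / RStrip -> string.TrimStart / string.TrimEnd with a set of characters.
    public static string LStrip(string s, string chars) => s.TrimStart(chars.ToCharArray());
    public static string RStrip(string s, string chars) => s.TrimEnd(chars.ToCharArray());
}
```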

Methods not added

These methods are exposed in GDScript but don't currently exist in StringExtensions and haven't been added or exposed in this PR. We could add them in a future PR if we consider them useful.

humanize_size because it's a static method and it takes an int (Bind the String::humanize_size method #32546).
get_slice, get_slice_count and get_slicec.
- GetSliceCount is already implemented because it's used in Capitalize but not exposed.
- GetSliceCharacter implements get_slicec because it's used in Capitalize but it's not exposed.
num, num_int64, num_scientific and num_uint64 because users should use ToString and/or IFormattable.
naturalnocasecmp_to
rsplit because we also don't really implement split with the same behavior as GDScript.
repeat because users should probably use the string constructor or StringBuilder.
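For the repeat entry above, a minimal sketch of the two BCL routes mentioned (the string constructor for a single character, StringBuilder for a whole string). The helper names `Repeat` and `RepeatChar` are hypothetical, and `count` is assumed to be non-negative.

```csharp
using System.Text;

public static class RepeatSketch
{
    // Repeats a single character using the string(char, int) constructor.
    public static string RepeatChar(char c, int count) => new string(c, count);

    // Repeats a whole string by appending it `count` times to a StringBuilder.
    public static string Repeat(string value, int count)
    {
        var builder = new StringBuilder(value.Length * count);
        for (int i = 0; i < count; i++)
        {
            builder.Append(value);
        }
        return builder.ToString();
    }
}
```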

neikeq

Regarding names like UTF8, I would prefer to use PascalCase as I explain here. UPPERCASE makes names harder to read when used as part of a PascalCase identifier.

This seems to be one of the cases where the .NET API is inconsistent. The properties in the Encoding class use UPPERCASE, while other methods seem to use PascalCase (e.g., Char.ConvertToUtf32 and Char.IsAscii).

neikeq · 2022-11-03T17:34:37Z

modules/mono/glue/GodotSharp/GodotSharp/Core/StringExtensions.cs

-            // TODO: Could be more efficient if we get a char version of `IndexOf`.
-            // See https://github.com/dotnet/runtime/issues/44116
-            return instance.IndexOf(what.ToString(), from,
-                caseSensitive ? StringComparison.Ordinal : StringComparison.OrdinalIgnoreCase);


We could do something like this:

    if (caseSensitive)
        return instance.IndexOf(what, from); // Ordinal, case sensitive

    return CultureInfo.InvariantCulture.CompareInfo.IndexOf(instance, what, from, CompareOptions.OrdinalIgnoreCase);
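Wrapped into a complete method, the suggestion could look like the sketch below. The class name `FindCharSketch` and the default parameter values are assumptions; the body uses the `string.IndexOf(char, int)` overload, which compares ordinally, and the `CompareInfo.IndexOf` overload that takes a `char` and `CompareOptions`, so no temporary string is allocated for the character.

```csharp
using System;
using System.Globalization;

public static class FindCharSketch
{
    public static int Find(this string instance, char what, int from = 0, bool caseSensitive = true)
    {
        if (caseSensitive)
            return instance.IndexOf(what, from); // Ordinal, case sensitive

        // Ordinal, case insensitive, without converting the char to a string.
        return CultureInfo.InvariantCulture.CompareInfo.IndexOf(
            instance, what, from, CompareOptions.OrdinalIgnoreCase);
    }
}
```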


neikeq · 2022-11-03T17:44:42Z

modules/mono/glue/GodotSharp/GodotSharp/Core/StringExtensions.cs

-            NativeFuncs.godotsharp_string_md5_buffer(instanceStr, out var md5Buffer);
-            using (md5Buffer)
-                return Marshaling.ConvertNativePackedByteArrayToSystemArray(md5Buffer);
+#pragma warning disable CA5351 // Do Not Use Broken Cryptographic Algorithms


Would be great if our method could be annotated in some way so that callers would get this warning, but it doesn't seem to be possible :(
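For context, a managed replacement for the marshaled call in the diff above could look roughly like this sketch. It assumes .NET 5 or later (for `MD5.HashData` and `Convert.ToHexString`), UTF-8 input, and lowercase hexadecimal output; the class name `Md5Sketch` is hypothetical and the PR's actual code may differ.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Md5Sketch
{
    public static byte[] MD5Buffer(this string instance)
    {
#pragma warning disable CA5351 // Do Not Use Broken Cryptographic Algorithms
        return MD5.HashData(Encoding.UTF8.GetBytes(instance));
#pragma warning restore CA5351
    }

    // Hex-encodes the digest; lowercase output is an assumption.
    public static string MD5Text(this string instance)
    {
        return Convert.ToHexString(instance.MD5Buffer()).ToLowerInvariant();
    }
}
```

Because the pragma sits inside the wrapper, code that calls `MD5Buffer` never sees CA5351, which is the drawback being discussed here.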

neikeq · 2022-11-03T18:24:32Z

Regarding the removal of methods. I think we should have some kind of table that could be used as reference to look for the equivalent if the method is removed.

The removal of the following methods would be harmless:

UnicodeAt
BeginsWith and EndsWith
LPad and RPad
LStrip and RStrip

I have some comments about the other suggestions:

Split since it's just a wrapper over string.Split.

string.Split would be more verbose for removing empty entries, and may not be as easy to figure out:

str.Split(",", allowEmpty: false);

str.Split(",", StringSplitOptions.RemoveEmptyEntries);
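The relationship between the two calls can be sketched as a thin wrapper. The parameter names `divisor` and `allowEmpty` follow the call shown above, but the class name `SplitSketch` and the exact signature are assumptions for illustration.

```csharp
using System;

public static class SplitSketch
{
    // Maps a boolean flag onto the BCL's StringSplitOptions.
    public static string[] Split(string instance, string divisor, bool allowEmpty = true)
    {
        return instance.Split(
            divisor,
            allowEmpty ? StringSplitOptions.None : StringSplitOptions.RemoveEmptyEntries);
    }
}
```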

Substr since it's just a wrapper over string.Substring.

Doesn't seem to be just a wrapper, unless our implementation is doing unnecessary things.

Hash since users should probably use string.GetHashCode instead.

The purpose of this method is to return the same hash as the GDScript hash function. Not sure if it has any uses, though.

Find, FindN, RFind and RFindN since they're just wrappers over string.IndexOf and string.LastIndexOf.

Just like with Split, it's more verbose. Plus, in this case, it's too easy to forget to use StringComparison.Ordinal. When omitted, it uses CurrentCulture.
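A small sketch of what a caller has to write with the BCL directly to get ordinal behavior. The class name `OrdinalFindSketch` and the signatures are assumptions; the point is only that the `StringComparison` argument must be passed explicitly.

```csharp
using System;

public static class OrdinalFindSketch
{
    // Forward search with an explicit ordinal comparison.
    public static int Find(string instance, string what, int from = 0, bool caseSensitive = true)
    {
        return instance.IndexOf(what, from,
            caseSensitive ? StringComparison.Ordinal : StringComparison.OrdinalIgnoreCase);
    }

    // Backward search with an explicit ordinal comparison.
    public static int RFind(string instance, string what, bool caseSensitive = true)
    {
        return instance.LastIndexOf(what,
            caseSensitive ? StringComparison.Ordinal : StringComparison.OrdinalIgnoreCase);
    }
}
```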

CasecmpTo, NocasecmpTo and CompareTo, users should probably use string.Equals and string.Compare both of which take a StringComparison parameter that allows specifying the culture and case sensitivity.

I agree about CasecmpTo and NocasecmpTo. But I disagree with removing CompareTo for the same reason regarding StringComparison and CurrentCulture.

IsValidFloat and IsValidInt since they're just wrappers over float.TryParse and int.TryParse.

Slightly agree. TryParse is very well-known, but the need for out _ is kinda ugly.
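For reference, the `out _` pattern in question looks like this when used purely as a validity check. This is a sketch under assumptions: the class name `ValidationSketch`, the invariant culture, and the `NumberStyles` values are illustrative choices, and they accept inputs (such as surrounding whitespace) that the existing IsValidInt and IsValidFloat may reject.

```csharp
using System.Globalization;

public static class ValidationSketch
{
    // The parsed value is discarded with `out _`; only success matters.
    public static bool IsValidInt(string s) =>
        int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out _);

    public static bool IsValidFloat(string s) =>
        float.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out _);
}
```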

ToUTF8Buffer, ToUTF16Buffer, ToUTF32Buffer, ToASCIIBuffer, GetStringFromUTF8, GetStringFromUTF16, GetStringFromUTF32 and GetStringFromASCII since they are just wrappers over System.Text.Encoding.

SHA1Buffer, SHA1Text, SHA256Buffer, SHA256Text, MD5Buffer and MD5Text since they are just wrappers over System.Security.Cryptography classes (and using our methods hides CA5350 and CA5351).

Our methods are less verbose, and it would be harder for a user to know about the replacement if they were removed, unless they are already very familiar with the .NET API. I agree that hiding these warnings is bad.
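To show what a user would need to discover on their own, here is a sketch of the `System.Text.Encoding` calls behind a few of these names. The class name `EncodingSketch` is hypothetical, and mapping the UTF-16 variant to `Encoding.Unicode` (little-endian) is an assumption.

```csharp
using System.Text;

public static class EncodingSketch
{
    public static byte[] ToUTF8Buffer(string s) => Encoding.UTF8.GetBytes(s);
    public static byte[] ToUTF16Buffer(string s) => Encoding.Unicode.GetBytes(s);
    public static byte[] ToUTF32Buffer(string s) => Encoding.UTF32.GetBytes(s);

    public static string GetStringFromUTF8(byte[] bytes) => Encoding.UTF8.GetString(bytes);
}
```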

raulsntos · 2022-11-03T21:01:14Z

Regarding names like UTF8, I would prefer to use PascalCase

Agreed, but for this PR I'm following the current naming, since I thought naming changes would grow the scope of this PR too much, but I can rename them if you prefer.

Regarding the removal of methods. I think we should have some kind of table that could be used as reference to look for the equivalent if the method is removed.

I think that's a good idea, should I add it to https://docs.godotengine.org/en/stable/tutorials/scripting/c_sharp/c_sharp_differences.html?

string.Split would be more verbose for removing empty entries, and may not be as easy to figure out

Yes, I'm not sure how common that is though. Would it be enough to add documentation about this?

Substr since it's just a wrapper over string.Substring

Doesn't seem to be just a wrapper, unless our implementation is doing unnecessary things.

Our implementation just clamps the length to the length of the string to avoid ArgumentOutOfRangeException, but I think that could easily be done by the user (e.g.: myString.Substring(0, Math.Min(10, myString.Length))), although it is more verbose.
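A minimal sketch of the clamping described here, assuming `from` lies within the string. The requested length is capped with `Math.Min` so that `Substring` never reads past the end; the class name `SubstrSketch` is hypothetical and this is not necessarily how StringExtensions implements it.

```csharp
using System;

public static class SubstrSketch
{
    public static string Substr(this string instance, int from, int len)
    {
        // Never ask for more characters than remain after `from`.
        int remaining = instance.Length - from;
        return instance.Substring(from, Math.Min(len, remaining));
    }
}
```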

Hash since users should probably use string.GetHashCode instead.

The purpose of this method is to return the same hash as the GDScript hash function. Not sure if it has any uses, though.

I don't know, it doesn't seem useful to me. Would love to see usages of this.

Just like with Split, it's more verbose. Plus, in this case, it's too easy to forget to use StringComparison.Ordinal. When omitted, it uses CurrentCulture.

This is reported by the .NET analyzers (see CA1304 and/or CA1307).

Also, I personally think the R prefix and N suffix are difficult to understand and not very idiomatic for C#, where it's common to avoid abbreviations.

TryParse is very well-known, but the need for out _ is kinda ugly.

I guess it is, I personally don't have an issue with it. Wouldn't it be more common for users to want the parsed value anyway?

ToUTF8Buffer, ToUTF16Buffer, ...

Our methods are less verbose, and it would be harder for a user to know about the replacement if they were removed, unless they are already very familiar with the .NET API. I agree that hiding these warnings is bad.

I agree that these ones are more difficult to find for a less-experienced .NET user, adding documentation could help here though and I'd prefer if users learned about existing .NET APIs that they can use for other non-Godot .NET applications.

akien-mga · 2022-11-25T14:08:58Z

Bump, would be good to find consensus and merge.

raulsntos · 2022-11-25T14:24:54Z

Just to be clear, the discussion is about methods that were not touched by this PR. I tried to keep this PR uncontroversial so it could be merged quickly: it breaks compatibility, so I'd like to get it merged as early as possible during the betas.

However, I can update the PR to include the changes that we've already agreed on in the discussion:

The removal of the following methods would be harmless:

UnicodeAt

BeginsWith and EndsWith

LPad and RPad

LStrip and RStrip

And:

Regarding names like UTF8, I would prefer to use PascalCase

neikeq · 2022-11-25T16:11:47Z

The only pending change is my suggestion for Find(char). I would prefer not to remove that method.

The UTF8 word case is not required to merge this, especially considering some of these were already named that way. It's something I would like to change in the future though.

Lastly, it's fine if you want to include the extra methods that can be trivially removed. But I'm wondering if it would be better to mark them as obsolete (with the error flag, instead of a warning). That could be helpful for users upgrading from 3.x. Then we remove them entirely in 4.1 or later.
EDIT: Or how about marking them obsolete with a warning in 3.x, and removing them in 4.0?
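The two deprecation levels mentioned here map onto the two forms of `ObsoleteAttribute`. In the sketch below, the class name `ObsoleteSketch`, the method bodies, and the messages are illustrative assumptions; which methods would get which level is exactly what is being discussed.

```csharp
using System;

public static class ObsoleteSketch
{
    // Warning level: callers still compile but get warning CS0618.
    [Obsolete("Use string.StartsWith instead.")]
    public static bool BeginsWith(this string instance, string text)
    {
        return instance.StartsWith(text, StringComparison.Ordinal);
    }

    // Error level: callers fail to compile with error CS0619.
    [Obsolete("Use string.TrimStart instead.", error: true)]
    public static string LStrip(this string instance, string chars)
    {
        return instance.TrimStart(chars.ToCharArray());
    }
}
```

With the error flag, the method stays visible so the compiler can point users at the replacement, but existing calls stop building.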

- Moved `GetBaseName` to keep methods alphabetically sorted.
- Removed `Length`, users should just use the Length property.
- Removed `Insert`, string already has a method with the same signature that takes precedence.
- Removed `Erase`.
- Removed `ToLower` and `ToUpper`, string already has methods with the same signature that take precedence.
- Removed `FindLast` in favor of `RFind`.
- Replaced `RFind` and `RFindN` implementation with a call to `string.LastIndexOf` to avoid marshaling.
- Added `LPad` and `RPad`.
- Added `StripEscapes`.
- Replaced `LStrip` and `RStrip` implementation with a call to `string.TrimStart` and `string.TrimEnd`.
- Added `TrimPrefix` and `TrimSuffix`.
- Renamed `OrdAt` to `UnicodeAt`.
- Added `CountN` and moved the `caseSensitive` parameter of `Count` to the end.
- Added `Indent` and `Dedent`.


- Replaced `MD5Buffer`, `MD5Text`, `SHA256Buffer` and `SHA256Text` implementation to use the `System.Security.Cryptography` classes and avoid marshaling.
- Added `SHA1Buffer` and `SHA1Text`.
- Renamed `ToUTF8` to `ToUTF8Buffer`.
- Renamed `ToAscii` to `ToASCIIBuffer`.
- Added `ToUTF16Buffer` and `ToUTF32Buffer`.
- Added `GetStringFromUTF16` and `GetStringFromUTF32`.

raulsntos · 2022-11-25T16:31:18Z

Rebased and added Find(char) back.

Or how about marking them obsolete with a warning in 3.x, and removing them in 4.0?

I'm fine with this option but I'm worried about users that have already upgraded to 4.0 and may miss the deprecation notice, although maybe that's to be expected of using beta software.

neikeq · 2022-11-26T03:40:29Z

Or how about marking them obsolete with a warning in 3.x, and removing them in 4.0?

I'm fine with this option but I'm worried about users that have already upgraded to 4.0 and may miss the deprecation notice, although maybe that's to be expected of using beta software.

I'm fine with either of the two options, or even a mix of both. It doesn't really do any harm.


raulsntos · 2022-11-27T22:28:37Z

I have added a commit that:

Removes:
- UnicodeAt: I think it's unlikely users would be using this over the indexer; also, they would have been using OrdAt, which is the old name.
- EndsWith: A method with the same signature already exists in string, and since instance methods take precedence over extension methods, it's unlikely users would have been using it.
- LPad and RPad: These methods were added in this PR, so nobody could be using them.
Deprecates:
- BeginsWith in favor of string.StartsWith.
- LStrip and RStrip in favor of string.TrimStart and string.TrimEnd.

akien-mga · 2022-11-28T07:23:47Z

Thanks!

raulsntos added bug enhancement topic:dotnet breaks compat labels Oct 7, 2022

raulsntos added this to the 4.0 milestone Oct 7, 2022

raulsntos requested a review from a team as a code owner October 7, 2022 14:36

raulsntos force-pushed the dotnet/string-extensions branch from f6e5e96 to 5abd16d Compare October 31, 2022 12:52

neikeq reviewed Nov 3, 2022


raulsntos force-pushed the dotnet/string-extensions branch from 5abd16d to 19ccfa8 Compare November 25, 2022 16:30

raulsntos added 3 commits November 25, 2022 17:30

C#: Cleanup and sync IsValid* StringExtensions with core

d9c495f

- Renamed `IsValidInteger` to `IsValidInt`.
- Added `IsValidFileName`.
- Added `IsValidHexNumber`.
- Added support for IPv6 to `IsValidIPAddress`.
- Added `ValidateNodeName`.
- Updated the documentation of the `IsValid*` methods.

raulsntos force-pushed the dotnet/string-extensions branch from 19ccfa8 to d0b166d Compare November 25, 2022 16:30

C#: Remove/deprecate unnecessary string extensions

dc2ceef

- Removed `UnicodeAt`
- Removed `EndsWith`
- Removed `LPad` and `RPad`
- Deprecated `BeginsWith` in favor of `string.StartsWith`
- Deprecated `LStrip` and `RStrip` in favor of `string.TrimStart` and `string.TrimEnd`

neikeq approved these changes Nov 28, 2022


akien-mga merged commit 8253c28 into godotengine:master Nov 28, 2022

raulsntos deleted the dotnet/string-extensions branch November 28, 2022 17:21

raulsntos mentioned this pull request Nov 28, 2022

[3.x] C#: Deprecate string extensions that will be removed in 4.x #69304

Merged

raulsntos mentioned this pull request Jan 27, 2023

C#: Remove obsolete StringExtensions methods #72182

Merged
