Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add overloads to string trimming #14386

Open
Tracked by #64603
chr4ss1 opened this issue Mar 25, 2015 · 67 comments
Open
Tracked by #64603

Add overloads to string trimming #14386

chr4ss1 opened this issue Mar 25, 2015 · 67 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime
Milestone

Comments

@chr4ss1
Copy link
Contributor

chr4ss1 commented Mar 25, 2015

Updated proposal (new)

Edit May 16, 2022 by @GrabYourPitchforks. See #14386 (comment) for further discussion.

namespace System
{
    public static partial class MemoryExtensions
    {
        public static ReadOnlySpan<T> TrimIfStartsWith<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value);
        public static ReadOnlySpan<char> TrimIfStartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
        public static Span<T> TrimIfStartsWith<T>(this Span<T> span, ReadOnlySpan<T> value);
        public static Span<char> TrimIfStartsWith(this Span<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);

        public static ReadOnlySpan<T> TrimIfEndsWith<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value);
        public static ReadOnlySpan<char> TrimIfEndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
        public static Span<T> TrimIfEndsWith<T>(this Span<T> span, ReadOnlySpan<T> value);
        public static Span<char> TrimIfEndsWith(this Span<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);

        public static ReadOnlySpan<T> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, T startValue, T endValue);
        public static ReadOnlySpan<T> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue);
        public static ReadOnlySpan<char> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue, StringComparison comparisonType);
        public static Span<T> TrimIfSurroundedBy<T>(this Span<T> span, T startValue, T endValue);
        public static Span<T> TrimIfSurroundedBy<T>(this Span<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue);
        public static Span<char> TrimIfSurroundedBy<T>(this Span<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue, StringComparison comparisonType);
    }
}

/*
 * n.b. New APIs on CompareInfo are not strictly necessary, since there are existing IsPrefix and IsSuffix methods
 * which are suitable. These new APIs would be simple "call IsPrefix / IsSuffix then slice" wrappers. They would
 * be the only APIs on CompareInfo which perform manipulation (slicing) in addition to inspection.
 */
namespace System.Globalization
{
     public class CompareInfo
     {
        public string TrimPrefix(string source, string prefix, CompareOptions options = CompareOptions.None);
        public string TrimSuffix(string source, string suffix, CompareOptions options = CompareOptions.None);

        public static ReadOnlySpan<char> TrimPrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None);
        public static ReadOnlySpan<char> TrimSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None);
     }
}

Updated Proposal (old)

It is useful to have methods that trim a specified prefix or suffix from a string. This is somewhat simple for a developer to write themselves, but it seems to be a common enough request that it would be useful to support directly, especially in regards to robustness and performance.

Here are some references gleaned from the original issue.
http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing
http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string
http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string
http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter

Usage

There are only 2 overloads for each method, as shown in the following examples:

// Default overload
"http://foo.com".RemoveStart("http://").RemoveEnd(".com") == "foo"

// StringComparison
"http://foo.com".RemoveStart("HTTP://", StringComparison.OrdinalIgnoreCase) == "foo.com"
Background
Api review
Proposed API
namespace System
{
    public partial class String
    {
        /// <summary>
        /// Removes the specified prefix if it is found at the start of the string.
        /// Uses <see cref="StringComparison.Ordinal"/>.
        /// </summary>
        /// <param name="prefix">The prefix to remove.</param>
        public string RemoveStart(string value);

        /// <summary>
        /// Removes the specified prefix if it is found at the start of the string.
        /// Uses the specified <see cref="StringComparison"/>.
        /// </summary>
        /// <param name="suffix">The prefix to remove.</param>
        /// <param name="comparisonType">The string comparison method to use.</param>
        public string RemoveStart(string value, StringComparison comparisonType);

        /// <summary>
        /// Removes the specified suffix if it is found at the end of the string.
        /// Uses <see cref="StringComparison.Ordinal"/>.
        /// </summary>
        /// <param name="suffix">The suffix to remove.</param>
        public string RemoveEnd(string value);

        /// <summary>
        /// Removes the specified suffix if it is found at the end of the string.
        /// Uses the specified <see cref="StringComparison"/>.
        /// </summary>
        /// <param name="suffix">The suffix to remove.</param>
        /// <param name="comparisonType">The string comparison method to use.</param>
        public string RemoveEnd(string value, StringComparison comparisonType);
    }

    // Per @danmosemsft's suggestion
    public static partial class MemoryExtensions
    {
        // Implementation specializes T for byte and char
        public static ReadOnlySpan<T> RemoveStart(this ReadOnlySpan<T> span, ReadOnlySpan<T> value)
            where T : IEquatable<T>;
        public static ReadOnlySpan<T> RemoveEnd(this ReadOnlySpan<T> span, ReadOnlySpan<T> value)
            where T : IEquatable<T>;

        // .Fast
        public static ReadOnlySpan<char> RemoveStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
        public static ReadOnlySpan<char> RemoveEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
    }
}

Details, Decisions

(Some of these items from @tarekgh's feedback above)

Naming aligns with existing patterns StartsWith, EndsWith, TrimStart, TrimEnd
Decision: namespace System
Decision: No bool repeat overloads. The callsite can call this recursively (at the risk of more allocations).
Looking that linked StackOverflow questions it seems folks want the non-repeating behavior, i.e. "SIdId".RemoveSuffix("Id") should return "SId", not "S".
We don't want to introduce overloads for TrimEnd and TrimStart that have a non-repeating behavior because it's inconsistent with the existing ones.
At the same time, we feel the bool option is overkill and not very readable from the call site.
Questions
Should be instance methods on String (as opposed to extension methods)?
Should we add corresponding methods to TextInfo (or whatever they care called in globalization)?
If a null value for prefix or suffix is provided, should we throw or just return this?

Old Proposal

The proposal is to add new extension methods / overloads for string trimming,

As you know, right now, it's only possible to trim individual characters from a string, but I would like to trim suffixes & prefixes.

An example usage:

`"MyTest".TrimEnd("Test") == "My" == true`

I feel like they should've be there. It seems kind of weird to me, as I've implemented these by myself in the past few times, and am sure there are quite few people who miss this & would find it useful:

http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing
http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string
http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string
http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter

Now the following statement is not true, but if it would be, it would describe how I am feeling: "ooh hey, we offer you string replace method which can replace individual chars, but not the whole strings. It's not too hard to craft your own one, give it a try!"

The following applies to TrimEnd / TrimStart, but the overloads would be 1:1, so I will discuss only TrimEnd.

First overload:

public string TrimEnd(string suffix)

Behaviour: trim the suffix, case-sensitive, and only once. The comparision is done the same way, as it would be for string.Replace.

"MyTestTest".TrimEnd("Test") == "MyTest" == true

Second overload:

public string TrimEnd(string suffix, StringComparison comparison)

Works as the first one, but allows you to explictly tell how to compare.

"MyTESTtest".TrimEnd("test", StringComparison.InvariantCultureIgnoreCase) == "MyTEST" == true

Third overload(s):

I am not sure if these are needed, but I wanted to throw this out here anyway:

public string TrimEnd(string suffix, bool trimRepeatedly) 
public string TrimEnd(string suffix, bool trimRepeatedly StringComparison comparison) 
"MyTestTEST".TrimEnd("test", true, StringComparison.InvariantCultureIgnoreCase) == "My" == true

This proposal has nothing to do with string.Trim(), as it would be ambigious. "tetet".Trim("tet") == ???

Namespace: System
Type: System.String
Assembly: System.Runtime.dll

I'd be willing to work on this :3

@KrzysztofCwalina
Copy link
Member

I think this would be a good addition.
If we could make these to be simply overloads of the existing methods, maybe it would be better to have these directly on System.String. The only problem is that if we add them directly to System.String, we need to version the System.Runtime.dll contract.
Either way, I think it would be a great topic to discuss at the next API review.

@KrzysztofCwalina
Copy link
Member

Chris, could you please add the following details to this proposal: namespace, type, assembly for these APIs. Once you have it, I will mark it as ready for review.

@ellismg
Copy link
Contributor

ellismg commented Mar 26, 2015

Would also be interesting to understand if we need IgoreCase variants and if they are ordinal or linguistic comparisons.

@chr4ss1
Copy link
Contributor Author

chr4ss1 commented Mar 27, 2015

I've updated the post to clarify & put the details.

@KrzysztofCwalina
Copy link
Member

Thanks! I marked it as ready for API review.

@terrajobst
Copy link
Member

terrajobst commented Apr 29, 2015

We reviewed this issue today. We don't believe it's quite ready yet. Please see the notes for more details.

@joshfree
Copy link
Member

changing state "ready for api review" => "needs more info" per @terrajobst remarks above.

@juliusfriedman
Copy link
Contributor

@KrzysztofCwalina, @svick

Might we also take the time to address scenarios with character escaping such as the following example:

"\a\f\b\t\a\f\b a b v".TrimStart(' ', '\a', '\f', '\t') => "\b\t\a\f\b a b v"

@karelz
Copy link
Member

karelz commented Nov 18, 2016

@terrajobst the link above doesn't work. What was the work needed here?
link above updated, feedback copy-pasted below by @tarekgh (thanks!)

@tarekgh
Copy link
Member

tarekgh commented Nov 18, 2016

API review feedback

  • It seems we should provide functionality that removes suffixes and prefixes
  • In particular, looking that linked StackOverflow questions it seems folks want the non-repeating behavior, i.e. "SIdId".RemoveSuffix("Id") should return "SId", not "S"
  • We don't want to introduce overloads for TrimEnd and TrimStart that have a non-repeating behavior because it's inconsistent with the existing ones.
  • At the same time, we feel the bool option is overkill and not very readable from the call site.
  • We think it's best to provide a separate set of methods but the naming is not clear. We should check what other platforms call them.
    • RemoveSuffix(), RemoveEnd()
    • RemovePrefix(), RemoveStart()
  • Should be instance methods on String (as opposed to extension methods)
  • Should we add corresponding methods to TextInfo (or whatever they care called in globalization)?

@tarekgh
Copy link
Member

tarekgh commented Nov 18, 2016

@joperezr please talk to me if you are going to do any work here

@karelz
Copy link
Member

karelz commented Nov 18, 2016

We need formal API proposal (taking into account the above feedback)

@danmoseley
Copy link
Member

@ChrisEelmaa do you have interest in updating the proposal here?

@chr4ss1
Copy link
Contributor Author

chr4ss1 commented Dec 27, 2016

@danmosemsft yes, let me catch up with the feedback

@danmoseley
Copy link
Member

@ChrisEelmaa just came across this PR, wondered whether you still had interest.

@chr4ss1
Copy link
Contributor Author

chr4ss1 commented Apr 16, 2017

@danmosemsft I've learned not to promise anything that I can't keep, been quite busy lately, but I'll definitely do it at one point unless someone else has done it :P

@joperezr
Copy link
Member

joperezr commented May 1, 2017

@ChrisEelmaa sounds good! let us know in case you do want to take it so that we can assign the issue and provide any help if needed.

@tdinucci
Copy link
Member

I would be happy to take this issue on, but it seems to have stalled. Has there ever been a decision on naming, etc?

I've been doing some digging just to get my head around things. I think I've a handle on everything apart from the unit tests. I know I can't just go and submit a PR today but my thinking is that I'd work off CoreClr (mscorlib/System.Private.CoreLib.sln) and this would then get automatically merged across to CoreFx. However, CoreClr has it's own set of String tests as does CoreFx. What happens here?

@joperezr
Copy link
Member

First, we need to update the proposal and have it be approved before any implementation work starts. That is what @ChrisEelmaa meant about going on the provided feedback and updating it. Once that is done and assuming that the Api is approved, then the norm is to add the API to coreclr and send out a PR and then add the tests on a separate PR to corefx. We will then review the coreclr PR and make sure that with your corefx one you have enough test coverage so that we can feel comfortable merging the implementation. After that, we would merge both PRs.

@tarekgh
Copy link
Member

tarekgh commented Mar 26, 2018

We need to address the design feedback first https://github.com/dotnet/apireviews/blob/267a7c25e418335b8aaf45c466d90425a9a944e4/2015/04-28-misc/README.md#1244-add-overloads-to-string-trimming and then update the proposal with the latest and run it one more time with the design committee.

@tdinucci
Copy link
Member

@joperezr, @tarekgh

Thanks for clarification on the tests. Understood in terms finalising the design, I'll pick up once this has been done.

@grant-d
Copy link
Contributor

grant-d commented Sep 17, 2018

[Picked up from up-for-grabs]

It is useful to have methods that trim a specified prefix or suffix from a string. This is somewhat simple for a developer to write themselves, but it seems to be a common enough request that it would be useful to support directly, especially in regards to robustness and performance.

Here are some references gleaned from the original issue.
http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing
http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string
http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string
http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter

Usage

There are only 2 overloads for each method, as shown in the following examples:

// Default overload
"http://foo.com".RemoveStart("http://").RemoveEnd(".com") == "foo"

// StringComparison
"http://foo.com".RemoveStart("HTTP://", StringComparison.OrdinalIgnoreCase) == "foo.com"

Background

Proposed API

namespace System
{
    public partial class String
    {
        /// <summary>
        /// Removes the specified prefix if it is found at the start of the string.
        /// Uses <see cref="StringComparison.Ordinal"/>.
        /// </summary>
        /// <param name="prefix">The prefix to remove.</param>
        public string RemoveStart(string value);

        /// <summary>
        /// Removes the specified prefix if it is found at the start of the string.
        /// Uses the specified <see cref="StringComparison"/>.
        /// </summary>
        /// <param name="suffix">The prefix to remove.</param>
        /// <param name="comparisonType">The string comparison method to use.</param>
        public string RemoveStart(string value, StringComparison comparisonType);

        /// <summary>
        /// Removes the specified suffix if it is found at the end of the string.
        /// Uses <see cref="StringComparison.Ordinal"/>.
        /// </summary>
        /// <param name="suffix">The suffix to remove.</param>
        public string RemoveEnd(string value);

        /// <summary>
        /// Removes the specified suffix if it is found at the end of the string.
        /// Uses the specified <see cref="StringComparison"/>.
        /// </summary>
        /// <param name="suffix">The suffix to remove.</param>
        /// <param name="comparisonType">The string comparison method to use.</param>
        public string RemoveEnd(string value, StringComparison comparisonType);
    }

    // Per @danmosemsft's suggestion
    public static partial class MemoryExtensions
    {
        // Implementation specializes T for byte and char
        public static ReadOnlySpan<T> RemoveStart(this ReadOnlySpan<T> span, ReadOnlySpan<T> value)
            where T : IEquatable<T>;
        public static ReadOnlySpan<T> RemoveEnd(this ReadOnlySpan<T> span, ReadOnlySpan<T> value)
            where T : IEquatable<T>;

        // .Fast
        public static ReadOnlySpan<char> RemoveStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
        public static ReadOnlySpan<char> RemoveEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
    }
}

Details, Decisions

(Some of these items from @tarekgh's feedback above)

  • Naming aligns with existing patterns StartsWith, EndsWith, TrimStart, TrimEnd
  • Decision: namespace System
  • Decision: No bool repeat overloads. The callsite can call this recursively (at the risk of more allocations).
    • Looking that linked StackOverflow questions it seems folks want the non-repeating behavior, i.e. "SIdId".RemoveSuffix("Id") should return "SId", not "S".
    • We don't want to introduce overloads for TrimEnd and TrimStart that have a non-repeating behavior because it's inconsistent with the existing ones.
    • At the same time, we feel the bool option is overkill and not very readable from the call site.

Questions

  • Should be instance methods on String (as opposed to extension methods)?
  • Should we add corresponding methods to TextInfo (or whatever they care called in globalization)?
  • If a null value for prefix or suffix is provided, should we throw or just return this?

POC

This is a relatively simple addition, POC code below:

/// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="suffix">The prefix to remove.</param>
public string RemovePrefix(string prefix)
    => RemovePrefix(prefix, StringComparison.Ordinal); // Or maybe its own body, depending on perf

/// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The prefix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemovePrefix(string prefix, StringComparison comparisonType)
{
    if (prefix == null)
        throw new ArgumentNullException(nameof(prefix));

    // Alternative to the above guard
    if (suffix == null)
        return this;

    if (prefix.Length == 0)
        return this;

    int len = this.Length - prefix.Length;
    if (len < 0)
        return this;

    // main body elided - @tarekgh will advise implementation
}

/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
public string RemoveSuffix(string suffix)
    => RemoveSuffix(suffix, StringComparison.Ordinal); // Or maybe its own body, depending on perf

/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemoveSuffix(string suffix, StringComparison comparisonType)
{
    if (suffix == null)
        throw new ArgumentNullException(nameof(suffix));

    // Alternative to the above guard
    if (suffix == null)
        return this;

    if (suffix.Length == 0)
        return this;

    int len = this.Length - suffix.Length;
    if (len < 0)
        return this;

    // main body elided - @tarekgh will advise implementation
}

@grant-d
Copy link
Contributor

grant-d commented Sep 17, 2018

Copying @danmosemsft's feedback over from dupe issue:

It's unfortunate this allocates a temporary string: "http://foo.com".TrimPrefix("http://").TrimSuffix(".com") but perhaps it's not common to do both.
Since this original issue was discussed, Span has become available. Should these be added to the extension methods:
https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L85

@grant-d
Copy link
Contributor

grant-d commented Sep 17, 2018

Span has become available. Should these be added to the extension methods

[Edited]
OK, added such overloads to MemoryExtensions, including the Span...StringComparison methods.

@grant-d
Copy link
Contributor

grant-d commented Sep 19, 2018

Updated spec:
Changed naming to align with existing patterns StartsWith, EndsWith, TrimStart, TrimEnd
(now RemoveStart, RemoveEnd. Was RemovePrefix, RemoveSuffix)

@tarekgh
Copy link
Member

tarekgh commented Aug 25, 2020

@jnm2 sorry we missed this in 5.0 but I moved the milestone to 6.0 at least we'll address it in the next release.

The good news is you can easily have a workaround. Fortunately, @GrabYourPitchforks got #27935 which can help you in writing extension method for now as a workaround till we get this in 6.0.

@tannergooding tannergooding modified the milestones: 6.0.0, 7.0.0 Jul 12, 2021
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Aug 1, 2021
@tarekgh
Copy link
Member

tarekgh commented Aug 3, 2021

We are revisiting this proposal for two reasons:

  • Moving forward we need to avoid adding ay linguistic operations to the string class and instead all linguistic operations should be in CompareInfo only. This will reduce the confusion for the users to know the operation is performed as ordinal or as linguistic.
  • Some naming in the proposal was not reflected the operations that the API is doing.

Here is the new proposal we can look at and discuss.

namespace System
{
    public partial class String
    {
        public string TrimStart(string value, bool ignoreCase = false);
        public string TrimEnd(string value, bool ignoreCase = false);
    }

    public static partial class MemoryExtensions
    {
        public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, bool ignoreCase = false);
        public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, bool ignoreCase = false);
    }
}


namespace System.Globalization
{
     public class CompareInfo
     {
        public string TrimPrefix(string source, string prefix, CompareOptions options = CompareOptions.None);
        public string TrimSuffix(string source, string suffix, CompareOptions options = CompareOptions.None);

        public static ReadOnlySpan<char> TrimPrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None);
        public static ReadOnlySpan<char> TrimSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None);
     }
}

@tarekgh tarekgh added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-approved API was approved in API review, it can be implemented labels Aug 3, 2021
@GrabYourPitchforks
Copy link
Member

Tarek's latest proposal should also make implementation easier, as we can ditch the giant switch statement in TrimStart and TrimEnd and keep all the complex globalization logic contained within CompareInfo.

Downside is that it means string.StartsWith and string.TrimStart have different behavior, but a proper solution for this is really under the jurisdiction of dotnet/designs#207.

@tarekgh
Copy link
Member

tarekgh commented Aug 8, 2021

@GSPP do you have more input why you don't like the proposal?

@En3Tho
Copy link
Contributor

En3Tho commented Aug 8, 2021

While I'm not @GSPP I can say that this is just adding confusion to the String class. It will be harder for newcomers to find and reason about CompareInfo and honestly, I'm in .Net like 4 years and I have never used it because most useful String overloads take StringComparison parameter.

Also, I don't really like the fact of String methods being all around the place: some take StringComparison, some don't. Again, it feels very unnatural for a newcomer. Like Contains and StartsWith take it, but TrimStart don't. It feels unnatural to me too. Like an overload zoo which is not consistent.

While I can understand that you want to actually clear this confusion, I believe it is impossible with current String class and I feel like it's better to leave it as newcomer friendly as possible. Yeah, it forces you to pass explicit parameter, it might not be that friendly but at least it should be consistent.

I believe it's only possible with new classes, maybe specific ones like Utf16String, Utf8String but those are out of question I guess.

@GSPP
Copy link

GSPP commented Aug 8, 2021

@En3Tho yeah, this is my thinking. This is adding fragmentation to the API surface and discoverability is not good. I have personally very rarely touched these globalization classes. It was rarely necessary and they seemed a bit arcane to me.

Handling a CompareInfo class, one of the more arcane bits of the framework, just to trim a string seems very heavy weight. It's a learnability issue. Try teaching a 7 day C# programmer what a CompareInfo is...

Even placing char algorithms in MemoryExtensions has been, IMHO, not the right choice. Text processing is not "memory processing". Memory means raw bytes and not (possible case insensitive) text processing.

Of course, now that MemoryExtensions is already part of the framework, I guess there is no choice but to carry forward that decision for consistency.

Now string is widely considered to be a bit weird with respect to culture sensitivity. This is a .NET 1.0 mistake. But still, new functionality for strings belongs there. Or, it normally belongs on a StringExtensions class.

There's also a usability issue with the ignoreCase parameter. Which of the six StringComparison modes does this correspond to? I have a hunch but it's really hard to figure out. I would now need to study this proposal to find out.

To say concretely what I would do: I would make all new APIs require a StringComparison parameter. This should have been the .NET 1.0 way and we can do it now. I find from my practical experience that I never know what an API without that argument is going to do. Just provide that parameter. It's not that bad.

@tarekgh
Copy link
Member

tarekgh commented Aug 9, 2021

It will be harder for newcomers to find and reason about CompareInfo and honestly,
While I can understand that you want to actually clear this confusion, I believe it is impossible with current String class and I feel like it's better to leave it as newcomer friendly as possible. Yeah, it forces you to pass explicit parameter, it might not be that friendly but at least it should be consistent.

We are seeing the newcomers mostly interested in the ordinal behavior and not the linguistic behavior. Currently, the newcomers get confused when they use the string class and get linguistic behavior which they don't understand and that is main source of confusion. Switching the default behavior to ordinal can also create confusion. I would recommend you review dotnet/designs#207 and have your input there. This proposal trying to address the issues you are raising here. If you have better ideas to add there, we welcome that.

I'm in .Net like 4 years and I have never used it because most useful String overloads take StringComparison parameter.

What options you used to pass with StringComparison? I am curious to learn more about your scenario.

Also, I don't really like the fact of String methods being all around the place: some take StringComparison, some don't. Again, it feels very unnatural for a newcomer. Like Contains and StartsWith take it, but TrimStart don't. It feels unnatural to me too. Like an overload zoo which is not consistent.

We already have such cases and that is what we are trying to address to reduce that moving forward.

this is my thinking. This is adding fragmentation to the API surface and discoverability is not good. I have personally very rarely touched these globalization classes. It was rarely necessary and they seemed a bit arcane to me.
Handling a CompareInfo class, one of the more arcane bits of the framework, just to trim a string seems very heavy weight. It's a learnability issue. Try teaching a 7 day C# programmer what a CompareInfo is...

Mostly, only a few sectors of users who need the linguistic functionality (e.g. UI developers). From our experience, most of the users expect the ordinal behavior. That is why we are trying to not inject more linguistic functionality to the string class. I am, not really seeing a big deal for anyone want the linguistic functionality to use CompareInfo instead. Most of such users already using CultureInfo and other Globalization classes. I am curious to learn more about your scenarios when you use linguistic functionality.

There's also a usability issue with the ignoreCase parameter. Which of the six StringComparison modes does this correspond to? I have a hunch but it's really hard to figure out. I would now need to study this proposal to find out.

In long term if we try to hide the linguistic functionality from the string class, this issue wouldn't be a concern.

I want to clarify here; I am not trying to reject any feedback. I am trying to collect more data to decide about the future direction in general. If you didn't add your comments on the proposal dotnet/designs#207, I would encourage you to take some time review it and add your feedback there.

@GSPP
Copy link

GSPP commented Aug 11, 2021

In long term if we try to hide the linguistic functionality from the string class, this issue wouldn't be a concern.

I think this would be a grave mistake. This is not how real-world development works... People don't want do deal with complex APIs simply to perform linguistic string processing. I really do not intend any disrespect but this strong focus on ordinal APIs appears to be a bit "ivory tower" to me. (A little joking: It appears to be the Java way... 😏).

I find the StringComparison approach to be entirely adequate to real-world development scenarios. I personally do it that way. I always specify it and that solves the problem. The code looks nice.

I just made a comment (#43956 (comment)) that is pertinent to this discussion as well.

@GrabYourPitchforks
Copy link
Member

@GSPP Thanks for the feedback. We often see devs use StringComparison in an attempt to do the right thing as well. The problem that we've seen in practice is that devs will very often use StringComparison.InvariantCulture[IgnoreCase], which is almost never the correct comparer for any reasonable scenario. Over the code we've scanned on GitHub and other forums, the number of times we saw InvariantCulture used compared to Ordinal was alarming and indicated to us that this is a significant pit of failure for developers. That's one of the reasons we're very hesitant to add any such overloads to new APIs on string, regardless of the ultimate fate of the existing API surface.

FWIW, I speculated a bit on why I think devs are drawn to InvariantCulture[IgnoreCase]. If that reasoning proves correct, then we can use that information to figure out a pit-of-success design for APIs like those proposed here.

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented May 5, 2022

Hashed this out a bit with @tarekgh. We think the below might be a viable proposal for .NET 7.

namespace System
{
    public static partial class MemoryExtensions
    {
        public static ReadOnlySpan<T> TrimIfStartsWith<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value);
        public static ReadOnlySpan<char> TrimIfStartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
        public static Span<T> TrimIfStartsWith<T>(this Span<T> span, ReadOnlySpan<T> value);
        public static Span<char> TrimIfStartsWith(this Span<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);

        public static ReadOnlySpan<T> TrimIfEndsWith<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value);
        public static ReadOnlySpan<char> TrimIfEndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
        public static Span<T> TrimIfEndsWith<T>(this Span<T> span, ReadOnlySpan<T> value);
        public static Span<char> TrimIfEndsWith(this Span<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);

        public static ReadOnlySpan<T> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, T startValue, T endValue);
        public static ReadOnlySpan<T> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue);
        public static ReadOnlySpan<char> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue, StringComparison comparisonType);
        public static Span<T> TrimIfSurroundedBy<T>(this Span<T> span, T startValue, T endValue);
        public static Span<T> TrimIfSurroundedBy<T>(this Span<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue);
        public static Span<char> TrimIfSurroundedBy<T>(this Span<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue, StringComparison comparisonType);
    }
}

// APIs on CompareInfo are optional.
// They can be implemented via existing IsPrefix / IsSuffix methods.
namespace System.Globalization
{
     public class CompareInfo
     {
        public string TrimPrefix(string source, string prefix, CompareOptions options = CompareOptions.None);
        public string TrimSuffix(string source, string suffix, CompareOptions options = CompareOptions.None);

        public static ReadOnlySpan<char> TrimPrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None);
        public static ReadOnlySpan<char> TrimSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None);
     }
}

It differs from the earlier proposal in that it's focused on spans for the time being, which allows callers to use the generic versions on more data types.

ReadOnlySpan<char> input1 = "Hello there!";
input1 = input1.TrimIfStartsWith(input1, "Hello");
Console.WriteLine(input1); // " there!"

ReadOnlySpan<char> input2 = "Hi there!";
input2 = input2.TrimIfStartsWith(input2, "Hello");
Console.WriteLine(input2); // "Hi there!"

// using new "u8" literal syntax!
ReadOnlySpan<byte> input3 = "https://example.com/"u8;
input3 = input3.TrimIfStartsWith(input3, "https://"u8);
Console.WriteLine(Encoding.UTF8.GetString(input3)); // "example.com/"

// Removing surrounding parentheses.
ReadOnlySpan<char> input4 = "(Hello world!)";
input4 = input4.TrimIfSurroundedBy('(', ')');
Console.WriteLine(input4); // "Hello world!" (no parens)

The behavior for the generic method is Ordinal. For the T = char methods, you can specify a StringComparison parameter, matching the pattern of most other T = char methods on MemoryExtensions.

We already have methods called MemoryExtensions.TrimStart and MemoryExtensions.TrimEnd, which both implement "trim any" semantics. That is, TrimStart(..., "xyz") trims any of the characters x, y, and z any number of times from the start of the string. This is not the same behavior we'd have for the methods being proposed here, which is why it seemed ill-advised to make them overloads.

To work around this, we propose the method names TrimIfStartsWith and TrimIfEndsWith. This makes it clear that the logical equivalent of a StartsWith / EndsWith check is taking place, and if that call returns a match, then the trim takes place.

The methods hanging off of CompareInfo are named TrimPrefix and TrimSuffix to match the other *Prefix and *Suffix methods on the type. We didn't want to bring these names forward onto MemoryExtensions directly since we don't commonly use these terms outside the CompareInfo class.

The CompareInfo class would accept both string and span inputs, matching the pattern of the other methods on the type.

There are no methods hanging directly off of string. We can add those later once details about how we want to expose culture information on the string type are hashed out and what the default options would be. In the meantime, if you need the equivalent of a string-returning string.TrimIfStartsWith(value) method, you can use CultureInfo.InvariantCulture.CompareInfo.TrimPrefix(value, CompareOptions.Ordinal).

The extension methods themselves would be utilize the new CompareInfo.IsPrefix and CompareInfo.IsSuffix overloads we added in .NET 5.

The TrimIfSurroundedBy method is a bit of a special case since it wraps several operations together. It performs an IsPrefix check against the starting value and trims, then it performs an IsSuffix check against the ending value and trims. If both the prefix and the suffix check succeeded, then the final trimmed span is returned. If either the prefix or the suffix check does not succeed, then this is considered "no match" and the original span is returned. The culture-aware behavior of trimming between checking IsPrefix and IsSuffix is to avoid the situation where the start and end values might overlap within the target span.

Further discussion

I've seen a few cases where callers want to know if any prefix / suffix was matched before trimming, since they'll want to perform some extra operation. For example:

ReadOnlySpan<char> originalInput = GetSomeInput();
var trimmedInput = originalInput.TrimIfSurroundedBy('\"', '\"');
DoSomeOperation();
if (wasOriginallySurroundedByQuotes)
  ReintroduceQuotesToOutput();

For these cases, rather than complicate the APIs proposed here with out bool wasTrimmed arguments, I'd suggest the caller just check whether the original span length differes from the trimmed span length. If the lengths are different, a trimming operation was performed. (n.b. This check is incorrect for linguistic trimming, which might report zero-length matches, but I suspect linguistic trimming will be very much an edge case scenario.)

In MemoryExtensions, all methods are Ordinal by default unless the caller specifies a custom StringComparison. This includes APIs like StartsWith and EndsWith.

However, the APIs string.StartsWith and string.EndsWith use CurrentCulture by default, meaning the two lines below can produce different results:

bool startsWith1 = myString.StartsWith("Hello");
bool startsWith2 = myString.AsSpan().StartsWith("Hello");

If we do end up adding these methods directly to string, we should be mindful of the possibility for confusion when people call these APIs. It would be confusing for developers if string.StartsWith returns false (because it's using a CurrentCulture comparison by default) but string.TrimIfStartsWith strips data anyway (because it's using Ordinal comparison by default).

@GrabYourPitchforks GrabYourPitchforks added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation labels May 17, 2022
@bartonjs
Copy link
Member

bartonjs commented Jun 9, 2022

  • The name "Trim" currently implies (potentially) multiple removal, these new ones are single, so should not use "Trim"
  • We should use consistent wording between CompareInfo and MemoryExtensions
  • Together, the previous two bullet suggest "RemovePrefix" instead of "TrimIfStartsWith".
  • Is returning a Span-like type the right shape here? If so, we need 4 of each method, not 2 (Span, ROSpan, Memory, ROMemory)
    • One way to reduce the overload explosion is to return System.Range
  • Consider consistency between when the prefix/suffix/etc has to be a span-like, vs a single item

@bartonjs bartonjs added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 9, 2022
@dakersnar dakersnar modified the milestones: 7.0.0, Future Aug 8, 2022
@Mr0N
Copy link

Mr0N commented Nov 3, 2022

This is the second proposal of the method, these two questions are not connected in any way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.