Skip to content

Unify handling of constant and not constant chars during string concat#72086

Closed
AlekseyTs wants to merge 48 commits intodotnet:mainfrom
AlekseyTs:concat-string-char-v2
Closed

Unify handling of constant and not constant chars during string concat#72086
AlekseyTs wants to merge 48 commits intodotnet:mainfrom
AlekseyTs:concat-string-char-v2

Conversation

@AlekseyTs
Copy link
Contributor

No description provided.

DoctorKrolic and others added 30 commits January 11, 2024 11:50
…ser-provided span-based `string.Concat` when possible
…on (dotnet#71890)

Extracted from dotnet#71793 (suggested in dotnet#71793 (comment))

Currently if `object.ToString()` member is missing error is reported on the whole concatenation. It makes more sense to report in on the argument, which actually requested that member.

E.g. consider concatenation `a + b + c` where `a` is a string and `b` and `c` are not. If `object.ToString()` is missing without a chage of this PR error locations will be for `b` - `a + b` and for `c` - the whole `a + b + c`. This PR chages it so that errors are reported directly on `b` and `c`. Added 2 tests to show this behavior for 3 and 4 concatenation arguments

I know that this is a very minor change and I would not normally make it, but since such separation of changes was requested by a compiler team meber, here is it. The "fix" is just several charachters long. And regardless of whether this is accepted or not, a lot more tests of missing `object.ToString()` for concatenation are about to come in dotnet#71793
@ghost ghost added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Feb 13, 2024
preparedArgs.Add(receiver);
continue;
}
else if (arg.ConstantValueOpt is { IsString: true, StringValue: [char c] })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why you want to unify lowering towards span-based approach instead of turning constant chars into strings and lower them using normal string.Concat. In fact, this has already worked that way before all that saga with spans started

Copy link
Contributor Author

@AlekseyTs AlekseyTs Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why you want to unify lowering towards span-based approach instead of turning constant chars into strings and lower them using normal string.Concat

Because I do not think there is a good reason to deviate based on whether a char is a constant or not. If we think the span-based approach is good enough for non-constant chars, it should be good enough for constants as well. If it is not, then I am questioning why bother with span-based approach at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not, then I am questioning why bother with span-based approach at all.

The primary benifit of span-based approach is avoiding intermediate allocation during char -> string transformation. As a tradeoff the size of IL increases. With constant char there is no intermediate allocation because we can do transformation at compile time. So by applying span-based approach there we still pay the increased IL size, but the primary benifit is no longer there.
I understand that since you demand this I'll have to deliver it. What I still don't get is why you prefer such little consistency so much over IL size savings (simetimes quite significant as in example below) + little runtime perf boost. I mean, aren't constants always somewhat special from the compiler's point of view: they can be inlined/folded/rearranged just purely based on the fact, that they are known values without evaluation side effects, so it is somewhat natural to expect that emit strategy can differ between non-constant and constant data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that since you demand this I'll have to deliver it.

That is not a demand, but a suggestion. And it is based on my personal opinion. Other team members might have different opinion. And you can have different opinion as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it is somewhat natural to expect that emit strategy can differ between non-constant and constant data?

Yes, but each case is different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the state at commit 48 as a compromise?

verifier.VerifyIL("Test.M2", """
{
// Code size 14 (0xe)
// Code size 34 (0x22)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a good concrete example (body of M2 is s + 'c'.ToString() + s + s). Code size increased from 14 to 34, which is 2.4 times increase. At the same time benchmarks show that this not only didn't improve perf, but actually made it worse:

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22621.3007/22H2/2022Update/SunValley2)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host]     : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2


| Method        | Length | Mean      | Error    | StdDev   | Gen0   | Allocated |
|-------------- |------- |----------:|---------:|---------:|-------:|----------:|
| ConcatStrings | 1      |  11.13 ns | 0.073 ns | 0.065 ns | 0.0019 |      32 B |
| ConcatSpans   | 1      |  12.42 ns | 0.054 ns | 0.050 ns | 0.0019 |      32 B |
| ConcatStrings | 1000   | 178.33 ns | 3.058 ns | 2.860 ns | 0.3600 |    6024 B |
| ConcatSpans   | 1000   | 184.59 ns | 3.651 ns | 3.237 ns | 0.3600 |    6024 B |
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();

[MemoryDiagnoser]
public class Benchmarks
{
    private string _s;

    [Params(1, 1000)]
    public int Length { get; set; }

    [GlobalSetup]
    public void Setup() => _s = new string('s', Length);

    [Benchmark]
    public string ConcatStrings() => M1(_s);

    [Benchmark]
    public string ConcatSpans() => M2(_s);

    private static string M1(string s) => string.Concat(s, "c", s, s);

    private static string M2(string s)
    {
        var c = 'c';
        return string.Concat(s, new ReadOnlySpan<char>(in c), s, s);
    }
}

As you can see, span-based approach is slightly worse for short string length and the gap is getting larger as this length increases, so it doesn't seem like statistical noise and span codegen is clearly less efficient than string one. If we go back to the original issue the purpose of span-based concat was to avoid an intermediate allocation of unknownChar.ToString(), which slowed things down and added preasure to the GC. This is not a problem if we lower constant char to a constant string since constant string literals are interned by the runtime, so they are not allocated every time code is executed. Therefore it seems wasteful for me to emit more IL to get worse runtime performance and add unnecessary work to the JIT (since more IL corresponds to more time, required to compile it)

@AlekseyTs AlekseyTs closed this Feb 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants