Unify handling of constant and not constant chars during string concat by AlekseyTs · Pull Request #72086 · dotnet/roslyn

AlekseyTs · 2024-02-13T17:59:31Z

No description provided.

…ser-provided span-based `string.Concat` when possible

…on (dotnet#71890) Extracted from dotnet#71793 (suggested in dotnet#71793 (comment)) Currently if `object.ToString()` member is missing error is reported on the whole concatenation. It makes more sense to report in on the argument, which actually requested that member. E.g. consider concatenation `a + b + c` where `a` is a string and `b` and `c` are not. If `object.ToString()` is missing without a chage of this PR error locations will be for `b` - `a + b` and for `c` - the whole `a + b + c`. This PR chages it so that errors are reported directly on `b` and `c`. Added 2 tests to show this behavior for 3 and 4 concatenation arguments I know that this is a very minor change and I would not normally make it, but since such separation of changes was requested by a compiler team meber, here is it. The "fix" is just several charachters long. And regardless of whether this is accepted or not, a lot more tests of missing `object.ToString()` for concatenation are about to come in dotnet#71793

… `object.ToString()`

DoctorKrolic · 2024-02-13T19:07:38Z

src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StringConcat.cs

                    preparedArgs.Add(receiver);
                    continue;
                }
+                else if (arg.ConstantValueOpt is { IsString: true, StringValue: [char c] })


I still don't understand why you want to unify lowering towards span-based approach instead of turning constant chars into strings and lower them using normal string.Concat. In fact, this has already worked that way before all that saga with spans started

I still don't understand why you want to unify lowering towards span-based approach instead of turning constant chars into strings and lower them using normal string.Concat

Because I do not think there is a good reason to deviate based on whether a char is a constant or not. If we think the span-based approach is good enough for non-constant chars, it should be good enough for constants as well. If it is not, then I am questioning why bother with span-based approach at all.

If it is not, then I am questioning why bother with span-based approach at all.

The primary benifit of span-based approach is avoiding intermediate allocation during char -> string transformation. As a tradeoff the size of IL increases. With constant char there is no intermediate allocation because we can do transformation at compile time. So by applying span-based approach there we still pay the increased IL size, but the primary benifit is no longer there.
I understand that since you demand this I'll have to deliver it. What I still don't get is why you prefer such little consistency so much over IL size savings (simetimes quite significant as in example below) + little runtime perf boost. I mean, aren't constants always somewhat special from the compiler's point of view: they can be inlined/folded/rearranged just purely based on the fact, that they are known values without evaluation side effects, so it is somewhat natural to expect that emit strategy can differ between non-constant and constant data?

I understand that since you demand this I'll have to deliver it.

That is not a demand, but a suggestion. And it is based on my personal opinion. Other team members might have different opinion. And you can have different opinion as well.

so it is somewhat natural to expect that emit strategy can differ between non-constant and constant data?

Yes, but each case is different.

How about the state at commit 48 as a compromise?

DoctorKrolic · 2024-02-13T19:25:17Z

src/Compilers/CSharp/Test/Emit2/CodeGen/CodeGenSpanBasedStringConcatTests.cs

        verifier.VerifyIL("Test.M2", """
            {
-              // Code size       14 (0xe)
+              // Code size       34 (0x22)


Here is a good concrete example (body of M2 is s + 'c'.ToString() + s + s). Code size increased from 14 to 34, which is 2.4 times increase. At the same time benchmarks show that this not only didn't improve perf, but actually made it worse:

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22621.3007/22H2/2022Update/SunValley2) AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores .NET SDK 8.0.200 [Host] : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2 DefaultJob : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2 | Method | Length | Mean | Error | StdDev | Gen0 | Allocated | |-------------- |------- |----------:|---------:|---------:|-------:|----------:| | ConcatStrings | 1 | 11.13 ns | 0.073 ns | 0.065 ns | 0.0019 | 32 B | | ConcatSpans | 1 | 12.42 ns | 0.054 ns | 0.050 ns | 0.0019 | 32 B | | ConcatStrings | 1000 | 178.33 ns | 3.058 ns | 2.860 ns | 0.3600 | 6024 B | | ConcatSpans | 1000 | 184.59 ns | 3.651 ns | 3.237 ns | 0.3600 | 6024 B |

using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Running; BenchmarkRunner.Run<Benchmarks>(); [MemoryDiagnoser] public class Benchmarks { private string _s; [Params(1, 1000)] public int Length { get; set; } [GlobalSetup] public void Setup() => _s = new string('s', Length); [Benchmark] public string ConcatStrings() => M1(_s); [Benchmark] public string ConcatSpans() => M2(_s); private static string M1(string s) => string.Concat(s, "c", s, s); private static string M2(string s) { var c = 'c'; return string.Concat(s, new ReadOnlySpan<char>(in c), s, s); } }

As you can see, span-based approach is slightly worse for short string length and the gap is getting larger as this length increases, so it doesn't seem like statistical noise and span codegen is clearly less efficient than string one. If we go back to the original issue the purpose of span-based concat was to avoid an intermediate allocation of unknownChar.ToString(), which slowed things down and added preasure to the GC. This is not a problem if we lower constant char to a constant string since constant string literals are interned by the runtime, so they are not allocated every time code is executed. Therefore it seems wasteful for me to emit more IL to get worse runtime performance and add unnecessary work to the JIT (since more IL corresponds to more time, required to compile it)

DoctorKrolic and others added 30 commits January 11, 2024 11:50

Add well-known members

ce62431

Better null-annotate TryGetWellKnownTypeMember

ed3da0e

Optimize for 2 arguments

6958b61

Optimize for 3 arguments

ba7663d

Optimize for 4 arguments

9e10fe6

Test concat of 5 chars

db98d9f

Add missing grouping cases for concat of 4

bfb7388

Add missing tests for concat 2 arguments case

ce042d0

Extract span-based rewrite into common helper and try to merge with u…

05f8250

…ser-provided span-based `string.Concat` when possible

Merge branch 'dotnet:main' into concat-string-char-v2

95d901f

Handle missing object.ToString

a288d21

Handle constantChar.ToString() case

176f3e6

Fix tests on .NET Framework

e1766d3

Move assert to avoid nullability warning

e4e26f1

Rename

2c4bf14

Use array builder

70656d5

Use block body

da90a6b

Revert naming

f5f7efe

Feedback

e2ea475

Get member from compilation

404d7fc

Don't bother about user-made concat calls and leave them as is

083c380

Handle case when char doesn't override object.ToString()

1118dcf

Fix

2b11728

Track sequence locals

6c3423e

Early return

54765fa

Still produce span-based concat when char doesn't directly override…

199862e

… `object.ToString()`

Throw unreachable exception

91888a3

Simplify

50bf4e2

Drop

0f6302d

DoctorKrolic and others added 16 commits February 6, 2024 20:12

Simplify

db25bda

Inline helper back

53c529f

Use pooled hash set

201fd04

Use switch statement

49a865f

Use containing type

a3a58a6

Assert

1d3447c

Convert to foreach

3634b2f

Move into if

0b048b8

Free pooled object

a384b97

Remove assert

05a2e3c

Free

71ea57d

Handle ROS constructor ref kinds other than ref readonly or in

0cc529d

Support await scenarios

8d3a40b

Pass ref kind

04046af

Feedback

323b50a

Unify handling of constant and not constant chars during string concat

7a54762

ghost added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Feb 13, 2024

AlekseyTs mentioned this pull request Feb 13, 2024

Improve codegen for concatenation of string and char (attempt 2) #71793

Merged

DoctorKrolic reviewed Feb 13, 2024

View reviewed changes

AlekseyTs added 2 commits February 13, 2024 15:00

Fixup

0f6c8e4

Reduce amount of scenarios affected by unification

c694f10

AlekseyTs closed this Feb 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify handling of constant and not constant chars during string concat#72086

Unify handling of constant and not constant chars during string concat#72086
AlekseyTs wants to merge 48 commits intodotnet:mainfrom
AlekseyTs:concat-string-char-v2

AlekseyTs commented Feb 13, 2024

Uh oh!

DoctorKrolic Feb 13, 2024

Uh oh!

AlekseyTs Feb 13, 2024 •

edited

Loading

Uh oh!

DoctorKrolic Feb 14, 2024

Uh oh!

AlekseyTs Feb 14, 2024

Uh oh!

AlekseyTs Feb 14, 2024

Uh oh!

AlekseyTs Feb 15, 2024

Uh oh!

DoctorKrolic Feb 13, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AlekseyTs commented Feb 13, 2024

Uh oh!

DoctorKrolic Feb 13, 2024

Choose a reason for hiding this comment

Uh oh!

AlekseyTs Feb 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DoctorKrolic Feb 14, 2024

Choose a reason for hiding this comment

Uh oh!

AlekseyTs Feb 14, 2024

Choose a reason for hiding this comment

Uh oh!

AlekseyTs Feb 14, 2024

Choose a reason for hiding this comment

Uh oh!

AlekseyTs Feb 15, 2024

Choose a reason for hiding this comment

Uh oh!

DoctorKrolic Feb 13, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AlekseyTs Feb 13, 2024 •

edited

Loading