Improve codegen for concatenation of string and char#70971

Closed
DoctorKrolic wants to merge 37 commits into dotnet:main from DoctorKrolic:concat-string-char

@DoctorKrolic
Contributor

Closes: #66827

@DoctorKrolic DoctorKrolic requested a review from a team as a code owner November 27, 2023 20:15
@ghost ghost added the Community, Area-Compilers, and untriaged labels Nov 27, 2023
@jaredpar jaredpar added this to the 17.9 milestone Nov 27, 2023
@jaredpar jaredpar removed the untriaged label Nov 27, 2023
@jaredpar
Member

@333fred, @jjonescz PTAL

@333fred
Member

333fred commented Nov 27, 2023

@DoctorKrolic it looks like there are a number of failing tests, and the bootstrap compiler is broken as well.

@jaredpar
Member

Here is the failing stack trace from the bootstrap build

Stack Trace
   at Microsoft.CodeAnalysis.CommandLine.ExitingTraceListener.Exit(String originalMessage)
   at Microsoft.CodeAnalysis.CommandLine.ExitingTraceListener.WriteLine(String message)
   at System.Diagnostics.TraceInternal.Fail(String message)
   at System.Diagnostics.Debug.Assert(Boolean condition, String message)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitArgument(BoundExpression argument, RefKind refKind)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitArguments(ImmutableArray`1 arguments, ImmutableArray`1 parameters, ImmutableArray`1 argRefKindsOpt)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitObjectCreationExpression(BoundObjectCreationExpression expression, Boolean used)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitExpressionCore(BoundExpression expression, Boolean used)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitExpression(BoundExpression expression, Boolean used)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitArgument(BoundExpression argument, RefKind refKind)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitArguments(ImmutableArray`1 arguments, ImmutableArray`1 parameters, ImmutableArray`1 argRefKindsOpt)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStaticCallExpression(BoundCall call, UseKind useKind)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitCallExpression(BoundCall call, UseKind useKind)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitExpressionCore(BoundExpression expression, Boolean used)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitExpression(BoundExpression expression, Boolean used)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitAssignmentValue(BoundAssignmentOperator assignmentOperator)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitAssignmentExpression(BoundAssignmentOperator assignmentOperator, UseKind useKind)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitExpressionCore(BoundExpression expression, Boolean used)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitExpressionCoreWithStackGuard(BoundExpression expression, Boolean used)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitExpression(BoundExpression expression, Boolean used)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatement(BoundStatement statement)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatementAndCountInstructions(BoundStatement statement)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitSequencePointStatement(BoundSequencePoint node)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatement(BoundStatement statement)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatements(ImmutableArray`1 statements)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitUninstrumentedBlock(BoundBlock block)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitBlock(BoundBlock block)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatement(BoundStatement statement)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatementList(BoundStatementList list)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatement(BoundStatement statement)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatementAndCountInstructions(BoundStatement statement)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitSequencePointStatement(BoundSequencePointWithSpan node)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatement(BoundStatement statement)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatements(ImmutableArray`1 statements)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitUninstrumentedBlock(BoundBlock block)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitBlock(BoundBlock block)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatement(BoundStatement statement)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatementList(BoundStatementList list)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.EmitStatement(BoundStatement statement)
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.GenerateImpl()
   at Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator.Generate(Boolean& hasStackalloc)
   at Microsoft.CodeAnalysis.CSharp.MethodCompiler.GenerateMethodBody(PEModuleBuilder moduleBuilder, MethodSymbol method, Int32 methodOrdinal, BoundStatement block, ImmutableArray`1 lambdaDebugInfo, ImmutableArray`1 closureDebugInfo, ImmutableArray`1 stateMachineStateDebugInfos, StateMachineTypeSymbol stateMachineTypeOpt, VariableSlotAllocator variableSlotAllocatorOpt, BindingDiagnosticBag diagnostics, DebugDocumentProvider debugDocumentProvider, ImportChain importChainOpt, Boolean emittingPdb, ImmutableArray`1 codeCoverageSpans, AsyncForwardEntryPoint entryPointOpt)
   at Microsoft.CodeAnalysis.CSharp.MethodCompiler.CompileMethod(MethodSymbol methodSymbol, Int32 methodOrdinal, ProcessedFieldInitializers& processedInitializers, SynthesizedSubmissionFields previousSubmissionFields, TypeCompilationState compilationState)
   at Microsoft.CodeAnalysis.CSharp.MethodCompiler.CompileNamedType(NamedTypeSymbol containingType)
   at Microsoft.CodeAnalysis.CSharp.MethodCompiler.<>c__DisplayClass25_0.<CompileNamedTypeAsync>b__0()
   at Roslyn.Utilities.UICultureUtilities.<>c__DisplayClass5_0.<WithCurrentUICulture>b__0()
   at System.Threading.Tasks.Task.Execute()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

Investigating bootstrap failures

@DoctorKrolic
Contributor Author

@jaredpar Thanks for the detailed stack trace and a documentation link, that was very helpful!

The problem itself is quite simple: I hadn't considered that not all expressions can be passed by reference. I'm very glad this was caught so early. The solution is to write the result to a temp local and then pass that local by reference to the constructor of ReadOnlySpan<char>. This produces even more IL than the ToString approach, but I've verified in a benchmark that the span version is still faster and allocates less.

Benchmark
BenchmarkDotNet v0.13.10, Windows 11 (10.0.22621.2715/22H2/2022Update/SunValley2)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2


| Method        | Mean      | Error     | StdDev    | Gen0   | Allocated |
|-------------- |----------:|----------:|----------:|-------:|----------:|
| ConcatStrings | 12.153 ns | 0.1038 ns | 0.0971 ns | 0.0033 |      56 B |
| ConcatSpans   |  9.825 ns | 0.0582 ns | 0.0516 ns | 0.0019 |      32 B |
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();

[MemoryDiagnoser]
public class Benchmarks
{
    [Benchmark]
    public string ConcatStrings()
        => M1("s", 'c');

    [Benchmark]
    public string ConcatSpans()
        => M2("s", 'c');

    private static string M1(string s, char c) => string.Concat(s, char.ToLowerInvariant(c).ToString(), s);

    private static string M2(string s, char c)
    {
        var c1 = char.ToLowerInvariant(c);
        return string.Concat(s, new ReadOnlySpan<char>(in c1), s);
    }
}

@DoctorKrolic
Contributor Author

Ok, the other error was due to the fact that we cannot run .NET 8 tests inside a .NET Framework host, which is reasonable. So I had to add a bunch of conditions to disable output verification there. Now we only test diagnostics and IL output on .NET Framework and all of this + execution result on .NET 8 and higher. The PR is now fully ready for review

@333fred
Member

333fred commented Nov 28, 2023

Will try to get to a full review of this on Thursday.

}
""";

var comp = CompileAndVerify(source, expectedOutput: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? "sccs" : null, targetFramework: TargetFramework.Net80, verify: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? default : Verification.Skipped);
Member

@jjonescz jjonescz Nov 29, 2023

Could we instead do something like targetFramework: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? TargetFramework.Net80 : TargetFramework.NetStandard? Then we could keep the expectedOutput verification. We can still verify IL only in .NET 8 (or verify both Net80 and NetStandard, perhaps only in a few tests) - plus in those "missing member" tests, the IL should even be the same.

Contributor Author

We can still verify IL only in .NET 8

The word "still" is not correct here. We can verify IL on both .NET 8 and .NET Framework. What we can't do is run a .NET 8 program on a .NET Framework host, which is why we skip output verification on .NET Framework. Yes, with this suggestion we would be able to verify output on both hosts, but we would lose the ability to check IL on .NET Framework, since the compiler would fall back to the ToString strategy because the members required for the optimization are missing. That would be yet another test of the behavior that existed before this PR, which is already covered by other tests in the file. So I think it is better to verify that the compiler running on .NET Framework correctly emits optimal IL when given runtime references that contain the required members.

plus in those "missing member" tests, the IL should even be the same

Yes, that is true. However, the tests currently verify that missing at least 1 member required for the optimization makes the compiler take the less optimal path. If we ran .NET Framework tests with NetStandard references, we would lose this exact check on .NET Framework (since it doesn't have any of these members).

comp.MakeMemberMissing((WellKnownMember)spanConcatMember);

// Just verify that we can still run this and get expected output.
// This is not something that can be seen in real-life scenarios, so don't care about precise IL we generate
Member

Isn't the expected output and IL the same as in the test above? Cannot it be merged into one test?

Contributor Author

Unfortunately not. Consider the s + c + s example, where s is a string and c is a char. The compiler lowers this left to right: first s + c is lowered, and then its result is lowered together with the final s. On the first iteration, if we take the optimized path, we get string.Concat(s, new ReadOnlySpan<char>(in c)). On the second iteration we "unwrap" that string.Concat back, receiving s wrapped in an implicit conversion to a span, and new ReadOnlySpan<char>(in c). Then we wrap the last s into a span and get string.Concat(s, new ReadOnlySpan<char>(in c), s).

Now let's see what happens if only 1 span-based Concat overload is missing. If the span Concat of 2 is missing, the first iteration gives string.Concat(s, c.ToString()), so on the second iteration we unwrap 2 string arguments and emit string.Concat(s, c.ToString(), s). However, if the span Concat of 3 is missing, we get string.Concat(s, new ReadOnlySpan<char>(in c)) on the first iteration, but then we don't unwrap it on the second iteration and get string.Concat(string.Concat(s, new ReadOnlySpan<char>(in c)), s).

So yes, unfortunately, if we want to check IL we can only do it separately for each case, which IMO is not worth it, since in reality the case where only 1 span-based Concat is missing is not possible.
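To make the three shapes from this comment concrete, here is a minimal sketch that writes them out by hand. It is not compiler output: the class and method names are hypothetical, and it assumes a .NET 7 or later runtime, where the by-reference ReadOnlySpan<char> constructor is public.

```csharp
using System;

// All three calls produce the same string; only the shape of the call (and so the IL) differs.
Console.WriteLine(LoweredShapes.AllOverloadsPresent("s", 'c'));
Console.WriteLine(LoweredShapes.MissingConcatOfTwoSpans("s", 'c'));
Console.WriteLine(LoweredShapes.MissingConcatOfThreeSpans("s", 'c'));

// Hypothetical hand-written equivalents of the lowered forms of `s + c + s`.
static class LoweredShapes
{
    // Every span-based overload is available: one flat three-argument call.
    public static string AllOverloadsPresent(string s, char c)
        => string.Concat(s, new ReadOnlySpan<char>(in c), s);

    // Span Concat of 2 is missing: the char goes through ToString and the call is all strings.
    public static string MissingConcatOfTwoSpans(string s, char c)
        => string.Concat(s, c.ToString(), s);

    // Span Concat of 3 is missing: the inner span call is not unwrapped, so the calls nest.
    public static string MissingConcatOfThreeSpans(string s, char c)
        => string.Concat(string.Concat(s, new ReadOnlySpan<char>(in c)), s);
}
```

The runtime result is identical in each case, which is why a single expectedOutput check cannot tell the cases apart and a per-case IL check would be needed.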

DoctorKrolic and others added 2 commits November 29, 2023 19:17
Co-authored-by: Jan Jones <jan.jones.cz@gmail.com>
}

Debug.Assert(!firstType.IsReadOnlySpanChar() && !secondType.IsReadOnlySpanChar() && !thirdType.IsReadOnlySpanChar());
Debug.Assert(previousLocals.Count == 0);
Contributor

Debug.Assert(previousLocals.Count == 0);

This assumption isn't obvious to me at the moment.

}

Debug.Assert(!firstType.IsReadOnlySpanChar() && !secondType.IsReadOnlySpanChar() && !thirdType.IsReadOnlySpanChar() && !fourthType.IsReadOnlySpanChar());
Debug.Assert(previousLocals.Count == 0);
Contributor

Debug.Assert(previousLocals.Count == 0);

This assumption isn't obvious to me at the moment.

Contributor

This assumption isn't obvious to me at the moment.

Now it looks just wrong. This assert fails for the following unit-test, and, if all failing asserts are disabled, compiler crashes in stack optimizer:


        [Fact]
        public void Test3()
        {
            var source = """
                public class C 
                { 
                    static void Main()
                    {
                        var a = "a";
                        var b = 'b';
                        var c = "c";
                        var d = 'd';
                
                        System.Console.WriteLine((a + b) + (c + d)); 
                    }
                }
                """;

            var comp = CreateCompilation(source, options: TestOptions.ReleaseExe, targetFramework: TargetFramework.Net80);
            comp.MakeMemberMissing(WellKnownMember.System_String__Concat_ReadOnlySpanReadOnlySpanReadOnlySpanReadOnlySpan);

            var verifier = CompileAndVerify(compilation: comp, expectedOutput: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? "abcd" : null, verify: /*ExecutionConditionUtil.IsCoreClr ? default : */Verification.Skipped).VerifyDiagnostics();

 //           verifier.VerifyIL("C.Main", "");
        }

@@ -299,33 +334,208 @@ private BoundExpression RewriteStringConcatenationOneExpr(BoundExpression lowere

private BoundExpression RewriteStringConcatenationTwoExprs(SyntaxNode syntax, BoundExpression loweredLeft, BoundExpression loweredRight)
Contributor

RewriteStringConcatenationTwoExprs

For this and the other two similar helpers, I think we should make sure unit-tests cover an entire following test matrix:

  • All possible combinations of input types with all helpers available.
  • Scenarios from the previous item, but with one out of the five new helpers missing (all combinations)

No need to verify IL for this test matrix, but should verify result of concatenation at runtime.

Contributor

For this and the other two similar helpers, I think we should make sure unit-tests cover an entire following test matrix ...

I understand that creating a set of tests like that is a lot of work. The reason I felt it is necessary is the nature of the code changes: too many assumptions (I think I already proved many of them wrong), complicated distributed conditional logic, etc. Besides the amount of work involved, simply implementing the matrix could still miss some of the scenarios that I found problematic. It is pretty much impossible to implement a test matrix that provides complete coverage of all possible combinations of affected scenarios. So, I suggest not taking on this task just yet. I am planning to make some suggestions about changing the implementation. Hopefully the suggestions will result in code which is much easier to reason about, which would make it much easier to get confident about overall correctness.

Contributor

@AlekseyTs AlekseyTs left a comment

Done with review pass (commit 33)

@DoctorKrolic
Contributor Author

I think we do care about the IL, and we would want the test to fail if it changes.

The tests you point to are for cases where 1 span-based string.Concat overload is missing. We verify that the compiler doesn't crash in such a case, and on .NET 8 we also verify that we get the expected output. But since all span-based Concat overloads were added to the runtime library at the same time, this isn't something that can be seen in real-life scenarios. Moreover, in #70971 (comment) I explained why we cannot generalize these tests under 1 test case. So if we want to check IL for them, we would have to make a test for every missing member separately, bloating the size of our test base without a strong reason for it.

@AlekseyTs
Contributor

AlekseyTs commented Jan 9, 2024

So if we want to check IL for them, we would have to make a test for every missing member separately, bloating the size of our test base without a strong reason for it.

I think there is a reason for that. Besides simply testing the current change, the unit tests also act as regression tests, catching any unintended changes in behavior that might be introduced in the future. With IL verification the tests will serve that purpose much better.

return sequence;
}

Debug.Assert(!leftType.IsReadOnlySpanChar() && !rightType.IsReadOnlySpanChar());
Contributor

Debug.Assert(!leftType.IsReadOnlySpanChar() && !rightType.IsReadOnlySpanChar());

I guess it is fine to keep the assert, but it looks redundant, given the asserts at the beginning of the function

return sequence;
}

Debug.Assert(!firstType.IsReadOnlySpanChar() && !secondType.IsReadOnlySpanChar() && !thirdType.IsReadOnlySpanChar() && !fourthType.IsReadOnlySpanChar());
Contributor

Debug.Assert(!firstType.IsReadOnlySpanChar() && !secondType.IsReadOnlySpanChar() && !thirdType.IsReadOnlySpanChar() && !fourthType.IsReadOnlySpanChar());

Same comment as for the previous helper

Contributor

Here is a unit-test that breaks this assert and results in an invalid program:

        [Fact]
        public void Test2()
        {
            var source = """
                public class C 
                { 
                    static void Main()
                    {
                        var a = "a";
                        var b = "b";
                        var c = "c";
                        var d = "d";
                
                        System.Console.WriteLine(
                            System.String.Concat((System.ReadOnlySpan<char>)a, (System.ReadOnlySpan<char>)b) +
                            System.String.Concat((System.ReadOnlySpan<char>)c, (System.ReadOnlySpan<char>)d)); 
                    }
                }
                """;

            var comp = CreateCompilation(source, options: TestOptions.ReleaseExe, targetFramework: TargetFramework.Net80);
            comp.MakeMemberMissing(WellKnownMember.System_String__op_Implicit_ToReadOnlySpanOfChar);
            comp.MakeMemberMissing(WellKnownMember.System_ReadOnlySpan_T__ctor_Reference);

            var verifier = CompileAndVerify(compilation: comp, expectedOutput: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? "abcd" : null, verify: /*ExecutionConditionUtil.IsCoreClr ? default : */Verification.Skipped).VerifyDiagnostics();

            //verifier.VerifyIL("C.Main", "");
        }

@AlekseyTs
Contributor

AlekseyTs commented Jan 9, 2024

Done with review pass (commit 37)

TryGetWellKnownTypeMember<MethodSymbol>(lowered.Syntax, WellKnownMember.System_String__Concat_ReadOnlySpanReadOnlySpanReadOnlySpan, out _, isOptional: true))
{
arguments = boundCall.Arguments;
return true;
Contributor

return true;

Unit-test Test3 from my other comment hits this code path, but we never even try to use WellKnownMember.System_String__Concat_ReadOnlySpanReadOnlySpanReadOnlySpan. Therefore, checking its presence here (in the enclosing 'if' condition) feels premature.

// If it's a string already, just return it
if (expr.Type.IsStringType())
// If it's a char, return it here, so we can apply span-based optimization later
if (expr.Type.IsStringType() || expr.Type.IsCharType())
Contributor

|| expr.Type.IsCharType()

It looks like we cannot benefit from leaving this expression as char unless we have WellKnownMember.System_ReadOnlySpan_T__ctor_Reference. It might be much simpler to check for the helper right here. If we don't have it, we just keep going with the regular conversion. This way we don't need to worry about the lack of the helper later.

Contributor

For simplicity, it might make sense to check for WellKnownMember.System_String__Concat_ReadOnlySpanReadOnlySpan here as well. Most likely, even when we will be able to take advantage of Concat Span overloads with more parameters, we will allocate a string for the char, and then get span from it. To me, it doesn't feel worthwhile spending effort to support an optimization for scenarios like that.

break;

default:
Debug.Assert(previousLocals.Count == 0);
Contributor

Debug.Assert(previousLocals.Count == 0);

This assert fails for the following unit-test. Once all failing asserts are disabled, compiler crashes in stack optimizer.

        [Fact]
        public void Test4()
        {
            var source = """
                public class C 
                { 
                    static void Main()
                    {
                        var a = "a";
                        var b = "b";
                        var c = "c";
                        var d = "d";
                        var e = 'e';
                                                
                        System.Console.WriteLine((a + b + c) + (d + e)); 
                    }
                }
                """;

            var comp = CreateCompilation(source, options: TestOptions.ReleaseExe, targetFramework: TargetFramework.Net80);

            var verifier = CompileAndVerify(compilation: comp, expectedOutput: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? "abcde" : null, verify: /*ExecutionConditionUtil.IsCoreClr ? default : */Verification.Skipped).VerifyDiagnostics();

            //verifier.VerifyIL("C.Main", "");
        }


foreach (var loweredArg in loweredArgs)
{
Debug.Assert(loweredArg.HasErrors || loweredArg.Type is { } argType && (argType.IsStringType() || argType.IsCharType()));
Contributor

Debug.Assert(loweredArg.HasErrors || loweredArg.Type is { } argType && (argType.IsStringType() || argType.IsCharType()));

This assert fails for a unit-test Test4 from my other comment.

var second = leftFlattened[1];
var third = leftFlattened[2];
result = RewriteStringConcatenationThreeExprs(syntax, first, second, third);
Debug.Assert(previousLocals.Count <= 1);
Contributor

Debug.Assert(previousLocals.Count <= 1);

Is this fact important? Are we taking advantage of it somewhere?

@AlekseyTs
Contributor

I am still thinking about an alternative implementation strategy. However, there is a question that we might want to answer first - #66827 (comment).

@DoctorKrolic
Contributor Author

I am still thinking about an alternative implementation strategy

I would like to see your thoughts on that before changing things. The current implementation is based on the assumption that the right-hand side of a concatenation is a single value, which is wrong and can be broken by a) grouping arguments, like a + (b + c), and b) a user-written string.Concat call, e.g. a + string.Concat(b, c). In order to handle both cases correctly, a bunch of things will probably need to change.

Contributor

@AlekseyTs AlekseyTs left a comment

I am going to explicitly block the merge for now in order to avoid an accidental merge of this PR

@AlekseyTs
Contributor

AlekseyTs commented Jan 11, 2024

    private BoundExpression RewriteStringConcatenation(SyntaxNode syntax, BinaryOperatorKind operatorKind, BoundExpression loweredLeft, BoundExpression loweredRight, TypeSymbol type)

I would like to propose the following alternative implementation strategy.
Since there are many issues in different places with the current implementation, instead of going through each modified place and spelling out suggested modifications, I am making a proposal against the unmodified version of this file. I would also like to suggest implementing it in a separate PR. And, if we will go with a separate PR, I would like to suggest leaving legacy CodeGenStringConcat tests where they were, but add new tests to a new file in Emit2

The proposal:

  1. Adjust TryFoldTwoConcatOperands behavior to not wrap char.ToString calls into coalesce expression. It looks like, in order to achieve that, it will be sufficient to adjust RewriteStringConcatenationOneExpr behavior.
  2. Adjust TryFoldTwoConcatOperands behavior to see through char.ToString in case the char is a constant, and implement constant folding accordingly. I think we need to handle char constants on either or both sides.
  3. Adjust FlattenConcatArg to unwrap String.Concat(<readonly spans>) calls if and only if the shape of the node matches the shape that we create during this optimization (a sequence without side effects, locals are chars that are used in arguments that are new ReadOnlySpan<char>(in char) calls, other arguments are _String__op_Implicit_ToReadOnlySpanOfChar calls). The arguments should be unwrapped to the original strings and chars, then chars should be wrapped into char.ToString the way ConvertConcatExprToString does that. Temporary locals used by the sequences can be safely dropped at this point.
  4. In the switch below, when we have 2, 3, or 4 flattened arguments, check if any of the arguments is a char.ToString call. If there are arguments like that, check if we have the corresponding String.Concat(<readonly spans>) helper and the _String__op_Implicit_ToReadOnlySpanOfChar and new ReadOnlySpan<char>(in char) helpers. If so, lower using the String.Concat(<readonly spans>) helper, doing the necessary transformations for the arguments (wrap strings, unwrap then wrap chars). I suggest not changing the RewriteStringConcatenation<Two/Three/Four>Exprs helpers for that, but rather adding new dedicated helpers that target the String.Concat(<readonly spans>) methods and handle all necessary bound node transformations in them. It might be even better to have a single helper that handles all of them: it could take the method symbol and an array of arguments, which are then processed in a loop and transformed as appropriate.
  5. Make sure all modified and added code paths are covered by unit-tests.
  6. Done. I hope I didn't forget anything. In any case, I think this should be good to start with.

Refers to: src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StringConcat.cs:34 in 227a687. [](commit_id = 227a687, deletion_comment = False)
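As a rough illustration of step 4, the sketch below models the "single helper that processes the arguments in a loop" idea on a toy operand type and renders the chosen call as text. Everything in it is an assumption made for illustration: Operand, SpanLowering, and the availableArities set are hypothetical stand-ins, not Roslyn APIs, and the real lowering would build bound nodes and temp locals rather than strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Operands of (a + b) + (c + d) from Test3, already flattened: string, char, string, char.
var operands = new[]
{
    new Operand("a", IsChar: false), new Operand("b", IsChar: true),
    new Operand("c", IsChar: false), new Operand("d", IsChar: true),
};

// With the 4-span overload available, every argument is turned into a span.
Console.WriteLine(SpanLowering.Lower(operands, new HashSet<int> { 2, 3, 4 }));
// With it missing, the whole call stays string-based and chars go through ToString.
Console.WriteLine(SpanLowering.Lower(operands, new HashSet<int> { 2, 3 }));

// Hypothetical stand-in for one flattened concatenation operand.
record Operand(string Name, bool IsChar);

static class SpanLowering
{
    // availableArities models which String.Concat(<readonly spans>) overloads exist.
    public static string Lower(IReadOnlyList<Operand> operands, ISet<int> availableArities)
    {
        // Use spans only if there is a char to benefit and the matching overload exists.
        bool useSpans = operands.Any(o => o.IsChar) && availableArities.Contains(operands.Count);

        var args = operands.Select(o => (useSpans, o.IsChar) switch
        {
            (true, true) => $"new ReadOnlySpan<char>(in {o.Name})",
            (true, false) => $"(ReadOnlySpan<char>){o.Name}",
            (false, true) => $"{o.Name}.ToString()",
            (false, false) => o.Name,
        });

        return $"string.Concat({string.Join(", ", args)})";
    }
}
```

The point of the design is that the decision is made once, for the whole flattened argument list, so a missing overload can never leave a half-converted nested call behind.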

@AlekseyTs
Contributor

AlekseyTs commented Jan 11, 2024

If there is a big concern that with the strategy from above we will be potentially wrapping/unwrapping the same arguments multiple times, we could consider a different strategy that might work:
In public override BoundNode VisitBinaryOperator(BoundBinaryOperator node) and in private BoundExpression VisitCompoundAssignmentOperator(BoundCompoundAssignmentOperator node, bool used), we could detect that the top level operator is a string concatenation and handle that case specially. In particular:

  • collect all arguments involved into the concatenation from the entire tree into a flat list
  • lower the arguments
  • optionally deconstruct any String.Concat calls that TryExtractStringConcatArgs deconstructs today
  • apply constant folding across the entire set of arguments
  • At this point we have the complete set of things we need to concatenate, decide what helper to use and use it applying necessary transformations for the arguments.

The RewriteStringConcatenation helper, in the form in which it exists today, will no longer be necessary. it can simply delegate to the same new algorithm right after the "lower the arguments" step.

There are some downsides to this approach though. It is more complex to implement, very little can be reused from the existing code as is. The compound assignment case is going to be very tricky because we wouldn't want to lower the right hand side of the assignment upfront (it looks like this is what the current implementation is doing today), it can be a concatenation on itself and we would not want to lower it without the left hand side as one of the arguments for the concatenation. All that leads to a greater risk, more "opportunities" to get something wrong and break things.
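The first bullet of this strategy, collecting every operand of the concatenation tree into one flat list, can be sketched on a toy expression tree. The Expr, Operand, Concat, and Flattener types are hypothetical and exist only for this illustration; Roslyn's bound tree, lowering of the arguments, and constant folding are all left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The shape from Test4: (a + b + c) + (d + e), where e is a char.
Expr tree = new Concat(
    new Concat(new Concat(new Operand("a"), new Operand("b")), new Operand("c")),
    new Concat(new Operand("d"), new Operand("e", IsChar: true)));

var flat = Flattener.Flatten(tree);
Console.WriteLine(string.Join(", ", flat.Select(o => o.Name))); // prints "a, b, c, d, e"

// Hypothetical toy tree: a leaf operand or a binary concatenation.
abstract record Expr;
record Operand(string Name, bool IsChar = false) : Expr;
record Concat(Expr Left, Expr Right) : Expr;

static class Flattener
{
    // Collects the leaves in left-to-right order, regardless of how they were grouped.
    public static List<Operand> Flatten(Expr root)
    {
        var result = new List<Operand>();
        Walk(root, result);
        return result;
    }

    private static void Walk(Expr node, List<Operand> result)
    {
        switch (node)
        {
            case Concat c:
                Walk(c.Left, result);
                Walk(c.Right, result);
                break;
            case Operand o:
                result.Add(o);
                break;
        }
    }
}
```

Because grouping disappears after flattening, a right-hand side such as (d + e) is no longer a special case, which is the situation the failing asserts in Test3 and Test4 ran into.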

@AlekseyTs
Contributor

@DoctorKrolic Would you like to try an alternative approach?

@DoctorKrolic
Contributor Author

Would you like to try an alternative approach?

Yes, I've already implemented an approach based on your suggestion in #70971 (comment). Unfortunately it isn't quite ready for a full review yet, and I'm away from my usual place, so I cannot finish it (after trying to work with roslyn on my weak laptop and waiting 30-40 seconds each time I wanted to run a unit test to verify even a simple change, I quickly gave up on that idea). Expect a PR early next week, and thanks for paying attention!

@AlekseyTs
Contributor

Expect a PR early next week and thanks for paying attention!

Sounds good. There is no rush, I just wanted to know your plan.

@DoctorKrolic
Contributor Author

Closing this PR in favour of #71793

@DoctorKrolic DoctorKrolic deleted the concat-string-char branch January 24, 2024 20:03

Labels

Area-Compilers, Community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve string concatenation with non-const chars to use spans

5 participants