Improve codegen for concatenation of string and char#70971
Improve codegen for concatenation of string and char#70971DoctorKrolic wants to merge 37 commits intodotnet:mainfrom DoctorKrolic:concat-string-char
string and char#70971Conversation
|
@DoctorKrolic it looks like there are a number of failing tests, and the bootstrap compiler is broken as well. |
|
Here is the failing stack trace from the bootstrap build |
|
@jaredpar Thanks for the detailed stack trace and a documentation link, that was very helpful! The problem itself is quite simple. I just haven't considered that not all expressions can be passed by reference. I'm very glad this was caught so early. The solution is to write the result to a temp local and then pass it by reference to the constructor of Benchmarkusing BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkRunner.Run<Benchmarks>();
[MemoryDiagnoser]
public class Benchmarks
{
[Benchmark]
public string ConcatStrings()
=> M1("s", 'c');
[Benchmark]
public string ConcatSpans()
=> M2("s", 'c');
private static string M1(string s, char c) => string.Concat(s, char.ToLowerInvariant(c).ToString(), s);
private static string M2(string s, char c)
{
var c1 = char.ToLowerInvariant(c);
return string.Concat(s, new ReadOnlySpan<char>(in c1), s);
}
} |
|
Ok, the other error was due to the fact that we cannot run .NET 8 tests inside a .NET Framework host, which is reasonable. So I had to add a bunch of conditions to disable output verification there. Now we only test diagnostics and IL output on .NET Framework and all of this + execution result on .NET 8 and higher. The PR is now fully ready for review |
|
Will try to get to a full review of this on Thursday. |
src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StringConcat.cs
Outdated
Show resolved
Hide resolved
src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StringConcat.cs
Show resolved
Hide resolved
| } | ||
| """; | ||
|
|
||
| var comp = CompileAndVerify(source, expectedOutput: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? "sccs" : null, targetFramework: TargetFramework.Net80, verify: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? default : Verification.Skipped); |
There was a problem hiding this comment.
Could we instead do something like targetFramework: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? TargetFramework.Net80 : TargetFramework.NetStandard? Then we could keep the expectedOutput verification. We can still verify IL only in .NET 8 (or verify both Net80 and NetStandard, perhaps only in a few tests) - plus in those "missing member" tests, the IL should even be the same.
There was a problem hiding this comment.
We can still verify IL only in .NET 8
The word "still" is not correct here. We can verify IL on both .NET 8 and .NET Framework. We can't run .NET 8 program on .NET Framework host though, which is the reason why we skip output verification on .NET Framework. Yes, with this suggestion we will be able to verify output on both hosts, but in such case we would loose the ability to check IL on .NET Framework since it will fallback to ToString strategy due to the fact, that required members for the optimization are missing. This would be yet another test of the existing behavior before this PR, which is already covered by other tests in the file. So I think it is better to verify, that compiler, which runs on .NET Framework, correctly emits optimal IL with runtime references, containing required members for that.
plus in those "missing member" tests, the IL should even be the same
Yes, it is true. However, currently tests verify, that missing at least 1 member, required for optimization, makes compiler take less optimal path. If we run .NET Framework tests with NetStandard references, we would loose this exact check on .NET Framework (since it doesn't have any of these members).
| comp.MakeMemberMissing((WellKnownMember)spanConcatMember); | ||
|
|
||
| // Just verify that we can still run this and get expected output. | ||
| // This is not something that can be seen in real-life scenarios, so don't care about precise IL we generate |
There was a problem hiding this comment.
Isn't the expected output and IL the same as in the test above? Cannot it be merged into one test?
There was a problem hiding this comment.
Unfortunatelly not. Let's consider s + c + s example, where s is a string and c is a char. Compiler lowers this left-to-right, meaning, that first s + c is lowered and then happens lowering of its result and the final s. On the first iteration if we pick up optimized path we get string.Concat(s, new ReadOnlySpan<char>(in c)). On the second one we "unwrap" string.Concat back, receivig s, wrapped into an implicit conversion to a span, and new ReadOnlySpan<char>(in c). Then we wrap the last s into a a span and get string.Concat(s, new ReadOnlySpan<char>(in c), s). And now let's see what if only 1 span-based Concat overload is missing. If we miss span concat of 2 on first iteration we get string.Concat(s, c.ToString()), so when we get to the second one we unwrap 2 string arguments and emit string.Concat(s, c.ToString(), s). However if we are missing span concat of 3 we get string.Concat(s, new ReadOnlySpan<char>(in c)) on the first iteration, but then we don't unwrap it on the second iteration and get string.Concat(string.Concat(s, new ReadOnlySpan<char>(in c)), s). So yeah, unfortunatelly, it we want to check IL we can only do it separately for each case, which IMO is not worth it since in reality case when only 1 span-based Concat is missing is not possible
| } | ||
|
|
||
| Debug.Assert(!firstType.IsReadOnlySpanChar() && !secondType.IsReadOnlySpanChar() && !thirdType.IsReadOnlySpanChar()); | ||
| Debug.Assert(previousLocals.Count == 0); |
| } | ||
|
|
||
| Debug.Assert(!firstType.IsReadOnlySpanChar() && !secondType.IsReadOnlySpanChar() && !thirdType.IsReadOnlySpanChar() && !fourthType.IsReadOnlySpanChar()); | ||
| Debug.Assert(previousLocals.Count == 0); |
There was a problem hiding this comment.
This assumption isn't obvious to me at the moment.
Now it looks just wrong. This assert fails for the following unit-test, and, if all failing asserts are disabled, compiler crashes in stack optimizer:
[Fact]
public void Test3()
{
var source = """
public class C
{
static void Main()
{
var a = "a";
var b = 'b';
var c = "c";
var d = 'd';
System.Console.WriteLine((a + b) + (c + d));
}
}
""";
var comp = CreateCompilation(source, options: TestOptions.ReleaseExe, targetFramework: TargetFramework.Net80);
comp.MakeMemberMissing(WellKnownMember.System_String__Concat_ReadOnlySpanReadOnlySpanReadOnlySpanReadOnlySpan);
var verifier = CompileAndVerify(compilation: comp, expectedOutput: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? "abcd" : null, verify: /*ExecutionConditionUtil.IsCoreClr ? default : */Verification.Skipped).VerifyDiagnostics();
// verifier.VerifyIL("C.Main", "");
}
| @@ -299,33 +334,208 @@ private BoundExpression RewriteStringConcatenationOneExpr(BoundExpression lowere | |||
|
|
|||
| private BoundExpression RewriteStringConcatenationTwoExprs(SyntaxNode syntax, BoundExpression loweredLeft, BoundExpression loweredRight) | |||
There was a problem hiding this comment.
For this and the other two similar helpers, I think we should make sure unit-tests cover an entire following test matrix:
- All possible combinations of input types with all helpers available.
- Scenarios from the previous item, but with one out of the five new helpers missing (all combinations)
No need to verify IL for this test matrix, but should verify result of concatenation at runtime.
There was a problem hiding this comment.
For this and the other two similar helpers, I think we should make sure unit-tests cover an entire following test matrix ...
I understand that creating a set of tests like that is a lot of work. The reason why I felt it is necessary, is the nature of the code changes, too many assumptions (I think I already proved many of them wrong), complicated distributed conditional logic, etc. Besides a lot of work involved, simply implementing the matrix could still miss some of the scenarios that I found problematic. It is pretty much impossible to implement test matrix that is going to provide complete coverage of all possible combinations of affected scenarios. So, I suggest to not take on this task just yet. I am planning to make some suggestions about changing the implementation. Hopefully the suggestions might result in code which is much easier to reason about and, therefore, it would be much easier get confident about overall correctness.
AlekseyTs
left a comment
There was a problem hiding this comment.
Done with review pass (commit 33)
Tests you point to are for cases when 1 span-based |
I think there is a reason for that. Besides simply testing the current change, the unit-tests are also act as regression tests, catching any unintended changes in behavior that might be introduced in the future. With IL verification the tests will serve that purpose much better. |
| return sequence; | ||
| } | ||
|
|
||
| Debug.Assert(!leftType.IsReadOnlySpanChar() && !rightType.IsReadOnlySpanChar()); |
| return sequence; | ||
| } | ||
|
|
||
| Debug.Assert(!firstType.IsReadOnlySpanChar() && !secondType.IsReadOnlySpanChar() && !thirdType.IsReadOnlySpanChar() && !fourthType.IsReadOnlySpanChar()); |
There was a problem hiding this comment.
Here is a unit-test that breaks this assert and results in an invalid program:
[Fact]
public void Test2()
{
var source = """
public class C
{
static void Main()
{
var a = "a";
var b = "b";
var c = "c";
var d = "d";
System.Console.WriteLine(
System.String.Concat((System.ReadOnlySpan<char>)a, (System.ReadOnlySpan<char>)b) +
System.String.Concat((System.ReadOnlySpan<char>)c, (System.ReadOnlySpan<char>)d));
}
}
""";
var comp = CreateCompilation(source, options: TestOptions.ReleaseExe, targetFramework: TargetFramework.Net80);
comp.MakeMemberMissing(WellKnownMember.System_String__op_Implicit_ToReadOnlySpanOfChar);
comp.MakeMemberMissing(WellKnownMember.System_ReadOnlySpan_T__ctor_Reference);
var verifier = CompileAndVerify(compilation: comp, expectedOutput: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? "abcd" : null, verify: /*ExecutionConditionUtil.IsCoreClr ? default : */Verification.Skipped).VerifyDiagnostics();
//verifier.VerifyIL("C.Main", "");
}
|
Done with review pass (commit 37) |
| TryGetWellKnownTypeMember<MethodSymbol>(lowered.Syntax, WellKnownMember.System_String__Concat_ReadOnlySpanReadOnlySpanReadOnlySpan, out _, isOptional: true)) | ||
| { | ||
| arguments = boundCall.Arguments; | ||
| return true; |
There was a problem hiding this comment.
| // If it's a string already, just return it | ||
| if (expr.Type.IsStringType()) | ||
| // If it's a char, return it here, so we can apply span-based optimization later | ||
| if (expr.Type.IsStringType() || expr.Type.IsCharType()) |
There was a problem hiding this comment.
It looks like we cannot benefit from leaving this expression as char unless we have WellKnownMember.System_ReadOnlySpan_T__ctor_Reference. It might be much simpler to check for the helper right here. If we don't have it, we just keep going with the regular conversion. This way we don't need to worry about the lack of the helper later.
There was a problem hiding this comment.
For simplicity, it might make sense to check for WellKnownMember.System_String__Concat_ReadOnlySpanReadOnlySpan here as well. Most likely, even when we will be able to take advantage of Concat Span overloads with more parameters, we will allocate a string for the char, and then get span from it. To me, it doesn't feel worthwhile spending effort to support an optimization for scenarios like that.
| break; | ||
|
|
||
| default: | ||
| Debug.Assert(previousLocals.Count == 0); |
There was a problem hiding this comment.
This assert fails for the following unit-test. Once all failing asserts are disabled, compiler crashes in stack optimizer.
[Fact]
public void Test4()
{
var source = """
public class C
{
static void Main()
{
var a = "a";
var b = "b";
var c = "c";
var d = "d";
var e = 'e';
System.Console.WriteLine((a + b + c) + (d + e));
}
}
""";
var comp = CreateCompilation(source, options: TestOptions.ReleaseExe, targetFramework: TargetFramework.Net80);
var verifier = CompileAndVerify(compilation: comp, expectedOutput: RuntimeUtilities.IsCoreClr8OrHigherRuntime ? "abcde" : null, verify: /*ExecutionConditionUtil.IsCoreClr ? default : */Verification.Skipped).VerifyDiagnostics();
//verifier.VerifyIL("C.Main", "");
}
|
|
||
| foreach (var loweredArg in loweredArgs) | ||
| { | ||
| Debug.Assert(loweredArg.HasErrors || loweredArg.Type is { } argType && (argType.IsStringType() || argType.IsCharType())); |
| var second = leftFlattened[1]; | ||
| var third = leftFlattened[2]; | ||
| result = RewriteStringConcatenationThreeExprs(syntax, first, second, third); | ||
| Debug.Assert(previousLocals.Count <= 1); |
|
I am still thinking about an alternative implementation strategy. However, there is a question that we might want to answer first - #66827 (comment). |
I would like to see your thoughts on that before changing things. Current implementation is based on an assumption, that right-hand side of a concatenation is a single value, which is wrong and can be broken by a) grouping arguments like |
AlekseyTs
left a comment
There was a problem hiding this comment.
I am going to explicitly block the merge for now in order to avoid an accidental merge of this PR
I would like to propose the following alternative implementation strategy. The proposal:
Refers to: src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StringConcat.cs:34 in 227a687. [](commit_id = 227a687, deletion_comment = False) |
|
If there is a big concern that with the strategy from above we will be potentially wrapping/unwrapping the same arguments multiple times, we could consider a different strategy that might work:
The There are some downsides to this approach though. It is more complex to implement, very little can be reused from the existing code as is. The compound assignment case is going to be very tricky because we wouldn't want to lower the right hand side of the assignment upfront (it looks like this is what the current implementation is doing today), it can be a concatenation on itself and we would not want to lower it without the left hand side as one of the arguments for the concatenation. All that leads to a greater risk, more "opportunities" to get something wrong and break things. |
|
@DoctorKrolic Would you like to try an alternative approach? |
Yes, I've already implemented approach based on your suggestion in #70971 (comment). Unfortunatelly it isn't quite ready for a full review yet and I'm away from my usual place, so I cannot finish it (after trying to work with roslyn on my weak laptop and experiencing 30-40 seconds of waiting each time I want to run a unit test to verify even a simple change I quickly gave up on that idea). Expect a PR early next week and thanks for paying attention! |
Sounds good. There is no rush, I just wanted to know your plan. |
|
Closing this PR in favour of #71793 |
Closes: #66827