Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize codegen for collections expression of single spread of ReadOnlySpan for collection builder emit strategy #73102

Merged
merged 14 commits into from
Jul 11, 2024

Conversation

DoctorKrolic
Copy link
Contributor

Implements suggestion from #71296 (comment). Now when collection expression if of form [.. readOnlySpan] we can just take that readOnlySpan instead of copying it since we know that it is an immutable type

…OnlySpan` in case of collection builder emit strategy
@DoctorKrolic DoctorKrolic requested a review from a team as a code owner April 19, 2024 12:36
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Apr 19, 2024
@dotnet-policy-service dotnet-policy-service bot added the Community The pull request was submitted by a contributor who is not a Microsoft employee. label Apr 19, 2024
@DoctorKrolic DoctorKrolic changed the title Optimize codegen for collections expression of single spread of ReadOnlySpan in case of collection builder emit strategy Optimize codegen for collections expression of single spread of ReadOnlySpan Apr 19, 2024
@CyrusNajmabadi
Copy link
Member

Can you state explicitly under what circumstances this optimization will apply? For example I want to make sure it does not apply to:

ReadOnlySpan<int> intCopy = [.. rosInts]

Thanks!

@DoctorKrolic
Copy link
Contributor Author

ReadOnlySpan<int> intCopy = [.. rosInts]

It does apply to this case, making it effectively a reassignment. Why it is bad? The ROS is an immutable type and the previous codegen didn't perform deep copy of elements anyway, so this in unobservable to the user (equalify change doesn't count since it isn't guaranteed for collection expressions). Am I worng?

@CyrusNajmabadi
Copy link
Member

Ros is not an immutable type. It's an immutable view over potentially mutable data.

Trivial example, wrap a normal array with a ros. Then assign to two other ROSs (one as a normal assignment, one as a spread copy). Now change the original array. What do you see?

@CyrusNajmabadi
Copy link
Member

Afaict, this is only safe if passed to a collection builder Create method that is scoped. That way we know the new collection must be making a copy itself. Right @RikkiGibson?

@RikkiGibson
Copy link
Contributor

It's incorrect to optimize [..ros] to ros. You have to copy it, even if target type is also ReadOnlySpan, due to semantic differences Cyrus has laid out.

Afaict, this is only safe if passed to a collection builder Create method that is scoped. That way we know the new collection must be making a copy itself. Right @RikkiGibson?

I'm not following the scenario you have in mind. If a Create method is being used to create the resulting collection, then it's not the ReadOnlySpan<T> ros1 = [..ros]; case.

I can't think of any case where we can safely "skip" copying when user code includes a spread. It would have to involve proving that the spread operand and its referents is not referenced by anything else, and treating it like a "move" of the storage instead of a copy. It's not a type of optimization we really do in Roslyn. Especially for a case where, if user wants those semantics, they could just replace [..ros] with ros.

@RikkiGibson
Copy link
Contributor

RikkiGibson commented Apr 19, 2024

Looking at the code a bit more. Yeah, I see that VisitCollectionBuilderCollectionExpression is implemented by just lowering the collection expr as ReadOnlySpan then passing it to Create. This means that if you do only the ImmutableArray part of this PR then you make codegen worse by doing that copy.

In the case of MyCollectionBuilderCollection<T> collection = [..ros], where ros is ReadOnlySpan<T>, it def feels like you could skip copying the ros.

And yes, to avoid a behavioral breaking change you need to take care to still do the copy for RefStruct<T> Create(/*non-scoped*/ ReadOnlySpan<T>).

@CyrusNajmabadi
Copy link
Member

I can't think of any case where we can safely "skip" copying when user code includes a spread. It would have to involve proving that the spread operand and its referents is not referenced by anything else, and treating it like a "move" of the storage instead of a copy.

Here's teh scenario:

ReadOnlySpan<int> x = ... backing mutable store ...;
ImmutableArray<int> y = [.. x];

This should be safe (IMO, but correct me if i'm wrong), but it should be possible to compile this to:

ImmutableArray<int> y = ImmutableArray.Create<int>(x);

Instead of:

ReadOnlySpan<int> __compilerCopy = ... copy x ...;
ImmutableArray<int> y = ImmutableArray.Create<int>(__compilerCopy);

Basically, if the ROS arg is scoped, we know that the final collection can't keep a reference to it. So it will make a copy itself, so we don't need to.

(I could def be wrong on this).

@RikkiGibson
Copy link
Contributor

I agree with your assessment.

@CyrusNajmabadi
Copy link
Member

And yes, to avoid a behavioral breaking change you need to take care to still do the copy for RefStruct Create(/non-scoped/ ReadOnlySpan).

Interestingly enough, ImmutableArray.Create doesn't seem to have a scoped argument:

https://github.com/dotnet/runtime/blob/b067ce009ab5992168d1d635a9a9beb387008608/src/libraries/System.Collections.Immutable/src/System/Collections/Immutable/ImmutableArray.cs#L90

I'm curious why that is. @stephentoub any insights here?

@RikkiGibson
Copy link
Contributor

You only ever need scoped when the signature has a ref struct output. In this case it only returns ImmutableArray so we already know it can't return references in the parameter.

@CyrusNajmabadi
Copy link
Member

In the case of IA, we'd either need to special case. Or see if the signature of it could be updated. And if there's a good reason the sig is as it is, and the non-copy is not-safe, we likely should not special case this type. :)

@RikkiGibson
Copy link
Contributor

RikkiGibson commented Apr 19, 2024

@DoctorKrolic as far as how to adjust the implementation. I would recommend to do a check in VisitCollectionBuilderCollectionExpression to see if we are in this special case:

  • collection-expr is a ROS of a single spread, with same element type as the collection-expr, and Create method cannot return references in its argument.

In which case, don't call into VisitArrayOrSpanCollectionExpression, just use the (lowered) spread-operand expression as the argument to Create.

@RikkiGibson
Copy link
Contributor

Sorry for the spam. It can probably be generalized a little further than this, e.g. if expr in [..expr] is implicitly convertible to the ReadOnlySpan that we want, in a way that doesn't imply copying (for example, if it is an array or mutable span), we could also just do that conversion and pass the result to Create (given the ref safety criteria is also met. Dealer's choice on whether you actually go for that optimization.

@DoctorKrolic
Copy link
Contributor Author

@RikkiGibson Can you clarify this part please:

Create method cannot return references in its argument

I am generally not that much familiar with scoped and such, so an example of a test, which would fail in case we reuse existing ROS where we shouldn't would be helpful

@CyrusNajmabadi
Copy link
Member

You only ever need scoped when the signature has a ref struct output. In this case it only returns ImmutableArray so we already know it can't return references in the parameter.

PErfect. That makes sense. Thanks!

@CyrusNajmabadi
Copy link
Member

@stephentoub Ignore the ping :)

@DoctorKrolic
Copy link
Contributor Author

for example, if it is an array or mutable span), we could also just do that conversion and pass the result to Create (given the ref safety criteria is also met

I would like to focus on the original case here with ROS only to keep the size of the PR small. My expirience says that making several "simple" optimizations might result in exponential explosion of test cases, which need to be checked)

@CyrusNajmabadi
Copy link
Member

I am generally not that much familiar with scoped and such, so an example of a test, which would fail in case we reuse existing ROS where we shouldn't would be helpful

Sure. Say you have the following:

[CollectionBuilder(typeof(MyRefType), "Create")]
ref struct MyRefType<T>
{
// ...
}

static class MyRefType
{
    public static MyRefType<T> Create<T>(ReadOnlySpan<T> values); // versus
    public static MyRefType<T> Create<T>(scoped ReadOnlySpan<T> values);
}

In the case of:

ReadOnlySpan<int> s = ... create ...;
MyRefType<int> t = [.. s];

Because the first MyRefType.Create overload returns a ref-struct, and is not scoped, it might capture the ROS being passed in, and thus would see the underlying values of it change if it wrapped something like a mutable array. So, to be safe, we must make a copy.

However, the arg to the second 'Create' overload is 'scoped'. This informs the compiler taht it could not be captured by the return value of create. And as such, we don't need to make a copy.

As rikki points out, this is only needed when returning a ref-struct, as a non-ref-struct could not ever capture the ROS coming in, so it would have to be making a copy to function at all.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Apr 19, 2024

Because the first MyRefType.Create overload returns a ref-struct, and is not scoped, it might capture the ROS being passed in, and thus would see the underlying values of it change if it wrapped something like a mutable array. So, to be safe, we must make a copy.

AS an fuller example of that:

[CollectionBuilder(typeof(MyRefType), "Create")]
ref struct HalfSpan<T>
{
    private readonly ReadOnlySpan<T> _ros;
    public int Length => _ros.Length / 2;
    public T this[int index] => _ros[index * 2];
}

static class MyRefType
{
    public static MyRefType<T> Create<T>(ReadOnlySpan<T> values) => new HalfSpan<T>(values);
}

// later

string[] values = ["a", "b", "c", "d";
HalfSpan<string> s = [.. values];

// This change must not be visible in 's'.  `.. values` promises you get your own 'copy' of hte values.
values[0] = "A"; values[1] = "B"; values[2] = "C"; values[3] = "D";

@RikkiGibson
Copy link
Contributor

As far as how to correctly implement the ref safety criteria. I think the following will do (in a code path where we are already narrowed down to the (MyCollectionBuilderType)[..readOnlySpan] case):

var canSkipCopyingArgument = !createMethod.ReturnType.IsRefLikeType || createMethod.Parameters[0].EffectiveScope != ScopedKind.ScopedValue;

@cston
Copy link
Member

cston commented May 2, 2024

@DoctorKrolic, we'll probably revisit postponed PRs after wrapping up 17.11.

@DoctorKrolic
Copy link
Contributor Author

@cston 17.11 P1 went to public. Does it mean that you are done with it and we can merge this PR?

@jaredpar
Copy link
Member

That is the just the initial preview of 17.11. We will likely be heads down until one of the much later previews.

@DoctorKrolic
Copy link
Contributor Author

@cston @jaredpar main is 17.12 now, so can we finally merge this PR?

@cston
Copy link
Member

cston commented Jul 3, 2024

/azp run roslyn-CI

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@RikkiGibson
Copy link
Contributor

❌CSC(0,0): error CS8103: Combined length of user strings used by the program exceeds allowed limit. Try to decrease use of string literals.

Seriously? This PR is where we happen to hit this limit? 😓

@DoctorKrolic
Copy link
Contributor Author

Seriously? This PR is where we happen to hit this limit?

Only in 2 CI legs though, one of which is a debug win build, while release win build and other debug builds are fine. This is either odd or suspicious

@cston cston self-requested a review July 3, 2024 20:58
@RikkiGibson
Copy link
Contributor

RikkiGibson commented Jul 3, 2024

Seriously? This PR is where we happen to hit this limit?

Only in 2 CI legs though, one of which is a debug win build, while release win build and other debug builds are fine. This is either odd or suspicious

Rerunning, we'll see if it comes up the same and go from there.

It's not necessarily surprising though that debug build would include more strings than release build, due to string literals within #if DEBUG sections. If it really is the case that we hit the limit here, we might have to move CollectionExpressionTests.cs to another project.

@RikkiGibson
Copy link
Contributor

Rerunning, we'll see if it comes up the same and go from there.

This time it came up green, I'm not sure what the difference might have been, but we are probably good to go.

@cston please follow up if you have any further feedback on this PR.

@RikkiGibson RikkiGibson merged commit 7e93076 into dotnet:main Jul 11, 2024
24 checks passed
@RikkiGibson
Copy link
Contributor

Thanks @DoctorKrolic!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Blocked Community The pull request was submitted by a contributor who is not a Microsoft employee. untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants