Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazily populate FileIdentifier fields #69882

Merged

Conversation

ToddGrun
Copy link
Contributor

Found while inspecting profiles for turning on/off razor precise semantic tokens. In the code analysis process, I see 1.6% of CPU and 0.4% of allocations spent in MS.CA.C#.FileIdentifier.Create. (which is over half of the CPU time spent in in CreateSemanticModel). I tested this locally by opening the Roslyn sln, waiting several minutes, and saw over 30K FileIdentifier objects created.

One thing I was curious about was which members were being heavily used in this object as there are differing costs associated with populating each of the members. What I found was that, at least for the scenarios I was testing, none of the objects members were being called for any of the 30K FileIdentifier objects, and thus we could lazily populate this object and not likely ever incur those costs.

image

image

Found while inspecting profiles for turning on/off razor precise semantic tokens. In the code analysis process, I see 1.6% of CPU and 0.4% of allocations spent in MS.CA.C#.FileIdentifier.Create. (which is over half of the CPU time spent in in CreateSemanticModel). I tested this locally by opening the Roslyn sln, waiting several minutes, and saw over 30K FileIdentifier objects created.

One thing I was curious about was which members were being heavily used in this object as there are differing costs associated with populating each of the members. What I found was that, at least for the scenarios I was testing, none of the objects members were being called for any of the 30K FileIdentifier objects, and thus we could lazily populate this object and not likely ever incur those costs.
@ToddGrun ToddGrun requested a review from a team as a code owner September 11, 2023 18:39
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Sep 11, 2023
@ToddGrun ToddGrun linked an issue Sep 12, 2023 that may be closed by this pull request
@ToddGrun
Copy link
Contributor Author

@333fred, @AlekseyTs ptal

string? encoderFallbackErrorMessage = null;
ImmutableArray<byte> hash = default;
try
if (_filePath is not null)
Copy link
Contributor Author

@ToddGrun ToddGrun Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (_filePath is not null)

Just realized I'm not sure if the properties are called on different threads. I could put _filePath in a local and use that in the method if that is a concern. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and changed it, better safe than sorry

}

public static FileIdentifier Create(SyntaxTree tree)
Copy link
Contributor

@AlekseyTs AlekseyTs Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create

Is this helper used? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@AlekseyTs
Copy link
Contributor

Done with review pass (commit 5)

@RikkiGibson RikkiGibson self-assigned this Sep 12, 2023
@@ -1371,7 +1371,7 @@ FileIdentifier getFileIdentifierForFileTypes()
if (binder is BuckStopsHereBinder lastBinder)
{
// we never expect to bind a file type in a context where the BuckStopsHereBinder lacks an AssociatedFileIdentifier
return lastBinder.AssociatedFileIdentifier.Value;
return lastBinder.AssociatedFileIdentifier;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's insert ?? throw ExceptionUtilities.Unreachable() here so we preserve the null test being performed by the original .Value access.

@@ -9,33 +9,91 @@

namespace Microsoft.CodeAnalysis.CSharp;

internal readonly struct FileIdentifier
internal class FileIdentifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sealed


var displayFilePath = GeneratedNames.GetDisplayFilePath(filePath);

_encoderFallbackErrorMessage = encoderFallbackErrorMessage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still don't have thread safety with this approach. I could be misunderstanding something, though. My suggestion was:

  • make all the class fields readonly, and initialize their values in the constructor, rather than EnsureInitialized.
  • for those types where we want to cache this, e.g. BuckStopsHereBinder, source type symbols, and PE type symbols, include a field of type FileIdentifier?, which initially has value null.
  • When the accessor e.g. AssociatedFileIdentifier is called, and the field is null, interlocked-initialize the FileIdentifier field with a fully initialized instance and return it.

I think this most conveniently solves the thread safety questions while still giving us the perf win, as most of the binders and symbols will never actually create the instance.

I'm definitely open to other approaches and would like to make sure @333fred and @AlekseyTs would also be comfortable with this approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm willing to go whatever approach you guys like best. I'm not clear on exactly where the concern lies with the current approach, and not clear on exactly what you are proposing, but if Fred and Aleksey like it better than what this PR has so far, I'm definitely willing to give that a go.

Copy link
Contributor

@RikkiGibson RikkiGibson Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline with Aleksey and we think that what I've described is overkill, and also just moves the problem from internal to this type to external to it, in some ways.

I think it will be adequate to instead use the solution you have, but modify the properties like DisplayFilePath, etc. to Volatile.Read the fields, rather than just returning them normally.

The scenario I want to prevent is that we do not get in an inconsistent state, where another thread observes that _filePath is null, but also observes the "derived" fields like _displayFilePath as still null, due to inadequate synchronization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing several Volatile.Reads, would it be sufficient to just do Volatile.Write(ref _filePath, null) when assigning _filePath instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing several Volatile.Reads, would it be sufficient to just do Volatile.Write(ref _filePath, null) when assigning _filePath instead?

Yes, I think so. Doc says

A volatile write operation prevents earlier memory operations on the thread from being reordered to occur after the volatile write.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the way I read it too and it seemed easier. Just wanted to make sure I wasn't missing something.

Copy link
Contributor

@AlekseyTs AlekseyTs Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A volatile write operation prevents earlier memory operations on the thread from being reordered to occur after the volatile write.

I believe Interlocked exchange (above) provides the same guarantees. Also, this doesn't affect other threads, but @RikkiGibson was concerned about reads happening on another thread.


ImmutableInterlocked.InterlockedInitialize(ref _filePathChecksumOpt, hash);

Volatile.Write(ref _filePath, null);
Copy link
Contributor

@AlekseyTs AlekseyTs Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Volatile.Write(ref _filePath, null);

I believe Interlocked exchange (above) provides the same guarantees, but doing this doesn't hurt. Also, this doesn't affect other threads, but @RikkiGibson was concerned about reads happening on another thread. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RikkiGibson -- do you still have concerns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw Cyrus use StrongBox in another PR for a similar concern. I think I would feel better about _filePathChecksumOpt being a StrongBox? instead of an ImmutableArray

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a similar concern

What concern? I am not aware of any issue with InterlockedInitialize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cyrus's fix wasn't around usage of InterlockedInitialize, but was around concurrent read/write of a struct. With the change from commit 7 to commit 8, I no longer need to worry about the _filePathChecksumOpt member getting partially written during another thread reading it. I have no problem changing it back if you are convinced that wouldn't have been a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not believe the Volatile.Write alone is sufficient here. That does guarantee that the writes happen in a specific order, however there is nothing ensuring that the reads occur in a specific order. The reads on other threads of _filePath and _displayFilePath can be re-oredered with respect to each other. That effectively undoes the protections you're trying to get with Volatile.Write. Those need to be changed to use Volatile.Read, at least on _filePath to get the protections.

Copy link
Member

@jaredpar jaredpar Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with having these type of ordered writes / reads is they are very difficult to get correct and maintain. Very few people really understand this area. I wasn't even confident in posting my analysis until I'd gotten private buy off that I was remembering the rules correctly. Even if we get this right now it's very easy for the invariants to be broken by future developers who don't understand the significance of the ordering.

I would recommend instead putting all of the data that is calculated here into a single value like Tuple<string, string>. That way there is a single value that is either set or not set. No need for tricky read / write ordering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no problem changing it back if you are convinced that wouldn't have been a problem.

It is a very common pattern that we use in symbols to rely on ImmutableInterlocked.InterlockedInitialize and simply read the array in another place. It feels like StrongBox is an overkill here, or all other places should be fixed up as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of putting all the data into a single reference type. I'll go with something like the following unless there is a good reason someone has to use Tuple directly instead:

private record FileIdentifierData(string? EncoderFallbackErrorMessage, string DisplayFilePath, ImmutableArray FilePathChecksumOpt);

@AlekseyTs
Copy link
Contributor

Done with review pass (commit 7)


_encoderFallbackErrorMessage = encoderFallbackErrorMessage;
_displayFilePath = displayFilePath;
_filePathChecksumOpt = new(hash);
Copy link
Contributor

@AlekseyTs AlekseyTs Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new

Target typed new is forbidden in Compilers #Closed

{
private record FileIdentifierData(string? EncoderFallbackErrorMessage, string DisplayFilePath, ImmutableArray<byte> FilePathChecksumOpt);
Copy link
Contributor

@AlekseyTs AlekseyTs Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record

Unless I am mistaken, we are not using any of the code generated for this type other than properties. Please switch to a regular class. Also, for private types like this, we often stick with public readonly fields (not even bothering with properties). #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are primary ctors ok to use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are primary ctors ok to use?

As far as I know.

@AlekseyTs
Copy link
Contributor

Done with review pass (commit 9)

Copy link
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 10)

@ToddGrun ToddGrun merged commit 638ca0c into dotnet:main Sep 14, 2023
27 checks passed
@ghost ghost added this to the Next milestone Sep 14, 2023
@Cosifne Cosifne modified the milestones: Next, 17.8 P3 Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Perf: Lazily populate FileIdentifier fields
6 participants