-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazily populate FileIdentifier fields #69882
Lazily populate FileIdentifier fields #69882
Conversation
Found while inspecting profiles for turning on/off razor precise semantic tokens. In the code analysis process, I see 1.6% of CPU and 0.4% of allocations spent in MS.CA.C#.FileIdentifier.Create. (which is over half of the CPU time spent in in CreateSemanticModel). I tested this locally by opening the Roslyn sln, waiting several minutes, and saw over 30K FileIdentifier objects created. One thing I was curious about was which members were being heavily used in this object as there are differing costs associated with populating each of the members. What I found was that, at least for the scenarios I was testing, none of the objects members were being called for any of the 30K FileIdentifier objects, and thus we could lazily populate this object and not likely ever incur those costs.
@333fred, @AlekseyTs ptal |
string? encoderFallbackErrorMessage = null; | ||
ImmutableArray<byte> hash = default; | ||
try | ||
if (_filePath is not null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went ahead and changed it, better safe than sorry
} | ||
|
||
public static FileIdentifier Create(SyntaxTree tree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with review pass (commit 5) |
@@ -1371,7 +1371,7 @@ FileIdentifier getFileIdentifierForFileTypes() | |||
if (binder is BuckStopsHereBinder lastBinder) | |||
{ | |||
// we never expect to bind a file type in a context where the BuckStopsHereBinder lacks an AssociatedFileIdentifier | |||
return lastBinder.AssociatedFileIdentifier.Value; | |||
return lastBinder.AssociatedFileIdentifier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's insert ?? throw ExceptionUtilities.Unreachable()
here so we preserve the null test being performed by the original .Value
access.
@@ -9,33 +9,91 @@ | |||
|
|||
namespace Microsoft.CodeAnalysis.CSharp; | |||
|
|||
internal readonly struct FileIdentifier | |||
internal class FileIdentifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: sealed
|
||
var displayFilePath = GeneratedNames.GetDisplayFilePath(filePath); | ||
|
||
_encoderFallbackErrorMessage = encoderFallbackErrorMessage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still don't have thread safety with this approach. I could be misunderstanding something, though. My suggestion was:
- make all the class fields
readonly
, and initialize their values in the constructor, rather thanEnsureInitialized
. - for those types where we want to cache this, e.g. BuckStopsHereBinder, source type symbols, and PE type symbols, include a field of type
FileIdentifier?
, which initially has valuenull
. - When the accessor e.g.
AssociatedFileIdentifier
is called, and the field is null, interlocked-initialize theFileIdentifier
field with a fully initialized instance and return it.
I think this most conveniently solves the thread safety questions while still giving us the perf win, as most of the binders and symbols will never actually create the instance.
I'm definitely open to other approaches and would like to make sure @333fred and @AlekseyTs would also be comfortable with this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm willing to go whatever approach you guys like best. I'm not clear on exactly where the concern lies with the current approach, and not clear on exactly what you are proposing, but if Fred and Aleksey like it better than what this PR has so far, I'm definitely willing to give that a go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline with Aleksey and we think that what I've described is overkill, and also just moves the problem from internal to this type to external to it, in some ways.
I think it will be adequate to instead use the solution you have, but modify the properties like DisplayFilePath
, etc. to Volatile.Read
the fields, rather than just returning them normally.
The scenario I want to prevent is that we do not get in an inconsistent state, where another thread observes that _filePath
is null, but also observes the "derived" fields like _displayFilePath
as still null, due to inadequate synchronization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing several Volatile.Reads, would it be sufficient to just do Volatile.Write(ref _filePath, null) when assigning _filePath instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing several Volatile.Reads, would it be sufficient to just do Volatile.Write(ref _filePath, null) when assigning _filePath instead?
Yes, I think so. Doc says
A volatile write operation prevents earlier memory operations on the thread from being reordered to occur after the volatile write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the way I read it too and it seemed easier. Just wanted to make sure I wasn't missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A volatile write operation prevents earlier memory operations on the thread from being reordered to occur after the volatile write.
I believe Interlocked exchange (above) provides the same guarantees. Also, this doesn't affect other threads, but @RikkiGibson was concerned about reads happening on another thread.
|
||
ImmutableInterlocked.InterlockedInitialize(ref _filePathChecksumOpt, hash); | ||
|
||
Volatile.Write(ref _filePath, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Interlocked exchange (above) provides the same guarantees, but doing this doesn't hurt. Also, this doesn't affect other threads, but @RikkiGibson was concerned about reads happening on another thread. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RikkiGibson -- do you still have concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just saw Cyrus use StrongBox in another PR for a similar concern. I think I would feel better about _filePathChecksumOpt being a StrongBox? instead of an ImmutableArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for a similar concern
What concern? I am not aware of any issue with InterlockedInitialize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cyrus's fix wasn't around usage of InterlockedInitialize, but was around concurrent read/write of a struct. With the change from commit 7 to commit 8, I no longer need to worry about the _filePathChecksumOpt member getting partially written during another thread reading it. I have no problem changing it back if you are convinced that wouldn't have been a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not believe the Volatile.Write
alone is sufficient here. That does guarantee that the writes happen in a specific order, however there is nothing ensuring that the reads occur in a specific order. The reads on other threads of _filePath
and _displayFilePath
can be re-oredered with respect to each other. That effectively undoes the protections you're trying to get with Volatile.Write
. Those need to be changed to use Volatile.Read
, at least on _filePath
to get the protections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with having these type of ordered writes / reads is they are very difficult to get correct and maintain. Very few people really understand this area. I wasn't even confident in posting my analysis until I'd gotten private buy off that I was remembering the rules correctly. Even if we get this right now it's very easy for the invariants to be broken by future developers who don't understand the significance of the ordering.
I would recommend instead putting all of the data that is calculated here into a single value like Tuple<string, string>
. That way there is a single value that is either set or not set. No need for tricky read / write ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no problem changing it back if you are convinced that wouldn't have been a problem.
It is a very common pattern that we use in symbols to rely on ImmutableInterlocked.InterlockedInitialize
and simply read the array in another place. It feels like StrongBox is an overkill here, or all other places should be fixed up as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of putting all the data into a single reference type. I'll go with something like the following unless there is a good reason someone has to use Tuple directly instead:
private record FileIdentifierData(string? EncoderFallbackErrorMessage, string DisplayFilePath, ImmutableArray FilePathChecksumOpt);
Done with review pass (commit 7) |
|
||
_encoderFallbackErrorMessage = encoderFallbackErrorMessage; | ||
_displayFilePath = displayFilePath; | ||
_filePathChecksumOpt = new(hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…he data that is created on demand.
{ | ||
private record FileIdentifierData(string? EncoderFallbackErrorMessage, string DisplayFilePath, ImmutableArray<byte> FilePathChecksumOpt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are primary ctors ok to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are primary ctors ok to use?
As far as I know.
Done with review pass (commit 9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (commit 10)
Found while inspecting profiles for turning on/off razor precise semantic tokens. In the code analysis process, I see 1.6% of CPU and 0.4% of allocations spent in MS.CA.C#.FileIdentifier.Create. (which is over half of the CPU time spent in in CreateSemanticModel). I tested this locally by opening the Roslyn sln, waiting several minutes, and saw over 30K FileIdentifier objects created.
One thing I was curious about was which members were being heavily used in this object as there are differing costs associated with populating each of the members. What I found was that, at least for the scenarios I was testing, none of the objects members were being called for any of the 30K FileIdentifier objects, and thus we could lazily populate this object and not likely ever incur those costs.