Simplifying internal structure and logic for the SyntaxNodeCache #80825
CyrusNajmabadi wants to merge 16 commits into dotnet:main from
Conversation
| #endregion |
| #region Caching |

This code was moved, with no changes.
src/Compilers/Core/Portable/Syntax/InternalSyntax/SyntaxNodeCache.cs
| this.hash = hash; |
| this.node = node; |
| } |
| } |
This type was not helpful. First, it forced one to reason about the semantics of torn reads/writes in the array below. Those semantics were correct, but they were very subtle and difficult to prove. Second, the extra data stored here did not help at all. The hash that was read could not actually be asserted to be anything (since collisions overwrite). At best, the hash could be compared against the hash of the node being looked up, to no effect; at worst, it could disallow reuse of a node that could have been reused during a tear.
The hash itself is computed from data that is already checked in IsNodeEquivalent. While technically this might reduce the number of checks performed when there was a collision, practically this turns out to be at most one check, as the Kind validation almost always immediately fails. See the PR description for more data on this.
Note: the Kind/Flags checks are non-virtual and are just plucking raw data out of the green node. So we're not adding any indirections or anything like that.
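To make the tearing concern concrete, here is a minimal sketch (illustrative names and types, not Roslyn's actual code) contrasting the old two-field struct entry, whose fields a concurrent reader could observe half-written, with the new single-reference slot, which the CLR always reads and writes atomically:

```csharp
using System;
using System.Threading;

class Node
{
    public int Kind;   // raw, non-virtual data on the node
}

static class Cache
{
    const int CacheSizeBits = 16;
    const int CacheSize = 1 << CacheSizeBits;   // 65,536 slots
    const int CacheMask = CacheSize - 1;

    // Old shape (removed by this PR): two fields that a concurrent
    // reader could observe torn -- a new hash next to an old node.
    // struct Entry { public int Hash; public Node Node; }

    // New shape: one object reference per slot. Reference-sized
    // reads and writes are atomic in .NET, so a reader can never
    // observe a torn entry.
    static readonly Node?[] s_cache = new Node?[CacheSize];

    public static void AddNode(Node node, int hash) =>
        Volatile.Write(ref s_cache[hash & CacheMask], node);

    public static Node? TryGetNode(int hash) =>
        Volatile.Read(ref s_cache[hash & CacheMask]);
}
```

With only a reference in each slot, "last collision wins" overwrites are safe by construction: a reader either sees the old node or the new node, never a mixture.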
Force-pushed 5365f57 to 5f04d74
@dotnet/roslyn-compiler for consideration. Found during Razor code investigations.
Force-pushed c39989c to 5f04d74
| { |
| var child0 = new SyntaxTokenWithTrivia(SyntaxKind.IntKeyword, null, null); |
| SyntaxNodeCache.AddNode(child0, child0.GetCacheHash()); |
| SyntaxNodeCache.AddNode(child0, SyntaxNodeCache.GetCacheHash(child0)); |
Moved everything related to node caching to the SyntaxNodeCache type itself.
@dotnet/roslyn-compiler this is ready for review.
@ToddGrun did you make this change with Razor? If so, can you link to your PR for that?
@dotnet/roslyn-compiler ptal.
src/Compilers/Core/Portable/Syntax/InternalSyntax/SyntaxNodeCache.cs
…che.cs Co-authored-by: Jan Jones <jan.jones.cz@gmail.com>
…che.cs Co-authored-by: Jan Jones <jan.jones.cz@gmail.com>
@jjonescz can you do a /pr-val for me?
PR Validation Run triggered by @jjonescz: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/702563
@jjonescz how did the build go?
@dotnet/roslyn-compiler ptal.
333fred left a comment:
Overall LGTM, but I do have a question on hash distribution. I think the existing behavior is the same, but I'm curious nonetheless.
| private static readonly Entry[] s_cache = new Entry[CacheSize]; |
| /// <summary> |

Suggested change ("Simply" → "Simple"):
| /// Simply array indexed by the hash of the cached node. Note that unlike a typical dictionary/hashtable, this |
| /// Simple array indexed by the hash of the cached node. Note that unlike a typical dictionary/hashtable, this |
| int hash = child.GetCacheHash(); |
| int hash = GetCacheHash(child); |
| int idx = hash & CacheMask; |
It feels like nothing actually ensures there's a uniform distribution here. Is that a correct intuition? Is there anything we could do to ensure that?
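For reference, a sketch of the indexing scheme in question (names assumed, not Roslyn's exact code). Masking keeps only the low bits of the hash, so distribution depends entirely on how well the hash function mixes entropy into those bits; one standard mitigation, if it were ever needed, is a multiplicative (Fibonacci) mix before taking the index:

```csharp
static class Indexing
{
    const int CacheSizeBits = 16;
    const int CacheMask = (1 << CacheSizeBits) - 1;   // 0xFFFF

    // Current scheme: keep only the low 16 bits of the hash.
    public static int Index(int hash) => hash & CacheMask;

    // Hypothetical alternative: multiply by 2^32/phi and take the
    // HIGH bits, folding entropy from all 32 bits into the index.
    public static int MixedIndex(int hash) =>
        (int)(((uint)hash * 2654435769u) >> (32 - CacheSizeBits));
}
```

Nothing here guarantees uniformity; in practice the kind and child hashes feeding the cache hash appear to spread entries well enough (see the collision stats in the PR description).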


Found during an investigation with Razor, for similar code they had copied from Roslyn. A few things were changed here.
I recommend reviewing one commit at a time.
First, the core data stored in the cache array changed from a large struct to a single pointer. The struct could tear, which could lead to overly conservative behavior. For example, if a read from the cache overlapped a write from some other thread, the read could potentially see a mismatched hash code, leading it to not accept a node from the cache that it should have accepted.
Second: code specific to the cache was placed in GreenNode itself, despite never being used anywhere but from the cache. This code was moved to the cache, making everything self-contained and not exposed to unnecessary parts of the codebase.
Third: docs were beefed up to explain the semantics here, and why this type remains safe even with a "last collision wins" implementation.
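The overall lookup shape described above can be sketched as follows (hypothetical, simplified signatures; the real code also checks flags and handles a variable number of children). The key point is that the slot's contents are verified directly against the node being built, so no stored hash is needed and a stale slot can never be wrongly reused:

```csharp
using System;
using System.Threading;

class GreenNodeSketch
{
    public int Kind;
    public GreenNodeSketch? Child0, Child1;
}

static class NodeCache
{
    const int CacheSize = 1 << 16;
    const int CacheMask = CacheSize - 1;
    static readonly GreenNodeSketch?[] s_cache = new GreenNodeSketch?[CacheSize];

    // Hash only from data we re-verify in TryGetNode, so at worst a
    // collision costs us a missed reuse, never an incorrect reuse.
    public static int GetCacheHash(int kind, GreenNodeSketch? c0, GreenNodeSketch? c1)
    {
        int hash = kind;
        if (c0 is not null) hash = unchecked(hash * 31 + c0.GetHashCode());
        if (c1 is not null) hash = unchecked(hash * 31 + c1.GetHashCode());
        return hash;
    }

    public static void AddNode(GreenNodeSketch node, int hash) =>
        Volatile.Write(ref s_cache[hash & CacheMask], node);

    public static GreenNodeSketch? TryGetNode(int kind, GreenNodeSketch? c0, GreenNodeSketch? c1)
    {
        int hash = GetCacheHash(kind, c0, c1);
        var candidate = Volatile.Read(ref s_cache[hash & CacheMask]);

        // "Last collision wins": the slot may hold anything, so verify it
        // fully. The cheap, non-virtual Kind check almost always fails
        // first on a collision, so the extra checks are rare in practice.
        if (candidate is not null &&
            candidate.Kind == kind &&
            ReferenceEquals(candidate.Child0, c0) &&
            ReferenceEquals(candidate.Child1, c1))
        {
            return candidate;
        }
        return null;
    }
}
```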
--
Data backing this up. I tested caching on all the C# files of Roslyn. Here were the stats:
Number of files parsed: 15,955
Number of times we checked the cache for actually cacheable items: 17,151,515
Number of cache successes: 9,536,254 (around 56%). So 56% of the time we attempt to look up something in the cache, we can find and reuse an item.
Cache collisions: 7,539,649. The number of times we looked up in the cache, found an item, but couldn't reuse it because it didn't match what we were looking up.
Note: 7,539,649 + 9,536,254 != 17,151,515. It's about 75k short. That means in only 75k cases (0.4% of lookups) did we look up in the cache and find nothing. Given that the cache only has 65k elements, that means it filled up entirely extremely quickly, and we're basically always either finding a match or hitting a collision.
Cache collisions with same kind: 37,335 (0.2% of cases). This is the number of times we collided with something existing that had the same kind. This shows that we really don't need to compare hashes with what is currently stored there, as practically always the kind check is enough. Conceptually, this makes sense. Nodes with different kinds are already going to distribute themselves broadly across the entire space of the array (there are only around 500 syntax kinds, and only 161 of those kinds actually have a slot count <= 3), while the array has 65k elements. So the chance of colliding on a different kind is already low, as those 161 kinds will distribute well across the keyspace. Similarly, flags will generally be very similar, so they won't move things around. To collide on a node of a different kind, you'd need the kind plus the child hashes to work together to give you that bad luck.
So keeping the hash around doesn't really save anything, while making the code more difficult to reason about.
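The bookkeeping above can be double-checked with a few lines (constant names are mine, figures are from the stats quoted above):

```csharp
static class CacheStats
{
    public const long Lookups    = 17_151_515;
    public const long Hits       =  9_536_254;   // ~56% of lookups
    public const long Collisions =  7_539_649;

    // Lookups that found an empty slot: the only remaining outcome.
    public const long EmptyMisses = Lookups - Hits - Collisions;   // 75,612

    public static double HitRate   => (double)Hits / Lookups;        // ~0.556
    public static double EmptyRate => (double)EmptyMisses / Lookups; // ~0.004
}
```

The ~75k empty-slot figure against a 65k-slot cache is what supports the "fills up almost immediately" claim: after the first pass over the corpus, essentially every lookup lands on an occupied slot.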
--
Note: I tested parsing time before/after this change with Roslyn. Parsing time for all those files averaged 1.3680783s after the change versus 1.3965380s before. This is likely within the margin of error.
--
Note: ideally we can keep this code entirely in sync with Razor as well, making it easier to reason about both systems.