Simplifying internal structure and logic for the SyntaxNodeCache#80825

Open
CyrusNajmabadi wants to merge 16 commits into dotnet:main from CyrusNajmabadi:removeHash
@CyrusNajmabadi CyrusNajmabadi commented Oct 20, 2025

Found during an investigation with Razor into similar code they had copied from Roslyn. A few things were changed here.

I recommend reviewing one commit at a time.

First, the core data stored in the cache array changed from a large struct to a single pointer. The struct could tear, which could lead to overly conservative behavior. For example, if a read from the cache overlapped a write from some other thread, the read could see a mismatched hash code, leading it to reject a node from the cache that it should have accepted.

Second, code specific to the cache was placed in GreenNode itself, despite never being used anywhere but from the cache. This code was moved to the cache, allowing everything to be self-contained and not exposed to unrelated parts of the codebase.

Third, docs were beefed up to explain the semantics here, and why it's ok that this type has safe hashing semantics, even with a "last collision wins" implementation.
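The first point above can be illustrated with a minimal sketch. It assumes a 65,536-slot table (consistent with the "65k elements" mentioned in this description) and uses strings as stand-ins for green nodes. The names `oldSlots`, `newSlots`, `AddOld`, `TryGetOld`, `Add` and `TryGet` are hypothetical and are not the actual Roslyn members. The sketch relies on one known property of .NET: reads and writes of object references are atomic, while a multi-field struct stored in an array element carries no such guarantee.

```csharp
using System;
using System.Threading;

const int CacheSizeBits = 16;               // assumption: 65,536 slots ("65k elements")
const int CacheSize = 1 << CacheSizeBits;
const int CacheMask = CacheSize - 1;

// Old shape (sketch): each slot holds two fields. The pair is wider than a
// pointer, so a concurrent reader may see the hash from one write paired
// with the node from another write (a torn read).
var oldSlots = new (int Hash, string? Node)[CacheSize];

// New shape (sketch): each slot is a single reference. A reader observes
// either the previous node or the new one, never a mixture.
var newSlots = new string?[CacheSize];

void AddOld(string node, int hash) => oldSlots[hash & CacheMask] = (hash, node);

string? TryGetOld(string wanted, int hash)
{
    var entry = oldSlots[hash & CacheMask];  // copies two fields, not atomically
    // A torn entry can fail the hash comparison even though entry.Node is usable.
    return entry.Hash == hash && entry.Node == wanted ? entry.Node : null;
}

// Last write wins: a colliding entry is simply overwritten.
void Add(string node, int hash) => Volatile.Write(ref newSlots[hash & CacheMask], node);

string? TryGet(string wanted, int hash)
{
    string? candidate = Volatile.Read(ref newSlots[hash & CacheMask]);
    // The candidate is validated against what we are looking for,
    // not against a separately stored hash.
    return candidate == wanted ? candidate : null;
}

int h = "int".GetHashCode();
AddOld("int", h);
Add("int", h);
Console.WriteLine(TryGetOld("int", h) ?? "<miss>"); // prints "int"
Console.WriteLine(TryGet("int", h) ?? "<miss>");    // prints "int"
```

In the single-reference shape, the worst outcome of a race is that a reader sees a different node than it hoped for, which the equivalence check then rejects like any other collision.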

--

Data backing this up. I tested caching on all the C# files of roslyn. Here were the stats:

Number of files parsed: 15955

Number of times we checked the cache for actually cacheable items: 17,151,515
Number of cache successes: 9,536,254, around 56%. So 56% of the time we attempt to look something up in the cache, we find and reuse an item.

Cache collisions: 7,539,649. This is the number of times we looked up in the cache, found an item, but couldn't reuse it because it didn't match what we were looking up.

Note: 7,539,649 + 9,536,254 != 17,151,515. It's about 75k short. That means that in only 75k cases (0.4% of lookups) did we look in the cache and find nothing. Given that the cache only has 65k elements, it filled up entirely extremely quickly, and we're basically always either finding a match or hitting a collision.

Cache collisions with same kind: 37,335 (0.2% of cases). This is the number of times we collided with something existing and it had the same kind. This shows that we really don't need to compare hashes with what is currently stored there, as practically always the kind check is enough. Conceptually, this makes sense. Nodes with different kinds are already going to distribute themselves broadly across the entire space of the array (there are only really 500 syntax kinds, and 161 of those kinds actually have a slot count <= 3), while the array has 65k elements. So the chance of collision on a different kind is already low, as those kinds will distribute well across the keyspace. Similarly, flags will generally be very similar, so they won't move things around. So to collide on a different kind, you'd need to somehow have it + the child hashes work together to give you that bad luck.

So keeping the hash around doesn't really save anything, while making the code more difficult to reason about.
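The arithmetic behind the figures above can be checked with a few lines. This is only a sanity check on the numbers quoted in this description; the variable names are made up for the example.

```csharp
using System;

// Figures quoted in the description above.
const long lookups = 17_151_515;
const long hits = 9_536_254;
const long collisions = 7_539_649;
const long sameKindCollisions = 37_335;

// Lookups that were neither a hit nor a collision found an empty slot.
long emptySlotLookups = lookups - hits - collisions;

Console.WriteLine($"empty-slot lookups: {emptySlotLookups}"); // prints 75612
Console.WriteLine($"hit rate: {100.0 * hits / lookups:F1}%");
Console.WriteLine($"collision rate: {100.0 * collisions / lookups:F1}%");
Console.WriteLine($"empty-slot rate: {100.0 * emptySlotLookups / lookups:F2}%");
Console.WriteLine($"same-kind collisions, share of lookups: {100.0 * sameKindCollisions / lookups:F2}%");
Console.WriteLine($"same-kind collisions, share of collisions: {100.0 * sameKindCollisions / collisions:F2}%");
```

The last line is the relevant one for the argument: of all collisions, only a small fraction get past the kind comparison.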

--

Note: I tested parsing time before/after this change with Roslyn. Average parsing time for all those files was 1.3965380s before the change and 1.3680783s after. This is likely within the margin of error.

--

Note: ideally we can keep this code entirely in sync with Razor as well, making it easier to reason about both systems.

@CyrusNajmabadi CyrusNajmabadi changed the title from "Removing seemingly unnecessary data from cache" to "Removing unnecessary checks from SyntaxNodeCache" Oct 21, 2025

#endregion

#region Caching
Contributor Author

@CyrusNajmabadi CyrusNajmabadi Oct 21, 2025

This code was moved, with no changes.

this.hash = hash;
this.node = node;
}
}
Contributor Author

@CyrusNajmabadi CyrusNajmabadi Oct 21, 2025

This type was not helpful. First, it forced one to reason about the semantics of torn reads/writes in the array below. The semantics were correct, but were very subtle and difficult to prove. Second, the extra data stored here did not help at all. The hash that was read could not actually be asserted to be anything (since collisions overwrite). At best, the hash could only be compared, to no effect, against the hash already stored at that location. At worst, during a tear, it could disallow reuse of a node that could have been reused.

The hash itself is computed from data that is already checked in IsNodeEquivalent. While technically the hash comparison might reduce the number of checks performed when there is a collision, in practice this turns out to almost always be one check at most, as the Kind validation almost always fails immediately. See the PR description for more data on this.

Note: the Kind/Flags checks are non-virtual and are just plucking raw data out of the green node. So we're not adding any indirections or anything like that.
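A rough sketch of why the stored hash adds little once the kind is compared first. This is not the actual IsNodeEquivalent implementation: `Node` is a stand-in tuple of (Kind, Flags, Child0), `IsEquivalent` is a hypothetical helper, and comparing the child by reference is an assumption made for the example.

```csharp
using System;
using Node = System.Tuple<int, int, object>; // stand-in: (Kind, Flags, Child0)

int kindChecks = 0, remainingChecks = 0;

// Hypothetical equivalence test: the cheap kind comparison runs first,
// so most collisions are rejected after a single field comparison.
bool IsEquivalent(Node cached, int kind, int flags, object child0)
{
    kindChecks++;
    if (cached.Item1 != kind)
        return false;

    remainingChecks++;
    return cached.Item2 == flags && object.ReferenceEquals(cached.Item3, child0);
}

var token = new object();
var cachedNode = new Node(1, 0, token);

Console.WriteLine(IsEquivalent(cachedNode, 2, 0, token));        // False: different kind
Console.WriteLine(IsEquivalent(cachedNode, 1, 0, new object())); // False: same kind, different child
Console.WriteLine(IsEquivalent(cachedNode, 1, 0, token));        // True
Console.WriteLine($"{kindChecks} kind checks, {remainingChecks} continued past the kind check");
```

With the statistics from the PR description, the early return on the kind comparison would be the path taken for nearly all collisions.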

@CyrusNajmabadi CyrusNajmabadi changed the title from "Removing unnecessary checks from SyntaxNodeCache" to "Simplifying internal structure and logic for the SyntaxNodeCache" Oct 21, 2025
@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review October 21, 2025 10:24
@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner October 21, 2025 10:24
@CyrusNajmabadi
Contributor Author

@dotnet/roslyn-compiler for consideration. Found during razor code investigations.

{
var child0 = new SyntaxTokenWithTrivia(SyntaxKind.IntKeyword, null, null);
- SyntaxNodeCache.AddNode(child0, child0.GetCacheHash());
+ SyntaxNodeCache.AddNode(child0, SyntaxNodeCache.GetCacheHash(child0));
Contributor Author

Moved everything related to node caching to the SyntaxNodeCache type itself.

@CyrusNajmabadi CyrusNajmabadi marked this pull request as draft November 11, 2025 20:22
@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review November 12, 2025 11:37
@CyrusNajmabadi
Contributor Author

@dotnet/roslyn-compiler this is ready for review.

@CyrusNajmabadi
Contributor Author

@ToddGrun did you make this change with Razor? If so, can you link to your PR for that?

@ToddGrun
Contributor

> @ToddGrun did you make this change with Razor? If so, can you link to your PR for that?

dotnet/razor#12370

@CyrusNajmabadi
Contributor Author

@dotnet/roslyn-compiler this is ready for review.

@CyrusNajmabadi
Contributor Author

@dotnet/roslyn-compiler ptal.

@CyrusNajmabadi
Contributor Author

@jjonescz can you do a /pr-val for me?

@dotnet dotnet deleted a comment from github-actions bot Jan 23, 2026
@github-actions
Contributor

github-actions bot commented Jan 23, 2026

View PR Validation Run triggered by @jjonescz

Parameters
  • Validation Type: pr-val
  • Pipeline ID: 8972
  • Pipeline Version: main
  • PR Number: 80825
  • Commit SHA: 30a6a9fa650c42c36644e695cab553248b99a6a9
  • Source Branch: removeHash
  • Target Branch: main
  • Build ID: 13155424

https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/702563

@CyrusNajmabadi
Contributor Author

@jjonescz how did the build go?

@jjonescz
Member

jjonescz commented Jan 28, 2026

There is one regression in Speedometer

[image]

but it looks like noise

[image]

@CyrusNajmabadi
Contributor Author

CyrusNajmabadi commented Feb 14, 2026

@dotnet/roslyn-compiler ptal.

Member

@333fred 333fred left a comment

Overall LGTM, but I do have a question on hash distribution. I think the existing behavior is the same, but I'm curious nonetheless.


- private static readonly Entry[] s_cache = new Entry[CacheSize];
+ /// <summary>
+ /// Simply array indexed by the hash of the cached node. Note that unlike a typical dictionary/hashtable, this
Member

Suggested change
- /// Simply array indexed by the hash of the cached node. Note that unlike a typical dictionary/hashtable, this
+ /// Simple array indexed by the hash of the cached node. Note that unlike a typical dictionary/hashtable, this


- int hash = child.GetCacheHash();
+ int hash = GetCacheHash(child);
int idx = hash & CacheMask;
Member

It feels like nothing actually ensures there's a uniform distribution here. Is that a correct intuition? Is there anything we could do to ensure that?
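For context on this question, a small sketch of what the masking step depends on. It assumes a 16-bit mask (matching the 65k-element cache described in the PR). `IndexOf` mirrors the `hash & CacheMask` line in the diff; `MixedIndexOf` is a hypothetical mitigation and is not part of this PR. Whether the real GetCacheHash already mixes high bits into low bits is not shown in this conversation.

```csharp
using System;

const int CacheSizeBits = 16;               // assumption: 65,536 slots
const int CacheSize = 1 << CacheSizeBits;
const int CacheMask = CacheSize - 1;

// As in the diff: only the low 16 bits of the hash select the slot.
int IndexOf(int hash) => hash & CacheMask;

// Hypothetical alternative: fold the high half of the hash into the low half
// before masking, so that every bit of the hash influences the slot.
int MixedIndexOf(int hash) => (hash ^ (int)((uint)hash >> 16)) & CacheMask;

// Two hashes that differ only in their high bits.
int a = 0x0001_0042;
int b = 0x0002_0042;

Console.WriteLine(IndexOf(a) == IndexOf(b));           // True: same slot without mixing
Console.WriteLine(MixedIndexOf(a) == MixedIndexOf(b)); // False: different slots with mixing
```

Folding does not make the distribution uniform on its own; it only ensures that differences confined to the upper bits are not discarded by the mask.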
