-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Fix SourceTextComparer IEqualityComparer contract violation #81000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Changed GetHashCode to use GetContentHash() instead of GetChecksum(), aligning it with the content-based Equals method. This ensures the IEqualityComparer contract is satisfied: two SourceText instances with the same content but different encodings/BOMs will now have the same hash code. Added comprehensive tests to validate the fix. Co-authored-by: CyrusNajmabadi <[email protected]>
Replaced LINQ Concat with manual byte array creation to avoid potential ambiguity with custom extension methods. Co-authored-by: CyrusNajmabadi <[email protected]>
|
|
||
| namespace Microsoft.CodeAnalysis.UnitTests.Text | ||
| { | ||
| public class SourceTextComparerTests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot just move these into SourceTextTests.cs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 38aa364. Moved all tests into SourceTextTests.cs and deleted the separate file.
| Hash.Combine(contentsHash, | ||
| Hash.Combine(encodingHash, ((int)obj.ChecksumAlgorithm).GetHashCode()))); | ||
| var contentHash = obj.GetContentHash(); | ||
| return !contentHash.IsDefault ? Hash.CombineValues(contentHash) : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot You won't need to check for .IsDefault. Also, just use MemoryMarshal.TryRead(contentHash.AsSpan(), ...) to just read out the first 4 bytes of the already strongly hashed bytes into the int you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 97acc72. Removed the IsDefault check and switched to MemoryMarshal.TryRead to read the first 4 bytes directly as an int.
Moved all SourceTextComparer tests from the separate SourceTextComparerTests.cs file into SourceTextTests.cs and deleted the separate test file. Test method names prefixed with "Comparer_" to distinguish them as comparer-specific tests. Co-authored-by: CyrusNajmabadi <[email protected]>
Simplified GetHashCode to use MemoryMarshal.TryRead to directly read the first 4 bytes of the content hash as an int, removing the unnecessary IsDefault check and Hash.CombineValues call. Co-authored-by: CyrusNajmabadi <[email protected]>
| Assert.Equal("span", ex.ParamName); | ||
| } | ||
|
|
||
| [Fact] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot update this to [Fact, WorkItem("")] with the write url in the workitem for what is being fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in fc7e4c0. Added WorkItem attributes with the issue URL to all 7 comparer tests.
| Hash.Combine(encodingHash, ((int)obj.ChecksumAlgorithm).GetHashCode()))); | ||
| var contentHash = obj.GetContentHash(); | ||
| MemoryMarshal.TryRead(contentHash.AsSpan(), out int hashCode); | ||
| return hashCode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot actually, can just be MemoryMarshal.Read<int>(... Add comment stating that the content hash is 16 bytes, so reading off the first 4 bytes is always safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in c8af757. Changed to MemoryMarshal.Read<int> and added a comment explaining that the 16-byte hash makes reading the first 4 bytes safe.
Added WorkItem attributes referencing issue #71006 to all 7 SourceTextComparer tests to document which issue they validate. Co-authored-by: CyrusNajmabadi <[email protected]>
Changed from MemoryMarshal.TryRead to MemoryMarshal.Read<int> and added a comment explaining that GetContentHash returns a 16-byte hash, making it safe to read the first 4 bytes as an int. Co-authored-by: CyrusNajmabadi <[email protected]>
…text-comparer-issue
RikkiGibson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, comment is not blocking.
| // Create source texts from byte arrays where one has a BOM and the other doesn't | ||
| const string content = "Test content"; | ||
|
|
||
| var utf8WithBOM = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this flag only controls the return value of GetPreamble(). You could consider deleting the UTF8Encoding locals and just using Encoding.UTF8 in this test.
Fix SourceTextComparer to use content-based equality consistently
Problem
SourceTextComparer violates the IEqualityComparer contract:
Equalsuses content-based comparison (viaContentEquals)GetHashCodeuses checksum which includes encoding and original bytesSolution
Change
GetHashCodeto useSourceText.GetContentHash()instead of the checksum-based approach. This makes both methods strictly content-based.Changes
GetHashCode()to useGetContentHash()withMemoryMarshal.Read<int>to efficiently read the first 4 bytes of the 16-byte hash as the hash codeTesting Results
This also fixes
SyntaxTreeComparerand improvesSourceTextValueProvidercaching behavior in analyzers and incremental generators.Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.