Use the compiler encoding for baseline source files recovered from file system #81933

Merged: tmat merged 4 commits into dotnet:main from tmat:HREncoding-1 on Jan 9, 2026

tmat commented Jan 8, 2026

Hot Reload compares the current document snapshot with a baseline document snapshot to determine the semantic differences between them.

The baseline solution is captured when the debugging session starts, but the document content at that point doesn't necessarily match the source code that the compiler used to compile the baseline assembly. We compare the checksum of the binary content of the document against the checksum that the compiler stored in the PDB. If the baseline document checksum doesn't match, we read the source file from disk, in case it hasn't been overwritten yet and still contains the content used by the compiler for the baseline compilation. If the checksum matches the PDB, we know that the decoded text of the document can be used as a baseline for change detection.
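
The checksum check and the fallback to disk can be sketched roughly as follows. This is an illustrative Python sketch, not the Roslyn implementation: the function names, the `sha256` default, and the idea of passing the PDB checksum in as raw bytes are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Optional


def checksum(content: bytes, algorithm: str = "sha256") -> bytes:
    """Hash the raw bytes, before any decoding takes place."""
    return hashlib.new(algorithm, content).digest()


def recover_baseline_bytes(
    snapshot_bytes: bytes,
    source_path: Path,
    pdb_checksum: bytes,
    algorithm: str = "sha256",
) -> Optional[bytes]:
    """Return bytes matching the PDB checksum, or None if none are available.

    Hypothetical helper: names and parameters are invented for illustration.
    """
    # 1. The snapshot captured at session start may already match.
    if checksum(snapshot_bytes, algorithm) == pdb_checksum:
        return snapshot_bytes

    # 2. Otherwise the file on disk may still hold what the compiler saw.
    try:
        disk_bytes = source_path.read_bytes()
    except OSError:
        return None

    if checksum(disk_bytes, algorithm) == pdb_checksum:
        return disk_bytes

    # 3. Neither source matches: no trustworthy baseline for this document.
    return None
```

Note that the comparison is done on bytes only. Turning the matching bytes into text is a separate step, which is where the encoding comes in.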

When reading the file content we need to use the exact encoding that the compiler used, otherwise we might interpret the binary content differently than the compiler did. Previously we used the IDE encoding, but it turns out the IDE doesn't necessarily know what encoding was used by the compiler. For example, LSP doesn't have a concept of encoding, so the LSP server always uses UTF-8 when creating SourceText.
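
To illustrate why a matching checksum is not enough on its own, here is a small Python example (unrelated to the Roslyn code; the sample string and the choice of `cp1252` are arbitrary) in which identical bytes yield different text depending on the encoding used to decode them:

```python
# "é" encoded as UTF-8 is two bytes: 0xC3 0xA9.
raw = 'var s = "café";'.encode("utf-8")

# Same bytes, hence same checksum, but two different decodings.
as_utf8 = raw.decode("utf-8")
as_cp1252 = raw.decode("cp1252")

print(as_utf8)    # var s = "café";
print(as_cp1252)  # var s = "cafÃ©";
```

If the baseline text were decoded with the wrong encoding, the string literal above would appear to differ from the one the compiler saw, even though the file itself was never changed.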

We could try to make sure the encoding is always correctly set in the IDE. The LSP server could detect the encoding. However, Hot Reload already has all the information that the compiler had when compiling the assembly. The compiler auto-detects the encoding from the file content unless it's specified via the CodePage project property. If the property is specified, the value is stored in the compiler options record in the PDB, which we can read.
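
As a rough picture of what "auto-detect unless a code page is specified" can look like, here is a hedged Python sketch. The detection order (byte order mark, then strict UTF-8, then a fallback code page) and the `cp1252` fallback are assumptions chosen for illustration; the exact rules the compiler applies are not spelled out in this PR. The `code_page` parameter is a hypothetical stand-in for a value read from the PDB compiler options.

```python
import codecs
from typing import Optional, Tuple

# Longer BOMs come first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_source(
    raw: bytes,
    code_page: Optional[str] = None,
    fallback: str = "cp1252",
) -> Tuple[str, str]:
    """Decode source bytes and report which encoding was used."""
    # An explicitly configured code page wins over any detection.
    if code_page is not None:
        return raw.decode(code_page), code_page

    # A byte order mark identifies the encoding unambiguously.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name), name

    # No BOM: accept the content as UTF-8 only if it is valid UTF-8.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback; undefined bytes become U+FFFD instead of raising.
        return raw.decode(fallback, errors="replace"), fallback
```

For example, `decode_source(b"caf\xe9")` falls through to the fallback because `0xE9` alone is not valid UTF-8, while the same text saved with a UTF-8 BOM is decoded as UTF-8.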

This PR changes the code to always auto-detect the encoding from file content. Support for the CodePage property is tracked by a follow-up issue: #81930

Partially fixes https://devdiv.visualstudio.com/DevDiv/_workitems/edit/2067885/: Hot Reload now works if the file has not been saved when the change is applied. If the file has been saved, it still doesn't work: #82434

tmat commented Jan 9, 2026

@DustinCampbell ptal

tmat changed the title from "Use the compiler encoding for basline source files recovered from file system" to "Use the compiler encoding for baseline source files recovered from file system" Jan 9, 2026
tmat merged commit 90057a7 into dotnet:main Jan 9, 2026
26 checks passed
tmat deleted the HREncoding-1 branch January 9, 2026 19:22
dotnet-policy-service bot added this to the Next milestone Jan 9, 2026
davidwengier modified the milestones: Next, 18.4 Jan 27, 2026