-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C#: Add binlog support to buildless with source generator support #16747
C#: Add binlog support to buildless with source generator support #16747
Conversation
catch (Exception ex) | ||
{ | ||
// If this happened, it was probably because | ||
// - the same file was compiled multiple times, or | ||
// - the file doesn't exist (due to wrong #line directive or because it's an in-memory source generated AST). | ||
// In any case, this is not a fatal error. | ||
logger.LogWarning("Problem archiving " + dest + ": " + ex); | ||
} |
Check notice
Code scanning / CodeQL
Generic catch clause Note
1baaa46
to
4bdef60
Compare
csharp/extractor/Semmle.Extraction.CSharp/Extractor/Extractor.cs
Outdated
Show resolved
Hide resolved
To make sure I understand: The idea with this change is that the user should be able to provide a path to a directory containing a "binary log"; A binary log is created during a previous compilation of the source code and contains information about the compiler calls and the generated syntax trees? |
Yes, more or less. The user would pass a
The binary log contains information about the compiler calls. The generated ASTs are produced by |
But there is no new compilation attempt right? |
No, there's no new compilation, other than the usual AST creation and symbol resolution based on the compiler call arguments. There's some detail in #16346 regarding how things work. I edited this answer a couple of times. I think what you're interested in is that whether there's a new call to csc. There's no new call. We're using |
csharp/extractor/Semmle.Extraction.CSharp/Extractor/Extractor.cs
Outdated
Show resolved
Hide resolved
csharp/extractor/Semmle.Extraction/Extractor/BinaryLogExtractionContext.cs
Show resolved
Hide resolved
// Compute a unique folder name for the generated files: | ||
generatedFolderName = "generated"; | ||
|
||
if (Directory.Exists(generatedFolderName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code will check for generated
in the current working directory. Did you mean to do something like Path.Combine(cwd, generatedFolderName)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is on purpose. The current working directory will point to the --source-root
specified in the codeql database creation command. In practice this will be the repo root that you're analysing. All the processed source files are stored in the source archive in the codeql database. And this logic is making sure that we're assigning the generated ASTs paths that don't collide with any already existing source files.
The integration test results in csharp/ql/integration-tests/all-platforms/binlog/Files.expected
show some sample paths, for example:
generated/b/test.csproj (net8.0)/System.Text.RegularExpressions.Generator/System.Text.RegularExpressions.Generator.RegexGenerator/RegexGenerator.g.cs
We could also use Path.Combine(cwd, generatedFolderName)
, then the paths would point to locations closer to their original .csproj
files, such as
b/generated/test.csproj (net8.0)/System.Text.RegularExpressions.Generator/System.Text.RegularExpressions.Generator.RegexGenerator/RegexGenerator.g.cs
I think the root level generated
folder should work, and then all these generated ASTs are assigned to a single folder.
69d89cd
to
24089f6
Compare
I rebased this PR after upgrading the Microsoft.CodeAnalysis dependency from 4.8.0 to 4.9.2 in #16832. This allowed me to upgrade |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Very nice work @tamasvajk!
24089f6
to
16b63cd
Compare
I rebased to fix a merge conflict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
16b63cd
to
1e2d1ef
Compare
Sorry for the constant noise on this PR. I rebased it again to fix some conflicts with #16857. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR is adding MsBuild binary log extraction support to buildless. There's another variant of this in #16672. The main difference is that this version also runs source generators and extracts the additional ASTs.
I've compared traced databases to binlog extracted ones, and found that the alert retention rate (in the nightly query suite) is around 97.5% on
dotnet/efcore
, 99% ondotnet/msbuild
, and 99% ondotnet/roslyn
. The latter is an approximation, because Roslyn uses source generators, and the generated files are at different locations in the traced and binlog extracted DBs, so the alerts are not matching in these files.Database creation times are faster with binlog extraction. Speedups (on a single measurement):
roslyn
: 43.2%,efcore
: 35%,msbuild
: 64.6%.