C#: Add binlog support to buildless with source generator support #16747

Merged 9 commits on Jun 28, 2024

@tamasvajk (Contributor) commented Jun 13, 2024

This PR adds MSBuild binary log extraction support to buildless. There's another variant of this in #16672. The main difference is that this version also runs source generators and extracts the additional ASTs.

I've compared traced databases to binlog-extracted ones and found that the alert retention rate (in the nightly query suite) is around 97.5% on dotnet/efcore, 99% on dotnet/msbuild, and 99% on dotnet/roslyn. The Roslyn figure is an approximation: Roslyn uses source generators, and the generated files are at different locations in the traced and binlog-extracted DBs, so alerts in those files do not match.

Database creation times are faster with binlog extraction. Speedups (on a single measurement): roslyn: 43.2%, efcore: 35%, msbuild: 64.6%.

@github-actions github-actions bot added the C# label Jun 13, 2024
Comment on lines +136 to +143
catch (Exception ex)
{
    // If this happened, it was probably because
    // - the same file was compiled multiple times, or
    // - the file doesn't exist (due to wrong #line directive or because it's an in-memory source generated AST).
    // In any case, this is not a fatal error.
    logger.LogWarning("Problem archiving " + dest + ": " + ex);
}

Code scanning / CodeQL notice: Generic catch clause (csharp/extractor/Semmle.Extraction/Entities/File.cs, dismissed)
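
For context, a common way to address a "generic catch clause" notice is to catch only the exception types the operation is expected to throw. The sketch below is not the extractor's code (the alert on this PR was dismissed and the generic catch kept); `Archiver`, `TryArchive`, and the `logWarning` callback are hypothetical names used for illustration. It assumes the two failure causes named in the code comment (the destination already exists, or the source file is missing) surface as `IOException` from `File.Copy`.

```csharp
using System;
using System.IO;

public static class Archiver
{
    // Hypothetical helper: copies `source` to `dest` and reports failure
    // as a warning instead of letting the exception propagate.
    public static bool TryArchive(string source, string dest, Action<string> logWarning)
    {
        try
        {
            var dir = Path.GetDirectoryName(dest);
            if (!string.IsNullOrEmpty(dir))
            {
                Directory.CreateDirectory(dir);
            }

            // overwrite: false makes a second copy of the same file throw IOException.
            File.Copy(source, dest, overwrite: false);
            return true;
        }
        // FileNotFoundException and DirectoryNotFoundException derive from IOException,
        // so a missing source file is covered by the same filter.
        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
        {
            logWarning("Problem archiving " + dest + ": " + ex.Message);
            return false;
        }
    }
}
```

The trade-off is that any exception type outside the filter would now propagate, which may not be desirable where archiving failures are meant to be non-fatal in all cases.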
@tamasvajk tamasvajk force-pushed the buildless/binary-log-extractor-2 branch 3 times, most recently from 1baaa46 to 4bdef60 Compare June 19, 2024 08:27
@tamasvajk tamasvajk marked this pull request as ready for review June 19, 2024 09:21
@tamasvajk tamasvajk requested review from a team as code owners June 19, 2024 09:21
@michaelnebel (Contributor)

To make sure I understand: the idea with this change is that the user should be able to provide a path to a directory containing a "binary log", which is created during a previous compilation of the source code and contains information about the compiler calls and the generated syntax trees?

@tamasvajk (Contributor, Author)

Yes, more or less. The user would pass a .binlog file to the database creation, for example:

dotnet build /t:rebuild /bl:xyz.binlog
codeql database create DB_xyz --language=csharp --build-mode=none -Obinlog=xyz.binlog

The binary log contains information about the compiler calls. The generated ASTs are produced by basic.compilerlog.util based on the arguments of the compiler calls.

@michaelnebel (Contributor)

But there is no new compilation attempt, right?

@tamasvajk (Contributor, Author) commented Jun 20, 2024

No, there's no new compilation, other than the usual AST creation and symbol resolution based on the compiler call arguments. There's some detail in #16346 regarding how things work.

I edited this answer a couple of times. I think what you're asking is whether there's a new call to csc. There isn't: we're using basic.compilerlog.util to get to the compilation object and the ASTs. I had another variant of this feature in #16672, which is more straightforward but ignores source generators.

// Compute a unique folder name for the generated files:
generatedFolderName = "generated";

if (Directory.Exists(generatedFolderName))

This code will check for "generated" in the current working directory. Did you mean to do something like Path.Combine(cwd, generatedFolderName)?

Contributor Author

Yes, this is on purpose. The current working directory points to the --source-root specified in the codeql database create command; in practice, this is the root of the repo being analysed. All processed source files are stored in the source archive in the CodeQL database, and this logic makes sure that the generated ASTs are assigned paths that don't collide with any existing source files.

The integration test results in csharp/ql/integration-tests/all-platforms/binlog/Files.expected show some sample paths, for example:

generated/b/test.csproj (net8.0)/System.Text.RegularExpressions.Generator/System.Text.RegularExpressions.Generator.RegexGenerator/RegexGenerator.g.cs

We could also use Path.Combine(cwd, generatedFolderName); the paths would then point to locations closer to their original .csproj files, such as

b/generated/test.csproj (net8.0)/System.Text.RegularExpressions.Generator/System.Text.RegularExpressions.Generator.RegexGenerator/RegexGenerator.g.cs

I think the root level generated folder should work, and then all these generated ASTs are assigned to a single folder.
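
The snippet under review shows only the start of the existence check, and the thread does not say how the extractor picks a different name when "generated" is already taken. As a rough illustration of the idea, the sketch below appends a numeric suffix until the name is free. `GeneratedFolder`, `PickUnusedName`, and the `_N` suffix scheme are hypothetical and not taken from the PR.

```csharp
using System;
using System.IO;

public static class GeneratedFolder
{
    // Hypothetical helper: returns a folder name that does not yet exist
    // (as a directory or a file) directly under `root`.
    public static string PickUnusedName(string root, string baseName = "generated")
    {
        var candidate = baseName;
        var suffix = 0;

        while (Directory.Exists(Path.Combine(root, candidate))
            || File.Exists(Path.Combine(root, candidate)))
        {
            // Assumed naming scheme: generated_1, generated_2, ...
            suffix++;
            candidate = baseName + "_" + suffix;
        }

        return candidate;
    }
}
```

Passing the source root explicitly, rather than relying on the current working directory, makes the helper easier to test. In the extractor the two coincide, as explained above.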

@tamasvajk tamasvajk force-pushed the buildless/binary-log-extractor-2 branch from 69d89cd to 24089f6 Compare June 26, 2024 09:54
@tamasvajk (Contributor, Author)

I rebased this PR after upgrading the Microsoft.CodeAnalysis dependency from 4.8.0 to 4.9.2 in #16832. This allowed me to upgrade basic.compilerlog.util to the latest version. I pushed an extra commit to start using compilationData.GetGeneratedSyntaxTrees().

michaelnebel previously approved these changes Jun 26, 2024

Looks good to me.
Very nice work @tamasvajk!

@tamasvajk (Contributor, Author)

I rebased to fix a merge conflict.

michaelnebel previously approved these changes Jun 27, 2024

LGTM!

@tamasvajk (Contributor, Author)

Sorry for the constant noise on this PR. I rebased it again to fix some conflicts with #16857.

Comment on lines +120 to +124
catch (Exception ex) // lgtm[cs/catch-of-all-exceptions]
{
    logger.LogError($" Unhandled exception: {ex}");
    return ExitCode.Errors;
}

Code scanning / CodeQL notice: Generic catch clause.

michaelnebel previously approved these changes Jun 27, 2024

LGTM!

@michaelnebel (Contributor) left a comment

LGTM!

@tamasvajk tamasvajk merged commit 1cf5e89 into github:main Jun 28, 2024
17 checks passed
3 participants