Skip to content

Conversation

@jkoritzinsky
Copy link
Member

Start an implementation of ilasm in C# in src/tools/ilasm. ANTLR4 is used to build the grammar.

I decided to structure it as a library (with a frontend to come later) to match ILVerify/ILVerification.

This implementation is far from complete. The grammar is missing any updates from the last year or two and much is untested and unimplemented.

It has one passing test for a simple case so far.

Putting out a PR now so other team members and community members can work on it.

Contributes to #121847

…l representation we can construct a metadata image from.
…concepts (and some easy cases to knock out now)
…reprocessor.

Add some basic tests for the preprocessor
…ods that we can't implement E2E due to SRM limitations (found some!)
…efs when possible, correctly handle MethodRefSig scenarios.
Copilot AI review requested due to automatic review settings December 2, 2025 19:18
Copilot finished reviewing on behalf of jkoritzinsky December 2, 2025 19:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces the initial scaffolding and partial implementation of a C# version of ilasm (IL Assembler) in src/tools/ilasm. The implementation uses ANTLR4 for grammar parsing and is structured as a library with ILAssembler as the main project and ILAssembler.Tests for unit tests. The PR includes basic compilation infrastructure, preprocessor support, metadata building, and one passing test case.

Key changes include:

  • Core assembler library with grammar visitor pattern for IL parsing
  • Entity registry system for managing metadata entities
  • Preprocessor token source for handling #define, #ifdef, and #include directives
  • Build pipeline configuration to enable testing of the new managed ilasm

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
ILAssembler/GrammarVisitor.cs Main visitor implementation for traversing parsed IL grammar (~4350 lines)
ILAssembler/EntityRegistry.cs Registry and entity management for metadata building (~1300 lines)
ILAssembler/PreprocessedTokenSource.cs Preprocessor implementation for handling directives
ILAssembler/DocumentCompiler.cs Top-level document compilation orchestration
ILAssembler/NameHelpers.cs Utility for splitting dotted names into namespace and name
ILAssembler.Tests/* Unit tests for preprocessor and document compiler
eng/pipelines/coreclr/ilasm.yml Pipeline configuration for building and testing managed ilasm
eng/Subsets.props Build subset configuration for the tools.ilasm subset

@AaronRobinsonMSFT AaronRobinsonMSFT marked this pull request as draft December 2, 2025 19:34
@AaronRobinsonMSFT AaronRobinsonMSFT added this to the Future milestone Dec 2, 2025
@jkotas
Copy link
Member

jkotas commented Dec 3, 2025

Putting out a PR now so other team members and community members can work on it.

It would be useful to add a punch list of things to work on in the issue.

<EnableDefaultCompileItems>true</EnableDefaultCompileItems>
<Nullable>enable</Nullable>
<NoWarn>$(NoWarn);CS3021</NoWarn>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really need unsafe code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We have two places that use unsafe code:

  1. We use the BlobBuilder constructor, which takes an unmanaged pointer.
  2. Native ilasm has the ability to specify an ANSI string that should be interpreted as a Unicode string. The APIs on Marshal to get an ANSI string without using unmanaged memory are all internal, so I'm using the APIs that allocate unmanaged memory. To be explicit, I'm not using IntPtr, and the upcoming unsafe definition change would require the code to be marked unsafe anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the BlobBuilder constructor, which takes an unmanaged pointer.

Hmmm, that's unfortunately. I am wondering whether it would be possible to introduce a safe S.R.M API to do this.

Native ilasm has the ability to specify an ANSI string that should be interpreted as a Unicode string.

Is it actually using ANSI code page? If I am reading the code correctly, we force the codepage to UTF8 everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. I'll shift it to encode as UTF-8 always.

I'll add the BlobBuilder API to the list of APIs to propose for SRM.

public GrammarResult VisitTls(CILParser.TlsContext context)
{
// TODO-SRM: System.Reflection.Metadata doesn't provide APIs to point a data declaration at a TLS slot or into the IL stream.
// We have tests for the TLS case (CoreCLR only supports it on Win-x86), but not for the IL case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be ok to drop the support for managed C++ TLS in ilasm.

into the IL stream

What does this mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

native ilasm has the ability to point a data declaration at 3 places:

  1. read-only data.
  2. TLS data.
  3. The data stream with method bodies.

We have no tests for the 3rd case, but it does exist in the grammar. We can remove it (especially if we're removing the support for managed C++ TLS).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my education, what does the IL source for data stream with method bodies look like?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In IL, it looks like the following:

.data cil Foo = int32(32)

I think below is the intended use case:

.method public static int32 main()
{
.data cil byearray ( 00 01 02 /* etc */ )
}

I think the intention was to allow you to specify arbitrary method bodies, but I don't think the emitting actually supports that correctly.

@jkotas
Copy link
Member

jkotas commented Dec 3, 2025

cc @dotnet/jit-contrib

"noncasdemand" => new(DeclarativeSecurityActionEx.NonCasDemand),
"noncaslinkdemand" => new(DeclarativeSecurityActionEx.NonCasLinkDemand),
"noncasinheritance" => new(DeclarativeSecurityActionEx.NonCasInheritanceDemand),
_ => throw new UnreachableException()
Copy link
Contributor

@hez2010 hez2010 Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems runtime async is not included in either GraamaVisitor.cs or CIL.g4. Did you use an old grammar definition before runtime async?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This grammar was written in December of 2023, so yeah it was pre-runtime-async. As I mentioned, it needs some updates.

@huoyaoyuan
Copy link
Member

Putting out a PR now so other team members and community members can work on it.

How are we expected to collaborate on this? Apparently it needs to fulfill more functionalities. Should we merge it as scaffold and open more PRs?

@hez2010
Copy link
Contributor

hez2010 commented Dec 3, 2025

Well, I'm not against moving ilasm/ildasm to managed, just put my two cents here.
When I was working on the const generics prototype which requires changes to the metadata, I found that as long as I could make ilasm/ildasm work, the coreclr should automatically be able to resolve the new metadata because they share the same code path. By making ilasm/ildasm work, you will end up making coreclr metadata reader barely work at the same time, which gives you the initial direction about what needs to be done to implement a new metadata-level feature. And you can check the output of ildasm to verify whether you have implemented the feature correctly, without the need to inspect all those data structures and (compressed) tokens under a debugger.
This is a great advantage of keeping ilasm/ildasm in C++, so I would like to confirm that we don't care about losing such an ability.

@huoyaoyuan
Copy link
Member

This is a great advantage of keeping ilasm/ildasm in C++

I think we can continue the discussion in #121847. Since the metadata reader is expected to be replaced by DNMD, the precondition can change here.

@jkotas
Copy link
Member

jkotas commented Dec 3, 2025

How are we expected to collaborate on this? Apparently it needs to fulfill more functionalities. Should we merge it as scaffold and open more PRs?

Yes, it is the idea.

I would like to confirm that we don't care about losing such an ability.

I do not mind with losing this. We actually reached a point where it tends to be easier to prototype new functionality like that in NAOT first. NAOT uses managed System.Reflection.Metadata.

Since the metadata reader is expected to be replaced by DNMD

This is not a committed plan. (If we were to do this, it needs to come with measurable user-observable improvements and it needs to replace the existing metadata reader completely.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants