-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Initial scaffolding and partial implementation of ilasm in C# #122112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…l representation we can construct a metadata image from.
…concepts (and some easy cases to knock out now)
…reprocessor. Add some basic tests for the preprocessor
…iversal/easier to use
…ods that we can't implement E2E due to SRM limitations (found some!)
… the MetadataBuilder (including methods).
…efs when possible, correctly handle MethodRefSig scenarios.
…iling) unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces the initial scaffolding and partial implementation of a C# version of ilasm (IL Assembler) in src/tools/ilasm. The implementation uses ANTLR4 for grammar parsing and is structured as a library with ILAssembler as the main project and ILAssembler.Tests for unit tests. The PR includes basic compilation infrastructure, preprocessor support, metadata building, and one passing test case.
Key changes include:
- Core assembler library with grammar visitor pattern for IL parsing
- Entity registry system for managing metadata entities
- Preprocessor token source for handling
#define,#ifdef, and#includedirectives - Build pipeline configuration to enable testing of the new managed ilasm
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
ILAssembler/GrammarVisitor.cs |
Main visitor implementation for traversing parsed IL grammar (~4350 lines) |
ILAssembler/EntityRegistry.cs |
Registry and entity management for metadata building (~1300 lines) |
ILAssembler/PreprocessedTokenSource.cs |
Preprocessor implementation for handling directives |
ILAssembler/DocumentCompiler.cs |
Top-level document compilation orchestration |
ILAssembler/NameHelpers.cs |
Utility for splitting dotted names into namespace and name |
ILAssembler.Tests/* |
Unit tests for preprocessor and document compiler |
eng/pipelines/coreclr/ilasm.yml |
Pipeline configuration for building and testing managed ilasm |
eng/Subsets.props |
Build subset configuration for the tools.ilasm subset |
Co-authored-by: Copilot <[email protected]>
It would be useful to add a punch list of things to work on in the issue. |
| <EnableDefaultCompileItems>true</EnableDefaultCompileItems> | ||
| <Nullable>enable</Nullable> | ||
| <NoWarn>$(NoWarn);CS3021</NoWarn> | ||
| <AllowUnsafeBlocks>true</AllowUnsafeBlocks> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really need unsafe code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We have two places that use unsafe code:
- We use the BlobBuilder constructor, which takes an unmanaged pointer.
- Native ilasm has the ability to specify an ANSI string that should be interpreted as a Unicode string. The APIs on Marshal to get an ANSI string without using unmanaged memory are all internal, so I'm using the APIs that allocate unmanaged memory. To be explicit, I'm not using
IntPtr, and the upcoming unsafe definition change would require the code to be marked unsafe anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the BlobBuilder constructor, which takes an unmanaged pointer.
Hmmm, that's unfortunately. I am wondering whether it would be possible to introduce a safe S.R.M API to do this.
Native ilasm has the ability to specify an ANSI string that should be interpreted as a Unicode string.
Is it actually using ANSI code page? If I am reading the code correctly, we force the codepage to UTF8 everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right. I'll shift it to encode as UTF-8 always.
I'll add the BlobBuilder API to the list of APIs to propose for SRM.
| public GrammarResult VisitTls(CILParser.TlsContext context) | ||
| { | ||
| // TODO-SRM: System.Reflection.Metadata doesn't provide APIs to point a data declaration at a TLS slot or into the IL stream. | ||
| // We have tests for the TLS case (CoreCLR only supports it on Win-x86), but not for the IL case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be ok to drop the support for managed C++ TLS in ilasm.
into the IL stream
What does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
native ilasm has the ability to point a data declaration at 3 places:
- read-only data.
- TLS data.
- The data stream with method bodies.
We have no tests for the 3rd case, but it does exist in the grammar. We can remove it (especially if we're removing the support for managed C++ TLS).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my education, what does the IL source for data stream with method bodies look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In IL, it looks like the following:
.data cil Foo = int32(32)
I think below is the intended use case:
.method public static int32 main()
{
.data cil byearray ( 00 01 02 /* etc */ )
}
I think the intention was to allow you to specify arbitrary method bodies, but I don't think the emitting actually supports that correctly.
|
cc @dotnet/jit-contrib |
| "noncasdemand" => new(DeclarativeSecurityActionEx.NonCasDemand), | ||
| "noncaslinkdemand" => new(DeclarativeSecurityActionEx.NonCasLinkDemand), | ||
| "noncasinheritance" => new(DeclarativeSecurityActionEx.NonCasInheritanceDemand), | ||
| _ => throw new UnreachableException() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems runtime async is not included in either GraamaVisitor.cs or CIL.g4. Did you use an old grammar definition before runtime async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This grammar was written in December of 2023, so yeah it was pre-runtime-async. As I mentioned, it needs some updates.
How are we expected to collaborate on this? Apparently it needs to fulfill more functionalities. Should we merge it as scaffold and open more PRs? |
|
Well, I'm not against moving ilasm/ildasm to managed, just put my two cents here. |
I think we can continue the discussion in #121847. Since the metadata reader is expected to be replaced by DNMD, the precondition can change here. |
Yes, it is the idea.
I do not mind with losing this. We actually reached a point where it tends to be easier to prototype new functionality like that in NAOT first. NAOT uses managed System.Reflection.Metadata.
This is not a committed plan. (If we were to do this, it needs to come with measurable user-observable improvements and it needs to replace the existing metadata reader completely.) |
Start an implementation of ilasm in C# in src/tools/ilasm. ANTLR4 is used to build the grammar.
I decided to structure it as a library (with a frontend to come later) to match ILVerify/ILVerification.
This implementation is far from complete. The grammar is missing any updates from the last year or two and much is untested and unimplemented.
It has one passing test for a simple case so far.
Putting out a PR now so other team members and community members can work on it.
Contributes to #121847