Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono][wasm] Bundle assemblies as WebCIL #79416

Merged
merged 57 commits into from
Jan 21, 2023

Conversation

lambdageek
Copy link
Member

@lambdageek lambdageek commented Dec 8, 2022

Define a new container format for .NET assemblies that looks less like a Windows PE file. Use it for bundling assemblies in wasm projects.

TODO:

  • Turn off Mono support for webcil on non-WebAssembly platforms
  • Turn webcil bundling off by default. Add an MSBuild property that users can set to preview webcil bundling. **done. Set <WasmEnableWebcil>true</WasmEnableWebcil> to try it
  • Run Wasm.Build.Test tests 2x: with and without webcil

Nice to have:

  • Maybe do something with InternalsVisibleTo so that we can use the internal System.Reflection.Metadata classes to implement WebcilReader and WebcilConverter. At least on netcore. (on .NET Framework we must depend on the 6.0.0 SRM because that's what VS MSBuild ships)

Follow-up PR:

  • Add a variant to the debugger tests that runs with webcil.

@ghost
Copy link

ghost commented Dec 8, 2022

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

Define a new container format for .NET assemblies that looks less like a Windows PE file. Use it for bundling assemblies in wasm projects.

TODO:

  • convert satellite assemblies, too
  • disable on non-wasm platforms
  • add a property to the SDK to disable webcil packing (the task has a property but it's not used by the targets yet)
Author: lambdageek
Assignees: lambdageek
Labels:

arch-wasm, area-AssemblyLoader-mono, area-Build-mono

Milestone: -

@lambdageek
Copy link
Member Author

Current status: it's working for basic samples, but failing with the libraries tests, for example, because various bits of code above the runtime assume that assemblies have .dll extensions.

I was hoping to avoid "logical name"/"physical name" shenanigans, but I might need to adjust the wasm driver.c to consume .webcil files from the app bundle but expose them to the runtime as if they were ".dll". I'm going to press on without that for a bit longer and try to adjust the loader's search algorithm to just substitute .webcil when people ask for .dll, and if it doesn't pan out, I'll change the bundling logic instead.

@jkoritzinsky
Copy link
Member

Would this be an interesting format to support on all platforms? Or just WebAssembly platforms? Should we define tooling to be able to emit to this format directly (a la APIs in System.Reflection.Metadata)?

Also cc: @AaronRobinsonMSFT who is doing some work with metadata parsing in unmanaged code.

@lambdageek
Copy link
Member Author

lambdageek commented Dec 9, 2022

Would this be an interesting format to support on all platforms? Or just WebAssembly platforms? Should we define tooling to be able to emit to this format directly (a la APIs in System.Reflection.Metadata)?

Also cc: @AaronRobinsonMSFT who is doing some work with metadata parsing in unmanaged code.

Yea I was going to noodle around with making something like System.Reflection.Metadata.WebCIL similar to SRM.PortableExecutable over the holidays. (It's not something I pursued from the start because SRM is not well suited for the kind of roundtripping that we're doing in the WasmAppBuilder task)

But to be honest, I'm somewhat undecided if this is just an internal implementation detail of the final publishing step for WASM or if it's something that we want to use more broadly. For example, we may not want to keep a strict 1-1 correlation between the assemblies that comprise an app and the .webcil files that comprise an app bundle - it would be cool to split CoreLib and the main app assembly into hot/cold parts that can be downloaded separately, for example. Also I don't think we're entirely settled on whether we will serve .webcil directly or if it will be further bundled (for example by baking webcil data into web assembly .wasm modules).

Edit As a practical matter to make testing/debugging a little bit easier I made mono support loading .webcil data on all platforms, but I'm not sure it makes sense to ship that

@lambdageek
Copy link
Member Author

The debugger tests are failing because mono_wasm_add_assembly is calling mono_has_pdb_checksum which is supposed to look for a PDB checksum in a .dll, but that code doesn't use the loader plugin mechanism, so it doesn't know anything about .webcil files. And also .webcil files don't initialize MonoDotNetHeader:datadir.pe_debug properly.

@lambdageek
Copy link
Member Author

Ok, next roadblock is that we actually need a WebcilReader (analogous to PEReader) for the browser debug proxy - it reads bytes received in the browser and tries to figure out how to get PDB data or CodeView data out of them. Working on this now...

@lambdageek

This comment was marked as off-topic.

@lambdageek
Copy link
Member Author

Next set of failures is because the PE/COFF debug directory entries have a file pointer (ie a raw file offset) to locate the actual embedded PPDB or CodeView data. So I guess the WebcilWriter will need to fix those up when writing them out 🤮

Or maybe I can fudge the file format to have a header the same size as the original PE/COFF file and just fill the intervening space with zeroes? Not sure which is more of a hack

@lambdageek

This comment was marked as outdated.

@build-analysis build-analysis bot mentioned this pull request Dec 30, 2022
@lambdageek lambdageek force-pushed the feature-webcil-loader branch 6 times, most recently from b17d45f to 385db1d Compare January 5, 2023 19:55
@lambdageek
Copy link
Member Author

/azp run runtime-wasm

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lewing
Copy link
Member

lewing commented Jan 19, 2023

does this have any conflicts with #79284

@thaystg
Copy link
Member

thaystg commented Jan 19, 2023

does this have any conflicts with #79284

Merge conflicts probably yes, theoretical conflicts I don't think so, once we still have information that we need to get the symbols from symbol server, like pdbGuid, it's just in another format. We also use the real dll name to find it on symbol server, not sure if this will not be myDll.dll and it will become myDll.webcil if webcil is enabled, then we will probably have an issue here.

#ifdef ENABLE_WEBCIL
const char *p = strstr (assembly_name, ".webcil");
/* if assembly_name ends with .webcil, check if aname matches, with a .dll extension instead */
if (p && *(p + 7) == 0) {
Copy link
Member

@lewing lewing Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
if (p && *(p + 7) == 0) {
if (p && *(p + strlen (".webcil")) == 0) {

the compiler will do the right thing

Comment on lines +197 to +201
var tmpWebcil = Path.GetTempFileName();
var webcilWriter = Microsoft.WebAssembly.Build.Tasks.WebcilConverter.FromPortableExecutable(inputPath: assembly, outputPath: tmpWebcil, logger: Log);
webcilWriter.ConvertToWebcil();
var finalWebcil = Path.ChangeExtension(assembly, ".webcil");
FileCopyChecked(tmpWebcil, Path.Combine(asmRootPath, Path.GetFileName(finalWebcil)), "Assemblies");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you aren't doing something like

Suggested change
var tmpWebcil = Path.GetTempFileName();
var webcilWriter = Microsoft.WebAssembly.Build.Tasks.WebcilConverter.FromPortableExecutable(inputPath: assembly, outputPath: tmpWebcil, logger: Log);
webcilWriter.ConvertToWebcil();
var finalWebcil = Path.ChangeExtension(assembly, ".webcil");
FileCopyChecked(tmpWebcil, Path.Combine(asmRootPath, Path.GetFileName(finalWebcil)), "Assemblies");
var finalWebcil = Path.Combine(asmRootPath, Path.GetFileName(Path.ChangeExtension(assembly, ".webcil")), "Assemblies");
var webcilWriter = Microsoft.WebAssembly.Build.Tasks.WebcilConverter.FromPortableExecutable(inputPath: assembly, outputPath: finalWebcil, logger: Log);
webcilWriter.ConvertToWebcil();
_fileWrites.Add(finalWebcil);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lewing I think I misunderstood FileCopyChecked. I thought it was avoiding writing to dst if it already exists with the same content. My intention was to avoid messing with modification times of finalWebcil unless something really changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use Utils.CopyIfDifferent(string src, string dst, bool useHash).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!Utils.CopyIfDifferent(tmpObjFile, objFile, useHash: true))
Log.LogMessage(MessageImportance.Low, $"Did not overwrite {objFile} as the contents are unchanged");
else
Log.LogMessage(MessageImportance.Low, $"Copied {tmpObjFile} to {objFile}");

@lewing
Copy link
Member

lewing commented Jan 19, 2023

we'll need to update https://github.com/dotnet/sdk/blob/ea03dc7f05bfbaac843dc2ac09e26a484acfe8f8/src/BlazorWasmSdk/Targets/BlazorWasm.web.config#L13 as well

and

foreach (string extn in new string[] { ".dll", ".pdb", ".dat", ".blat" })

@lambdageek
Copy link
Member Author

/azp run runtime-wasm

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lambdageek
Copy link
Member Author

@lewing Is it alright if I do your suggestions in a follow-up PR?

@lambdageek
Copy link
Member Author

temporarily disabling dedup and re-running

@lambdageek
Copy link
Member Author

/azp run runtime-wasm

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lambdageek lambdageek merged commit 68d1b8f into dotnet:main Jan 21, 2023
lambdageek added a commit to lambdageek/xharness that referenced this pull request Jan 23, 2023
WebCIL is a new container format for .NET assemblies used when
publishing WebAssembly apps (see dotnet/runtime#79416)

Related to dotnet/runtime#80807
lambdageek added a commit to lambdageek/dotnet-sdk that referenced this pull request Jan 23, 2023
WebCIL is a new container format for .NET assemblies when publishing
to webassembly.  (See dotnet/runtime#79416)

Related to dotnet/runtime#80807
lambdageek added a commit to dotnet/xharness that referenced this pull request Jan 23, 2023
WebCIL is a new container format for .NET assemblies used when
publishing WebAssembly apps (see dotnet/runtime#79416)

Related to dotnet/runtime#80807
mdh1418 pushed a commit to mdh1418/runtime that referenced this pull request Jan 24, 2023
Define a new container format for .NET assemblies that looks less like a Windows PE file. Use it for bundling assemblies in wasm projects.

* Implement WebCIL loader

  It will try to look for WebCIL formatted images instread of normal .dll files

* Checkpoint works on wasm sample; add design doc

* Push .dll->.webcil probing lower in the bundle logic

* Also convert satellite assemblies and implement satellite matching

* [wasm] don't leak .webcil image names to the debugger

   In particular this will make source and breakpoint URLs look like `dotnet://foo.dll/Foo.cs` which means that grabbing PDBs via source link will work, etc.

* Add PE DebugTableDirectory to webcil

   This is used to retrieve the PPDB data and/or the PDB checksum from an image.

   Refactor mono_has_pdb_checksum to support webcil in addition to PE images

* Implement a WebcilReader for BorwserDebugProxy like PEReader

  This needs some improvements:
   - add support for reading CodeView and EmbeddedPDB data
   - copy/paste less from the WebcilWriter task
   - copy/paste less from PEReader (will require moving WebcilReader to SRM)

* [debug] Match bundled pdbs if we're looking up .webcil files

  The pdbs are registered by wasm with a notional .dll filename. if the debugger does a lookup using a .webcil name instead, allow the match

* Adjust debug directory entries when writing webcil files

   the PE COFF debug directory entries contain a 'pointer' field which is an offset from the start of the file.

   When writing the webcil file, the header is typically smaller than a PE file, so the offsets are wrong.  Adjust the offsets by the size of the file.

   We assume (and assert) the debug directory entries actually point at some PE COFF sections in the PE file (as opposed to somewhere past the end of the known PE data).

   When writing, we initially just copy all the sections directly, then seek to where the debug directory entries are, and overwrite them with updated entries that have the correct 'pointer'

* Fix bug in WebcilWriter

   Stream.CopyTo takes a buffer size, not the number of bytes to copy.

* bugfix: the debug directory is at pe_debug_rva not at the CLI header

* skip debug fixups if there's no debug directory

* WebcilReader: implement CodeView and Emebedded PPDB support

* [WBT] Add UseWebcil option (default to true)

* rename WebcilWriter -> WebcilConverter [NFC]

* fixup AssemblyLoadedEventTest

* hack: no extension on assembly for breakpoint

* pass normal .dll name for MainAssemblyName in config

   let the runtime deal with it - bundle matching will resolve it to the .webcil file

* Wasm.Debugger.Tests: give CI 10 more minutes

* Add Microsoft.NET.WebAssembly.Webcil assembly project

   Mark it as shipping, but not shipping a nuget package.

   The idea is that it will be shipped along with the WasmAppBuilder msbuild task, and with the BrowserDebugProxy tool.

* Move WebcilConverter to Microsoft.NET.WebAssembly.Webcil

* Move WebcilReader to Microsoft.NET.WebAssembly.Webcil

   delete the duplicated utility classes

* make the webcil magic and version longer

* Code style improvements from review

* Improve some exception messages, when possible

* Suggestings from code review

* Add WasmEnableWebcil msbuild property.  Off by default

* Build non-wasm runtimes without .webcil support

* Run WBT twice: with and without webcil

   This is a total of 4 runs: with and without workloads x with and without webcil

* do the cartesian product correctly in msbuild

* also add webcil to template projects

* environment variable has to be non-null and "true"

   We set it to "false" sometimes

* Fix wasm work items

   They should be the same whether or not webcil is used.  Just the WorkloadItemPrefix should be used to change the name.

* Update src/libraries/sendtohelix-wasm.targets

* PInvokeTableGeneratorTests: don't try to use the net472 WasmAppBuilder

   Look for the default target framework subdirectory under the tasks directory in the runtime pack when trying to find the tasks dll. In particular don't try to load the net472 version on modern .NET

* PInvokeTableGeneratorTests: Add more diagnostic output if tasksDir is not found

* simplify prefix comparison in bundled_assembly_match

* WasmAppBuilder improve logging

   Just emit a single Normal importance message about webcil; details as Low importance.

* Add missing using

Co-authored-by: Ankit Jain <[email protected]>
Co-authored-by: Larry Ewing <[email protected]>
lambdageek added a commit to lambdageek/aspnetcore that referenced this pull request Jan 24, 2023
WebCIL is a new container format for .NET assemblies when publishing to webassembly.  (See dotnet/runtime#79416)

Related to dotnet/runtime#80807
lambdageek added a commit to dotnet/sdk that referenced this pull request Jan 24, 2023
WebCIL is a new container format for .NET assemblies when publishing to webassembly.  (See dotnet/runtime#79416)

Related to dotnet/runtime#80807
ViktorHofer pushed a commit to dotnet/sdk that referenced this pull request Jan 25, 2023
WebCIL is a new container format for .NET assemblies when publishing
to webassembly.  (See dotnet/runtime#79416)

Related to dotnet/runtime#80807
@ghost ghost locked as resolved and limited conversation to collaborators Feb 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants