Interop Type Mapping #110691
Comments
Is the intent that the whole implementation would be source generated, using the FD just for its API surface area? This was in the road map for FD, but we held off on making the necessary APIs public/protected (right now implementations need to be in the same assembly). If this is a requirement, we'll need to do the due diligence as part of this to confirm the abstract APIs that need to be exposed to enable this. FD also currently lives in the System.Collections.Immutable library, which ships in both the shared framework and as a separate nuget package. Is that an appropriate location? |
There are a few different scenarios:
Nit: Perfect hash functions tend to waste space. I am not sure whether we want to go there. Regular hash vs. perfect hash is a small implementation detail that is not important for the overall design. |
We map Objective-C protocols to interfaces, so I think
One thing that drove the current design for iOS was to use as little allocated (dirty / writable) memory as possible, because in some cases (in particular app extensions on iOS), the platform poses rather strict memory limits and will terminate the process if those limits are broken. This is why in some cases we've used statically allocated lists for our data structures (in C - which means they don't require any allocations at startup, it all lives in read-only memory the OS can page out whenever needed and it doesn't count towards the memory limits) + binary search over those lists (we can sort them at build time). The number of entries in the arrays (low thousands at the very upper end) didn't make the binary search significantly slower than a dictionary lookup. I'm not advocating for this particular solution, but if we could find an implementation where the data can be mmap'ed into the process as read-only memory (and not copied around afterwards), I think that would be great (and also performant on all other platforms as well). |
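To make the sorted-list approach above concrete, here is a minimal C# sketch of a binary search over a table sorted at build time. The class name, the table contents, and the mappings are hypothetical and exist only for illustration. Managed arrays like these are still heap allocated, so the sketch shows the lookup shape only, not the read-only, mmap'ed storage described above.

```csharp
#nullable enable
using System;

// Hypothetical build-time output: native names sorted with ordinal comparison,
// with the matching managed type stored at the same index.
public static class SortedTypeTable
{
    private static readonly string[] s_nativeNames = { "NSDate", "NSObject", "NSString" };
    private static readonly Type[] s_managedTypes = { typeof(DateTime), typeof(object), typeof(string) };

    // O(log n) lookup; no dictionary is built at startup.
    public static bool TryLookup(string nativeName, out Type? managedType)
    {
        int index = Array.BinarySearch(s_nativeNames, nativeName, StringComparer.Ordinal);
        managedType = index >= 0 ? s_managedTypes[index] : null;
        return index >= 0;
    }
}
```

With a few thousand entries the search takes roughly a dozen comparisons, which is consistent with the observation above that it was not significantly slower than a dictionary lookup.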
To figure out these details, it may be useful to work on CsWinRT and/or Objective-C interop patch that shows how this API would be consumed. |
Actually a few more thoughts:
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Enum | AttributeTargets.Delegate | AttributeTargets.Interface, Inherited = false)]
public sealed class TypeMappingAttribute : Attribute
{
public TypeMappingAttribute(string mapping);
public TypeMappingAttribute(string mapping, string nativeContext);
}
[TypeMapping ("OSObjCType", "Objective-C")]
[TypeMapping ("OSSwiftType", "Swift")]
public class ImportantType {}
[AttributeUsage(AttributeTargets.Assembly, Inherited = false)]
public sealed class ExternalTypeMappingAttribute : Attribute
{
public ExternalTypeMappingAttribute(Type type, string mapping);
public ExternalTypeMappingAttribute(Type type, string mapping, string nativeContext);
}
[assembly: ExternalTypeMapping (typeof (System.DateTime), "NSDate", "Objective-C")]
[assembly: ExternalTypeMapping (typeof (System.DateTime), "Date", "Swift")] |
Would love to work together on this! (also cc. @manodasanW, of course). I'm reading through the proposal and it's not immediately obvious to me how exactly we'd get CsWinRT to switch to this new API surface and what would the effect be in practice. Perhaps we should schedule a call once everyone is back from vacation? 🙂 |
One of our main concerns for Native AOT WinRT components in Windows is the binary size impact of using any WinRT interop at all, because the second we bring CsWinRT in, we end up rooting pretty much the entire reflection stack due to it using Is it in scope for this proposal to make it possible for CsWinRT to potentially switch to this and stop having to lookup attributes? Would there be performance concerns in case you have lots of types (eg. the base Windows SDK projections has hundreds and hundreds of projected types, if not thousands) to using this new API to retrieve vtable info for a type, vs. just using |
Given that we need to reference external types, it may be simpler to only support the detached type mapping ( Attribute is just one of the possible ways to encode the information. A few alternative ways to encode the information:
I think so - in trimmed or NAOT compiled scenarios at least. |
If that's the case then it might be worth also mentioning our other (primary) approach to generating vtable for types today in CsWinRT, both for projections and for user types. The two Aaron linked are just two approaches we use for either built-in custom type mappings (eg. say But for everything else, we use We basically do something like this, when marshalling:
All the logic I outlined here is more or less all in this method. It would be nice if we could somehow use this new system to streamline all these possible approaches into a single, better one 🙂 |
The more I think about this, the more I wonder: does this not have considerable overlap with the "extension interfaces" feature that @agocke was proposing? That was what I was originally planning to use for CsWinRT 3.0, if it ever came out. Consider this:
// In WinRT.Runtime
public interface IWinRTExposedType
{
static abstract ComWrappers.ComInterfaceEntry[] GetVtableEntries();
}
// Each projected type or built-in custom mapped type would also get this
public extension StringWinRTExposedTypeExtension for string : IWinRTExposedType
{
public static ComWrappers.ComInterfaceEntry[] GetVtableEntries() => ...;
}
// For user types, the generator would simply generate these extensions instead
Then in CsWinRT, we can now simply do I suppose what I'm wondering is: are these two features just different ways of achieving the same, or would we want something like this even if we had extension interfaces already? If not, would it make sense to try to push for extension interfaces instead, as they would provide a more generalized solution that would also be useful in scenarios other than interop? 🤔 |
I do not think there was ever an "extension interfaces" proposal that had a chance of working. The extension interfaces discussions that I have seen were about hard problems that nobody has a solution for. |
I'm referring specifically to the one Andy had (I think there's a detailed proposal somewhere in csharplang, can't find it right now though), which was a simplified version of the generalized extension interfaces feature that was first proposed, and used the "orphan rule" to handle loading the extensions. Which should work just fine here in theory, because generally speaking there's only two cases:
Just thinking out loud and wondering whether there could indeed be some overlap here, and whether one approach might be preferable over the other. I understand that extension interfaces would likely be more costly though. |
I'll transfer a couple of notes from #110300 (comment) since they seem relevant:
I'm also interested in the lookup logic: Another question is about preservation - I understand this would keep the mapped type, but how about members on the type - would we preserve any? Would it be configurable? |
@stephentoub I don't think FD is the correct public API, but it is correct as an implementation detail.
@stephentoub Yes, I believe that should be fine. I don't think we would need anything in SPCL to be aware of this.
@jkotas Nice list. I will integrate that into the issue.
@jkotas Yep, perfect hash functions have that behavior. There are also minimal perfect hash functions, based on all the keys, which we can explore. I agree this is an implementation detail, but size and speed are both dimensions that will need to be explored.
@rolfbjarne Good point.
@rolfbjarne Sure. The data containing the dictionary/hashmap will be just that - a static binary data blob. The secondary look-up array was chosen as a compromise between a large assembly image vs some of CoreCLR's other allocation mechanisms. Definitely an implementation angle to experiment with.
@MichalStrehovsky Thanks. These are helpful.
@MichalStrehovsky That, to me, seems like an implementation detail for each specific interop case. Where if the assembly is on disk or embedded somewhere, it needs to be defined for the specific scenario. Also, it needs to have some flexibility for the non-.NET loading case. My expectation here is this would be a low level API for building up whatever interop is needed, so finding and loading a Java class library is going to be very different from finding and loading a Swift library. It is quite likely for the .NET case we would have a few built-in that were created during source generation.
@MichalStrehovsky My experience here is that we shouldn't be trimming across the interop boundary lower than a type. For years the trimmer has broken COM scenarios and always gets it wrong in some terrible way. When a type is involved in interop, it is either trimmed entirely or left as-is. That is my preference for now. Happy to learn better approaches here, but I'm suspicious of it getting it right. |
That was the case for built-in COM interop, which had fundamental problems with trimming. The newer trim-compatible interop systems do not have this problem: the trimmer does not special-case them and trims types involved in interop at method granularity like any other type. |
How does it do this for COM? As recently as last year, the trimmer broke |
Could you please share a link to the issue? |
There wasn't an issue. It was during an inner dev cycle and I was reminded it had to do with the library trimming scenario, not the official trimmer scenario. |
Even with MPHFs, an additional lookup table for the actual data is needed, in addition to the hash table itself. Consequently, the total object size is always larger than the original data. Note that the size difference between PHFs and MPHFs relates to the hash table size, not the entire key-value table. MPHFs aim to keep the hash table size close to the original keys while eliminating collisions, but it's only part, not the whole structure. An ideal hash would pack the key and value into a single number and operate on the key bits portion during the lookup, a fascinating concept, though such a function hasn't been discovered yet. 😅 If speed is the bottleneck and the size tradeoff due to the extra lookup table is acceptable, opting for a perfect or minimal perfect hash implementation is reasonable. Otherwise, classic solutions like |
Looking over the proposal as-is, there's a few questions that come to my mind:
|
For binding to Swift, this is used strictly for bind time use. As far as I'm concerned the information can get stripped out at build time. What is needed is:
There are at least 4 main tasks that will use this information:
|
I think we (CsWinRT) basically need #50333, on top of the interop table map, to properly get the semantics we're looking for. |
@agocke, I've updated the proposal based on offline conversations. All, I've placed the previous proposal under a drop down in the description - nothing should be lost. |
@AaronRobinsonMSFT a couple more questions:
Should we just say "if the Trimmer marks the
This applies to both |
A boxed type would mean an allocation, no?
Not sure if we want to allocate a dictionary if no map was created. Doesn't seem to make a lot of sense to me, but I'll leave that up to API review or the Trimmer team (cc @agocke / @sbomer) to say if they prefer an empty dictionary or |
Yeah no I get that's what it meant, just meant people usually say "allocation of a type" to refer to objects of that type (as in, a reference type) being instantiated, not boxed value types. But alright, fair enough 😅
I get that, but I'm saying the docs state the method will just throw if the map is missing. Meaning the return type doesn't matter. The only case where the method won't throw is if there is a map, meaning the return type is not nullable. Is that not the case? |
The behavior for missing typemap created at build time can be one of the following:
I think we should do (2) as part of this proposal (change the API shape to the TryGet... pattern), and consider doing (3) in the future if we find a good motivating scenario for it. |
Agree. Will update API. |
Not to get too much into the weeds on this since we can let the API review process work, but the |
The shape of the Try pattern has minimal impact on the implementation. It is about a 5-line code delta between the different shapes of the Try pattern.
You can make the same argument for many advanced Try pattern APIs. We use the standard Try pattern for all APIs where it makes sense irrespective of how advanced they are. I do not recall a case where API review approved a non-standard Try pattern. |
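For readers unfamiliar with the shape being discussed, here is a minimal sketch of the standard Try pattern applied to a type map lookup. `TypeMapSketch` and `Install` are hypothetical stand-ins written for this illustration; they are not the proposed BCL `TypeMapping` API, and the real method is generic over a type universe.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

// Hypothetical stand-in; not the proposed BCL TypeMapping class.
public static class TypeMapSketch
{
    // Null models "no map was generated at build time".
    private static Dictionary<string, Type>? s_map;

    // Test hook standing in for whatever the build tooling would emit.
    public static void Install(Dictionary<string, Type> map) => s_map = map;

    // Standard Try pattern: a bool result plus an out value that is
    // guaranteed non-null only when the method returns true.
    public static bool TryGetExternalTypeMapping(
        [NotNullWhen(true)] out IReadOnlyDictionary<string, Type>? map)
    {
        map = s_map;
        return map is not null;
    }
}
```

In this shape a missing map costs neither an exception nor an empty dictionary allocation; the caller branches on the return value and chooses its own fallback.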
What's the advice for cases where one needs to have a proxy for an internal type in case you cannot possibly generate attributes in its containing assembly? To make an example: we expect to be able to pretty much always do this, but I am not sure I see a way to do that for types internal to the BCL. In CsWinRT we need special mapping for some of them, such as:
Would it be possible to add a constructor taking a string with a constant type name, like we mentioned? E.g.:
[AttributeUsage(AttributeTargets.Assembly, AllowMultiple = false)]
public sealed class TypeMapAssociationAttribute<TTypeUniverse> : Attribute
where TTypeUniverse : ITypeMapUniverse
{
public TypeMapAssociationAttribute(string sourceTypeName, string sourceAssemblyName, Type proxy)
{ }
}
Then we could use it like so:
// In WinRT.Runtime.dll
[assembly: TypeMapAssociation<WindowsRuntimeTypeUniverse>(
"System.Collections.Specialized.ReadOnlyList",
"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
typeof(ReadOnlyListProxyType))] |
There is nothing stopping us from changing these types. How is this a valid or safe marshalling scenario? My initial reaction would be, you can't do that and if you are it should be stopped. Can you describe the justification for this scenario? |
Don't get me wrong, I hate that we need this, and I'd love to have a better solution. The problem is that we need to know all the relevant info on a given type when we try to marshal. This is generally done by only allowing marshalling for types we can "see", so that we can generate the full vtable at compile time. We can't dynamically construct it at runtime, because information that is relevant (such as the implemented interfaces) might be trimmed. This is generally fine, as virtually all types are visible. The problem is that Other possible ways to fix it would be:
I assume option (2) hasn't been done because that If there's some other better solution you have in mind, I'd love to hear it 😅 EDIT: here is the exact code in CsWinRT where we currently handle this. |
I'm not following this argument. The APIs declare they use the non-generic |
For posterity, this will end up with more types than those used in type checks because in the limit this is possible:
// No trimming or AOT warning here, this is safe:
// T could be any object reachable through reflection or allocated in the app.
typeof(Gen<>).MakeGenericType(someObject.GetType()).GetMethod("Check").Invoke(null, []);
class Gen<T> where T : class
{
public static bool Check(object o) => o is T;
}
I don't think it would cause problems, but wanted to spell it out in case there is a problem that I don't see.
Does this mean we'd need to prevent any optimizations around allocation from happening? E.g. if we end up with a side-effect free equivalent of
For this part:
[AttributeUsage(AttributeTargets.Assembly, AllowMultiple = true)]
public sealed class TypeMapAttribute<TTypeUniverse> : Attribute
{
[RequiresUnreferencedCode("Interop types may be removed by trimming")]
public TypeMapAttribute(string value, Type target, Type trimTarget) { }
}
I'm actually not sure if we'll be able to suppress this RUC warning here. @sbomer can you think of a way to suppress the warnings when this attribute is applied on an assembly? Given that the methods on TypeMapping are marked as
I assume this would be an error - will we be able to make it an error in a source generator/analyzer so that it errors out in the JITted case too?
static Type Wrap<T>(Type t) where T : ITypeMapUniverse => TypeMapping.TryGetExternalTypeMapping<T>(out var v) ? v.GetValueOrDefault(t) : null; |
Agree. @agocke and @sbomer really helped with the Trimmer semantics here so I'll defer to them to comment.
This was a suggestion because there are two constructors and one is problematic. Perhaps I misunderstood the suggestion. /cc @sbomer
The only type this would return |
Here's my thinking: The We're not doing that because it doesn't satisfy the back-compat constraints, but I am still thinking of what the API could look like if we ever added a "strict mode", maybe for other app models. In that mode the It's not really RUC in the usual sense, because the presence of the attribute alone is not a problem (the problem is its participation in the heuristic-based lookup - nothing to do with attribute removal). Since the linker will need to understand these attributes, I think we could add built-in logic to suppress the warning (in non-"strict mode"). |
I think of the "keep map entries when a type is checked" as a heuristic, so I don't see a problem with it being overly conservative (as long as it doesn't keep everything). I think this is fine. |
I think we're still OK to optimize this. As far as I can tell, this will be used via |
I didn't realize the |
That certainly sounds much nicer for downstream consumers (eg. CsWinRT) 😅 |
@MichalStrehovsky Yes, that is a possible future option. We don't have a compelling need for that at the moment though and building and maintaining it isn't a priority. It is something that has been discussed, but for now libraries that find it interesting/useful could easily layer their own logic in this case and if we see need we can provide it. |
What would the recommendation be for libraries to do this? For instance, consider CsWinRT. We can call the |
If the API returns |
Ok so basically we would always do the fallback (and suppress any trim/AOT warnings there), and assume that it'd only ever run when there's no trimming? It feels like calling To clarify, we can work with this. Not as nice as having a built-in fallback, but not a blocker either of course. |
Do we know if this is feasible? E.g. whether the JIT-based F5 launch can deal with ILLink running over the produced assemblies without causing issues with things like debugging and hot reload? It feels like a can of worms on the surface of it. I've been thinking about this some more and have another observation: this proposal is currently solving two things:
We're currently locking the startup optimization behind "you need to run ILLink" because the proposal expects the startup optimization will happen in ILLink ( We actually have a different component that concerns itself with startup that is not ILLink - ReadyToRun. I'm wondering a bit whether it would be more natural to do the startup optimization there. ReadyToRun doesn't just cache code, it also collects different observations about the input IL - it builds a hashtable for fast lookup of types by name, another table for fast lookup of attributes, etc. It could build hash tables for type maps too. It might be worth a thought to structure this so that:
Native AOT would obviously do this differently, but that applies to this proposal too. The obvious disadvantage is that Mono doesn't have ReadyToRun but maybe it would be fine to just run the fallback code. It could also be similarly integrated into Mono AOT if absolutely needed, but that's work we'd ideally want to avoid. |
Interop Type Mapping
Background
When interop between languages/platforms involves the projection of types, some kind of type mapping logic must often exist. This mapping mechanism is used to determine what .NET type should be used to project a type from language X and vice versa.
The most common mechanism for this is the generation of a large look-up table at build time, which is then injected into the application or Assembly. If injected into the Assembly, there is typically some registration mechanism for the mapping data. Additional modifications and optimizations can be applied based on the user experience or scenario constraints (that is, build time, execution environment limitations, etc).
At present, there are at least three (3) bespoke mechanisms for this in the .NET ecosystem:
C#/WinRT - Built-in mappings, Generation of vtables for AOT.
.NET For Android - Assembly Store doc, Assembly Store generator, unmanaged Assembly Store types.
Objective-C - Registrar, Managed Static Registrar.
Related issue(s):
Proposal
The .NET ecosystem should provide an official API and process for handling type mapping in interop scenarios.
Priorities
The below .NET APIs represent only part of the feature. The complete scenario would involve additional steps and tooling.
Provided by BCL (that is, NetCoreApp)
Given the above types the following would take place.
A TypeMapAttribute assembly attribute declares the external type system name, a target type, and optionally a "trim-target" type used by the Trimmer to determine if the target type should be included in the map. If the trim-target type is used in a type check, then the entry will be inserted into the map. If the TypeMapAttribute constructor that doesn't take a trim-target is used, the entry will be inserted unconditionally. The target type would have interop specific "capabilities" (for example, create an instance).
Types used in a managed-to-unmanaged interop operation would use TypeMapAssociationAttribute to define a conditional link between the source and proxy type. In other words, if the source is kept, so is the proxy type. If the Trimmer observes an explicit allocation of the source type, the entry will be inserted into the map.
During application build, source would be generated and injected into the application that defines appropriate TypeMapAssemblyTargetAttribute instances. This attribute would help the Trimmer know other assemblies to examine for TypeMapAttribute and TypeMapAssociationAttribute instances. These linked assemblies could also be used in the non-Trimmed scenario whereby we avoid creating the map at build-time and create a dynamic map at run-time instead.
The Trimmer will build two maps based on the above attributes from the application reference closure.
(a) Using TypeMapAttribute, a map from string to target Type. If a trim-target Type was provided, the Trimmer will determine if it is used in a type check. If it was used in a type check, the mapping will be included. If the trim-target type is not provided, the mapping will be included unconditionally.
(b) Using TypeMapAssociationAttribute, a map from Type to Type (source to proxy). The map will only contain an entry if the Trimmer determines the source type was explicitly allocated.
Note: Conflicting key/value mappings in the same type universe would be reconciled by the application re-defining the mapping entry that will be in the respective map. If a conflict is still present, the build should fail.
Note: The emitted map format is a readonly binary blob that will be stored in the application assembly. The format of the binary blob is an implementation detail that will be passed to an internal type contained within CoreLib.
The Trimmer will consider calls to TypeMapping.GetExternalTypeMapping<> and TypeMapping.GetTypeProxyMapping<> as intrinsic operations and replace them inline with the appropriate map instantiation (for example, Java via JavaTypeUniverse).
Example usage
Provided by .NET for Android runtime
.NET for Android projection library
User application
Interop runtime (for example, .NET for Android) usage example.
Previous proposal
The above attribute would be searched for in all assemblies passed to a tool/MSBuild Task/etc and the result would be a binary blob. This binary blob could be in one of several forms: (a) a .cs file that defines a static ReadOnlySpan<byte>, (b) a binary file that could be embedded in a .NET assembly as a resource, or perhaps another option.
Type mapping scenarios and trimmer:
Dynamic: No special tooling has run over the whole app. The type mappings are per-assembly and registered at runtime. Tooling can be used to generate or optimize the per-assembly mapping. This should handle plugins where a new assembly with additional mappings shows up in flight.
IL trimming: The IL trimmer should not treat the types involved in the mapping as roots - if the type is not otherwise referenced by the app, it should be removed from the mapping. The IL trimmer (or some other tool) generates the per-app mapping blob that is consumed at runtime. It may be easiest to make this an IL trimmer feature. This scenario does not handle plugins.
AOT: It is similar to the IL trimming case, but the exact format of the blob may need to be different - both to make it more efficient and to avoid the dependency on metadata tokens that the IL trimming implementation is expected to have.
This API would be based on static type-mapping data generated using a static hashing approach. From the .NET side, it could be implemented through FrozenDictionary<TKey, TValue> and instantiated from the generated data. A static ReadOnlySpan<byte> field inserted into the application or Assembly would make integration seamless, but an embedded resource is also workable. This concept is similar to existing technologies in C, such as gperf or C Minimal Perfect Hashing Library. Size and performance should be measured using different approaches.
The API shape would be a mapping from a TKey, likely a string, to an internal type, conceptually containing the following:
The above type contains a field for the discovery of the type, not presently loaded, and the second represents an index into an array that contains the loaded type to use. The strawman example below helps illustrate the workflow.
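The following is a hypothetical sketch of that workflow, not the original strawman: a FrozenDictionary maps a string key to an entry holding a type name for discovery and an index into a cache array of loaded types. `TypeMapEntry`, `LazyTypeMap`, and the use of `Type.GetType` for discovery are assumptions made for illustration; the proposal does not specify how a type would be located. Requires .NET 8 or later for System.Collections.Frozen.

```csharp
#nullable enable
using System;
using System.Collections.Frozen;
using System.Collections.Generic;

// Hypothetical entry shape: a name used to discover the type,
// plus the slot in the cache array that holds it once loaded.
public readonly record struct TypeMapEntry(string TypeName, int CacheIndex);

public sealed class LazyTypeMap
{
    private readonly FrozenDictionary<string, TypeMapEntry> _entries;
    private readonly Type?[] _loaded;

    public LazyTypeMap(IReadOnlyList<KeyValuePair<string, string>> keyToTypeName)
    {
        var entries = new Dictionary<string, TypeMapEntry>(StringComparer.Ordinal);
        for (int i = 0; i < keyToTypeName.Count; i++)
            entries.Add(keyToTypeName[i].Key, new TypeMapEntry(keyToTypeName[i].Value, i));

        _entries = entries.ToFrozenDictionary(StringComparer.Ordinal);
        _loaded = new Type?[keyToTypeName.Count];
    }

    public bool TryGetType(string key, out Type? type)
    {
        type = null;
        if (!_entries.TryGetValue(key, out TypeMapEntry entry))
            return false;

        // Load on first use, then reuse the cached Type. A race here only
        // repeats the load; both threads store the same Type instance.
        type = _loaded[entry.CacheIndex] ??= Type.GetType(entry.TypeName, throwOnError: false);
        return type is not null;
    }
}
```

Keeping the loaded types in a separate array means the frozen dictionary itself never changes after construction, which is what would let its contents come from a read-only blob.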