API Proposal: Add type to avoid boxing .NET intrinsic types #28882
Comments
Would this be a 16 byte (Guid/Decimal) + enum sized struct? (24 bytes with padding on x64) |
These all can be fixed, without too much work. TypedReference has been neglected, but that does not mean it is a useless type. (Some of this is described in https://github.com/dotnet/corefx/issues/29736.) I think fixing TypedReference would be a better choice than introducing a new Variant type, if everything else is equal.
I think the design should allow all types without falling back to boxing.
This should be a non-goal. It is fine if the winning design that we pick happens to work on .NET Framework, but trying to make it work on .NET Framework should be an explicit non-goal. We have made a conscious decision not to restrict our design choices to what works on .NET Framework. |
Goal is 24 bytes. We've looked at a lot of different ways of packing that in. A pointer and 16 bytes of data. It might involve some contortions or dropping down to 12 bytes of data.
Not trying to imply it is useless, just not appropriate in this case. I'm not sure how you'd make it a non-ref struct or make it as fast as something targeted at key types.
Fair enough, I've changed it to nice-to-have. There are, however, real business needs for mitigating formatting inefficiencies on .NET Framework.
I think we should have some design that does this but I don't think we can provide a solution that solves everything for all scenarios well. Having multiple approaches doesn't seem like a terrible thing to me, particularly given that we could make this sort of solution available much much sooner than full varargs support. |
FWIW, this approach feels very limited to me, in that I see supporting every value type as a key scenario. I would rather see, for example, a simple unsafe annotation/attribute that would let the API tell the JIT that it promises wholeheartedly an argument won't escape, and then add an overload that takes a |
To be super clear, I don't see this as a solves-all-boxing solution. I absolutely think we can benefit from broader approaches, but I have a concern about being efficient with core types. Being able to quickly tell that you have an int and extract it is super valuable I think. Certainly for the |
Depends on how the actual formatting is implemented. If you can dispatch a virtual formatting method, ability to switch over a primitive type does not seem super valuable.
Something like this would work too. It is pretty similar to |
I'd be fine with that as well if it was similarly seamless to a caller. |
Rather than an attribute and a promise I'd like to leverage the type system if possible here. 😄 What if instead we added a JIT intrinsic that "boxes" value types into a
The JIT could choose to make this a heap or stack allocation depending on the scenario. The important part is that it would move the boxing operation into a type whose lifetime we need to carefully monitor. The compiler will do it for us. That doesn't completely solve the problem because you can't have |
That also sounds reasonable. (Though the [UnsafeWontEscape] approach could also work on the existing APIs: we just attribute the existing object arguments in the existing methods, and apps just get better.) |
How would Either way, it sounds reasonable too. |
I do think that if our goal is just to solve the parameter passing problem, something based on references (which can work uniformly on all types) is worth thinking about (this is Jan's TypedReference approach). However that does leave out the ability to have something that can represent anything (but all primitives efficiently (without extra allocation)) that you can put into objects (which is what Variant is). I think the fact that we don't have a standard 'Variant' type in the framework is rather unfortunate. Ultimately it is an 'obvious' type to have in the system (even if ultimately you solve the parameter passing issue with some magic stack allocated array of typed references). I also am concerned that we are solving a 'simple' problem (passing parameters) with a more complex one (tricky reference-based classes whose safety is at best subtle). I think we should have a Variant class: it is straightforward, and it does solve some immediate problems without having to design a rather advanced feature (that probably would not make V3.0). For what it is worth... |
I agree with that and the Variant proposal would look reasonable to me if the Variant was optimized for primitive types only. The proposal makes it optimized for primitive types and set of value types that we think are important for logging today. It does not feel like a design that will survive over time. I suspect that there will be need to optimize more types, but it won't be possible to extend the design to fit them. |
Note that generally speaking, a Variant is a chunk of memory that holds things in-line and a pointer that allows you to hold 'anything'. Semantically it is always the case that a variant can hold 'anything', so that is nice in that there is not a 'semantic' cliff, only a performance cliff (thus as long as the new types that we might want to add in the future are not perf critical, things are OK). I note that the list of types that really are perf-critical is pretty small and likely to not change over time (int, string; second tier are long, and maybe DateTime(Offset)). So I don't think we are taking a huge risk there. And there are things you can do 'after the fact'. Let's assume we only allotted 16 bytes for in-line data but we wanted something bigger. If there is any 'skew' to the values (there would be for most types, but not for random-number-generated IDs), you could at least store the 'likely' values inline and box the rest. It would probably be OK, and frankly it really is probably the right tradeoff (it would be surprising to me that a new type in the future so dominated the perf landscape over existing types that it was the right call to make the struct bigger to allow it to be stored inline). That has NEVER happened so far. Indeed from a cost-benefit point of view, we really should be skewing things to the int and string case because these are so much more likely to dominate hot paths. We certainly don't want this to be bigger than 3 pointers, and it would be nice to get it down to 2 (but that does require heroics for any 8 byte sized things (long, double, datetime ...)), so I think we are probably doing 3. But it does feel like a 'stable' design (5 years from now we would not feel like we made a mistake). Sure, bigger types will be slow, but I don't think we would want to make the type bigger even if we could. It would be the wrong tradeoff. So, I think Variant does have a reasonably stable design point that can stand the test of time.
From my point of view, I would prefer that the implementation be tuned for the overwhelmingly likely case (int and string). My ideal implementation would be 8 bytes of inline data / discriminator, and 1 object pointer. This is a pro |
One of the main use cases this is being proposed for is string interpolation and string formatting. I realize there are other use cases, so this is not necessarily instead of something Variant-like, but specifically to address the case of string interpolation, I had another thought on an approach. Today, you can define a method like:
AppendFormat(FormattableString s);
and use that as the target of string interpolation, e.g.
AppendFormat($"My type is {GetType()}. My value is {_value:x}.");
Imagine we had a pattern (or an interface, though that adds challenge for ref structs) the compiler could recognize where a type could expose a method of the form:
AppendFormat(object value, ReadOnlySpan<char> format);
The type could expose additional overloads as well, and the compiler would use normal overload resolution when determining which method to call, but the above would be sufficient to allow string interpolation to be used with the type in the new way. We could add this method to StringBuilder, for example, along with additional overloads for efficiency, e.g.
public class StringBuilder
{
public void AppendFormat(object value, ReadOnlySpan<char> format);
public void AppendFormat(int value, ReadOnlySpan<char> format);
public void AppendFormat(long value, ReadOnlySpan<char> format);
public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format);
… // etc.
}
We could also define new types (as could anyone), as long as they implemented this pattern, e.g.
public ref struct ValueStringBuilder
{
public ValueStringBuilder(Span<char> initialBuffer);
public void AppendFormat(FormattableString s);
public void AppendFormat(object value, ReadOnlySpan<char> format);
public void AppendFormat(int value, ReadOnlySpan<char> format);
public void AppendFormat(long value, ReadOnlySpan<char> format);
public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format);
… // etc.
public Span<char> Value { get; }
}
Now, when you call:
ValueStringBuilder vsb = …;
vsb.AppendFormat($"My type is {GetType()}. My value is {_value:x}.");
rather than generating what it would generate today if this took a FormattableString:
vsb.AppendFormat(FormattableStringFactory.Create("My type is {0}. My value is {1:x}.", new object[] { GetType(), (object)_value }));
or if it took a string:
vsb.AppendFormat(string.Format("My type is {0}. My value is {1:x}.", GetType(), (object)_value));
it would instead generate:
vsb.AppendFormat("My type is ", default);
vsb.AppendFormat(GetType(), default);
vsb.AppendFormat(". My value is ", default);
vsb.AppendFormat(_value, "x");
vsb.AppendFormat(".", default);
There are more calls here, but most of the parsing is done at compile time rather than at run time, and a type can expose overloads to allow any type T to avoid boxing, including one that takes a generic T if so desired. |
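To make the lowering described above concrete, here is a minimal sketch of a type that follows the proposed pattern, plus the call sequence a compiler might emit by hand. `SketchBuilder` and `LoweringDemo` are hypothetical names invented for this illustration, and the extra `string` overload is an assumption added so that literal segments bind without relying on how overload resolution ranks `object` against `ReadOnlySpan<char>`. It is not the API that was eventually designed.

```csharp
using System;
using System.Text;

// Hypothetical builder following the proposed pattern; not an existing BCL type.
public sealed class SketchBuilder
{
    private readonly StringBuilder _sb = new StringBuilder();

    // Literal segments bind here by identity conversion (assumed extra overload).
    public void AppendFormat(string value, ReadOnlySpan<char> format) => _sb.Append(value);

    // Non-boxing path for int: format into a stack buffer, then copy the chars.
    public void AppendFormat(int value, ReadOnlySpan<char> format)
    {
        Span<char> buffer = stackalloc char[32];
        if (value.TryFormat(buffer, out int written, format))
            _sb.Append((ReadOnlySpan<char>)buffer.Slice(0, written));
        else
            _sb.Append(value.ToString(format.ToString())); // output did not fit in the buffer
    }

    // Fallback for everything else; a value type passed here is boxed at the call site.
    public void AppendFormat(object value, ReadOnlySpan<char> format)
    {
        if (value is IFormattable formattable)
            _sb.Append(formattable.ToString(format.IsEmpty ? null : format.ToString(), null));
        else
            _sb.Append(value);
    }

    public override string ToString() => _sb.ToString();
}

public static class LoweringDemo
{
    // Hand-written version of what might be emitted for
    // $"My type is {type}. My value is {value:x}."
    public static string Format(Type type, int value)
    {
        var sb = new SketchBuilder();
        sb.AppendFormat("My type is ", default);
        sb.AppendFormat(type, default);
        sb.AppendFormat(". My value is ", default);
        sb.AppendFormat(value, "x");
        sb.AppendFormat(".", default);
        return sb.ToString();
    }
}
```

In this sketch the `int` argument never reaches the `object` overload, so no box is allocated for it; only the `Type` argument takes the fallback path.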
If you throw out e.g. public readonly struct Variant : IFormattable
{
private readonly IntPtr _data;
private readonly object _typeOrData;
public unsafe bool TryGetValue<T>(out T value) where T : IFormattable
{
if (typeof(T) == typeof(int))
{
if ((object)typeof(T) == _typeOrData)
{
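int data = (int)(long)_data; // copy the low 32 bits out of the readonly field
value = Unsafe.As<int, T>(ref data); // T is int on this path
return true;
}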
value = default;
return false;
}
// etc.
}
public override string ToString()
{
return ToString(null, null);
}
public string ToString(string format, IFormatProvider formatProvider)
{
if ((object)typeof(int) == _typeOrData)
{
return ((int)(long)_data).ToString(format, formatProvider);
}
// etc.
}
} And box others to |
@benaadams - Generally I like the kind of approach you are suggesting. In my ideal world, Variant would be an object reference and an 8-byte buffer. It should be super-fast on int and string, and non-allocating on data types 8 bytes or smaller (by using the object as a discriminator for 8 byte types). For data types larger than 8 bytes, you either box, or you encode the common values into 8 bytes or less and box the uncommon values. This has the effect of skewing the perf toward the overwhelmingly common cases of int and string (and they don't pay too much extra bloat for the rarer cases). |
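The layout described in this comment (one object reference plus 8 bytes of inline data, with the object doubling as the discriminator) can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype from the proposal: `SmallVariant`, its `From` factories and the sentinel marker objects are all hypothetical names, and only a handful of types are covered.

```csharp
using System;

// Hypothetical sketch: one object reference plus 8 bytes of inline storage.
public readonly struct SmallVariant
{
    // Sentinel objects marking which primitive is stored in _bits (illustrative names).
    private static readonly object s_int = new object();
    private static readonly object s_long = new object();
    private static readonly object s_double = new object();

    private readonly object _obj; // a reference value, a boxed value, or a type marker
    private readonly long _bits;  // inline storage for values of 8 bytes or fewer

    private SmallVariant(object obj, long bits) { _obj = obj; _bits = bits; }

    public static SmallVariant From(int value) => new SmallVariant(s_int, value);
    public static SmallVariant From(long value) => new SmallVariant(s_long, value);
    public static SmallVariant From(double value) =>
        new SmallVariant(s_double, BitConverter.DoubleToInt64Bits(value));
    public static SmallVariant From(string value) => new SmallVariant(value, 0);
    // Types larger than 8 bytes fall back to boxing in this sketch.
    public static SmallVariant From(decimal value) => new SmallVariant(value, 0);

    public bool TryGetInt32(out int value)
    {
        if (ReferenceEquals(_obj, s_int)) { value = (int)_bits; return true; }
        value = default;
        return false;
    }

    public bool TryGetInt64(out long value)
    {
        if (ReferenceEquals(_obj, s_long)) { value = _bits; return true; }
        value = default;
        return false;
    }

    public bool TryGetDouble(out double value)
    {
        if (ReferenceEquals(_obj, s_double))
        {
            value = BitConverter.Int64BitsToDouble(_bits);
            return true;
        }
        value = default;
        return false;
    }

    // Boxes inline values on demand; returns stored references unchanged.
    public object ToObject()
    {
        if (ReferenceEquals(_obj, s_int)) return (int)_bits;
        if (ReferenceEquals(_obj, s_long)) return _bits;
        if (ReferenceEquals(_obj, s_double)) return BitConverter.Int64BitsToDouble(_bits);
        return _obj;
    }

    public override string ToString() => ToObject()?.ToString() ?? string.Empty;
}
```

Checking for an int is a single reference comparison, and strings are stored as-is, which matches the stated aim of keeping int and string fast. The cost is that anything wider than 8 bytes, such as `decimal`, allocates a box.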
@stephentoub Generally speaking I like the idea of moving parsing to compile time. I'll play around to see what sort of perf implications it has. One thing I'd want to make sure we have an answer for is how do we fit int count = 42;
Console.WriteLine($"The count is {count}.");
// And we have the following overload
void WriteLine(in ValueStringBuilder builder);
// Then C# generates:
ValueStringBuilder vsb = new ValueStringBuilder();
// ... the series of Appends() ...
WriteLine(vsb);
vsb.Dispose(); // Note that this isn't critical, it just returns any rented space to the ArrayPool
We could also add overloads that take
Console.WriteLine(myFormatProvider, $"The count is {count}.");
// Creates the following
ValueStringBuilder vsb = new ValueStringBuilder(myFormatProvider);
// ... the series of Appends() ...
WriteLine(vsb);
vsb.Dispose(); |
Pulling It would be cool if we could borrow bits from the object pointer (much like an ATOM is used in Win32 APIs), but that obviously would require runtime support. |
It is pretty common to pass around strings as |
This is one of the advantages I see to the aforementioned AppendFormat approach. In theory you just have another |
Indeed. In the |
I'm going to break out a separate proposal for "interpolated string -> Append sequence" and do a bit of prototyping to examine the performance. |
Just to add my 2 cents here: storing heterogeneous data whose types are not known at compile time has a lot more uses than just string interpolation. Take our old friend Having a true Variant type could bring great performance benefits in such a scenario. I'd even say it's a far more important scenario than string interpolation. Most metrics have shown the popularity of Python exploding to one of the most-used languages in the last few years. And the reason is the great libraries it has for working with data. The market is clearly saying it wants better and more efficient ways of working with data and .NET should oblige. |
@MgSam do you think avoiding boxing on common types is good enough? The initial proposal doesn't handle everything, but allows putting data on the heap (e.g. creating
Stashing arbitrary struct data in |
Yes, I think common types likely cover 95% of the use cases. You don't often have nested objects when working with large tables of data. |
The proposal needs to have detail on these use cases. Are there going to be any public ASP.NET APIs that consume this type?
The set of fundamental types depends on scenario. For example, we have a similar union in this repo here: Lines 29 to 74 in 01b7e73
|
Are these two examples a kind of struct discriminated union? (Or one with a named alias) dotnet/csharplang#113 🤔 |
This feels a lot like discriminated unions and I wonder if we could build the general purpose feature which allows for any type, including managed, and then have a mechanism for the compiler and runtime to work together to efficiently store constructions which happen to be unmanaged. |
I think the type needs to flow without being viral. I like both the TypedReference approach for values that can't escape the heap and the variant approach for things that do. I can also see a type like this being super useful for fast reflection and serialization. Today, generics are too viral and don't work for framework code, and TypedReference is ref only and not usable in many scenarios where I'd want to use this. Just as an example, I've been looking at a way to do fast reflection for ages (not boxing the arguments and supporting Span). A version of this type that supported any T would be ideal but I don't know what that would look like or if it would even be possible without runtime support (like span). The other use case is logging without boxing. I'd like to allow callers to preserve primitive types without boxing and allow the logger provider to unwrap and serialize. |
In my proposal I offered to use a tuple as a container. Tuples are normal structs, not ref-like structs. A tuple can represent an arbitrary number of arguments of any type (except
internal static string TupleItemToString<T>(in T tuple, int index, IFormatProvider? provider) where T : struct, ITuple;
It can be converted to a more low-level version to be compatible with other scenarios like usage of
internal static bool TupleItemToString<T>(in T tuple, int index, Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) where T : struct, ITuple;
Tuple can be passed to logging or In reality, implementation of
In the case of a JIT intrinsic, the public API could look like this:
public sealed class String
{
public static string Format<TArgs>(IFormatProvider? provider, string format, in TArgs args)
where TArgs : struct, System.Runtime.CompilerServices.ITuple;
public static void Format<TArgs>(IFormatProvider? provider, string format, in TArgs args, IBufferWriter<char> output)
where TArgs : struct, System.Runtime.CompilerServices.ITuple;
} |
This has all the problems I stated above about generic code. The T needs to flow everywhere and that's what Variant/Value and TypedReference solve that the ITuple solution does not |
Is this in response to my proposal? Discriminated unions are a feature for declaring types. My point is that you wouldn't build a single type to handle all possible use cases, each use case would declare a suitable type, and if that type happens to be purely unmanaged then the compiler would codegen it differently. |
I'm familiar with the DU proposal but I don't think it's suitable for the same things Variant/Value will be used for. |
I'd be interested in an example that you think couldn't be represented with discriminated unions. DUs seem to me a strict increase in expressive power. |
Here's a canonical example from logging: runtime/src/libraries/Microsoft.Extensions.Logging.Abstractions/src/LoggerMessage.cs Lines 376 to 397 in 7213840
We need to flow these generics to avoid boxing through the method, through the return value, then we need to make a generic LogValues object with the same number of generic arguments. Then when these objects get logged, we need the consumer to be able to unpack them from non-generic code, so we end up boxing everything through the
This also happens for the reflection APIs where you want to pass a variable sized Span to invoke a method. Here's an example of how reflection could use these APIs:
class MethodInfo
{
public Value InvokeFast(object instance, Span<Value> args);
}
I really want a way to round trip a
The key issue is that framework code that's shuttling these types around doesn't want to force generics everywhere and it's even more complex when you have multiple generic arguments (like a Span). Maybe what I want is associated types. |
The Value type proposed here won't give you this super power. It won't work for Span. TypedReference has this superpower, and that is why the plan we are on with reflection is based on TypedReference. |
I know it won't work with Span but the ref struct restrictions are too much for many scenarios. So I think we need TypedReference and Value (like Span and Memory). |
We have that already: TypedReference and object. This proposal is about creating an alternative storage for object-like value that is more efficient for some set of types and less efficient for the rest. You can imagine to only use it as an internal implementation detail when you need to store the data on the heap. If it is an internal implementation detail, it does not need to be a public type and the set of the more efficient types can be tailored for each use case, and it is where the discriminated unions would be useful. |
@davidfowl That many unconstrained generic parameters is a bit of a code smell. It may be that you want existential types. Lemme take some time to look at how all that is used and get back to you. |
I don't see how TypedReference solves the problem when I need the value off stack. The problem is it's not an internal contract because there's tons of public API boundaries that we need to cross to flow these types. Here's a super simple example, in SignalR, I would like this https://github.com/dotnet/aspnetcore/blob/547de595414d6ebb9ddeaa4a231815449c9f3c60/src/SignalR/common/SignalR.Common/src/Protocol/HubMethodInvocationMessage.cs#L22 to be a @agocke Looking forward to what you come up with. |
I agree that TypedReference does not work once you need to store it off stack. We have
It would be useful to have list of public APIs in the frameworks that would get overloads to use this type, and estimate the performance benefits. |
I spent some time looking at the types of places where this would be beneficial. The pattern is basically pushing the generic type as close to the place that is going to consume it as possible without exposing it in all of the layers. It's also a way to store variable-sized primitives in a single type without boxing.
|
Thoughts on estimating the performance benefits? This is shifting costs: It reduces cycles spent on GC allocations, but pays for it by spending more cycles on compressing and decompressing the bits and by increasing concept count. #28882 (comment) says "Most operations are a few to several nanoseconds". Boxing an integer costs about as much. Is it going to be a net win to replace one with the other? |
That's a good question. As for the concept count problem, I don't think it'll be widely used in a ton of public APIs, but in a couple of places where performance usually matters and generic types don't work. It also feels complementary to the typed reference based reflection APIs that we plan to add. I guess we can try some experiments with a couple of the above scenarios and the existing implementation. |
The other important place we want to add this would be in ADO.NET. There's a ton of boxing there for similar reasons and we're currently investigating a high performance alternative that wants to avoid it. See DbParameter.Value. It comes down to this: APIs that do want to squeeze out the performance and avoid GC allocations need this. They can each define their own exchange type or we can have something in the BCL provided for them. I don't know if it's a winning strategy to have each library define their own. |
For ADO.NET, there's #17446 for adding an API to write parameters without boxing. A generic |
I'm skeptical we need anything more than generic lifetimes to handle this, given that https://github.com/agocke/serde.net works without boxing value types. Ref structs can't be used because they can't be passed as generic arguments, but that's due to not tracking lifetimes through generics, which results in a safety hole at the moment if it's allowed for ref structs. |
Serde is designed to generate lean efficient serializers at build time. The ValueObject is not relevant for serializers designed with the same principles as Serde. Many .NET serializers and other popular libraries are not designed like that. They are very dynamic systems that do a lot of discovery at runtime. Trying to fit these components into the same mode as Serde as an afterthought typically does not work well. ValueObject is meant for performance optimizations of dynamic components like that. |
Would it be worthwhile to support arbitrary (unmanaged) user-defined structures in the variant without boxing if they fit within 8 bytes? Based on the prototype in the OP, it'd be a matter of extending the |
Background and Motivation
Currently there is no way to pass around a heterogeneous set of .NET value types without boxing them into objects or creating a custom wrapper struct. To facilitate low allocation exchange of value types we should provide a struct that allows passing the most common value types without boxing and still allows storing within other types (including arrays) or on the heap when needed.
ASP.NET and Azure SDK have both expressed a need for this functionality for scenarios such as logging.
The following is an evolved proposal based on feedback from various sources. The original proposal is included below. Key changes were to make this a smaller type, alignment with `object` semantics, support of any object, and more focused non-boxing support.
Proposed API
Fully working prototype
Usage Examples
Details
Goals
`object` semantics (can't box a nullable, for example)
`decimal`)
`DateTime` or most `DateTimeOffset` values (1800-2250 for local times supported)
Other benefits
Other Possible Names
Original Proposal
Currently there is no way to pass around a heterogeneous set of .NET value types without boxing them into objects or creating a custom wrapper struct. To facilitate low allocation exchange of value types we should provide a struct that allows passing the information without heap allocations. The canonical example of where this would be useful is in `String.Format`.
Related proposals and sample PRs
Goals
Non Goals
Nice to Have
General Approach
`Variant` is a struct that contains an object pointer and a "union" struct that allows stashing of arbitrary blittable (i.e. `where unmanaged`) value types that are within a specific size constraint.
Sample Usage
Surface Area
FAQ
Why "Variant"?
Why isn't `Variant` a ref struct?
`Span` of ref structs.
What about variadic argument support (`__arglist`, `ArgIterator`, etc.)?
What about `TypedReference` and `__makeref`, etc.?
`TypedReference` is a ref struct (see above). `Variant` gives us more implementation flexibility, doesn't rely on undocumented keywords, and is actually faster. (In a simple test of wrapping/unwrapping an int it is roughly 10-12% faster depending on inlining.)
Why not support anything that fits?
How about enums?
cc: @jaredpar, @vancem, @danmosemsft, @jkotas, @davidwrighton, @stephentoub