Add ContainsReferences property #4309
Dictionary.Clear could take advantage of that too. And probably some concurrent collections. |
@omariom Yes, you are right. Array.Clear doesn't have to be called when the elements are of a value type. We can easily fix this by checking !typeof(T).IsValueType and clearing the array only if it is true. This would reduce the running time of List.Clear from linear to constant. It also wouldn't be a breaking change, because zeroing value-typed elements is not part of the API's spec; the spec only guarantees releasing references. However, the JIT may or may not optimize the check away; we'll have to see. Either way, I think we should make the change. |
@hadibrais Unfortunately, checking whether the type is a value type is not enough. Value types can contain references. |
The proposed ContainsReferences property should return true when the type is a reference type or a value type containing references either directly or indirectly. |
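A small sketch of why `IsValueType` alone is not enough. It uses `RuntimeHelpers.IsReferenceOrContainsReferences<T>()`, a check that later .NET Core versions (2.0 and later) expose in `System.Runtime.CompilerServices`; it did not exist when this thread started. `NameAndId` and `ReferenceCheckDemo` are hypothetical names for illustration only.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical value type that holds a reference: its slots still pin a string for the GC.
public struct NameAndId
{
    public int Id;
    public string Name;
}

public static class ReferenceCheckDemo
{
    // The check proposed earlier: true for any struct, including NameAndId.
    public static bool IsValueType<T>() => typeof(T).IsValueType;

    // True for reference types and for value types that hold references directly or indirectly.
    public static bool NeedsClearing<T>() => RuntimeHelpers.IsReferenceOrContainsReferences<T>();
}
```

`NameAndId` passes the `IsValueType` test, yet clearing its slots is still needed to release the `Name` strings.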
@omariom Yeah, I missed that. But there's an elegant way to solve this problem without incurring any performance penalty: we can add a constructor in which the user can indicate whether to call Array.Clear or not. The default, of course, would be to call Array.Clear. Otherwise, explicitly including a reliable type check might degrade the perf of the method when the list contains reference-type elements. |
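The constructor-flag idea can be sketched roughly as follows. `OptOutList<T>` and its `clearOnRemove` parameter are hypothetical and not part of `List<T>`; only the members needed to show the effect on `Clear` are included.

```csharp
using System;

// Hypothetical list illustrating an opt-out constructor flag for clearing.
public sealed class OptOutList<T>
{
    private T[] _items;
    private int _size;
    private readonly bool _clearOnRemove;

    // clearOnRemove defaults to true, matching List<T>'s existing behavior.
    public OptOutList(int capacity = 4, bool clearOnRemove = true)
    {
        _items = new T[Math.Max(capacity, 1)];
        _clearOnRemove = clearOnRemove;
    }

    public int Count => _size;

    public void Add(T item)
    {
        if (_size == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);
        _items[_size++] = item;
    }

    public void Clear()
    {
        // With the flag off, Clear is O(1): stale slots stay in place
        // and are simply overwritten by later Adds.
        if (_clearOnRemove && _size > 0)
            Array.Clear(_items, 0, _size);
        _size = 0;
    }

    // Exposes a backing slot so the effect of Clear can be observed.
    public T RawSlot(int index) => _items[index];
}
```

The trade-off is that correctness now depends on the caller: passing `false` for a `T` that does hold references keeps the stale objects reachable until their slots are overwritten.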
ContainsReferences would be useful in this case only if it has a small constant running time. |
@hadibrais
|
@hadibrais |
I agree that the constructor will reduce maintainability but that will be the fastest implementation. |
I think ContainsReferences could be much more useful if it returned the fields that are of reference types rather than just saying whether the type contains reference fields or not. |
As far as I know, all elements of Array-based arrays are properly aligned, so the current implementation of Array.Clear is not optimal; it can be made smaller and faster. |
@hadibrais Do you mean that when the array element doesn't contain references it can be implemented more efficiently? |
No, I mean when it does contain references: it can be made more efficient because references are aligned on address boundaries, so they can be zeroed out in a GC-safe way without leaving any extra bytes unzeroed. However, this is only true when the Prefer 32-bit option in Visual Studio is not checked. |
Every little helps. Good one. |
Found another place where it could help with perf and a greener environment. In the implementation of ConcurrentQueue+Segment.TryRemove, if typeof(T).ContainsReferences == false, then this line can be skipped:

    _array[lowLocal] = default(T); //release the reference to the object.

And, I suspect, all the voodoo around _numSnapshotTakers as well. @stephentoub Can you please check it? Am I right? Is it worth the efforts? |
I personally think a ContainsReferences feature would be a nice addition. Whether or not the JIT could treat it as an intrinsic, with the JIT's ability to treat readonly statics of scalars as constants (https://github.com/dotnet/coreclr/issues/1079), a library like System.Collections could have a cached Boolean value based on it, e.g.

    internal static class TypeCache<T>
    {
        // Type.ContainsReferences is the property proposed in this issue.
        internal static readonly bool ContainsReferences = typeof(T).ContainsReferences;
    }

The JIT should then still be able to remove conditional branches that use this value. You could even implement something like this yourself today. It's possible there are some corner cases I've missed, but here's a quick example:

    using System;
    using System.Reflection;

    internal static class TypeCache<T>
    {
        internal static readonly bool ContainsReferences = GetContainsReferences(typeof(T));

        private static bool GetContainsReferences(Type t)
        {
            // Anything that is not a value type (classes, arrays, strings, but also pointer types) is reported as containing references.
            if (!t.IsValueType)
                return true;

            // A value type contains references if any of its instance fields does.
            foreach (FieldInfo fi in t.GetFields(BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance))
            {
                // Primitives such as Int32 declare a field of their own type; skip it to avoid infinite recursion.
                if (fi.FieldType != t && GetContainsReferences(fi.FieldType))
                    return true;
            }
            return false;
        }
    }

When I look at the disassembly for a few different call sites, I see the JIT doing exactly what you hoped it would (this is using VS2015 RC targeting x64).
Yes, that line should be removable if T is known to not contain references.
(I think this is what you're saying, but just to be sure...) All of the code related to _numSnapshotTakers would still need to remain; it's just that you could avoid executing it if you knew that clearing wasn't necessary. For example, instead of

    if (_source._numSnapshotTakers <= 0)
    {
        _array[lowLocal] = default(T); //release the reference to the object.
    }

you'd have:

    if (TypeCache<T>.ContainsReferences && _source._numSnapshotTakers <= 0)
    {
        _array[lowLocal] = default(T); //release the reference to the object.
    }

That could actually be more beneficial where _numSnapshotTakers is modified rather than where it's used (as it is here, to determine whether to clear). The _numSnapshotTakers mechanism exists because we want to be able to clear, but if code has taken a snapshot of the collection (e.g. to enumerate it while other code is changing it concurrently), we don't actually want to change any of the existing elements by zeroing them out. Noting whether a snapshot is currently in progress requires some synchronization, e.g. the interlocked operation at https://github.com/dotnet/corefx/blob/master/src/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentQueue.cs#L272, but if we knew that clearing was never necessary, we could avoid such synchronization.
I've no doubt in some scenarios it could be a measurable win. Whether it's "worth the efforts" to do this in the BCL / runtime would I think require more effort to effectively answer. |
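To make the suggestion concrete, here is a heavily simplified sketch in which the "contains references" check guards both the clearing and the snapshot bookkeeping. `SketchSegment<T>` is hypothetical: it is single-threaded, has no wrap-around, and is not ConcurrentQueue's actual segment. It uses `RuntimeHelpers.IsReferenceOrContainsReferences<T>()` from later .NET Core versions in place of the proposed `TypeCache<T>.ContainsReferences`.

```csharp
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical, simplified segment; only illustrates where the check applies.
public sealed class SketchSegment<T>
{
    private readonly T[] _array;
    private int _low;   // next slot to remove
    private int _high;  // next slot to fill
    private int _numSnapshotTakers;

    public SketchSegment(int capacity) => _array = new T[capacity];

    public bool TryAppend(T value)
    {
        if (_high == _array.Length) return false;
        _array[_high++] = value;
        return true;
    }

    public bool TryRemove(out T result)
    {
        if (_low == _high) { result = default(T); return false; }
        int lowLocal = _low++;
        result = _array[lowLocal];
        // For T without references the first operand is typically folded to false
        // by the JIT, so the volatile read and the store can be dropped entirely.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>() &&
            Volatile.Read(ref _numSnapshotTakers) <= 0)
        {
            _array[lowLocal] = default(T); // release the reference to the object
        }
        return true;
    }

    // Snapshot bookkeeping only matters when removal would otherwise clear slots,
    // so the interlocked operations are skipped for reference-free T.
    public void BeginSnapshot()
    {
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            Interlocked.Increment(ref _numSnapshotTakers);
    }

    public void EndSnapshot()
    {
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            Interlocked.Decrement(ref _numSnapshotTakers);
    }

    // Test hook: read a backing slot directly.
    public T PeekSlot(int index) => _array[index];
}
```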
Yes, that's what I meant. |
It reports true for pointer fields. |
So add |
I did this:

    if (!t.IsValueType && !t.IsPointer)
        return true;

|
And that doesn't work for you or it does? This will end up doing some unnecessary work in the case of a pointer, as it'll still try to get the fields of the pointer (whereas in my suggestion it just exits immediately), but I'd expect it to still work: the pointer shouldn't have any declared fields, so it'll end up falling through to the |
Ah, I missed your point. If it's put at the beginning, as you suggested, then yes, it works and avoids checking for fields. |
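For reference, here is one way the reflection-based check might look with a pointer test placed first, so unmanaged pointers return before any field lookup. This is a sketch, not necessarily the exact code suggested in the thread; `ReferenceScan` and `ReferenceCache<T>` are hypothetical names.

```csharp
using System;
using System.Reflection;

// Hypothetical reflection-based check with the pointer test first.
public static class ReferenceScan
{
    public static bool ContainsReferences(Type t)
    {
        if (t.IsPointer)
            return false;   // unmanaged pointers (int*, void*) are not tracked by the GC
        if (!t.IsValueType)
            return true;    // classes, interfaces, arrays, strings

        foreach (FieldInfo fi in t.GetFields(BindingFlags.DeclaredOnly | BindingFlags.Public |
                                             BindingFlags.NonPublic | BindingFlags.Instance))
        {
            // Primitives declare a field of their own type; skip it to avoid infinite recursion.
            if (fi.FieldType != t && ContainsReferences(fi.FieldType))
                return true;
        }
        return false;
    }
}

// Computed once per T; a static readonly bool the JIT may treat as a constant.
public static class ReferenceCache<T>
{
    public static readonly bool ContainsReferences = ReferenceScan.ContainsReferences(typeof(T));
}
```

Without the early `IsPointer` check, a struct such as `IntPtr`, which wraps a `void*` field, would be reported as containing references.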
Ok, good. If you wanted to experiment with that and use it to, for example, modify some of the collections in corefx locally to get some numbers about the benefits it could have, that would be useful information in making a decision about whether a feature like this is something that should be exposed from the BCL. |
Has the future come for this issue? It could be the first step in adding |
t.IsPointer tests for unmanaged pointers. Why is that something that should be forbidden? (Was this supposed to be t.IsByRef?) |
In https://github.com/dotnet/corefx/issues/13427 we discussed this too, and @jkotas suggested we add

    public static class RuntimeHelpers
    {
        public static bool ContainsReference<T>();
    }

or

    public static class RuntimeHelpers
    {
        public static bool IsReferenceFree<T>(); // aka !ContainsReference
    }

|
@nietras I would prefer |
@nietras

    if (RuntimeHelpers.ContainsReferences<T>())
        throw new InvalidOperationException("Types having references cannot be used with Unsafecast.");
    // or
    if (RuntimeHelpers.ContainsReferences<T>())
    {
        // Free for GC
        Array.Clear(_items, 0, _size);
    }

|
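The first pattern (refuse types with references before reinterpreting memory) can be sketched with APIs from later .NET Core versions. `ByteView.TryCopyToBytes` is a hypothetical helper; `MemoryMarshal.AsBytes` itself throws `ArgumentException` when `T` contains references, so checking `RuntimeHelpers.IsReferenceOrContainsReferences<T>()` up front lets callers take a fallback path instead of catching.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class ByteView
{
    // Hypothetical helper: copy the raw bytes of items only when T is reference-free.
    public static bool TryCopyToBytes<T>(T[] items, out byte[] bytes) where T : struct
    {
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            // The raw bits of GC references are meaningless (and unsafe) outside the GC.
            bytes = Array.Empty<byte>();
            return false;
        }
        bytes = MemoryMarshal.AsBytes(items.AsSpan()).ToArray();
        return true;
    }
}
```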
This depends on your perspective, but I agree that in However, in my use case I would write something like:

    if (RuntimeHelpers.IsReferenceFree<T>())
    {
        // sketch: the actual Unsafe.CopyBlock overloads take a destination, a source and a byte count
        Unsafe.CopyBlock(ref _items[0], size);
    }
    else
    {
        // for loop
    }

But of course this can be moved around. I would also hope you cache the Next question is whether to use |
This method is really about GC pointers. I am wondering whether the name should reflect it. What about |
We, high level C# devs, call them references :) |
@jkotas if you don't mind, what exactly is the difference between "reference" and "GC pointers" in your mind? I understand that from a native point of view a reference is something entirely different, but I thought |
Just a thought... in ECMA lingo, the term we're looking for is a union of "reference" (pointer to an object as a whole) and "managed pointer" (the types denoted by the "ref" keyword in C#), or more practically, things that implicitly contain managed pointers (e.g. Span). But perhaps, instead of the name enumerating the kinds of types you can have inside, the name could describe the motivating restriction: the type must not contain bits that are asynchronously rewritten by the garbage collector. Something like "ContainsGcManagedContent()". That would make it clearer why it's used when it's used. |
GC managed items are non-value types?
|
@benaadams I think two of your proposals are not accurate names:
|
@svick narrowing down my suggestions then |
Unsubscribing from the thread as my year-end vacation is about to start. See y'all in January. |
Aren't managed pointers disallowed as fields? So they are not relevant to this functionality? At least that is how I read ECMA-335; see I.8.2.1.1: "Managed pointer types are only allowed for local variable (§I.8.6.1.3) and parameter signatures (§I.8.6.1.4)". And this is also the reason for the design of
In ECMA a reference is clearly defined to be a reference to an object as a whole. And since only reference types can be fields (besides value types), the check is for whether there are any reference types among a type's fields, seen as a whole. An example of the use of "reference" in an API would simply be And since we are nitpicking here, neither

    bool IsReferenceOrContainsReferences<T>() // sans last "s" if singular preferred
    // OR the negative of this
    bool IsValueTypeAndReferenceFree<T>()

Not ideal since it's a bit long, but at least

    if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        ThrowHelper.ThrowArgumentException_InvalidTypeWithPointersNotSupported(typeof(T));

compared to the current:

    if (!SpanHelpers.IsReferenceFree<T>())
        ThrowHelper.ThrowArgumentException_InvalidTypeWithPointersNotSupported(typeof(T));

and If we split the check in two, then every time this is used one would have to write something like:

    if (!typeof(T).IsValueType || RuntimeHelpers.ContainsReference<T>())
    {
        ...
    }

which would be pretty terrible. But calling it just e.g. Other options would be to name it according to whether the type is "referring" to any references, which would include the type itself. I can't think of a good name, though. Below is a list of some other suggestions, including my current preferred (most of these are poor, so suggestions are welcome, but I think we need to keep

    bool IsReferenceOrContainsReferences<T>()
    bool IsOrHasReference<T>()
    bool IsReferenceOrHasReferences<T>()
    bool RefersToAnyReferences<T>()
    bool Refers<T>()
    bool IsReferent<T>()
    ...

Sans last "s" if singular preferred instead of "references". On a side note, I actually think there is a problem with the method name |
|
Yes, or
However, I'm not sure this means they are necessarily interchangeable.
|
At least we don't have to invalidate a cache :) |
There is definition of "reference type", "object reference", "typed reference" or "member reference". I do not think there is a clear definition of "reference" alone. Similar for pointer, there is definition of "managed pointer", "pointer type" or "this pointer"; but no "pointer" alone.
Yes, they are disallowed. It may be interesting to consider what this method should return and be called if they were allowed in theory.
I like how this makes it more accurate.
Yet another overload of reference, not that different from TypedReference. However, the name of this method is likely going to be revisited anyway. |
Ah yes, I should have written "reference type"; in fact, the method in question is checking for that, i.e. IsReferenceTypeOrContainsReferenceTypes<T>(), but that seems unnecessary given the input is type And yes, overall, "reference" and "pointer" both have existing meanings in managed and unmanaged contexts, but since a By the way, how does the runtime handle the case when |
Managed pointers are allowed to point to unmanaged memory (see II.14.4.2 in ECMA-335). It is also possible with C# today:
|
Yes, but how does the GC know that a given pointer can be ignored? That is, that the pointer does not point to managed memory? Is it the same as for |
BTW: This logic is in
|
OK, I have created an issue in the corefx repo to get this into the API review process: https://github.com/dotnet/corefx/issues/14047 |
Seems to be dupe of https://github.com/dotnet/corefx/issues/14047 which has been fixed. |
Currently, the generic List calls Array.Clear on its underlying array in its implementations of the Clear, RemoveAll and RemoveRange methods.
It has to do so because the generic argument can contain references which must be freed for GC.
But what if it is plain Int32 or any other value type that doesn't have references in it directly or indirectly?
If Type had a property that could say if the type contains references then clearing the array could be completely skipped.
I see a huge performance opportunity here - less CPU work, less memory traffic and pollution.
And not only there. If the JIT treated this property's value as a JIT-time constant, then the check itself and the untaken branch of the generated code could be eliminated as well. Other methods, like RemoveAt, could then benefit from it without sacrificing a nanosecond.
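A sketch of the proposal applied to a trimmed-down list. `LeanList<T>` is hypothetical and is not `List<T>`'s source; it stands in `RuntimeHelpers.IsReferenceOrContainsReferences<T>()`, available in later .NET Core versions, for the proposed `ContainsReferences` property. For reference-free `T`, `Clear` only resets the count and `RemoveAt` skips the store into the vacated slot.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical list showing clearing skipped automatically for reference-free T.
public sealed class LeanList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public int Count => _size;

    public T this[int index] =>
        (uint)index < (uint)_size ? _items[index] : throw new ArgumentOutOfRangeException(nameof(index));

    public void Add(T item)
    {
        if (_size == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);
        _items[_size++] = item;
    }

    public void RemoveAt(int index)
    {
        if ((uint)index >= (uint)_size)
            throw new ArgumentOutOfRangeException(nameof(index));
        _size--;
        if (index < _size)
            Array.Copy(_items, index + 1, _items, index, _size - index);
        // Only a slot that can keep an object alive needs to be reset.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            _items[_size] = default(T);
    }

    public void Clear()
    {
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            Array.Clear(_items, 0, _size); // O(n): release references for the GC
        _size = 0;                         // O(1) when T has no references
    }

    // Test hook: read a backing slot directly, ignoring Count.
    public T RawSlot(int index) => _items[index];
}
```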