Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ContainsReferences property #4309

Closed
omariom opened this issue Jun 13, 2015 · 50 comments
Closed

Add ContainsReferences property #4309

omariom opened this issue Jun 13, 2015 · 50 comments
Labels
area-System.Runtime enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@omariom
Copy link
Contributor

omariom commented Jun 13, 2015

Currently generic List calls Array.Clear on its underlying array in the implementation of Clear, RemoveAll, RemoveRange mathods.
It has to do so because the generic argument can contain references which must be freed for GC.
But what if it is plain Int32 or any other value type that doesn't have references in it directly or indirectly?

If Type had a property that could say if the type contains references then clearing the array could be completely skipped.
I see a huge performance opportunity here - less CPU work, less memory traffic and pollution.

And not only there. If JIT considered this property value as a JIT time constant then the check itself and the branch of the generated code could be skipped as well. Other methods could benefit from that without sacrificing a nanosecond - like RemoveAt.

@omariom
Copy link
Contributor Author

omariom commented Jun 13, 2015

Dictionary.Clear could take advantage of that too. And probably some concurrent collections.

@hadibrais
Copy link
Contributor

@omariom Yes you are right actually. Array.Clear doesn't have to be called when the elements are of a value type. We can easily fix this by adding the condition !typeof(T).IsValueType and clearing the array only if it is true. This will reduce the asymptotic running time of List.Clear to constant. Also this won't be a breaking change because setting value typed elements to zero is not part of the API's spec as it only guarantees releasing references. However, the JIT may or may not optimize the check away. We'll have to see. Either way, I think we should make the change.

@omariom
Copy link
Contributor Author

omariom commented Jun 14, 2015

@hadibrais Unfortunately checking for valuetypeness alone is not enough. Value types can contain references.

@omariom
Copy link
Contributor Author

omariom commented Jun 14, 2015

The proposed ContainsReferences property should return true when the type is a reference type or a value type containing references either directly or indirectly.

@hadibrais
Copy link
Contributor

@omariom Yeah I missed that. But there's an elegant way to solve this problem without incurring any performance penalties. We can add a constructor in which the user can indicate whether to call Array.Clear or not. The default, of course, would be to call Array.Clear. Otherwise, explicitly including a reliable type check might degrade the perf of the method if the list contained reference type elements.

@hadibrais
Copy link
Contributor

ContainsReferences would be useful in this case only if it has a small constant running time.

@omariom
Copy link
Contributor Author

omariom commented Jun 14, 2015

@hadibrais
I see 2 issues with adding a new ctor:

  1. Only new code will benefit from the change.
  2. With any change to the type (making it ref from value or adding a ref field directly or indirectly) we will have to check ALL the usages of the type in Lists. Failing to change the param from false to true may lead to "memory leaks".

@omariom
Copy link
Contributor Author

omariom commented Jun 14, 2015

ContainsReferences would be useful in this case only if it has a small constant running time.

@hadibrais
MethodTable already has ContainsPointers method. If it conforms to all the requirements of ContainsReferences then the team could just expose it. And it will be very fast even without JIT support.

@hadibrais
Copy link
Contributor

I agree that the constructor will reduce maintainability but that will be the fastest implementation.

@hadibrais
Copy link
Contributor

I think ContainsReferences can be much more useful if it returned those fields that are of reference types rather than just saying whether the type contains reference fields of not.

@hadibrais
Copy link
Contributor

As far as I know all elements of Array-based arrays are properly aligned, therefore the current implementation of Array.Clear is not optimal. It can be made smaller and faster.

@omariom
Copy link
Contributor Author

omariom commented Jun 14, 2015

@hadibrais Do you mean that for the case the array element doesn't contains references it can be implemented more efficiently?

@hadibrais
Copy link
Contributor

No I mean when it contains references, it can be made more efficient since references are aligned on address boundaries so they can be zeroed out in GC-safe way without any extra bytes left unzeroed. However, this is only true when the prefer-32bit option in Visual Studio is not checked.

@billetdoux
Copy link

Every little helps. Good one.

@omariom
Copy link
Contributor Author

omariom commented Jun 15, 2015

Found another place where it could help with perf and greener environment.

In the implementation of ConcurrentQueue+Segment.TryRemove if typeof(T).ContainsReferences == false then this line:

 _array[lowLocal] = default(T); //release the reference to the object. 

can be skipped. And as I suspect all the voodoo around _numSnapshotTakers as well.

@stephentoub Can you pls check it? Am I right? Is it worth the efforts?

@stephentoub
Copy link
Member

I personally think a ContainsReferences feature would be a nice addition. Whether or not the JIT could treat it as an intrinsic, with the JIT's ability to treat readonly statics of scalars as constants (https://github.com/dotnet/coreclr/issues/1079), a library like System.Collections could have a cached Boolean value based on it, e.g.

internal static class TypeCache<T>
{
    internal static readonly bool ContainsReferences = typeof(T).ContainsReferences;
}

The JIT should then still be able to remove conditional branches using this TypeCache<T>.ContainsReferences when it's known to be false.

You could even implement something like this yourself, e.g. it's possible there are some corner cases I've missed, but here's a quick example:

internal static class TypeCache<T>
{
    internal static readonly bool ContainsReferences = GetContainsReferences(typeof(T));

    private static bool GetContainsReferences(Type t)
    {
        if (!t.IsValueType)
            return true;

        foreach (FieldInfo fi in t.GetFields(BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance))
        {
            if (fi.FieldType != t && GetContainsReferences(fi.FieldType))
                return true;
        }

        return false;
    }
}

and when I look at the disassembly for a few different call sites, I see the JIT doing exactly what you hoped it would (this is using VS2015 RC targeting x64):

        Console.WriteLine(TypeCache<int>.ContainsReferences);
00007FFDABEF542E  xor         ecx,ecx  
00007FFDABEF5430  call        00007FFE0A94A590  
        Console.WriteLine(TypeCache<DateTime>.ContainsReferences);
00007FFDABEF5435  xor         ecx,ecx  
00007FFDABEF5437  call        00007FFE0A94A590  
        Console.WriteLine(TypeCache<List<int>.Enumerator>.ContainsReferences);
00007FFDABEF543C  mov         ecx,1  
00007FFDABEF5441  call        00007FFE0A94A590  

Can you pls check it?

Yes, that line should be removable if T is known to not contain references.

And as I suspect all the voodoo around _numSnapshotTakers as well.

(I think this is what you're saying, but just to be sure...) All of the code related to _numSnapshotTakers would still need to remain; it's just that you could avoid executing it if you knew that clearing wasn't necessary, e.g. instead of

 if (_source._numSnapshotTakers <= 0)
{
    _array[lowLocal] = default(T); //release the reference to the object.
}

you'd have:

 if (TypeCache<T>.ContainsReferences && _source._numSnapshotTakers <= 0)
{
    _array[lowLocal] = default(T); //release the reference to the object.
}

That could actually be more beneficial where _numSnapshotTakers is modified rather than where it's used (as it is here to determine whether to clear). The _numSnapshotTakers mechanism exists because we want to be able to clear, but if code has taken a snapshot of the collection (e.g. to enumerate it while other code is changing it concurrently), we don't actually want to change any of the existing elements by zeroing them out. Noting whether a snapshot is currently in progress requires some synchronization, e.g. the interlocked operation at https://github.com/dotnet/corefx/blob/master/src/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentQueue.cs#L272, but if we knew that clearing was never necessary, we could avoid such synchronization.

Is it worth the efforts?

I've no doubt in some scenarios it could be a measurable win. Whether it's "worth the efforts" to do this in the BCL / runtime would I think require more effort to effectively answer.

@omariom
Copy link
Contributor Author

omariom commented Jun 15, 2015

it's just that you could avoid executing it if you knew that clearing wasn't necessary

Yes, that's what I meant.

@omariom
Copy link
Contributor Author

omariom commented Jun 16, 2015

You could even implement something like this yourself, e.g. it's possible there are some corner cases I've missed..

It reports true for pointer fields.
What if, in the future, interior pointer fields are added to CLR?
That's why I prefer CLR help me with it.

@stephentoub
Copy link
Member

It reports true for pointer fields. That's why I prefer CLR help me with it.

So add if (t.IsPointer) return false; to the beginning of that GetContainsReferences method...?

@omariom
Copy link
Contributor Author

omariom commented Jun 16, 2015

I did this:

if (!t.IsValueType && !t.IsPointer)
        return true;

@stephentoub
Copy link
Member

I did this

And that doesn't work for you or it does? This will end up doing some unnecessary work in the case of a pointer, as it'll still try to get the fields of the pointer (whereas in my suggestion it just exits immediately), but I'd expect it to still work: the pointer shouldn't have any declared fields, so it'll end up falling through to the return false; at the end.

@omariom
Copy link
Contributor Author

omariom commented Jun 16, 2015

Aah.. I missed your point. If put as you suggested in the beginning then yes, it works and avoids checking for fields.

@stephentoub
Copy link
Member

Ok, good. If you wanted to experiment with that and use it to, for example, modify some of the collections in corefx locally to get some numbers about the benefits it could have, that would be useful information in making a decision about whether a feature like this is something that should be exposed from the BCL.

cc: @KrzysztofCwalina

@omariom omariom changed the title Type.ContainsReferences property Add ContainsReferences property to Type class Jun 21, 2015
@omariom
Copy link
Contributor Author

omariom commented Aug 1, 2016

Has the future come for this issue?

It could be the fist step in adding blittable constraint to the runtime and the language.

@ghost
Copy link

ghost commented Nov 9, 2016

t.IsPointer tests for unmanaged pointers. Why is that something that should be forbidden? (Was this supposed to be t.IsByRef?)

@nietras
Copy link
Contributor

nietras commented Nov 23, 2016

In https://github.com/dotnet/corefx/issues/13427 we discussed this too, and @jkotas suggested we add ContainsReference<T> to RuntimeHelpers. Whether this should be called IsReferenceFree<T> or not is an open question too. So the proposal for an API for this could be:

public static class RuntimeHelpers
{
    public static bool ContainsReference<T>();
}

OR

public static class RuntimeHelpers
{
    public static bool IsReferenceFree<T>(); // aka !ContainsReference
}

@masonwheeler
Copy link
Contributor

@nietras I would prefer ContainsReferences. As your code snippet explicitly points out, IsReferenceFree is a negative, which means you can end up with double negatives. It's cleaner to just use a positive notation.

@omariom
Copy link
Contributor Author

omariom commented Nov 23, 2016

@nietras
In the cases we have now ContainsReferences is more natural:

if (Runtimehelpers.ContainsReferences<T>())
   throw new InvalidOperationException("Types having references cannot be used with Unsafecast.")

// or

if (Runtimehelpers.ContainsReferences<T>())
{
    // Free for GC
    Array.Clear(_items, 0, _size);
}

@jkotas jkotas changed the title Add ContainsReferences property to Type class Add ContainsReferences property Nov 23, 2016
@nietras
Copy link
Contributor

nietras commented Nov 23, 2016

IsReferenceFree is a negative, which means you can end up with double negatives. It's cleaner to just use a positive notation.

This depends on your perspective, but I agree that in Span<T> https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Span.cs#L114 this inverted check is weird and less readable. When the method could easily have been called ContainsReference here.

However, in my use case I would write something like:

if (Runtimehelpers.IsReferenceFree<T>())
{
    Unsafe.CopyBlock(ref _items[0], size);
}
else
{
    // for loop
}

But of course this can be moved around. I would also hope you cache the ContainsReferences call like in https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/SpanHelpers.cs#L50 so this works well on other runtimes too.

Next question is whether to use ContainsReference<T> singular or ContainsReferences<T> plural? I've seen both.

@jkotas
Copy link
Member

jkotas commented Nov 23, 2016

This method is really about GC pointers. I am wondering whether the name should reflect it. What about ContainsGCPointers<T>?

@omariom
Copy link
Contributor Author

omariom commented Nov 23, 2016

We, high level C# devs, call them references :)
GcPointers is a too low level and runtime name.

@nietras
Copy link
Contributor

nietras commented Nov 23, 2016

GC pointers

@jkotas if you don't mind, what exactly is the difference between "reference" and "GC pointers" in your mind?

I understand that from a native view reference is something entirely different, but I thought ref is used to differentiate on .NET. Anywhere the lingo is defined? Have actually been thinking about starting a discussion on how we could coin a term that would exactly convey that something has no GC pointers, e.g. "reference free", "unmanaged" or ?? and all the terms that can not be used "primitive", "ref free", "blittable", yet are often used by some, which means conversations around this can be... imprecise at best.

@ghost
Copy link

ghost commented Nov 23, 2016

Just a thought... in ECMA lingo, the term we're looking for is a union of "reference" (pointer to an object-as-a-whole) and "managed pointer" (the types denoted by the "ref" keyword in C#), or more practically, things that implicitly contain managed pointers (e.g. Span.)

But perhaps instead of the name enumerating the kind of types you can have inside, the name can talk about the motivating restriction: the type must not contain bits that are asynchronously rewritten by the garbage collector. Something like "ContainsGcManagedContent()" or something. That would make it clearer why it's used when it's used.

@benaadams
Copy link
Member

GC managed items are non-value types?

ContainsClassReference<T> or ContainsObjectReference<T> or ContainsHeapReference<T>

@omariom
Copy link
Contributor Author

omariom commented Nov 23, 2016

\cc @KrzysztofCwalina

@svick
Copy link
Contributor

svick commented Nov 23, 2016

@benaadams I think two of your proposals are not accurate names:

  • ContainsClassReference<T>: interfaces are reference types that are not classes (and the value can be a boxed value type, so that doesn't have to be a class either)
  • ContainsHeapReference<T>: heap is an implementation detail (and there actually seems to be some work being done to allocate reference types on the stack: Work towards objects stack allocation coreclr#6653)

@jkotas
Copy link
Member

jkotas commented Nov 23, 2016

what exactly is the difference between "reference" and "GC pointers" in your mind?

No real difference - they are often used interchangeably. I was just wondering about the best name to use for this API.

For reference, here are the existing names used internally in the runtime implementations for similar concept (the runtime concept is slightly different - it always describes the fields even for non-valuetypes, but I assume that what is getting proposed here should always return true for non-valuetypes):

@benaadams
Copy link
Member

@svick narrowing down my suggestions then ContainsObjectReference<T> 😄

@ghost
Copy link

ghost commented Nov 23, 2016

Unsubscribing from thread as my year-end vacation is about to start. See 'yall in January.

@nietras
Copy link
Contributor

nietras commented Nov 26, 2016

a union of "reference" (pointer to an object-as-a-whole) and "managed pointer" (the types denoted by the "ref" keyword in C#)

Aren't managed pointers disallowed as fields? Hence, these are not relevant to this functionality? At least that is how I read the ECMA, see I.8.2.1.1 "Managed pointer types are only allowed for local variable (§I.8.6.1.3) and parameter signatures (§I.8.6.1.4)". And this is also the reason for the design of Span<T>.

ContainsObjectReference<T>

Object is superfluous in my view. And perhaps even wrong since this applies to interface types etc.

In ECMA a reference is clearly defined to be a reference to an object as a whole. And since only reference types can be fields (besides value types) then the check is for whether there are any "reference type" in a types fields seen as a whole. An example of the use of "reference" in an API would simply be object.ReferenceEquals(object,object).

And since we a nitpicking here, neither IsReferenceFree nor ContainsReference names are entirely correct. They don't just check if a type contains any reference types they also check if the type T itself is a reference type. So given this I would think we either need to split this in two or name it correctly as something like:

bool IsReferenceOrContainsReferences<T>() //sans last "s" if singular preferred
// OR the negative of this
bool IsValueTypeAndReferenceFree<T>() 

Not ideal since its a bit long, but at least IsReferenceOrContainsReferences<T>() is accurate and still reads well:

if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
     ThrowHelper.ThrowArgumentException_InvalidTypeWithPointersNotSupported(typeof(T));

compared to the current:

if (!SpanHelpers.IsReferenceFree<T>())
     ThrowHelper.ThrowArgumentException_InvalidTypeWithPointersNotSupported(typeof(T));

and ThrowArgumentException_InvalidTypeWithPointersNotSupported should be renamed to ThrowArgumentException_InvalidTypeWithReferencesNotSupported.

If we split the check in two then everytime this would be used one would have to write something like:

if (!typeof(T).IsValueType && RuntimeHelpers.ContainsReference<T>())
{
   ...
}

which would be pretty terrible. But calling it just e.g. ContainsReference is incomplete in my view.

Other options would be to name it according to asking if the type is "referring" to any references which would include the type itself. Can't think of a good name though. Below a list of some other suggestions including my current preferred (most of these are poor, so suggestions are welcomed, but I think we need to keep reference as part of the name instead of pointer or similar):

bool IsReferenceOrContainsReferences<T>()
bool IsOrHasReference<T>()
bool IsReferenceOrHasReferences<T>()
bool RefersToAnyReferences<T>()
bool Refers<T>()
bool IsReferent<T>()
...

Sans last "s" if singular preferred instead of "references".

On a side note, I actually think there is a problem with the method name DangerousGetPinnableReference this does not return a reference but a ref, which are different things, right?

@omariom
Copy link
Contributor Author

omariom commented Nov 26, 2016

DangerousGetPinnableManagedPointer?

@nietras
Copy link
Contributor

nietras commented Nov 26, 2016

Yes, or DangerousGetPinnableByRef. At least from the ECMA it says:

A managed pointer (§I.12.1.1.2), or byref (§I.8.6.1.3, §I.12.4.1.5.2), can point to a local variable, parameter, field of a compound type, or element of an array.

However, not sure this means they are necessarily interchangeable.

DangeruousGetPinnableRef probably doesn't work since this is more C# speak than anything, I think F# uses ref to make a Ref<T> reference type, so that is not the same. VB.NET uses ByRef I think.

@omariom
Copy link
Contributor Author

omariom commented Nov 26, 2016

At least we don't have to invalidate a cache :)

@jkotas
Copy link
Member

jkotas commented Nov 26, 2016

In ECMA a reference is clearly defined to be a reference to an object as a whole

There is definition of "reference type", "object reference", "typed reference" or "member reference". I do not think there is a clear definition of "reference" alone.

Similar for pointer, there is definition of "managed pointer", "pointer type" or "this pointer"; but no "pointer" alone.

Aren't managed pointers disallowed as fields?

Yes, they are disallowed. It may be interesting to consider what this method should return and be called if they were allowed in theory.

at least IsReferenceOrContainsReferences<T>() is accurate

I like how this makes it more accurate.

method name DangerousGetPinnableReference this does not return a reference

Yet another overload of reference, not that different from TypedReference. However, the name of this method is likely going to be revisited anyway.

@nietras
Copy link
Contributor

nietras commented Nov 26, 2016

do not think there is a clear definition of "reference" alone.

Ah yes, I should have written reference type in fact the method in question is checking for that i.e.

IsReferenceTypeOrContainsReferenceTypes<T>()

but that seems unnecessary given the input is type T, if it had been on Type like IsPrimitive (not IsPrimitiveType) then of course it would be without Type in the name.

And yes, overall "reference", "pointer" all have existing meanings both in managed and unmanaged cases, but since a ref (or managed pointer, which actually might just "reference" native memory) is different from a "managed reference", "Reference" seems a bit misleading in this case.

By the way, how does the runtime handle the case when DangerousGetPinnnableReference returns a ref to native memory? I.e. how does fixed or pinning work in that case?

@jakobbotsch
Copy link
Member

By the way, how does the runtime handle the case when DangerousGetPinnnableReference returns a ref to native memory? I.e. how does fixed or pinning work in that case?

Managed pointers are allowed to point to unmanaged memory (see II.14.4.2 in ECMA-335). It is also possible with C# today:

int* ptr = (int*)Marshal.AllocHGlobal(4);
int.TryParse("", out *ptr);

fixed just adds a local managed pointer marked as pinned, so the GC should ignore these when they do not point to managed memory.

@nietras
Copy link
Contributor

nietras commented Nov 27, 2016

so the GC should ignore these when they do not point to managed memory

Yes, but how does the GC know that a given pointer can be ignored? That is, that the pointer does not point to managed memory? Is it the same as for ref in that it checks each ref against each managed memory segment or similar?

@jkotas
Copy link
Member

jkotas commented Nov 27, 2016

ref is the thing being pinned, so it is the same thing.

BTW: This logic is in GCHeap::Promote:

  1. Ignore null pointers
  2. If it is ref (GC_CALL_INTERIOR), find the object that it belongs to. If it does not belong to any object, we are done.
  3. If is pinned (GC_CALL_PINNED), pin the object for current GC.
  4. Mark the object graph as alive

@jkotas
Copy link
Member

jkotas commented Nov 28, 2016

Ok, I have created issue in the corefx repo to get this launched into the API review process https://github.com/dotnet/corefx/issues/14047

@karelz
Copy link
Member

karelz commented Mar 17, 2017

Seems to be dupe of https://github.com/dotnet/corefx/issues/14047 which has been fixed.
Please reopen if it is invalid understanding.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests