Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ArrayBuilder<T> #37852

Closed
carlreinke opened this issue Jun 13, 2020 · 10 comments
Closed

Add ArrayBuilder<T> #37852

carlreinke opened this issue Jun 13, 2020 · 10 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime
Milestone

Comments

@carlreinke
Copy link
Contributor

carlreinke commented Jun 13, 2020

Background and Motivation

Today people use List<T> as an array builder, but it isn't a very good one. For .NET 5, there is an API (#27061) being added to expose the innards of List<T> in order to make it a slightly better array builder. It would be better to have an actual array builder!

Proposed API

#nullable enable

namespace System.Collections.Generic
{
    public sealed class ArrayBuilder<T> : IList<T>, IReadOnlyList<T>
    {
        public ArrayBuilder() => throw null;
        public ArrayBuilder(int capacity) => throw null;

        public T this[int index] { get => throw null; set { } }

        public int Capacity { get; set; }
        public int Count { get; set; }
        bool ICollection<T>.IsReadOnly => throw null;

        public void Add(T item) => throw null;
        public void AddRange(IEnumerable<T> items) => throw null;
        public void Insert(int index, T item) => throw null;
        public bool Remove(T element) => throw null;
        public void RemoveAt(int index) => throw null;
        public void Clear() => throw null;

        public void Reverse() => throw null;
        public void Sort() => throw null;
        public void Sort(IComparer<T> comparer) => throw null;
        public void Sort(Comparison<T> comparison) => throw null;

        public bool Contains(T item) => throw null;
        public int IndexOf(T item) => throw null;
        public int IndexOf(T item, int startIndex) => throw null;
        public int IndexOf(T item, int startIndex, int count) => throw null;
        public int LastIndexOf(T item) => throw null;
        public int LastIndexOf(T item, int startIndex) => throw null;
        public int LastIndexOf(T item, int startIndex, int count) => throw null;

        public IEnumerator<T> GetEnumerator() => throw null;
        IEnumerator IEnumerable.GetEnumerator() => throw null;

        public void CopyTo(T[] array, int index) => throw null;
        public void CopyTo(Span<T> destination) => throw null;

        public T[] ToArray() => throw null;  // always returns a copy
        public ArraySegment<T> MoveToArraySegment() => throw null;
        public T[] MoveToArray() => throw null;  // throws if Count != Capacity
    }
}

Alternative Designs

  • System.Collections.Generic.ArrayBuilder<T> actually already exists today as an internal struct, but its current form is probably not what you would want from a public API surface. This API doesn't propose to replace usages of the existing ArrayBuilder<T>, which is a mutable struct and elides checks in favor of performance. The internal ArrayBuilder<T> shouldn't take the good name and namespace, so it would be renamed.
  • System.Collections.Immutable.ImmutableArray<T>.Builder could be copied almost wholesale.

See Also

Postscript

If this could be done before .NET 5 ships, I would suggest that System.Runtime.InteropServices.CollectionsMarshal.AsSpan<T>(List<T>? list) be removed. The motivating use case was "List is a very convenient datastructure for creating an array of an unknown size."¹ and this would solve that use case.

cc: @GrabYourPitchforks

@carlreinke carlreinke added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 13, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Runtime untriaged New issue has not been triaged by the area owner labels Jun 13, 2020
@svick
Copy link
Contributor

svick commented Jun 13, 2020

Can you explain the ways in which this API serves as a better array builder than List<T>?

@carlreinke
Copy link
Contributor Author

@svick There are several cases in which using List<T> would require you to make a copy but ArrayBuilder<T> would not:

  • If you know the length up front, you can get a T[] out via MoveToArray() without copying. In this case, the only benefit over just allocating T[] and initializing it yourself is that this helps you track how many elements you've added.
  • If you don't know the length up front, you can opportunistically get a T[] out without copying.
    var array = builder.Count == builder.Capacity
        ? builder.MoveToArray()
        : builder.ToArray();
    Ideally you pair this with a good guess at Capacity.
  • You can always get an ArraySegment<T> (and thus a Memory<T>/Span<T>) out without copying.

@GrabYourPitchforks
Copy link
Member

I'm not sure I like the idea of having all of these different APIs on a "builder" type that's not meant to serve as an exchange type. It seems the primary benefit of this over List<T> is when you already know the capacity ahead of time and you're trying to avoid a double-allocation of the underlying array. At that point, it's simple enough to write:

T[] arr = new T[capacity];
for (int i = 0; i < arr.Length; i++)
{
    arr[i] = GetOrCreateElementToAdd();
}

/* use arr */

Or, if you know the max capacity but aren't quite sure you can fill it:

T[] arr = new T[maxCapacity];
int i = 0;
while (needToAddAnItem)
{
    arr[i++] = GetOrCreateElementToAdd();
}

Array.Resize(ref arr, i);
/* use arr */

What am I missing?

@carlreinke
Copy link
Contributor Author

carlreinke commented Jun 13, 2020

I'm not sure I like the idea of having all of these different APIs on a "builder" type that's not meant to serve as an exchange type.

Why wouldn't it serve as an exchange type? You could pass it into anything that knows how to Add() to an IList<T>.

You could certainly cut down the API by not implementing IList<T> and cutting Reverse() and Sort() (which are sort of fringe, but both List<T> and ImmutableArray<T>.Builder have them), which would make it less likely to be an exchange type, if that's a goal. I think that IList<T> does add enough value to warrant implementing it.

It seems the primary benefit of this over List is when you already know the capacity ahead of time and you're trying to avoid a double-allocation of the underlying array.

There's a secondary benefit of getting rid of System.Runtime.InteropServices.CollectionsMarshal.AsSpan<T>(List<T>? list), which is frankly a bad API that should not be shipped. (To quote @bartonjs: "This is no.") One less unsafe API that didn't need to be.

There seemed to be a fairly good consensus during the review of that API that people want a growable array kind of type.

What am I missing?

You left out the scenario where you underestimate the capacity rather than overestimate it. You can still get some benefit out in that scenario if what you want out is a ArraySegment<T>/Memory<T>/Span<T> and you don't intend it to live long (and thus don't mind some wasted space).

@GrabYourPitchforks
Copy link
Member

Yes, I'm aware of the problems of the CollectionsMarshal API. I was one of the people that pushed back in the review, as it means that our implementation of List<T> is for all eternity now locked to having a contiguous array of elements of type T. It prevents us from specializing List<bool> as a bit array, for instance. But that ship has sailed.

Given this state of the world, I could see a simpler solution for your scenario:

class CollectionsMarshal
{
    public static ArraySegment<T> GetArraySegmentAndReset(List<T> list);
}

The implementation gives you back the ArraySegment<T> which contains the list's data - no-copy - and internally resets all of the List<T> fields back to their originally constructed values. It essentially disconnects the underlying array from the list and transfers ownership to you.

@carlreinke
Copy link
Contributor Author

But that ship has sailed.

I've heard that the API review committee does not like to reconsider its decisions. It's not strictly true that it doesn't reconsider them, though.

I'm giving some external push-back and proposing a safe API that solves the motivating use case. Unless there's some other reason that CollectionsMarshal can't be backed out, all I'm short on is someone to push for this to be considered before the 5.0 API freezes.

Of course, GetArraySegmentAndReset(List<T> list) would also solve that motivating use case, so it should certainly be considered -- particularly if CollectionsMarshal really is here to stay.

@GrabYourPitchforks
Copy link
Member

Thinking about this a bit more, I believe GetArraySegmentAndReset would be a "safe" API. Which means we could ostensibly add it directly to List<T> and not bounce through CollectionsMarshal.

But I think CollectionsMarshal is really here to stay. There's still a scenario where people want to party on the list's backing store, even without giving up ownership of the array. And we wouldn't want to expose that API directly on List<T>. There are other scenarios that people are considering adding CollectionsMarshal APIs for. For instance, there's a proposal for a ref TValue GetRef<TKey, TValue>(Dictionary<TKey, TValue> dict, TKey key) API, which would allow you to get and update structs in-place inside a dictionary instance. I don't recall the issue number offhand.

@svick
Copy link
Contributor

svick commented Jun 14, 2020

@GrabYourPitchforks

For instance, there's a proposal for a ref TValue GetRef<TKey, TValue>(Dictionary<TKey, TValue> dict, TKey key) API, which would allow you to get and update structs in-place inside a dictionary instance. I don't recall the issue number offhand.

That's #27062.

@SingleAccretion
Copy link
Contributor

SingleAccretion commented Jun 14, 2020

Thinking about this a bit more, I believe GetArraySegmentAndReset would be a "safe" API. Which means we could ostensibly add it directly to List<T> and not bounce through CollectionsMarshal.

@GrabYourPitchforks I remember hearing about "something something ... new members on List<T> have non-trivial JIT costs so new APIs should be added with care ... something something". Is that a concern?

I think I like both the original dedicated ArrayBuilder<T> and the GetArraySegmentAndReset (maybe just ArraySegment<T> Collect()?) replacement much more than CollectionsMarshal.AsSpan<T>(List<T>? list) which feels like it makes something that should be trivially safe unsafe.

Edit: Is it OK for the new APIs to use ArraySegment<T>?

@Frassle
Copy link
Contributor

Frassle commented Jun 28, 2020

Edit: Is it OK for the new APIs to use ArraySegment?

Wouldn't Memory<T> be the more useful type to return these days?

edit Memory hides that your getting an array unless you ask it to then copy. So yeh ArraySegment is probably better.

@joperezr joperezr added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner labels Jul 7, 2020
@joperezr joperezr added this to the Future milestone Jul 7, 2020
@terrajobst terrajobst removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 25, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Mar 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime
Projects
None yet
Development

No branches or pull requests

8 participants