-
Notifications
You must be signed in to change notification settings - Fork 788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster list/array computation expressions #11592
Conversation
Here's the code I used for performance testing:
|
match values with | ||
| :? ('T[]) as valuesAsArray -> | ||
for v in valuesAsArray do | ||
this.Add v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible this could be a bit faster and avoid so many writes into the fields of ListCollector. However ListCollector is ultimately a mutable struct on the stack, so writes will be fast, it may not end up any faster
// cook a faster iterator for lists and arrays | ||
match values with | ||
| :? ('T[]) as valuesAsArray -> | ||
for v in valuesAsArray do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating over arrays is considerably faster than iterating sequences
Random test failure:
|
I went through tests\walkthroughs\DebugStepping\TheBigFileOfDebugStepping.fsx and made some general improvements to debugging of list, array and sequence expressions and checked that all the sample list/sequence expressions in that file debug OK |
This is now ready (some baselines may still need updating, but nearly all are done) Note that a lot of sequence expression state machine generation is removed from the test baselines in favour of simpler code, hence about -2000 lines net in this PR |
OK, everything green, this is ready. I've updated the RFC and notes in the PR |
@TIHan @cartermp @KevinRansom @vzarytovskii This is ready for your review |
On review with @TIHan:
eg. let f () =
[| let mutable x = 1
if today() then yield x
if f() then yield 1
for i in 0.. 5 do
if g() then yield 1
|] becomes roughly: let f () =
let x = ref 1
let mutable collector = ArrayCollector<'T>()
if today() then collector.Add(x.Value)
if f() then collector.Add(1)
let enum = (0..5).GetEnumerator()
try
while enum.MoveNext() do
if g() then collector.Add(1)
finally
enum.Dispose()
collector.Close() This is still always faster than the corresponding sequence expression code. In a separate PR we could improve (2). Improving (1) is more difficult because the mutable-to-ref promotion happens well before the generation of collecting code. Separately we could also imagine lifting the language restriction that prevents the use of span, byref capturing etc. in list and array expressions - at least for compiled code. Quotations may still have problems with these. |
Would it be hard to statically analyze (in the optimization/IL gen phase) the sequence expression and learn the minimum number of elements that will be yielded? In the case of |
@kerams Yes in theory, though I think the cases where that would matter would be cases where the size projection was a formulae of the input sizes being iterated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsyme and I spoke on this last week. I quickly went over it today - most of the code changes are test changes. We will have new public APIs in FSharp.Core it looks like; they are meant for the codegen.
Looking at the core of the change, it will be great to have this and might be able to extend it further for other collections, such as ImmutableArray.
Thanks - I'll merge this.
Yes. I'm not yet sure of the right generalization. In principle any synchronous consumption of a For other collections, e.g. immutable array/block, it may depend on whether we allow |
Note also there are many other places inside FSharp.Core we might be able to ArrayCollector to avoid creating a ResizeArray - e.g. even just for Seq.toArray |
This implements RFC FS-1099 - library support for faster computed list and array expressions and improves the performance of computed list and array expressions
This is drawing from the lessons of RFC FS-1087 and FS-1097 that we should (somewhat obviously) have a struct collector and just generate the synchronous code. This is what the implementation does - it looks for the compiled internal form of
[ ... ]
and[| ... |]
and transforms to synchronous code (which is the original code withyield
andyield!
replaced by calls to the corresponding collector method).For historical reasons back to F# 1.0 the compiled internal form of
[ ... ]
and[| ... |]
is like thisPreviously we compiled the
seq { ... }
into a state machine, but still calledSeq.toList
andSeq.toArray
. This was basedon thinking overly-influenced by LINQ and its focus on IEnumerable for everything. However IEnumerable is a
needless inversion of control and computationally expensive with extra allocations, MoveNext etc., - so
much so that LINQ is routinely avoided by some teams.
Instead, when ultimately producing lists and arrays, we can produce near-optimal synchronous code directly.
we should always have compiled these things in this way...
Notes:
This is an optimization that preserves the same semantics and execution order, so should work for all existing code. It involves a small addition to FSharp.Core documented in the RFC above. The optimization kicks in if the collectors are present in the referenced FSharp.Core.
One particular optimization is that, for lists, a
yield!
of a list in tailcall position simply stitches that list into the result without copying (AddManyAndClose
onListCollector<T>
). This is valid because lists are immutable - and we already do this forList.append
for example. In theory this could reduce someO(n)
operations toO(1)
though I doubt we'll see that in practice.There is also an optimizations in ArrayCollector to avoid creating a ResizeArray for size 0,1 or 2 arrays. This is obviously a good optimization based on any reasonable model of allocation costs. However it may not be optimal and we can adjust this in the future - the relevant struct fields are internal and can be changed. It would be good to further measure the stack-space/allocation/data-copying tradeoffs and decide if it's worth extending this further.
I went through tests\walkthroughs\DebugStepping\TheBigFileOfDebugStepping.fsx and made some improvements to debugging of list, array and sequence expressions and checked that all the sample list/sequence expressions in that file debug OK. Specifically the location of the debug points associated with
try
andwith
andwhile
andfinally
keywords is now correctly recovered from the internal form.The perf results on micro samples are good and pretty much as expected from previous experiments with using state machines for
list { ... }
andarray {... }
comprehensions. Note that some other people have experimented with faster builders using reflection emit codegen too.We don't expect any change in fixed size arrays or lists
I don't expect any cases where this will either be slower or use more stack in a signficant way compared to our old way of doing these (which is to create a sequence expression state machine and iterate).
For correctness testing the existing tests we have are ok I think - we have zillions of computed list and array expressions in the test suites and compiler that return results of many different sizes.
Some IL code generation tests will likely fail, we'll need to update those
We need to check debug stepping (it should be possible to make this much improved if it's not alrready)